[PATCH 2/8] Files: add wrappers for open(), fopen(), sqlite3_open()
thiago at macieira.org
Wed Dec 18 15:47:16 UTC 2013
On Wednesday, 18 December 2013 15:36:56, Thiago Macieira wrote:
> In the worst case, conversion from UTF-8 to UTF-16 results in the same
> number of characters, or double the number of bytes. That's actually the
> US-ASCII case: each byte becomes one 16-bit word. For everything else,
> UTF-16 takes fewer code units than UTF-8.
> You multiply by 3 when you convert from UTF-16 to UTF-8 for the worst case.
range                   UTF-8     UTF-16
U+0000 to U+007F        1 byte    1 word
U+0080 to U+07FF        2 bytes   1 word
U+0800 to U+FFFF        3 bytes   1 word
U+10000 to U+10FFFF     4 bytes   2 words
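
The table above can be written down as two small functions. This is only an illustrative sketch in Python; the names utf8_len and utf16_len are made up for this example and are not part of the patch or of Qt. The loop cross-checks the functions against Python's own encoders.

```python
def utf8_len(cp: int) -> int:
    """Number of bytes needed to encode code point cp in UTF-8."""
    if cp < 0x80:
        return 1
    if cp < 0x800:
        return 2
    if cp < 0x10000:
        return 3
    return 4


def utf16_len(cp: int) -> int:
    """Number of 16-bit words needed to encode code point cp in UTF-16."""
    return 1 if cp < 0x10000 else 2


# Cross-check one sample code point from each row of the table
# (surrogate code points U+D800..U+DFFF are deliberately not sampled).
for cp in (0x41, 0xE9, 0x3053, 0x1F600):
    ch = chr(cp)
    assert utf8_len(cp) == len(ch.encode("utf-8"))
    assert utf16_len(cp) == len(ch.encode("utf-16-le")) // 2
    print(f"U+{cp:04X}: {utf8_len(cp)} byte(s), {utf16_len(cp)} word(s)")
```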
That's why it works. The worst case scenario is that we allocate a buffer that
is 3x as big as it needs to be, when converting UTF-8 text in the U+0800 to
U+FFFF range to UTF-16.
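
To make the sizing rule concrete, here is a rough sketch of the two worst-case bounds, checked against a few sample strings. The helper names max_utf16_words and max_utf8_bytes are hypothetical, chosen for this example only.

```python
def max_utf16_words(utf8_byte_count: int) -> int:
    """Upper bound on 16-bit words when converting UTF-8 to UTF-16.

    Every UTF-8 byte produces at most one word (the ASCII case).
    """
    return utf8_byte_count


def max_utf8_bytes(utf16_word_count: int) -> int:
    """Upper bound on bytes when converting UTF-16 to UTF-8.

    Every 16-bit word produces at most three bytes (U+0800 to U+FFFF).
    """
    return 3 * utf16_word_count


for text in ("hello", "é", "こんにちは", "😀", "hello こんにちは"):
    utf8_bytes = len(text.encode("utf-8"))
    utf16_words = len(text.encode("utf-16-le")) // 2
    # Neither bound is ever exceeded.
    assert utf16_words <= max_utf16_words(utf8_bytes)
    assert utf8_bytes <= max_utf8_bytes(utf16_words)
    # How much larger the UTF-8 -> UTF-16 buffer is than strictly needed.
    waste = max_utf16_words(utf8_bytes) / utf16_words
    print(f"{text!r}: {utf8_bytes} bytes, {utf16_words} words, "
          f"buffer is {waste:.1f}x the needed size")
```

For pure ASCII the buffer is exactly the right size, and for pure U+0800 to U+FFFF text it comes out 3x too large, matching the worst case described above.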
Unfortunately, that's character count, not byte count, which means that doing
in-place conversions like I want to do for Qt isn't going to be easy. In-place
conversions from UTF-16 to UTF-8 work for ASCII text (shrinks by half) and for
U+0080 to U+07FF text and non-BMP text (same memory usage), but the size
increases by 50% when encoding U+0800 to U+FFFF.
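
The byte counts behind those figures are easy to check with Python's built-in codecs. This is just a quick illustration; size_change is a name invented for this example.

```python
def size_change(text: str) -> tuple[int, int]:
    """Return (bytes as UTF-16, bytes as UTF-8) for the given text."""
    return len(text.encode("utf-16-le")), len(text.encode("utf-8"))


samples = {
    "ASCII": "hello",
    "U+0080 to U+07FF": "éñü",
    "U+0800 to U+FFFF": "こんにちは",
    "non-BMP": "😀",
}

for label, text in samples.items():
    utf16_bytes, utf8_bytes = size_change(text)
    change = 100 * (utf8_bytes - utf16_bytes) / utf16_bytes
    print(f"{label}: {utf16_bytes} -> {utf8_bytes} bytes ({change:+.0f}%)")
```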
I'll still try because there's a lot of ASCII text in Qt applications and even
for CJK text, it might work if there were buffer gains from previous ASCII text
in the same string ("hello こんにちは" can be converted in-place: 22 bytes of
UTF-16 become 21 bytes of UTF-8).
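
Whether a given string can be converted in-place can be estimated by tracking how far the UTF-8 output has advanced relative to the UTF-16 input. The sketch below assumes a single front-to-back pass that reads one whole character (including both halves of a surrogate pair) before writing its UTF-8 form. That is an assumption made for illustration, not a description of any actual Qt code, and fits_in_place is a hypothetical name.

```python
def fits_in_place(text: str) -> bool:
    """Check whether a front-to-back UTF-16 -> UTF-8 conversion of text
    could reuse the input buffer without overwriting unread input."""
    read = 0   # bytes of UTF-16 input consumed so far
    write = 0  # bytes of UTF-8 output produced so far
    for ch in text:
        read += len(ch.encode("utf-16-le"))
        write += len(ch.encode("utf-8"))
        if write > read:
            # The output would run past the read position and clobber
            # input that has not been converted yet.
            return False
    return True


print(fits_in_place("hello こんにちは"))  # ASCII prefix frees enough room
print(fits_in_place("こんにちは hello"))  # first kana already needs 3 bytes in 2
```

Under this model the order matters: the same characters fit when the ASCII part comes first, but not when the string starts with U+0800 to U+FFFF text.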
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
Software Architect - Intel Open Source Technology Center
PGP/GPG: 0x6EF45358; fingerprint:
E067 918B B660 DBD1 105C 966C 33F5 F005 6EF4 5358