[PATCH] Fix potentially broken white space truncation on certain Windows versions

Dirk Hohndel dirk at hohndel.org
Thu Mar 7 11:22:32 PST 2013


Dirk Hohndel <dirk at hohndel.org> writes:

> Linus Torvalds <torvalds at linux-foundation.org> writes:
>
>> On Thu, Mar 7, 2013 at 10:44 AM, Lubomir I. Ivanov <neolit123 at gmail.com> wrote:
>>>
>>> In Subsurface, usages of string trimming are present in multiple
>>> locations, so to make this work try to use GLib's g_unichar_isspace(),
>>> given most of the application text is UTF-8. g_ascii_isspace() also
>>> works, as it is a locale agnostic version of isspace().
>>
>> Hmm. I think g_unichar_isspace() is the wrong thing to do. It's not a
>> unicode character, it's a byte. So it gives the wrong value for bytes
>> that are possibly part of bigger unicode characters (if there is any
>> space in the 128-255 area, I didn't check).
>>
>> g_ascii_isspace() is the right function to use. Or we could just do
>> our own and trim all bytes 0 <= x <=32 and be done with it that way
>> (but g_ascii_isspace() is likely better)
>
> It is my understanding that regardless of locale all space code points
> are single byte with high bit zero. So g_ascii_isspace should work
> everywhere (as all multibyte encodings have the high bit set...)

BZZZZT. Wrong. Dang

U+00A0 (the bytes 0xC2 0xA0 in UTF-8) is the non-breaking space, and
U+FEFF (the bytes 0xEF 0xBB 0xBF in UTF-8) is the zero width no-break
space.

So at least in parse-xml.c/utf8_string() we need to handle this
differently...
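
A rough sketch of what "differently" could look like, assuming the
input is valid UTF-8. The names utf8_space_len() and utf8_trim() are
made up for illustration and are not the existing parse-xml.c code.
The idea is to match the full byte sequences instead of single bytes,
so a lone 0xA0 that is part of another character is left alone:

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: byte length of the whitespace sequence that
 * starts at p, or 0 if there is none. Recognizes ASCII whitespace,
 * U+00A0 (C2 A0) and U+FEFF (EF BB BF). "remaining" is the number of
 * bytes that may be read from p.
 */
static size_t utf8_space_len(const unsigned char *p, size_t remaining)
{
	if (remaining >= 1 && (p[0] == ' ' || (p[0] >= '\t' && p[0] <= '\r')))
		return 1;
	if (remaining >= 2 && p[0] == 0xC2 && p[1] == 0xA0)
		return 2;
	if (remaining >= 3 && p[0] == 0xEF && p[1] == 0xBB && p[2] == 0xBF)
		return 3;
	return 0;
}

/*
 * Hypothetical helper: trim leading and trailing whitespace in place.
 * Returns a pointer into buf at the first byte that is kept.
 */
static char *utf8_trim(char *buf)
{
	unsigned char *start = (unsigned char *)buf;
	size_t len = strlen(buf);
	size_t n;

	/* Leading: skip whole sequences, one at a time. */
	while ((n = utf8_space_len(start, len)) > 0) {
		start += n;
		len -= n;
	}

	/* Trailing: test the last 1, 2 and 3 bytes for a full match. */
	for (;;) {
		if (len >= 1 && utf8_space_len(start + len - 1, 1) == 1)
			len -= 1;
		else if (len >= 2 && utf8_space_len(start + len - 2, 2) == 2)
			len -= 2;
		else if (len >= 3 && utf8_space_len(start + len - 3, 3) == 3)
			len -= 3;
		else
			break;
	}
	start[len] = '\0';
	return (char *)start;
}
```

With this, a string ending in C3 A0 (an "a" with grave accent) keeps
its last character, while a trailing C2 A0 is removed.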

/D


