[PATCH] Fixed potential, rare corruption of unicode characters

Tue Oct 2 12:31:08 PDT 2012

"Lubomir I. Ivanov" <neolit123 at gmail.com> writes:

> On 2 October 2012 20:09, Dirk Hohndel <dirk at hohndel.org> wrote:
>>
>> I wonder if there are other spots where we do similar things (of
>> assuming ascii characters). Would you have the time to do a quick scan
>> of our use of strlen and friends in places where we could have UTF-8
>> strings?
>>
>> Thanks
>>
>
> good call,
> just tried loading an XML file with a unicode (cyrillic) name and there
> are some problems with that e.g. cannot open the file with file->open
> and also the app titlebar filename does not show when opening the file
> as command line argument.
>
> perhaps this should be fixed for 2.0 ?

I'd love to see that fixed for 2.0.

> other than that, on a quick look most instances of the strlen() +
> malloc() combination are accurate, since there isn't any buffer
> truncation.

Good.
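
For anyone following along, the case to watch for is a fixed-size buffer,
where cutting at an arbitrary byte offset can split a multi-byte
character. A minimal sketch in plain C of what I mean; the helper names
are made up for illustration, they are not from the patch, and the code
assumes the input is already valid, NUL-terminated UTF-8:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from the patch): returns how many bytes of s
 * fit into max_bytes without cutting a multi-byte UTF-8 sequence in half.
 * Assumes s is valid, NUL-terminated UTF-8. */
size_t utf8_safe_prefix_len(const char *s, size_t max_bytes)
{
	size_t len = strlen(s);	/* strlen() counts bytes, not characters */

	if (len <= max_bytes)
		return len;
	len = max_bytes;
	/* Continuation bytes have the form 10xxxxxx. Back up until s[len]
	 * is the first byte of a character, so the cut lands on a boundary. */
	while (len > 0 && ((unsigned char)s[len] & 0xC0) == 0x80)
		len--;
	return len;
}

/* Hypothetical helper: counts characters (code points) by counting every
 * byte that is not a continuation byte. */
size_t utf8_char_count(const char *s)
{
	size_t n = 0;

	for (; *s; s++)
		if (((unsigned char)*s & 0xC0) != 0x80)
			n++;
	return n;
}
```

With a three-letter cyrillic word, strlen() reports 6 bytes while the
character count is 3, and asking for at most 3 bytes gives back 2 so
that the second character is not cut in the middle.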

>> BTW: documentation of g_utf8_strncpy has this to say:
>>
>> The src string must be valid UTF-8 encoded text. (Use g_utf8_validate()
>> on all text before trying to use UTF-8 utility functions with it.)
>>
>> I guess that's something we should do?
>
> i was using the function to test around. here is what it does, code wise:
> http://www.koders.com/c/fid6C6D63D4127AFC48E2D736A37791329C93CAF290.aspx#L177
>
> but i think things might blow up even before the calls to get_string()
> as GTK uses strictly UTF-8 everywhere.
> i could add it in a later patch if you wish as it might be needed
> elsewhere as well.

As you may have seen, I pushed your code already. Even without the
validity check this should be a huge improvement over what we have
today, so I'm not saying this is a requirement - I was mainly asking if
this was something you had considered.
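
To make the idea concrete: in our code the check would simply be a call
to g_utf8_validate() before handing the string to the other g_utf8_*
helpers. The sketch below is only a plain C illustration of the kind of
check involved, so that it stands on its own without GLib. The function
name is made up, and I am not claiming this matches GLib's
implementation:

```c
#include <stddef.h>

/* Hypothetical stand-in for a validity check (not GLib's code): returns 1
 * if str is well-formed, NUL-terminated UTF-8, and 0 otherwise. */
int utf8_is_valid(const char *str)
{
	/* smallest code point that legitimately needs 1, 2 or 3
	 * continuation bytes; anything below is an overlong encoding */
	static const unsigned int min_cp[4] = { 0, 0x80, 0x800, 0x10000 };
	const unsigned char *p = (const unsigned char *)str;

	while (*p) {
		unsigned int cp, n, i;

		if (*p < 0x80) {		/* plain ASCII */
			p++;
			continue;
		} else if ((*p & 0xE0) == 0xC0) {	/* 110xxxxx */
			cp = *p & 0x1F;
			n = 1;
		} else if ((*p & 0xF0) == 0xE0) {	/* 1110xxxx */
			cp = *p & 0x0F;
			n = 2;
		} else if ((*p & 0xF8) == 0xF0) {	/* 11110xxx */
			cp = *p & 0x07;
			n = 3;
		} else {
			return 0;	/* stray continuation or invalid lead byte */
		}
		p++;
		for (i = 0; i < n; i++, p++) {
			/* also fails on the terminating NUL (truncated sequence) */
			if ((*p & 0xC0) != 0x80)
				return 0;
			cp = (cp << 6) | (*p & 0x3F);
		}
		if (cp < min_cp[n])
			return 0;	/* overlong encoding */
		if (cp >= 0xD800 && cp <= 0xDFFF)
			return 0;	/* UTF-16 surrogate range */
		if (cp > 0x10FFFF)
			return 0;	/* beyond the Unicode range */
	}
	return 1;
}
```

A caller would run this once on text coming from outside (file names,
XML content) and fall back to some error path when it returns 0, rather
than checking inside every string helper.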

/D