[PATCH] Fixed potential, rare corruption of Unicode characters

Tue Oct 2 11:28:29 PDT 2012

On 2 October 2012 20:09, Dirk Hohndel <dirk at hohndel.org> wrote:
>
> I wonder if there are other spots where we do similar things (of
> assuming ascii characters). Would you have the time to do a quick scan
> of our use of strlen and friends in places where we could have UTF-8
> strings?
>
> Thanks
>

Good call.
I just tried loading an XML file with a Unicode (Cyrillic) name, and there
are some problems with that: e.g. the file cannot be opened with
File->Open, and the filename does not show in the app's title bar when
the file is passed as a command-line argument.

Perhaps this should be fixed for 2.0?

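Other than that, on a quick look most instances of the strlen() +
malloc() combination are correct: strlen() counts bytes, not characters,
so the allocation is large enough for UTF-8 text as long as the string
is copied whole and never truncated to a fixed size.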
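
To illustrate the difference, here is a minimal sketch in plain C
(standard library only; dup_string() and utf8_truncate_bytes() are
hypothetical names for illustration, not existing Subsurface code). The
whole-string copy is safe for UTF-8; truncation is where a multibyte
character can get split, so the cut has to back up to a character
boundary. It assumes the input is already valid UTF-8.

```c
#include <stdlib.h>
#include <string.h>

/* Copy a NUL-terminated string. strlen() returns a byte count, so
 * malloc(len + 1) is large enough for UTF-8 text: every byte of every
 * multibyte character is copied, nothing is cut. */
char *dup_string(const char *s)
{
	size_t len = strlen(s);
	char *copy = malloc(len + 1);

	if (copy)
		memcpy(copy, s, len + 1);
	return copy;
}

/* Truncate s in place to at most max_bytes bytes (excluding the NUL)
 * without splitting a UTF-8 character. Continuation bytes look like
 * 10xxxxxx; if the cut would land on one, back up to the lead byte of
 * that character so the whole character is dropped. Assumes s is
 * valid UTF-8. */
void utf8_truncate_bytes(char *s, size_t max_bytes)
{
	size_t cut = max_bytes;

	if (strlen(s) <= max_bytes)
		return;
	while (cut > 0 && ((unsigned char)s[cut] & 0xC0) == 0x80)
		cut--;
	s[cut] = '\0';
}
```

For example, "Привет" is 6 characters but 12 bytes. A plain 5-byte cut
would leave a dangling 0xD0 lead byte from "и"; utf8_truncate_bytes(copy, 5)
backs up to 4 bytes and leaves "Пр".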

> BTW: documentation of g_utf8_strncpy has this to say:
>
> The src string must be valid UTF-8 encoded text. (Use g_utf8_validate()
> on all text before trying to use UTF-8 utility functions with it.)
>
> I guess that's something we should do?

I was only using that function to experiment. Here is what it does, code-wise:
http://www.koders.com/c/fid6C6D63D4127AFC48E2D736A37791329C93CAF290.aspx#L177
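
Roughly, g_utf8_strncpy() walks n characters by looking at each lead
byte to see how long its sequence is, then copies that many bytes and
NUL-terminates. The following is a standalone approximation of that idea
(hypothetical names, not the GLib source). GLib trusts its input at this
point, which is why its docs ask for g_utf8_validate() first; the extra
NUL check in the inner loop is added here only so the sketch cannot read
past the terminator on malformed input.

```c
#include <stddef.h>
#include <string.h>

/* Length in bytes of the UTF-8 sequence starting with lead byte c,
 * decided from the lead byte alone. */
size_t utf8_seq_len(unsigned char c)
{
	if (c < 0x80)
		return 1;		/* 0xxxxxxx: ASCII */
	if ((c & 0xE0) == 0xC0)
		return 2;		/* 110xxxxx */
	if ((c & 0xF0) == 0xE0)
		return 3;		/* 1110xxxx */
	if ((c & 0xF8) == 0xF0)
		return 4;		/* 11110xxx */
	return 1;			/* invalid lead byte: step over it */
}

/* Copy at most n characters (not bytes) of src into dest and
 * NUL-terminate. dest must be large enough for the copied bytes. */
char *utf8_strncpy_chars(char *dest, const char *src, size_t n)
{
	const char *s = src;
	size_t bytes;

	while (n > 0 && *s) {
		size_t len = utf8_seq_len((unsigned char)*s);
		size_t i = 1;

		/* Valid UTF-8 never contains a NUL inside a sequence; this
		 * only matters for malformed input. */
		while (i < len && s[i] != '\0')
			i++;
		s += i;
		n--;
	}
	bytes = (size_t)(s - src);
	memcpy(dest, src, bytes);
	dest[bytes] = '\0';
	return dest;
}
```

So utf8_strncpy_chars(buf, "Привет", 2) copies 4 bytes ("Пр"), whereas
copying 2 bytes would give only "П".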

But I think things might blow up even before the calls to get_string(),
as GTK uses UTF-8 strictly everywhere.
I could add the validation in a later patch if you wish, as it might be
needed elsewhere as well.
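
If we do add it, the check itself is small. With GLib available the real
call would be g_utf8_validate(str, -1, NULL); below is a dependency-free
sketch of the same test (utf8_is_valid() is a hypothetical name, and where
exactly it would be called in our code is left open). It rejects bad
lead/continuation byte patterns, overlong encodings, UTF-16 surrogates
and code points above U+10FFFF.

```c
#include <stdbool.h>
#include <stddef.h>

/* Return true if the NUL-terminated string str is well-formed UTF-8. */
bool utf8_is_valid(const char *str)
{
	const unsigned char *s = (const unsigned char *)str;

	while (*s) {
		unsigned char c = *s;
		size_t len;
		unsigned long cp;

		if (c < 0x80) {		/* ASCII */
			s++;
			continue;
		}
		if (c >= 0xC2 && c <= 0xDF) {
			len = 2;
			cp = c & 0x1F;
		} else if ((c & 0xF0) == 0xE0) {
			len = 3;
			cp = c & 0x0F;
		} else if (c >= 0xF0 && c <= 0xF4) {
			len = 4;
			cp = c & 0x07;
		} else {
			/* stray continuation byte, 0xC0/0xC1, or > 0xF4 */
			return false;
		}
		for (size_t i = 1; i < len; i++) {
			/* also stops at the NUL terminator of a truncated sequence */
			if ((s[i] & 0xC0) != 0x80)
				return false;
			cp = (cp << 6) | (s[i] & 0x3F);
		}
		if ((len == 3 && cp < 0x800) || (len == 4 && cp < 0x10000))
			return false;	/* overlong encoding */
		if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
			return false;	/* out of range or surrogate */
		s += len;
	}
	return true;
}
```

Anything that fails this check should be rejected or converted before it
reaches the UTF-8 helpers or GTK.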

lubo
--