fixed buffer lengths with snprintf are not utf-8 safe

Dirk Hohndel dirk at hohndel.org
Wed Oct 17 11:26:12 PDT 2012


"Lubomir I. Ivanov" <neolit123 at gmail.com> writes:

> when we found some issues that were related to truncation of utf-8
> strings, i suspected that fixed buffer lengths like in
> info.c:show_dive_info() + snprintf might be problematic.
> apparently this is true for longer strings with unicode chars.
> another example is divelist.c:date_data_func().
>
> each "xx" pair is a utf-8 char, while a single "x" is an ansi char.
>
> "FDDABBCDDEE"
> if the truncate position is the second B char, for example, we are
> splitting the BB char in half, resulting in an incorrect string.
> if it's at the first C or B we are good.
>
> another thing that can happen, which is (somehow) worse if the string
> is last in the literal pool, is splitting before the \0 char, which
> will try to print what follows.
>
> perhaps:
> g_utf8_strlen()
> g_utf8_strncpy()
> malloc(sizeof(gunichar) * n)
>
> malloc() / realloc() would be better than some sort of a smart
> truncation technique i.e checking the unichar at truncate position.
>
> may be the correct way to go here.
> i will try to send a patch later on.

I'm interested to see that. My thought had been to make the buffers long
enough that the space constraints in the UI would stop translators
before the buffer length becomes an issue, but I'll admit that that's a
somewhat naive / simplistic approach to things.
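
For illustration, here is a rough sketch of the two alternatives discussed
above, in plain C99 without GLib. It is not code from the subsurface tree:
the names utf8_trim_partial(), snprintf_utf8() and format_alloc() are
hypothetical. The first two keep the fixed buffer but drop an incomplete
trailing UTF-8 sequence after truncation; the third sizes the buffer from
the formatted length, which I believe is roughly what g_strdup_printf()
already provides.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: if buf ends in an incomplete multi-byte UTF-8
 * sequence, cut the string at the start of that sequence. */
static void utf8_trim_partial(char *buf)
{
	size_t len = strlen(buf);
	size_t start = len;
	size_t need;
	unsigned char lead;

	/* walk back over continuation bytes (10xxxxxx) */
	while (start > 0 && ((unsigned char)buf[start - 1] & 0xC0) == 0x80)
		start--;
	if (start == 0)
		return;		/* empty string or no lead byte: leave as is */
	start--;		/* index of the last lead (or ASCII) byte */
	lead = (unsigned char)buf[start];

	if (lead < 0x80)
		need = 1;
	else if ((lead & 0xE0) == 0xC0)
		need = 2;
	else if ((lead & 0xF0) == 0xE0)
		need = 3;
	else if ((lead & 0xF8) == 0xF0)
		need = 4;
	else
		return;		/* invalid lead byte: not ours to repair */

	if (len - start < need)
		buf[start] = '\0';	/* drop the half character */
}

/* Hypothetical wrapper: snprintf() into a fixed buffer, then repair the
 * tail if the output was truncated. Returns what vsnprintf() returned. */
static int snprintf_utf8(char *buf, size_t size, const char *fmt, ...)
{
	va_list ap;
	int n;

	assert(buf != NULL && size > 0);
	va_start(ap, fmt);
	n = vsnprintf(buf, size, fmt, ap);
	va_end(ap);
	if (n >= 0 && (size_t)n >= size)
		utf8_trim_partial(buf);
	return n;
}

/* Hypothetical alternative: no fixed buffer at all. Measure first, then
 * allocate exactly what is needed. The caller must free() the result. */
static char *format_alloc(const char *fmt, ...)
{
	va_list ap;
	int n;
	char *buf;

	va_start(ap, fmt);
	n = vsnprintf(NULL, 0, fmt, ap);	/* C99: returns needed length */
	va_end(ap);
	if (n < 0)
		return NULL;

	buf = malloc((size_t)n + 1);
	if (buf == NULL)
		return NULL;

	va_start(ap, fmt);
	vsnprintf(buf, (size_t)n + 1, fmt, ap);
	va_end(ap);
	return buf;
}
```

With a 5-byte buffer and a 6-byte string whose last character takes three
bytes, snprintf_utf8() should leave only the complete characters in the
buffer, while format_alloc() returns the whole string. The trimming variant
still loses text, so it only limits the damage; the allocating variant is
the one that avoids truncation entirely.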

/D

