AP Vision import

Linus Torvalds torvalds at linux-foundation.org
Sat Apr 15 14:01:03 PDT 2017


On Sat, Apr 15, 2017 at 1:54 PM, Linus Torvalds
<torvalds at linux-foundation.org> wrote:
>
> It's actually harder to check "is this valid utf-8" than it is to just
> convert random latin1 code to valid utf-8.

The real problem here is that I'm pretty sure our stupid XML
libraries are unhappy about bad utf-8.

Libraries that care about the string encoding are a disgrace.
Particularly if they are utf-8, because the whole *point* of utf-8 was
that you can treat it as a standard C string and just pass it through
without ever caring.

Otherwise we could just make our "utf8_string()" parser assume that
bad utf-8 is Latin1 and just convert it. That's what git does for
commit messages, and it works very very well.
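As a rough illustration of how small the conversion half is, here is a
minimal sketch in C. The name latin1_to_utf8() is hypothetical and this
is not the code in subsurface or git; it assumes a NUL-terminated input
and that the caller frees the result.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not the actual subsurface or git code):
 * convert a NUL-terminated Latin1 string into a newly allocated
 * UTF-8 string. Returns NULL if the allocation fails.
 */
char *latin1_to_utf8(const char *latin1)
{
	assert(latin1 != NULL);

	/* Worst case: every input byte turns into two output bytes. */
	size_t len = strlen(latin1);
	char *utf8 = malloc(2 * len + 1);
	if (!utf8)
		return NULL;

	char *out = utf8;
	for (size_t i = 0; i < len; i++) {
		unsigned char c = (unsigned char)latin1[i];

		if (c < 0x80) {
			/* ASCII is the same in Latin1 and UTF-8. */
			*out++ = (char)c;
		} else {
			/* U+0080..U+00FF: two bytes, 110000xx 10xxxxxx. */
			*out++ = (char)(0xC0 | (c >> 6));
			*out++ = (char)(0x80 | (c & 0x3F));
		}
	}
	*out = '\0';
	return utf8;
}
```

Every Latin1 byte maps directly to the Unicode code point with the same
value, so there is no table and no failure case beyond running out of
memory.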

But as mentioned, the pain point isn't actually the "convert latin1 to
utf8", but the whole "check whether it's well-formatted utf8 in the
first place".

That's not technically *hard* either, it's just a bother and not as
mindlessly trivial as the latin1 conversion.
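
To make the "bother" concrete, here is a minimal sketch of a
well-formedness check in C. The name is_valid_utf8() is hypothetical
and this is not the code in subsurface or git. It assumes a
NUL-terminated input and takes the strict reading of "valid": it
rejects overlong encodings, surrogates and anything above U+10FFFF.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not the actual subsurface or git code):
 * return true if the NUL-terminated string is well-formed UTF-8.
 */
bool is_valid_utf8(const char *str)
{
	assert(str != NULL);

	const unsigned char *s = (const unsigned char *)str;
	size_t len = strlen(str);
	size_t i = 0;

	while (i < len) {
		unsigned char c = s[i];
		size_t extra;		/* continuation bytes expected */
		unsigned long cp;	/* decoded code point */
		unsigned long min;	/* smallest value for this length */

		if (c < 0x80) {
			i++;		/* plain ASCII */
			continue;
		} else if ((c & 0xE0) == 0xC0) {
			extra = 1; cp = c & 0x1F; min = 0x80;
		} else if ((c & 0xF0) == 0xE0) {
			extra = 2; cp = c & 0x0F; min = 0x800;
		} else if ((c & 0xF8) == 0xF0) {
			extra = 3; cp = c & 0x07; min = 0x10000;
		} else {
			return false;	/* stray continuation or bad lead byte */
		}

		/* The sequence must not run off the end of the string. */
		if (len - i - 1 < extra)
			return false;

		for (size_t k = 1; k <= extra; k++) {
			if ((s[i + k] & 0xC0) != 0x80)
				return false;	/* not a 10xxxxxx byte */
			cp = (cp << 6) | (s[i + k] & 0x3F);
		}

		/* Reject overlong forms, surrogates and out-of-range values. */
		if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
			return false;

		i += extra + 1;
	}
	return true;
}
```

A parser could call a check like this first and only treat the string as
Latin1 when it fails. For example, "caf\xC3\xA9" passes, while the
Latin1 bytes "caf\xE9" fail because 0xE9 looks like the start of a
three-byte sequence that never arrives.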

                Linus

