entering GPS coordinates
torvalds at linux-foundation.org
Mon Jan 28 07:25:51 PST 2013
On Jan 28, 2013 6:24 AM, "Dirk Hohndel" <dirk at hohndel.org> wrote:
> When I implemented the utf8 based parser that's now committed, this was
> really easy to do.
Ugh. The whole utf8 part of that parsing makes me very sad.
The *whole* point of utf8 is that you can just treat it as a stream of
bytes. There is absolutely no reason to do the whole "decode each character
as a Unicode character", because that is missing one of the basic reasons
why utf8 is so much better than the abortions that are other Unicode
encodings.
Seriously, the whole "decode every character" thing is a disease. It's a
common disease, but that doesn't make it less so. Please don't encourage
it.
So utf8 should be treated as just a sequence of bytes. The only time you
want to look up the Unicode points is if you want to validate it as strict
utf8 (and you should generally never do that anyway, unless the user asked
you to) or if you are going to look up the glyphs for each character (iow
if you are going to *render* the characters).
Even things like "count number of Unicode characters" do not need to decode
the characters, you can - and *should* - do it based on the bit patterns in
the bytes. Of course, you'd use the library function to do it, but the
point is, you should never do it by actually decoding each code point.
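
To make the bit-pattern counting concrete, here is a minimal sketch in Python. The function name `count_code_points` and the sample coordinate string are made up for illustration; they are not taken from the subsurface code. It relies only on the fact that every utf8 continuation byte has the form 10xxxxxx, so any byte that does not match that pattern starts a new character.

```python
def count_code_points(data: bytes) -> int:
    """Count characters in UTF-8 data without decoding a single code point."""
    # Continuation bytes are 10xxxxxx; every other byte begins a character,
    # so counting the non-continuation bytes gives the character count.
    return sum(1 for b in data if (b & 0xC0) != 0x80)


# Hypothetical sample input: the degree sign takes two bytes in UTF-8.
sample = "45°30.5'N".encode("utf-8")
byte_length = len(sample)                 # 10 bytes
char_count = count_code_points(sample)    # 9 characters
```

In real code you would normally call a library routine for this, as noted above; the sketch only shows that nothing more than a mask and a compare per byte is needed.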
So treat Unicode as plain text. The degree character is multi-byte, yes,
but you can still treat it as bytes. Utf8 does not have character boundary
issues, that's very much part of the whole design! There is no way the
sequence of bytes that is the degree character could ever be misinterpreted
because of character boundary issues, ie you could never have two other
characters that contain the degree character encoding "in the middle".
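
The same point can be sketched for the degree sign, again in Python and again with hypothetical names (`DEGREE`, `split_at_degree`) that are not part of any existing parser. The degree sign U+00B0 is encoded as the two bytes 0xC2 0xB0, and a plain byte search finds it without looking at any other character in the string.

```python
DEGREE = b"\xc2\xb0"  # UTF-8 encoding of U+00B0 DEGREE SIGN


def split_at_degree(coord: bytes):
    """Split a coordinate at the degree sign using a plain byte search."""
    head, sep, tail = coord.partition(DEGREE)
    if not sep:
        return None  # no degree sign present
    return head, tail


# Other multi-byte characters in the input do not get in the way.
parts = split_at_degree("Zürich 47°22'N".encode("utf-8"))

# U+0130 is C4 B0 and U+00C2 is C3 82: the bytes B0 and C2 never occur as
# a lead/continuation pair here, so the search cannot match by accident.
false_match = DEGREE in "İÂ".encode("utf-8")
```

Here `parts` holds the raw bytes before and after the degree sign, and `false_match` is `False`: a lead byte can never be mistaken for a continuation byte, so the two-byte pattern only matches a real degree sign.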
Anyway, end of rant. Utf8 really is a thing of beauty, and it is sad to see
it being used mindlessly like all the other crappy Unicode encodings, when
the whole point of it is that you don't need to.