[PATCH] Check if DLD contains non-ascii characters

Miika Turkia miika.turkia at gmail.com
Tue Mar 26 13:27:16 PDT 2013


On Tue, Mar 26, 2013 at 10:15 PM, Dirk Hohndel <dirk at hohndel.org> wrote:
> Miika Turkia <miika.turkia at gmail.com> writes:
>
>> Valid divelogs.de export might contain non-ascii characters in CDATA
>> fields as long as these characters are found in iso-8859-1. So we'll
>> have to test to make sure the content is fully ascii before calling
>> xmlStringLenDecodeEntities to decode possible character references.
>
> So what happens if we have both ä and &#1023; in the CDATA section?

Good point. My first assumption was that the whole XML file would be
encoded, but now that I have tested it, that is not the case. Each CDATA
is treated independently, so it is possible to have öä in one CDATA and
another CDATA in Cyrillic and thus encoded as character references. I
would guess this to be unlikely but certainly possible. So in such a
mixed case this patch would display the Cyrillic CDATA with its
character references left undecoded (e.g. &#1003;).
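
To illustrate the kind of test being discussed, here is a minimal
sketch of an "is this content fully ASCII" check. The name
is_pure_ascii is hypothetical and not taken from the patch, and which
buffer the patch actually passes to such a check (the whole export or a
single field) is an assumption left open here. The idea is only that
xmlStringLenDecodeEntities would be called when the check succeeds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper (not the function from the patch): returns true
 * only if every byte in buf[0..len) is 7-bit ASCII (0x00-0x7F).
 * Any byte with the high bit set, e.g. iso-8859-1 or UTF-8 encoded
 * "ö", makes the check fail.
 */
static bool is_pure_ascii(const char *buf, size_t len)
{
	assert(buf != NULL);

	for (size_t i = 0; i < len; i++) {
		/* Cast so the comparison is correct where char is signed. */
		if ((unsigned char)buf[i] > 0x7F)
			return false;
	}
	return true;
}
```

With a check like this, content holding only ASCII plus a reference
such as &#1003; passes and could be decoded, while content that also
contains raw öä bytes fails, so its references would be shown
undecoded, which is the mixed case described above.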

miika

