[PATCH] Check if DLD contains non-ascii characters
Miika Turkia
miika.turkia at gmail.com
Tue Mar 26 13:27:16 PDT 2013
On Tue, Mar 26, 2013 at 10:15 PM, Dirk Hohndel <dirk at hohndel.org> wrote:
> Miika Turkia <miika.turkia at gmail.com> writes:
>
>> Valid divelogs.de export might contain non-ascii characters in CDATA
>> fields as long as these characters are found in iso-8859-1. So we'll
>> have to test to make sure the content is fully ascii before calling
>> xmlStringLenDecodeEntities to decode possible character references.
>
> So what happens if we have both ä and &#1023; in the CDATA section?
Good point. My first assumption was that the whole XML file would be
encoded, but now that I have tested it, that is not the case. Each CDATA
is treated independently, so it is possible to have öä in one CDATA and
Cyrillic in another CDATA, with the Cyrillic encoded as character
references. I would guess this to be unlikely but certainly possible. So
in such a mixed case this patch would display the Cyrillic CDATA with
character references (e.g. &#1003;).
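
To make the mixed case concrete, here is a minimal sketch of the kind of
check discussed above. It is not the code from the patch: is_all_ascii()
and has_char_ref() are hypothetical helper names, and the actual call to
xmlStringLenDecodeEntities is left out so the snippet does not depend on
libxml2. The assumption is that decoding is attempted only when
is_all_ascii() returns true.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper (not from the patch): returns true when every
 * byte in buf[0..len) is 7-bit ASCII. A buffer holding only character
 * references such as "&#1087;" passes; iso-8859-1 bytes such as 0xE4
 * (ä) do not.
 */
static bool is_all_ascii(const char *buf, size_t len)
{
    assert(buf != NULL || len == 0);

    for (size_t i = 0; i < len; i++) {
        if ((unsigned char)buf[i] > 0x7f)
            return false;
    }
    return true;
}

/*
 * Hypothetical helper: returns true when the buffer contains the start
 * of a numeric character reference ("&#"). It does not validate the
 * digits or the trailing ';'.
 */
static bool has_char_ref(const char *buf, size_t len)
{
    assert(buf != NULL || len == 0);

    for (size_t i = 0; i + 1 < len; i++) {
        if (buf[i] == '&' && buf[i + 1] == '#')
            return true;
    }
    return false;
}
```

With these helpers, content for which is_all_ascii() is false and
has_char_ref() is true is the mixed case: under the assumption above it
would not be decoded, so the references would be shown as they are.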
miika