[PATCH] DLD upload

Lubomir I. Ivanov neolit123 at gmail.com
Fri Mar 15 09:25:27 PDT 2013


On 15 March 2013 18:05, Dirk Hohndel <dirk at hohndel.org> wrote:
> "Lubomir I. Ivanov" <neolit123 at gmail.com> writes:
>>>>>> 1. XML declares iso-8859-1 charset
>>>>>> 2. All Cyrillic characters are represented as numeric references.
>>>>>> Unfortunately Subsurface imports them as-is.
>>>>>
>>>>> Looks like the CDATA around any free form fields was critical. The
>>>>> patch I just sent should take care of this. (Having the
>>>>> cdata-section-elements declared seems to imply also that the content
>>>>> of the mentioned elements is to be pure ascii, so no need for further
>>>>> character set hacking.)
>>>>
>>>> How does this mesh with UTF-8 encodings? ASCII is 7 bit...
>>>
>>> The non-ascii characters are represented in character references (&#1053;).
>>>
>>> A problem we currently have in our import of divelogs.de is that these
>>> character references are not converted to utf-8. And so far I have not
>>> figured a way to do that conversion.
>>>
>>
>> since we are already using libxml2 this appears to work, but i have no
>> idea how reliable it is for our needs:
>>
>> #include <libxml/parser.h>
>> #include <libxml/parserInternals.h>
>>
>> ...
>>
>> char buf[] = "&#1040;&#1041;&#1042;&#1043;&#1044;&#1045;&#1046; + "
>>              "something else in ASCII";
>> xmlParserCtxtPtr ctx = xmlCreateMemoryParserCtxt(buf, sizeof(buf));
>> char *res = xmlStringDecodeEntities(ctx, buf, XML_SUBSTITUTE_REF, 0, 0, 0);
>> if (res) {
>>       gtk_window_set_title(GTK_WINDOW(main_window), res);
>>       free((void *)res);
>> }
>>
>> http://www.xmlsoft.org/html/libxml-parserInternals.html#xmlCreateMemoryParserCtxt
>
>
> I think that's the way to go
> - stop escaping in xslt (we had a broken patch earlier in this thread)
> - convert on the C side; I still need to implement this whole "act
>   differently when things are imported from XSLT"
>

i forgot to include an xmlFreeParserCtxt(ctx) at the end.
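
for anyone who wants to see what the conversion boils down to without
pulling in libxml2, here is a minimal dependency-free sketch. it is NOT
the xmlStringDecodeEntities() approach above, just a hand-rolled
illustration of the same idea. the helper names decode_char_refs() and
utf8_encode() are made up for this example, and it only handles numeric
references (&#NNNN; and &#xHHHH;), leaving named entities such as &amp;
untouched.

```c
#include <stdlib.h>
#include <string.h>

/* Encode one code point as UTF-8 into out (up to 4 bytes).
 * Returns the number of bytes written, or 0 if cp is not encodable
 * (NUL, a surrogate, or beyond U+10FFFF). */
static size_t utf8_encode(unsigned long cp, char *out)
{
	if (cp == 0 || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
		return 0;
	if (cp < 0x80) {
		out[0] = (char)cp;
		return 1;
	}
	if (cp < 0x800) {
		out[0] = (char)(0xC0 | (cp >> 6));
		out[1] = (char)(0x80 | (cp & 0x3F));
		return 2;
	}
	if (cp < 0x10000) {
		out[0] = (char)(0xE0 | (cp >> 12));
		out[1] = (char)(0x80 | ((cp >> 6) & 0x3F));
		out[2] = (char)(0x80 | (cp & 0x3F));
		return 3;
	}
	out[0] = (char)(0xF0 | (cp >> 18));
	out[1] = (char)(0x80 | ((cp >> 12) & 0x3F));
	out[2] = (char)(0x80 | ((cp >> 6) & 0x3F));
	out[3] = (char)(0x80 | (cp & 0x3F));
	return 4;
}

/* Hypothetical helper: return a malloc'ed copy of `in` with numeric
 * character references replaced by UTF-8. Malformed references are
 * copied through unchanged. The caller frees the result. */
char *decode_char_refs(const char *in)
{
	/* A reference is never shorter than the UTF-8 it decodes to,
	 * so the output always fits in strlen(in) + 1 bytes. */
	char *out = malloc(strlen(in) + 1);
	size_t o = 0;

	if (!out)
		return NULL;
	while (*in) {
		if (in[0] == '&' && in[1] == '#') {
			const char *p = in + 2;
			unsigned long cp = 0;
			unsigned long base = 10;
			int ndigits = 0;

			if (*p == 'x') {
				base = 16;
				p++;
			}
			for (;; p++, ndigits++) {
				unsigned long d;
				if (*p >= '0' && *p <= '9')
					d = (unsigned long)(*p - '0');
				else if (base == 16 && *p >= 'a' && *p <= 'f')
					d = (unsigned long)(*p - 'a') + 10;
				else if (base == 16 && *p >= 'A' && *p <= 'F')
					d = (unsigned long)(*p - 'A') + 10;
				else
					break;
				/* stop accumulating once out of range to avoid overflow */
				if (cp <= 0x10FFFF)
					cp = cp * base + d;
			}
			if (ndigits > 0 && *p == ';') {
				size_t n = utf8_encode(cp, out + o);
				if (n > 0) {
					o += n;
					in = p + 1;
					continue;
				}
			}
		}
		out[o++] = *in++;
	}
	out[o] = '\0';
	return out;
}
```

usage would be along the lines of
char *res = decode_char_refs(buf); ... free(res);
for real imports the libxml2 route is probably still the better choice,
since it also knows about named entities.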

> As I expect to be able to connect to other web sites as well in the
> future, I think the need to decode the things that websites send us will
> be important.
>
> Now that leads me to a different question: are we doing the right thing
> in the other direction? i.e., encoding everything in a format that
> divelogs.de can consume?
>
>
> Miika - I look at you as the maintainer of the XSLT part; can you look
> at the patch that was sent, clean things up and send this to me?
>
> Lubomir, would you like to hook this up on the parser side? I seem to
> remember you saying that you were busy with other things so a "no, I
> don't have the time" is fine, too... I just want to avoid duplicate
> work.
>

yep, quite busy at the moment. i can try helping with snippets and
reviews if i can.

lubomir
--

