[PATCH] DLD upload

Dirk Hohndel dirk at hohndel.org
Fri Mar 15 09:05:05 PDT 2013


"Lubomir I. Ivanov" <neolit123 at gmail.com> writes:
>>>>> 1. XML declares iso-8859-1 charset
>>>>> 2. All Cyrillic characters are represented as numeric references.
>>>>> Unfortunately Subsurface imports them as-is.
>>>>
>>>> Looks like the CDATA around any free form fields was critical. The
>>>> patch I just sent should take care of this. (Having the
>>>> cdata-section-elements declared seems to imply also that the content
>>>> of the mentioned elements is to be pure ascii, so no need for further
>>>> character set hacking.)
>>>
>>> How does this mesh with UTF-8 encodings? ASCII is 7 bit...
>>
>> The non-ascii characters are represented as character references (&#1053;).
>>
>> A problem we currently have in our import of divelogs.de is that these
>> character references are not converted to utf-8. And so far I have not
>> figured a way to do that conversion.
>>
>
> since we are already using libxml2 this appears to work, but i have no
> idea how reliable it is for our needs:
>
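> #include <string.h>
> #include <libxml/parser.h>
> #include <libxml/parserInternals.h>
>
> ...
>
> char buf[] = "&#1040;&#1041;&#1042;&#1043;&#1044;&#1045;&#1046; + "
> 	     "something else in ASCII";
> /* pass the string length without the terminating NUL */
> xmlParserCtxtPtr ctx = xmlCreateMemoryParserCtxt(buf, (int)strlen(buf));
> xmlChar *res = xmlStringDecodeEntities(ctx, (const xmlChar *)buf,
> 					XML_SUBSTITUTE_REF, 0, 0, 0);
> if (res) {
> 	gtk_window_set_title(GTK_WINDOW(main_window), (const char *)res);
> 	xmlFree(res);	/* libxml2 allocations are released with xmlFree() */
> }
> xmlFreeParserCtxt(ctx);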
>
> http://www.xmlsoft.org/html/libxml-parserInternals.html#xmlCreateMemoryParserCtxt


I think that's the way to go
- stop escaping in xslt (we had a broken patch earlier in this thread)
- convert on the C side; I still need to implement this whole "act
  differently when things are imported from XSLT"
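
A rough sketch of what the C-side conversion could look like, assuming
we only need numeric character references (&#NNNN; and &#xHHHH;) and
leave named entities such as &amp; alone. The names decode_char_refs()
and utf8_encode() are made up for illustration and do not exist in
Subsurface; this is not a replacement for the libxml2 approach above,
just a way to see what the conversion involves:

```c
#include <stdlib.h>
#include <string.h>

/* Encode one Unicode code point as UTF-8; returns bytes written, 0 if invalid. */
static size_t utf8_encode(unsigned long cp, char *out)
{
	if (cp == 0 || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
		return 0;	/* NUL, out of range, or surrogate */
	if (cp < 0x80) {
		out[0] = (char)cp;
		return 1;
	}
	if (cp < 0x800) {
		out[0] = (char)(0xC0 | (cp >> 6));
		out[1] = (char)(0x80 | (cp & 0x3F));
		return 2;
	}
	if (cp < 0x10000) {
		out[0] = (char)(0xE0 | (cp >> 12));
		out[1] = (char)(0x80 | ((cp >> 6) & 0x3F));
		out[2] = (char)(0x80 | (cp & 0x3F));
		return 3;
	}
	out[0] = (char)(0xF0 | (cp >> 18));
	out[1] = (char)(0x80 | ((cp >> 12) & 0x3F));
	out[2] = (char)(0x80 | ((cp >> 6) & 0x3F));
	out[3] = (char)(0x80 | (cp & 0x3F));
	return 4;
}

/*
 * Hypothetical helper: returns a malloc()ed copy of src in which every
 * well-formed numeric character reference is replaced by its UTF-8
 * encoding. Everything else is copied through unchanged.
 * The caller frees the result.
 */
static char *decode_char_refs(const char *src)
{
	/* A reference is never shorter than the UTF-8 bytes it decodes to,
	 * so the output always fits in strlen(src) + 1 bytes. */
	char *dst = malloc(strlen(src) + 1);
	char *out = dst;

	if (!dst)
		return NULL;
	while (*src) {
		if (src[0] == '&' && src[1] == '#') {
			const char *p = src + 2;
			int hex = (*p == 'x' || *p == 'X');
			unsigned long cp = 0;
			int ndigits = 0;
			size_t n;

			if (hex)
				p++;
			for (;; p++) {
				int d;
				if (*p >= '0' && *p <= '9')
					d = *p - '0';
				else if (hex && *p >= 'a' && *p <= 'f')
					d = *p - 'a' + 10;
				else if (hex && *p >= 'A' && *p <= 'F')
					d = *p - 'A' + 10;
				else
					break;
				/* stop accumulating once out of range to avoid overflow */
				if (cp <= 0x10FFFF)
					cp = cp * (hex ? 16 : 10) + (unsigned long)d;
				ndigits++;
			}
			if (ndigits > 0 && *p == ';' && (n = utf8_encode(cp, out)) > 0) {
				out += n;
				src = p + 1;
				continue;
			}
		}
		*out++ = *src++;	/* not a valid reference: copy verbatim */
	}
	*out = '\0';
	return dst;
}
```

With this, decode_char_refs("&#1053;") gives back the two UTF-8 bytes
0xD0 0x9D (Cyrillic Н), and malformed input such as "&#;" is passed
through untouched.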

As I expect us to connect to other web sites as well in the future, I
think being able to decode whatever those sites send us will be
important.

Now that leads me to a different question: are we doing the right thing
in the other direction? i.e., encoding everything in a format that
divelogs.de can consume?


Miika - I look at you as the maintainer of the XSLT part; can you look
at the patch that was sent, clean things up and send this to me?

Lubomir, would you like to hook this up on the parser side? I seem to
remember you saying that you were busy with other things so a "no, I
don't have the time" is fine, too... I just want to avoid duplicate
work.

Thanks

/D

