[PATCH] DLD upload
Lubomir I. Ivanov
neolit123 at gmail.com
Fri Mar 15 09:25:27 PDT 2013
On 15 March 2013 18:05, Dirk Hohndel <dirk at hohndel.org> wrote:
> "Lubomir I. Ivanov" <neolit123 at gmail.com> writes:
>>>>>> 1. XML declares iso-8859-1 charset
>>>>>> 2. All Cyrillic characters are represented as numeric references.
>>>>>> Unfortunately Subsurface imports them as-is.
>>>>>
>>>>> Looks like the CDATA around any free form fields was critical. The
>>>>> patch I just sent should take care of this. (Having the
>>>>> cdata-section-elements declared seems to imply also that the content
>>>>> of the mentioned elements is to be pure ascii, so no need for further
>>>>> character set hacking.)
>>>>
>>>> How does this mesh with UTF-8 encodings? ASCII is 7 bit...
>>>
>>> The non-ascii characters are represented in character references
>>> (e.g. &#1053;).
>>>
>>> A problem we currently have in our import of divelogs.de is that these
>>> character references are not converted to utf-8. And so far I have not
>>> figured a way to do that conversion.
>>>
>>
>> since we are already using libxml2 this appears to work, but i have no
>> idea how reliable it is for our needs:
>>
>> #include <libxml/parser.h>
>> #include <libxml/parserInternals.h>
>>
>> ...
>>
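>> /* numeric character references for Cyrillic letters, plus plain ASCII */
>> char buf[] = "&#1040;&#1041;&#1042;&#1043;&#1044;&#1045;&#1046; + "
>>              "something else in ASCII";
>> xmlParserCtxtPtr ctx = xmlCreateMemoryParserCtxt(buf, sizeof(buf));
>> xmlChar *res = xmlStringDecodeEntities(ctx, (const xmlChar *)buf,
>>                                        XML_SUBSTITUTE_REF, 0, 0, 0);
>> if (res) {
>>         gtk_window_set_title(GTK_WINDOW(main_window), (const char *)res);
>>         xmlFree(res);
>> }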
>>
>> http://www.xmlsoft.org/html/libxml-parserInternals.html#xmlCreateMemoryParserCtxt
>
>
> I think that's the way to go
> - stop escaping in xslt (we had a broken patch earlier in this thread)
> - convert on the C side; I still need to implement this whole "act
> differently when things are imported from XSLT"
>
i forgot to include an xmlFreeParserCtxt(ctx) at the end.
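
for illustration only, here is a minimal sketch of what the conversion
from numeric character references to utf-8 amounts to, written without
libxml2 so it can be tried in isolation. the names decode_char_refs()
and utf8_encode() are hypothetical helpers, not part of libxml2 or
Subsurface, and the sketch assumes that only &#NNNN; and &#xHHHH;
references need handling (named entities such as &amp; are left
untouched).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Encode one code point as UTF-8; returns bytes written, 0 if invalid. */
static size_t utf8_encode(unsigned long cp, char *out)
{
	if (cp < 0x80) {
		out[0] = (char)cp;
		return 1;
	}
	if (cp < 0x800) {
		out[0] = (char)(0xC0 | (cp >> 6));
		out[1] = (char)(0x80 | (cp & 0x3F));
		return 2;
	}
	if (cp < 0x10000) {
		if (cp >= 0xD800 && cp <= 0xDFFF)
			return 0; /* surrogates are not valid code points */
		out[0] = (char)(0xE0 | (cp >> 12));
		out[1] = (char)(0x80 | ((cp >> 6) & 0x3F));
		out[2] = (char)(0x80 | (cp & 0x3F));
		return 3;
	}
	if (cp <= 0x10FFFF) {
		out[0] = (char)(0xF0 | (cp >> 18));
		out[1] = (char)(0x80 | ((cp >> 12) & 0x3F));
		out[2] = (char)(0x80 | ((cp >> 6) & 0x3F));
		out[3] = (char)(0x80 | (cp & 0x3F));
		return 4;
	}
	return 0;
}

/* Value of digit c in the given base (10 or 16), or -1 if not a digit. */
static int digit_value(char c, int base)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (base == 16 && c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (base == 16 && c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/*
 * Hypothetical helper: returns a malloc'd copy of src in which every
 * well-formed numeric character reference is replaced by its UTF-8
 * encoding. Anything else is copied through unchanged.
 */
char *decode_char_refs(const char *src)
{
	char *dst, *out;

	assert(src != NULL);
	/* a reference is never shorter than the UTF-8 bytes it decodes to,
	 * so the output always fits in strlen(src) + 1 bytes */
	dst = out = malloc(strlen(src) + 1);
	if (!dst)
		return NULL;
	while (*src) {
		if (src[0] == '&' && src[1] == '#') {
			const char *p = src + 2;
			unsigned long cp = 0;
			int base = 10, ndigits = 0, d;
			size_t n;

			if (*p == 'x') {
				base = 16;
				p++;
			}
			while (cp <= 0x10FFFF && (d = digit_value(*p, base)) >= 0) {
				cp = cp * base + d;
				ndigits++;
				p++;
			}
			if (ndigits > 0 && *p == ';' && cp != 0 &&
			    (n = utf8_encode(cp, out)) > 0) {
				out += n;
				src = p + 1;
				continue;
			}
		}
		*out++ = *src++; /* not a valid reference: copy literally */
	}
	*out = '\0';
	return dst;
}
```

the caller owns the returned buffer and releases it with free(). in
the actual import path the libxml2 call quoted above is probably the
better choice, since it is already a dependency.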
> As I expect to be able to connect to other web sites as well in the
> future, I think the need to decode the things that websites send us will
> be important.
>
> Now that leads me to a different question: are we doing the right thing
> in the other direction? i.e., encoding everything in a format that
> divelogs.de can consume?
>
>
> Miika - I look at you as the maintainer of the XSLT part; can you look
> at the patch that was sent, clean things up and send this to me?
>
> Lubomir, would you like to hook this up on the parser side? I seem to
> remember you saying that you were busy with other things so a "no, I
> don't have the time" is fine, too... I just want to avoid duplicate
> work.
>
yep, quite busy at the moment. i can try helping with snippets and
reviews if i can.
lubomir