[PATCH] DLD upload

Miika Turkia miika.turkia at gmail.com
Fri Mar 15 10:02:14 PDT 2013


On Fri, Mar 15, 2013 at 6:05 PM, Dirk Hohndel <dirk at hohndel.org> wrote:
> "Lubomir I. Ivanov" <neolit123 at gmail.com> writes:
>>>>>> 1. XML declares iso-8859-1 charset
>>>>>> 2. All Cyrillic characters are represented as numeric references.
>>>>>> Unfortunately Subsurface imports them as-is.
>>>>>
>>>>> Looks like the CDATA around any free form fields was critical. The
>>>>> patch I just sent should take care of this. (Having the
>>>>> cdata-section-elements declared seems to imply also that the content
>>>>> of the mentioned elements is to be pure ascii, so no need for further
>>>>> character set hacking.)
>>>>
>>>> How does this mesh with UTF-8 encodings? ASCII is 7 bit...
>>>
>>> The non-ascii characters are represented in character references (&#1053;).
>>>
>>> A problem we currently have in our import of divelogs.de is that these
>>> character references are not converted to utf-8. And so far I have not
>>> figured a way to do that conversion.
>>>
>>
>> since we are already using libxml2 this appears to work, but i have no
>> idea how reliable it is for our needs:
>>
>> #include <libxml/parser.h>
>> #include <libxml/parserInternals.h>
>>
>> ...
>>
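>> char buf[] = "&#1040;&#1041;&#1042;&#1043;&#1044;&#1045;&#1046; + "
>>              "something else in ASCII";
>> /* pass the length without the terminating NUL */
>> xmlParserCtxtPtr ctx = xmlCreateMemoryParserCtxt(buf, sizeof(buf) - 1);
>> xmlChar *res = ctx ? xmlStringDecodeEntities(ctx, (const xmlChar *)buf,
>>                                              XML_SUBSTITUTE_REF, 0, 0, 0) : NULL;
>> if (res) {
>>       gtk_window_set_title(GTK_WINDOW(main_window), (const char *)res);
>>       xmlFree(res);   /* libxml2 strings are released with xmlFree() */
>> }
>> if (ctx)
>>       xmlFreeParserCtxt(ctx);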
>>
>> http://www.xmlsoft.org/html/libxml-parserInternals.html#xmlCreateMemoryParserCtxt
>
>
> I think that's the way to go
> - stop escaping in xslt (we had a broken patch earlier in this thread)
> - convert on the C side; I still need to implement this whole "act
>   differently when things are imported from XSLT"

I totally agree. However, our whole problem would disappear if Rainer
could change the current output to not contain the CDATA around
already escaped content. I would not be surprised if others importing
the current "double escaped" content are having trouble because of
that.
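
To make the "convert on the C side" idea concrete, here is a minimal
sketch of what such a conversion could look like without libxml2. It only
handles numeric character references (&#NNNN; and &#xHHHH;), which is
what the double-escaped content contains, and leaves everything else
untouched. The names decode_char_refs(), parse_char_ref() and
utf8_encode() are made up for this illustration and are not existing
Subsurface or libxml2 functions.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Write code point cp as UTF-8 into out; returns bytes written (1 to 4). */
static size_t utf8_encode(unsigned long cp, unsigned char *out)
{
	assert(cp <= 0x10FFFF);
	if (cp < 0x80) {
		out[0] = (unsigned char)cp;
		return 1;
	}
	if (cp < 0x800) {
		out[0] = (unsigned char)(0xC0 | (cp >> 6));
		out[1] = (unsigned char)(0x80 | (cp & 0x3F));
		return 2;
	}
	if (cp < 0x10000) {
		out[0] = (unsigned char)(0xE0 | (cp >> 12));
		out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
		out[2] = (unsigned char)(0x80 | (cp & 0x3F));
		return 3;
	}
	out[0] = (unsigned char)(0xF0 | (cp >> 18));
	out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
	out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
	out[3] = (unsigned char)(0x80 | (cp & 0x3F));
	return 4;
}

/*
 * Hypothetical helper: if s starts with a well-formed numeric character
 * reference, store its code point in *cp and return the reference length
 * in bytes; otherwise return 0.
 */
static size_t parse_char_ref(const char *s, unsigned long *cp)
{
	const char *p = s + 2;
	unsigned long value = 0;
	int base = 10, ndigits = 0;

	if (s[0] != '&' || s[1] != '#')
		return 0;
	if (*p == 'x' || *p == 'X') {
		base = 16;
		p++;
	}
	for (;; p++) {
		int d;
		if (*p >= '0' && *p <= '9')
			d = *p - '0';
		else if (base == 16 && *p >= 'a' && *p <= 'f')
			d = *p - 'a' + 10;
		else if (base == 16 && *p >= 'A' && *p <= 'F')
			d = *p - 'A' + 10;
		else
			break;
		value = value * base + d;
		ndigits++;
		if (value > 0x10FFFF)
			return 0;	/* beyond the Unicode range */
	}
	/* reject empty references, a missing ';', NUL and surrogates */
	if (ndigits == 0 || *p != ';' || value == 0 ||
	    (value >= 0xD800 && value <= 0xDFFF))
		return 0;
	*cp = value;
	return (size_t)(p - s) + 1;
}

/*
 * Hypothetical helper: return a malloc'ed copy of in with numeric
 * character references replaced by UTF-8. The caller frees the result.
 */
char *decode_char_refs(const char *in)
{
	/* a reference is never shorter than its UTF-8 form, so the
	 * output fits into a buffer the size of the input */
	char *out = malloc(strlen(in) + 1);
	unsigned char *w = (unsigned char *)out;

	if (!out)
		return NULL;
	while (*in) {
		unsigned long cp;
		size_t len = parse_char_ref(in, &cp);
		if (len) {
			w += utf8_encode(cp, w);
			in += len;
		} else {
			*w++ = (unsigned char)*in++;
		}
	}
	*w = '\0';
	return out;
}
```

Calling decode_char_refs("&#1053; + ASCII") would return the UTF-8
string for the Cyrillic letter followed by " + ASCII". Named entities
such as &amp; are deliberately left alone here; whether they also need
handling depends on what the XSLT step passes through.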

> As I expect to be able to connect to other web sites as well in the
> future, I think the need to decode the things that websites send us will
> be important.
>
> Now that leads me to a different question: are we doing the right thing
> in the other direction? i.e., encoding everything in a format that
> divelogs.de can consume?

As far as I know we are doing things correctly on the upload side. At
least one sample dive with Cyrillic notes and cylinder description is
uploaded properly.

> Miika - I look at you as the maintainer of the XSLT part; can you look
> at the patch that was sent, clean things up and send this to me?

I have attached my version of the patch (it is effectively the same as
what Sergey sent, but it is what I already had in my xslt file at the
time). But as I said, I am not sure if this is the right way to go.
At least I would like to get input from Rainer first to hear his
opinion about this kind of double-encoding issue.

miika
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-Do-not-escape-entity-references-on-.DLD-import.patch
Type: application/octet-stream
Size: 1857 bytes
Desc: not available
URL: <http://lists.hohndel.org/pipermail/subsurface/attachments/20130315/548e945f/attachment.obj>

