[PATCH] DLD upload

Lubomir I. Ivanov neolit123 at gmail.com
Fri Mar 15 02:06:40 PDT 2013


On 15 March 2013 07:01, Miika Turkia <miika.turkia at gmail.com> wrote:
> On Thu, Mar 14, 2013 at 5:53 PM, Dirk Hohndel <dirk at hohndel.org> wrote:
>> Miika Turkia <miika.turkia at gmail.com> writes:
>>>>> I have one question. How should we handle languages with e.g. Cyrillic
>>>>> letters? They are not supported in divelogs.de and display as question
>>>>> marks currently in there. The current encoding of the XMLs in .DLD is
>>>>> iso-8859-1 but utf-8 is not any better. Of course if divelogs.de would
>>>>> support utf-8 we would not have to worry about it...
>>>>
>>>> Actually divelogs.de allows you to input Cyrillic chars. It works for me at
>>>> least when in full edit mode (press "edit dive" at the bottom). When using
>>>> in-place mode it shows question marks.
>>>> The problem arises when you export dives from divelogs.de:
>>>>
>>>> 1. XML declares iso-8859-1 charset
>>>> 2. All Cyrillic characters are represented as numeric references.
>>>> Unfortunately Subsurface imports them as-is.
>>>
>>> Looks like the CDATA around any free form fields was critical. The
>>> patch I just sent should take care of this. (Having the
>>> cdata-section-elements declared seems to imply also that the content
>>> of the mentioned elements is to be pure ascii, so no need for further
>>> character set hacking.)
>>
>> How does this mesh with UTF-8 encodings? ASCII is 7 bit...
>
> The non-ascii characters are represented as character references (&#1053;).
>
> A problem we currently have in our import of divelogs.de is that these
> character references are not converted to utf-8. And so far I have not
> figured a way to do that conversion.
>

if there isn't support for this type of conversion in GLib, the
algorithm seems trivial. obviously a better solution would be to use
real UTF-8 everywhere instead.

the character references are a decimal representation of the Unicode
code point, unless there is an 'x' in front of the number, in which case
it's hexadecimal:
http://www.utf8icons.com/character/1053/cyrillic-capital-letter-en
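
a minimal sketch of that algorithm in plain C, in case it helps. the
function names (encode_utf8, decode_char_refs) are made up for this
example and are not existing Subsurface code. it assumes the input is
a NUL-terminated string containing references of the form &#1053; or
&#x41D; and copies anything it cannot parse through unchanged. GLib's
g_unichar_to_utf8() could probably stand in for the hand-written
encoder, but I have not tried that here.

```c
#include <stdlib.h>
#include <string.h>

/* Encode one Unicode code point as UTF-8.
 * Returns the number of bytes written, or 0 if cp is not encodable. */
static size_t encode_utf8(unsigned long cp, char *out)
{
	if (cp < 0x80) {
		out[0] = (char)cp;
		return 1;
	} else if (cp < 0x800) {
		out[0] = (char)(0xC0 | (cp >> 6));
		out[1] = (char)(0x80 | (cp & 0x3F));
		return 2;
	} else if (cp < 0x10000) {
		if (cp >= 0xD800 && cp <= 0xDFFF)
			return 0; /* surrogates are not valid code points */
		out[0] = (char)(0xE0 | (cp >> 12));
		out[1] = (char)(0x80 | ((cp >> 6) & 0x3F));
		out[2] = (char)(0x80 | (cp & 0x3F));
		return 3;
	} else if (cp <= 0x10FFFF) {
		out[0] = (char)(0xF0 | (cp >> 18));
		out[1] = (char)(0x80 | ((cp >> 12) & 0x3F));
		out[2] = (char)(0x80 | ((cp >> 6) & 0x3F));
		out[3] = (char)(0x80 | (cp & 0x3F));
		return 4;
	}
	return 0;
}

/* Value of digit c in the given base (10 or 16), or -1 if not a digit. */
static int digit_value(char c, int base)
{
	int v;

	if (c >= '0' && c <= '9')
		v = c - '0';
	else if (c >= 'a' && c <= 'f')
		v = c - 'a' + 10;
	else if (c >= 'A' && c <= 'F')
		v = c - 'A' + 10;
	else
		return -1;
	return v < base ? v : -1;
}

/* Hypothetical helper: return a newly allocated copy of 'in' with
 * numeric character references replaced by UTF-8. Caller frees it. */
static char *decode_char_refs(const char *in)
{
	/* A reference is never shorter than its UTF-8 encoding,
	 * so the output fits in strlen(in) + 1 bytes. */
	char *out = malloc(strlen(in) + 1);
	char *w = out;

	if (!out)
		return NULL;
	while (*in) {
		if (in[0] == '&' && in[1] == '#') {
			const char *p = in + 2;
			unsigned long cp = 0;
			int base = 10, ndigits = 0, v;
			size_t n;

			if (*p == 'x' || *p == 'X') {
				base = 16;
				p++;
			}
			/* stop accumulating once the value is out of range */
			while (cp <= 0x10FFFF && (v = digit_value(*p, base)) >= 0) {
				cp = cp * base + v;
				ndigits++;
				p++;
			}
			if (ndigits > 0 && *p == ';' && cp != 0 &&
			    (n = encode_utf8(cp, w)) > 0) {
				w += n;
				in = p + 1;
				continue;
			}
		}
		*w++ = *in++; /* not a valid reference: copy literally */
	}
	*w = '\0';
	return out;
}
```

with this, both "&#1053;" and "&#x41D;" should come out as the two
bytes 0xD0 0x9D, which is Cyrillic capital letter EN in UTF-8.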

lubomir