[PATCH] Need to warn about some more characters on CSV import

Miika Turkia miika.turkia at gmail.com
Fri Jan 9 01:47:29 PST 2015


On Thu, Jan 8, 2015 at 5:38 PM, Robert Helling <helling at atdotde.de> wrote:

> Hi,
>
> On 05.01.2015, at 20:24, Miika Turkia <miika.turkia at gmail.com> wrote:
>
> Signed-off-by: Miika Turkia <miika.turkia at gmail.com>
> ---
> Documentation/user-manual.txt | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/Documentation/user-manual.txt b/Documentation/user-manual.txt
> index 9cbc4cc..e7c5d96 100644
> --- a/Documentation/user-manual.txt
> +++ b/Documentation/user-manual.txt
> @@ -1094,7 +1094,7 @@ an introduction to CSV-formatted files see xref:S_CSV_Intro[A Diver's Introducti
> [icon="images/icons/important.png"]
> [IMPORTANT]
> The CSV import has a couple of caveats. You should avoid some special characters
> -like ampersand (&) and double quotes ("), the latter if quoting text cells. The
> +like ampersand (&), less than (<), greater than (>) and double quotes ("), the latter if quoting text cells. The
> file should use UTF-8 character set, if having non-ASCII characters. Also the
> size of the CSV file might cause problems. Importing 100 dives at a time
> (without dive profile) has worked previously, but larger files might exceed
>
>
> (sorry again for being so late)
>
> Of course there could be millions of sources of CSV files (and many of them
> broken in the sense that they produce non-parsable output), but the issue is
> not completely hopeless: I just had a look at what LibreOffice does when
> asked to save a spreadsheet with challenging characters as a CSV, and it
> does as attached.
>
> As you can see, the typographic double quotes (low and high) are different
> characters from the ASCII double quote used as the text delimiter:
>
> 00000000: 2254 6869 7320 6365 6c6c 2063 6f6e 7461  "This cell conta
> 00000010: 696e 7320 6368 616c 6c65 6e67 696e 6720  ins challenging
> 00000020: 6368 6172 6163 7465 7273 206c 696b 6520  characters like
> 00000030: 7175 6f74 6573 2064 6f77 6e20 e280 9e20  quotes down ...
> 00000040: 616e 6420 7570 e280 9c2c 2063 6f6d 6d61  and up..., comma
> 00000050: 732c 2061 706f 7374 726f 7068 7320 2720  s, apostrophs '
> 00000060: 616e 640a 4e65 7720 6c69 6e65 732e 222c  and.New lines.",
> 00000070: 5468 6973 2069 7320 7468 6520 6e65 7874  This is the next
> 00000080: 2063 656c 6c2e 0a                         cell..
>
> When the apostrophe is used as the text delimiter and it appears inside the
> cell, it is just repeated.
>
> I have no idea how general these rules are (Wikipedia says quotation
> repetition is in an RFC), but maybe we should support them. What is the
> reason for warning about XML-special characters like & and <>?
>
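For reference, the quoting convention described in the quoted message
(line breaks allowed inside a quoted cell, the delimiter doubled when it
appears inside a cell) can be checked with Python's csv module. This is
only a sketch: the first sample is adapted from the hex dump above, and
the 'say "hi"' cell is made up for illustration.

```python
import csv
import io

# One quoted cell with typographic quotes (U+201E, U+201C), commas, an
# apostrophe and a line break, followed by a plain second cell.
raw = (
    '"This cell contains challenging characters like quotes down \u201e '
    "and up\u201c, commas, apostrophs ' and\nNew lines.\","
    "This is the next cell.\n"
)

# newline="" keeps the embedded line break untouched for the CSV reader.
rows = list(csv.reader(io.StringIO(raw, newline="")))

# Writing a cell that contains the ASCII double quote shows the doubling.
out = io.StringIO(newline="")
csv.writer(out, lineterminator="\n").writerow(['say "hi"', "x"])
doubled = out.getvalue()
```

Here rows holds a single record with two cells, the first of which
still contains the line break, and doubled holds the made-up cell with
its inner quotes repeated.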

CSV is parsed with XSLT as XML, so the XML-specific characters are therefore
an issue. That is also why any special cases are quite tricky: I am not
fluent enough in XSLT to take them into account properly. Quotation should
(hopefully) be handled properly, but I cannot promise that multi-line cells
will work that well.
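
To illustrate why these characters matter once CSV text is treated as
XML, here is a small Python sketch. It is not how Subsurface does the
import; the <csv> wrapper element, the function name and the sample
line are all hypothetical. It only shows that raw &, < and > make the
text ill-formed as XML, and that escaping them first avoids that.

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET


def wrap_csv_as_xml(csv_text: str) -> str:
    """Wrap CSV text in one (hypothetical) element, escaping &, < and >."""
    return "<csv>" + escape(csv_text) + "</csv>"


line = 'Wreck & reef <20m>,"Site A"'

# Without escaping, the XML parser rejects the text.
try:
    ET.fromstring("<csv>" + line + "</csv>")
    unescaped_ok = True
except ET.ParseError:
    unescaped_ok = False

# With escaping, the text parses and comes back unchanged.
wrapped = wrap_csv_as_xml(line)
recovered = ET.fromstring(wrapped).text
```

The double quote needs no escaping in element content, which is why it
is only a problem on the CSV side, when it is also the text delimiter.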

miika