[PATCH] Need to warn about some more characters on CSV import

Robert Helling helling at atdotde.de
Thu Jan 8 07:38:58 PST 2015


Hi,

> On 05.01.2015, at 20:24, Miika Turkia <miika.turkia at gmail.com> wrote:
> 
> Signed-off-by: Miika Turkia <miika.turkia at gmail.com>
> ---
> Documentation/user-manual.txt | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/Documentation/user-manual.txt b/Documentation/user-manual.txt
> index 9cbc4cc..e7c5d96 100644
> --- a/Documentation/user-manual.txt
> +++ b/Documentation/user-manual.txt
> @@ -1094,7 +1094,7 @@ an introduction to CSV-formatted files see xref:S_CSV_Intro[A Diver's Introducti
> [icon="images/icons/important.png"]
> [IMPORTANT]
> The CSV import has a couple of caveats. You should avoid some special characters
> -like ampersand (&) and double quotes ("), the latter if quoting text cells. The
> +like ampersand (&), less than (<), greater than (>) and double quotes ("), the latter if quoting text cells. The
> file should use UTF-8 character set, if having non-ASCII characters. Also the
> size of the CSV file might cause problems. Importing 100 dives at a time
> (without dive profile) has worked previously, but larger files might exceed

(sorry again for being so late)

Of course there could be millions of sources of CSV files (and many of them broken in the sense that they produce non-parsable output), but the issue is not completely hopeless: I just had a look at what LibreOffice does when asked to save a spreadsheet with challenging characters as CSV, and it does as attached. As you can see, the typographic double quotes (up and down) are different characters from the ASCII double quote that is used as the text delimiter:

00000000: 2254 6869 7320 6365 6c6c 2063 6f6e 7461  "This cell conta
00000010: 696e 7320 6368 616c 6c65 6e67 696e 6720  ins challenging
00000020: 6368 6172 6163 7465 7273 206c 696b 6520  characters like
00000030: 7175 6f74 6573 2064 6f77 6e20 e280 9e20  quotes down ...
00000040: 616e 6420 7570 e280 9c2c 2063 6f6d 6d61  and up..., comma
00000050: 732c 2061 706f 7374 726f 7068 7320 2720  s, apostrophs '
00000060: 616e 640a 4e65 7720 6c69 6e65 732e 222c  and.New lines.",
00000070: 5468 6973 2069 7320 7468 6520 6e65 7874  This is the next
00000080: 2063 656c 6c2e 0a                         cell..
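
To make the dump above concrete, here is a minimal sketch (not Subsurface code) that rebuilds the same 135 bytes and feeds them to Python's standard csv module. The names RAW and parse_csv_bytes are made up for this illustration, and the sketch assumes comma as field separator and the ASCII double quote as text delimiter, which is what the dump shows.

```python
import csv
import io

# The bytes from the hex dump above, written as text and encoded as UTF-8.
# \u201e and \u201c are the typographic "down" and "up" quotes (e2 80 9e / e2 80 9c).
RAW = (
    '"This cell contains challenging characters like quotes down \u201e '
    'and up\u201c, commas, apostrophs \' and\nNew lines.",'
    'This is the next cell.\n'
).encode("utf-8")


def parse_csv_bytes(raw):
    """Decode UTF-8 bytes and split them into rows of cells (hypothetical helper)."""
    text = raw.decode("utf-8")
    # newline="" leaves line breaks alone so the csv module can keep the
    # one inside the quoted cell as part of that cell.
    return list(csv.reader(io.StringIO(text, newline="")))


rows = parse_csv_bytes(RAW)
for row in rows:
    for cell in row:
        print(repr(cell))
```

A parser that honours the text delimiter sees a single record with two cells here; the commas, the typographic quotes, the apostrophe and the line break all stay inside the first cell.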

When the apostrophe is used as the text delimiter and it appears inside a cell, it is simply doubled.
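
The doubling rule can be sketched with Python's standard csv module, which writes and reads this convention when doublequote is left at its default. This is only an illustration under the assumption that the apostrophe is the text delimiter and the comma the field separator; the cell contents and the names cells, encoded and decoded are invented for the example.

```python
import csv
import io

cells = ["it's, here", "next"]  # made-up sample data

# Write one record with the apostrophe as text delimiter.
# doublequote=True (the default) repeats the delimiter inside a cell.
buf = io.StringIO(newline="")
writer = csv.writer(buf, delimiter=",", quotechar="'", quoting=csv.QUOTE_MINIMAL)
writer.writerow(cells)
encoded = buf.getvalue()
print(repr(encoded))

# Read it back with the same settings; the doubled apostrophe collapses
# to a single one again.
reader = csv.reader(io.StringIO(encoded, newline=""), delimiter=",", quotechar="'")
decoded = next(reader)
print(decoded)
```

Only the first cell gets quoted, because it contains both the separator and the delimiter; the second one is written as is.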

I have no idea how general these rules are (Wikipedia says the quote repetition is in an RFC), but maybe we should support them. What is the reason for warning about XML-special characters like & and <>?

Best
Robert


--
.oOo.oOo.oOo.oOo.oOo.oOo.oOo.oOo.oOo.oOo.oOo.oOo.oOo.oOo.oOo.oOo.oOo.oOo.oOo.oO
Robert C. Helling     Elite Master Course Theoretical and Mathematical Physics
                      Scientific Coordinator
                      Ludwig Maximilians Universitaet Muenchen, Dept. Physik
                      Phone: +49 89 2180-4523  Theresienstr. 39, rm. B339
                      http://www.atdotde.de

Enhance your privacy, use cryptography! My PGP keys have fingerprints
A9D1 A01D 13A5 31FA 6515  BB44 0820 367C 36BC 0C1D    and
DCED 37B6 251C 7861 270D  5613 95C7 9D32 9A8D 9B8F





-------------- next part --------------
A non-text attachment was scrubbed...
Name: specialchars.csv
Type: text/csv
Size: 135 bytes
Desc: not available
URL: <http://lists.subsurface-divelog.org/pipermail/subsurface/attachments/20150108/b192cb37/attachment.csv>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: specialchars2.csv
Type: text/csv
Size: 136 bytes
Desc: not available
URL: <http://lists.subsurface-divelog.org/pipermail/subsurface/attachments/20150108/b192cb37/attachment-0001.csv>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 495 bytes
Desc: Message signed with OpenPGP using GPGMail
URL: <http://lists.subsurface-divelog.org/pipermail/subsurface/attachments/20150108/b192cb37/attachment.sig>

