load-git and event parsing

Linus Torvalds torvalds at linux-foundation.org
Thu Feb 13 14:12:51 PST 2020


On Thu, Feb 13, 2020 at 1:25 PM Robert Helling <helling at atdotde.de> wrote:
>
> Maybe somebody with a better understanding of the code (Linus?) can help me out with this.

Hmm. That

   event 40:00 type=8 divemode="OC" name="modechange"

line actually violates the original git save format thing - each line
is supposed to have only one string in it. That's because the parser
of each line is really simple, and just breaks on the next whitespace.

See the comment above "parse_one_string()":

 * We have a very simple line-based interface, with the small
 * complication that lines can have strings in the middle, and
 * a string can be multiple lines.
 *
 * The UTF-8 string escaping is *very* simple, though:
 *
 *  - a string starts and ends with double quotes (")
 *
 *  - inside the string we escape:
 *     (a) double quotes with '\"'
 *     (b) backslash (\) with '\\'
 *
 *  - additionally, for human readability, we escape
 *    newlines with '\n\t', with the exception that
 *    consecutive newlines are left unescaped (so an
 *    empty line doesn't become a line with just a tab
 *    on it).
 *
 * Also, while the UTF-8 string can have arbitrarily
 * long lines, the non-string parts of the lines are
 * never long, so we can use a small temporary buffer
 * on stack for that part.
 *
 * Also, note that if a line has one or more strings
 * in it:
 *
 *  - each string will be represented as a single '"'
 *    character in the output.
 *
 *  - all string will exist in the same 'membuffer',
 *    separated by NUL characters (that cannot exist
 *    in a string, not even quoted).

And what happens is that now that there are *two* strings on that
line, the string buffer will have that second case of the membuffer
rule:

  - all string will exist in the same 'membuffer',
    separated by NUL characters (that cannot exist
    in a string, not even quoted).

and so now the 'str' membuffer actually contains the buffer
"OC\0modechange\0", but the event parsing code assumes that there was
at most one string, and that one string was the name.

The "OC" string confuses it, and makes it use "OC" as the name.

Very unfortunate. This problem was introduced when Willem Ferguson
started saving the dive mode in commit b9174332d ("Read and write
divemode changes (xml and git)"), and nobody realized that "oops, now
it's corrupting the event name".

The only thing that used to have multiple strings is the "dive flags"
thing, and that one carefully walks all the strings in the membuffer.
See parse_dive_tags().

Ho humm.

Does this extremely ugly patch fix it for you?

               Linus
-------------- next part --------------
A non-text attachment was scrubbed...
Name: patch.diff
Type: text/x-patch
Size: 1048 bytes
Desc: not available
URL: <http://lists.subsurface-divelog.org/pipermail/subsurface/attachments/20200213/394519f8/attachment.bin>


More information about the subsurface mailing list