[PATCH 1/3] Don't allocate an intermediate buffer for most parsed xmlNode contents

Lubomir I. Ivanov neolit123 at gmail.com
Fri Dec 28 08:18:23 PST 2012


On 28 December 2012 18:04, Linus Torvalds <torvalds at linux-foundation.org>wrote:

> I like the patch, but I think it's subtly buggy as-is, and will cause
> problems with non-subsurface xml files.
>
> On Fri, Dec 28, 2012 at 5:43 AM, Lubomir I. Ivanov <neolit123 at gmail.com>
> wrote:
> > @@ -1253,25 +1236,13 @@ static void visit_one_node(xmlNode *node)
> >         if (!content)
> >                 return;
> >
> > -       /* Trim whitespace at beginning */
> > -       while (isspace(*content))
> > -               content++;
> > -
> > -       /* Trim whitespace at end */
> > -       len = strlen(content);
> > -       while (len && isspace(content[len-1]))
> > -               len--;
> > -
> > -       if (!len)
> > -               return;
> > -
>
> This basically deletes the "ignore empty nodes" logic.
>
> Which means that if we ever see an empty node - say temperature or
> whatever - it will now feed that empty string into our number parsing,
> which will make the parser unhappy.
>
> So I actually think you should keep the "strip space at beginning"
> logic, not because we need to strip it, but because we want to ignore
> empty nodes.
>
> So I'd leave the code something like
>
>     while (isspace(*content))
>         content++;
>     if (!*content)
>         return;
>
> which keeps us ignoring empty xml nodes.
>
> Empty xml nodes really are *very* common in various crap dive computer
> XML. You can see the old Suunto XML files in the git history, for
> example: they have empty BOOKMARK nodes in every sample (and then have
> some data in some of them). The other XML files I've seen do similar
> insane things (ie iirc, oddf tends to always have the insane xml nodes
> for the weird stuff it supports, but they're generally empty)
>
> Witht hat simple addition to keep existing semantics, ACK on this patch.
>

aha, i see,

here is a small update for this patch - attachment.
it now uses a method i've noticed yesterday xmlIsBlankNode():
http://www.xmlsoft.org/html/libxml-tree.html#xmlIsBlankNode

commit message update:
-----------------------------------------
visit_one_node(), now also has a xmlIsBlankNode() check, where if
1 is returned, the node contains only whitespace or is empty and should
not be processed.
-----------------------------------------

lubomir
--
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.hohndel.org/pipermail/subsurface/attachments/20121228/fe900f75/attachment-0001.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-Don-t-allocate-an-intermediate-buffer-for-most-parse.patch
Type: application/octet-stream
Size: 6738 bytes
Desc: not available
URL: <http://lists.hohndel.org/pipermail/subsurface/attachments/20121228/fe900f75/attachment-0001.obj>


More information about the subsurface mailing list