[PATCH 3/5] Move the memory allocation of entry buffers to a separate function

Thu Dec 27 12:51:02 PST 2012

On 27 December 2012 20:39, Linus Torvalds <torvalds at linux-foundation.org> wrote:

> On Thu, Dec 27, 2012 at 8:14 AM, Dirk Hohndel <dirk at hohndel.org> wrote:
> >
> > I'm not sure I like this change. It seems to make the code more
> > complicated in order to improve memleak debugging. I am a little less
> > worried about memleaks in the parser and more concerned about
> > readability and debuggability of the code. But could be convinced
> > otherwise.
> >
> > Linus, what do you think?
>
> I agree.
>
> If we start doing things like this, I'd rather go *much* further, and
> remove the allocation entirely, and create a new
>
>    struct xmlcontent {
>       int size;
>       const char *buf;
>    };
>
> thing, and pass a pointer to that in to all the parsing functions (and
> all the way up into visit_one_node() which is where the buffer gets
> created). The parsing functions generally don't need the allocation
> anyway, and will only do a "free()" on it immediately after parsing
> it.
>
> We know the buffer is NUL-terminated anyway, although there may be
> extra whitespace after 'size'. So 99% of all the parsing functions
> don't really need to re-allocate the buffer. The ones that do are
> string things, and doing a malloc there, rather than earlier, should
> be simple.

i've been browsing the libxml API, but can't find any (auto-)normalization
methods or flags at the parser level; perhaps there are none.

would it matter much if, instead of storing the length and preserving the
trailing white space, we filled the trailing white space with zeroes?
i mean, as long as the xmlNode->content pointer is preserved, we
should be able to safely modify the XML object in memory, given that we don't
write outside of the buffer.
this would make the "len" parameter redundant, and we could go with a simple
pointer that skips the leading whitespace of xmlNode->content, as far as
i can see.
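
a minimal sketch of the in-place trimming idea, assuming the content buffer is
writable and NUL-terminated. trim_in_place() is a made-up name for
illustration, not something that exists in the parser or in libxml:

```c
#include <ctype.h>
#include <string.h>

/*
 * Hypothetical helper (not part of the parser or of libxml).
 * Overwrites trailing whitespace with NUL bytes and returns a pointer
 * past the leading whitespace. It only writes inside the existing
 * string, and the caller's original pointer stays valid for whoever
 * owns the buffer.
 */
static char *trim_in_place(char *buf)
{
	size_t len = strlen(buf);

	/* Zero out trailing whitespace, working backwards from the end. */
	while (len > 0 && isspace((unsigned char)buf[len - 1]))
		buf[--len] = '\0';

	/* Skip leading whitespace; stops at the NUL if nothing else is left. */
	while (isspace((unsigned char)*buf))
		buf++;

	return buf;
}
```

with something like this, the parsing functions would only need the returned
pointer and no separate length, since the text ends at the first NUL.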

then the only thing needed would be to allocate memory
in utf8_string() and not free the "buffer" in the other parser functions.
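
roughly, the ownership split could look like the sketch below. both function
names and signatures are assumptions made up for illustration; the real
utf8_string() and the other parser functions may well look different:

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for a string parser: it is the only place that
 * allocates, because the result must outlive the XML tree.
 * On allocation failure *res is set to NULL.
 */
static void utf8_string_dup(const char *buffer, char **res)
{
	size_t size = strlen(buffer) + 1;	/* include the NUL terminator */
	char *copy = malloc(size);

	if (copy)
		memcpy(copy, buffer, size);
	*res = copy;
}

/*
 * Hypothetical stand-in for a numeric parser: it only borrows the
 * buffer, so it neither allocates nor frees anything.
 */
static void parse_int_borrowed(const char *buffer, int *res)
{
	*res = (int)strtol(buffer, NULL, 10);
}
```

the caller that stores the copied string becomes responsible for calling
free() on it; the borrowed buffer itself stays owned by the XML tree.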

lubomir