Improve git save format

Linus Torvalds torvalds at
Fri Mar 7 17:54:08 PST 2014

On Fri, Mar 7, 2014 at 4:55 PM, Dirk Hohndel <dirk at> wrote:
> One thing I notice is that our put_<some term>() functions (so
> put_temperature() or put_pressure(), but not put_milli()) quite
> carefully return whether something was written or not (they are all of
> type int and return 0 or 1).
> Yet I don't see any code that would consume that information.

So originally, I had some multi-line code (not in this git format
version, but in my earlier trial at just getting rid of xml), where
basically you'd write multiple entries and then end it with a newline.

But you don't want to write the newline if you never actually wrote
any entries. So the return value made sense.

However, for the XML code, you can't use the return value, because the
xml code doesn't just have a final thing at the end, it has that
header at the beginning. So the xml code needs to check up-front
whether it will write anything or not. So the xml code can't use it.

And the newer git save code tries very hard to make everything
line-based: there's a few things that do multiple entries per line
(cylinders and weights, and the samples), but they are still going to
have the unconditional part, so the "did we write anything at all"
question never comes up.

So we could get rid of the return value.

> I'm a bit unclear about when you append the shortened SHA to a tree node
> name... or more specifically, when you DON'T. And why not use the
> standard 12 digits of the SHA? I'm guessing you think no one will ever
> have enough nodes to make this worth while?

Yeah, even the seven digits we print out now is overkill. I started
with four, and decided that that was so short that it was actually
visually distracting (it could look like the dive number or similar).
So the seven characters is *way* too much, but it doesn't hurt, and it
makes it distinct.

You seldom see it, because the shortened SHA is only written if it is
needed for disambiguoating filenames, and most of the time that
doesn't happen. Dives have the date and the time in them, so you'd
only need to disambiguate if there are two dives with the exact same
timestamp. The format needs to be able to handle it, but it's not
going to happen in practice.

The only case it really happens for right now is the dive computer
entry: the dive computer files are all called just "Divecomputer", so
if you have multiple dive computers (which we do), the second and
third etc entries will have the SHA1 disambiguation.

But even that turns out to be something of a mistake: I'll want to
sort the dive computers anyway so that it is unambiguous not just
which one is the first one, but the ordering between second and third
etc. So I suspect I'll call things "Divecomputer-01" etc, and then you
won't normally get the SHA's at the end of the names even for that

> static int tree_insert(git_repository *repo, git_treebuilder *dir,
>         const char *name, int mkunique, git_oid *id, unsigned mode)
> have a 'repo' argument that isn't used at all?

Ahh, that function got split up, and now doesn't ever actually write
anything to the repo. Yes, feel free to remove that argument.

> Cool - that will make it SO MUCH more useful (but I'll admit that I love
> poking around in the structures it stores right now :-) )

It was actually pretty fun just looking at git diffs etc, and trying
to make things clean. I like the directory structure, try

  git ls-tree -rt --name-only <branch-name>

to see things like that. I aimed for it looking nice and useful, while
still being easy and unambiguous to parse.


More information about the subsurface mailing list