RFC: Initial git save format

Thu Mar 6 13:54:56 PST 2014

On Thu, 2014-03-06 at 13:28 -0800, Linus Torvalds wrote:
> So Dirk knows about this effort, and it's been rattling around in my
> head for a long while, but it took some time for me to digest how to
> actually do it.

When I saw your G+ post I was trying to remember when we first started
talking about this. In Florida we still talked about a new file format
in the context of just a text file. So it hasn't been THAT long since
we started talking about using git instead. I think it was in Palau
that we discussed how one could merge dive files edited independently...

>  The attached patch is a *rough* initial
> implementation.
> 
> I say rough, because:
> 
>  (a) I didn't do the git configuration part, particularly not the UI
> to set the save file.
> 
> Right now, to try this patch out, you need to re-use an old git
> repository or create a new one ("git init") outside of subsurface. You
> then also need to create a one-line "git pointer" file, which can be
> anywhere and that points to that repository and the specific branch
> you want to use for saving.
> 
> The one-liner "git pointer" file is just the string "git", some
> whitespace, and then "repository:branch".
> 
> For example, I already had a git repository to track my (and Dirks)
> XML file in my home directory (~/scuba). It had an existing
> checked-out master branch that contains various XML files, and I'm not
> going to touch that. But I want to tell subsurface to use the "linus"
> branch to save things in the new format, so I do:
> 
>     echo git /home/torvalds/scuba:linus > git-test
> 
> and now if I tell subsurface to save to that file, it will actually
> save to that branch in my scuba repository. It will happily create the
> branch if it doesn't exist, and if it does exist, it will save the
> changes as a new commit (with the previous commit as a parent).

Neat :-)
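
For anyone who wants to experiment, here is a minimal sketch of how
that one-line pointer could be parsed. It is not taken from the patch:
the function name is made up, and splitting on the last colon is my
own assumption about how "repository:branch" should be read.

```python
def parse_git_pointer(line):
    """Split a one-line git pointer into (repository, branch).

    Expected shape, per the description above: the word "git",
    some whitespace, then "repository:branch".
    """
    parts = line.strip().split(None, 1)
    if len(parts) != 2 or parts[0] != "git":
        raise ValueError("not a git pointer: %r" % line)
    # Assumption: split on the LAST colon, so a colon inside the
    # repository path is not mistaken for the branch separator.
    repo, sep, branch = parts[1].strip().rpartition(":")
    if not sep or not repo or not branch:
        raise ValueError("expected repository:branch in %r" % line)
    return repo, branch
```

With the example above, parse_git_pointer("git /home/torvalds/scuba:linus")
would give the repository path and the branch name as a pair.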

How do you intend the actual UI to work?
Ideally this would be a "no user interaction UI" in most cases. I don't
want divers to have to wonder what we might mean by "branch" and other
strange terminology...

>  (b) the code to actually *load* the data from a git repository does
> not exist. This is purely a write-only "save as" operation right now,
> so that people can comment on the file formats etc.

I always loved write-only file systems... I may have implemented a few
of them myself...

>  And I need to be
> able to save things in order to test loading. So right now, the way I
> test things is:
> 
>  - start up subsurface with the regular xml file
>  - make random changes
>  - do a "save as" into the git-test file
>  - "git log -p linus" to see the end result
> 
>  (c) the file format *will* change. Right now the file format is a
> tree, with each trip getting a subdirectory of its own (remember: not
> actually checked out, so you won't see any subdirectories - but they
> are tracked as such inside git), and then within that a file for each
> dive.

That I like. And it addresses the issue that first got us to talk about
git... you use two different computers / tablets / what not to edit your
dive log and need to merge the result. With a dive based structure this
ends up being much easier.

> The file format for each dive is fairly sane, and might not need much
> changing (it looks a lot like the current XML, except it's a
> file-per-dive, and it lacks all the crazy XML syntax). But I don't
> save the dive trip notes etc right now at all, so the trip data itself
> isn't there, and I am pretty sure that I want to do a deeper directory
> hierarchy with at the very least each year getting its own
> subdirectory.

What happens to dive trips that straddle New Year's Eve? Does the date
of the first dive determine the location?
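
To make the question concrete, here is a sketch of the "first dive
decides" rule. This is only one possible answer, not what the patch
does; the function name and the "YYYY/tripNNN" layout are assumptions
for illustration.

```python
from datetime import datetime

def trip_directory(trip_name, dive_times):
    """Return 'YYYY/trip_name', using the year of the earliest dive.

    Hypothetical layout: one subdirectory per year, with the whole
    trip filed under the year in which its first dive took place.
    """
    first = min(dive_times)
    return "%04d/%s" % (first.year, trip_name)
```

Under that rule a trip that starts on December 30 and ends on January 2
stays together in the earlier year's directory.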

> So for example, right now I can do not just "git log -p linus" to see
> the changes, but can do things like
> 
>   git show linus:trip040/2014-01-15-11:12:00-474-36f102b7
> 
> to see one particular dive in my last trip, and I get something like
> 
>     duration 62:05 min
>     gps -10.441307 105.554471
>     location "North West Point"
>     divemaster "Hama"
>     buddy "Dirk"
>     suit "2/3mm wetsuit"
>     cylinder vol=12.0l workpressure=200.0bar description="12L 200 bar"
>     weightsystem weight=4.082kg description="Integrated"
>     divecomputer model="Mares Icon HD Net Ready" deviceid=e59d50b9 diveid=86356adf
>       depth max=26.2m mean=13.43m
>       temperature water=28.2°C
>       surface pressure=1.0bar
>         0:05min 5.1m 28.4°C
>         0:10min 5.9m
>         0:15min 6.1m 28.5°C
>         0:20min 5.9m 224.0bar
>         0:25min 6.4m 28.4°C
>         0:30min 6.3m 28.5°C
>         0:35min 6.6m 28.6°C
>         0:40min 6.6m 28.7°C 222.4bar
>         0:45min 6.7m
>         0:50min 6.7m
>         0:55min 7.0m 28.8°C

Thanks for switching back to full time stamps. I like this much better
than the delta based thing you were playing with earlier.
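
As a rough illustration of how easy the sample lines are to read back,
here is a sketch of a parser for a single line. It is not the loader
(which doesn't exist yet). The function and key names are invented,
and I am assuming that "0:40min" means minutes:seconds and that fields
are recognized purely by their unit suffix.

```python
import re

# Time stamp at the start of a sample line, e.g. "0:40min".
SAMPLE_TIME_RE = re.compile(r"^(\d+):(\d{2})min$")

def parse_sample(line):
    """Parse one sample line into a dict (hypothetical key names)."""
    fields = line.split()
    if not fields:
        raise ValueError("empty sample line")
    m = SAMPLE_TIME_RE.match(fields[0])
    if not m:
        raise ValueError("bad time stamp: %r" % fields[0])
    # Assumption: the time stamp is minutes:seconds.
    sample = {"time_s": int(m.group(1)) * 60 + int(m.group(2))}
    for field in fields[1:]:
        if field.endswith("°C"):
            sample["temp_c"] = float(field[:-2])
        elif field.endswith("bar"):
            sample["pressure_bar"] = float(field[:-3])
        elif field.endswith("m"):
            sample["depth_m"] = float(field[:-1])
        else:
            raise ValueError("unknown field: %r" % field)
    return sample
```

Fields that are absent on a line (temperature and pressure are only
written on some samples in the output above) are simply missing from
the returned dict.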

>     ...
> 
> and
> 
>    git show --stat linus
> 
> shows my last change, that just added a fake new dive in a fake new trip:
> 
>     commit 8eadfaa4144cfef2f39a3f2c252f6827c3022489
>     Author: Subsurface <subsurface at hohndel.org>
>     Date:   Thu Mar 6 12:45:34 2014 -0800
> 
>         subsurface commit
> 
>      trip041/2014-03-06-12:45:08-475-fa53548b | 12 ++++++++++++
>      1 file changed, 12 insertions(+)

Can you make it add the subsurface version number to the commit message?
I can't quite put my finger on WHY I'd want it there, but I'm thinking
we might later want to be able to tell if a commit was made by a
specific version. Maybe if there's a bug and we know how to undo it or
something? As I said, I'm not completely sure, but my gut feeling is
that this might be useful.

> so things work, and you can use git to examine things even if
> subsurface itself can't read the end result yet.
> 
> So on the whole I think it's a reasonable starting point, and while it
> is not actually useful for real work, it *is* useful for testing and
> commenting.

Yes

> Dirk, while I think this is good enough to apply (and it has my
> sign-off), the upsides aren't big until you can load things too.

What's your realistic expectation for how long it will take for this
to shake out? I had assumed that this would go into 4.2, not 4.1, so
maybe I should put it in a branch for now?
Or should it be in master and we just disable it when making a release
(assuming it's not ready)?

> Please comment, though.

I will, err, am.

> A few questions:
> 
>  - This has been tested with libgit2 in current Fedora 20 (which is
> version 0.19). How does it work on Windows/OSX? I know libgit2 works
> on those platforms, but I don't know how much pain it adds to the
> build requirements.

I played with this when you first mentioned libgit2 to me a couple of
weeks ago, and it seems to be fairly easy to make this work both for
cross builds to Windows (I have a libgit2.dll ready here) and for Mac
builds.

>    On F20, all you need to do is "yum install libgit2-devel".
> 
>  - should I make a directory per dive, and make each dive computer be
> a file of its own? Right now it's "one file per dive", but I could
> make it "one directory per dive (or perhaps per day) and then a "dive"
> file for the core data, and separate files for each dive computer.

Since this is not a checked out format, I think this makes a lot of
sense. I assume that would mean that there is a "special" file for the
dive data and then individual files for each DC that are identified by
the DC deviceid?

I would definitely want one "unit" per dive. So either one file or one
directory by itself. Let's not have the "day" be a special unit that
contains more than one dive (without sub-structure)...
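
Just to spell out what I have in mind, a sketch of the tree entries
for one dive under the directory-per-dive idea. The "dive" file name
comes from your description; the "divecomputer-<deviceid>" naming and
the function itself are my own assumptions.

```python
def dive_paths(trip, dive_name, device_ids):
    """List the tree entries for one dive stored as a directory.

    Hypothetical layout: <trip>/<dive>/dive holds the core data and
    there is one <trip>/<dive>/divecomputer-<deviceid> per computer.
    """
    base = "%s/%s" % (trip, dive_name)
    paths = [base + "/dive"]
    for dev in device_ids:
        paths.append("%s/divecomputer-%s" % (base, dev))
    return paths
```

For the example dive above with deviceid e59d50b9 this would yield one
"dive" entry and one "divecomputer-e59d50b9" entry under the dive's
own directory.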

>  - any particular other comments about the save format?

Some comments for the others who haven't been part of our
conversation... correct me where I'm misremembering things:

- this is intended as our internal format
- we continue to allow saving to XML (for export, transport, etc)
- we can use all the neat networking capabilities of git - so it's easy
to then put the data store on a server and use it from different clients
- this trivially gives us backup and versioning, but that of course
requires a sane UI to deal with it (that alone makes this 4.2 material)
- this does not by itself solve the problem of merging multiple edits of
the SAME dive. But it trivially deals with edits (and adding) of
different dives independently on different machines
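
The last point can be sketched as a plain three-way merge at file
granularity. This is a generic illustration with trees modeled as
path -> content dicts, not Subsurface or libgit2 code, and the
function name is made up.

```python
def merge_trees(base, ours, theirs):
    """Three-way merge of {path: content} dicts, one file per dive.

    Returns (merged, conflicts). A path is a conflict only when both
    sides changed the same file in different ways.
    """
    merged, conflicts = {}, []
    for path in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t:
            result = o          # both sides agree (or both deleted)
        elif o == b:
            result = t          # only theirs changed this dive
        elif t == b:
            result = o          # only ours changed this dive
        else:
            conflicts.append(path)
            result = o          # keep ours; caller has to resolve
        if result is not None:
            merged[path] = result
    return merged, conflicts
```

Edits and additions to different dives fall into the first three
cases and merge cleanly; only two edits to the SAME dive end up in the
conflicts list, which is exactly the case that still needs real work.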

>  - is somebody willing to write the Qt GUI to pick a git repo and
> branch, so that the hacky git-pointer file can be removed?

Yes, that would be great. All you need is a path and a branch name,
right? Ideally this would allow the user to point to a path, check that
there's a git repo there and list the branches that exist and allow the
user to use one of those or add a new one, correct?

I'm sure that we have a couple of GSoC candidates who know enough Qt to
whip this out in no time. Assuming Tomaz doesn't beat them to it :-)

/D