RFC: Initial git save format

Linus Torvalds torvalds at linux-foundation.org
Thu Mar 6 16:20:59 PST 2014


On Thu, Mar 6, 2014 at 3:20 PM, Dirk Hohndel <dirk at hohndel.org> wrote:
>
> The problem with that is that it exposes terminology to Joe and Jane
> Diver that I'd prefer to hide from them.

Actually, with the git save-file, my *preference* would be that the
default filename configuration would basically become a non-issue
entirely for a "normal user", but we need to have it for existing
people.

So what I'd *like* to do is:

 - turn the current subsurface directory into a git repository
(~/subsurface/ on Linux, ~/Library/Application Support/Subsurface on
MacOS, and CSIDL_APPDATA/Subsurface/ on Windows)

 - make the default save branch be that git repo, with the user name
as the branch.

and then the whole "configure default save area" would be only used by
people who for some reason or other want to use a different git tree.

We already have the private subsurface directory on all platforms,
and we already have the user name that we use for
the config file name. So this doesn't really introduce anything new,
it's just expanding on the current situation.
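
A minimal sketch of the default save target described above: the
per-platform subsurface directory as the repository, and the user name
as the branch. The function names are hypothetical, and mapping
CSIDL_APPDATA to the APPDATA environment variable is an assumption of
this sketch, not something subsurface is known to do this way.

```python
import getpass
import os
import sys
from pathlib import Path


def default_repo_dir(platform=sys.platform, env=None, home=None):
    """Return the per-platform subsurface directory listed in the email."""
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)
    if platform.startswith("win"):
        # Assumption: CSIDL_APPDATA is approximated by %APPDATA%.
        return Path(env.get("APPDATA", str(home))) / "Subsurface"
    if platform == "darwin":
        return home / "Library" / "Application Support" / "Subsurface"
    # Linux and anything else: ~/subsurface
    return home / "subsurface"


def default_save_target(user=None, **kwargs):
    """Return (repository directory, branch name) for the default save."""
    user = getpass.getuser() if user is None else user
    return default_repo_dir(**kwargs), user
```

Passing `platform`, `env` and `home` explicitly makes it possible to
check all three platforms from a single machine.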

> - we pick a default location and by default create a repo there and save
> to git on all three platforms
> - Export XML always saves to an XML file
> - Save As allows the user to specify a different location for a git repo

Yes, pretty close to that. So "Save as" would always save in git
format, and the old XML format would always be through "export as
XML".

But one issue with that is that right *now* (and for existing users),
we have that whole "oops, people have their old data in XML format,
and the new git save format is a bit experimental".

So I think we need some way to migrate cleanly from one to the other.

It *could* be as simple as:

 - on startup, see if the user subsurface subdirectory is already a
git tree, and the branch for the username exists. If so, use that.

 - otherwise, use the XML file from the "Default User filename" thing.

If that's acceptable, I can do that without any new UI at all once I
do the reading part, although we might *eventually* want a UI just to
allow saving elsewhere.
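
The two-step startup rule above could be sketched roughly as follows.
This is an illustration only: it inspects the on-disk ref files
directly (loose refs and packed-refs), whereas real code would
presumably ask libgit2. The names `branch_exists` and
`pick_startup_source` are hypothetical.

```python
from pathlib import Path


def branch_exists(repo_dir, branch):
    """Rough check for refs/heads/<branch> in a bare or checked-out repo."""
    repo_dir = Path(repo_dir)
    ref = "refs/heads/" + branch
    # A checked-out tree keeps git data in .git/, a bare repo in the dir itself.
    for git_dir in (repo_dir / ".git", repo_dir):
        if not (git_dir / "HEAD").is_file():
            continue  # does not look like a git directory
        if (git_dir / ref).is_file():
            return True  # loose ref
        packed = git_dir / "packed-refs"
        if packed.is_file():
            for line in packed.read_text().splitlines():
                if line.endswith(" " + ref):
                    return True  # packed ref
    return False


def pick_startup_source(repo_dir, user, default_xml):
    """Prefer the user's git branch, else fall back to the default XML file."""
    if branch_exists(repo_dir, user):
        return ("git", str(repo_dir), user)
    return ("xml", str(default_xml), None)
```

Existing users with only an XML file fall through to the second case,
so nothing changes for them until a git branch has been created.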


>> The reading part isn't really any harder, I expect to do that over the weekend.
>>
>> The directory layout and file format might change a bit, but assuming
>> I get the parsing done, I'd expect next week to be close to a final
>> format. I need to get this done before next kernel merge window.
>
> Linus "Fast" Torvalds.

It's the offline planning that is slow. The patch I sent was mostly
written today, with just some libgit2 skeleton code yesterday to find
out the problems (ie "oops, git_treebuilder is too limited for what I
want to do").

A few hundred lines of code is not a big deal - it's literally mulling
over "ok, how does this need to work" that takes time.

> But I still worry about too little testing being possible for 4.1.
> Let me ponder this. I have added this to master for now, we can always
> disable it for 4.1

Agreed.

>> So you *can* use it as a checked-out tree, it just wouldn't be
>> anything subsurface cares about. For subsurface, you would likely
>> mostly use a so-called "bare" git repo (and that's what I'd do by
>> default for the "create repo" case when creating a repository from
>> within subsurface, see above about the lack of UI for that, though)
>
> So what else is needed here from a UI perspective? Anything beyond what
> is discussed above?

*Eventually* we definitely want to have a way to do the network syncing part.

I do *not* believe that we want to have people doing "git pull/push"
to sync to some repository in the cloud, but I do think that one of
the big advantages of the git model is that it will make that syncing
much easier. And we'll need some gui thing to set that up etc.

I think some of those interfaces will inevitably be outside of
subsurface (ie setting up an account on github or whatever), but I
suspect there's a few things we'd want to do.

But that's definitely not a short-term thing.

> That said, I see nothing wrong with year/month hierarchy - but I wonder
> how trip and day would nest...

Note that I'm going to very consciously try to make the file layout be
unimportant, and the "read git tree" part will mostly be about "let's
just recursively find random dive files" without their location being
all that important. So the layout would be something like

 - dive that isn't in a trip: saved dive #390 into "2014/03/Sat-04-390"

 - same dive that is in a trip: "2014/03/trip041/Sat-04-390"
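
As a sketch of that naming scheme, assuming the pieces are year, month,
an optional trip directory, and a "weekday-day-number" file name. The
`dive_path` function and the meaning of the trip index are assumptions
made for illustration, and the weekday here is computed from the date
rather than copied from the examples above.

```python
from datetime import date

# Fixed English abbreviations so the result does not depend on the locale.
WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def dive_path(when, number, trip=None):
    """Build a repository-relative path for one dive."""
    parts = [f"{when.year:04d}", f"{when.month:02d}"]
    if trip is not None:
        # Hypothetical: trips are identified by a small integer index.
        parts.append(f"trip{trip:03d}")
    parts.append(f"{WEEKDAYS[when.weekday()]}-{when.day:02d}-{number}")
    return "/".join(parts)


print(dive_path(date(2014, 3, 1), 390))           # 2014/03/Sat-01-390
print(dive_path(date(2014, 3, 1), 390, trip=41))  # 2014/03/trip041/Sat-01-390
```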

So the "hierarchy" is actually not meaningful for subsurface itself,
and has no data structure meaning (except that I'd be incrementally
picking up date hints from the filenames: right now the filename has
the full date, but when I do per-year subdirectories, I'd drop the
year from the divename if it is the same as in the directory structure
etc).
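
The "recursively find dive files wherever they are" idea might look
something like this. It is only a sketch: it walks a checked-out
directory with `os.walk` instead of a git tree object, and the file
name pattern is an assumption based on the example names above.

```python
import os
import re

# Assumed dive file name shape: weekday, day of month, dive number.
DIVE_NAME = re.compile(r"^[A-Z][a-z]{2}-\d{2}-\d+$")


def find_dive_files(root):
    """Return all dive files under root, regardless of directory depth."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if DIVE_NAME.match(name):
                rel = os.path.relpath(os.path.join(dirpath, name), root)
                found.append(rel.replace(os.sep, "/"))
    return sorted(found)
```

The directory a file was found in is kept only as part of the returned
path; nothing in the lookup depends on it.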

So it's not really so much a hierarchy thing, as a convenience issue:

 - make the file layout sensible so that 'git log -p' output is
human-readable, and dives group naturally together

 - when merging git trees, I want the file layout to make it trivial
to generally merge automatically, and if there are conflicts, I want
the file layout to be so simple that people don't screw it up.

Now, that merging issue should be *really* rare, but let's assume
people sometimes go on dive trips with different laptops and without
having synced up in some central place, so you get a real merge. I'd
strongly prefer that there never really be conflicts within any
files, so I want to spread the files around into different places
based on trip/date information. Then any merge resolution will
generally be "ok, remove this dive that already exists in another
place because I ended up creating a trip for it when I was on that
other device without internet access".
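
A rough sketch of that kind of merge cleanup, assuming a dive counts as
"the same" when its file name and year/month directories match once any
trip directory is ignored. That identity rule and the function name are
assumptions for illustration only.

```python
def duplicates_to_remove(paths):
    """Return paths of dives that also exist elsewhere, keeping trip copies."""

    def is_trip(segment):
        return segment.startswith("trip")

    by_key = {}
    for path in paths:
        parts = path.split("/")
        # Identity: directories minus trip dirs, plus the file name.
        key = (tuple(s for s in parts[:-1] if not is_trip(s)), parts[-1])
        by_key.setdefault(key, []).append(path)

    remove = []
    for copies in by_key.values():
        if len(copies) < 2:
            continue
        in_trip = [p for p in copies if any(is_trip(s) for s in p.split("/")[:-1])]
        keep = in_trip[0] if in_trip else copies[0]
        remove.extend(p for p in copies if p != keep)
    return sorted(remove)
```

Because each dive lives in its own file, the cleanup is a matter of
deleting whole files rather than resolving conflicts inside one.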

                        Linus

