RFC: Initial git save format

Dirk Hohndel
Thu Mar 6 16:29:40 PST 2014

On Thu, 2014-03-06 at 16:20 -0800, Linus Torvalds wrote:
> On Thu, Mar 6, 2014 at 3:20 PM, Dirk Hohndel <dirk at hohndel.org> wrote:
> >
> > The problem with that is that it exposes terminology to Joe and Jane
> > Diver that I'd prefer to hide from them.
> Actually, with the git save-file, my *preference* would be that the
> default filename configuration would basically become a non-issue
> entirely for a "normal user", but we need to have it for existing
> people.
> So what I'd *like* to do is:
>  - turn the current subsurface directory into a git repository
> (~/subsurface/ on Linux, ~/Library/Application Support/Subsurface on
> MacOS, and CSIDL_APPDATA/Subsurface/ on Windows)
>  - make the default save branch be that git repo, with the user name
> as the branch.
> and then the whole "configure default save area" would be only used by
> people who for some reason or other want to use a different git tree.
> We basically already have the private subsurface directory on all
> platforms already, and we already have the user name that we use for
> the config file name. So this doesn't really introduce anything new,
> it's just expanding on the current situation.
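
As a rough illustration of the default described above, the sketch below
maps a platform to the Subsurface directory named in the message and uses
the user name as the branch. It is Python for illustration only, not
Subsurface code: the function name `default_save_location` and its
parameters are hypothetical, and `appdata` stands in for whatever
CSIDL_APPDATA resolves to on Windows.

```python
import ntpath
import posixpath


def default_save_location(system, home, user, appdata=None):
    """Return (repo_dir, branch) for the default git save area.

    `system` uses the names returned by platform.system(). The directory
    names are copied from the message above; everything else is an
    assumption made for this sketch.
    """
    if system == "Linux":
        repo_dir = posixpath.join(home, "subsurface")
    elif system == "Darwin":
        repo_dir = posixpath.join(home, "Library", "Application Support", "Subsurface")
    elif system == "Windows":
        if appdata is None:
            raise ValueError("appdata is required on Windows")
        repo_dir = ntpath.join(appdata, "Subsurface")
    else:
        raise ValueError("unsupported platform: %s" % system)
    # The branch is simply the user name, as proposed in the message.
    return repo_dir, user
```

A caller would only override this pair when the user explicitly picks a
different git tree.
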
> > - we pick a default location and by default create a repo there and save
> > to git on all three platforms
> > - Export XML always saves to an XML file
> > - Save As allows the user to specify a different location for a git repo
> Yes, pretty close to that. So "Save as" would always save in git
> format, and the old XML format would always be through "export as
> XML".
> But one issue with that is that right *now* (and for existing users),
> we have that whole "oops, people have their old data in XML format,
> and the new git save format is a bit experimental".
>
> So I think we need some way to migrate cleanly from one to the other.
> It *could* be as simple as:
>  - on startup, see if the user subsurface subdirectory is already a
> git tree, and the branch for the username exists. If so, use that.
>  - otherwise, use the XML file from the "Default User filename" thing.
> If that's acceptable, I can do that without any new UI at all once I
> do the reading part, although we might *eventually* want a UI just to
> allow saving elsewhere.
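
A minimal sketch of the startup check described above, in Python for
illustration only. It inspects the repository layout on disk instead of
going through libgit2, which is an assumption made to keep the example
self-contained; the function names are hypothetical, and a `.git` file
pointing at a separate git directory is not handled.

```python
import os


def git_dir_for(path):
    """Return the git directory for `path` (checked-out or bare), or None."""
    for candidate in (os.path.join(path, ".git"), path):
        has_head = os.path.isfile(os.path.join(candidate, "HEAD"))
        has_refs = os.path.isdir(os.path.join(candidate, "refs"))
        if has_head and has_refs:
            return candidate
    return None


def branch_exists(git_dir, branch):
    """Check for a loose ref first, then fall back to packed-refs."""
    ref = "refs/heads/" + branch
    if os.path.isfile(os.path.join(git_dir, *ref.split("/"))):
        return True
    packed = os.path.join(git_dir, "packed-refs")
    if os.path.isfile(packed):
        with open(packed) as f:
            for line in f:
                if line.rstrip("\n").endswith(" " + ref):
                    return True
    return False


def pick_startup_source(subsurface_dir, user, default_xml):
    """Use the git branch if it exists, otherwise the default XML file."""
    git_dir = git_dir_for(subsurface_dir)
    if git_dir is not None and branch_exists(git_dir, user):
        return ("git", subsurface_dir, user)
    return ("xml", default_xml)
```

Existing users keep loading their XML file until a branch with their user
name shows up in the Subsurface directory, so no new UI is needed for the
switch.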

Yes, that is acceptable. Not for 4.1, but for 4.2.

> >> The reading part isn't really any harder, I expect to do that over the weekend.
> >>
> >> The directory layout and file format might change a bit, but assuming
> >> I get the parsing done, I'd expect next week to be close to a final
> >> format. I need to get this done before next kernel merge window.
> >
> > Linus "Fast" Torvalds.
> It's the offline planning that is slow. The patch I sent was mostly
> written today, with just some libgit2 skeleton code yesterday to find
> out the problems (ie "oops, git_treebuilder is too limited for what I
> want to do").
> A few hundred lines of code is not a big deal - it's literally mulling
> over "ok, how does this need to work" that takes time.

I still didn't expect to have a fairly complete implementation in a
couple of days.

There is of course still the "talking to a backend server" part, which
you haven't mentioned in your plans.

> > But I still worry about too little testing being possible for 4.1.
> > Let me ponder this. I have added this to master for now, we can always
> > disable it for 4.1
> Agreed.
> >> So you *can* use it as a checked-out tree, it just wouldn't be
> >> anything subsurface cares about. For subsurface, you would likely
> >> mostly use a so-called "bare" git repo (and that's what I'd do by
> >> default for the "create repo" case when creating a repository from
> >> within subsurface, see above about the lack of UI for that, though)
> >
> > So what else is needed here from a UI perspective? Anything beyond what
> > is discussed above?
> *Eventually* we definitely want to have a way to do the network syncing part.
>
> I do *not* believe that we want to have people doing "git pull/push"
> to sync to some repository in the cloud, but I do think that one of
> the big advantages of the git model is that it will make that syncing
> much easier. And we'll need some gui thing to set that up etc.

No, I don't want users to EVER have to interact with git.
Any git functionality we might need needs a Subsurface interface.
But push/pull turns into "send / receive" + setup of the account in
question (which we can delegate to a web site).

> I think some of those interfaces will inevitably be outside of
> subsurface (ie setting up an account on github or whatever), but I
> suspect there's a few things we'd want to do.

Yes, that part would be outside. Everything else should be inside Subsurface.

> But that's definitely not a short-term thing.

4.2 :-)

> > That said, I see nothing wrong with year/month hierarchy - but I wonder
> > how trip and day would nest...
> Note that I'm going to very consciously try to make the file layout be
> unimportant, and the "read git tree" part will mostly be about "let's
> just recursively find random dive files" without their location being
> all that important. So the layout would be something like
>  - dive that isn't in a trip: saved dive #390 into "2014/03/Sat-04-390"
>  - same dive that is in a trip: "2014/03/trip041/Sat-04-390"
> So the "hierarchy" is actually not meaningful for subsurface itself,
> and has no data structure meaning (except that I'd be incrementally
> picking up date hints from the filenames: right now the filename has
> the full date, but when I do per-year subdirectories, I'd drop the
> year from the divename if it is the same as in the directory structure
> etc).
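
To make the proposed naming concrete, here is a small Python sketch
(illustration only) that builds such a path from a dive date, a dive
number and an optional trip number. The `dive_path` name, the zero
padding and the `trip%03d` directory format are assumptions read off the
two examples above. The weekday is computed from the date, so real names
will not necessarily match the illustrative ones in the message.

```python
from datetime import date

# Fixed English abbreviations so the result does not depend on the locale.
WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def dive_path(when, number, trip=None):
    """Return 'YYYY/MM[/tripNNN]/Www-DD-number' for a dive."""
    parts = ["%04d" % when.year, "%02d" % when.month]
    if trip is not None:
        # Dives that belong to a trip get one extra directory level.
        parts.append("trip%03d" % trip)
    parts.append("%s-%02d-%d" % (WEEKDAYS[when.weekday()], when.day, number))
    return "/".join(parts)
```

For example, `dive_path(date(2014, 1, 4), 390, trip=41)` gives
"2014/01/trip041/Sat-04-390"; the year is carried by the directory and
not repeated in the file name.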

So what about 'trip membership' - is the location in the tree relevant
for that?

> So it's not really so much a hierarchy thing, as a convenience issue:
>  - make the file layout sensible so that 'git log -p' output is
> human-readable, and dives group naturally together
>  - when merging git trees, I want the file layout to make it trivial
> to generally merge automatically, and if there are conflicts, I want
> the file layout to be so simple that people don't screw it up.
> Now, that merging issue should be *really* rare, but let's assume
> people sometimes go on dive trips with different laptops and without
> having synced up in some central place, so you get a real merge. I'd
> want there to preferentially never really be conflicts within any
> files, so I want to spread the files around into different places
> based on trip/date information. Then any merge resolution will
> generally be "ok, remove this dive that already exists in another
> place because I ended up creating a trip for it when I was on that
> other device without internet access".
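
The merge case above can be sketched as a recursive scan that ignores
where a dive sits in the tree and reports dives that show up in more than
one place. This is Python for illustration only. It assumes each dive is
a single file named like "Sat-04-390" somewhere below a "YYYY/MM"
directory, and that date plus dive number identifies a dive; both are
assumptions, as are the function names.

```python
import os
import re
from collections import defaultdict

# Weekday abbreviation, day of month, dive number (e.g. "Sat-04-390").
DIVE_NAME = re.compile(r"^[A-Z][a-z]{2}-(\d{2})-(\d+)$")


def find_dives(root):
    """Map (year, month, day, number) to every relative path holding that dive."""
    found = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        parts = os.path.relpath(dirpath, root).split(os.sep)
        # Only the leading year/month directories carry date hints; deeper
        # levels (such as a trip directory) are ignored on purpose.
        if len(parts) < 2 or not (parts[0].isdigit() and parts[1].isdigit()):
            continue
        for name in filenames:
            m = DIVE_NAME.match(name)
            if m:
                key = (int(parts[0]), int(parts[1]), int(m.group(1)), int(m.group(2)))
                found[key].append("/".join(parts + [name]))
    return found


def duplicated_dives(root):
    """Return only the dives that exist at more than one location."""
    return {k: sorted(v) for k, v in find_dives(root).items() if len(v) > 1}
```

After a merge, any entry returned by `duplicated_dives` is a candidate for
the "remove the copy that is not in the trip" resolution described above.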


Dirk "no UI is the best UI" Hohndel

