[PATCH 0/1] More efficient git writeback patch

Linus Torvalds torvalds at linux-foundation.org
Mon Apr 4 16:32:07 PDT 2016


So this is the cover letter for a patch-series that is a single patch, 
which I normally wouldn't do a cover letter for, but I thought I'd write 
some background and explanation for why (and how) it would be really good 
if people were to test this on their systems..

Part of the idea of the whole git save format was always that it would 
split your divelog into multiple pieces, so that in theory you can load 
and save the data incrementally, and avoid one of the problems that the 
XML file had, which was that you had this one huge file that contained 
everything and was really nasty for both people and computers to parse.

At the same time, our actual *model* for handling of the dive list in 
subsurface has always been to just have all the dives available at all 
times, so in practice we treated the git data the same way as we treated 
the xml file: we'd load and save things all in one go.

However, that does end up not scaling wonderfully well as the divelog 
grows. It's especially noticeable on mobile, since the CPU's (and the 
system around the CPU's) are weaker: Dirk saw his git save times be around 
12-15 seconds for his divelog on an iPhone 6 (?). Now, that's not even 
with _that_ many dives in the end, although one issue is that Dirk has a 
lot of dive computers, so each dive often contains two or more dive 
computers, so that doubles or triples the data.

So what takes a second on a laptop basically easilyy takes ten times as 
long on a phone.

Anyway, 10+ second save times (and this is not network time, this is the 
git save itself) is obviously not good, and some profiling (on the laptop, 
not the phone) showed that a lot of that time was just generating the nice 
textual representation - the vsnprintf() string engine and the low-level 
buffering functions from stdio.

We'd then generate the hash for the end result, and find the object 
already in the git database (since 99% of the time you don't actually 
*change* most dives, and even when you change a dive detail, you won't be 
changing the divecomputer file associated with that dive), and go on to 
the next dive.

Anyway, the following fairly small patch does the obvious git 
optimization: just remember the git ID associated with the git save area 
for a dive, and when saving the dive, don't spend time re-generating all 
those objects if they haven't changed.

That apparently brought the 12 seconds down to about 2.6s on Dirk's phone.

The changes to the git loading and saving are actually fairly simple, and 
while those need testing too, the *big* issue with this is that now we 
need to make sure to invalidate the dive cache when we change a dive. So 
now there's a new "invalidate_dive_cache()" function that needs to be 
called whenever you edit a dive. 

And we don't edit a dive in just one place. Sure, there's the normal dive 
editing that happens when you start editing notes or buddy information or 
cylinders. But we edit dives in lots of other ways: there's the mobile 
case, of course, but there are more subtle cases too.

You can add gas switch events (and other events) in the profile, you can 
renumber dives in the divelist, you can merge and split dives, and you can 
add and remove pictures from them etc. All of those cases now need to have 
that "invalidate_dive_cache()" associated with them. Because if they 
don't, it looks like the change happens, but then when you save things 
back, you end up saving that old stale data that you loaded originally.

Now, I added all of the invalidations that I could think of, and I even 
tested it. But maybe I missed something. So testing by others would be 
appreciated.

So if you want to help test this patch, remember a few things:

 - it only ends up changing the git saves. It has no effect on the XML 
   saving.

 - testing is easiest with a local git repo, but a cloud save ends up 
   working too. You can see that the change actually took place by 
   quitting and restarting subsurface, but it's much easier to just look 
   at the git repository with the usual git tools, and just verify that ^S 
   actually saved the change.

   For cloud storage, you can find the repo (at least on Linux, I didn't 
   check the others) in your ~/.subsurface/cloudstorage/<hash> 
   subdirectory, and just go "git log -p" to see the changes.

 - Remember that the problem is that if you edit something and there's a 
   missing invalidation, the edit won't "stick" in the save. But that also 
   means that you'd want to test this by making a small change, then 
   saving, make another different small change to another dive, and then 
   save, and see in the git logs that both of those changes were made. 

   Because if you make two or more changes to the same dive, if one of 
   them properly invalidates the cache, that may hide the fact that the 
   other one didn't.

But it really would be lovely to have people review the patch and help 
test. Especially if you do odd small edits or use pictures (which I 
don't).

NOTE! Long-term, we could do this on a larger scale with whole dive trips 
etc. And in theory, we could try to do similar "lazy access" optimizations 
for reads too - not actually read and parse all dives at startup, and 
delay until somebody actually starts looking at it. That would be a much 
bigger and more fundamental change, this is just a small step to avoid one 
particular problem.

Anyway, that was a long cover-letter for a single patch that isn't even 
all that big in the end. Without further ado, I present to you:

Linus Torvalds (1):
  Don't write back dive data that hasn't changed in git

 desktop-widgets/maintab.cpp       |  4 +++-
 desktop-widgets/simplewidgets.cpp |  8 ++++++--
 profile-widget/profilewidget2.cpp |  4 ++++
 qt-mobile/qmlmanager.cpp          |  1 +
 subsurface-core/dive.c            | 15 +++++++++++----
 subsurface-core/dive.h            | 13 +++++++++++++
 subsurface-core/load-git.c        | 11 +++++++++--
 subsurface-core/save-git.c        | 37 ++++++++++++++++++++++++++++++-------
 8 files changed, 77 insertions(+), 16 deletions(-)

-- 
2.8.0.rc4.303.g3931579



More information about the subsurface mailing list