[PATCH 0/1] More efficient git writeback patch
Linus Torvalds
torvalds at linux-foundation.org
Mon Apr 4 16:32:07 PDT 2016
So this is the cover letter for a patch-series that is a single patch,
which I normally wouldn't do a cover letter for, but I thought I'd write
some background and explanation for why (and how) it would be really good
if people were to test this on their systems..
Part of the idea of the whole git save format was always that it would
split your divelog into multiple pieces, so that in theory you can load
and save the data incrementally, and avoid one of the problems that the
XML file had, which was that you had this one huge file that contained
everything and was really nasty for both people and computers to parse.
At the same time, our actual *model* for handling of the dive list in
subsurface has always been to just have all the dives available at all
times, so in practice we treated the git data the same way as we treated
the xml file: we'd load and save things all in one go.
However, that does end up not scaling wonderfully well as the divelog
grows. It's especially noticeable on mobile, since the CPU's (and the
system around the CPU's) are weaker: Dirk saw his git save times be around
12-15 seconds for his divelog on an iPhone 6 (?). Now, that's not even
with _that_ many dives in the end, although one issue is that Dirk has a
lot of dive computers, so each dive often contains two or more dive
computers, so that doubles or triples the data.
So what takes a second on a laptop basically easilyy takes ten times as
long on a phone.
Anyway, 10+ second save times (and this is not network time, this is the
git save itself) is obviously not good, and some profiling (on the laptop,
not the phone) showed that a lot of that time was just generating the nice
textual representation - the vsnprintf() string engine and the low-level
buffering functions from stdio.
We'd then generate the hash for the end result, and find the object
already in the git database (since 99% of the time you don't actually
*change* most dives, and even when you change a dive detail, you won't be
changing the divecomputer file associated with that dive), and go on to
the next dive.
Anyway, the following fairly small patch does the obvious git
optimization: just remember the git ID associated with the git save area
for a dive, and when saving the dive, don't spend time re-generating all
those objects if they haven't changed.
That apparently brought the 12 seconds down to about 2.6s on Dirk's phone.
The changes to the git loading and saving are actually fairly simple, and
while those need testing too, the *big* issue with this is that now we
need to make sure to invalidate the dive cache when we change a dive. So
now there's a new "invalidate_dive_cache()" function that needs to be
called whenever you edit a dive.
And we don't edit a dive in just one place. Sure, there's the normal dive
editing that happens when you start editing notes or buddy information or
cylinders. But we edit dives in lots of other ways: there's the mobile
case, of course, but there are more subtle cases too.
You can add gas switch events (and other events) in the profile, you can
renumber dives in the divelist, you can merge and split dives, and you can
add and remove pictures from them etc. All of those cases now need to have
that "invalidate_dive_cache()" associated with them. Because if they
don't, it looks like the change happens, but then when you save things
back, you end up saving that old stale data that you loaded originally.
Now, I added all of the invalidations that I could think of, and I even
tested it. But maybe I missed something. So testing by others would be
appreciated.
So if you want to help test this patch, remember a few things:
- it only ends up changing the git saves. It has no effect on the XML
saving.
- testing is easiest with a local git repo, but a cloud save ends up
working too. You can see that the change actually took place by
quitting and restarting subsurface, but it's much easier to just look
at the git repository with the usual git tools, and just verify that ^S
actually saved the change.
For cloud storage, you can find the repo (at least on Linux, I didn't
check the others) in your ~/.subsurface/cloudstorage/<hash>
subdirectory, and just go "git log -p" to see the changes.
- Remember that the problem is that if you edit something and there's a
missing invalidation, the edit won't "stick" in the save. But that also
means that you'd want to test this by making a small change, then
saving, make another different small change to another dive, and then
save, and see in the git logs that both of those changes were made.
Because if you make two or more changes to the same dive, if one of
them properly invalidates the cache, that may hide the fact that the
other one didn't.
But it really would be lovely to have people review the patch and help
test. Especially if you do odd small edits or use pictures (which I
don't).
NOTE! Long-term, we could do this on a larger scale with whole dive trips
etc. And in theory, we could try to do similar "lazy access" optimizations
for reads too - not actually read and parse all dives at startup, and
delay until somebody actually starts looking at it. That would be a much
bigger and more fundamental change, this is just a small step to avoid one
particular problem.
Anyway, that was a long cover-letter for a single patch that isn't even
all that big in the end. Without further ado, I present to you:
Linus Torvalds (1):
Don't write back dive data that hasn't changed in git
desktop-widgets/maintab.cpp | 4 +++-
desktop-widgets/simplewidgets.cpp | 8 ++++++--
profile-widget/profilewidget2.cpp | 4 ++++
qt-mobile/qmlmanager.cpp | 1 +
subsurface-core/dive.c | 15 +++++++++++----
subsurface-core/dive.h | 13 +++++++++++++
subsurface-core/load-git.c | 11 +++++++++--
subsurface-core/save-git.c | 37 ++++++++++++++++++++++++++++++-------
8 files changed, 77 insertions(+), 16 deletions(-)
--
2.8.0.rc4.303.g3931579
More information about the subsurface
mailing list