Feedback and testing of the dive statistics

Dirk Hohndel dirk at hohndel.org
Sun Jan 10 12:14:19 PST 2021


Obviously I hope that Willem and Berthold will chime in as well, but I want to respond to that excellent detailed input, too :-)

> On Jan 10, 2021, at 7:27 AM, Peter Zaal via subsurface <subsurface at subsurface-divelog.org> wrote:
> 
> Warning: long mail!

Those are the best, aren't they? :-)
 
> I think I have a very interesting log of my real dives. Of my 1900+ dives, I now have almost 800 from the last 8-9 years in Subsurface. In the long run I want to put in all my dives, but this takes a lot of time, since I have to add a lot of them manually from paper because I don’t have them digitally. But I think this is quite a nice set of data, with all kinds of different dives: recreational, technical, lots of dive sites, lots of buddies, all year round. Yes, even during corona we can go out; diving is not prohibited here, and I usually make 1 or sometimes 2 dives a week (80 dives in 2020).

I am somewhat envious... but then again, this is diving in the winter in the Netherlands... so maybe not as much :-)
I only have about 650 dives - but I have every single one in Subsurface, so our data set size is somewhat similar it seems.

> Of course I had some sneak previews of the statistics on this mailing list and knew a little about what was possible. But *before* going through all the options myself, I made a list of the things I would like to know about my dives, and then just checked whether the statistics can provide that data.

That's actually a really good way to evaluate what we have. Instead of starting from what's there, starting from what you wish were there.

> So, first impression: before starting on my list, I played around a little. The first thing I noticed was that the filter was also active. This is great! I don’t think I saw any screen previews with that, but it makes it so easy to zoom in on specific data in a chart, by filtering on that data. E.g. some of my charts had some unexpected labels in the legend, because I had made some errors in the data. I never noticed this before, but now it stands out, and by filtering I could easily find these dives and correct them. I do have an extra suggestion about this, but more on that later. Having the filter available on screen right away is so much easier than having to close the statistics, filter the dives, reopen the statistics etc. Great!

Yeah, that was one of those breakthrough ideas as we thought about filtering this time around. I also consider this a game changer.

> My first impression was that the statistics module is absolutely amazing. I can’t say enough about the work that has gone into this. Brilliant. Lots and lots of kudos to Berthold and Willem (if I am correct). It is also blazing fast. Even with 800 dives it feels like every selection I make is instantly drawn on the screen.
> I am not a statistics person, but I found it quite easy to understand how it works and to get useful data. So keep in mind that my comments are from a statistics noob's point of view. Things that seem obvious to me might actually not make sense statistically.

The implementation kudos go 99% to Berthold. The concept / idea / statistical understanding kudos go 90% to Willem. The credit for the things that break goes 80% to me :-)

> So, now on with the information I tried to get out of Subsurface with the statistics module.
>  
> - How many buddies do I dive with each year?
> Base=Date yearly, Data=Buddies, Chart=stacked
> This sort of gave me the information, but each bar is, of course, composed of all the buddies in that year. The height of the bar is the total number of buddies for that year (what I wanted to know), but I cannot see exactly what that number is. E.g. in 2013 it is between 180 and 200, but I don’t know exactly how many. And yes, I understand this is not the number of dives, because lots of dives were with multiple buddies. For other data this is more relevant, but it would be great to somehow see the total number of dives in a bar.
> One thing I noticed (because of the legend, actually) is that Buddies includes not only the people from the Buddy field, but also the Divemaster(s). This is not consistent with the Filter, where this is called ‘people’.

Great observation. Yes, we should call this people here as well. And the question of whether people want separate statistics for buddies vs. dive guides is also worth considering.
As for the total number? I like this idea. I don't know how hard it would be to implement this (I expect the answer is 'not at all') - I hope Berthold will respond to that.
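To make the distinction concrete (the height of a stacked bar counts buddy appearances, not dives), here is a minimal Python sketch. It is not Subsurface's code; the dive records and the `stacked_totals` helper are hypothetical, assuming each dive carries a year and a list of people.

```python
from collections import Counter, defaultdict

# Hypothetical dive records: (year, list of people on the dive).
dives = [
    (2013, ["Anna", "Bob"]),
    (2013, ["Anna"]),
    (2014, ["Bob", "Carl", "Anna"]),
]

def stacked_totals(dives):
    """Return per-year segment counts, bar heights and dive counts.

    Each segment counts how many dives a given person joined that year.
    The bar height is the sum of the segments, which exceeds the number
    of dives whenever a dive has several buddies.
    """
    segments = defaultdict(Counter)
    dive_counts = Counter()
    for year, people in dives:
        dive_counts[year] += 1
        for person in people:
            segments[year][person] += 1
    heights = {year: sum(seg.values()) for year, seg in segments.items()}
    return segments, heights, dive_counts

segments, heights, dive_counts = stacked_totals(dives)
for year in sorted(heights):
    print(year, "bar height:", heights[year], "dives:", dive_counts[year])
```

With this data both years have a bar height of 3, but 2013 has 2 dives and 2014 only 1, which is exactly why showing both numbers would help.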

> - Number of dives per buddy
> Base=Buddies, Data=none, Chart=vertical
> So yes, this of course gives me the data I wanted. But looking at it, with about 80 different buddies, I realized that what I really wanted to see is who I dive with the most, then second most, etc. The bars are sorted by buddy name, but it would be extremely helpful if you could sort bar charts by number of dives. A bit like the pie chart, where only the top 5 are shown (+ other).

Also an interesting idea. That does kinda change the logic of how things are laid out and I'm not sure how easy this would be to do, but conceptually I think this would be super useful.
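As a rough illustration of what "sort by value" means for the bar order, here is a minimal Python sketch. The `sort_bars` helper and the buddy counts are hypothetical; it only shows the ordering rule, not how Subsurface lays out its axes.

```python
# Hypothetical bar data: buddy name -> number of dives together.
dives_per_buddy = {"Anna": 42, "Bob": 7, "Carl": 42, "Dora": 15}

def sort_bars(bars, by_value=True):
    """Return (label, value) pairs in the order the bars would be drawn.

    by_value=True sorts by descending value and breaks ties by label, so
    the order is deterministic; by_value=False sorts alphabetically by label.
    """
    if by_value:
        return sorted(bars.items(), key=lambda item: (-item[1], item[0]))
    return sorted(bars.items())

print(sort_bars(dives_per_buddy))
# [('Anna', 42), ('Carl', 42), ('Dora', 15), ('Bob', 7)]
```

The tie-breaker matters: without it, two buddies with the same count could swap places between redraws.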

> - Dive time per buddy (overall)
> Base=Buddies, Data=Duration Sum
> Data is provided, but same request: it would be so handy to have this sorted by Duration
>  
> - How many dives did I do in each country I dived?
> Not possible; Country is not available as a variable

Ugh. This is the whole dive site taxonomy thing all over again. I think this will require more thought. Because in the end what I think many will ask for is a full text search, but restricted to the location text plus taxonomy strings.
Berthold, this sounds a lot more complex. Thoughts?

> - How many dives per max. depth
> Base=Max. depth in 5m steps, Data=none
> Easy one
>  
> - Number of dives per max. depth, but now over time/period
> Base=Date yearly, Data=Max.Depth
> I was looking for a simple chart with a bar for each year with the total number of dives. What I got was much more than this. The box-whisker gave me all the information, and much more.
> What I came to realize is that with this query (and others to come), I was actually looking for a way to select a Max operation (in other cases sometimes Min), so that I could see the max of the max. depth per year or whatever period.
> I can select Mean, Median and Sum, but why not Min and Max? This seems very easy to provide: the same as with mean, median and sum, just a different calculation.

The operations add a level of complexity to the interpretation of the data that often surprised me. I can't tell you why Min and Max aren't available as operations - Berthold?
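Purely from a data point of view, Min and Max fit the same pattern as Mean, Median and Sum: each one reduces the values in a bin to a single number. Here is a minimal Python sketch, assuming dives grouped into yearly bins; the `OPERATIONS` table and `aggregate` helper are hypothetical and say nothing about why the real UI leaves Min and Max out. Returning the count next to the value is one way to feed the Count that Peter wants in the infobox.

```python
import statistics
from collections import defaultdict

# Hypothetical table of per-bin operations (not Subsurface's real list).
OPERATIONS = {
    "mean": statistics.mean,
    "median": statistics.median,
    "sum": sum,
    "min": min,
    "max": max,
}

def aggregate(dives, operation):
    """Group (year, max_depth) pairs by year and apply one operation per bin.

    Returns {year: (value, count)} so an infobox could show the count too.
    """
    bins = defaultdict(list)
    for year, depth in dives:
        bins[year].append(depth)
    func = OPERATIONS[operation]
    return {year: (func(values), len(values)) for year, values in sorted(bins.items())}

dives = [(2019, 18.0), (2019, 42.5), (2019, 30.0), (2020, 12.0), (2020, 55.0)]
print(aggregate(dives, "max"))  # {2019: (42.5, 3), 2020: (55.0, 2)}
```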

> Then I looked at the data in the infobox, and it showed Min, Q1, Mean, Q3 and Max. At first I thought Q1 and Q3 meant the 1st and 3rd quarter of the year, which made no sense to me at all: where were Q2 and Q4? But it also didn’t change when selecting Quarterly or Monthly. Then I figured this is probably some statistics thing (I told you I am a noob in statistics), and indeed it is. I do understand a little about the difference between mean and median, but are Q1 and Q3 something ‘normal’ people are interested in? Is this really useful information? I have never seen it anywhere else, and to me it just clutters the information.
> What I did miss in the infobox is the number of dives it covers. If I select an operation like Mean, it does show an infobox with the Count. I would really like to see the Count in the infobox on box-whiskers as well. And btw, vice versa: when I select an operation I would like to see the Min and Max in the infobox on a bar.

Yeah... when listening to some of the discussions during development, I appreciated how much more familiar Willem is with statistical terms. Mind you, I have a master's in math and understand how all this works, but the (English) terms and lingo threw me a few times. Q1 and Q3 in this context are the first and third quartiles: the values below which 25% and 75% of the data fall. Apparently these are indeed the standard terms people use, so I'm not sure how to communicate this better. Obviously the user manual will be able to help.
As to whether this is useful information? This was certainly something that I wanted to have in the result because it tells me a lot more about how the data are distributed when I look at things. So yeah, I do believe this is useful.
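For anyone puzzling over Q1 and Q3, here is a minimal Python sketch of the five numbers a box-and-whisker plot summarizes. It uses Python's `statistics.quantiles` with the "inclusive" method; that is an assumption for illustration, since there are several quartile conventions and which one Subsurface uses is not stated here. The durations are made up, with one long dive as an outlier.

```python
import statistics

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for a box-and-whisker plot.

    Uses the 'inclusive' quantile method of Python's statistics module;
    other tools may use a different convention and get slightly
    different Q1/Q3 values for small samples.
    """
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return data[0], q1, median, q3, data[-1]

# Hypothetical dive durations in minutes, with one long outlier.
durations = [45, 60, 70, 75, 80, 90, 240]
print(five_number_summary(durations))        # (45, 65.0, 75.0, 85.0, 240)
print(round(statistics.mean(durations), 1))  # 94.3
```

The middle half of the dives lies between Q1 and Q3 (65 to 85 minutes here). The single 240-minute dive pulls the mean up to about 94 minutes, while the quartiles barely notice it, which is the kind of distribution information the box plot is meant to show.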

> - SAC vs suit
> Base=Suit type, Data=SAC
> Yes this provides the information, but again: I miss a way to see the exact totals per bar

As above - this should be fairly easy.

> - Total (and average) dive time over time (yes, I say average instead of mean 😉)
> Base=Date yearly, Data=Duration Sum/Mean
> But again: missing the min and max numbers in the infobox of a bar
> Fun fact: my average dive time has been quite steady over the years, around 70 minutes per dive. Even though I have made much longer dives (2-4 hours) in the last couple of years, these are only a small number of dives, so they don’t have much effect on the overall average.
>  
> - Number of dives over time
> Base=Date, Data=none
> Easy one
>  
> - Dive time on oc and cc
> Base=Dive mode, Data=Duration sum
> Easy one. In cases like this it would be nicer to see the duration in hours instead of minutes.
>  
> - Number of dives per location
> Base=Dive site, Data=none, Chart=vertical
> At first I was confused because I didn’t find the Location variable, but then found out this is Dive site.
> With hundreds of dive sites this chart is a bit… cluttered. As with another query before, it would be so nice if this could be sorted by number of dives. And even better if you could select an ‘Only top X’ option (like the pie chart, which only shows the top 5).
> I also noticed that, contrary to other variables, the bars are not sorted by dive site name, but seem to be in some random order.

I wonder what they are ordered by. I haven't looked at the code, but it could be the dive site id (which is indeed a random number). I think your idea of having them sorted by bar size and doing a cut off would indeed be great.
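A cut-off like the pie chart's "top 5 + other" could look roughly like this minimal Python sketch. The `top_n_with_other` helper and the site names are hypothetical; it ranks bars by value and folds everything below the cut-off into a single "Other" bar.

```python
# Hypothetical bar data: dive site -> number of dives.
dives_per_site = {"Site A": 120, "Site B": 3, "Site C": 48, "Site D": 9, "Site E": 48}

def top_n_with_other(bars, n):
    """Keep the n largest bars (ties broken by name) and fold the rest into 'Other'."""
    ranked = sorted(bars.items(), key=lambda item: (-item[1], item[0]))
    top = ranked[:n]
    rest = sum(value for _, value in ranked[n:])
    if rest:
        top.append(("Other", rest))
    return top

print(top_n_with_other(dives_per_site, 3))
# [('Site A', 120), ('Site C', 48), ('Site E', 48), ('Other', 12)]
```

Keeping an "Other" bar means the chart still accounts for every dive, so the cut-off hides detail without silently dropping data.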

> - SAC vs depth
> Base=SAC, Data=Max.Depth
> The whiskers show the information, but again, I am missing the Count in the infobox (or number of dives and %)

By the way - this is much more interesting as scatter plot.

> - Number of dives in each water type (fresh, brackish, salt)
> Not possible, water type is not a variable you can select
>  
> - Number of dives vs temperature, over time
> Base=Date monthly, Data=Water temperature
> What I was expecting to see was the temperature rising and falling over the months. But this is of course the *minimum* water temperature, and, again, unfortunately it is not possible to select a Min (or Max) operation. The box-whisker does provide this (but is missing the Count), but I was looking for a simple bar chart.
> Also, binning temperature in 20-degree steps seems a bit much and not useful.

That's because we use the same raw numbers for the bin sizes in °F and °C. And binning in 20°F steps (about 11°C) is definitely useful for me.
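Because a bin width is a temperature *difference*, converting it between units uses only the 5/9 factor, without the 32° offset. A minimal Python sketch with hypothetical helpers (`bin_start`, `f_to_c_interval`) and made-up temperatures; it only illustrates the arithmetic, not Subsurface's binning code.

```python
import math

def bin_start(temp, width):
    """Return the lower edge of the bin of the given width that contains temp."""
    return math.floor(temp / width) * width

def f_to_c_interval(delta_f):
    """Convert a temperature difference from °F to °C (no 32° offset)."""
    return delta_f * 5 / 9

# Hypothetical water temperatures of a few dives, in °F.
temps_f = [41.0, 48.5, 55.0, 63.0, 79.0]
print([bin_start(t, 20) for t in temps_f])  # [40, 40, 40, 60, 60]
print(round(f_to_c_interval(20), 1))        # 11.1
```

Going the other way, a 20°C bin is 36°F wide, which is why the same number looks much coarser to someone working in °C.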

> - Dive time vs. Temperature
> Base=Water temperature, Data=Duration
> Same: the box-whiskers provide the information, but the Count is missing. And among the operations the Max operation is missing, which I'd need to create a simple bar chart of my maximum dive time per temperature.

I much prefer the scatter plot for this one as well. It doesn't give me the max, but it gives me a much better visual representation of the data.

> - Number of cave and tech dives
> Not possible, since there is no Tags variable.
> I have all my cave and tech dives marked with C1, C2, T1 and/or T2 tags. So what I wanted to do is filter all dives with these tags, and then create statistics on these dives. So that I can easily see how many dives I made of these types e.g. every year.
> So it would be great to add Tags as a variable. Probably more fun statistics can be made with that, depending on what you use the tags for.

You can filter by those tags and then get individual statistics. But you can't select a subset of tags and use them as categories for statistics - which would also be an interesting extension of what we have. Cool idea.
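Using a chosen subset of tags as categories could look roughly like this minimal Python sketch. The dive records and the `dives_per_year_by_tag` helper are hypothetical; the tag names are the ones Peter mentions.

```python
from collections import Counter

# Hypothetical dives: (year, set of tags).
dives = [
    (2019, {"C1", "boat"}),
    (2019, {"T1"}),
    (2020, {"C1", "T1"}),
    (2020, {"shore"}),
]

def dives_per_year_by_tag(dives, selected_tags):
    """Count dives per (year, tag), considering only the selected tags.

    A dive with several selected tags is counted once per matching tag,
    so a stacked bar's total can exceed the number of dives.
    """
    counts = Counter()
    for year, tags in dives:
        for tag in tags & selected_tags:
            counts[(year, tag)] += 1
    return counts

counts = dives_per_year_by_tag(dives, {"C1", "C2", "T1", "T2"})
print(sorted(counts.items()))
# [((2019, 'C1'), 1), ((2019, 'T1'), 1), ((2020, 'C1'), 1), ((2020, 'T1'), 1)]
```

Note that the 2020 dive tagged both C1 and T1 shows up in two categories, the same double-counting effect as with multiple buddies.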

> So this concludes my testing. I did find some things that are bugs imho, but it did not crash a single time and overall I am extremely impressed by the statistics!!

Kudos to Berthold.
 
> Summary
>  
> Feature requests
> - Sometimes, looking at the charts, you want to zoom in on some piece. What would be extremely nice: if a part of a chart is selected (pointed at), e.g. one piece of a bar, you could right-click and select ‘Filter on this’ or something. Then these dives would automatically be selected. This of course would change the statistics again, but it would be very useful to find a specific set of dives.

This is conceptually easy - the whole "...you know what I mean" thing that our human brains can do is stunning. But actually implementing 'easy' requests like this in code can be insanely hard.

> - The dive list on the lower left is disabled. The scrollbars are also disabled, which is not very handy. Even better, it would be great if one could just double-click a dive to go to the edit mode of that dive!

Ugh. I'm not sure I agree with that. It adds a ton of complexity. Unless you are ok with us leaving statistics mode for that...

> - On stacked charts, provide a way to see the total number of dives per bin. This is really useful information.

agreed

> - Separate the Buddies variable into ‘real’ Buddies and People (Buddies plus Divemaster(s)). For me, a divemaster is not a ‘real’ buddy and s/he should not count in my statistics. I think most people will see it that way.
> Also make the naming consistent with the Filter.

valid point

> - Provide a way (checkbox?) to sort by No. of dives/Duration/etc (Y-axis value) to make it easier to find the most interesting data.
> - Provide a way to only show the Top X bars; very useful if you have a lot of data/bars

Both good ideas

> - Add Min and Max to the Operations, this can provide a lot of easy/simple charts with useful insights
> - Add Count to the infobox on box-whisker
> - Add Min and Max to the infobox on bars when an operation is selected.
> - Add Country variable
> - Add Water type variable
> - Add Tags variable

all good ideas. Likely in total a lot of work

> - Remove the Q1 and Q3 information, I don’t think this is something the users are interested in or have knowledge about.

I disagree

> - Consistently use ‘Dive site’ and not Location. So maybe in the Notes tab, it should be renamed?

Consistency is hard. But yes, that makes sense.

> - Binning for Water temperature: add 1 degree, remove 20 degrees

Disagree on both. Realistically, your DC does not consistently measure temperature correctly to within 1°C - and worse, on basically every single dive I have, I carry multiple dive computers and they ALWAYS disagree by more than 1°C. And in °F it's of course even worse.
And 20F is indeed a reasonable binning.
 
> Bugs
> - Binning for Max. Depth has a double ‘in 10 m steps’
> - Binning for Dive # has no unit, I think it should be ‘dives’?, e.g. ‘in 5 dives steps’
> - Dive sites are not sorted by name on the chart
> - When switching between chart types, or hovering over parts that have an extremely large infobox, sometimes parts of the old chart stay visible
> - When the dive statistics view is visible and you then select View -> All, the Info view is not shown; instead the dive statistics are shown (compressed) in the upper left
>  
> Some observations / discussion points:
> - The yellow warning icon on some charts is strange. I did read it means ‘it’s not the best chart’, but actually sometimes it provides useful information. I feel the warning icon should not be shown.

Willem should speak out on that as he has VERY strong feelings about this :-)
Suffice it to say that while these charts have useful information in them, they typically are marked as 'undesirable' because they can mislead in the way they represent the data. But Willem can do a much better job responding to that.

> - For the chart types there is a grouping of Histogram and Categorical. As said before, I don’t know much about statistics, and I am not really interested in this grouping, I just want to select a type.

But of course, the people who are most likely most interested in this feature are also the people to whom this likely makes a lot of sense. So we are trying to target the right audience here.
One option would be to have a 'simple' UI mode... but that of course adds a ton of development work and testing overhead and... so I'm not sure if that's a good investment.

> Even more, in the beginning I could not see any difference between the Vertical / Horizontal / Box-whisker charts in the Histogram group and the ones in the Categorical group. When switching between the two, the chart just shifted a bit, but nothing else changed. But then suddenly I had a chart where there was a difference between the two, and it seems this is because there was no data for certain periods. I think that makes sense? Histogram always shows all bins, including bins that have no data, whereas Categorical only shows bars with data (no ‘empty’ bars). If that’s the case I would rather have 1 type and an extra option ‘show empty data’. But this is just my simple view, of course 😉

I think this is something where a good user manual really will help.
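Assuming Peter's reading is right (a histogram covers a continuous range of bins, a categorical chart only the categories that occur), the difference can be sketched in a few lines of Python. The `bars` helper and the dive years are hypothetical and only illustrate the idea, not Subsurface's implementation.

```python
from collections import Counter

def bars(years, histogram):
    """Count dives per year.

    histogram=True returns every year in the range, including empty ones;
    histogram=False (categorical) returns only years that have dives.
    """
    counts = Counter(years)
    if not counts:
        return []
    if not histogram:
        return sorted(counts.items())
    first, last = min(counts), max(counts)
    return [(year, counts[year]) for year in range(first, last + 1)]

# Hypothetical dive years with a gap in 2016-2017.
dive_years = [2014, 2015, 2015, 2018]
print(bars(dive_years, histogram=True))
# [(2014, 1), (2015, 2), (2016, 0), (2017, 0), (2018, 1)]
print(bars(dive_years, histogram=False))
# [(2014, 1), (2015, 2), (2018, 1)]
```

When every year has dives, both variants return the same bars, which matches Peter's observation that the charts initially looked identical.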

Thank you so much for the long and exhaustive test and all the feedback. Most of it I agree with, some of it will need input from Berthold or Willem, some of it my thoughts differ from yours. Doesn't mean you are wrong, just means I come at this with a different set of expectations.

All of it is extremely welcome and extremely helpful.

THANK YOU

/D
