RFC: Statistics in Subsurface

Dirk Hohndel dirk at hohndel.org
Sat May 16 10:59:13 PDT 2020



> On May 16, 2020, at 10:23 AM, Hartley Horwitz via subsurface <subsurface at subsurface-divelog.org> wrote:
> 
> In the above diagram, the different types of variables have different colours.
> 
> 1) The yellow ones are just totals (Total # dives, Total no. minutes/hours) that are unlikely to have any associated minimum or maximum.
> 
> 2) The blue ones are variables defined in terms of categories. Date: day, week, month, etc.; Trip: trip locality; Suit: type of dive suit; Tags: tag text. There is no dive suit value in between wetsuit and semidry suit because they are two distinct categories.
> 
> 3) The white ones are continuous numeric variables. Duration can potentially have any arbitrary number of minutes or hours. The same goes for Max_depth, Min_temp, SAC, and all the other white ones. Between any two depths there are infinitely many intermediate depths, so depth is a value along a continuous scale.
> 
> I would categorize "Dive Type" as blue. It isn't a continuous variable, and the choices are distinct: free, OC, CCR, etc.

Yep :-)
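To make the three colour groups concrete, here is a minimal Python sketch that labels a few of the fields named above with their kind and picks a chart type from a pair of fields. The field keys, the `VarKind` enum and the `suggested_chart()` helper are hypothetical and not part of Subsurface. The chart choices only encode what this thread suggests (a bar chart for a category against a total, a dot plot for categorical against continuous in either order); every other pairing, including blue/blue, is left unsupported.

```python
from enum import Enum
from typing import Optional


class VarKind(Enum):
    TOTAL = "yellow"       # plain totals, e.g. total number of dives
    CATEGORICAL = "blue"   # distinct categories, nothing in between
    CONTINUOUS = "white"   # any value along a continuous scale


# Hypothetical field keys for some of the variables named in this thread.
VARIABLE_KINDS = {
    "total_dives": VarKind.TOTAL,
    "total_time": VarKind.TOTAL,
    "date": VarKind.CATEGORICAL,       # binned by day, week, month, ...
    "trip": VarKind.CATEGORICAL,
    "suit": VarKind.CATEGORICAL,
    "tags": VarKind.CATEGORICAL,
    "dive_type": VarKind.CATEGORICAL,  # free, OC, CCR, ...
    "duration": VarKind.CONTINUOUS,
    "max_depth": VarKind.CONTINUOUS,
    "min_temp": VarKind.CONTINUOUS,
    "sac": VarKind.CONTINUOUS,
}


def suggested_chart(x: str, y: str) -> Optional[str]:
    """Suggest a chart type for a pair of fields, or None if unsupported."""
    # Using a set makes the result independent of selection order.
    kinds = {VARIABLE_KINDS[x], VARIABLE_KINDS[y]}
    if kinds == {VarKind.CATEGORICAL, VarKind.TOTAL}:
        return "bar"   # e.g. number of dives per suit or per tag
    if kinds == {VarKind.CATEGORICAL, VarKind.CONTINUOUS}:
        return "dot"   # e.g. max depth per dive type
    return None        # every other pairing (e.g. blue/blue) is unsupported
```

For example, `suggested_chart("max_depth", "dive_type")` and `suggested_chart("dive_type", "max_depth")` both return `"dot"`.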

> One thing to think about that applies to tag and suit: those fields allow the user to provide a comma-separated list. Suit may be something like: 5mm, 5/3mm hoodie, gloves. Tags could be: "wreck, night, deco".

Careful, those two are different. Tags are a comma-separated list of individual tags. Suit is a free-form text field with no Subsurface-implied structure to it.
So I view suit as a single string. But as I mentioned in my reply, for tags we will likely need a way to pick which ones to show.


> How will filtering handle this? I hope it will split the comma-separated list to allow filtering. For example, I may want to look only at night dives, but it is rare that this is the only tag used.

In theory it already does that, but as you found out, on mobile that's broken. On desktop it appears to work as expected; I just tested that to be sure. You can choose to match any, all, or none of the tags that you are filtering for.

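The any/all/none matching described above can be sketched in Python as follows, assuming tags are split on commas and compared case-sensitively after trimming whitespace. The helper names and mode strings are hypothetical; this is not Subsurface's actual filter code. Suit, being free-form text, would be compared as one string rather than split.

```python
from typing import Iterable, Set


def split_tags(tag_field: str) -> Set[str]:
    """Split a comma-separated tag field into trimmed, non-empty tags."""
    return {t.strip() for t in tag_field.split(",") if t.strip()}


def matches(dive_tags: str, wanted: Iterable[str], mode: str) -> bool:
    """Check one dive's tag field against the tags being filtered for."""
    tags = split_tags(dive_tags)
    wanted_set = set(wanted)
    if mode == "any":    # at least one wanted tag is present
        return bool(tags & wanted_set)
    if mode == "all":    # every wanted tag is present
        return wanted_set <= tags
    if mode == "none":   # no wanted tag is present
        return not (tags & wanted_set)
    raise ValueError(f"unknown mode: {mode!r}")


dives = ["wreck, night, deco", "reef", "night"]
night_dives = [d for d in dives if matches(d, ["night"], "any")]
```

Here `night_dives` picks up both the dive tagged only "night" and the one where "night" is one tag among several.
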
> Agree that a bar graph makes the most sense here. Does this become a graphics/UI issue if there are many distinct items on the X axis?
> 

That's why I brought up horizontal vs. vertical bar charts.

> I don't use QML/Qt. I either get stats packaged up in an expensive stats/database tool (TIBCO Spotfire) or I use Python for lab stuff. In Python's simple dot graphs, repeated points are overwritten. Bee swarm plots preserve every data point, and I have that option in both Spotfire and Python, but unfortunately a quick Google search doesn't turn up a QML package that supports swarm plots. Hopefully I'm wrong, because there seems to be general support from 3-4 of us for this style of plot.

None of them do, as far as I can tell. My experience with QML toolkits has been extremely poor. They are usually very narrow in scope, and I often think that the design sense of the people creating them is... odd. Or maybe it's mine.
But if I look at the charts that QtCharts can create via QML... wow, the 90s were good to some people...
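Since no QML toolkit seems to offer swarm plots, one option is to compute the layout separately and draw plain dots with whatever renderer is available. Below is a minimal greedy sketch in Python; the `beeswarm()` name and the slot-based strategy are assumptions, one of several possible layouts. Points are placed in sorted order along the value axis, and each is pushed to the nearest free slot on either side of the center line so that no two markers of the given diameter overlap. The diameter is expressed in value-axis units, so the sketch assumes both axes share one scale.

```python
import math
from typing import List, Tuple


def beeswarm(values: List[float], diameter: float) -> List[Tuple[float, float]]:
    """Return (value, offset) pairs so that no two markers overlap.

    Points are placed in sorted order. Each takes the first slot, trying
    0, +d, -d, +2d, -2d, ... from the center line, that keeps it at least
    one marker diameter d away from every point placed so far.
    """
    placed: List[Tuple[float, float]] = []
    for v in sorted(values):
        k = 0
        while True:
            # k = 0, 1, 2, 3, 4, ... -> offset 0, +d, -d, +2d, -2d, ...
            sign = 1 if k % 2 else -1
            offset = sign * ((k + 1) // 2) * diameter
            # a tiny tolerance lets markers that exactly touch count as free
            collides = any(
                math.hypot(v - pv, offset - po) < diameter * (1 - 1e-9)
                for pv, po in placed
            )
            if not collides:
                placed.append((v, offset))
                break
            k += 1
    return placed


# Three dives with the same max depth plus one deeper dive (meters).
layout = beeswarm([18.0, 18.0, 18.0, 30.0], diameter=1.0)
```

Unlike a simple dot graph, the three identical 18 m values end up side by side at offsets 0, +1 and -1 instead of being drawn on top of each other.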

> What about plotting a Blue (categorical) variable against a White (continuous) variable? For our case, the order in which the blue and white variables are selected probably does not matter, and the dot graph shown above (or some derivative of it) should suffice.
> 
> Is that planned? Based on the user interface at the top of this discussion, I don't think a user can plot a categorical variable against a continuous one. I must be misunderstanding what you mean. For example, we CAN plot duration by date, but plotting date by duration makes no sense. Plotting duration by date and by depth is possible. I'm not sure how we'd deal with the X axis. Obviously, in advanced stats tools we can add a third dimension for surface plots, or use colors and other visualization aids. I don't want to complicate this for Subsurface, so I'd suggest this is not supported.

I asked similar questions.

> What about Blue/Blue?
> 
> I'd suggest that this is another example of a feature with limited appeal that will take time to write and support. Just an unsubstantiated opinion, so take it for what it's worth.

Ditto


/D
[Attachment: gfnhmollagimhhko.png (PNG image, 141121 bytes) <http://lists.subsurface-divelog.org/pipermail/subsurface/attachments/20200516/b5505748/attachment-0001.png>]

