RFC: Statistics in Subsurface

Hartley Horwitz hhrwtz at gmail.com
Sat May 16 10:23:37 PDT 2020


---------- Forwarded message ----------
> From: Willem Ferguson <willemferguson at zoology.up.ac.za>
> To: Dirk Hohndel <dirk at hohndel.org>
> Cc: Subsurface Mailing List <subsurface at subsurface-divelog.org>
> Bcc:
> Date: Sat, 16 May 2020 16:13:52 +0200
> Subject: Re: RFC: Statistics in Subsurface
>
> This is just an attempt to enumerate how many types of graphs one is
> likely to need, given the discussion until now. As a basis I use Dirk's
> proposal for selecting an appropriate graph.
>
Very good to try to organize things, so thanks for initiating this...

> In the above diagram, the different types of variables have different
> colours.
>
> 1) The yellow ones are just totals (Total # dives, Total no.
> minutes/hours) that are unlikely to have any associated minimum or maximum.
>
> 2) The blue ones are variables defined in terms of categories. Date : day,
> week, month, etc; Trip : trip locality; suit: type of dive suit; tags : tag
> text. There is no dive suit value in between wetsuit and semidry suit
> because they are two distinct categories.
>
> 3) The white ones are continuous numeric variables. Duration can
> potentially have any arbitrary number of minutes or hours. The same goes
> for Max_depth, Min_temp, SAC, and all the other white ones. In between any
> two arbitrary depths there are innumerable intermediate depths, so depth
> is simply a value along a continuous scale.
>
I would categorize "Dive Type" in blue.   It isn't a continuous variable,
and choices are distinct: free, OC, CCR, etc.

One thing to think about that applies to tag and suit -- those fields allow
the user to provide a comma-separated list.  Suit may be something like:
5mm, 5/3mm hoodie, gloves.   Tags could be: "wreck, night, deco"

How will filtering handle this?  I hope it will separate the
comma-separated list to allow filtering.  For example I may want to look
only at night dives but it is rare that this is the only tag used.
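
To make the question concrete, here is a minimal Python sketch of the kind of splitting I have in mind. It is not how Subsurface stores or filters anything; the record layout and the names split_field and dives_with_tag are made up for illustration.

```python
def split_field(value):
    """Split a comma-separated field into a set of cleaned, lowercase entries."""
    return {part.strip().lower() for part in value.split(",") if part.strip()}


def dives_with_tag(dives, tag):
    """Return the dives whose tag list contains the given tag."""
    wanted = tag.strip().lower()
    return [dive for dive in dives if wanted in split_field(dive["tags"])]


# Hypothetical dive records, for illustration only.
dives = [
    {"number": 1, "tags": "wreck, night, deco"},
    {"number": 2, "tags": "shore"},
    {"number": 3, "tags": "Night"},
    {"number": 4, "tags": ""},
]

night_dives = dives_with_tag(dives, "night")
print([dive["number"] for dive in night_dives])
```

The same split_field helper would apply to the suit field, so "5mm, 5/3mm hoodie, gloves" becomes three separate entries.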



> The type of graph that best depicts a relationship between two types of
> variables depends on the colour that each of the variables above has. I
> need to emphasize that the graphs below are totally open to discussion. The
> purpose here is to assess how many types of graphical elements one would
> need for a basic statistics tab in Subsurface.
>
> Plotting a yellow variable against a blue variable is probably best
> represented by a simplistic bar graph like:
>
Agree that a bar graph makes the most sense here.  Does this become a
graphics/UI issue if there are many distinct items on the X axis?
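
One possible way to keep the X axis manageable, sketched in Python purely as an assumption (nobody has proposed this in the thread): keep the most common categories and pool the remainder into a single "other" bar. The function name top_categories and the sample suit list are hypothetical.

```python
from collections import Counter


def top_categories(values, limit):
    """Count values, keep the `limit` most common, pool the rest as 'other'."""
    ranked = Counter(values).most_common()
    kept = ranked[:limit]
    rest = sum(count for _, count in ranked[limit:])
    if rest:
        kept.append(("other", rest))
    return kept


# Hypothetical suit entries, one per dive.
suits = ["wetsuit"] * 5 + ["drysuit"] * 3 + ["semidry"] * 2 + ["shorty"]

for name, count in top_categories(suits, 2):
    print(name, count)
```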

> There are no min and max values to indicate. The different suit categories
> are indicated along the horizontal axis. There is no need to specify a
> degree of "granularity" or increment along the horizontal axis and no min
> or max values are involved.
>
> If one plots a yellow variable against a white (continuous) variable, then
> a granularity/increment needs to be specified. In the image below, an
> increment of 20m was used.
>

Based on some earlier discussions on this topic, the user may want to
choose the granularity, but that complicates the user interface.
Depending on the choice of white (continuous) variable, will there be fixed
granularity?  I don't have a good answer for this question. For the
recreational diver it is easy to choose fixed increments in metric and
imperial units that make sense.  For tech divers that go really deep, the
increments would need adjusting.
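
For what it's worth, the grouping itself is simple once an increment is chosen. A rough Python sketch, assuming depths in metres and a user-selected bin width (the name bin_counts and the sample depths are made up):

```python
from collections import Counter


def bin_counts(values, width):
    """Count values per bin of the given width; keys are the lower bin edges."""
    counts = Counter(int(value // width) * width for value in values)
    return dict(sorted(counts.items()))


# Hypothetical maximum depths in metres.
depths_m = [5.2, 18.0, 19.9, 20.0, 27.5, 41.3, 62.0]

print(bin_counts(depths_m, 20))  # the 20m increment from the example graph
print(bin_counts(depths_m, 10))  # a finer increment on the same data
```

The hard part is not the arithmetic but choosing sensible defaults for width per variable and per unit system.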

> .....snip.......
>


> The above graphs deal with yellow variables in Dirk's proposal. Now about
> the other categories. Plotting a White variable against a Blue variable has
> several options, including box and whisker plots that are not popular in
> this discussion. My proposal two days ago was something like this and there
> was some discussion around it:
>
>
> Here SAC is a white (continuous) variable and Suit is a blue (categorical)
> variable. This graphical element is likely to differ sharply from the bar
> graphs used above. Here again, because the horizontal axis comprises
> categories, there is no need to specify a granularity/increment. For lack
> of a better name (there is actually an esoteric statistical name for this
> graph) I call this a dot graph.
>
I don't use QML/Qt.   I either get stats packaged up in an expensive
stats/database tool (TIBCO Spotfire) or I use Python for lab stuff.  In
Python, repeated points in a simple dot graph are overwritten.  Bee swarm
style plots keep every data point visible and I have those choices in
Spotfire and Python, but unfortunately a quick Google search doesn't show a
QML package that supports swarm plots.   Hopefully I'm wrong because it
seems that there's general support from 3-4 of us for this style of plot.
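
If no ready-made package turns up, the layout can be approximated by hand. Below is a simplified Python sketch of a swarm-style layout; it is not the algorithm any particular plotting library uses. Values that fall in the same row (within a hypothetical `resolution`) are spread sideways instead of being drawn on top of each other. The name swarm_offsets and the SAC numbers are invented for illustration.

```python
from collections import defaultdict


def swarm_offsets(values, resolution, spacing=1.0):
    """Return (offset, value) pairs so that near-equal values sit side by side.

    Values are grouped into rows of height `resolution`. Within a row the
    points are spread around the centre line: 0, +s, -s, +2s, -2s, ...
    """
    rows = defaultdict(list)
    for value in values:
        rows[round(value / resolution)].append(value)

    placed = []
    for key in sorted(rows):
        for i, value in enumerate(sorted(rows[key])):
            step = (i + 1) // 2        # 0, 1, 1, 2, 2, ...
            sign = 1 if i % 2 else -1  # odd positions go right, even go left
            placed.append((sign * step * spacing, value))
    return placed


# Hypothetical SAC values for one suit category.
sac = [12.1, 12.1, 12.1, 15.0, 15.2]

for offset, value in swarm_offsets(sac, resolution=0.5):
    print(offset, value)
```

Each offset would be added to the X position of the category, so the three identical 12.1 readings show up as three dots rather than one.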

> What about plotting a Blue (categorical) variable against a White
> (continuous) variable? For our case the order in which the blue and white
> variables are selected probably does not matter and the dot graph shown
> above (or some derivative of it) should suffice.
>
Is that planned?   Based on the user interface at the top of this
discussion I don't think a user can plot a categorical variable against a
continuous one.   I must be misunderstanding what you mean.  For example we CAN
plot duration by date, but plotting date by duration makes no sense.
Plotting duration by date and by depth is possible.  I'm not sure how we'd
deal with the X axis.  Obviously in advanced stats tools we can add a 3rd
dimension for surface plots, or use colors and other visualization aids.  I
don't want to complicate this for Subsurface so I'd suggest this is not
supported.

> What if a white (continuous) variable is plotted against another white
> variable (e.g. dive duration against dive depth)? The most appropriate type
> of graph is probably a scatter diagram:
>
> The raw data are indicated on the graph. There is no need for specifying a
> granularity value because there is no grouping of values along the
> horizontal or vertical axes. If a clear relationship between the two
> variables exists, it is clearly visible on the graph as in this case.
>
I think most graphing packages support scatter plots and I believe they are
easily understood.

> We have now dealt with
>
> Yellow/white
>
> Yellow/Blue
>
> White/Blue and Blue/White
>
See comment above about Blue/White

> White/White
>
> What about Blue/Blue?
>
I'd suggest that this is another example of a feature with limited appeal
that will take time to write and support.  Just an unsubstantiated opinion
so take it for what it's worth.

> There is another type of graph that is potentially extremely useful:
> introduce a *third* variable to the graph. For instance, in the case of the
> second blue bar graph towards the start of this message (No. dives vs depth)
> one could ask what the distribution of a third category is. For instance,
> how long did I use various dive suits at different depths? Or on how many
> dives did I use different dive suits at different depths? This is the above
> bar chart, divided into the values for different dive suits. This is also
> useful to analyse variables used as tags, e.g. the use of air/nitrox/trimix
> during dives, the number of boat/shore dives, the number of training dives
> compared to fun dives, the number of dives using different dive modes as a
> function of depth, dive duration, temperature, or whatever white variable
> has been selected.
>
I showed an example of this earlier, but it had a continuous connection
between the bars.  I think most people did not like it.   This one clearly
uses individual bars.    If the developers think this is reasonably easy to
implement, then great.  If not, then again I'd say it is nice to have but
probably a seldom-used feature.
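
On the data side, the subdivided bars amount to a two-level count: first by depth bin, then by the third variable. A minimal Python sketch, assuming depths in metres and a made-up record layout (stacked_counts is a hypothetical name, not anything in Subsurface):

```python
from collections import defaultdict


def stacked_counts(dives, width):
    """Count dives per depth bin, split by suit.

    Returns {lower_bin_edge: {suit: count}}.
    """
    table = defaultdict(lambda: defaultdict(int))
    for dive in dives:
        lower = int(dive["max_depth_m"] // width) * width
        table[lower][dive["suit"]] += 1
    return {edge: dict(table[edge]) for edge in sorted(table)}


# Hypothetical dive records, for illustration only.
dives = [
    {"max_depth_m": 12.0, "suit": "wetsuit"},
    {"max_depth_m": 18.0, "suit": "wetsuit"},
    {"max_depth_m": 25.0, "suit": "drysuit"},
    {"max_depth_m": 31.0, "suit": "wetsuit"},
    {"max_depth_m": 38.0, "suit": "drysuit"},
    {"max_depth_m": 39.5, "suit": "drysuit"},
]

print(stacked_counts(dives, 20))
```

Each inner dictionary is one bar, and each entry in it is one segment of that bar.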

> Since the horizontal axis corresponds to a white (continuous) variable,
> one would need to specify a granularity/increment. The UI cost for this
> would be an additional dropdown list/combobox to select the appropriate
> categorical variable to appropriately subdivide each bar of the graph
> (Dirk's Granularity??). This diagram handles cases of graphs with a
> blue (categorical) variable plotted against another blue (categorical)
> variable, although a third variable needs to be specified to form the unit
> of measurement (e.g. dive duration in the above graph). This can probably
> be selected using Dirk's Granularity Combobox in his proposal.
>
> This handles basically all the possibilities of the different combinations
> of Yellow, White and Blue variables in Dirk's proposal. There are
> fundamentally FOUR types of graphs that would be required, forming the
> basis of visual presentation of the Statistics tab.
>
> I hope this appears somewhat useful in the present discussion.
>
> Kind regards,
>
> willem
>
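
As a compact summary, the colour pairs discussed in this message could be written as a lookup table. This Python sketch only restates the pairings from the thread; the graph names are informal labels, choose_graph is a hypothetical helper, and Blue/White and Blue/Blue are deliberately left out because their handling is still under discussion.

```python
# Variable kinds follow the colours used in the discussion:
# "yellow" = totals, "blue" = categorical, "white" = continuous.
GRAPH_FOR = {
    ("yellow", "blue"): "bar graph",
    ("yellow", "white"): "bar graph with bins",
    ("white", "blue"): "dot graph",
    ("white", "white"): "scatter plot",
}


def choose_graph(y_kind, x_kind):
    """Return the suggested graph for a pair of kinds, or None if unsupported."""
    return GRAPH_FOR.get((y_kind, x_kind))


print(choose_graph("yellow", "white"))
print(choose_graph("blue", "blue"))
```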


Thanks again for the details you've provided.

...Hartley