RFC: Statistics in Subsurface

Willem Ferguson willemferguson at zoology.up.ac.za
Sat May 16 07:13:52 PDT 2020


This is just an attempt to enumerate how many types of graphs one is
likely to need, given the discussion so far. As a basis I use Dirk's
proposal for selecting an appropriate graph.

In the above diagram, the different types of variables have different 
colours.

1) The yellow ones are just totals (Total # dives, Total no. 
minutes/hours) that are unlikely to have any associated minimum or maximum.

2) The blue ones are variables defined in terms of categories. Date :
day, week, month, etc; Trip : trip locality; Suit : type of dive suit;
Tags : tag text. There is no dive suit value in between wetsuit and
semidry suit because they are two distinct categories.

3) The white ones are continuous numeric variables. Duration can
potentially take any number of minutes or hours. The same goes for
Max_depth, Min_temp, SAC, and all the other white ones. In between any
two depths there are innumerable intermediate depths, so depth is
simply a value along a continuous scale.
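
As a minimal sketch, the three colour classes could be modelled as an
enumeration with a lookup table. The names VarKind, VARIABLES and
kind_of are hypothetical and not taken from the Subsurface source; the
table lists only the variables named in this message.

```python
from enum import Enum


class VarKind(Enum):
    """The three colour classes described above (hypothetical names)."""
    TOTAL = "yellow"        # plain totals, no associated min/max
    CATEGORICAL = "blue"    # distinct categories, nothing in between
    CONTINUOUS = "white"    # any value along a continuous scale


# Assumed lookup table; only variables mentioned in this message.
VARIABLES = {
    "Total # dives": VarKind.TOTAL,
    "Total time": VarKind.TOTAL,
    "Date": VarKind.CATEGORICAL,
    "Trip": VarKind.CATEGORICAL,
    "Suit": VarKind.CATEGORICAL,
    "Tags": VarKind.CATEGORICAL,
    "Duration": VarKind.CONTINUOUS,
    "Max_depth": VarKind.CONTINUOUS,
    "Min_temp": VarKind.CONTINUOUS,
    "SAC": VarKind.CONTINUOUS,
}


def kind_of(name):
    """Return the colour class of a variable by name."""
    return VARIABLES[name]
```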

The type of graph that best depicts a relationship between two types of 
variables depends on the colour that each of the variables above has. I 
need to emphasize that the graphs below are totally open to discussion. 
The purpose here is to assess how many types of graphical elements one 
would need for a basic statistics tab in Subsurface.

Plotting a yellow variable against a blue variable is probably best
represented by a simple bar graph like:

The different suit categories are indicated along the horizontal axis.
There is no need to specify a degree of "granularity" or increment
along the horizontal axis, and there are no min or max values to
indicate.
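
The data behind such a bar graph is just one count per category. A
minimal sketch, assuming each dive is a dictionary with a "suit" field;
the function name dives_per_category and the sample dives are made up
for illustration.

```python
from collections import Counter


def dives_per_category(dives, key):
    """Count dives for each distinct value of a categorical field."""
    return Counter(d[key] for d in dives)


# Made-up sample data, not real dive log entries.
dives = [
    {"suit": "wetsuit"},
    {"suit": "drysuit"},
    {"suit": "wetsuit"},
    {"suit": "semidry"},
]

# One bar per suit category; the bar height is the number of dives.
counts = dives_per_category(dives, "suit")
```

No increment is involved: the categories themselves define the bars.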

If one plots a yellow variable against a white (continuous) variable, 
then a granularity/increment needs to be specified. In the image below, 
an increment of 20m was used.

This is basically the same type of graph as the one used above, again
with no need for min/max values. Of course, as was well argued
previously, the bar graph can be horizontal in the case of long names
on the horizontal axis, e.g. dive site names:

While I personally have no qualms with horizontal diagrams where needed, 
I would argue it is a regression to default to horizontal orientations 
for all bar graphs.
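
The granularity/increment mentioned above for a yellow variable plotted
against a white variable amounts to binning. A minimal sketch, assuming
depths in metres and bins of the form [k*increment, (k+1)*increment);
the function name bin_counts and the sample depths are made up for
illustration.

```python
from collections import Counter


def bin_counts(values, increment):
    """Count values per bin; each bin is keyed by its lower bound."""
    counts = Counter(int(v // increment) * increment for v in values)
    return dict(sorted(counts.items()))


# Made-up sample depths in metres.
depths_m = [5.0, 12.5, 19.9, 20.0, 33.0, 41.2]

# With an increment of 20 m the bars are 0-20, 20-40 and 40-60.
histogram = bin_counts(depths_m, 20)
```

A value that falls exactly on a boundary (20.0 here) goes into the
upper bin; that is a choice of this sketch, not something the
discussion has settled.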

The above graphs deal with yellow variables in Dirk's proposal. Now 
about the other categories. Plotting a White variable against a Blue 
variable has several options, including box and whisker plots that are 
not popular in this discussion. My proposal two days ago was something 
like this and there was some discussion around it:


Here SAC is a white (continuous) variable and Suit is a blue
(categorical) variable. This is a graphical element that is likely to
differ sharply from the bar graphs used above. Here again, because the
horizontal axis comprises categories, there is no need to specify a
granularity/increment. For lack of a better name (there is actually an
esoteric statistical name for this graph) I call this a dot graph.

What about plotting a Blue (categorical) variable against a White 
(continuous) variable? For our case the order in which the blue and 
white variables are selected probably does not matter and the dot graph 
shown above (or some derivative of it) should suffice.
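
For the dot graph, the raw values only need to be grouped by category;
no binning takes place. A minimal sketch, assuming each dive is a
dictionary with "suit" and "sac" fields; the function name
group_by_category and the sample values are made up for illustration.

```python
from collections import defaultdict


def group_by_category(dives, category, value):
    """Collect the raw values of a continuous field per category."""
    groups = defaultdict(list)
    for d in dives:
        groups[d[category]].append(d[value])
    return dict(groups)


# Made-up sample data, not real dive log entries.
dives = [
    {"suit": "wetsuit", "sac": 14.2},
    {"suit": "drysuit", "sac": 17.5},
    {"suit": "wetsuit", "sac": 15.0},
]

# One column of dots per suit; each dot is one dive's SAC value.
points = group_by_category(dives, "suit", "sac")
```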

What if a white (continuous) variable is plotted against another white
variable (e.g. dive duration against dive depth)? The most appropriate
type of graph is probably a scatter diagram:

The raw data are indicated on the graph. There is no need for specifying
a granularity value because there is no grouping of values along the
horizontal or vertical axes. If a clear relationship between the two
variables exists, it is readily visible on the graph, as in this case.

We have now dealt with

Yellow/white

Yellow/Blue

White/Blue and Blue/White

White/White

What about Blue/Blue?

There is another type of graph that is potentially extremely useful:
introduce a *third* variable to the graph. For instance, in the case of
the second blue bar graph towards the start of this message (No. dives
vs depth) one could ask what the distribution of a third category is.
For instance, for how long did I use various dive suits at different
depths? Or on how many dives did I use different dive suits at
different depths? This is the above bar chart, with each bar divided
into the values for the different dive suits. This is also useful to
analyse variables used as tags, e.g. the use of air/nitrox/trimix
during dives, the number of boat/shore dives, the number of training
dives compared to fun dives, the number of dives using different dive
modes as a function of depth, dive duration, temperature, or whatever
white variable has been selected.

Since the horizontal axis corresponds to a white (continuous) variable,
one would need to specify a granularity/increment. The UI cost for this
would be an additional dropdown list/combobox to select the categorical
variable that subdivides each bar of the graph (Dirk's Granularity??).
This diagram also handles graphs with a blue (categorical) variable
plotted against another blue (categorical) variable, although a third
variable needs to be specified to form the unit of measurement (e.g.
dive duration in the above graph). This can probably be selected using
the Granularity combobox in Dirk's proposal.
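
The subdivided bar graph combines binning along the white variable with
a per-category count inside each bin. A minimal sketch, assuming dives
as dictionaries with "depth" (metres) and "suit" fields and number of
dives as the unit of measurement; the function name stacked_bins and
the sample data are made up for illustration.

```python
from collections import defaultdict


def stacked_bins(dives, value, category, increment):
    """Return {bin_lower_bound: {category: number_of_dives}}."""
    bins = defaultdict(lambda: defaultdict(int))
    for d in dives:
        start = int(d[value] // increment) * increment
        bins[start][d[category]] += 1
    return {start: dict(parts) for start, parts in sorted(bins.items())}


# Made-up sample data, not real dive log entries.
dives = [
    {"depth": 8.0, "suit": "wetsuit"},
    {"depth": 15.0, "suit": "wetsuit"},
    {"depth": 18.0, "suit": "drysuit"},
    {"depth": 25.0, "suit": "drysuit"},
    {"depth": 35.0, "suit": "drysuit"},
]

# Each 20 m bar is split into one segment per suit.
bars = stacked_bins(dives, "depth", "suit", 20)
```

Summing a duration field instead of adding 1 per dive would give the
"how long" variant of the same graph.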

This handles basically all the possibilities of the different 
combinations of Yellow, White and Blue variables in Dirk's proposal. 
There are fundamentally FOUR types of graphs that would be required, 
forming the basis of visual presentation of the Statistics tab.
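
The combinations discussed in this message can be summarised as a small
selection function. This is only a sketch of my reading of the cases
above; the names VarKind, Graph and choose_graph are hypothetical and
the mapping remains open to discussion.

```python
from enum import Enum


class VarKind(Enum):
    TOTAL = "yellow"
    CATEGORICAL = "blue"
    CONTINUOUS = "white"


class Graph(Enum):
    BAR = "bar graph"
    DOT = "dot graph"
    SCATTER = "scatter diagram"
    STACKED_BAR = "subdivided bar graph"


def choose_graph(a, b, subdivided=False):
    """Return (graph type, whether an increment must be specified).

    `subdivided` is True when a third, categorical variable is used
    to split each bar.
    """
    kinds = {a, b}
    if VarKind.TOTAL in kinds:
        other = b if a is VarKind.TOTAL else a
        if other is VarKind.TOTAL:
            raise ValueError("two totals cannot be plotted against each other")
        graph = Graph.STACKED_BAR if subdivided else Graph.BAR
        # Only a continuous axis needs a granularity/increment.
        return graph, other is VarKind.CONTINUOUS
    if kinds == {VarKind.CONTINUOUS}:
        return Graph.SCATTER, False      # White/White: raw data points
    if kinds == {VarKind.CATEGORICAL}:
        return Graph.STACKED_BAR, False  # Blue/Blue: needs a third variable
    return Graph.DOT, False              # White/Blue or Blue/White
```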

I hope this appears somewhat useful in the present discussion.

Kind regards,

willem







