RFC: Statistics in Subsurface

Willem Ferguson willemferguson at zoology.up.ac.za
Thu May 14 00:21:39 PDT 2020


On 2020/05/13 23:11, Dirk Hohndel via subsurface wrote:
> The video that Pedro linked to seemed to indicate that the first chart 
> is most likely to be understood, and that the second one was 
> harder to see trends in.
>
> Conflicting with that is the desire to more typically show bar graphs 
> sideways, as that makes it easier to deal with many data sets (think 
> labeling the columns vs. labeling rows)
>
> So all this is super helpful in figuring out how we should visualize 
> these things - but not necessarily all leading to the same answers, as 
> I'm not sure how well these line graphs work when turned 90 degrees :-)
>
> /D
>
>
>
>> On May 13, 2020, at 12:53 PM, Hartley Horwitz <hhrwtz at gmail.com 
>> <mailto:hhrwtz at gmail.com>> wrote:
>>
>> I've attached 3 graphs showing the statistics summary.  Once again I 
>> showed them to a work colleague.  He found the upper 2 graphs easiest 
>> to understand.
>>
>> ...Hartley
>>
>> On Wed, May 13, 2020 at 3:24 PM Dirk Hohndel <dirk at hohndel.org 
>> <mailto:dirk at hohndel.org>> wrote:
>>
>>     That is excellent input!
>>
>>     Your final point is one that I had kinda assumed - most of the
>>     "more interesting" data no one but a geek will look into. And to
>>     them either box and whiskers (so quartiles) or at least floating
>>     box with mean (or your version in the first SAC chart below with
>>     the 0 based box with the mean as height and with whiskers for
>>     min/max) should make sense. But it also makes sense to look for
>>     simpler ways to give access to the same data. Can you give an
>>     example for the "line graph with 3 lines for min/mean/max"?
>>
>>     Thanks
>>
>>     /D
>>

I must admit that I do not like any of these three representations. They 
are inappropriate and inaccurate, leading to misinterpretation.

The top graph is normally used to indicate trends in three *independent* 
variables that may or may not be correlated. In the dive case the data 
represent a *single* variable together with its min and max values.

The middle graph is a histogram that would normally also represent three 
*independent* variables that have been sampled on the same x-axis scale. 
Again, in the dive case the min and max values represent the *same* 
variable.

The bottom graph is normally used to indicate the proportion of a total 
that is formed by a specific component. In the case of this specific 
graph, the median would be indicated by the height of the orange bar 
(i.e. the vertical distance between the grey/orange border and the 
orange/blue border). The max would be indicated by the height of the 
blue part of the graph, etc. Clearly this is not what is meant.

I want to make a call that, if we are dealing with representing 
statistics, we actually use the proper statistics representations that 
we are all used to. Most likely that is either some variant of a box and 
whiskers diagram or a vertical bar chart with error bars. If these 
diagrams have been shown once to an uninformed person, the 
interpretation will always be easy. Let's use diagrams for what they are 
meant to convey and not use a sports car to drive offroad. We do not 
want any statistics related to Subsurface to be presented in an 
unprofessional and inappropriate way.
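
As a rough sketch of the numbers these two representations need (an 
illustration only, not Subsurface code; the function names and the SAC 
values are made up, and the quartile method is just one common 
convention), something like the following Python would do:

```python
from statistics import mean, quantiles


def box_summary(values):
    """Return the five numbers a box-and-whiskers diagram draws, plus the mean."""
    data = sorted(values)
    if len(data) < 2:
        raise ValueError("need at least two values")
    # "inclusive" treats the data as the whole population, so the
    # quartiles never fall outside the observed min/max.
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    return {
        "min": data[0],      # lower whisker
        "q1": q1,            # bottom of the box
        "median": median,    # line inside the box
        "q3": q3,            # top of the box
        "max": data[-1],     # upper whisker
        "mean": mean(data),  # optional marker
    }


def error_bar_summary(values):
    """Bar height is the mean; asymmetric error bars reach to min and max."""
    data = list(values)
    if not data:
        raise ValueError("need at least one value")
    m = mean(data)
    return {"height": m, "err_low": m - min(data), "err_high": max(data) - m}


# Hypothetical SAC rates (l/min) for one group of dives.
sac = [12.0, 14.0, 15.0, 16.0, 18.0]
print(box_summary(sac))
print(error_bar_summary(sac))
```

Either way the min and max are drawn as attributes of one variable 
(whiskers or error bars on a single mark), not as separate series.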

As far as the horizontal graphs are concerned, they have a place, but we 
need to understand where they come from, and that is from the old days 
when we tried to print graphs on a mainframe line printer that could not 
print characters vertically. The conventional way to represent 
histograms or bar charts is in the vertical way *unless there is good 
reason to do otherwise*. These days there is no problem in printing 
labels vertically. To have a horizontal bar graph with depth 
measurements along the vertical axis is just totally unorthodox and not 
up to modern standards.

Kind regards,

willem

-------------- next part --------------
A non-text attachment was scrubbed...
Name: simple_stats.JPG
Type: image/jpeg
Size: 37710 bytes
Desc: not available
URL: <http://lists.subsurface-divelog.org/pipermail/subsurface/attachments/20200514/bd7048ab/attachment-0001.jpe>

