Fwd: Re: RFC: Statistics in Subsurface

Tue May 12 12:41:54 PDT 2020

-------- Forwarded Message --------
Subject: 	Re: RFC: Statistics in Subsurface
Date: 	Tue, 12 May 2020 21:41:08 +0200
From: 	Willem Ferguson <willemferguson at zoology.up.ac.za>
Reply-To: 	willemferguson at zoology.up.ac.za
Organization: 	University of Pretoria
To: 	Dirk Hohndel <dirk at hohndel.org>

On 2020/05/12 20:49, Dirk Hohndel wrote:
> Hi Willem,
>
> Thanks for responding... I wish more people got involved in these 
> conversations. But usually topics like this get two or three of the 
> 300+ people here to respond. And then ten more will complain after we 
> have done the next release and they notice for the first time that we 
> added a feature...
>
>> On May 12, 2020, at 12:59 AM, Willem Ferguson 
>> <willemferguson at zoology.up.ac.za> wrote:
>>
>> I understand Berthold's request with respect to temporal sequences. 
>> When developing such a temporal facility there is an important 
>> caveat. Emphatically, such temporal representations do not provide 
>> any clear *explanation* of anything: it is just a temporal pattern. 
>> For instance a decrease in SAC rate over time does not necessarily 
>> imply any improvement in physiological ability but may reflect 
>> adoption of new equipment or change in dive sites. Any explanation of 
>> a temporal trend is dependent on the understanding of the USER, not 
>> on the SOFTWARE. So, when dealing with temporal trends, one needs to 
>> consider carefully the intended type of use of it. I think Berthold 
>> is more concerned with continuous variables such as temperature, SAC, 
>> dive duration, depth, etc which could probably be reasonably easily 
>> implemented. To represent categorical variables such as tags, dive 
>> mode, people and suit (one could even add dive site) is a totally 
>> different issue requiring a totally different type of visual 
>> representation.
>>
>
> I was in complete agreement until the very last sentence. I don't 
> understand why this 'per se' requires a "totally different type of 
> visual representation".
> Let's say I am charting SAC over my criteria. Let's assume I'm using 
> box and whiskers charts to easily show the quartiles. The values on the 
> x-axis have implications for the interpretation, of course, but 
> whether the x-axis is months of the year, the suit worn, the maximum 
> depth of the dive, the tags present on the dive (e.g., teaching dive 
> or non-teaching dive) has absolutely no impact on how this should be 
> visually represented...
>
>> It would help, in this discussion, if one were to distinguish between 
>> the filtering aspect and the statistics display aspect and state that 
>> with respect to the argument. In Dirk's artwork above, I am not sure 
>> how the constraints will be used. Are we talking of the filtering 
>> process or the stats display mechanism? Let's say "Suit" is a 
>> constraint and two dates are provided. I am not sure what the 
>> expected result of the operation would be. Ahh, the problems of 
>> communication.
>>
>
> What I was trying to describe was a way to create criteria that can be 
> used for columns in the visualization. You go through this filter 
> process, name the result, and that name becomes one of the available 
> labels over which you can chart the values.
> Again, as I said before, I may simply be over-engineering this.
>
>> In general, in my opinion, the existing filter layout is a good 
>> starting point (I would add the variables of dive depth and dive 
>> duration because they are the two variables that fundamentally define 
>> a dive). As a filtering mechanism the current implementation is 
>> ultra-flexible.
>>
>
> While I respect your opinion, let me politely state that personally I 
> believe that the current filter widget is a disaster and extremely 
> unintuitive to use. That's not a criticism of the original author, nor 
> of the people who have added to it - but yeah, that thing is a mess.
>
>> As far as the UI for filter sets is concerned, the minimum component 
>> count would include: a combobox of existing named filters within the 
>> set, and a button to add the current filter to the filter set. These could 
>> potentially reside at the top right of the current filter panel. But 
>> there might be a need to give the filter set a name as well. That would 
>> need a text box.
>>
>
> Making the current widget more convoluted and more confusing was not the 
> direction that I was envisioning for us.
>
>
> Maybe we need to rein in the crazy German and go back to something 
> much more basic. Something like ten predefined sets of criteria. And 
> only apply them to the filtered dive list.
>
> So.
> (1) per month
> (2) per year
> (3) per trip
> (4) by max depth in 10m increments
> (5) by duration in 10min increments
> (6) by min temperature in 10F / 5C increments
> (7) by type (for people who track more than SCUBA)
> (8) by suit (that's likely a fairly small set for most people)
> (9) by tags (that one I'm unclear about - would likely need some more 
> ability to influence how this is drawn - but the straightforward way would be 
> to draw them in pairs: the left one represents dives with the tag, the right 
> one dives without the tag)
> (10) by people? (no idea how / why)
> (11) by full text? (no idea how / why)
>
> If we drop the last three this seems fairly obvious how to do.
>
> Next comes the question of visualization. That might depend on the 
> data (so the columns of the yearly statistics). At first glance I 
> thought that box and whiskers charts might be useful, or more 
> simplified min / avg / max charts (so floating bar with a circle for 
> the average)
>
>
> [Images: a 'candlesticks' plot; "Make an Avg-Max-Min Chart in Microsoft Excel"]
>
> Are there any columns that couldn't be visualized with that?
>
> /D
>
>
I am comfortable with your points of view above. The 10m or 10min 
increments could easily be made configurable. For instance, a person with OW 
certification (dives to 18m only, with almost all dives in the 10-18m 
range) would probably want at most a 5m increment in depth. Unless I 
misunderstand you (again). Normally with statistical software (like 
R) the default increment is determined by the (max-min) range of the 
data as well as the number of data points being plotted. Of course I 
would not like an increment of 3.674 m of depth, as might be the case 
when the increment is calculated automatically by machine. My only point is 
that a single fixed increment is possibly restrictive and it would help 
if there were a simple rule to do some adjustment of the increment.
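
One possible rule of this kind, as a minimal sketch in Python rather than Subsurface code: take the raw increment (range divided by a desired number of bins) and round it up to the nearest value of the form 1, 2, 5 or 10 times a power of ten. The function name nice_increment and the target_bins parameter are hypothetical, chosen only for illustration.

```python
import math


def nice_increment(lo, hi, target_bins=8):
    """Return a bin width of the form {1, 2, 5, 10} x 10^k.

    The raw width (hi - lo) / target_bins is rounded UP to the next
    "round" value, so the result never yields more than target_bins bins.
    """
    if hi <= lo or target_bins < 1:
        raise ValueError("need hi > lo and target_bins >= 1")
    raw = (hi - lo) / target_bins
    base = 10.0 ** math.floor(math.log10(raw))  # power of ten at or below raw
    fraction = raw / base                       # normally in [1, 10)
    for step in (1, 2, 5, 10):
        if fraction <= step:
            return step * base
    return 10 * base  # guard against floating-point rounding


# A raw increment of 3.674 m is rounded up to 5 m.
print(nice_increment(0.0, 29.392, target_bins=8))
# Dives between 10 m and 18 m with about four bins.
print(nice_increment(10, 18, target_bins=4))
```

With this rule a raw increment of 3.674 m becomes 5 m, and the 10-18m example with four bins gets a 2 m increment. A user-configurable increment could simply override the computed default.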

As far as specifying categories like tags I like the present UI where 
one could specify a number of tags to be included in the filter, giving 
great flexibility. Again my impression of such a plot possibly differs 
from yours. I like your binary set idea (a set including the tag compared to a 
set excluding it). But realistically I would more often want to compare 
groups with each other (e.g. SAC for the two tags "air" and "nitrox"), a use case which 
does not necessarily imply a binary comparison because it could involve 
3 or 4 tags. Does this make sense at all?
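
The multi-tag comparison can be sketched as follows, in Python and not as Subsurface code. The dive records (dicts with "sac" and "tags" keys), the function name group_values_by_tag, and the sample values are all assumptions made up for illustration. Each requested tag gets its own group of values, so any number of tags can be plotted side by side.

```python
def group_values_by_tag(dives, tags, field="sac"):
    """Collect one list of values per requested tag.

    A dive carrying several of the requested tags is counted in each
    matching group, so the groups may overlap.
    """
    groups = {tag: [] for tag in tags}
    for dive in dives:
        for tag in tags:
            if tag in dive["tags"]:
                groups[tag].append(dive[field])
    return groups


# Made-up illustration data (SAC in l/min), not real dive logs.
dives = [
    {"sac": 14.0, "tags": {"air", "shore"}},
    {"sac": 12.5, "tags": {"nitrox", "boat"}},
    {"sac": 13.0, "tags": {"nitrox", "shore"}},
    {"sac": 15.5, "tags": {"air"}},
]

groups = group_values_by_tag(dives, ["air", "nitrox", "shore"])
for tag, values in groups.items():
    print(tag, values)
```

The binary "with tag / without tag" plot is then the special case of one tag plus its complement, while this grouping extends to 3 or 4 tags without any change.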

Lastly, I do not like candlestick graphs because the application in 
econometrics does not include the equivalent of a mean value. It is 
meant to indicate the limits and sometimes direction of change within a 
specific time period giving rise to the candle forming the central part 
of the graph. In my opinion a minimal box and whisker approach is more 
readily interpretable.
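
For reference, the numbers behind a minimal box and whisker plot can be sketched as below (Python, not Subsurface code; the function name box_summary is hypothetical). This sketch places the whiskers at the minimum and maximum, which is only one convention; another common one limits the whiskers to 1.5 times the interquartile range. The mean is included so that a min / avg / max chart could be drawn from the same summary.

```python
import statistics


def box_summary(values):
    """Return min, quartiles, max and mean for one box in the plot."""
    data = sorted(values)
    if len(data) < 2:
        raise ValueError("need at least two values")
    # The "inclusive" method treats the data as the whole population,
    # so the quartiles always lie between min and max.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "min": data[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": data[-1],
        "mean": statistics.fmean(data),
    }


print(box_summary([5, 1, 4, 2, 3]))
```

One such summary would be computed per category or bin on the x-axis.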

I am very excited that this discussion is actually happening and that a 
window of opportunity exists with people like Tomaz and Berthold 
interested in being involved.

Kind regards,

willem
