Fwd: Re: RFC: Statistics in Subsurface
Willem Ferguson
willemferguson at zoology.up.ac.za
Tue May 12 12:41:54 PDT 2020
-------- Forwarded Message --------
Subject: Re: RFC: Statistics in Subsurface
Date: Tue, 12 May 2020 21:41:08 +0200
From: Willem Ferguson <willemferguson at zoology.up.ac.za>
Reply-To: willemferguson at zoology.up.ac.za
Organization: University of Pretoria
To: Dirk Hohndel <dirk at hohndel.org>
On 2020/05/12 20:49, Dirk Hohndel wrote:
> Hi Willem,
>
> Thanks for responding... I wish more people got involved in these
> conversations. But usually topics like this get two or three of the
> 300+ people here to respond. And then ten more will complain after we
> have done the next release and they notice for the first time that we
> added a feature...
>
>> On May 12, 2020, at 12:59 AM, Willem Ferguson
>> <willemferguson at zoology.up.ac.za> wrote:
>>
>> I understand Berthold's request with respect to temporal sequences.
>> When developing such a temporal facility there is an important
>> caveat. Emphatically, such temporal representations do not provide
>> any clear *explanation* of anything: it is just a temporal pattern.
>> For instance a decrease in SAC rate over time does not necessarily
>> imply any improvement in physiological ability but may reflect
>> adoption of new equipment or change in dive sites. Any explanation of
>> a temporal trend is dependent on the understanding of the USER, not
>> on the SOFTWARE. So, when dealing with temporal trends, one needs to
>> consider carefully how they are intended to be used. I think Berthold
>> is more concerned with continuous variables such as temperature, SAC,
>> dive duration, depth, etc., which could probably be implemented
>> reasonably easily. To represent categorical variables such as tags, dive
>> mode, people and suit (one could even add dive site) is a totally
>> different issue requiring a totally different type of visual
>> representation.
>>
>
> I was in complete agreement until the very last sentence. I don't
> understand why this 'per se' requires a "totally different type of
> visual representation".
> Let's say I am charting SAC over my criteria. Let's assume I'm using
> box and whiskers charts to easily show the quartiles. The values on
> x-axis have implications for the interpretation, of course, but
> whether the x-axis is months of the year, the suit worn, the maximum
> depth of the dive, the tags present on the dive (e.g., teaching dive
> or non-teaching dive) has absolutely no impact on how this should be
> visually represented...
>
>> It would help, in this discussion, if one were to distinguish between
>> the filtering aspect and the statistics display aspect and state that
>> with respect to the argument. In Dirk's artwork above, I am not sure
>> how the constraints will be used. Are we talking of the filtering
>> process or the stats display mechanism? Let's say "Suit" is a
>> constraint and two dates are provided. I am not sure what the
>> expected result of the operation would be. Ahh, the problems of
>> communication.
>>
>
> What I was trying to describe was a way to create criteria that can be
> used for columns in the visualization. You go through this filter
> process, name the result, and that name becomes one of the available
> labels over which you can chart the values.
> Again, as I said before, I may simply be over-engineering this.
>
>> In general, in my opinion, the existing filter layout is a good
>> starting point (I would add the variables of dive depth and dive
>> duration because they are the two variables that fundamentally define
>> a dive). As a filtering mechanism the current implementation is
>> ultra-flexible.
>>
>
> While I respect your opinion, let me politely state that personally I
> believe that the current filter widget is a disaster and extremely
> unintuitive to use. That's not a criticism of the original author, nor
> of the people who have added to it - but yeah, that thing is a mess.
>
>> As far as the UI for filter sets is concerned, the minimum component
>> count would include: Combobox of existing named filters within the
>> set. Button: add current filter to filter set. These could
>> potentially reside at the top right of the current filter panel. But
>> there might be a need to give filter set a name as well. That would
>> need a text box.
>>
>
> Making the current widget more convoluted and more confusing was not
> the direction I was envisioning for us.
>
>
> Maybe we need to rein in the crazy German and go back to something
> much more basic. Something like ten predefined sets of criteria. And
> only apply them to the filtered dive list.
>
> So.
> (1) per month
> (2) per year
> (3) per trip
> (4) by max depth in 10m increments
> (5) by duration in 10min increments
> (6) by min temperature in 10F / 5C increments
> (7) by type (for people who track more than SCUBA)
> (8) by suit (that's likely a fairly small set for most people)
> (9) by tags (that one I'm unclear about - would likely need some more
> ability to influence how this is drawn - but the straightforward way
> would be to draw them in pairs, left one represents dives with the tag,
> right one dives without the tag)
> (10) by people? (no idea how / why)
> (11) by full text? (no idea how / why)
>
> If we drop the last three, it seems fairly obvious how to do this.
>
> Next comes the question of visualization. That might depend on the
> data (so the columns of the yearly statistics). At first glance I
> thought that box and whiskers charts might be useful, or more
> simplified min / avg / max charts (so floating bar with a circle for
> the average)
>
>
> the 'candlesticks' plot (link: "Make an Avg-Max-Min Chart in Microsoft Excel")
>
> Are there any columns that couldn't be visualized with that?
>
> /D
>
>
I am comfortable with your points of view, above. The 10m or 10min
increments could easily be configurable. For instance a person with OW
certification (dives to 18m only with almost all dives in the 10-18m
range) would probably want at most a 5m increment in depth. Unless I
misunderstand you (again). Normally with statistical software (like
R) the default increment is determined by the (max-min) range of the
data as well as the number of data points being plotted. Of course I
would not like an increment of 3.674 m of depth, as might be the case
when the increment is calculated automatically. My only point is
that a single fixed increment is possibly restrictive and it would help
if there were a simple rule to do some adjustment of the increment.
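One possible form of such a rule, as a minimal sketch in Python: derive a target number of bins from the number of data points (Sturges' rule is assumed here, purely for illustration), divide the data range by it, and round the result up to the next "round" value of 1, 2, 5 or 10 times a power of ten. The function name and the choice of round values are hypothetical and are not taken from Subsurface or from R.

```python
import math

def nice_increment(values, steps=(1, 2, 5, 10)):
    """Pick a round bin width from the data range and the number of points.

    Hypothetical helper for illustration only.
    """
    lo, hi = min(values), max(values)
    n = len(values)
    if n < 2 or hi == lo:
        return 1  # nothing to scale against; fall back to a unit increment
    # Assumption: Sturges' rule gives the target number of bins.
    bins = math.ceil(math.log2(n)) + 1
    raw = (hi - lo) / bins  # the "3.674"-style increment we want to avoid
    magnitude = 10 ** math.floor(math.log10(raw))
    # Round up to the next value of the form 1, 2, 5 or 10 x 10^k.
    for step in steps:
        if step * magnitude >= raw:
            return step * magnitude
    return 10 * magnitude
```

For ten dives with maximum depths between 10 and 18 m this yields a 2 m increment, while a hundred dives spread between 5 and 60 m yield a 10 m increment.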
As far as specifying categories like tags is concerned, I like the
present UI where one could specify a number of tags to be included in
the filter, giving great flexibility. Again, my impression of such a
plot possibly differs from yours. I like your binary set idea (a set
including the tag compared to a set excluding it). But more
realistically I would often want to compare groups directly (e.g. SAC
for dives tagged "air" against dives tagged "nitrox"), a use case that
is not necessarily binary because it could involve 3 or 4 tags. Does
this make sense at all?
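The multi-tag comparison could be sketched as follows, in Python. This is only an illustration under stated assumptions: each dive is represented by a hypothetical dictionary with a "tags" set and a "sac" value, which is not Subsurface's actual data model, and the function name is made up.

```python
def sac_by_tag(dives, tags):
    """Collect SAC values into one group per requested tag.

    `dives` is a hypothetical list of dicts with "tags" (a set of strings)
    and "sac" (a number). A dive carrying several of the requested tags is
    counted in every matching group.
    """
    groups = {tag: [] for tag in tags}  # one plot column per tag, in order
    for dive in dives:
        for tag in tags:
            if tag in dive["tags"]:
                groups[tag].append(dive["sac"])
    return groups
```

Each group would then become one column of the plot, so two, three or four tags are handled in the same way.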
Lastly, I do not like candlestick graphs because the application in
econometrics does not include the equivalent of a mean value. It is
meant to indicate the limits and sometimes direction of change within a
specific time period giving rise to the candle forming the central part
of the graph. In my opinion a minimal box and whisker approach is more
readily interpretable.
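To illustrate what a minimal box and whisker needs per column, here is a short Python sketch that computes the quartiles, the extremes and the mean for one group of values. It assumes whiskers drawn at the minimum and maximum and the "inclusive" quartile method of Python's statistics module; both are choices made for this example, and the function name is hypothetical.

```python
import statistics

def box_summary(values):
    """Return the numbers needed to draw one minimal box and whisker.

    Assumptions: whiskers sit at min and max (no outlier rule), and the
    quartiles use the "inclusive" method of statistics.quantiles.
    """
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
        "mean": statistics.fmean(values),  # optional marker inside the box
    }
```

With SAC values of 12, 14, 15, 17 and 22 the box spans 14 to 17 with the median at 15, while the mean is 16, which shows how a single high value pulls the mean away from the median.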
I am very excited that this discussion is actually happening and that a
window of opportunity exists with people like Tomaz and Berthold
interested in being involved.
Kind regards,
willem
--
This message and attachments are subject to a disclaimer.
Please refer to
http://upnet.up.ac.za/services/it/documentation/docs/004167.pdf for full
details.