RFC: Statistics in Subsurface

Dirk Hohndel dirk at hohndel.org
Tue May 12 14:24:28 PDT 2020



> On May 12, 2020, at 12:41 PM, Willem Ferguson <willemferguson at zoology.up.ac.za> wrote:
> I am comfortable with your points of view, above. The 10m or 10min increments could easily be configurable. For instance, a person with OW certification (dives to 18m only, with almost all dives in the 10-18m range) would probably want at most a 5m increment in depth. Unless I understand you wrongly (again). Normally with statistical software (like R) the default increment is determined by the (max-min) range of the data as well as the number of data points being plotted. Of course I would not want an increment of 3.674 m of depth, as might be the case when the increment is calculated automatically. My only point is that a single fixed increment is possibly restrictive and it would help if there were a simple rule to adjust the increment.
> 

It's those details that tend to make something go from "straightforward" to "crazy tricky".
No, I definitely don't want 3.674m increments, but some people might want 3.048m increments (ten feet). Getting this almost right for most people is easy (ask the user how many data points they would like, then round the increment so that it is a reasonably pleasant number in their unit system). Getting exactly what people want is likely about as painful as my initial over-engineered idea.
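
A minimal sketch of that "almost right" approach, assuming the user picks a target number of bins: divide the range by the bin count, then round the raw width up to 1, 2, 5 or 10 times a power of ten, working in the user's display unit so that a feet user gets 10 ft (3.048 m) steps rather than 3 m ones. This is the common 1-2-5 "nice numbers" heuristic, not necessarily what Subsurface would implement; the names `nice_step`, `depth_bins` and `METRES_PER_UNIT` are hypothetical.

```python
import math

# Hypothetical table: how many metres one display unit represents.
METRES_PER_UNIT = {"m": 1.0, "ft": 0.3048}


def nice_step(lo: float, hi: float, bins: int) -> float:
    """Round the raw width (hi - lo) / bins up to 1, 2, 5 or 10 times a power of ten."""
    if bins < 1 or hi <= lo:
        raise ValueError("need bins >= 1 and hi > lo")
    raw = (hi - lo) / bins
    magnitude = 10 ** math.floor(math.log10(raw))
    for factor in (1, 2, 5):
        if raw <= factor * magnitude:
            return factor * magnitude
    return 10 * magnitude  # raw is at most 10 * magnitude, so this always fits


def depth_bins(depths_m, bins, unit="m"):
    """Convert depths (stored in metres) to the display unit and return (step, edges)."""
    per_unit = METRES_PER_UNIT[unit]
    values = [d / per_unit for d in depths_m]
    lo, hi = min(values), max(values)
    step = nice_step(lo, hi, bins)
    start = math.floor(lo / step) * step
    count = math.ceil((hi - start) / step)
    # Compute each edge as a multiple of step to avoid accumulating rounding error.
    edges = [start + i * step for i in range(count + 1)]
    return step, edges
```

For the OW diver example above, `depth_bins([10.0, 18.0], 2)` yields a 5 m step with edges 10, 15, 20, while `depth_bins([10.0, 18.0], 4, unit="ft")` yields 10 ft steps. The number of bins actually produced can differ from the number requested; that is the price of the rounding.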

> As far as specifying categories like tags, I like the present UI where one can specify a number of tags to be included in the filter, giving great flexibility. Again, my impression of such a plot possibly differs from yours. I like your binary set idea (a set including compared to a set excluding). But more realistically I would often want to compare values across tags (e.g. SAC for the two tags "air" and "nitrox"), a use case that does not necessarily imply a binary comparison because it could involve 3 or 4 tags. Does this make sense at all?
> 

It makes sense for people who are able to use sets of tags in a meaningful way - one could have a mutually exclusive set of three tags (say, air, nitrox, trimix) and create statistics over them. Of course the results become "strange" if a dive can carry more than one of those tags.
Again, to get this "mostly right" is fairly easy. To cover all the crazy corner cases is what's hard.
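
The "mostly right" version of that comparison can be sketched as follows, assuming each dive is reduced to a hypothetical `(sac, set_of_tags)` pair and the user supplies the list of tags to compare. Dives carrying exactly one of the tags are grouped; dives carrying several are set aside rather than counted twice, which is one way (among others) to handle the corner case. The function name `sac_by_tag` is illustrative only.

```python
from statistics import mean


def sac_by_tag(dives, tags):
    """Mean SAC per tag for a set of tags that is supposed to be mutually exclusive.

    `dives` is a list of (sac, set_of_tags) pairs (a hypothetical shape).
    Dives with none of the tags are skipped; dives with more than one
    are returned separately instead of being counted in several groups.
    """
    groups = {tag: [] for tag in tags}
    ambiguous = []
    for sac, dive_tags in dives:
        hits = [t for t in tags if t in dive_tags]
        if len(hits) == 1:
            groups[hits[0]].append(sac)
        elif len(hits) > 1:
            ambiguous.append((sac, dive_tags))
    # Tags with no matching dives are left out rather than reported as zero.
    means = {tag: mean(values) for tag, values in groups.items() if values}
    return means, ambiguous
```

Reporting the ambiguous dives, instead of silently dropping them, lets the UI warn the user that the chosen tags are not actually exclusive for their log.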

> Lastly, I do not like candlestick graphs because their use in econometrics does not include the equivalent of a mean value. They are meant to indicate the limits, and sometimes the direction, of change within a specific time period, which gives rise to the candle that forms the central part of the graph. In my opinion a minimal box and whisker approach is more readily interpretable.
> 

I keep saying "candlestick" when I mean to say "box and whiskers". My mistake. You are spot on; the error is mine.
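
For reference, a minimal box and whisker plot needs only the five-number summary (minimum, first quartile, median, third quartile, maximum); the sketch below also returns the mean, since its absence was the objection to candlesticks. It uses Python's standard `statistics.quantiles`; the function name `box_summary` is hypothetical, and the whiskers here run to the minimum and maximum, whereas many plotting tools cap them at 1.5 times the interquartile range instead.

```python
from statistics import mean, quantiles


def box_summary(values):
    """Five-number summary for a minimal box and whisker plot, plus the mean."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    # "inclusive" treats the data as the whole population, so q1..q3 stay within the data range.
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
        "mean": mean(values),
    }
```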

> I am very excited that this discussion is actually happening and that a window of opportunity exists with people like Tomaz and Berthold interested in being involved. 

I am, too. And I am determined to have something that is so well specified and sketched that we will actually get to a result that fulfills our wishes.
Which is why I wish more people were engaged in the conversation.

/D
