XML format change

Tue Dec 25 11:32:11 PST 2012

On 24-12-12 19:02, Linus Torvalds wrote:
> On Mon, Dec 24, 2012 at 2:12 AM, Jef Driesen <jefdriesen at telenet.be> wrote:
>>
>> Actually you should also take into account the libdivecomputer backend type,
>> because just the model number and serial number tuple might overlap with
>> those from other backends. But if you encode the model as the full device
>> name (e.g. Suunto Vyper Air) and not just the model number, then you are
>> already doing that of course.
>
> Yes. I originally saved the vendor/device information separately, so we had
>
>      dc->vendor = "Suunto";
>      dc->product = "Vyper Air";
>
> rather than the current
>
>      dc->model = "Suunto Vyper Air";
>
> setup.
>
> It ended up being just extra work (you had to always check both), and
> unlike libdivecomputer, there's no "backend type" for subsurface, so
> there was no upside.
>
> One thing I probably *should* have done is to make the "dc->deviceid"
> be the SHA1SUM of not just the libdivecomputer device ID string, but
> make it the SHA1 of the combination of model string and device ID
> string. That would have been easy to do, and then the deviceid really
> would be unique (well, modulo collisions in just the 32-bit truncated
> space - but when people tend to have a single dive computer, and
> having five would be considered unusual, there just isn't much point
> in worrying about collisions ;)

I wouldn't worry about collisions either :-)

> [ Background for Jef, who probably didn't look at what subsurface
> does: not only do we combine the libdivecomputer vendor/product into a
> single model thing, the dive and device ID strings are not kept as
> strings at all by subsurface. We create the dive ID by calculating the
> SHA1 of your "fingerprint" string, and we do the device ID by
> calculating the SHA1 over your model/firmware/serial numbers. In both
> cases we then just take the 20-byte SHA1 and use the first four bytes
> to create a 32-bit integer. So we've turned the arbitrary
> libdivecomputer information into two 32-bit opaque numbers. That makes
> things *much* easier to work with, and has the same amount of actual
> information. ]

Thanks for the quick overview. I did look at the subsurface code already, but 
without digging into all the details.

Just a small remark. I wouldn't include the firmware version in the SHA1. Many
modern devices can be updated, and thus the firmware version isn't fixed. I
think it's pretty annoying to have your device recognized as a new device
after a firmware update, especially for devices like the OSTC that receive
frequent updates. The reason why probably nobody has run into this yet is that
very few backends fill in the firmware version.
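
To make that concrete, here is a rough sketch of a device ID that hashes only
the model and serial numbers and keeps the first four digest bytes. This is not
the actual subsurface code: the names (device_id, sha1, put_be32, rol32), the
8-byte input layout and the big-endian reading of the first four digest bytes
are all assumptions made for illustration.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Rotate a 32-bit word left by n bits (0 < n < 32). */
static uint32_t rol32(uint32_t v, int n)
{
    return (v << n) | (v >> (32 - n));
}

/* Store a 32-bit value as four big-endian bytes. */
static void put_be32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v >> 24);
    p[1] = (unsigned char)(v >> 16);
    p[2] = (unsigned char)(v >> 8);
    p[3] = (unsigned char)v;
}

/* Plain SHA-1 over a byte buffer. Returns 0 on success, -1 if out of memory. */
static int sha1(const unsigned char *msg, size_t len, unsigned char out[20])
{
    uint32_t h[5] = { 0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0 };
    size_t total = ((len + 8) / 64 + 1) * 64; /* room for 0x80 and 64-bit length */
    unsigned char *buf = calloc(total, 1);
    uint64_t bits = (uint64_t)len * 8;

    if (!buf)
        return -1;
    if (len)
        memcpy(buf, msg, len);
    buf[len] = 0x80;
    for (int i = 0; i < 8; i++) /* message length in bits, big-endian */
        buf[total - 1 - i] = (unsigned char)(bits >> (8 * i));

    for (size_t off = 0; off < total; off += 64) {
        const unsigned char *p = buf + off;
        uint32_t w[80], a = h[0], b = h[1], c = h[2], d = h[3], e = h[4];

        for (int i = 0; i < 16; i++)
            w[i] = (uint32_t)p[4 * i] << 24 | (uint32_t)p[4 * i + 1] << 16 |
                   (uint32_t)p[4 * i + 2] << 8 | (uint32_t)p[4 * i + 3];
        for (int i = 16; i < 80; i++)
            w[i] = rol32(w[i - 3] ^ w[i - 8] ^ w[i - 14] ^ w[i - 16], 1);

        for (int i = 0; i < 80; i++) {
            uint32_t f, k, t;
            if (i < 20)      { f = (b & c) | (~b & d);          k = 0x5A827999; }
            else if (i < 40) { f = b ^ c ^ d;                   k = 0x6ED9EBA1; }
            else if (i < 60) { f = (b & c) | (b & d) | (c & d); k = 0x8F1BBCDC; }
            else             { f = b ^ c ^ d;                   k = 0xCA62C1D6; }
            t = rol32(a, 5) + f + e + k + w[i];
            e = d; d = c; c = rol32(b, 30); b = a; a = t;
        }
        h[0] += a; h[1] += b; h[2] += c; h[3] += d; h[4] += e;
    }
    free(buf);

    for (int i = 0; i < 5; i++)
        put_be32(out + 4 * i, h[i]);
    return 0;
}

/*
 * Hypothetical device ID: SHA-1 over model and serial number only.
 * The firmware argument is accepted but deliberately left out of the hash,
 * so a firmware update does not turn the device into a "new" one.
 */
static uint32_t device_id(uint32_t model, uint32_t serial, uint32_t firmware)
{
    unsigned char in[8], digest[20];

    (void)firmware;
    put_be32(in, model);
    put_be32(in + 4, serial);
    if (sha1(in, sizeof(in), digest) != 0)
        return 0;
    /* Assumption: first four digest bytes read as a big-endian integer. */
    return (uint32_t)digest[0] << 24 | (uint32_t)digest[1] << 16 |
           (uint32_t)digest[2] << 8 | (uint32_t)digest[3];
}
```

With something like this, device_id(model, serial, 1) and
device_id(model, serial, 2) give the same value, while a different serial
number gives (barring a collision) a different one.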

The fact that the firmware version is included in the DC_EVENT_DEVINFO is a bit
of a historic mistake. The idea was that some devices might have a data format that
is dependent on the firmware version (e.g. a new firmware may introduce some new 
features). In that case the firmware would be necessary to parse the data. 
However, for this purpose, the firmware version from the DC_EVENT_DEVINFO is 
useless, because it contains the current firmware version, and not the firmware 
version at the time each dive was recorded. All devices that have multiple data 
format versions indeed store the version per dive. So the firmware version from 
the DC_EVENT_DEVINFO isn't used for anything by libdivecomputer.

BTW, the libdivecomputer serial number is primarily intended to be used as a
device ID. That's why it does not necessarily match the human readable serial
number. Usually the human readable number uses some special encoding (little/big 
endian, BCD, ascii), but for libdc we don't really care. We don't want the 
serial number to change if we fix a bug in the serial number decoding :-)
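
As an illustration of why the two numbers can differ, here is a small sketch
that decodes the same four raw bytes in three ways. The function names are made
up for this example and are not libdivecomputer API; which decoding a given
device really uses is backend specific.

```c
#include <stdint.h>

/* Read four raw bytes as a little-endian 32-bit integer. */
static uint32_t serial_le(const unsigned char b[4])
{
    return (uint32_t)b[0] | (uint32_t)b[1] << 8 |
           (uint32_t)b[2] << 16 | (uint32_t)b[3] << 24;
}

/* Read four raw bytes as a big-endian 32-bit integer. */
static uint32_t serial_be(const unsigned char b[4])
{
    return (uint32_t)b[0] << 24 | (uint32_t)b[1] << 16 |
           (uint32_t)b[2] << 8 | (uint32_t)b[3];
}

/* Read four raw bytes as packed BCD: each nibble is one decimal digit. */
static uint32_t serial_bcd(const unsigned char b[4])
{
    uint32_t v = 0;

    for (int i = 0; i < 4; i++)
        v = v * 100 + (uint32_t)(b[i] >> 4) * 10 + (uint32_t)(b[i] & 0x0F);
    return v;
}
```

For the bytes 0x12 0x34 0x56 0x78 these return 0x78563412, 0x12345678 and
12345678 respectively. Any of them works as a device ID, as long as the choice
never changes afterwards.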

Using a hash for the serial number (or the subsurface deviceid) is something I
considered too. But so far all serial numbers fit nicely into a 32-bit integer,
and there was just no need for any hashing. Someday that may change, but since 
we make no promise that the serial number matches the human readable serial 
number, that should be no problem.

Calculating the fingerprint hash isn't really an option for libdivecomputer. 
There are some devices (e.g. Uwatec Memomouse and Smart/Galileo) where you have 
to send the fingerprint (which in this case is the device timestamp) to the 
device, and then the device implements the "download new dives only" feature 
internally. Calculating a hash wouldn't work here, because you can't go back 
from a hashed fingerprint to the raw timestamp. For other devices, using a hash
is possible, but then there is no real advantage over just treating the
fingerprint as some opaque piece of data. The application simply doesn't have to
care what it actually is.
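
A minimal sketch of what "opaque" means on the application side: store the raw
bytes, compare them byte for byte, and hand them back unchanged. The struct and
function names below are hypothetical, and the 32-byte limit is an arbitrary
choice for the example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical container for a raw fingerprint of up to 32 bytes. */
struct fingerprint {
    unsigned char data[32];
    size_t size;
};

/* Keep a verbatim copy of the fingerprint; returns -1 if it does not fit. */
static int fingerprint_set(struct fingerprint *fp,
                           const unsigned char *data, size_t size)
{
    if (size > sizeof(fp->data))
        return -1;
    if (size)
        memcpy(fp->data, data, size);
    fp->size = size;
    return 0;
}

/* Byte-for-byte comparison; the content is never interpreted. */
static int fingerprint_equal(const struct fingerprint *fp,
                             const unsigned char *data, size_t size)
{
    return fp->size == size && memcmp(fp->data, data, size) == 0;
}
```

Because the stored bytes are untouched, they can still be sent to a device that
needs the raw value, which would not be possible with a hash.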

> But if we were to now change subsurface to mix in the model string too
> (not just the model number that libdivecomputer uses for
> DC_EVENT_DEVINFO) into the device ID in subsurface, the existing
> device ID's would change, so it would be slightly inconvenient. Also,
> I do think that it is likely a good idea to always have the model
> information in things like nickname tables, even if it would be
> redundant - just for the human readability. So while having "globally
> unique" device ID numbers (again, modulo collisions that we don't
> really care about) could have been a programming convenience, I
> suspect we're just as well off just always using the <model,deviceid>
> tuple.

Having a human readable name is indeed a bit more friendly, compared to some 
abstract model number or hash value.
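
For what it's worth, matching on the <model,deviceid> tuple could look roughly
like the sketch below. The struct layout and the names (dc_nickname,
lookup_nickname) are invented for this example and are not taken from the
subsurface sources.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical nickname table entry, keyed on the <model,deviceid> tuple. */
struct dc_nickname {
    const char *model;    /* full device name, e.g. "Suunto Vyper Air" */
    uint32_t deviceid;    /* opaque 32-bit device ID */
    const char *nickname; /* name chosen by the user */
};

/* Return the nickname for an exact <model,deviceid> match, or NULL. */
static const char *lookup_nickname(const struct dc_nickname *table, size_t n,
                                   const char *model, uint32_t deviceid)
{
    for (size_t i = 0; i < n; i++) {
        if (table[i].deviceid == deviceid &&
            strcmp(table[i].model, model) == 0)
            return table[i].nickname;
    }
    return NULL;
}
```

Two devices from different backends that happen to share a deviceid still end
up as separate entries, because the model string differs.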

Jef