Problems while using divecomputer-simulator for linux

Wed Apr 2 03:16:38 PDT 2014

On 2014-03-31 16:38, Dirk Hohndel wrote:
> On Mon, 2014-03-31 at 10:36 +0200, Jef Driesen wrote:
>> To be honest, I don't feel comfortable removing the error checking for
>> the syscalls, and not just because omitting error checking is usually
>> the wrong thing to do.
> 
> But instead of making the error fatal, how about you just log it and 
> try
> to keep going, anyway?
> 
> Think of it this way: "what's the worst that could happen?"
> a) the error is real, communication will fail (same result as today)
> b) the error is bogus, communication will succeed (winning!)

I get your point, but please read on...

>> First of all, this only seems to be a problem for pty's and not the 
>> real
>> thing. The use of pty's is basically a hack to make the simulator 
>> work.
>> For that purpose I'm happy with the --enable-pty option. Only a
>> libdivecomputer developer will ever need it.
> 
> That assumes there are no issues with the libraries used or the kernel
> used and the specific serial device.

True, but I think that's a reasonable assumption. If we can no longer 
rely on the kernel or device drivers to get things right, then I think 
we're in far more trouble. Of course there can and will be bugs (as we 
just discovered the hard way), but those are supposed to get fixed, 
right?

Linus' mail to the fedora/redhat developers made that pretty clear I 
think.

>> But what worries me a lot more is that trying to communicate with a 
>> real
>> device using the wrong settings may actually harm the device. This has
>> already happened in the past with the Mares Icon HD and Matrix. Both 
>> use
>> the same protocol, but trying to talk to a Matrix with the Icon HD
>> baudrate will cause the device to lock up completely. Luckily there 
>> was
>> no permanent damage and the device could be revived by removing the
>> batteries. But that's pretty scary and something I want to avoid at 
>> all
>> cost. This Mares problem is now gone, but that doesn't mean it can't
>> happen for others.
> 
> But that wasn't a problem of RTS/DTR or other serial parameters. That
> was a USB communication issue. And IIRC this wouldn't have been caught
> by an error return, either.

Yes and no.

The scenario where this particular problem happened, was that a user 
accidentally selected the Icon HD protocol variant to connect to a 
Matrix dive computer. That happens to be the default choice when using 
the libdivecomputer universal application with just the -b iconhd 
option. To select the Matrix variant you need the extra -m 0x0F option 
(or use the -n "Mares Matrix" option).

The key difference between the two protocol variants was the different 
baudrate. You are right that trying to set the wrong baudrate did not 
cause the tcsetattr call to fail. That's because the usb-serial 
driver/chip happens to support that baudrate too. But, assume for a 
moment we ignore tcsetattr errors, and try to proceed. How do we know 
the baudrate was changed correctly? We simply don't know. It could have 
failed before it even tried to change the baudrate. The only case where 
we do know, is when it returns without errors. The result is that trying 
to proceeding anyway after an error, may put the device at risk!

Note that this Mares issue is gone now, because we are no longer using 
different baudrates, but that's irrelevant for this discussion.

>> So I'm not convinced ignoring the return value of a syscall, and 
>> hoping
>> for the best, is the right solution here. For pty's I'm fine with 
>> that,
>> but not for real serial ports.
> 
> It's your library... I think it makes much more sense to try to keep
> going and fail if the responses of actual communication make no sense,
> not if something odd happens during the setup of the communication.

I'm aware that in practice this risk will probably be very small. But if 
I have to choose between keeping the strict error checking, and removing 
it as a workaround for a bug in one particular linux distribution (and 
which only affects developers but not end users!), then that's an easy 
choice for me.

For as long I have been working on the libdivecomputer project, I've 
always been very cautious and never caused any damage to any dive 
computer, and I would like to continue that trend. I can assure you that 
the incident with the Mares Matrix was pretty scary, and I never want to 
go through that again. If that causes some minor inconvenience here and 
there, then that's a price I'm willing to pay. I hope you'll understand 
that.

Note that I absolutely don't mind enabling the pty "hack" on the 
tcsetattr call too. I think that should already be sufficient to work 
around this glibc bug in fedora/redhat.

Jef