Android Subsurface mobile issues with pelagic data cable downloads

Bill Perry bperrybap at opensource.billsworld.billandterrie.com
Fri Nov 10 12:24:31 PST 2017


The Android Subsurface mobile app is currently not able to communicate with or download dives from an Aeris Atmos AI.
I've been exchanging some emails with Dirk about this.

First, a bit of background on me. Nearly 10 years ago I reverse engineered the download protocol for the Aeris Atmos AI.
I'm very familiar with the pelagic data cables and I have written my own downloading code that is very reliable.
I was collaborating with Jef Driesen when he was first starting libdivecomputer; we eventually parted ways over some differences.


Summary of Issues:
============
=== Read timeouts are way too long:

This patch fixes the read timeout issue, which affects ALL dive computers that go through serial_ftdi_read() in serial_ftdi.c,
not just the pelagic code for devices like the Atmos AI.

===============================

diff --git a/core/serial_ftdi.c b/core/serial_ftdi.c
index bfa23b8..4788b94 100644
--- a/core/serial_ftdi.c
+++ b/core/serial_ftdi.c
@@ -394,8 +394,11 @@ static dc_status_t serial_ftdi_read (dc_custom_io_t *io, void *data, size_t size
                timeout = 10000;

        int backoff = 1;
-       int slept = 0;
        unsigned int nbytes = 0;
+
+       struct timeval tvstart, tvnow;
+       gettimeofday(&tvstart, 0);
+
        while (nbytes < size) {
                int n = ftdi_read_data (device->ftdi_ctx, (char *) data + nbytes, size - nbytes);
                if (n < 0) {
@@ -404,12 +407,13 @@ static dc_status_t serial_ftdi_read (dc_custom_io_t *io, void *data, size_t size
                        ERROR (device->context, "%s", ftdi_get_error_string(device->ftdi_ctx));
                        return DC_STATUS_IO; //Error during read call.
                } else if (n == 0) {
-                       if (slept >= timeout) {
+                       gettimeofday(&tvnow, 0);
+                       // check to see if too much time has elapsed
+                       if( ((tvnow.tv_sec - tvstart.tv_sec) * 1000) + ((tvnow.tv_usec - tvstart.tv_usec)/1000) >= timeout) {
                                ERROR(device->context, "%s", "FTDI read timed out.");
                                return DC_STATUS_TIMEOUT;
                        }
                        serial_ftdi_sleep (device, backoff);
-                       slept += backoff;
                }

                nbytes += n;

===============================
=== Communication/Downloads not working.
There is some kind of initial FTDI or data-cable initialization issue that causes a data communication failure.
This occurs only once, after the initial plugging in of the data cable and OTG adapter.
With added oceanic code to better reset the data cable, it can recover from this when the user clicks on [retry]; however,
I'm still trying to track down what is causing the initial failure so that it works as it should.

====

I believe all of this is interrelated.  In terms of the original code:
My suspicion is that there is some kind of issue with the initial initialization of the FTDI code
and this causes some kind of communication issue with the datacable.
This then causes a read timeout to occur as the PIC does not answer the initial command.
The timeouts were WAY too long.
Then because the oceanic code is not robust enough to recover from this type of issue, no amount of retries will ever make it work.

With the updated code, the timeouts are now the proper length and a retry of the initial data cable initialization allows downloads to work,
but whatever is causing the first initialization to fail is still there.

Background Information and Detailed Information of the issues:
=======================================

=== Read timeouts are WAY too long.


The read timeout issue was due to some poor code in serial_ftdi.c in the
function serial_ftdi_read()
The code was spinning in a loop doing a read and then sleeping 1ms at a
time and when the total amount of 1ms sleeps done was beyond the
timeout it would exit with a timeout error.
That methodology simply doesn't work as it does not account for the
time of all the other code running.
And in fact the overhead of the other code is quite significant.
I saw that ftdi_read_data() can go away for as long as 40ms+.
But it averaged about 15ms, so with the 1ms sleep each poll for data took
about 16ms, and the timeout would be roughly 16x longer than intended.
Since the normal timeout was 3000ms this resulted in a read timeout of
48 seconds.
Also, the pelagic code does 2 retries on commands so that increases the
overall timeout from what should be 9 seconds to 144 seconds.
This was fixed by refactoring the code to use gettimeofday() and
returning with a timeout error when the elapsed time reaches or exceeds
the timeout.
This code fix will resolve this timeout issue for ALL the dive
computers that use ftdi serial devices, as the existing code breaks
timeouts for any code that uses serial_ftdi_read().
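
As a rough illustration of the arithmetic above, here is a small model of the two accounting schemes. It is only a sketch: it uses a simulated clock instead of real reads, the function names are made up for this example, and it assumes every read call takes the same fixed time and returns no data.

```c
#include <assert.h>

/* Old accounting: only the sleeps count toward the timeout, so the time
 * spent inside each read call is invisible to the check. */
static long old_loop_wall_ms(long timeout_ms, long read_ms, long backoff_ms)
{
    assert(backoff_ms > 0);              /* otherwise slept never advances */
    long slept = 0, wall = 0;            /* wall = simulated elapsed time */
    for (;;) {
        wall += read_ms;                 /* read call returns no data */
        if (slept >= timeout_ms)
            return wall;                 /* timeout is reported here */
        wall += backoff_ms;              /* sleep before the next poll */
        slept += backoff_ms;
    }
}

/* New accounting: compare elapsed time itself against the timeout
 * (the simulated counter stands in for tvnow - tvstart). */
static long new_loop_wall_ms(long timeout_ms, long read_ms, long backoff_ms)
{
    assert(read_ms + backoff_ms > 0);    /* otherwise time never advances */
    long wall = 0;
    for (;;) {
        wall += read_ms;
        if (wall >= timeout_ms)
            return wall;
        wall += backoff_ms;
    }
}
```

With timeout_ms = 3000, read_ms = 15 and backoff_ms = 1, the first function gives about 48 seconds and the second about 3 seconds. Multiplying by the three attempts the pelagic code makes gives the 144 second versus 9 second figures mentioned above.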


=== Communication/Downloads not working

There are different types of exception/error recovery that must be employed to have robust recovery.

This is due to the design of the Pelagic data cable and the primitive interface with the dive computer.
There is a PIC inside the data cable.
The host talks to the PIC over the USB through the FTDI chip and the PIC talks to the dive computer.
Some of the commands sent to the PIC do not involve the dive computer but most do.
In some cases there are specific timings and control signals like RTS that must be wiggled in order to get the
attention of the PIC. If not done correctly, the PIC will not communicate.

Compounding matters is that over the past couple of decades I've seen that certain OSes or FTDI drivers have screwed up DTR and RTS signal control.
In some cases they have them reversed and in some cases DTR is not handled properly at open/close time - regardless of OS or driver settings.
Some of this may be due to the ability of the FTDI chip itself to map and remap the i/o pins it uses to control signals like DTR and RTS.
This can make it difficult to write host application code that can properly control something like a RTS signal.

Anyway, the RTS signal is used to reset the PIC in the Oceanic / Aeris data cables.

In terms of the libdivecomputer Pelagic data cable code, from code inspection I have noticed that it is not doing certain things that would make it robust enough
to recover from various types of issues, so retries after certain types of failures will be unable to recover.

From the debug logs, the Android host code is not receiving a response to its initial command to query the PIC inside the data cable.
From using a h/w simulator with a standard FTDI cable, it could be seen that the phone was able to send and receive messages.
However, my simulator does not work exactly like the PIC inside the data cable.
The way the data cable works, if the PIC is not in the proper state it will not respond to this initial command, and the only way to get it to respond is to reset it using the RTS signal.

The current libdivecomputer oceanic code does not toggle RTS so it will not reset the PIC to allow everything to start from a clean known state.
So if the PIC is not in its initial state, it will not respond and the libdivecomputer code will never recover.

When I added code to reset the PIC to the oceanic code, I was able to download dives; however there is still an issue.
(Note: there are other things that could and should also be done for more robustness, but this one is the main one)
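
The reset-then-retry idea can be sketched roughly as follows. This is not the code that was added to the oceanic driver. The struct, the function names and the fake cable are all hypothetical, and the RTS polarity and the delays are ASSUMED placeholder values, since the real ones are not given here. On Android the set_rts hook might wrap libftdi's ftdi_setrts().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical hooks so the logic does not depend on a particular driver. */
struct cable_ops {
    int  (*set_rts)(void *ctx, int level);      /* 0 on success */
    void (*sleep_ms)(void *ctx, unsigned ms);
    int  (*query_pic)(void *ctx);                /* 0 if the PIC answered */
    void *ctx;
};

/* ASSUMED values: polarity and timings are placeholders, not measured. */
enum { RESET_HOLD_MS = 100, RESET_SETTLE_MS = 100, MAX_ATTEMPTS = 3 };

/* Pulse RTS to put the PIC back into a known clean state. */
static int pic_reset(const struct cable_ops *ops)
{
    assert(ops != NULL);
    if (ops->set_rts(ops->ctx, 1) != 0) return -1;
    ops->sleep_ms(ops->ctx, RESET_HOLD_MS);
    if (ops->set_rts(ops->ctx, 0) != 0) return -1;
    ops->sleep_ms(ops->ctx, RESET_SETTLE_MS);
    return 0;
}

/* Reset before every attempt so that a retry starts from scratch. */
static int pic_open_with_reset(const struct cable_ops *ops)
{
    for (int attempt = 0; attempt < MAX_ATTEMPTS; attempt++) {
        if (pic_reset(ops) != 0) return -1;
        if (ops->query_pic(ops->ctx) == 0) return 0;
    }
    return -1;
}

/* Fake cable for trying the logic without hardware: it ignores the
 * first fail_first queries, then answers. */
struct fake_cable { int rts; int resets; int fail_first; int queries; };

static int fake_set_rts(void *c, int level)
{
    struct fake_cable *f = c;
    if (f->rts == 1 && level == 0) f->resets++;  /* count completed pulses */
    f->rts = level;
    return 0;
}

static void fake_sleep_ms(void *c, unsigned ms) { (void)c; (void)ms; }

static int fake_query_pic(void *c)
{
    struct fake_cable *f = c;
    f->queries++;
    return f->queries <= f->fail_first ? -1 : 0;
}
```

Resetting inside the retry loop, rather than once up front, is the point of the sketch: a retry that does not first reset the PIC would just repeat the same unanswered command.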


While communication and downloads are now working with some added code tweaks that
reset the PIC inside the data cable to start the data cable & dive
computer initialization from a known clean state,
the first attempt to download after plugging the data cable into the
phone is failing but if you press the retry button in the app, it will
always work.
After this first failure, it will always work, even on the first
attempt after you fully exit the app and restart it - as long as you
don't unplug the data cable from the phone.

This one may be tricky to find since there appears to be something that
is done to the FTDI state/connection or the download cable that remains
in place
even after the application exits and is restarted.

Also, the Linux code uses a serial drain function that pushes out any tx
data that might be buffered; that function is not available to the Android
code using the serial_ftdi stuff.
That may be making a difference as there are many modes and data
queuing timeouts available in the FTDI chipset f/w and not being able
to call a function to push out the data may change some timing or cause
things to get stuck inside the FTDI chip for
longer than expected which can also change some timing.
And at this stage timing is important as you must wait certain amounts of time when wiggling RTS.


