Hashing videos

Berthold Stoeger bstoeger at mail.tuwien.ac.at
Tue May 22 22:22:19 PDT 2018


Hi Lubomir and Willem,

On Dienstag, 22. Mai 2018 23:06:03 CEST Lubomir I. Ivanov wrote:

> i was thinking about running hashes on the thumbnails but that has a
> couple of problems:

I think generating hashes of thumbnails is out of the question. Not only may
Qt's scaling algorithm change, as you note; extracting thumbnails from videos
is not even supported at the moment. A video can have multiple streams,
embedded thumbnail(s), and other complexities. All of this is very unstable.

> i guess the biggest question here is what are the hashes used for?
> if they are used to skip the generation of thumbnails for already
> existing media, then the above proposal is completely invalid.

Indeed, let's wait for Robert's assessment.
 
> > Or perhaps even remove the hashes? I found three users:
> > 1) In git storage. This is unsupported afaik.
> > 2) The "Find moved images" functionality. Perhaps searching for (case-
> > insensitive?) filenames is enough? Or perhaps match by metadata?
> > 3)  In current head it is also used for the thumbnail files, but this
> > could be changed before doing the next release.
> 
> something like hashing the date/time + metadata is a good option too i
> guess. depends on what we need a hash for.

We wouldn't even have to hash that; we could simply store it unhashed. One
scheme that came to mind (assuming the only purpose of the hashes is to find
moved pictures): we consider two pictures equivalent if
1) they have the same filename (ignoring path and case),
2) they have the same file length, and
3) they have the same metadata (in the case of JPEG).
It would take very bad luck for two different pictures to satisfy all of 1-3.
We currently don't store the file length, but that could be trivially
rectified when opening an old log.
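The matching scheme above could look roughly like the following Python sketch.
`PictureRecord`, `equivalent`, `backfill_size` and `find_moved` are hypothetical
names, not Subsurface code; the JPEG metadata is assumed to be already extracted
into a dict, and a missing length is assumed to be filled in from the file on
disk when an old log is opened.

```python
import os
from dataclasses import dataclass
from pathlib import PureWindowsPath
from typing import List, Optional


@dataclass
class PictureRecord:
    # Hypothetical per-picture data a log might keep (not Subsurface's format).
    path: str                        # path as stored in the log
    size: Optional[int] = None       # file length in bytes; None in old logs
    metadata: Optional[dict] = None  # extracted JPEG metadata, e.g. EXIF tags


def base_name(path: str) -> str:
    # PureWindowsPath splits on both "/" and "\\", so paths written on either
    # platform yield the bare filename; casefold() makes it case-insensitive.
    return PureWindowsPath(path).name.casefold()


def is_jpeg(path: str) -> bool:
    return base_name(path).endswith((".jpg", ".jpeg"))


def backfill_size(record: PictureRecord) -> None:
    # Criterion 2 needs the length, which old logs lack: record it when the
    # log is opened, provided the file is still at its stored path.
    if record.size is None and os.path.isfile(record.path):
        record.size = os.path.getsize(record.path)


def equivalent(stored: PictureRecord, candidate: PictureRecord) -> bool:
    # 1) Same filename, ignoring directory and case.
    if base_name(stored.path) != base_name(candidate.path):
        return False
    # 2) Same file length; an unknown length cannot confirm a match.
    if stored.size is None or stored.size != candidate.size:
        return False
    # 3) Same metadata, compared only for JPEG files.
    if is_jpeg(stored.path) and stored.metadata != candidate.metadata:
        return False
    return True


def find_moved(stored: PictureRecord,
               candidates: List[PictureRecord]) -> List[PictureRecord]:
    # Every candidate (e.g. a file found in a newly scanned directory) that
    # the scheme considers to be the same picture as `stored`.
    return [c for c in candidates if equivalent(stored, c)]
```

Because a missing length fails criterion 2, records from an old log only match
after `backfill_size` has run on them, and a renamed picture fails criterion 1,
so it is never found.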

This scheme would not find renamed pictures, but that also sounds like a case
of "tough luck".

Berthold

