Hashing videos

Lubomir I. Ivanov neolit123 at gmail.com
Tue May 22 14:06:03 PDT 2018


I believe Robert should comment on this, as he wrote the original
implementation of the profile photos.

On 22 May 2018 at 23:45, Berthold Stoeger <bstoeger at mail.tuwien.ac.at> wrote:
> Dear all,
> I'm currently implementing addition of videos to the dive photos. This happens
> to be dog-slow, because we calculate hashes of the file contents. As you can
> imagine, addition of multiple videos with a few GB each is a major CPU hog.
> Granted, the UI stays responsive, since this is done in background threads.
> Nevertheless, it gives a bad impression if the CPUs run at 100% for a few
> minutes.
> What are we supposed to do? Hash only the first MB? That would unfortunately
> not be backwards-compatible. Do different things for images and videos? Sounds
> hard to get right.
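
A minimal sketch of the "hash only the first MB" idea, assuming the stored
hashes are SHA-1 digests over the whole file contents (an assumption, not
something stated in this thread). The 1 MiB cut-off and the function name
are hypothetical.

```python
import hashlib

# Hypothetical cut-off: only the first 1 MiB of each file is hashed.
PREFIX_BYTES = 1024 * 1024

def hash_file_prefix(path, limit=PREFIX_BYTES, chunk_size=64 * 1024):
    """Return the SHA-1 hex digest of at most `limit` bytes from the start of `path`."""
    h = hashlib.sha1()
    remaining = limit
    with open(path, "rb") as f:
        while remaining > 0:
            chunk = f.read(min(chunk_size, remaining))
            if not chunk:
                break  # file is shorter than the limit: we hashed all of it
            h.update(chunk)
            remaining -= len(chunk)
    return h.hexdigest()
```

Under that assumption, files no larger than the cut-off (most still images)
would keep their old hash, since the prefix is the whole file. Only larger
files would get a different hash, and any two files sharing their first MiB
would collide.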

I was thinking about hashing the thumbnails instead, but that has a
couple of problems:
1) if Qt changes the backend of the code we use for thumbnail
generation, the hashes would stop matching.
2) thumbnail generation for videos would need to happen not at the
first frame but at an arbitrary point of the video timeline,
e.g. at 30% of its length.
(That's actually a good generic way of doing it, instead of always
using the first frame.)
But if the thumbnails of two videos happen to show exactly the same
frame at 30% of their length (e.g. consider a black-screen
transition), we risk generating the same hash for two different
videos.

On the other hand, that might never happen with compressed video, as
the compression adds noise that would produce thumbnails with slightly
different bytes. The exception is uncompressed RAW video, in which case
the thumbnails, and therefore the hashes, would match perfectly.

I would still consider this an option if we really need hashes and
want them to be fast.
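
A rough sketch of the thumbnail-hash idea, assuming the thumbnail frame is
grabbed at 30% of the video length and its bytes are then hashed with SHA-1.
Frame extraction itself (done via Qt in Subsurface) is left out; the names
and the 30% constant are hypothetical.

```python
import hashlib

# Hypothetical: grab the thumbnail at 30% of the video length.
THUMBNAIL_PERCENT = 30

def thumbnail_timestamp_ms(duration_ms, percent=THUMBNAIL_PERCENT):
    """Timestamp (ms) at which the thumbnail frame would be taken; integer math avoids float rounding."""
    if duration_ms <= 0:
        return 0
    return duration_ms * percent // 100

def thumbnail_hash(frame_bytes):
    """Hash the thumbnail's bytes instead of the multi-GB video file."""
    return hashlib.sha1(frame_bytes).hexdigest()
```

This also shows the collision risk mentioned above: two videos whose frame
at that position is a byte-identical black screen, e.g.
`bytes(160 * 120 * 3)`, get the same hash, while a single differing byte
from compression noise yields a different one.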

I guess the biggest question here is: what are the hashes used for?
If they are used to skip thumbnail generation for media we already
have, then the above proposal is completely invalid, since we would
have to generate the thumbnail just to compute the hash.

> Or perhaps even remove the hashes? I found three users:
> 1) In git storage. This is unsupported afaik.
> 2) The "Find moved images" functionality. Perhaps searching for (case-
> insensitive?) filenames is enough? Or perhaps match by metadata?
> 3)  In current head it is also used for the thumbnail files, but this could be
> changed before doing the next release.
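
A minimal sketch of "find moved images" by case-insensitive file name
instead of by content hash. The function names are hypothetical, and the
search root is assumed to be a directory the user picks.

```python
import os

def index_by_filename(root):
    """Map lower-cased file names to every path under `root` that has that name."""
    index = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            index.setdefault(name.lower(), []).append(os.path.join(dirpath, name))
    return index

def find_moved(missing_paths, root):
    """For each missing path, list candidate new locations with the same (case-insensitive) name."""
    index = index_by_filename(root)
    return {p: index.get(os.path.basename(p).lower(), []) for p in missing_paths}
```

A name can map to several candidates, since camera-generated names like
GOPR0001.MP4 repeat across cards and trips. That ambiguity is where matching
by metadata would have to break the tie.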

Something like hashing the date/time plus other metadata is a good
option too, I guess. It depends on what we need a hash for.
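
A sketch of a cheap metadata-based key, assuming capture time, duration and
file size are available without reading the whole file (e.g. from EXIF or
container metadata and `os.path.getsize`). The choice of fields and the
function name are hypothetical.

```python
import hashlib

def metadata_key(timestamp, duration_ms, file_size):
    """Hypothetical cheap identity: SHA-1 over capture time, duration and file size."""
    data = f"{timestamp}|{duration_ms}|{file_size}".encode("utf-8")
    return hashlib.sha1(data).hexdigest()
```

This costs almost nothing even for multi-GB videos, but it cannot tell
apart two files whose metadata agree while their contents differ.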

