Hashing videos

Berthold Stoeger bstoeger at mail.tuwien.ac.at
Wed May 23 06:23:36 PDT 2018


Hi Robert,

On Wednesday, 23 May 2018 10:01:20 CEST Robert Helling wrote:
> Hi,
> 
> > On 23. May 2018, at 07:22, Berthold Stoeger <bstoeger at mail.tuwien.ac.at>
> > wrote:
> > 
> > 1) They have the same filename (modulo path and case)
> > 2) They have the same length
> > 3) They have the same meta-data in the case of JPEG
> > Finding two different pictures fulfilling 1-3 must be very bad luck. We
> > currently don't store file-length, but that can be trivially rectified
> > when opening an old log.
> 
> we originally introduced the hashing to make the „find images“ thing
> possible so you don’t have to preserve paths (and filename conventions)
> between different computers. On the other hand, we want to notice when the
> user changed the image (for example by photoshopping, so I guess we have to
> take the content into account).

Under which circumstances do we notice that the file changed? The only way I
currently know of is when the thumbnail is recalculated.

> So my choice would be: Completely ignore filename and path, but maybe take
> into account length and creation date. I don’t have a lot of experience but
> why not hash 1MB of data after seeking to 30% of file size? I would guess
> that is a pretty good test. Or maybe there is an easy way to take internal
> metadata into account as well?
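
For illustration, the sampling idea quoted above could be sketched in Python
roughly as follows. This is only a sketch under assumptions: SHA-1 as the hash
function, the names partial_hash, CHUNK_SIZE and SEEK_PERCENT, and mixing the
file length into the hash are all made up for this example and do not describe
what Subsurface currently does.

```python
import hashlib
import os

CHUNK_SIZE = 1024 * 1024  # 1 MB of sampled data, as suggested in the quote
SEEK_PERCENT = 30         # start reading at 30% of the file size


def partial_hash(path):
    """Hash the file length plus up to 1 MB of data starting at 30% of the file.

    Hypothetical helper: SHA-1 and the inclusion of the length are assumptions.
    """
    size = os.path.getsize(path)
    offset = size * SEEK_PERCENT // 100  # integer arithmetic, no float rounding
    h = hashlib.sha1()
    # Make the file length part of the identity, not only the sampled bytes.
    h.update(size.to_bytes(8, "little"))
    with open(path, "rb") as f:
        f.seek(offset)
        # For files shorter than offset + 1 MB this reads up to end of file.
        h.update(f.read(CHUNK_SIZE))
    return h.hexdigest()
```

Note that an edit outside the sampled window that leaves the file length
unchanged would go unnoticed by such a hash.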

I fear that any such change would not be backwards-compatible with the current
hashes. What we could do is hash the whole file for files <10 MB, and hash
file size + metadata for files >10 MB, or some such scheme. I hope the <10 MB
rule would catch nearly all current pictures (we're currently not supporting
RAW images, are we?).
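
To make the size-dependent scheme concrete, here is a rough Python sketch. It
rests on several assumptions: SHA-1 as the hash function, the names media_hash
and FULL_HASH_LIMIT, a "size+meta:" prefix to keep the two kinds of hash apart,
and the metadata being passed in as an opaque byte string by the caller
(extracting it is not shown). A file of exactly 10 MB is treated as large here,
which is an arbitrary choice.

```python
import hashlib
import os

FULL_HASH_LIMIT = 10 * 1024 * 1024  # 10 MB threshold from the proposal


def media_hash(path, metadata=b"", limit=FULL_HASH_LIMIT):
    """Hash small files completely; hash large files by size + metadata only.

    Hypothetical helper: 'metadata' is whatever byte string the caller
    extracted from the file, it is not parsed here.
    """
    size = os.path.getsize(path)
    h = hashlib.sha1()
    if size < limit:
        # Small file: hash the full content in 64 kB blocks.
        with open(path, "rb") as f:
            for block in iter(lambda: f.read(65536), b""):
                h.update(block)
    else:
        # Large file: never read the content, only mix in size and metadata.
        h.update(b"size+meta:")
        h.update(size.to_bytes(8, "little"))
        h.update(metadata)
    return h.hexdigest()
```

With this layout, files below the threshold get a plain hash of their full
content, while larger files are never read in full.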

I think a combination of file-length + meta-data would in principle be good 
enough for most cases. For PNGs we already get the created time-stamp as a 
replacement for the missing metadata. But unfortunately, I was wrong in a 
previous mail: We're currently not saving the metadata timestamps - we only 
save an "offset", which may be changed by drag&dropping to the profile. :(

One fundamental problem with the metadata is of course that we might change 
the metadata extractor in the future to e.g. support XMP, which would 
invalidate all old stored metadata.

Dirk, any opinion?

Berthold

