Hash functions are at the core of many, many technologies, including ours. We rely on SHA-512 to create a hash chain to guarantee the integrity of logs.
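To make the idea of a hash chain concrete, here is a minimal sketch in Java. It is an illustration under stated assumptions, not our production code: the class name `HashChainSketch`, the empty `GENESIS` value and the rule "hash of the previous hash concatenated with the entry" are all hypothetical choices made for this example. The point it shows is that each hash depends on everything before it, so changing any earlier entry changes every later hash.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;

public class HashChainSketch {

    // Hypothetical starting value of the chain (an assumption for this sketch).
    static final String GENESIS = "";

    // Returns the SHA-512 digest of the input as a lowercase hex string.
    static String sha512Hex(String input) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-512");
            byte[] bytes = digest.digest(input.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : bytes) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-512 is not available", e);
        }
    }

    // Each hash covers the previous hash plus the current entry.
    static List<String> buildChain(List<String> entries) {
        List<String> hashes = new ArrayList<>();
        String previous = GENESIS;
        for (String entry : entries) {
            previous = sha512Hex(previous + entry);
            hashes.add(previous);
        }
        return hashes;
    }

    // Recomputes the chain and compares it with the stored hashes.
    static boolean verify(List<String> entries, List<String> hashes) {
        return buildChain(entries).equals(hashes);
    }

    public static void main(String[] args) {
        List<String> entries = List.of("user logged in", "user changed password");
        List<String> chain = buildChain(entries);
        System.out.println("Chain intact: " + verify(entries, chain));
        System.out.println("Tampered log detected: "
                + !verify(List.of("user logged in", "nothing happened"), chain));
    }
}
```

Verification here simply recomputes the whole chain. A real system would likely store and check hashes incrementally, but the dependency between consecutive hashes is the same.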
Hash functions have one very important property – collision resistance. If two different inputs generate the same hash, it may mean the hash function is broken. It may also mean that someone got lucky and found an accidental collision, in which case the hash function is probably fine. Collision resistance does not mean that no collisions exist; it only means they are extremely hard to find by accident, let alone to produce deliberately.
Today we were investigating an issue with one of our clients – they were trying to verify whether a hash is present in the hash chain, but got an error instead. We tracked the error to our core data access layer and realized that the getSingleResult(..) method would purposefully fail if more than one result was fetched.
We had obviously used the getSingleResult(..) method for finding entries based on their hash because, despite the fact that we can ingest millions of log entries, we anticipated that hashes would always be unique.
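The behaviour described above can be sketched with a small in-memory stand-in. This is not our data access layer: `SingleResultSketch`, `LogEntry` and `findSingleByHash` are hypothetical names invented for this example, and the list replaces a real database query. It only illustrates the "exactly one match or fail" contract that a method like getSingleResult(..) enforces.

```java
import java.util.List;
import java.util.stream.Collectors;

public class SingleResultSketch {

    // Hypothetical log record: the message and the hash stored for it.
    static final class LogEntry {
        final String message;
        final String hash;

        LogEntry(String message, String hash) {
            this.message = message;
            this.hash = hash;
        }
    }

    // Returns the only entry with the given hash and fails on zero or multiple matches.
    static LogEntry findSingleByHash(List<LogEntry> entries, String hash) {
        List<LogEntry> matches = entries.stream()
                .filter(entry -> entry.hash.equals(hash))
                .collect(Collectors.toList());
        if (matches.isEmpty()) {
            throw new IllegalStateException("No entry found for hash " + hash);
        }
        if (matches.size() > 1) {
            throw new IllegalStateException(
                    "Expected a single entry but found " + matches.size());
        }
        return matches.get(0);
    }

    public static void main(String[] args) {
        List<LogEntry> entries = List.of(
                new LogEntry("first message", "abc"),
                new LogEntry("second message", "def"),
                new LogEntry("third message", "abc")); // same hash as the first entry

        System.out.println(findSingleByHash(entries, "def").message);
        try {
            findSingleByHash(entries, "abc");
        } catch (IllegalStateException e) {
            System.out.println("Lookup failed: " + e.getMessage());
        }
    }
}
```

Failing loudly on a duplicate is arguably the right design for this kind of lookup: silently returning the first match would hide exactly the situation one would want to investigate.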
It turns out that wasn’t the case. Two records were found to have exactly the same hash. An important note – this doesn’t seem to be reproducible behaviour, but it might have to be looked into, as every collision is a potential problem.
The hash used was
The two messages that produced it were:
April Fool’s day joke
No, we didn’t find a collision
Bozhidar Bozhanov is a senior software engineer and solution architect with 15 years of experience in the software industry. Bozhidar has been a speaker at numerous conferences and is among the popular bloggers and influencers in the technical field. He’s also a former government advisor on e-government, transparency, and information security.