Log Integrity: How SIEMs Address the Issue and Is It Enough?

Log Integrity Capabilities of SIEMs

Log integrity and non-repudiation are key properties of audit logs. As SIEMs are usually the way to collect audit logs (among many other things) in large organizations, we have to make sure they give us those properties. We have discussed previously that it’s not necessarily the case, unfortunately.

In this article, we will provide a more structured assessment of the log integrity/tamper protection features of popular SIEMs. We will review IBM QRadar, Splunk, McAfee SIEM, ArcSight, RSA NetWitness, LogRhythm, SolarWinds SIEM, AlienVault USM, and NetIQ Sentinel. We have combined our research with that of CISOPlatform to give a good overview of the issue.

Log Integrity Protection Assessment

| SIEM | LOG INTEGRITY PROTECTION | PROTECTION METHOD | SOURCE OF INFORMATION |
|---|---|---|---|
| IBM QRadar | Yes* | File hashing | QRadar admin guide, p.118 |
| Splunk | Yes* | Bucket hashing | Splunk data integrity control |
| McAfee SIEM | Yes* | File hashing | ELM Log integrity |
| ArcSight | Yes* | Log hashing | ArcSight WhitePaper |
| RSA NetWitness | No | - | CISOPlatform |
| LogRhythm | Yes* | File hashing | CommonCriteria portal |
| SolarWinds SIEM | No | - | CISOPlatform |
| AlienVault USM | Yes* | Log and block signing | USM documentation |
| NetIQ Sentinel | Yes* | Log signing | NetIQ Product Brief |

Log Integrity Assessment Approaches

There are three approaches here – hashing of files (or blocks/chunks/buckets), signing of logs (or blocks), and relying on organizational/segregation approaches. Let’s discuss why those solve only part of the problem (and why there’s an asterisk after each “Yes” above):

  • Hashing – hashing of files or chunks of files (depending on the internal representation of data) is the minimum that should be available. It is not turned on by default anywhere, probably because for many of the collected logs data integrity is not of paramount importance. But if you use the same tool to store audit logs, that’s a different story. Even when turned on, hashing does not guarantee that your data has not been tampered with; it only ensures that the hash of the log data matches the stored hash. The hash is usually stored either in a separate file or in a database. Tampering with that separate file or database is still easy, and so is regenerating the hash after the original data has been tampered with (some of the solutions even provide tools for regenerating the hashes, so an attacker/insider doesn’t even have to reverse engineer anything). And while modifying logs requires hash regeneration, deleting logs doesn’t even need that (with some exceptions, where multiple hashes are hashed together): if you delete a file/block/chunk and delete the corresponding hash, there’s no way to tell that something was there. In some cases sequence numbers are used to address that, but these sequence numbers can also be modified, or fake files with the right sequence number can be generated and hashed. Furthermore, hashes don’t give you any non-repudiation, because a hash of the current logs cannot prove to anyone that the logs have not been tampered with.
  • Signing – signing is slightly better, as it requires access to a private key, and that access can be further protected. Signing, however, suffers from similar issues as hashing – you can delete a signed chunk altogether, and then nobody will know there was something there in the first place. Sequence numbers can be spoofed – e.g. delete an old file/chunk, and create a fake one in its place with its sequence number. Re-signing is possible (although harder than rehashing). Signing also requires additional infrastructure (PKI), and ultimately hardware (HSM) to store the private keys. Managing the access to the connected HSM for the log signing process is also not trivial, and if signing is implemented by just placing a key somewhere the log collector has easy, unaudited access to, then signing is no better than hashing. It does, however, give some non-repudiation – only someone with access to the private key could have signed the logs.
  • Organizational – some vendors rely on the fact that their solutions are “in the cloud” and therefore the organization’s staff does not have access to the underlying infrastructure that would let them modify logs. Other variations of that exist in the form of “chain of custody”, where multiple people have to approve or monitor access to the infrastructure where logs are stored. These organizational measures should exist, but they do not give non-repudiation (how do you prove that your procedures were followed?) and do not protect from coordinated insider attacks. The “cloud” is certainly a strong argument, and many audits (e.g. PCI-DSS) have passed just on that ground, but the cloud provider faces the same issues described above. And even if you can somehow prove that your own organization has followed the procedures that exist on paper, it’s harder to do that for a 3rd party.
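
The hashing weakness described above can be illustrated with a small sketch. It does not reproduce any particular SIEM's mechanism; the chunk names, the in-memory `files` store, and the separate `stored` hash index are all hypothetical stand-ins for "log data plus a hash kept in a separate file or database".

```python
import hashlib


def file_hash(data: bytes) -> str:
    """SHA-256 of a log chunk, as a hex string."""
    return hashlib.sha256(data).hexdigest()


def verify(files: dict, stored: dict) -> bool:
    """True when every stored hash has a matching, unmodified chunk."""
    return all(
        name in files and file_hash(files[name]) == digest
        for name, digest in stored.items()
    )


# Hypothetical log store: chunk name -> content, plus a separate hash index.
files = {
    "chunk-1.log": b"alice logged in\n",
    "chunk-2.log": b"alice deleted record 42\n",
}
stored = {name: file_hash(data) for name, data in files.items()}
clean_ok = verify(files, stored)

# Naive tampering that leaves the hash index untouched is detected.
files["chunk-2.log"] = b"alice viewed record 42\n"
naive_tamper_ok = verify(files, stored)

# An insider with write access to the index simply regenerates the hash.
stored["chunk-2.log"] = file_hash(files["chunk-2.log"])
rehashed_tamper_ok = verify(files, stored)

# Deleting a chunk together with its hash leaves no trace either.
del files["chunk-1.log"]
del stored["chunk-1.log"]
deletion_ok = verify(files, stored)

print(clean_ok, naive_tamper_ok, rehashed_tamper_ok, deletion_ok)
# prints: True False True True
```

The last two checks pass even though the logs were modified and then partially deleted, which is the reason for the asterisk next to each “Yes”.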

Instead, we at LogSentinel employ what the scientific literature recommends:

Hash chaining

Every log entry stores its own hash together with the hash of the previous log entry. That way the entries form a chain that cannot be broken unnoticed: you can’t delete something, or replace it with something else, without anyone noticing.
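
A minimal sketch of the idea follows. It is an illustration under assumptions, not the SentinelTrails implementation: the entry layout (`payload`, `prev`, `hash`), the `GENESIS` placeholder, and the choice of SHA-256 are all picked for the example.

```python
import hashlib

GENESIS = "0" * 64  # assumed placeholder "previous hash" for the first entry


def entry_hash(prev_hash: str, payload: str) -> str:
    """Hash of an entry, covering both its content and the previous hash."""
    data = (prev_hash + "\n" + payload).encode("utf-8")
    return hashlib.sha256(data).hexdigest()


def append(chain: list, payload: str) -> None:
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"payload": payload, "prev": prev,
                  "hash": entry_hash(prev, payload)})


def verify_chain(chain: list, expected_head=None) -> bool:
    """Recompute every link; optionally compare the last hash to a known head."""
    prev = GENESIS
    for entry in chain:
        if entry["prev"] != prev or entry["hash"] != entry_hash(prev, entry["payload"]):
            return False
        prev = entry["hash"]
    return expected_head is None or prev == expected_head


chain = []
for event in ["alice logged in", "alice deleted record 42", "alice logged out"]:
    append(chain, event)
head = chain[-1]["hash"]
intact = verify_chain(chain, head)

# Editing an entry in the middle breaks the link to everything after it.
edited = [dict(e) for e in chain]
edited[1]["payload"] = "alice viewed record 42"
edit_detected = not verify_chain(edited)

# Removing an entry from the middle breaks the chain as well.
removed = [dict(e) for e in chain]
del removed[1]
removal_detected = not verify_chain(removed)

# Cutting off the tail is only visible if the expected head is known.
truncated = [dict(e) for e in chain[:-1]]
truncation_needs_head = verify_chain(truncated) and not verify_chain(truncated, head)
```

Note the last case: a chain on its own cannot reveal that its tail was cut off, or that it was rebuilt from some entry onward, unless the latest hash is also kept somewhere the attacker cannot reach.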

Merkle trees

Log entries also form a Merkle tree, from which proofs can be provided to third parties, based on previously witnessed data (see anchoring).
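
As a rough illustration of how such a proof works, here is a simplified Merkle tree with inclusion proofs. The layout is an assumption made for the sketch (an unpaired node is promoted to the next level unchanged, and leaves and inner nodes use different prefixes); real systems may use a different construction.

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def leaf(entry: str) -> bytes:
    return _h(b"\x00" + entry.encode("utf-8"))  # prefix separates leaves from nodes


def node(left: bytes, right: bytes) -> bytes:
    return _h(b"\x01" + left + right)


def build_levels(entries: list) -> list:
    """All tree levels, from the leaf hashes up to the single root."""
    level = [leaf(e) for e in entries]
    levels = [level]
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level), 2):
            if i + 1 < len(level):
                nxt.append(node(level[i], level[i + 1]))
            else:
                nxt.append(level[i])  # unpaired node is promoted unchanged
        levels.append(nxt)
        level = nxt
    return levels


def inclusion_proof(levels: list, index: int) -> list:
    """Sibling hashes needed to recompute the root from one leaf."""
    path = []
    for level in levels[:-1]:
        sibling = index ^ 1
        if sibling < len(level):
            path.append((level[sibling], sibling < index))  # (hash, sibling_is_left)
        index //= 2
    return path


def verify_inclusion(entry: str, path: list, root: bytes) -> bool:
    acc = leaf(entry)
    for sibling, sibling_is_left in path:
        acc = node(sibling, acc) if sibling_is_left else node(acc, sibling)
    return acc == root


entries = ["login alice", "delete record 42", "logout alice",
           "login bob", "export report"]
levels = build_levels(entries)
root = levels[-1][0]

all_proofs_ok = all(
    verify_inclusion(e, inclusion_proof(levels, i), root)
    for i, e in enumerate(entries)
)
forged_rejected = not verify_inclusion(
    "view record 42", inclusion_proof(levels, 1), root
)
```

A third party that has previously seen only `root` can check that a given entry belongs to the log by receiving the entry and its short proof, without needing the rest of the entries.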

Trusted timestamping

Groups of logs are timestamped. This is better than signing alone because, in addition to the signature, the timestamp includes the signed time of the signing process, i.e. you can’t backdate entries unless you compromise the clock of the TSA (timestamping authority). We also support qualified TSAs, which, apart from being third parties, have the obligation to store the hashes that were signed, and therefore serve as anchors (see below).

Anchoring

We anchor the latest hash in the chain as well as the latest Merkle root periodically to an external source. It can be a public blockchain (Ethereum), paper stored in a safe, AWS Glacier, a QTSP (qualified trust service provider), or an email sent to recipients using multiple email providers. All of those are ways to distribute the trust in the anchor so that nobody can tamper with it. If you have a protected anchored value, you can prove the integrity of your logs as well.
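
The following sketch shows the principle of checking current logs against an anchored value. Everything in it is an assumption for illustration: the anchor record format (`count`, `head`), and the plain Python lists standing in for external anchor destinations such as a blockchain or an email inbox. No real anchoring service is called.

```python
import hashlib


def chain_head(entries: list) -> str:
    """Latest hash of a simple hash chain over the given entries."""
    head = "0" * 64
    for payload in entries:
        head = hashlib.sha256((head + "\n" + payload).encode("utf-8")).hexdigest()
    return head


def publish_anchor(entries: list, sinks: list) -> dict:
    """Record the current head in every (hypothetical) external sink."""
    record = {"count": len(entries), "head": chain_head(entries)}
    for sink in sinks:
        sink.append(dict(record))
    return record


def matches_anchor(entries: list, record: dict) -> bool:
    """The first `count` entries must still hash to the anchored head."""
    n = record["count"]
    return len(entries) >= n and chain_head(entries[:n]) == record["head"]


# Stand-ins for independent external destinations.
blockchain_sink, email_sink = [], []

log = ["alice logged in", "alice deleted record 42"]
publish_anchor(log, [blockchain_sink, email_sink])

# The log may keep growing after the anchor was published.
log.append("alice logged out")
grown_ok = all(matches_anchor(log, sink[-1]) for sink in (blockchain_sink, email_sink))

# A fully rewritten (and locally consistent) log no longer matches the anchor.
rewritten = ["alice logged in", "alice viewed record 42", "alice logged out"]
rewrite_detected = not matches_anchor(rewritten, blockchain_sink[-1])

# Neither does a log from which anchored entries were deleted.
shortened = ["alice logged in"]
deletion_detected = not matches_anchor(shortened, email_sink[-1])
```

Because the attacker would have to change the value in every sink, publishing the same record to several independent destinations is what distributes the trust.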

Log Integrity Challenges

Doing all of that is complicated by the fact that it requires serializing all events in large setups, where data flows constantly and concurrently. Deciding the right time to anchor something publicly or to timestamp it also complicates things. We believe we have made the right set of design and implementation decisions to make all these techniques work towards solving all log integrity issues.
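
To see why serialization is needed, consider the sketch below, in which several threads append to one hash chain. Each new hash depends on the previous one, so appends have to happen one at a time; here a lock enforces that. The `SerializedChain` class is hypothetical, and a single in-process lock is only the simplest possible form of serialization, chosen for the example rather than as a description of how a large setup does it.

```python
import hashlib
import threading


class SerializedChain:
    """Hash chain whose appends are forced into a single total order."""

    def __init__(self):
        self._lock = threading.Lock()
        self.head = "0" * 64
        self.payloads = []

    def append(self, payload: str) -> str:
        # Reading the head and writing the new one must be one atomic step,
        # otherwise two concurrent events could chain to the same predecessor.
        with self._lock:
            data = (self.head + "\n" + payload).encode("utf-8")
            self.head = hashlib.sha256(data).hexdigest()
            self.payloads.append(payload)
            return self.head


def producer(chain: SerializedChain, source: str, count: int) -> None:
    for i in range(count):
        chain.append(f"{source} event {i}")


chain = SerializedChain()
threads = [
    threading.Thread(target=producer, args=(chain, f"source-{k}", 100))
    for k in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Replaying the recorded order must reproduce the same head.
recomputed = "0" * 64
for payload in chain.payloads:
    recomputed = hashlib.sha256((recomputed + "\n" + payload).encode("utf-8")).hexdigest()
```

The order in which events from different sources interleave varies from run to run, but whatever order was recorded can always be replayed to the same head.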

Are the other approaches “good enough”? If the audit logs are non-critical – probably. If getting them tampered with cannot have serious negative effects on the organization – probably. But we bet no CISO, or CEO for that matter, wants to view their audit log as non-critical. So partial measures are rarely “good enough”.

We’ll finish by summarizing the different approaches and what they give you:

Log Integrity Approaches: Summary

| TECHNIQUE | MODIFICATION PROTECTION | DELETION PROTECTION | NON-REPUDIATION |
|---|---|---|---|
| Hashing | Partial | No | No |
| Signing | Partial | No | Partial |
| Hash chaining | Yes (eventual) | Yes | No |
| Secure timestamping | Yes | No | Yes |
| Merkle trees with public anchor | Yes (eventual) | Yes | Yes |

Using the right tools for the job is important. SIEMs are the right tools for the job of log monitoring, correlation, and detecting potential attacks. We have carefully designed SentinelTrails to be the right tool for the job of protecting the audit logs of mission-critical systems.