Preventing Various Types of Data Breaches

Data breaches happen practically every day. Personal data, including financial and medical records, leaks to cyber criminals as well as intelligence agencies. Some notable breaches include the Equifax breach, where dozens of personal data fields were leaked, and the recently announced Marriott breach, where passport numbers, credit card details and the locations of people at a given time were exposed.

Based on our GDPR expertise, we released SentinelDB – a privacy-by-design database. It can help prevent various types of data breaches, but for tech-savvy organizations we decided to classify the types of data breaches and give recommendations on how they can be addressed, including with the help of SentinelDB. We don’t always get to know exactly how the well-known breaches happened, but from what is published in news articles and post-mortems we can get a good overview of the breach landscape.

Data Breach Prevention: Types

Control over target server

If an attacker is able to connect to a target server and gains full or partial control over it, they can do anything, including running SELECT * FROM ... , copying files, etc. How do attackers gain such control? In many ways, most notably RCE (remote code execution) vulnerabilities and weak admin authentication.

How to prevent it?

Follow best security practices: regularly update libraries and software to get security patches, do not run native commands from within the application layer, open only the necessary ports (typically 80 and 443) to the outside world, and configure 2-factor authentication for administrator login. Aim to have an intrusion detection / prevention system in place. Encrypt your data, and make the encryption as granular as possible for the most sensitive data (e.g. in SentinelDB we use per-record encryption) to avoid SELECT * breaches.

How does SentinelDB prevent it?

With SentinelDB there’s no way to do SELECT *, and we prevent large dumps of data, including through anomaly detection. Even if someone gets the raw data, it is encrypted per record, and the keys form a hierarchy that ultimately leads to a hardware-protected master key. Since it’s a cloud datastore, we protect our infrastructure with the measures listed in the paragraph above, among others.

SQL injections

SQL injection is a rookie mistake that unfortunately still happens. It allows attackers to manipulate your SQL queries and inject custom fragments that let them extract more data than they are supposed to.

How to prevent it?

Use prepared statements for your queries. Never ever concatenate user input in order to construct queries. Run regular code reviews and use code inspection tools to catch such instances.
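To make the difference concrete, here is a minimal sketch using Python's built-in sqlite3 module. The users table and the two function names are made up for the illustration; the same idea applies to any database driver that supports parameter binding.

```python
import sqlite3

def find_user_unsafe(conn, email):
    # DON'T: user input is concatenated into the SQL text itself.
    query = "SELECT id, email FROM users WHERE email = '" + email + "'"
    return conn.execute(query).fetchall()

def find_user_safe(conn, email):
    # DO: the value is bound as a parameter and never parsed as SQL.
    return conn.execute(
        "SELECT id, email FROM users WHERE email = ?", (email,)
    ).fetchall()

# Hypothetical in-memory table, used only for this demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO users (email) VALUES (?)",
    [("alice@example.com",), ("bob@example.com",)],
)

payload = "nobody@example.com' OR '1'='1"
leaked = find_user_unsafe(conn, payload)   # the injected OR clause matches every row
blocked = find_user_safe(conn, payload)    # the payload is treated as a literal email
```

With the concatenated query the payload turns the WHERE clause into a condition that is always true, so the whole table comes back. With the bound parameter the same payload is just an email address that nobody has.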

How does SentinelDB prevent it?

We support a limited subset of the SQL syntax that doesn’t allow subqueries. Additionally, we scan the parameters for potential injections, and we also provide prepared-statement-like endpoints.

Unencrypted backups

The main system may be well protected, but attackers are usually after the weak spots. Backup storage may be one of them: if you store unencrypted backups that are accessible via weak authentication (e.g. over FTP via username/password), then someone may try to attack this weaker spot. Even if the backup is encrypted, the key may be stored alongside it, which makes the encryption practically useless.

How to prevent it?

Encrypt your backups, store them in a way that’s as strongly protected as your servers (e.g. 2FA, internal-network/VPN only), and keep your decryption key in a hardware security module (or equivalent, e.g. AWS KMS).

How does SentinelDB prevent it?

We take care of the backups, which are automatically encrypted. Additionally, even if a backup is somehow obtained and decrypted, the data inside won’t be readable because of our per-record encryption.

Personal data in logs

Besides backups, another weak spot may be your logs. They usually live on separate servers and are not as well guarded. That’s usually okay, since logs don’t contain personal information, but sometimes they do. We recently discovered a large company’s website that had its directory structure unprotected and kept its access log files alongside its static resources. In addition to that, it passed personal information as GET parameters, so you could get a lot of information just by downloading the access logs. Needless to say, we did a responsible disclosure and the issue was fixed, but it was a potential breach.

How to prevent it?

Don’t store personal information in logs. Avoid submitting forms with a GET method. Regularly review the code to check that personal data is not logged. Make sure your logs are stored in a way that is as protected as your production servers and your backups. It could be a cloud service, it could be a local installation of an open source package, but don’t overlook the security of the log collection system.
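As a safety net for the cases that slip through code review, the application can redact known sensitive parameters before a message reaches the log. The following is a minimal sketch with Python's standard logging module; the list of parameter names and the RedactingFilter class are assumptions made for the example, and a real list has to come from your own data inventory.

```python
import logging
import re

# Hypothetical list of query parameter names that carry personal data.
SENSITIVE_PARAMS = ("email", "name", "phone", "ssn")

_PARAM_RE = re.compile(
    r"(?i)\b(" + "|".join(SENSITIVE_PARAMS) + r")=([^&\s\"']+)"
)

def redact(text):
    """Replace the values of sensitive query parameters with a placeholder."""
    return _PARAM_RE.sub(lambda m: m.group(1) + "=[REDACTED]", text)

class RedactingFilter(logging.Filter):
    def filter(self, record):
        # Format the message first so that logging arguments are redacted too.
        record.msg = redact(record.getMessage())
        record.args = ()
        return True  # keep the record, only its content is changed

logger = logging.getLogger("access")
logger.addFilter(RedactingFilter())
```

A pattern-based filter like this only catches what it was told to look for, so it complements the review of what gets logged rather than replacing it.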

How does SentinelDB prevent it?

It doesn’t prevent it per se, as the responsibility for holding proper logs lies in the application layer. On our end we don’t log personal data. Our other product, SentinelTrails, does have ways to protect logs, including client-side encryption, but regular review of your logs is still necessary.

Data pushed to unprotected storage

A recent Alteryx/Experian leak was just that: data placed on a (somewhat) public S3 bucket was breached. If you place personal data in weakly protected public stores (AWS S3, file sharing services, FTP servers), you are asking for trouble.

How to prevent it?

Don’t put personal data in publicly accessible places. To prevent that from happening, regularly review the access policies of your S3 buckets and FTP servers. Have internal procedures that disallow sharing personal data without protecting it with at least a password shared through a side channel (messenger/SMS).

How does SentinelDB prevent it?

At first it may seem that a datastore can’t do anything about bulk data being put in unprotected storage. However, SentinelDB tackles the underlying problem. The need to share personal data in bulk with third parties exists, and no matter how secure your database is, you’ll eventually need to share some of the data. This is why we support pseudonymization (a term used explicitly in GDPR). It allows you to export batches of data with the identifying fields stripped or replaced. That way you’ll end up with data that’s useful for sharing but does not expose users. Of course, if the goal is to share the identifying data itself, then think twice about whether that’s a good idea. SentinelDB will try to prevent the extraction as much as possible, though.
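To illustrate the general idea of pseudonymization (this is not SentinelDB's implementation), here is a minimal sketch that drops direct identifiers and replaces a linking field with a keyed hash. The field names, the classification into the two sets and the example key are all assumptions made for the illustration.

```python
import hashlib
import hmac

# Hypothetical field classification for the example.
DIRECT_IDENTIFIERS = {"name", "email"}   # dropped entirely from the export
LINKING_FIELDS = {"user_id"}             # replaced by a keyed pseudonym

def pseudonym(value, key):
    """Deterministic keyed pseudonym: the same value and key give the same token."""
    digest = hmac.new(key, str(value).encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]

def pseudonymize(record, key):
    out = {}
    for field, value in record.items():
        if field in DIRECT_IDENTIFIERS:
            continue
        if field in LINKING_FIELDS:
            out[field] = pseudonym(value, key)
        else:
            out[field] = value
    return out

# Example key only; a real key belongs in a KMS or HSM, not in source code.
key = b"example-key-for-illustration-only"
row = {"user_id": 42, "name": "Alice", "email": "alice@example.com",
       "country": "BG", "plan": "pro"}
export = pseudonymize(row, key)
```

Because the pseudonym is keyed, rows belonging to the same user can still be joined in the export, while a recipient without the key cannot recompute the mapping from known IDs. Note that the remaining fields can still identify people in combination, so what counts as an identifier has to be decided per dataset.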

Unrestricted API calls

That’s what caused the Facebook-Cambridge Analytica issue. No matter how secure your servers are, if you expose the data through your API without access restriction, rate-limiting, fraud detection and an audit trail, then your security is of no use: someone will “scrape” your data through the API.

How to prevent it?

Do not expose too much personal data over public or easily accessible APIs. Vet API users and inform your users whenever their data is being shared with third parties, via API or otherwise.

How does SentinelDB prevent it?

SentinelDB tries to identify breaching patterns and block certain queries. So even if your API does not take the necessary measures, SentinelDB will be a further hindrance to leaks. Of course, it’s always best to have the right protection mechanisms on all layers.

Internal actor

All of the woes above can happen due to poor security or due to internal actors. Even if your network is well guarded, an admin can go rogue and leak the data for many reasons, including financial ones. A privileged internal actor can perform SELECT *, decrypt the backups, or pretend to be a trusted API partner.

How to prevent it?

Good operational security. A single sentence like that may sound easy, but it’s not. We don’t have a full list of things that have to be in place to guard against internal breaches, as there are technical, organizational and legal measures to be taken. Have an unmodifiable audit trail. Have your intrusion prevention system (or logging solution) also detect anomalous internal behaviour. Have procedures that require two admins to work together in order to log in (e.g. split key) to the most critical systems. If the data is sensitive, do background checks on the privileged admins. And there are many more things that fall under the “operational security” umbrella.

How does SentinelDB prevent it?

We utilize our blockchain-enabled audit trail to track data access and modifications in a way that cannot be modified even by privileged actors. This serves as a deterrent to internal actors, and in combination with blocking large dumps of data or suspicious patterns, we reduce the risk of internal actors extracting and leaking large amounts of data from the datastore.
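The core idea behind a tamper-evident audit trail can be shown with a simple hash chain, where every entry commits to the one before it. This is a minimal sketch of that general technique, not a description of how SentinelDB implements it; the entry layout and function names are assumptions.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry

def _entry_hash(prev_hash, event):
    payload = json.dumps({"prev": prev_hash, "event": event}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def append(log, event):
    """Append an event, linking it to the hash of the previous entry."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"event": event, "prev": prev_hash,
                "hash": _entry_hash(prev_hash, event)})

def verify(log):
    """Recompute the chain; any edited or removed entry breaks it."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != _entry_hash(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True

log = []
append(log, {"actor": "admin1", "action": "read", "record": "user:42"})
append(log, {"actor": "admin2", "action": "export", "record": "user:*"})
```

A chain like this only detects modification if the latest hash is also stored somewhere the privileged actor cannot rewrite, otherwise the whole chain could simply be recomputed. Publishing or anchoring that hash externally is what closes the gap.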

Man-in-the-middle attacks

MITM can be used to extract data from active users only. It works on websites without HTTPS, or in case the attacker has somehow installed a root certificate on the target machine (and before you say that’s too unlikely: it happens way too often to be ignored). In case of a successful MITM attack, the attacker can extract all data that’s being transferred.

How to prevent it?

First – use HTTPS. Always. Redirect HTTP to HTTPS. Use HSTS. Use certificate pinning if you control the updates of the application (e.g. through an app store). Unfortunately, the root certificate attack cannot be prevented from the server side; you can only hope that your users haven’t installed such problematic software. Fortunately, this won’t lead to massive breaches: only the data of active users who are being targeted may leak.
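The redirect and the HSTS header are usually configured in the web server or load balancer, but the logic is small enough to sketch as a WSGI middleware. The names https_only and hello_app are made up for the example, and the sketch assumes that wsgi.url_scheme reflects the scheme the client actually used (behind a TLS-terminating proxy this has to be derived from the proxy's headers instead).

```python
def https_only(app, max_age=31536000):
    """WSGI middleware: redirect HTTP to HTTPS and add HSTS to HTTPS responses."""
    hsts = ("Strict-Transport-Security",
            "max-age=%d; includeSubDomains" % max_age)

    def middleware(environ, start_response):
        if environ.get("wsgi.url_scheme") != "https":
            # Assumes the Host header carries no explicit port.
            host = environ.get("HTTP_HOST", environ.get("SERVER_NAME", ""))
            path = environ.get("PATH_INFO", "/")
            query = environ.get("QUERY_STRING", "")
            location = "https://" + host + path + ("?" + query if query else "")
            start_response("301 Moved Permanently",
                           [("Location", location), ("Content-Length", "0")])
            return [b""]

        def start_with_hsts(status, headers, exc_info=None):
            # HSTS is only meaningful on responses served over HTTPS.
            return start_response(status, list(headers) + [hsts], exc_info)

        return app(environ, start_with_hsts)

    return middleware

def hello_app(environ, start_response):
    # Stand-in application used only to demonstrate the middleware.
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]

secure_app = https_only(hello_app)
```

Whether to include includeSubDomains depends on whether every subdomain is actually served over HTTPS, so treat the header value above as one possible choice.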

How does SentinelDB prevent it?

MITM attacks happen between the users and your system, so the datastore can’t do much about it. What it can do is have an audit trail that can later be used in court proceedings.

JavaScript injection / XSS

If an attacker can somehow inject JavaScript into your website, they can collect the data being entered. This is what happened in the recent British Airways breach. There was also a potential attack on the NSW (Australia) elections, where the Piwik analytics script was loaded from an external server that was vulnerable to a TLS downgrade attack, which allowed an attacker to replace the script and thus interfere with the election registration website.

How to prevent it?

Follow the XSS protection cheat sheet by OWASP. Don’t include scripts from dodgy third-party domains. Make sure third-party domains, including CDNs, have a good security level (e.g. run the Qualys SSL Labs test against them).
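The most basic rule from that cheat sheet is to encode untrusted data before placing it into HTML. Here is a minimal sketch using html.escape from the Python standard library; the render_comment function is hypothetical, and in practice an auto-escaping template engine does this for you.

```python
import html

def render_comment(author, text):
    # Escape every piece of untrusted data before placing it in an HTML body.
    return "<p><b>{}</b>: {}</p>".format(html.escape(author), html.escape(text))

page = render_comment("mallory", "<script>steal(document.cookie)</script>")
```

The injected tag ends up as inert text instead of running in the visitor's browser. This covers the HTML body context only; attributes, URLs and inline JavaScript each need their own context-specific encoding, which is why the cheat sheet is worth reading in full.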

How does SentinelDB prevent it?

Since data is captured before it even reaches the datastore, SentinelDB does not address this particular type of issue.

Leaked passwords from other websites

Because of password reuse, incorrect storage of passwords on other websites becomes your problem too. Even if you store passwords properly (e.g. using bcrypt/scrypt), a random online store may not, and if your users use the same email and password there, an attacker may try to steal their data from your site. Not all accounts will be compromised, but the more popular your service is, the more accounts will be affected.

How to prevent it?

There’s not much you can do to make other websites store passwords correctly. But you can encourage the use of passphrases, you can encourage 2-factor authentication in case of sensitive data, or you can avoid having passwords at all and use an external OAuth/OpenID provider (this has its own issues, but they may be smaller than those of password reuse). Also have some rate-limiting in place so that a single IP (or an IP range) is not able to try to access many accounts consecutively.
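The rate-limiting part can be as simple as a sliding window of login attempts per IP address. Below is a minimal in-memory sketch; the LoginRateLimiter class, its default limits and the injectable clock are assumptions made for the example, and a multi-server deployment would need a shared store instead of a local dictionary.

```python
import time
from collections import defaultdict, deque

class LoginRateLimiter:
    """Allow at most max_attempts login attempts per IP in a sliding window."""

    def __init__(self, max_attempts=5, window_seconds=60.0, clock=time.monotonic):
        self.max_attempts = max_attempts
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the behaviour can be tested
        self._attempts = defaultdict(deque)

    def allow(self, ip):
        now = self.clock()
        attempts = self._attempts[ip]
        # Drop attempts that have fallen out of the window.
        while attempts and now - attempts[0] >= self.window_seconds:
            attempts.popleft()
        if len(attempts) >= self.max_attempts:
            return False
        attempts.append(now)
        return True

limiter = LoginRateLimiter(max_attempts=5, window_seconds=60.0)
```

The login handler would call limiter.allow(client_ip) before checking credentials and reject the request when it returns False. Attackers who spread attempts across many addresses will not be stopped by this alone, which is why it is listed alongside 2-factor authentication rather than instead of it.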

How does SentinelDB prevent it?

We support 2-factor authentication enrollment. That way at least the most sensitive data can be protected from leaked passwords. We also support custom password rules.

Employees sending emails with unprotected Excel sheets

Non-technical organizations and non-technical employees in particular tend to just want to get their job done, so they may send large Excel sheets with personal data to colleagues or partners in other companies. Then once someone’s email account or server is breached, the data gets breached as well.

How to prevent it?

Have internal procedures against sending personal data in Excel sheets, or at least have people zip them with a password and send the password through a side channel (messenger/SMS). You can also deploy organization-wide software that scans outgoing emails for Excel attachments containing personal data and blocks those emails.
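The content check inside such a scanner can be sketched with a few patterns. The example below looks for email addresses and for card-like numbers that pass the Luhn checksum in text that has already been extracted from an attachment; extracting the text from an Excel file is not shown, and the function names and patterns are assumptions rather than a complete detector.

```python
import re

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# 13 to 19 digits, optionally separated by spaces or hyphens.
CARD_RE = re.compile(r"\b(?:\d[ -]?){13,19}\b")

def luhn_valid(digits):
    """Luhn checksum, used to filter out random digit sequences."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def find_personal_data(text):
    """Return the kinds of personal data that appear to be present in text."""
    findings = []
    if EMAIL_RE.search(text):
        findings.append("email")
    for match in CARD_RE.finditer(text):
        digits = re.sub(r"\D", "", match.group())
        if luhn_valid(digits):
            findings.append("card_number")
            break
    return findings

sheet_text = "name,email,card\nAlice,alice@example.com,4111 1111 1111 1111\n"
findings = find_personal_data(sheet_text)
```

An outgoing email would be held back when the list of findings is not empty. Pattern matching produces both false positives and misses, so a tool like this supports the internal procedures rather than replacing them.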

How does SentinelDB prevent it?

We covered pseudonymization in a previous section, and it can be utilized in this case as well: if you have to allow exports of Excel sheets, make sure that they do not contain identifying data. In SentinelDB this means querying the database with pseudonymization turned on.


Data breaches are prevented by having good information security. But information security is hard, and it’s the right combination of security practices and security products that minimizes the risk of incidents. We think that SentinelDB is one such product that can save a lot of time and effort on covering many aspects of data protection. Feel free to register or to read our documentation to learn more about how we can help you protect personal data and be compliant with GDPR and other data protection regulations.
