Due somewhat to recent political events here in the States, I've been binging on literature revolving around anonymous messaging. Today was Presidents Day here in the US, so it seemed fitting to write up my collected notes.
Fair warning: I'm not involved in anonymity research at all, so this is just a high-level, "outsider's" view of the research. I probably have a lot of the terms and definitions wrong :)
The problem is simple: encryption hides what you say, but with whom and how frequently you communicate is just as important.
E.g. your communication with the HR department may be encrypted, but the fact that you're communicating with HR is potentially just as revealing. If the communication revolves around abuse or leaks or whatever, you may not wish adversaries to know you are communicating with HR at all.
Anonymous messaging comes in a few flavors:
Onion routing networks like Tor. These systems take traffic, encrypt it with multiple layers of asymmetric encryption (onion encrypting), and pass it through relay nodes. At each relay, the node decrypts the contents with its private key and forwards the message to the next node in the circuit.
The successive decrypting means the message contents change at each stage and only the exit node sees the "plaintext". These networks allow relatively low-latency traffic.
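To make the layering concrete, here's a toy sketch in Python. XOR with a hash-derived keystream stands in for the real asymmetric encryption each relay would use (so this is illustration only, not something you'd actually deploy):

```python
import hashlib

def xor_stream(key: bytes, data: bytes) -> bytes:
    # Toy cipher: XOR with a SHA-256-derived keystream. Stands in
    # for the real public-key encryption each relay would use.
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))

def onion_encrypt(message: bytes, relay_keys: list[bytes]) -> bytes:
    # Wrap the message in layers: the innermost layer is for the exit
    # node, the outermost for the entry node.
    for key in reversed(relay_keys):
        message = xor_stream(key, message)
    return message

def relay_decrypt(onion: bytes, key: bytes) -> bytes:
    # Each relay peels exactly one layer with its own key, so the
    # bytes on the wire change at every hop.
    return xor_stream(key, onion)

keys = [b"entry-key", b"middle-key", b"exit-key"]
onion = onion_encrypt(b"hello HR", keys)
for k in keys:  # the onion passes entry -> middle -> exit
    onion = relay_decrypt(onion, k)
# only after the exit node's layer is peeled is the plaintext visible
```

One caveat of the toy: XOR layers commute, whereas real onion encryption forces the layers to be peeled in circuit order.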
Onion networks are susceptible to a number of attacks, though. The major one: if an adversary controls both the entry and exit node, they can correlate traffic timing and volume to de-anonymize you.
Mix networks are similar to onion routing. Messages are encrypted and sent to a mix node. The mix node accumulates a number of messages, shuffles them, then sends them out to a set of peer mix nodes. Those mixers do the same thing: accumulate, shuffle, and send out. Only one "honest" mix is required for the trust model to work, because a single honest mixer will jumble up the messages such that adversaries are no longer sure where they came from.
Mix networks are also susceptible to correlation attacks, as well as various ways to "tag" messages so they can be traced through the honest mixes (replay attacks, delaying messages so they slip into the next epoch, etc.).
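The core mixing behavior is simple enough to sketch: buffer a batch of messages, shuffle, flush. (Toy code; a real mix also cryptographically transforms each message so it's bitwise unlinkable across hops, which I'm skipping here.)

```python
import random

class MixNode:
    """Toy mix: buffer messages, then forward the whole batch in random order."""

    def __init__(self, batch_size: int):
        self.batch_size = batch_size
        self.pool = []

    def receive(self, message):
        # Hold the message until a full batch has accumulated.
        self.pool.append(message)
        if len(self.pool) >= self.batch_size:
            return self.flush()
        return None

    def flush(self):
        # Shuffling breaks the link between arrival order and
        # departure order -- this is the "jumbling" an honest mix does.
        batch, self.pool = self.pool, []
        random.shuffle(batch)
        return batch
```

Even an adversary watching both sides of a single honest mix can't match arrival order to departure order, which is why one honest node in the chain is enough.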
Noise-injecting networks. More like a sub-category of mix networks, these inject noise (cover traffic) or provide some other type of masking behavior to slow down correlation attacks. E.g. each additional message between two clients adds a small amount of statistical information which, over time, allows decloaking them, but the injected noise slows down the process and gives plausible deniability.
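A minimal sketch of the cover-traffic idea: the client transmits one fixed-size message every tick, real if one is queued, otherwise a random dummy. To an outside observer the send rate and message size never change. (Assumes dummies are indistinguishable from real ciphertexts, which real systems achieve with encryption.)

```python
import random
from collections import deque

MESSAGE_SIZE = 256  # every message on the wire is padded to this size

class CoverTrafficClient:
    """Sends exactly one message per tick: real if queued, dummy otherwise."""

    def __init__(self):
        self.queue = deque()

    def send_later(self, message: bytes):
        self.queue.append(message)

    def tick(self) -> bytes:
        if self.queue:
            # Pad real messages to the fixed size so length leaks nothing.
            return self.queue.popleft().ljust(MESSAGE_SIZE, b"\0")
        # No real traffic: emit noise so the observed rate stays constant.
        return random.randbytes(MESSAGE_SIZE)
```

The cost is obvious: constant bandwidth use whether or not anyone is talking, which is part of why these systems are expensive to run.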
Computationally secure networks.
These use some kind of computational primitive to guarantee anonymity. One category leverages the Dining Cryptographers problem: all clients in a DC-net publish a message each round, and the round's output can only be recovered once all the published messages are accumulated and combined. It is impossible to determine who sent which message; the best an observer can tell is that a client participated in that round of broadcasting.
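The classic XOR-based DC-net round is small enough to show in full. Every pair of participants shares a secret pad; each participant broadcasts the XOR of all its pads, and the sender additionally XORs in the message. Because every pad appears in exactly two broadcasts, combining all broadcasts cancels the pads and reveals only the message:

```python
import secrets

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def dc_net_round(n: int, sender: int, message: bytes) -> bytes:
    # Each pair (i, j) of participants shares a random secret pad.
    pads = {(i, j): secrets.token_bytes(len(message))
            for i in range(n) for j in range(i + 1, n)}

    broadcasts = []
    for i in range(n):
        # Each participant XORs together every pad it shares with others.
        share = bytes(len(message))
        for (a, b), pad in pads.items():
            if i in (a, b):
                share = xor_bytes(share, pad)
        if i == sender:
            # The sender also folds in the actual message.
            share = xor_bytes(share, message)
        broadcasts.append(share)

    # Every pad appears exactly twice, so XOR-ing all n broadcasts
    # cancels the pads: only the message survives, with no trace of
    # which participant contributed it.
    result = bytes(len(message))
    for b in broadcasts:
        result = xor_bytes(result, b)
    return result
```

Note the scaling problem hiding in plain sight: every participant must publish every round, and pads are pairwise, so the work grows quadratically with the group.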
Another class uses Private Information Retrieval, a crypto primitive that allows a client to request data from a database without the database knowing what the client requested. If clients are reading/writing to "shared mailboxes", the untrusted servers cannot tell which mailboxes are being accessed, so the metadata is protected.
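The simplest flavor to illustrate is two-server XOR PIR: the client sends one server a random bitmask, and the other server the same mask with the wanted index flipped. Each server XORs together the records its mask selects; neither mask alone reveals anything, but XOR-ing the two answers cancels everything except the wanted record. (Toy sketch; it assumes the two servers don't collude and that all records are the same length.)

```python
import secrets

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def server_answer(database: list[bytes], mask: list[int]) -> bytes:
    # The server XORs every record whose mask bit is set. The mask it
    # sees is (or is one bit away from) uniformly random noise.
    ans = bytes(len(database[0]))
    for record, bit in zip(database, mask):
        if bit:
            ans = xor_bytes(ans, record)
    return ans

def pir_query(database: list[bytes], index: int) -> bytes:
    n = len(database)
    mask_a = [secrets.randbelow(2) for _ in range(n)]
    mask_b = mask_a.copy()
    mask_b[index] ^= 1  # the two masks differ only at the wanted index
    a = server_answer(database, mask_a)  # sent to server A
    b = server_answer(database, mask_b)  # sent to server B
    # All records selected by both masks cancel; only record[index] remains.
    return xor_bytes(a, b)
```

Note that each server still touches every record per query, which hints at why PIR-based systems struggle to scale.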
These approaches offer very strong anonymity and often require only one honest player in the entire system, but use expensive crypto primitives and do not scale well. They are also susceptible to clients disconnecting, which allows correlation attacks, or DDOS attacks which bring the whole system down.
Broadcast networks. These offer strong anonymity for the receiver, less for the sender. Basically, if you broadcast a message to the entire network, every node receives it, so an observer cannot tell who it was actually meant for, which protects the receiver.
Smarter derivations use analogies like buses traveling around the network (where the seats contain encrypted messages), taxis, trains, etc. Some use pre-planned routes while others are random.
These offer strong anonymity, but are difficult to scale due to the network traffic requirements.
Anonymity is hard. :)
Every system has its trade-offs: high vs. low latency, scalable or not, computationally expensive or cheap, etc. All methods have their attack vectors. And all require some amount of user "buy-in" to be effective (e.g. systems that utilize rounds are useless if only two people are speaking each round), which means all of these need a critical mass of user participation to be useful.
Seeing as the average citizen still views anonymity and whistleblowing/leaking with disdain, it's likely that a sub-par system will triumph over a more secure one if it can appeal to users. In some sense, this describes Tor: not a perfect system at all, but it has enough critical mass that it is still useful (albeit at a very small scale compared to something like Twitter).