Website Fingerprinting Defenses for Tor Hidden Services

We have recently presented ALPaCA, the first server-side defense against website fingerprinting attacks. The paper, titled “Website Fingerprinting Defenses at the Application Layer”, will appear in PoPETs 2017 and is the result of a collaboration between University College London, Royal Holloway and COSIC.

Website fingerprinting is a traffic analysis attack that allows anybody who is able to observe your network traffic to infer which web pages you are visiting, even if your communications are encrypted and anonymised. Web traffic is exposed to a wide range of entities as it travels through the Internet: from the IT admins of your local network, to Internet Service Providers (ISPs) and Autonomous Systems that route the network traffic to its destination. All these intermediaries are in a position to deploy website fingerprinting attacks.

In the case of traffic encrypted with HTTPS, the protocol does not hide the IP address of the web server that the victim is browsing, so the adversary knows the site even without knowing exactly which page was accessed. To find which specific page the user is visiting, the adversary downloads all the pages within the site and builds a database of traffic templates from distinguishing features of their metadata, such as the timestamps, direction and size in bytes of the transferred network packets. The adversary then records the victim’s (encrypted) traffic and compares it to the traffic templates in the database. The adversary infers that the visited page is the one whose template looks most similar to the victim’s traffic.
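The template-matching step described above can be sketched in a few lines of Python. This is only a toy illustration, not the classifier used in any of the cited attacks: the trace format (a list of timestamp and signed-size pairs), the four features, the Euclidean distance, and the names `features` and `closest_page` are all assumptions made for the example.

```python
from math import dist

# Assumed trace format (hypothetical): a list of (timestamp_seconds, size_bytes)
# tuples, where a positive size is an outgoing packet and a negative size is
# an incoming one.

def features(trace):
    """Summarise a trace as (bytes out, bytes in, packet count, duration)."""
    out_bytes = sum(size for _, size in trace if size > 0)
    in_bytes = sum(-size for _, size in trace if size < 0)
    duration = trace[-1][0] - trace[0][0] if trace else 0.0
    return (out_bytes, in_bytes, len(trace), duration)

def closest_page(templates, victim_trace):
    """Return the page whose template is nearest to the victim's trace."""
    target = features(victim_trace)
    return min(templates, key=lambda page: dist(templates[page], target))

# The adversary's database, built by downloading each page of the site.
templates = {
    "/index.html": features([(0.00, 500), (0.05, -1500), (0.06, -1500), (0.07, -900)]),
    "/contact.html": features([(0.00, 480), (0.04, -1200)]),
}

# The recorded (encrypted) traffic of the victim: only metadata is used.
victim = [(0.00, 510), (0.05, -1500), (0.07, -1500), (0.09, -800)]
guess = closest_page(templates, victim)
```

Real attacks use far richer features and machine-learning classifiers, but the principle is the same: the payload is never decrypted, only its metadata is compared.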

The attack exploits traffic patterns that are unique to individual web pages. Each page within a site has different amounts of text, images of different sizes and other embedded content such as scripts and styles. For instance, if a page has large high-resolution images and lots of text, the total volume of the transmission can already distinguish this page from others. Several studies have shown that fingerprinting pages within a single website is very effective [1, 2].

Website fingerprinting is hampered if the victim uses Tor [3]. Tor is an anonymous communication system that routes communications through multiple proxies, hiding the web server’s identity from local eavesdroppers. Since the adversary does not know the website IP address, the search space cannot be narrowed down to one single website, thus forcing the adversary to identify a website within the set of all possible websites. Given that there are billions of websites (even more pages), it is not surprising that the attack does not scale to the size of the Web [4].


However, recent studies have shown that hidden services (HS), which are anonymous web servers hosted in the Tor network, may be vulnerable to this attack. It has been shown that an adversary who controls the user’s ISP or the Tor entry node can distinguish between visits to hidden services and normal websites [5]. Since the number of HS websites is orders of magnitude smaller than the total number of websites, the adversary can substantially reduce the set of pages that have to be downloaded. At the same time, website fingerprinting is especially threatening for users of hidden services, as the anonymity provided by the Tor network makes them particularly suited for hosting sensitive content [6].

ALPaCA is a website fingerprinting countermeasure specifically designed for hidden services [7]. Its basic operation is to add random padding to change the observed size of the pages and their embedded resources. This modifies the fingerprint of the page so that it cannot be recognised by the adversary. To do so without altering the appearance of the website when displayed to the user, our defense uses various simple techniques such as padding the metadata section of images or adding HTML comments in the page sources.
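As a rough illustration of the HTML-comment technique, the sketch below appends a comment of random length to a page so that its size on the wire changes between requests. It is not taken from the ALPaCA source code: the function name `pad_html`, the `max_filler` parameter and the choice of a uniformly random padding length are assumptions for the example, and the paper [7] describes how the defense actually chooses target sizes.

```python
import secrets

COMMENT_OPEN = b"<!--"
COMMENT_CLOSE = b"-->"

def pad_html(html: bytes, max_filler: int = 4096) -> bytes:
    """Append an HTML comment holding between 0 and max_filler filler bytes.

    Browsers ignore comments, so the rendered page is unchanged while the
    size of the transferred document varies from one response to the next.
    """
    n = secrets.randbelow(max_filler + 1)
    # Random hex characters are used as filler: they never contain "--",
    # and they compress worse than a run of identical bytes would.
    filler = secrets.token_hex((n + 1) // 2)[:n].encode("ascii")
    return html + COMMENT_OPEN + filler + COMMENT_CLOSE

page = b"<html><body>Hello</body></html>"
padded = pad_html(page, max_filler=256)
```

The padded page grows by 7 to 263 bytes here (the comment delimiters plus the filler). The same idea applies to images, where the padding goes into the metadata section instead of a comment.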

ALPaCA is a server-side defense, the first of its kind, that we hope will be implemented by hidden service operators to protect the anonymity of their users. We are currently in contact with the Freedom of the Press Foundation to help implement ALPaCA in SecureDrop [8], their hidden-service tool that allows journalists to communicate safely with their sources.

The source code of ALPaCA is available for download on GitHub.

We have also set up a hidden service that is running ALPaCA for demonstration purposes (you will need Tor to access it): http://3tmaadslguc72xc2.onion.

Acknowledgements

Thanks to Claudia Diaz, Gunes Acar and Dana Brouckmans for their help in the writing of this blog post.

References

[1] Miller, B., Huang, L., Joseph, A. D., and Tygar, J. D. I know why you went to the clinic: risks and realization of HTTPS traffic analysis. In Privacy Enhancing Technologies Symposium (PETS), pp. 143-163, Springer International Publishing, 2014.

[2] Chen, S., Wang, R., Wang, X., and Zhang, K. Side-channel leaks in web applications: A reality today, a challenge tomorrow. In 2010 IEEE Symposium on Security and Privacy, pp. 191-206, IEEE, 2010.

[3] https://www.torproject.org

[4] Panchenko, A., Lanze, F., Zinnen, A., Henze, M., Pennekamp, J., Wehrle, K., and Engel, T. Website Fingerprinting at Internet Scale. In Proceedings of the 23rd Internet Society (ISOC) Network and Distributed System Security Symposium (NDSS), pp. 1-15, IEEE Computer Society, 2016.

[5] Kwon, A., AlSabah, M., Lazar, D., Dacier, M., and Devadas, S. Circuit fingerprinting attacks: Passive deanonymization of Tor hidden services. In 24th USENIX Security Symposium (USENIX), pp. 287-302, USENIX Association, 2015.

[6] https://en.wikipedia.org/wiki/List_of_Tor_hidden_services#News.2C_whistleblowing_and_archives_of_document_archives

[7] http://homes.esat.kuleuven.be/~mjuarezm/index_files/pdf/pets17.pdf

[8] https://securedrop.org

Thanks to Marc Juarez for writing this blog post!