Mining the Dark Web: Drugs and Fake IDs
Andres Baravalle
Mauro Sanchez Lopez

Outline

Synopsis and introduction
Surface web, deep web and dark web
Dark markets
Going undercover in Agora
Results!
What now?

Synopsis

In recent years, governmental bodies have been trying, largely in vain, to fight marketplaces hosted on the dark web. Shortly after the closing of The Silk Road by the FBI and Europol in 2013, successors were established. Through the combination of cryptocurrencies and nonstandard communication protocols and tools, agents can trade anonymously in a marketplace for illegal items without leaving any record.

Research was carried out to gain insights into the products and services sold within Agora, one of the larger marketplaces for drugs, fake IDs and weapons on the Internet, and into new developments after the demise of Agora.

Timeline

April 2015: Inception & funding request
June 2015 - September 2015: Data collection

September 2015 - April 2016: Data analysis
July - August 2016: Writing up
September 2016: Press release, and front page on the Time!

The team:

Dr Andres Baravalle, lead researcher
Dr Sin Wee Lee, researcher
Germans Zaharovs, research intern (data collection)
Mauro Sanchez Lopez, final year project (data analysis)

Media reception

Surface web, deep web and dark web

Research on the size of the Internet shows that it reached 1.05 billion hosts in early 2016; about 3.5 billion users now have access to the Internet.
The surface web includes resources indexed by search engines and made publicly available.

Despite the efforts of search engines to index more content, some of the content available on the Internet is still not indexed. That's what we call the deep web.
Bergman (2001) estimated the deep web to be 400 to 550 times larger than the surface web.
Under the deep web, we can find the dark web, the back alley of the Internet.

The dark web - a definition

We can define the Dark Web as "a collection of websites that are publicly visible, but hide the IP addresses of the servers that run them" (Egan, 2016).
These web sites can be visited by users, but it is hard to identify where they are hosted and who hosts them. They are hidden behind encryption protocols, typically either Tor (The Onion Router) or I2P (Invisible Internet Project).
While the expression "dark web" as we intend it today is relatively recent, the concepts around the dark web have been under investigation since the early 2000s. The concept comes up, for example, in several works by Chen, H. et al. around a "Terrorist Knowledge Portal" (cited in Oman, 2004).

Dark web - crypto currencies and anonymized access

The Dark Web usually relies on the combination of cryptocurrencies such as Bitcoin and anonymized access as the foundations for creating a marketplace for illegal drugs, weapons and other contraband.
In recent years, the Dark Web has been under intense scrutiny and investigation by legal authorities around the globe.
2015 estimates put the size of the dark web at 7,000-30,000 sites.

Dark web markets timeline: Silk Road and Post Silk Road eras

February 2011 - February 2013: The Silk Road. Considered the first black market e-commerce platform hosted on the Dark Web. Any user could register anonymously to buy or sell goods, with Bitcoin as the currency.
February 2013: FBI and Interpol operation against The Silk Road.
February 2013 - November 2014: Post Silk Road era. Several marketplaces, amongst which were Evolution, Hydra and The Silk Road 2.0.
November 2014: Europol and FBI seize the vast majority of them during Operation Onymous.
The Silk Road and Post Silk Road eras are characterised by the fact that the police managed to shut down the markets.

Frosty's got a problem with his PHP code

The rise of Agora and the customer is always right

February 2014 - September 2015: The rise of Agora. One platform remained after Operation Onymous: Agora. With no competition ahead, Agora became king of the Dark Net. Agora closed possibly because of vulnerabilities in Tor (or not).
September 2015 - now: The customer is always right. 90+ markets. Alphabay supports reputation, multisig transactions, coin tumbling and Monero, and it's nearly 20 times the size of Agora at its best.

What was Agora?

Agora was a portal selling both products and services, with a minimal set of rules.
At the time of our research the only items that couldn't be sold were body parts, and the only service that was forbidden was assassination. In the final weeks (and before we completed our spidering), weapons were also forbidden.
Agora changed host and domain name several times over its almost two years of existence, in an attempt to avoid cyber-crime law enforcement.
One of the instances of this marketplace is the subject of our work (agorahooawayyfoe.onion).

Privacy and money

As for all black market operations, operations on Agora were not taxed, neither directly nor indirectly.
Agora offered sellers the possibility to list products that could not typically be sold legally.
The key aspects of Agora are largely similar to those of other illegal operations: privacy protection, exchange of money, illicit profits.

Mining data from Agora

Agora was invite-only, so access to the marketplace required first of all digging for an invite.

Then we had to run several sessions on the web site, to be able to inspect the interaction with it.
Finally, we were able to create human-like sessions with our software and proceed with the data collection.
The application has been built on a classic LAMP (Linux, Apache, MySQL, PHP) stack for data collection, with a variety of languages for data analysis.
Tor proxy running; thanks to Frosty (Silk Road) for some hints!
The miner was developed using command line PHP (and the cURL library) and an object oriented approach, using MySQL as a backend.
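To make the storage side of such a miner concrete, here is a minimal sketch, not our actual code. It assumes Python in place of PHP and the standard-library sqlite3 module standing in for MySQL; the table name, column names and the helper functions open_store and save_listing are all hypothetical.

```python
import sqlite3

# Hypothetical schema: one row per product listing, keyed by its URL.
SCHEMA = """
CREATE TABLE IF NOT EXISTS listing (
    url        TEXT PRIMARY KEY,
    seller     TEXT NOT NULL,
    category   TEXT NOT NULL,
    title      TEXT NOT NULL,
    price_btc  REAL NOT NULL,
    ships_from TEXT
)
"""


def open_store(path=":memory:"):
    """Open (or create) the listing store and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def save_listing(conn, listing):
    """Insert or overwrite one listing, so repeated crawls do not duplicate rows."""
    conn.execute(
        "INSERT OR REPLACE INTO listing "
        "(url, seller, category, title, price_btc, ships_from) "
        "VALUES (:url, :seller, :category, :title, :price_btc, :ships_from)",
        listing,
    )
    conn.commit()
```

Keying rows on the listing URL is one simple way to let the spider revisit the same page across sessions while keeping only the latest price.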

Security protections - what Agora could have done

Protecting their business model in general, and their assets specifically, is something that Agora's team very much considered, but the techniques used were not advanced, nor did they show awareness of the developments of the last few years.
There is extensive research on techniques to discourage web scraping; the most common ones include:

Turing tests
User-agent identification
Throttling of HTTP requests
Obfuscation
Data tainting
Injecting markers
Network traffic analysis
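As an illustration of one item in the list above, request throttling, here is a minimal token-bucket sketch in Python. It is a generic textbook technique, not something Agora is known to have run; the class name, the rate and capacity parameters and the injectable clock are assumptions made for the example.

```python
import time


class TokenBucket:
    """Allow `rate` requests per second on average, with bursts up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = float(rate)          # tokens added per second
        self.capacity = float(capacity)  # maximum burst size
        self.clock = clock               # injectable so the bucket can be tested
        self.tokens = float(capacity)    # start with a full bucket
        self.last = clock()

    def allow(self):
        """Return True if the request may proceed, False if it should be rejected."""
        now = self.clock()
        elapsed = now - self.last
        self.last = now
        # Refill in proportion to the time elapsed, never above capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

A server would typically keep one bucket per session or per client and reject, delay or challenge requests for which allow() returns False.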

And what Agora did

Turing tests (CAPTCHA) and user-agent identification were implemented at the time we started our work.
Network traffic analysis was most likely introduced later.
In time, the web site administrators might have realized that data mining was in progress, as extra layers of protection were added: geolocation, session expiration and session management were added after we started the monitoring and before the closure.
What could they have done? Much more.

Dilbert vs Agora

Around 2000 I developed software to spider Dilbert's web site (and then a few hundred others), to automatically download the daily comic strip.
To some extent, the anti-spidering protection on Dilbert's web site was more advanced.

Analysing the data

The analysis of the data has been carried out with several tools, including Weka and ad-hoc Java and Python scripts.

Libraries such as Pandas, NumPy, NLTK and Matplotlib have been used for the analysis, integrated within a Jupyter notebook.

What did we find?

Over 30,000 products on sale, mostly drugs and IDs, worth at least 170691.12 bitcoins (26 million).
A staggering 1,233 sellers spread across 20 countries, with the largest number located in the USA and UK.
90% of the market was dominated by the largest 10% of sellers, with 80% of the market share going to the selling and purchase of drugs.

The highest number of drug sellers were from the USA (388), Australia (138) and the UK (137), while the top countries by market size were Germany (7.8 million), USA (6.06 million) and Netherlands (2.9 million).

80% of Agora was drugs

80% of the market was drugs.
One seller, RADICALRX, was offering a cache of 10 million pounds worth of drugs, including Hydromorphone, Oxycodone, Fentanyl and Meth.
A US-based seller, HonestCocaine, boasted 1.24 million worth of cocaine for sale.

Geographical distribution

The drugs market is dominated by suppliers from the US and UK, while sellers from China live up to the stereotype and focus on watches and clothing (most likely counterfeit products).

Counterfeit documents

The total size of the market was ~3,700 bitcoins, about 650,000 at the time of our research (~2.6% of the market).
During our research, 84 scans and photos of passports were on sale, with 12 physical passports also being offered.
A physical UK passport can be bought for as little as 752, while scanned passports can be purchased for as little as 7, and can be bought in bulk.
Counterfeit identity cards can be bought for as little as 142 for a European ID card, and even cheaper for US state ID cards, with prices ranging between 25 and 92.

Counterfeit documents - driving licenses

US driving licenses ranged between 51-300; prices for European driving licenses were slightly higher, up to 419. More impressively, in one of the listings, the vendor claimed that the license sold would be registered officially.

Organised crime

We wanted to understand first of all how concentrated the supply was among the different vendors, and then whether there were any patterns suggesting that the supply was operated by well-coordinated organizations instead of individuals.
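One simple way to put a number on concentration is the share of total listed value held by the largest vendors. The sketch below is an illustration in plain Python, not our notebook code; the function names and the (seller, price) input shape are assumptions.

```python
from collections import defaultdict


def value_by_seller(listings):
    """Sum the listed value per seller from (seller, price_btc) pairs."""
    totals = defaultdict(float)
    for seller, price_btc in listings:
        totals[seller] += price_btc
    return dict(totals)


def top_share(totals, top_fraction=0.10):
    """Fraction of total value held by the largest `top_fraction` of sellers."""
    values = sorted(totals.values(), reverse=True)
    total = sum(values)
    if total == 0:
        return 0.0
    # Always count at least one seller, even in a very small market.
    n_top = max(1, int(len(values) * top_fraction))
    return sum(values[:n_top]) / total
```

A value close to 1.0 for a small top_fraction means that a handful of vendors account for almost everything on sale.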

Over 90% of the market is dominated by the largest 10% of vendors.
When looking at the hashish category, the mean amount on sale is 47g, with a median of 10g, but with some sellers selling up to 1 kg at a time. This is a reasonable indicator that organized crime is involved.
Finally, our research indicates that there was some use of sockpuppets, and we want to look at this in more depth.
How do we know? Image analysis, for starters, but also NLP analysis (to complete).

Organised crime - not teenagers in basements

Entities such as RADICALRX have over 10 million dollars of product on sale on Agora over the time of our study.
This is hardly teenagers in basements: the scale is that of organized crime.

How did it start?

About 18 months ago I went to a data science workshop organised by Outreach Digital, 3 Steps To Growth Hacking with Data (using Amongst the stuff she presented, was some research by her colleagues at, relating to the contribution of prostitution to the UK's economy.
Andrew Fogg presented this work at Data Summit in San Francisco.
According to Andrew Fogg, the estimates of the Office for National Statistics in the UK (5.314bn, 0.4% of the GDP) are completely off the mark. His estimate is that the contribution is really closer to 0.6% of the GDP, the difference being due to methodological errors in the government analysis and to the fact that they didn't count male prostitution.
That's when I decided that I was going to look at drugs!

Conclusions

Over 170691.12 bitcoins (about 26 million) of merchandise were on sale in the period under examination.
Over 30,000 products were on sale; 1,233 sellers participated in the market, spread across 20 countries, with the largest number located in the US and UK.
Drugs, IDs and also weapons were readily available in a trans-national marketplace, anonymously and just one click away.
When it comes to counterfeit documents, any EU ID card would allow the potential buyer to travel through any country in the EU, open bank accounts and in general create a new identity for himself/herself.
We didn't manage to collect any data on weapons, as they were removed from the market early on.
Black market services are working very cautiously, implementing security measures and hacker avoidance updates regularly. They are largely dominated by organized crime, and they keep resurfacing regardless of the efforts made to shut them down.

What's next?

A more generalised architecture
The other 20%
Sharing the data
Legal highs: surface web and dark web
The role of organised crime
What's new in the dark web?
Looking at other datasets

