A couple of days ago, I was sitting at a particularly good Italian restaurant near Pier 39 in San Francisco with a friend. It was a typical day in the city, a tumultuous blend of fog, chilly winds and the general pulse that gives San Francisco its character. As we ate some excellent calamari and risotto, our conversation ranged over some very interesting developments in the Bay Area's startup scene. We were talking about the various ideas being thrown around, analyzing their relative pros and cons, and somehow the conversation turned to national security and the Internet's role in weakening it. Which is when the unbelievable, yet completely plausible, statement tumbled out.
The NSA in the US is one of the most, if not THE most, sophisticated spy agencies in the world. And like any other organization, it has reasons to use the Internet, not only to gather information but also to communicate with agents and field offices around the world. What is interesting is that NSA employees searching for information on any of the major search engines, Google, Yahoo or what have you, must leave traces of their originating Internet addresses with those services. Anyone with access to those logs could easily piece together what was being searched for. But it is inconceivable that an organization like this would be amateur enough to let that happen. So how would they solve it?
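To make the leak concrete: a search engine's front end typically logs the client's IP address alongside the full request URL, query terms included. The sketch below is purely illustrative, assuming a generic access-log format with made-up addresses and a made-up query, but it shows how little work it takes for whoever holds those logs to reconstruct who searched for what.

```python
import re
from urllib.parse import urlparse, parse_qs

# A hypothetical access-log line of the kind a search engine's front end
# might record: client IP, timestamp, and the request URL with the query
# terms embedded in it. The IP and query here are placeholders.
log_line = (
    '203.0.113.7 - - [12/May/2008:14:03:11 -0700] '
    '"GET /search?q=example+sensitive+query HTTP/1.1" 200 5123'
)

def extract_query(line: str):
    """Pull the client IP and the search terms out of one log line."""
    match = re.match(r'(\S+) .*"GET (\S+) HTTP', line)
    if not match:
        return None
    ip, path = match.groups()
    terms = parse_qs(urlparse(path).query).get("q", [""])[0]
    return ip, terms

print(extract_query(log_line))
# ('203.0.113.7', 'example sensitive query')
```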
The answer: replicate the Internet. That statement may contain three words, but it is probably one of the very few today that non-trivially encapsulates millions of hours of work, not to mention enormous costs. Which led me to the inevitable question: does even the NSA, with its huge chunk of the US defence budget, have the resources to carry off what would arguably be the biggest project in modern technological times? I thought I'd do some research to see if this was even possible. The largest information repository in the world today, publicly known anyway, is the one at Google, whose search engine is the most effective at piecing together even those parts of the web that are only remotely connected to the rest. One option the NSA has is to maintain a mirror of sorts of the Google index on a local intranet, and then search that index locally to locate the resource needed. This way, their addresses show up only at the particular resources they actually visit, and even those visits can be covered by anonymizing hacks and other technology built for exactly that purpose. This is of course assuming they don't already have a wide array of IP addresses meant to throw off any tracking agents for exactly this reason!
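A rough sketch of that "mirror the index" idea: the index lives entirely on the intranet as a mapping from terms to URLs, and a query is resolved locally, so no packet leaves the building until an analyst decides to visit one specific page. Every URL, document and name below is a placeholder I've made up for illustration, not anything the NSA is known to run.

```python
from collections import defaultdict

# A toy "mirrored" index: a mapping from terms to the URLs that contain
# them, built ahead of time from crawled copies stored on the intranet.
mirrored_index = defaultdict(set)

crawled_copies = {
    "http://example.org/harbor-schedules": "harbor schedules and shipping manifests",
    "http://example.net/conference-notes": "conference notes on satellite imagery",
}

for url, text in crawled_copies.items():
    for term in text.lower().split():
        mirrored_index[term].add(url)

def local_search(query: str):
    """Resolve a query entirely against the local mirror; only the final
    fetch of a chosen URL would ever touch the public Internet."""
    hits = [mirrored_index[t] for t in query.lower().split() if t in mirrored_index]
    return set.intersection(*hits) if hits else set()

print(local_search("shipping manifests"))
# {'http://example.org/harbor-schedules'}
```

The point of the design is simply that the expensive, revealing step (the search) happens against a copy you already own, while the cheap, unavoidable step (fetching one page) is the only thing left to disguise.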
The most impressive method, of course, turns out to be replicating the web itself. Given their resourcefulness, replicating each of the 13 DNS root server identities, compressing all the data on the known web into a single database, and then developing the technology to search it in real time would be the most effective approach. The ramifications of such an act are staggering. Not only would they have the largest storehouse of human knowledge ever to exist at their fingertips, they would probably be the only ones who could come close to, or surpass, the kind of work Google is doing right now. Which is saying a lot.
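For the "compress it all into one database and search it in real time" part, here is a minimal sketch of the shape such a system might take, using SQLite's FTS5 full-text extension plus zlib compression. These tool choices are my own assumptions for illustration; the actual engineering problem is doing this at the scale of the entire web, which this toy deliberately ignores.

```python
import sqlite3
import zlib

# Sketch: page bodies stored zlib-compressed, with a full-text index over
# the extracted text so lookups stay interactive. Requires an SQLite build
# with FTS5 (the one bundled with recent Python releases typically has it).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE pages (url TEXT PRIMARY KEY, body BLOB)")
db.execute("CREATE VIRTUAL TABLE page_text USING fts5(url, body_text)")

def store_page(url: str, text: str) -> None:
    db.execute("INSERT INTO pages VALUES (?, ?)", (url, zlib.compress(text.encode())))
    db.execute("INSERT INTO page_text VALUES (?, ?)", (url, text))

def search(query: str):
    return [row[0] for row in
            db.execute("SELECT url FROM page_text WHERE page_text MATCH ?", (query,))]

# Placeholder content, not real data.
store_page("http://example.com/root-zone-notes", "notes on the dns root zone")
print(search("root zone"))   # ['http://example.com/root-zone-notes']

# The compressed copy can be recovered locally without touching the live web.
url = search("root zone")[0]
blob = db.execute("SELECT body FROM pages WHERE url = ?", (url,)).fetchone()[0]
print(zlib.decompress(blob).decode())
```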