DARPA, the government agency that brought us the Internet, is developing a powerful new search engine that is shedding light on the contents of the so-called deep Web.
Memex, which is being developed by 17 different contractor teams, aims to build a better map of Internet content and uncover patterns in online data that could help law enforcement officers and others. Early trials have focused on mapping the movements of human traffickers, but the technology could one day be applied to other efforts, such as counterterrorism, missing persons cases, disease response, and disaster relief.
Dan Kaufman, director of DARPA's Information Innovation Office, says Memex is all about making the unseen seen. "The Internet is much, much bigger than people think," DARPA program manager Chris White told "60 Minutes." "By some estimates Google, Microsoft Bing, and Yahoo only give us access to around 5 percent of the content on the Web."
Google and Bing rank their results largely by popularity, but Memex searches content that commercial search engines typically ignore, such as unstructured data, unlinked content, temporary pages that are removed before commercial crawlers can reach them, and chat forums. Regular search engines ignore this deep Web data because Web advertisers, the source of search companies' revenue, have no interest in it.
Memex also automates crawling of the dark, or anonymous, Web, where criminals conduct business. These hidden-service pages, accessible only through the Tor anonymizing network, often sell illicit drugs and other contraband while operating under the radar of law enforcement. Whereas dark Web activity was once thought to consist of 1,000 or so pages, White told Scientific American that there could be between 30,000 and 40,000 dark Web pages.
Until now, it has been hard to examine these sites in any systematic way. But Memex, which Manhattan District Attorney Cyrus Vance Jr. calls "Google search on steroids," not only indexes their content but also analyzes it to uncover hidden relationships that could be useful to law enforcement.
DARPA introduced its search tools to select law enforcement agencies last year, including the Manhattan District Attorney's new Human Trafficking Response Unit. The unit now uses Memex in every human trafficking case it pursues, and the tool has played a role in generating at least 20 sex trafficking investigations. The supercharged Web crawler can identify relationships among different pieces of data and produce data maps that help investigators detect patterns.
In a demo for "60 Minutes," White showed how Memex is able to track the movement of traffickers based on data related to online advertisements for sex. "Sometimes it's a function of IP address, but sometimes it's a function of a phone number or address in the ad or the geolocation of a device that posted the ad," White said. "There are sometimes other artifacts that contribute to location."
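White's description hints at how scattered ads can be tied together: ads that share a phone number or an IP address plausibly come from the same poster, and ordering those linked ads by posting date traces a rough route. DARPA has not published how Memex actually does this, so the Python sketch here is only a hypothetical illustration built on a standard union-find (disjoint-set) grouping. The `Ad` fields, the `group_ads` and `movement_trail` functions, and the sample data are all invented for the example.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional

# Hypothetical ad record; these fields are illustrative, not Memex's schema.
@dataclass(frozen=True)
class Ad:
    ad_id: str
    phone: Optional[str]
    ip: Optional[str]
    city: str
    posted_day: int  # days since some arbitrary reference date

def group_ads(ads):
    """Union-find: ads that share any phone number or IP end up in one group."""
    parent = {ad.ad_id: ad.ad_id for ad in ads}

    def find(x):
        # Walk up to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    first_seen = {}  # (kind, value) -> first ad_id carrying that identifier
    for ad in ads:
        for key in (("phone", ad.phone), ("ip", ad.ip)):
            if key[1] is None:
                continue  # this ad lacks that identifier
            if key in first_seen:
                union(first_seen[key], ad.ad_id)
            else:
                first_seen[key] = ad.ad_id

    groups = defaultdict(list)
    for ad in ads:
        groups[find(ad.ad_id)].append(ad)
    return list(groups.values())

def movement_trail(group):
    """Cities in posting order: a rough picture of where a poster moved."""
    return [ad.city for ad in sorted(group, key=lambda ad: ad.posted_day)]

# Made-up sample data (555-01xx numbers and a private IP address).
ads = [
    Ad("a1", "555-0101", None, "Newark", 1),
    Ad("a2", "555-0101", "10.0.0.7", "Philadelphia", 3),
    Ad("a3", None, "10.0.0.7", "Baltimore", 6),
    Ad("a4", "555-0199", None, "Boston", 2),
]

for group in group_ads(ads):
    print([ad.ad_id for ad in group], movement_trail(group))
```

Here ads a1 and a3 share no identifier directly, yet they land in the same group because a2 links them (by phone to a1, by IP to a3), which yields the trail Newark, Philadelphia, Baltimore. Matching on exact values is the simplest possible choice; a real system would presumably also need to normalize phone formats and cope with shared or spoofed IP addresses.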
White emphasized that Memex does not resort to hacking in order to retrieve information. "If something is password protected, it is not public content and Memex does not search it," he told Scientific American. "We didn't want to cloud this work unnecessarily by dragging in the specter of snooping and surveillance" -- a touchy subject after Edward Snowden's NSA revelations.
Memex got its name (a combination of "memory" and "index") and inspiration from a hypothetical device described by Vannevar Bush in 1945 that presaged the invention of PCs, the Internet, and other major IT advances of the next 70 years. Now DARPA and Memex seem set to bring us one step closer to the futuristic police department of Philip K. Dick's "Minority Report."
A new round of testing, set to begin in a few weeks, will include federal and district prosecutors, regional and national law enforcement, and multiple NGOs. According to the Scientific American report, it aims to "test new image search capabilities that can analyze photos even when portions that might aid investigators -- including traffickers' faces or a television screen in the background -- are obfuscated."
DARPA hopes to invent better ways of interacting with and presenting information gathered from a larger pool of sources. "We want to improve search for everybody. Ease of use for nonprogrammers is essential," White said.