AOL has apparently released details of Internet searches performed over a period of three months by hundreds of thousands
of its subscribers, raising privacy concerns.
The data, apparently made available for research purposes, is no longer available at the Web site http://research.aol.com, but details of it were cited by technology blog TechCrunch, and the page linking to it was cached by Google's
search engine.
The cached copy of the page said the data comprised about 19 million Web searches performed by 658,000 users from March through
May. The page warned of sexually explicit language in some of the queries, and said of the data, "This collection is distributed
for noncommercial research use only." The page contained a link to a compressed copy of the data archive.
The page asked researchers using the data to cite a research paper entitled "A Picture of Search" based on the data, which
names two AOL employees as co-authors. That paper is still available for download.
AOL officials in London are aware of the issue, they said Monday morning. They had no further comment, and referred queries
to the company's U.S. headquarters. Reached in the U.S., company officials did not have an immediate comment.
The release of such information raises serious privacy concerns. Major search engine companies fought a request for similar
data on user searches last year by the U.S. Department of Justice.
The U.S. government wanted to use the data to check the effectiveness of a federal law aimed at restricting minors' access to harmful
material. In January it filed a motion with the court to compel Google to comply with its subpoena and turn over a "random sample" of 1 million Web site addresses found in its search engine index.
It also asked the company for the text of all queries submitted to the search engine during a specific week. America Online, Yahoo,
and Microsoft's MSN were also subpoenaed, and complied to varying degrees.
The alleged release of AOL's data has sparked concern over how it might be used now that it has spread widely. While the original
page is gone, the data has since been made available on several other Web sites.
The data is valuable from a market research perspective, said David Bradshaw, principal analyst at Ovum. Normally, similar
kinds of data sets are only released to trusted researchers, not the general public, he said.
Even then, the resulting research is released as a batch of aggregated statistics, masking signs of individual users' behavior,
he said.
"I do think this was foolhardy at best and a complete disaster or worse for AOL," Bradshaw said. "If I were an AOL user, I'd
be up in arms."
The researchers who used the data wrote in an introduction that user IDs were replaced with an anonymous number. However,
observers are expressing concern about whether users could be tracked based on their queries.
The data also contains the time when a particular query was executed. If a user clicked on a result, the rank of the item
was recorded, along with the domain portion of the URL (uniform resource locator).
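The record layout described above can be sketched in a few lines of Python to show why observers are concerned. This is a hypothetical illustration only: the field names (`anon_id`, `query`, `query_time`, `item_rank`, `click_domain`), the helper `group_by_user`, and the sample rows are all invented for this sketch and are not taken from AOL's file.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class SearchRecord:
    """One search event; field names are assumptions, not AOL's actual schema."""
    anon_id: int                         # anonymous number standing in for the user ID
    query: str                           # text the user searched for
    query_time: str                      # time the query was executed
    item_rank: Optional[int] = None      # rank of the clicked result, if any
    click_domain: Optional[str] = None   # domain portion of the clicked URL, if any


def group_by_user(records):
    """Collect every query issued under the same anonymous number."""
    histories = defaultdict(list)
    for record in records:
        histories[record.anon_id].append(record.query)
    return dict(histories)


# Invented sample rows, for illustration only.
records = [
    SearchRecord(1001, "weather in springfield", "2006-03-01 08:15:00"),
    SearchRecord(1002, "used car prices", "2006-03-01 09:02:11", 1, "example.com"),
    SearchRecord(1001, "dentists near elm street", "2006-03-02 19:40:27", 3, "example.org"),
]

histories = group_by_user(records)
for anon_id, queries in histories.items():
    print(anon_id, queries)
```

Because the same anonymous number is attached to every query a subscriber made, grouping on that number reassembles a person's search history even though no name appears in any single record.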
The release of the AOL data prompted numerous comments on blog entries dedicated to the issue.
Ben Noble of Aberystwyth, Wales, wrote in a blog posting that the data is anonymous enough that "there's still an amount of
deniability, but it's appalling that anyone should be put in the position of having to deny anything."
Noble wrote that AOL could possess a file linking anonymous users with their real ID and their searches.
The data's public release may violate AOL's privacy policy, said Sean McManus, who was contacted after he posted a comment on the issue.
McManus, who said he does not use AOL as an ISP (Internet service provider), examined AOL's privacy policy after finding it
through a Google search.
"I think the big issue is whether the data should be available at all," said McManus. "Users have a reasonable expectation
of privacy when they use the Internet, particularly since they use the Internet on the condition of AOL's own privacy policy."