News of how spy agencies are leveraging information exposed by Google's proprietary cookies to track people has inspired a lot of speculation about what in some of Google's cookies has the NSA so interested. The truth is fairly mundane, but a closer look turns up a few intriguing factoids.
Among the Google cookies scooped up by the NSA is the PREF cookie, which persistently stores certain user preferences for Google sites. Its presence and behavior have been known for quite some time. Google discusses the PREF cookie openly on its "Types of cookies used by Google" page.
By itself, the cookie isn't any different from any other cookie placed by any other website. It can be deleted or blocked manually with no outward ill effects, although Google claims this will affect the quality of one's experiences with Google services. Visiting any of Google's websites -- Google+, Docs, and so on -- causes the cookie to reappear.
The BBC further claims that "since many other firms make use of Google's technologies to place ads, a user may have PrefIDs on their computer even if they have never visited the search firm's own services." I attempted to test this myself by deleting the cookie and visiting a number of pages served by Google ad affiliates. PREF did not appear to return, but other cookies traceable to the google.com domain, such as the NID cookie (which also appears to contain persistent information and sports a six-month expiry period), did resurface.
Discussions of PREF, or Google's cookie policies, are not new. The Wall Street Journal's Digits blog had a sizable post about it last year. As far back as 2008, the site iMilly.com has had a page documenting and discussing the PREF cookie, and it even provided a bookmarklet that would allow users to zero out the identifying information in the cookie. The site has not been updated in some time, but the bookmarklet still seems to work.
What data the cookie holds has also been the subject of a fair amount of analysis, and the short version is that while the data by itself doesn't leak much personal information, it becomes more problematic when swept up as part of an aggregate of user data.
A research paper by Vincent Toubiana and Helen Nissenbaum, published in 2011 in the Journal of Privacy and Confidentiality (Volume 3, No. 1), examines the PREF cookie in detail as part of an analysis of Google's data-retention policies, and identifies the fields in it as follows:
- ID: the cookie ID number; this number is generated for each instance of the cookie in each browser
- LD: the default language (such as "en" for English)
- NR: the number of results to be displayed on a page for a given action
- TM: timestamp for the cookie's creation
- LM: timestamp for the last preference changes
- S: an undocumented value that the researchers believe is "a hash of the precedent values"
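To make the field list concrete, here is a minimal Python sketch of how such a cookie value could be split into its parts. It rests on several assumptions not spelled out above: that the fields are joined as colon-separated `KEY=value` pairs, and that TM and LM are Unix timestamps in seconds. The sample value and the `parse_pref` helper are made up for illustration; this is not a real cookie, and not Google's own code.

```python
from datetime import datetime, timezone


def parse_pref(value):
    """Split a PREF-style cookie value into a dict of field name -> string.

    Assumes colon-separated KEY=value pairs (an assumption, see lead-in).
    """
    fields = {}
    for part in value.split(":"):
        key, sep, val = part.partition("=")
        if sep:  # skip fragments that carry no KEY=value pair
            fields[key] = val
    return fields


# Made-up sample value for illustration only.
sample = "ID=0123456789abcdef:LD=en:NR=10:TM=1386700000:LM=1386700500:S=AbCdEfGhIj"
pref = parse_pref(sample)

# Assumption: TM is a Unix timestamp in seconds.
created = datetime.fromtimestamp(int(pref["TM"]), tz=timezone.utc)

print(pref["ID"], pref["LD"], pref["NR"], created.isoformat())
```

The only field that identifies a particular browser here is ID; the rest are ordinary settings that many users would share.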
Much of the information in the paper was taken, in turn, from a video Google created in 2007 explaining its privacy policies in detail. The researchers also describe how Google uses the cookie ID to link one or more IP addresses to a given user's behavior. The ID number remains linked to the user's IP address. After the address has been stored for 18 months, only its last octet is erased.
"Because Cookie ID is never deleted [from Google's systems]," the researchers point out, "it remains possible to link user searches based on other values that a cookie contains."
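The researchers' point can be illustrated with a small Python sketch. The log entries, field names, and the `truncate_last_octet` helper below are all hypothetical and say nothing about how Google actually stores its logs. The sketch only shows that zeroing the last octet of an IP address does not prevent records from being grouped by a cookie ID that stays intact.

```python
from collections import defaultdict


def truncate_last_octet(ip):
    """Replace the last octet of a dotted IPv4 address with 0."""
    a, b, c, _ = ip.split(".")
    return f"{a}.{b}.{c}.0"


# Hypothetical log entries; IPs come from ranges reserved for documentation.
log = [
    {"cookie_id": "aaaa1111", "ip": "203.0.113.57", "query": "flights to berlin"},
    {"cookie_id": "bbbb2222", "ip": "203.0.113.99", "query": "weather"},
    {"cookie_id": "aaaa1111", "ip": "198.51.100.14", "query": "berlin hotels"},
]

# "Anonymize" every entry by erasing the last octet of its IP address.
anonymized = [dict(entry, ip=truncate_last_octet(entry["ip"])) for entry in log]

# The cookie ID is untouched, so queries can still be grouped per browser.
by_cookie = defaultdict(list)
for entry in anonymized:
    by_cookie[entry["cookie_id"]].append(entry["query"])

for cookie_id, queries in by_cookie.items():
    print(cookie_id, queries)
```

In this toy data, the two searches made from different networks still end up under the same cookie ID after the addresses are truncated.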
Such linkages are done routinely by Google internally, and the paper doesn't rule out the possibility of Google supplying a third party with that information -- for instance, via a subpoena. This has become less likely, though, given Google's increasingly vigilant stance against surrendering data to authorities.
And while the cookie itself doesn't store anything that could be directly traced back to an individual user, its persistence across user sessions is what gives privacy and security advocates pause.
As the Washington Post pointed out in its slideshow, harvesting PREF in conjunction with other data can "[allow] the agencies to single out an individual's communications among the sea of Internet data in order to send out software that can hack that person's computer." In other words, it's not the cookie by itself that's the issue, but how it, in conjunction with other data and techniques, can be used as part of an attack.
The cookie also doesn't appear to be restricted to HTTPS connections, which makes it all the easier to sniff from Web traffic over Wi-Fi or via deep packet inspection technology.
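Whether a browser will send a cookie over unencrypted HTTP depends on the standard `Secure` attribute in the `Set-Cookie` header. Without it, the cookie travels in the clear on plain HTTP requests. The Python sketch below shows one way to check for that attribute. Both header strings are made-up examples, not captured Google traffic, and `cookie_attributes` and `is_secure_only` are hypothetical helper names.

```python
def cookie_attributes(set_cookie_header):
    """Return (name, value, attrs) parsed from a Set-Cookie header value."""
    parts = [p.strip() for p in set_cookie_header.split(";")]
    name, _, value = parts[0].partition("=")
    attrs = {}
    for part in parts[1:]:
        if not part:
            continue
        key, _, val = part.partition("=")
        # Attribute names are case-insensitive, so normalize to lowercase.
        attrs[key.strip().lower()] = val.strip()
    return name, value, attrs


def is_secure_only(set_cookie_header):
    """True if the cookie carries the Secure attribute (HTTPS only)."""
    _, _, attrs = cookie_attributes(set_cookie_header)
    return "secure" in attrs


# Made-up headers for illustration only.
plain = "PREF=ID=0123456789abcdef:LD=en; Domain=.example.com; Path=/"
locked = "SID=xyz; Domain=.example.com; Path=/; Secure; HttpOnly"

print(is_secure_only(plain))
print(is_secure_only(locked))
```

A cookie like the first one would be visible to anyone able to observe the unencrypted request, which is the exposure described above.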
None of this is likely to be news to privacy and online-safety advocates -- such as Ghostery and Mozilla -- who already stump for giving the user tight control over how data is tracked. It certainly ought to give them that much more ammunition to convince people that even innocuous browsing behavior leaves enough traces behind to be pieced together by third parties -- and not just spy agencies, but the far more prevalent threats of organized crime and malicious hacking.
This story, "Google cookies are pretty mundane. So why do spy agencies want them?," was originally published at InfoWorld.com.