Some is public on open websites. Some is "public" in the sense that it is passing through unsecured intermediaries that anyone could theoretically observe. Some is private, but can be gathered because a broad interpretation of certain legal doctrines ("sent abroad," for example, when the service provider is in a different country from the originator) allows the agencies to treat it as public. Intelligence agencies are thus slurping up enormous quantities of data in a wide range of protocols and contexts, far more than could ever be appropriate for any investigation.
Why are they doing this? Because otherwise the data would be lost by the time they knew they needed it. They are not actually looking at most of it, at least not straight away. All they are doing is making transient data persist -- they are caching. They are not breaking any rules by doing so (according to their own legal outlooks). They are simply gathering data wholesale, up to the limit of what they understand to be legal. The result is truly enormous data lakes.
Studying the data is a different matter, in their view. According to the NSA's legal advisers, "wiretapping" or "hacking" starts at the point a human being actually analyzes or interprets the data. The NSA's XKeyscore tool provides such a capability for fishing in data lakes. The NSA claims that access to the lake is limited, but disclosures suggest it is limited by rule and the threat of audit, not by any technical means. As a consequence, agents have to consciously ignore out-of-scope results from tools like XKeyscore.
Using "metadata" is considered OK as it is simply the "public" aspect of the contents of the data lake. Metadata helps target the fishing more accurately, but it can also be used to "triangulate" and determine facts directly. It's an open question whether using varied metadata to triangulate on private facts is surveillance. The British secretary of state is probably speaking the truth according to her chosen frame of definition (in the same sense as Bill Clinton's statement "I did not have sexual relations with that woman" was true). Certainly, a well-considered system of rules makes her statements precisely true.
The distinction the intelligence agencies make is a useful one for the IT profession because the truth is you can't dissociate data gathering from data usage. To make a proper risk assessment and calculate the return on the investment of filling a lake with big data, you need to account for all the costs, not just the ones associated with your primary goal. As author Quinn Norton is credited with saying, "In the end, all data is either deleted or public."
The celebrities in the first story were probably thrilled their photos were being backed up and didn't consider the possibility that Apple's security mechanisms -- or their password choices -- would one day lead to an anonymous pervert posting their privates across the Web. The AT&T executives and engineers who first started collecting the network traffic information probably never considered that their work would end up in the hands of the DEA, casually used to catch kids with pot. And the NSA wants us not to ask too many questions about what is going on with all the data it slurps off the Internet for future use. The DEA and NSA weren't even going to tell us it was happening, despite their assurances that there is no illegal activity planned.
All this teaches us a design lesson. If we accumulate data, it is going to get used in the end. If we store that data in a place with public access, it will eventually become public. If we consider only the immediate application, we could be exposing ourselves to great risk in the long term. Storing data can have costs beyond the storage medium -- costs associated with assisting legal investigations or satisfying discovery requests of litigants. Before we accumulate data indefinitely, we should always make an allowance for future abuses of the data, just in case.
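For admins who want to act on that lesson, one option is to give every stored record a stated purpose and an expiry, so that data is deleted by default rather than kept by default. What follows is only a minimal sketch in Python under stated assumptions: the `Record` type, the purposes, and the retention windows are all hypothetical and chosen for illustration, not taken from any real system or policy.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Record:
    """Hypothetical stored item; field names are illustrative only."""
    key: str
    purpose: str            # why the data was collected in the first place
    collected_at: datetime  # timezone-aware collection time


# Assumed retention windows per purpose. Real values are a policy decision.
RETENTION = {
    "backup": timedelta(days=90),
    "billing": timedelta(days=365),
}
# Data with no recognized purpose gets the shortest window, not the longest.
DEFAULT_RETENTION = timedelta(days=30)


def is_expired(record: Record, now: datetime) -> bool:
    """Return True once a record has outlived the window for its purpose."""
    limit = RETENTION.get(record.purpose, DEFAULT_RETENTION)
    return now - record.collected_at > limit


def purge_expired(records, now):
    """Split records into (kept, dropped) according to the retention table."""
    kept = [r for r in records if not is_expired(r, now)]
    dropped = [r for r in records if is_expired(r, now)]
    return kept, dropped


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    sample = [
        Record("photo-1", "backup", now - timedelta(days=10)),
        Record("photo-2", "backup", now - timedelta(days=100)),
        Record("log-1", "unclassified", now - timedelta(days=31)),
    ]
    kept, dropped = purge_expired(sample, now)
    print("kept:", [r.key for r in kept])
    print("dropped:", [r.key for r in dropped])
```

The design choice worth noting is the default: data with no recognized purpose expires soonest, so keeping anything for a long time requires someone to justify it explicitly.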
This article, "Nude photos, phone records, NSA data offer essential lessons for admins," was originally published at InfoWorld.com.