More and more, we rely on Web services as a matter of course. The key word is rely: We assume that the data we upload to, say, a photo-hosting account or blog service today will still be there tomorrow. In large part, that's because we assume the services themselves will still be there tomorrow.
But over the past few years, we've seen plenty of examples of sites that are here today and all-too-gone tomorrow -- for example, Friendster (which dumped user data for a redesign in May) and GeoCities (which shut down in 2009).
In other words, nothing lasts forever. The Web services that we entrust with our data can -- and do -- vanish. And when that happens, you need to have a plan. In the following pages, I'll take a look at some cases where user data was lost or endangered, how the companies (and their users) handled the situation, and what you can do to keep your own information safe.
Don't let this happen to you
Unfortunately, there are plenty of examples of services that have shut down, changed hands or simply lost their data.
MySpace. The slow death and muddled rebirth of MySpace -- once a fiercely popular social network, overshadowed by the rise of Facebook -- raised a lot of questions about what would happen to existing users' data and whether or not there would be an easy way to bulk-export any of that information.
MySpace did set up what has been described as a "data-portability initiative" back in 2008. But this seemed intended less for exporting data from MySpace than for allowing consistently reused contact information to be automatically filled in across sites. Worse, the terms of service for MySpace developers explicitly forbid creating applications designed to export user data to another service. That hasn't stopped people from creating scrape tools for MySpace such as Make Data Make Sense's blog-export utility.
Google Video. After Google's acquisition of YouTube in 2006, Google Video seemed as redundant as a second navel. By 2009, the ability to upload new videos was shut down, although concerted protest by users kept Google from shutting the service off entirely so that any videos still there could be archived manually. Those who had spent money on Google Video's download-to-own/-rent program found access to their purchased content gone, although those with outstanding credit in the system could have that transferred as funds to Google Checkout. (Later, Google announced it would also offer credit card refunds; in April 2011, it announced that it was keeping Google Video content up indefinitely, until all the remaining videos could be moved to YouTube.) As with some other closed Web services, the issue wasn't just the content but the existing user investment in the site, in multiple senses of the word "investment."
Sidekick. Back in October 2009, around 800,000 T-Mobile users who owned the Sidekick phone were in for a rude shock when the servers holding their personal data -- including email and contact information -- went down. It was originally reported that the data was lost for good, although the majority of the data was later restored. Not that it made for any less of a black eye for T-Mobile and Microsoft (which were managing the servers that kept the Sidekick data). Worse, users had no short-term recourse for recovery other than whatever data might have been synced to their computers.
The data service for Sidekick was discontinued for good on May 31, 2011. According to a statement by Microsoft, T-Mobile provided "an enhanced Web tool ... on myT-Mobile.com to easily export their personal data, including contacts, photos, calendar, notes, to-do lists, and bookmarks, from the Danger service to a new device, computer, or a designated e-mail account." If they had provided something so convenient during the earlier data outage, or as a routine way to allow Sidekick users to keep their data intact, the wailing and gnashing of teeth might not have been as loud.
Blogging and Web-hosting services. With blogging and free websites now throwaway commodity offerings, it's not surprising when these services bite the dust. GeoCities, an artifact of the Web's earliest commercial days, was widely lamented when its plug was pulled in 2009. Yahoo did little on its own to preserve the sites, but there were third-party efforts to save the contents of GeoCities. Also, Windows Live Spaces was shut down in March 2011, at which time users were given the option to migrate to WordPress. And as of May 24 this year, Yahoo's MyBlogLog was also canned; there are, however, tutorials on how to migrate your data from it.
Lala.com. For the users of Lala.com, the problem was a little more complex. The short-lived online music service, which allowed users to cheaply purchase streaming access to music, was bought out by Apple in December 2009. Users who had existing credit with the service were allowed to transfer those credits to iTunes, but any purchased streams were gone for good. No provision existed for, say, allowing legal MP3 downloads of the purchased streams. (Blame the thicket of restrictive licensing agreements that automatically spring up around any online media service and the fact that music isn't really "purchased" online but merely licensed.)
The fate of Lala brings up an interesting question. Given how many media services are offering "rental" rather than "purchase" models for their offerings, at what point will people feel an entitlement to that data as theirs? And given the voracity with which companies can gobble each other up, how willing should people be to pay money for access to something that could dry up overnight?
These are not questions that have set answers, since they deal with conceptual changes in the nature of the services people consume, and are heavily affected by the reputation of the company in question. For example, few people expect Amazon to go out of business anytime soon, so there's not the same hesitancy about buying books on the Kindle as there would be about streaming music from a fresh young startup.
Look for these features
If you're currently deciding whether to use a specific Web service, it helps to know how it will handle your data and if it can provide you with ways to rescue your data or move the information offsite. There are several things to look for.
Data is available in open formats for easy download. The best sign that a website or service has the preservation of its users' data in mind is the ability for users to make a backup copy of their data through the service itself. If there's no back-end tool for downloading copies of your content, you may be forced to scrape the data manually, so anything that saves you the trouble of having to do so is worth noting. The wiki-creation site Wikia.com, for instance, lets you save whole wikis or individual pages into plain text files either for archiving or offline editing.
Interestingly, Google has been making major strides in this area. When it recently started beta-testing its Google+ social network, it added extensions to allow personal data (contacts, circles, etc.) to be exported via Google Takeout. The real test of such a feature, though, is how useful it will be for transporting your data into other services.
Data tools are provided by the service or third parties. If you don't have direct access to your data through the service's own Web interface, the next best thing is an application that can pull that data for you via one of the service's APIs. You might have to do some programming on your own to take advantage of those APIs, but it's a good idea to look around first -- someone else out there might well have done that work for you and made the results freely available.
Andrew Reichman, principal analyst at Forrester Research, says any service you use should be considered proprietary, even if the provider of the service advertises its own exit strategy. In other words, take any claims about data portability with a hefty chunk of salt. "Even with standards [for data interchange], you are still at the mercy of the administrators and policies of the company operating the equipment on your behalf."
Terms of service. The ToS for almost any service these days is worded to within an inch of its life, with almost every conceivable aspect of the service's functionality covered. "Paying close attention to the SLAs [service-level agreements], contracts and penalty structure related to non-performance of SLAs is critical," says Reichman. "Having an exit strategy, or at least some discussion about what would happen in the event the customer wants to pull out or the vendor cancels service, is an important preliminary step to take, prior to committing to a given vendor." The fewer details about such things in the ToS, the more wary you should be.
George Hamilton, an analyst at Yankee Group, is even more insistent on this point. "Caveat emptor," he says. "Know how the service provider protects stored data and data in motion, and how it is backed up."
This is where services can afford to compete most aggressively: by allowing customers more freedom of movement with their data, even if it seems counterintuitive at first to let them leave. "Vendors should sell their functionality, not create lock-in with technology," says Hamilton, noting that the general movement in the industry is toward open standards of one kind or another.
Reichman, however, disagrees. "The most likely [scenario] is one vendor's proprietary structure becoming a de-facto standard that other vendors follow," he says.
Watch for warning signs
Is it possible to tell ahead of time if a service's plug is about to be pulled? Sometimes the best places to look for signs of that happening are not on the service itself.
The ArchiveTeam Web site maintains a list of sites that are in danger of being shut down or are already dying. If you use a site listed there (under the heading "Watchlist"), it's probably a good time to think about taking your data elsewhere or, at the very least, backing it up somewhere solid.
Reichman advises looking at the company's numbers. "You can't always discover that a potential vendor has financial problems, but some issues can be uncovered with a bit of due diligence in financial statements, if available, or funding history and any news stories about the vendor," he says. "Rumors of impending acquisitions or divestitures, layoffs or strategy shifts are all signals that there may be trouble looming."
Both Reichman and Hamilton say there may be few outward warning signs, even financial ones. "Companies in fiscal trouble typically don't pre-announce that kind of trouble," notes Hamilton. "You need to be proactive. If they're a public company, you can see their financials. If not, you should still watch to see if they're in the news. If you have questions about their viability, don't use them in the first place."
That said, again, it's hard to say no to a particular service if job requirements or peer pressure push you to use it, especially without viable alternatives. For a time it was difficult to spurn Facebook, for instance, despite its lack of data portability and questionable privacy practices -- everyone used it. Now that wall of dominance may be crumbling a bit with the appearance of Google+ and the quiet success of LinkedIn.
Other warning signs include:
Declining quality of service. An ongoing, chronic disintegration of the service -- "increasing service disruptions or performance issues," as Hamilton puts it -- is a major red flag. He adds another: "a general lack of responsiveness to calls or emails."
Declining third-party support. Sites with APIs typically develop a culture of third-party apps -- image uploaders for photo-hosting sites, for instance, or applications that integrate directly into the service, such as Facebook's massive roster of games. If development of such applications has fallen off, that could be a sign the service is losing its user base. If the pace slackens not because of market saturation (you can only have so many photo uploaders) but because of genuine programmer alienation -- to the point where word filters out into the general user community -- that's a bad sign.
Changes in terms of service or arbitrary behaviors. Many people leave a Web service behind not because the service itself is endangered, but because of things the service has done. A common reason for this is changes to the terms of service, which can spark a massive user backlash. It doesn't help that terms of service are all too often pools of mud, where the implications of any changes are unclear unless spelled out with total precision. Think of the recent flap over Dropbox's clause indicating it would turn files over to the government if asked -- which forced the company to add wording to the effect that your stuff remains yours and they won't mess with it unless they have no other choice. (In its own words: "These Terms do not grant us any rights to your stuff or intellectual property except for the limited rights that are needed to run the Services.")
Different folks have different thresholds of tolerance for such things, so what ticks off your neighbor may not seem as egregious to you. But if you hear about such a thing happening with a service you use, pay attention, and give the ToS a fresh read whenever you're asked to reconfirm your acceptance.
Read the terms of service
Speaking of the terms of service, that's the one part of any service you shouldn't ignore, since it spells out what can and can't be done with your data. It doesn't help that most terms of service are terribly arcane, with crucial points buried within multiple clauses of pure lawyer-speak. Here are several major clauses that appear in a site's ToS, as they affect movement of user data.
Rules about third-party programs. Many sites explicitly disallow the use of unapproved applications designed to scrape or harvest site data, on pain of termination. If you're leaving anyway, this threat isn't quite as weighty, but it might cause trouble if you are relying on such a program to back up your data on a regular basis. These rules often cover the service's stance on data portability -- they may not come out and say that data can't be exported from the service, but they may add rules like this to make it massively inconvenient.
A lot of that is achieved by general vagueness in the wording of these rules. Paragraph 6.j of Yahoo's ToS (which includes Flickr) forbids "disobey[ing] any requirements, procedures, policies or regulations of networks connected to the Yahoo! Services, including using any device, software or routine to bypass our robot exclusion headers," which could conceivably include Web scrapers or other such applications. Most of the time, it would be hard for a service to tell that those apps were in use unless a great many people started using them, a lot of content from an individual user's account was being scraped, or the service attempted to detect use of such tools and took steps to block them.
Users who ignore ToS provisions about third-party applications do so at their own risk. "Legally, you could be breaking a term of service or violating copyright laws," notes Hamilton. "Or, if a Web scraper is constantly scraping a site, they could impose performance issues or become the equivalent of a denial-of-service attack."
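For readers who script their own backups, the two risks Hamilton describes can at least be reduced in code. The following is a minimal Python sketch, not a recommendation to scrape any particular site. It assumes that "robot exclusion headers" refers to the familiar robots.txt convention; the user-agent name, the delay and the fetch callback are hypothetical placeholders. Passing a robots.txt check does not mean the ToS permits the tool.

```python
import time
import urllib.robotparser

# Hypothetical values, chosen only for illustration.
USER_AGENT = "my-backup-script"
DELAY_SECONDS = 2.0


def load_rules(robots_txt: str) -> urllib.robotparser.RobotFileParser:
    """Parse the text of a robots.txt file that has already been downloaded."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def allowed_urls(parser, urls, user_agent=USER_AGENT):
    """Keep only the URLs that the robots.txt rules let this agent fetch."""
    return [url for url in urls if parser.can_fetch(user_agent, url)]


def polite_fetch(urls, fetch, delay=DELAY_SECONDS):
    """Call fetch(url) for each URL, pausing between requests to limit load."""
    results = {}
    for index, url in enumerate(urls):
        if index:  # no pause before the very first request
            time.sleep(delay)
        results[url] = fetch(url)
    return results
```

The pause between requests is the simplest guard against the denial-of-service effect mentioned above; a longer delay is the safer default when in doubt.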
Reuse of your content. Some sites will have a ToS provision that allows whatever you post to your account to be redisplayed in other contexts. If you see this clause, don't panic, but do read it closely. This clause typically exists for the sake of allowing whatever you post to be shown in promotional material, rotated on the site's home page or just manipulated internally.
Google's ToS, for instance, has this in paragraph 11.1: "By submitting, posting or displaying the content you give Google a perpetual, irrevocable, worldwide, royalty-free, and non-exclusive license to reproduce, adapt, modify, translate, publish, publicly perform, publicly display and distribute any Content which you submit, post or display on or through, the Services. This license is for the sole purpose of enabling Google to display, distribute and promote the Services and may be revoked for certain Services as defined in the Additional Terms of those Services." Many other services retain a similar clause.
As-Is/As-Available. This is another catchall clause that, in effect, means the service has no particular obligation to provide continuous uptime, to protect your data's integrity or even to keep the service active. Note that As-Is clauses may be a bit buried and not broken out into their own section; search on the keywords "As-Is" or "warranty" to find them.
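If you have saved a copy of a ToS as plain text, that keyword search can be done with a few lines of Python. This is only a rough sketch: the two patterns cover the keywords suggested above, the function name is made up for illustration, and it assumes paragraphs are separated by blank lines. It will not catch clauses worded differently.

```python
import re

# Patterns for the two keywords suggested in the text. Word boundaries keep
# "as is" from matching inside phrases such as "was issued".
PATTERNS = {
    "as-is": re.compile(r"\bas[\s-]is\b", re.IGNORECASE),
    "warranty": re.compile(r"\bwarrant(y|ies)\b", re.IGNORECASE),
}


def find_clauses(tos_text: str, patterns=PATTERNS):
    """Return (paragraph_number, matched_labels, paragraph) for each hit.

    Paragraphs are assumed to be separated by blank lines.
    """
    hits = []
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", tos_text) if p.strip()]
    for number, paragraph in enumerate(paragraphs, start=1):
        labels = [label for label, rx in patterns.items() if rx.search(paragraph)]
        if labels:
            hits.append((number, labels, paragraph))
    return hits
```

Adding further entries to the pattern table (for termination or content-license wording, say) works the same way.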
At-will termination. Finally, some terms of service have a clause that states they can pull the plug on your account, just because. Don't be surprised if you see something like this -- it's usually in there as a catchall way to kick people off if they flout the rules or consume a disproportionate amount of the service's resources. You may not need to worry about this most of the time, but it may be used to justify booting you off if, for instance, you use an unorthodox or unapproved method to retrieve or mirror your data. Google has this clause in paragraph 4.3 of its ToS; Yahoo's ToS has it in section 15. In both cases, it's worded in an open-ended enough fashion to make it possible for an account with either service to be closed for no apparent reason at all.
Create an exit strategy
If you don't have major qualms about a service you're with but you still want to create an exit strategy, a few basic points are worth keeping in mind.
Keep local copies of everything that's crucial. The only storage you can completely trust is the storage you physically own, so always make sure there's a local copy of everything important. If you've already been trusting your only copies to a site, break the habit now. Any Web service should be thought of as a replicator, not a repository.
For instance, don't ever trust a remote service with your only copy of a given photo, since the service's rules about data preservation might not be in your best interest. Flickr, one of the most popular photo hosting services, doesn't allow you access to the original copy of an uploaded photo unless you have a paid account. A utility like Flump or FlickrEdit can help you extract pictures from your stream, although they will probably not be able to rescue images that aren't publicly accessible. (Flump, in particular, requires a Pro-level Flickr account to be useful.)
On the other hand, many Gmail users have no qualms about leaving their entire trove of mail on Google's servers -- even though both POP3 and IMAP connectivity exist for Gmail, making it not only possible but easy to keep mail local. It's easy to get into the habit of unthinkingly trusting Gmail to always be there -- at least until the next network outage or Google cloud failure.
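Any desktop mail client can pull mail down over IMAP, but the same thing can be scripted. Here is a minimal sketch using Python's standard imaplib and mailbox modules to copy one folder into a local mbox file. The function name, file name and credentials are hypothetical, the host name in the usage comment is an assumption about the provider's IMAP endpoint, and IMAP access may first need to be enabled in the account's settings.

```python
import imaplib
import mailbox


def backup_folder(conn, mbox_path: str, folder: str = "INBOX") -> int:
    """Copy every message in `folder` into a local mbox file.

    `conn` is an already logged-in imaplib connection. Returns the number
    of messages saved. Running it twice appends duplicates.
    """
    conn.select(folder, readonly=True)  # read-only: never alter the remote mail
    status, data = conn.search(None, "ALL")
    if status != "OK":
        raise RuntimeError("IMAP search failed")
    box = mailbox.mbox(mbox_path)
    saved = 0
    try:
        for num in data[0].split():
            status, parts = conn.fetch(num, "(RFC822)")
            if status != "OK":
                continue  # skip a message that could not be fetched
            box.add(parts[0][1])  # raw bytes of the full message
            saved += 1
        box.flush()
    finally:
        box.close()
    return saved


# Example usage (hypothetical account details; host name is an assumption):
#   conn = imaplib.IMAP4_SSL("imap.gmail.com")
#   conn.login("you@example.com", "your-password")
#   print(backup_folder(conn, "inbox-backup.mbox"), "messages saved")
#   conn.logout()
```

The mbox format was chosen here because most mail clients can import it, which matters more for an exit strategy than speed or compactness.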
Practice making local copies of the service's data. If a site has a way to allow you to make a local copy of your data, make a practice run. Step through the process of creating a local copy of the data and see how difficult it is -- how many steps are involved, are third-party tools required and so on.
Also be aware that the process could change without warning, so you should do a full review of it every so often, or whenever you get word about major changes to the service.
Keep an eye on what third-party apps are being introduced or removed. If you're depending on a third-party app to help you keep copies of your data, keep in mind that apps can be fickle as well. That app you downloaded six months ago might have since been blocked by the service in question -- or there might be a replacement or even a new (and superior) substitute. In other words, keep up to date and check to make sure your backup mechanism, whatever it is, still works.
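One cheap way to notice that a backup tool has quietly stopped working is to check how recently anything was written to your backup folder. The Python sketch below does only that; the function names and the seven-day threshold are arbitrary choices for illustration, and a fresh timestamp does not prove the files themselves are complete or readable.

```python
import os
import time


def newest_mtime(backup_dir: str) -> float:
    """Return the latest modification time of any file under backup_dir.

    Returns 0.0 if the directory contains no files at all.
    """
    newest = 0.0
    for root, _dirs, files in os.walk(backup_dir):
        for name in files:
            newest = max(newest, os.path.getmtime(os.path.join(root, name)))
    return newest


def backup_is_stale(backup_dir: str, max_age_days: float = 7.0, now=None) -> bool:
    """True if nothing under backup_dir was written within max_age_days."""
    now = time.time() if now is None else now
    return now - newest_mtime(backup_dir) > max_age_days * 86400
```

Run on a schedule, a check like this turns "my export tool broke three months ago" into something you find out within a week.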
Most people don't think much about the inherent closed-endedness that goes with using proprietary Web services, simply because they offer so much in return. That closed-endedness -- and the difficulties involved in porting your data back out -- is becoming increasingly problematic now that such services are so common.
The sad truth of the history of Web services is that any site can disappear, given a long enough time span. But even in the face of such a history, most proprietary Web services still skimp on providing tools to make it easier for users to leave. And why wouldn't they make it difficult, when they have a vested interest in keeping their users? Give people a way to easily switch to a competitor and you've chipped away that much more at the advantages you hold over them.
On the other hand, those who do offer such tools have another advantage: a level of trust with their users that their competition might not have. And given that trustworthiness is becoming a Web currency at least as valuable as ad dollars to some people, it's in any Web service's long-term best interest to start offering those tools. Until then, the rest of us will have to make do with the tools available -- and keep our ears to the ground when rumbling starts. Not if, but when.
Serdar Yegulalp has been writing about computers and information technology for over 15 years for a variety of publications.
This story, "How to protect your data when a cloud service vanishes" was originally published by Computerworld.