The physics of data management used to dictate that your data could be either consistent or highly available but never both at the same time. The discipline of data synchronization sits uncomfortably on the horns of this Heisenbergian dilemma. As times change, though, so do the trade-offs associated with synchronization and its uses.
My favorite example is Usenet. At one time it made sense to replicate content across a federation of news servers. “There are few (if any) shared or networked file systems that can offer the generality of service that stream connections using Internet TCP provide,” wrote the authors of RFC 977 when they proposed NNTP (the Network News Transfer Protocol) back in 1986. A decade later, though, many shared file systems did offer that generality of service. We called them Web servers, and among other things, they heralded a transition from the NNTP-based Usenet to nonreplicated forums, blogs, and wikis.
In my 1999 book Practical Internet Groupware, I explored ways of using a nonreplicated NNTP server as a groupware platform. Synchronization still played a role, but a diminished one. With the Internet backbone solidly established, intermittently connected clients were the only ones that really needed to synchronize.
We’ll be using offline clients for a long time to come, but they’re getting scarcer all the time. As a result, it’s tempting to discount the future value of synchronization. A pervasive, fast Internet alters the payoff matrix. If we can relax the constraints that prevented us from depending on coherent and relatively centralized data sets, why shouldn’t we?
The answer, I think, is that synchronization is not just a way to overcome network constraints. It also works hand in hand with an architectural pattern called MVC (model-view-controller). Consider Groove, a timely example in light of Microsoft CTO Ray Ozzie’s recent announcement of a proposed synchronization mechanism for RSS. The MVC pattern is typically used to isolate multiple views of data from an underlying data model. But in Groove, updates to that data model don’t just alter the views. They also propagate to synchronized instances of the data model.
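To make the pattern concrete, here is a minimal Python sketch of a model whose updates both refresh local views and propagate to synchronized peers. It is an illustration under stated assumptions, not a description of how Groove works: the `ReplicatedModel` class, its method names, and the `laptop` and `desktop` replicas are all hypothetical, and direct in-process method calls stand in for a real network.

```python
from typing import Callable, Dict, List, Set, Tuple

UpdateId = Tuple[str, int]          # (originating replica, local sequence number)
View = Callable[[str, str], None]   # called with (key, value) after each change


class ReplicatedModel:
    """Hypothetical model that refreshes local views and forwards updates to peers."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.data: Dict[str, str] = {}
        self.views: List[View] = []
        self.peers: List["ReplicatedModel"] = []
        self._seen: Set[UpdateId] = set()   # ids of updates already applied here
        self._next_seq = 0

    def attach_view(self, view: View) -> None:
        self.views.append(view)

    def connect(self, other: "ReplicatedModel") -> None:
        # Links are symmetric: each replica forwards updates to the other.
        self.peers.append(other)
        other.peers.append(self)

    def set(self, key: str, value: str) -> None:
        # A local edit gets a fresh id so that replicas can recognize repeats.
        update_id = (self.name, self._next_seq)
        self._next_seq += 1
        self._apply(update_id, key, value)

    def _apply(self, update_id: UpdateId, key: str, value: str) -> None:
        if update_id in self._seen:
            return  # already applied; stops updates from echoing between peers
        self._seen.add(update_id)
        self.data[key] = value
        for view in self.views:      # classic MVC: the model notifies its views
            view(key, value)
        for peer in self.peers:      # the extra step: the model notifies its replicas
            peer._apply(update_id, key, value)


# Two replicas of the same model, with a view attached only to the second.
laptop = ReplicatedModel("laptop")
desktop = ReplicatedModel("desktop")
laptop.connect(desktop)

desktop_log: List[Tuple[str, str]] = []
desktop.attach_view(lambda key, value: desktop_log.append((key, value)))

# An edit made on one replica reaches the other replica and its view.
laptop.set("title", "Draft agenda")
```

The sketch deliberately omits what makes real synchronization hard: there is no conflict resolution (whichever update a replica applies last simply overwrites the value), no persistence, and no handling of peers that are offline when an update is made.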
Current trends suggest that this pattern will matter more, not less, over time. In a service-oriented world, systems of record will recede into the background. Increasingly we’ll work with transitory views of information that applications will receive, process, and transmit in the form of XML packets.
In clients’ and routers’ memories, on their local disks, and on the wire, those chunks of information will be fairly large. There are three reasons why. First, local memory will always be fastest. Second, we’ll want to scale out by exploiting many processors. Third, and most subtly, just because we’ll be able to contact the mainframe directly doesn’t mean that we should. Layers of intermediation will isolate us from it, and for a reason: to enforce proper separation of concerns.
Rohit Khare, director of CommerceNet Labs, once said that the ultimate integration challenge arises at layers 8 and 9 of the OSI stack: politics and economics. For individuals who share personal information, enterprises that syndicate customer or product information, and intermediaries that broker among them, the boundaries of data ownership are beginning to blur. For that reason if no other, we’ll still need to synchronize multiple views of data.