My first real Java application, back in 1997, was a servlet-based group scheduler. It wasn’t quite the smash hit that Hanson’s “MMMBop” was that summer, but as some of you may recall, it had its charms.
One of the things that fascinated me was the ease with which Java enabled me to manage our data in a memory-resident object and serialize it to disk when users made changes to their calendars. The application was, quite simply and elegantly I thought, little more than a Java Dictionary exposed for transactional use on the Web.
Kent Beck and Ward Cunningham, two leaders of the agile programming movement, would have been proud of me. Although I didn’t know it at the time, I had embraced one of their central tenets: Do the Simplest Thing That Could Possibly Work.
I hadn’t foreclosed any options. There were ways to scale the application if I needed to, and in fact, I later experimented with swapping out Java’s native serializer for an industrial-strength object database. But as often turns out to be the case, there was never any need to fire that big cannon.
My group scheduler was an example of what Clay Shirky calls “situated software”: an application that’s used by, at most, dozens of people, and that needs agility more than it needs scalability. I’ve since revisited that strategy from time to time, most recently for several of the services I use to search my own blog.
In April 2003 I began accumulating all of my entries in a single XML file. I also run them through a publishing system to create Web pages and RSS feeds, but the XML file is my canonical archive. And although I’ve written more than 700 items since then, amounting to a third of a million words, the file doesn’t yet exceed three megabytes.
It’s entirely feasible to keep that corpus in memory, so I do. One instance of it backs my structured search service, which I use to run XPath queries over the collection. That gives me instant access to a variety of microformatted elements: quotes by Ward Cunningham, or code snippets in XSLT or Python.
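To make the idea concrete, here is a minimal Python sketch of that kind of structured search. It is not the actual service: the archive layout, the `class`, `cite`, and `lang` attributes, and the function names are all assumptions made for illustration, and it relies on the limited XPath subset in the standard library's ElementTree rather than a full XPath engine.

```python
import xml.etree.ElementTree as ET

# Hypothetical archive layout. The real file's element and attribute
# names are not given in the article; this is made-up sample data.
ARCHIVE = """\
<items>
  <item date="2005-01-10">
    <title>Wiki thoughts</title>
    <body>
      <blockquote class="quote" cite="Ward Cunningham">Do the simplest thing that could possibly work.</blockquote>
    </body>
  </item>
  <item date="2005-02-02">
    <title>Transforming feeds</title>
    <body>
      <pre class="code" lang="python">print("hello")</pre>
      <pre class="code" lang="xslt">&lt;xsl:value-of select="title"/&gt;</pre>
    </body>
  </item>
</items>
"""

# Parse once and keep the whole tree in memory for every later query.
ROOT = ET.fromstring(ARCHIVE)


def quotes_by(person):
    """Return the text of every quote attributed to `person`."""
    # Note: a name containing a single quote would need escaping here.
    path = f".//blockquote[@class='quote'][@cite='{person}']"
    return [node.text for node in ROOT.findall(path)]


def snippets_in(language):
    """Return the text of every code snippet tagged with `language`."""
    path = f".//pre[@class='code'][@lang='{language}']"
    return [node.text for node in ROOT.findall(path)]


if __name__ == "__main__":
    print(quotes_by("Ward Cunningham"))
    print(snippets_in("xslt"))
```

Because the tree is parsed once and held in memory, each query is just a walk over objects that are already loaded, which is why a corpus of a few megabytes needs nothing heavier.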
Structured search is handy, but like everyone else I still regard good old-fashioned full-text search as my bread and butter. Until recently, I’d been relying on InfoWorld’s Ultraseek engine. But because it crawls my site, which includes templated elements, the results aren’t very precise. I wanted to search just the words I’ve written.
So now I load up another instance of the file and search that. The index? There isn’t one. The service just rips through memory, finding substrings. It’s blindingly fast. And charting my productivity alongside Moore’s Law suggests this strategy won’t run out of gas anytime soon.
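A rough Python sketch of index-free search might look like the following. Again, this is an illustration under assumptions, not the real service: the `<item>` and `<title>` element names, the sample entries, and the `load_entries` and `search` helpers are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data standing in for the real archive file.
ARCHIVE = """\
<items>
  <item>
    <title>Situated software</title>
    <body>Small apps need agility more than scalability.</body>
  </item>
  <item>
    <title>Scaling down</title>
    <body>Moore's Law keeps <em>outpacing</em> my writing.</body>
  </item>
</items>
"""


def load_entries(xml_text):
    """Parse the archive once; return (title, plain_text) pairs held in memory."""
    root = ET.fromstring(xml_text)
    entries = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        # Drop the markup and collapse whitespace so that only the
        # written words are searched.
        text = " ".join("".join(item.itertext()).split())
        entries.append((title, text))
    return entries


def search(entries, phrase):
    """Return titles of entries containing `phrase`, ignoring case.

    There is no index: every query is a linear substring scan.
    """
    needle = phrase.lower()
    return [title for title, text in entries if needle in text.lower()]


if __name__ == "__main__":
    entries = load_entries(ARCHIVE)
    print(search(entries, "moore's law"))
```

The scan is linear in the size of the corpus, so its cost grows only as fast as the writing does. For a few megabytes of text that is a reasonable trade for having no index to build or keep in sync.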
When we consider the exponential growth of storage, we often forget that our most essential data is textual and numeric. And that stuff tends to grow only linearly. For example, my 2005 e-mail archive tops 100 megabytes, but a big chunk of it is PowerPoint attachments people have sent me. Boiled down to their textual and numeric essence, they’d occupy a fraction of the space.
There’s nothing new about in-memory databases. They come in many different flavors, all of which are still fairly exotic, but emerging technologies such as Microsoft’s LINQ (Language Integrated Query) promise to pull this approach into the mainstream. For our most vital and most volatile data, it’s a strategy whose time has come.
