Data management gets it together

By Tom Sullivan
April 13, 2001 1:01 pm PT

Data is more important than ever, and companies need new ways to manage it
To put the growth of data in perspective, a University of California at Berkeley study predicts that after humans took approximately 300,000 years to generate 12 exabytes of information (an exabyte is over 1 million terabytes, or a million trillion bytes), the next 12 exabytes will be accumulated in just two and a half years.

And the sources of data are growing as well. Witness the variety of corporate, personal, and industrial devices that not only house data but, more important, are becoming enabled to hook into back-end data sources to feed and retrieve data.

Meanwhile, only about 20 percent of the world's data resides in relational databases; the rest is in a combination of flat files, audio, video, prerelational, and unstructured formats -- not to mention the mountains of paper-based data just waiting to be digitized.

The result of incorporating all these different data types and sources is that data management is changing into a broader category of managing content that includes all data types, vendors say.

Data, data everywhere

To keep up with the data explosion, database vendors are working to manage more data types, and in some cases they are doing so from within the core database engine.

"We're attempting to get at all of the data out there," says Jeff Jones, senior program manager of the data management group at IBM in Armonk, N.Y. "We want to provide data management to the universe of nonrelational data."

Although several companies once competed in the database space, a number of vendors, such as Sybase, have homed in on specific niches such as financial institutions and telecommunications companies. The field has cleared, so to speak, and there are two general approaches to data management moving forward: Redwood Shores, Calif.-based Oracle's centralized philosophy and IBM's federated data style.
Understanding that very few customers have all their relational data in a single vendor's database, Big Blue's approach is to be capable of managing data residing just about anywhere, and to extend the functionality of its flagship database, DB2, to other data types and locations, including competitors' databases.

"Federation is about enabling middleware to reach out and touch data from a variety of sources, then manage it as if it were in one relational database," Jones says.

One of the most important advantages of the federated approach is that companies don't need to migrate data from a variety of sources, such as legacy and nonrelational systems, into a single repository. Migrating small amounts of data is not problematic, but moving a multiterabyte data warehouse is nothing short of Herculean. Instead, IBM's approach extends the core database engine capabilities to sources outside the database, including non-IBM databases.

The British Library, in London, which has a mix of databases from different vendors, as well as nonrelational and unstructured data, is subscribing to the federated approach. The library has more than 150 million items to catalog and archive. Its biggest challenges are tying together all the traditional material -- including recently digitized medieval manuscripts -- with its burgeoning collection of digital content and making all of it easily accessible through a single interface, according to Helen Shelton, deputy director of collection management.

The first step of the federated approach is to enable visitors to the physical library to search its digitized archives using in-building terminals.

"The vision, over time, is that everyone will be able to access the British Library's entire collection via the Web," Shelton says.
Shelton adds that the British Library is also working with a number of organizations, such as the Royal Dutch Library in The Hague, Netherlands, to build a searchable repository for collections of many kinds of objects.

"We're trying to create a common database that includes not only documents but pictures and objects as well," says Johaan Steenbakkers, director of IT at the Royal Dutch Library.

Big Blue's fiercest rival, Oracle, on the other hand, is pushing the notion of centralized management, where all of a company's data resides in an Oracle database from which it can be easily managed.

"I guess we have a philosophical disagreement with IBM," says Jeremy Burton, Oracle's senior vice president of products and services marketing. "We think the industry has come out of the distributed computing model."

Burton adds that the biggest benefits of centralizing data management are its low cost, faster performance, and the fact that it provides better information because it is all in one place.

Oracle is not ignoring the need to access data outside the database engine, however. Companies that have content residing on the Internet, for instance, can use the database's query engine to index that content, Burton says.

Ralf Zwanziger, lead engineer at Siemens in Munich, Germany, says the company uses Oracle databases and Oracle's Internet File Server (iFS). In the process of employing the centralized approach, Siemens has pulled parts of its intranet from a server into the database, where it can now be accessed and managed from a single interface.

Although Zwanziger isn't specific about cost savings, he comments that centralized management has its benefits. "A single point of administration makes things easier and cheaper," he says.

Analysts, however, maintain that IBM's federation provides the best of both worlds.

"IBM probably has the better approach because you can either use federation, or you can bring all the data into the database to centrally manage it because it scales," says Peter Urban, a senior analyst at AMR Research in Boston.

Urban says that Oracle's scalability within the forthcoming 9i database will improve considerably, particularly with Real Application Clusters, a feature that enables customers to add or subtract servers from a cluster as need be without taking the server farm down.

Analyst firms Dataquest, in San Jose, Calif., and IDC, in Framingham, Mass., both list IBM and Oracle as the market's top guns; but Microsoft has been making its own push into the enterprise with each new iteration of SQL Server and has won some household-name accounts, such as BarnesandNoble.com. The Redmond, Wash.-based software giant is expected to gain market share quickly because it offers a database that is considerably less expensive than either Oracle or DB2 and easier to use.

Microsoft's approach is almost a blend of Oracle's and IBM's. "Our philosophy is that you really need to have centralized management of metadata to effectively search across different sources and types of data," says Steve Murchie, Microsoft's group product manager for SQL Server.

The engine that could

Although the various approaches differ, all the database vendors are moving away from managing just data and the metadata that describes it toward managing a broader category of content, analysts say.

"If users stretch the definition of what data is, it stretches our definition of what the database has to do," says Pat Selinger, a fellow at IBM.

To that end, as data management morphs into a broader category of content management, the vendors say more and more functionality will be packed into the database. The latest technologies being pulled into the database are data mining and analytic capabilities.

"The database vendors are finally starting to get it and are adding functionality to help end-users use the data," says industry analyst Howard Dressner, vice president and research director at Stamford, Conn.-based Gartner.

Without nailing down a specific time frame, IBM says it plans to pull its Content Manager software, currently a stand-alone product, into the database in the future.

Dressner adds that incorporating more and more functionality into the database engine generally improves performance and makes the specific technology more effective. In the case of business intelligence functionality, for instance, users can get more insight out of the data when they interact with the database engines.

A market reborn

"As users are exposed to more functionality over the Internet for e-business purposes, you will have intense demand for data that is immediately accessible online. This in turn causes demand for [database] software that manages that data," says Carl Olofson, a program director at IDC.

In fact, in a survey of IT executives by AMR Research, almost half responded that databases are their top investment area in 2001 and will remain the most important through 2002. And Dataquest reports that by 2004 the database market will reach $12.7 billion -- no small sum.

Analysts also expect competition among vendors to increase as they all vie to manage the most data, and they say there will be considerable technology overlap.

With such blurry lines, choosing a database management system is not an easy task, and not all users find the picture black-and-white.

Eaton, for instance, is standing at a crossroads. The Cleveland-based industrial manufacturer was a DB2 shop until 1996, when it committed to implementing only Oracle solutions, according to John Schindler, an Eaton program manager. Schindler says that Eaton is torn between continuing with Oracle as it moves into the e-commerce realm or going back to using DB2.
Schindler adds that Eaton is considering vendor strategies, pricing, and, most important, the openness of the products.

"We're sort of coming back around full circle," he says. "At this point, we're not going to battle about the technology -- it's a business decision."

Send comments to Senior Writer Tom Sullivan (tom_sullivan@infoworld.com).

Ed Scannell contributed to this report.