(Editor's note: This story has been updated to include clarifications and additional information, which appear in italics. For the latest developments, see the follow-on story "The Oracle flaw: Clarifications and more information.")
Over the past two months, InfoWorld has been researching a flaw in Oracle's flagship database software that could have serious repercussions for Oracle database customers, potentially compromising the security and stability of Oracle database systems.
Typically, when a bug results in a database outage, affected systems can simply be recovered from backups. But as InfoWorld has learned, this particular collection of Oracle issues could incur database outages that take considerable time and effort to correct.
[ For new information unfolding about this story, read "The Oracle flaw: Clarifications and more information." | Subscribe to the InfoWorld Daily newsletter and follow InfoWorld on Twitter to make sure you don't miss an article. | See Editor in Chief Eric Knorr's post to the Oracle community: "Calling all Oracle customers." ]
According to a source who asked not to be named, "This is a very real problem for us. We're spending considerable time and money to monitor, plan, and address it where we can."
In the process of reporting this story, we've conducted our own tests, verified information with sources we believe to be reliable, and consulted extensively with Oracle itself, which has given credit to InfoWorld for bringing the security aspect of the problem to the company's attention.
After we notified Oracle of our discoveries and following several technical discussions, the company requested that we hold this story until it had time to develop and test patches that addressed the flaw. In the interest of security for Oracle users, we agreed. Those patches are available at 1 p.m. PT today as part of the Oracle Critical Patch Update for January 2012.
To be clear, the security aspect of the flaw could make any unpatched Oracle Database customer vulnerable to malicious attack. Yet another, more fundamental aspect could pose a special risk only to large Oracle customers with interconnected databases. Both stem from a mechanism deep in the database engine, one that most Oracle DBAs seldom deal with day to day.
The heart of the matter
At the core of the issue is the System Change Number (SCN) in Oracle. This is a number that increments sequentially with every database commit: inserts, updates, and deletes. The SCN is also incremented through linked database interactions.
The SCN is crucial to normal Oracle database operation. The SCN “time stamp” is the key to maintaining data consistency in Oracle, allowing the database to respond to every user’s query with the appropriate version of data at every point in time. It functions as the database's clock, so to speak. And like time, the SCN cannot decrement. It must always tick forward.
When Oracle databases link to each other, maintaining data consistency requires them to synchronize to a common SCN. This is necessarily the highest SCN carried by any participating Oracle database instance because the SCN clock cannot run backward -- so database linking causes the SCN in many databases to jump during normal operations. And only very basic permissions are required to make a connection that can cause one database to increment the SCN on another.
The architects of Oracle's flagship database application must have been well aware the SCN needed to be a massive integer. It is: a 48-bit number (281,474,976,710,656). It would take eons for an Oracle database to eclipse that number of transactions and cause problems -- or so you might think.