The Oracle Database Flaw

Fundamental Oracle flaw revealed

A design decision made by Oracle architects long ago may have painted some of Oracle's largest customers into a corner. Patches have arrived, but how much will they correct?

Page 4 of 6

Plainly put, this means that every single interlinked Oracle instance across an entire company will need to be shut down just to move a bit further away from the line. If only a handful are shut down, they will very quickly jump right back to the high SCN whenever they connect with other Oracle instances. The only way to ensure this problem is completely eradicated is to shut down every affected system: backup servers, replicas, everything.

Not only that, but while they're down, admins will need to scour the infrastructure to be absolutely certain that no affected Oracle systems have escaped remediation. If they miss even one instance, they will have to perform the complete shutdown again.

Then there's the issue of how long to shut down. If you shut down every instance for a week, that would buy you about 10 billion ticks (604,800 seconds at 16,384 ticks per second) away from the line of no return. How many businesses would entertain the idea of shutting down their database systems for that long?
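The arithmetic behind that figure can be sketched in a few lines. This is only an illustration: it assumes the soft limit grows at the original rate of 16,384 per second and that the SCN does not advance at all while the instances are down. The function name `headroom_gained` is made up for this example.

```python
# Rough sketch: SCN headroom gained while every instance is shut down.
# Assumes the soft limit keeps rising at 16,384 per second (the original
# multiplier) and the SCN itself stays frozen during the outage.
SCN_PER_SECOND = 16_384

def headroom_gained(seconds_down: int, rate: int = SCN_PER_SECOND) -> int:
    """Return how far the soft limit moves away from a frozen SCN."""
    return seconds_down * rate

ONE_WEEK = 7 * 24 * 60 * 60          # 604,800 seconds
week_gain = headroom_gained(ONE_WEEK)
print(f"{week_gain:,}")              # 9,909,043,200, i.e. roughly 10 billion
```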

While they're down, the SCN could potentially be "reset" -- but only by dumping out each database, dropping the database, and importing the dump into a fresh database. This would have to be done for every database running on every database server across the entire organization, all at the same time. With databases routinely in the multiterabyte range, this will take a while.

Again, only very large customers with many interconnected Oracle databases would be likely to run a significant risk of being affected by this problem. But the larger the Oracle environment, the longer this restoration would take. Typically, large organizations have the least tolerance for downtime.

The fix
Until recently, aside from the backup bug fix, Oracle's only response to the SCN elevation issue -- as far as we've been able to determine -- has been to release a patch that extends the SCN calculation to 32,768 times the number of seconds since 01/01/1988, doubling the rate at which the soft limit increases. Oracle even made it modifiable, so admins can further increase the multiplier. (Oracle has informed InfoWorld that the patch to double the rate "was withdrawn by Oracle, as it potentially can incur more problems than it solves, as further outlined in the story. Instead, we added a better set of fixes in the January 2012 Critical Patch Update.")
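As a rough illustration of the calculation described above, the soft limit can be modeled as a multiplier times the seconds elapsed since the start of 1988. The sketch below is not Oracle's code: the function name `soft_limit`, the use of UTC, and the exact calendar handling are assumptions made for the example.

```python
from datetime import datetime, timezone

# Start of the SCN time base cited in the article (time zone is an assumption).
EPOCH = datetime(1988, 1, 1, tzinfo=timezone.utc)

ORIGINAL_MULTIPLIER = 16_384
PATCHED_MULTIPLIER = 32_768   # the doubled rate introduced by the patch

def soft_limit(now: datetime, multiplier: int = ORIGINAL_MULTIPLIER) -> int:
    """Return the modeled SCN soft limit at the moment `now`."""
    seconds = int((now - EPOCH).total_seconds())
    return seconds * multiplier

# An arbitrary sample moment, chosen only to show the doubling.
sample = datetime(2012, 1, 17, tzinfo=timezone.utc)
print(soft_limit(sample, PATCHED_MULTIPLIER) // soft_limit(sample))
```

Under this model the patched limit is always exactly twice the original one at any given moment, which is what "doubling the rate" amounts to.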

If this patch is applied to an Oracle instance, it will definitely increase the time the interlinked databases can run before hitting the SCN limit. However, it also introduces new variables.

Part of the problem is that you can't patch every system at once. Additionally, if you have a patched system with an elevated soft limit -- based on a multiplier of, say, 65,536 -- the SCN on that system could be higher than the SCN on an unpatched system using the original 16,384 multiplier, causing the unpatched system to refuse the connection or encounter another problem as it fails the soft limit check. There's also the issue of servers running older Oracle versions that may not have a patch available.
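The mismatch can be sketched as a simple comparison. The snippet below is a simplified model, not Oracle's actual connection logic: `accepts` is a hypothetical stand-in for the soft limit check, and the elapsed time is a round illustrative figure.

```python
# Simplified model of the soft limit check on two differently patched systems.
def soft_limit(seconds_since_1988: int, multiplier: int) -> int:
    return seconds_since_1988 * multiplier

def accepts(incoming_scn: int, seconds_since_1988: int, multiplier: int) -> bool:
    """Hypothetical check: accept an SCN only if it is within the local limit."""
    return incoming_scn <= soft_limit(seconds_since_1988, multiplier)

# Roughly 24 years after the start of 1988; illustrative only.
elapsed = 24 * 365 * 24 * 60 * 60

# An SCN that is legal on a system patched to a 65,536 multiplier...
patched_scn = soft_limit(elapsed, 65_536) - 1

patched_ok = accepts(patched_scn, elapsed, 65_536)     # within the patched limit
unpatched_ok = accepts(patched_scn, elapsed, 16_384)   # far beyond the original limit
print(patched_ok, unpatched_ok)
```

In this model the same SCN passes on the patched system and fails on the unpatched one, because the unpatched limit is only a quarter as high at the same moment.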

Furthermore, if this patch is a default inclusion in the next Oracle release, admins may suddenly discover that their existing servers are unable to communicate with new or upgraded servers that use the new, higher SCN calculation method, should the new servers have a sufficiently elevated SCN. If the SCN values line up just right, it's possible that a patched system could connect and set the SCN of an unpatched system just shy of the soft limit, causing the unpatched system to hit the limit through its own processing.
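To see how an unpatched system left just shy of the limit could then reach it on its own, consider a back-of-the-envelope model. It assumes the local soft limit rises by 16,384 per second while the database consumes SCNs at some steady rate; the function name and the sample rates are hypothetical, and real workloads are not steady.

```python
from typing import Optional

LIMIT_RATE = 16_384  # soft limit growth per second on an unpatched system

def seconds_until_limit(headroom: int, local_rate: int,
                        limit_rate: int = LIMIT_RATE) -> Optional[float]:
    """Return seconds until the SCN catches the soft limit, or None if it never does.

    headroom   -- distance between the current SCN and the soft limit
    local_rate -- SCNs consumed per second by the system's own processing
    """
    closing_rate = local_rate - limit_rate
    if closing_rate <= 0:
        return None  # the limit rises at least as fast as the SCN
    return headroom / closing_rate

# Hypothetical figures: the SCN sits 3,616 below the limit and the system
# burns 20,000 SCNs per second, closing the gap by 3,616 each second.
print(seconds_until_limit(3_616, 20_000))
```

The smaller the headroom left behind by the patched system, the less sustained activity it takes to close the gap.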
