How do I freaking scale Oracle?

Contrary to the rants you read on Stack Overflow and Hacker News, Oracle is still your only option in many cases -- even when you need scalability and high availability


Oracle RAC
Oracle RAC is mainly about load balancing and high availability for Oracle processes. It uses a shared disk architecture, which means that while it keeps the database available through a server outage, it does not protect against outages that affect storage.

Moreover, Oracle RAC doesn't really help with overall scalability concerns if your database wasn't originally CPU-bound -- the load on the disk, if anything, increases. As a database is tuned, it eventually becomes disk-bound; RAC isn't much help with this. Also, if your data center goes dark, RAC probably isn't your wide-area solution.

Nonetheless, with any large Oracle system, you need RAC. It helps with a lot of common RDBMS and Oracle-specific problems:

  1. Connection management. In cooperation with Oracle's JDBC driver, connections from database clients such as your application servers are load balanced across Oracle instances.
  2. High availability. In cooperation with Oracle Clusterware, the system can "hide" downed back-end processes or even entire instances from end-users.
  3. Load management. Different types of application requests can be grouped and automatically routed to specific instances or restricted in the amount of resources they use.
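
To make the connection management point concrete, here is a minimal sketch of the kind of Oracle Net connect descriptor a client can hand to the driver so that connections are spread across listeners. LOAD_BALANCE and FAILOVER are standard Oracle Net parameters; the host names, the service name, and the helper function are hypothetical and only there for illustration. The code just assembles the string and does not talk to a database.

```python
def rac_connect_descriptor(hosts, service_name, port=1521):
    """Build an Oracle Net connect descriptor listing one listener per host.

    The hosts and service name passed in below are made-up examples.
    """
    addresses = "".join(
        f"(ADDRESS=(PROTOCOL=TCP)(HOST={host})(PORT={port}))" for host in hosts
    )
    return (
        "(DESCRIPTION="
        # LOAD_BALANCE=ON asks the client to pick among the addresses at random;
        # FAILOVER=ON asks it to try another address if one listener is down.
        "(ADDRESS_LIST=(LOAD_BALANCE=ON)(FAILOVER=ON)" + addresses + ")"
        f"(CONNECT_DATA=(SERVICE_NAME={service_name})))"
    )


# Hypothetical two-node cluster and service name.
descriptor = rac_connect_descriptor(
    ["rac-node1.example.com", "rac-node2.example.com"], "orders_svc"
)

# The thin JDBC driver accepts the descriptor after the "@".
jdbc_url = "jdbc:oracle:thin:@" + descriptor
print(jdbc_url)
```

Connecting by service name rather than by instance is what lets the cluster decide which instance actually serves the session.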

Oracle brands the management tools for RAC separately as Oracle Clusterware, which provides much of the infrastructure for RAC -- and lumps the likes of Oracle Enterprise Manager under it.

Mirroring can be useful for a hot standby, although you probably won't be balancing load between mirrored sites.

There are several mirroring products from the block layer on up. They are helpful for availability, but may not be that great for disaster recovery, as mirroring doesn't really scale across the WAN -- although you may be able to achieve acceptable performance within a 20-mile radius. Conventional wisdom says mirroring within that radius protects against most disasters, and statistical analysis backs that up.
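
A rough back-of-the-envelope calculation shows why distance matters. The sketch below assumes light travels through fiber at about 200,000 km per second (roughly two-thirds of its speed in a vacuum) and that a synchronously mirrored write has to wait for one full round trip. It ignores switches, protocol overhead, and the fact that cable routes are longer than straight lines, so treat the results as a floor, not a prediction. The function names are mine.

```python
KM_PER_MILE = 1.609344
FIBER_KM_PER_SECOND = 200_000.0  # assumption: roughly two-thirds of c


def min_round_trip_ms(distance_miles):
    """Lower bound on round-trip time over fiber, in milliseconds."""
    distance_km = distance_miles * KM_PER_MILE
    return 2 * distance_km / FIBER_KM_PER_SECOND * 1000


def max_serial_sync_writes_per_second(distance_miles):
    """Upper bound on writes per second when each write waits for the previous ack."""
    return 1000 / min_round_trip_ms(distance_miles)


for miles in (20, 200, 1000):
    print(
        f"{miles:>5} miles: "
        f"{min_round_trip_ms(miles):6.2f} ms round trip, "
        f"at most {max_serial_sync_writes_per_second(miles):7.0f} serial writes/sec"
    )
```

At 20 miles the propagation delay alone is about a third of a millisecond per round trip; at 1,000 miles it is about 16 milliseconds, before any real-world overhead is added.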

That said, few were prepared for a major hurricane in New York City. No one generally thinks about massive Northeastern blackouts, either, but there have been a few. Twenty miles away may not be far enough.

Another major disadvantage of storage-level solutions is that you can't mix versions of Oracle. You'll have to either combine your storage-level solution with something else or create scheduled maintenance windows for outages.

Mirroring technologies at the block layer include products like:

  1. Storage replication solutions like SRDF. These allow for both synchronous and asynchronous transfer. The synchronous mode ensures consistency between the primary device and the backup. I've yet to see a working Oracle cluster that did not run in the lower-performing synchronous mode.
  2. Storage virtualization products like VPLEX. These can be thought of as really advanced SAN versions of what Linux offers as "logical volume management." Storage virtualization is mainly for growing and shrinking storage and for balancing load across disks, but it can also mirror logical volumes (a logical volume is a group of disks that appears as a single disk to the layer above). This is sometimes done at the hardware layer; it can also be achieved at the file system/operating system layer.
  3. RAID 1. At the lower end you have traditional RAID disk mirroring, which can be implemented at the hardware or controller level or the software level. Most Linux distributions can do this out of the box.
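
The difference between synchronous and asynchronous transfer is easier to see in a toy model. The class below is purely illustrative and is not how SRDF or any real product is implemented: a synchronous write is acknowledged only after the replica has it, while an asynchronous write is acknowledged right away and shipped later, so anything still queued is lost if the primary site goes down.

```python
class MirroredVolume:
    """Toy model of a mirrored volume (illustrative only)."""

    def __init__(self, synchronous):
        self.synchronous = synchronous
        self.primary = {}
        self.replica = {}
        self.pending = []  # async only: acknowledged but not yet on the replica

    def write(self, block, data):
        self.primary[block] = data
        if self.synchronous:
            # The caller waits for the remote copy before getting the ack.
            self.replica[block] = data
        else:
            # The caller gets the ack now; the remote copy happens later.
            self.pending.append((block, data))
        return "ack"

    def drain(self):
        """Ship queued writes to the replica."""
        for block, data in self.pending:
            self.replica[block] = data
        self.pending.clear()

    def lose_primary(self):
        """Simulate losing the primary site; return blocks that never reached the replica."""
        lost = [block for block, _ in self.pending]
        self.pending.clear()
        self.primary.clear()
        return lost


sync_volume = MirroredVolume(synchronous=True)
sync_volume.write(1, "commit A")
sync_volume.write(2, "commit B")
print("sync lost blocks:", sync_volume.lose_primary())

async_volume = MirroredVolume(synchronous=False)
async_volume.write(1, "commit A")
async_volume.drain()
async_volume.write(2, "commit B")  # acknowledged, but still queued
print("async lost blocks:", async_volume.lose_primary())
```

The synchronous volume pays for every write with a round trip to the replica, which is where the performance cost mentioned above comes from.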