Beware disk I/O variability
Relational databases can seem like unnecessarily complex beasts, but they've evolved that way to provide an array of great features. We can load them full of data, then mix and match that data, asking complicated questions and selecting slices based on an endless set of conditions.
Behind the SQL language we use to fetch data and make changes, the underlying engine of a relational database -- whether it's MySQL, Oracle, or SQL Server -- has a handful of core jobs: reading and writing data to disk, keeping the hottest (most requested) bits in memory, and protecting against server failure.
That said, disk I/O -- the speed at which you read from and write to that underlying storage system -- is crucial. In the early days of Oracle, for example, before RAID was common, the database engine offered ways to stripe data across many disks, and it emphasized placing redo logs -- the files that protect against data loss -- on dedicated disks of their own. When RAID became widely available, DBAs could simply place all their data files on one volume and let the RAID array take care of the rest.
Enter the present day, where Amazon's EBS (Elastic Block Store) virtualizes storage, letting you carve out a slice of a RAID array that sits somewhere on the network and attach it to any instance. This greatly enhances operational flexibility, allowing easy programmatic changes to hardware, but as with any new solution, there are challenges.
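To make that programmatic flexibility concrete, here's a minimal sketch using Python's boto3 SDK that creates an EBS volume and attaches it to a running instance. The region, availability zone, instance ID, and device name are all placeholder assumptions.

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Carve a new 100 GiB volume out of Amazon's storage pool.
volume = ec2.create_volume(
    AvailabilityZone="us-east-1a",  # must match the target instance's AZ
    Size=100,                       # GiB
)

# Wait until the volume is ready, then attach it to a running instance.
ec2.get_waiter("volume_available").wait(VolumeIds=[volume["VolumeId"]])
ec2.attach_volume(
    VolumeId=volume["VolumeId"],
    InstanceId="i-0123456789abcdef0",  # hypothetical instance ID
    Device="/dev/sdf",
)
```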
EBS grapples with the limitations of a shared network resource. Many servers and many customers all use that network storage, so their resource usage can impact your server. Amazon's SLAs promise an average level of disk I/O throughput, but actual throughput can rise and fall dramatically over any given period. The door swings both ways: when multiple tenants lean heavily on the disk subsystem, you'll receive less of the resource; when it sits underutilized, you'll receive more.
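If you want to observe that variability yourself, a quick-and-dirty sketch like the following -- timing a series of identical sequential writes against an EBS-backed mount point -- will show the swings. The mount point, pass count, and sizes here are arbitrary assumptions.

```python
import os
import time

MOUNT_POINT = "/mnt/ebs-volume"   # hypothetical EBS mount
CHUNK = b"x" * (1024 * 1024)      # 1 MiB write buffer
PASSES = 10
MB_PER_PASS = 256

for i in range(PASSES):
    path = os.path.join(MOUNT_POINT, "iotest.tmp")
    start = time.time()
    with open(path, "wb") as f:
        for _ in range(MB_PER_PASS):
            f.write(CHUNK)
        f.flush()
        os.fsync(f.fileno())      # force the data down to the device
    elapsed = time.time() - start
    print(f"pass {i}: {MB_PER_PASS / elapsed:.1f} MB/s")
    os.remove(path)
    time.sleep(5)                 # space passes out to sample over time
```

Run over a longer window, the per-pass numbers make the multitenant effect visible: identical workloads, noticeably different throughput.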
Keep offsite backups
Replication makes a great high-availability solution and will cover you should your primary database fail. But what if Amazon suffers a sitewide failure that affects its networks or EBS in general? These scenarios can and do happen.
To guard against this, take advantage of Amazon's global proliferation of data centers. Spread your objects and instances across multiple AWS regions. You can choose RDS's Multi-AZ option, which automatically replicates your data to another availability zone, or you can build your own MySQL primary database in your main availability zone and a replica in another region.
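As a rough illustration, here's a boto3 sketch of both options: an RDS instance with Multi-AZ enabled, and a managed cross-region read replica. (The self-built approach described above would instead use MySQL's own replication; the managed replica is shown here as its RDS equivalent.) All identifiers and credentials are placeholders.

```python
import boto3

rds = boto3.client("rds", region_name="us-east-1")

# Option 1: let RDS maintain a standby copy in a second availability zone.
rds.create_db_instance(
    DBInstanceIdentifier="primary-db",
    Engine="mysql",
    DBInstanceClass="db.m5.large",
    AllocatedStorage=100,
    MasterUsername="admin",
    MasterUserPassword="change-me",   # placeholder; use a secret store
    MultiAZ=True,                     # synchronous standby in another AZ
)

# Option 2: a read replica in an entirely different region, created from
# the source instance's ARN.
rds_west = boto3.client("rds", region_name="us-west-2")
rds_west.create_db_instance_read_replica(
    DBInstanceIdentifier="replica-db",
    SourceDBInstanceIdentifier=(
        "arn:aws:rds:us-east-1:123456789012:db:primary-db"
    ),
)
```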
Further, consider a scenario where a court action or subpoena hits Amazon and, during the discovery phase, an overly broad net accidentally draws your servers into the mess, interrupting your business.
In either case, you'll want a last-ditch insurance policy for restoring your application. The scripting involved varies with the complexity of your environment. For just a couple of servers, build an image the way you like it, then snapshot it and use that as your gold-standard server. When an instance spins up from it, a user data script you supply is called to handle final steps and any additional configuration.
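A minimal sketch of that gold-image pattern might look like the following, assuming a hypothetical gold-standard AMI and a short user data script for last-mile configuration. The AMI ID, key pair name, and script contents are placeholders.

```python
import boto3

USER_DATA = """#!/bin/bash
# Last-mile configuration run by cloud-init on first boot.
hostnamectl set-hostname db-restore-01
mount /dev/xvdf /var/lib/mysql
service mysql start
"""

ec2 = boto3.client("ec2", region_name="us-east-1")
ec2.run_instances(
    ImageId="ami-0123456789abcdef0",  # hypothetical gold-standard AMI
    InstanceType="m5.large",
    MinCount=1,
    MaxCount=1,
    KeyName="ops-key",                # placeholder key pair
    UserData=USER_DATA,               # executed on first boot
)
```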
For more complex environments, Scalr or RightScale can provide a templating solution for your automation needs. For even more sophisticated environments, or for operations teams ready to embrace configuration management to the fullest, Chef and Puppet may be options. With your automation scripts built, you can deploy a new server in Amazon or another cloud provider, then deploy your code and configurations. As a final step, you'll restore your data; with an offsite backup, you'll have that base covered.
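As for that offsite backup, a bare-bones sketch could be as simple as dumping the database with mysqldump, compressing the dump, and shipping it to an S3 bucket in a different region (or any other provider's object store). Hostnames, bucket names, and credentials below are placeholder assumptions; the MySQL password is presumed to come from a credentials file such as ~/.my.cnf.

```python
import gzip
import shutil
import subprocess
from datetime import datetime, timezone

import boto3

stamp = datetime.now(timezone.utc).strftime("%Y%m%d-%H%M%S")
dump_path = f"/tmp/backup-{stamp}.sql"

# Dump all databases; --single-transaction avoids locking InnoDB tables.
with open(dump_path, "wb") as out:
    subprocess.run(
        ["mysqldump", "--single-transaction", "--all-databases",
         "-h", "primary-db.example.com", "-u", "backup"],
        stdout=out,
        check=True,
    )

# Compress the dump, then upload it to a bucket in another region.
with open(dump_path, "rb") as src, gzip.open(dump_path + ".gz", "wb") as dst:
    shutil.copyfileobj(src, dst)

s3 = boto3.client("s3", region_name="us-west-2")
s3.upload_file(dump_path + ".gz", "offsite-db-backups",
               f"mysql/backup-{stamp}.sql.gz")
```

Schedule something like this from cron, and the last-ditch restore path stays exercised and ready.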