July 25, 2006

Are Your DBAs Working for You?

OK, for a while now I've been on this kick about how companies ought to look out for their people, recognize the talent they have, and keep and nurture it. So, this one is for the IT managers out there, but DBAs should read it too, just to know what they're being graded against.

Here's the problem:
You have a DBA, or even a team of DBAs on staff. Your main production system goes down, and it's now time for them to spring into action. After assessing the problem, they decide that the DB is completely shot and needs to be restored from backup.

Here's your quiz:
Are they telling the truth, or do they just not really know what they're doing? Could the system have been brought back online much faster? Was it really necessary to restore and possibly lose the latest transactions? Did you lose any transactions, and if so, was it because it couldn't be helped, or because they didn't set the system up correctly to recover from disaster?

There are other questions to ask yourself, but you get the idea.

Now, most of you probably don't know the answers to any of these questions, and it's partially your fault. More on that in a minute.

In the example I gave above, it could have really gone either way. Here are just some of the things that could have happened.

1. The DB could be in suspect mode. The cause could be as simple as a drive outage, but it could just as easily be that the DB outgrew the drive in the middle of a huge transaction and there wasn't enough space to roll back. Does this require a restore? No.

2. The DB could have corruption. In this case, it really depends on the level and cause of the corruption, and on how long it's been corrupt. There are ways to fix the corruption both with and without data loss. However, restoring could actually make the problem worse here: if the backups were taken after the corruption started, you could restore the corruption instead of fixing it.

3. If you restore for whatever reason, did your DBAs think ahead and test the backups? Did they leave you in a position to back up the tail of the log so you don't lose transactions?
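
For managers who like a checklist, the three scenarios above can be sketched as a small lookup in Python. This is only an illustration of the questions to ask, not a diagnostic tool: the names `Triage`, `TRIAGE`, and `triage` are hypothetical, and the symptom keys are simplified labels for the cases in the list.

```python
from typing import NamedTuple, Optional, Tuple


class Triage(NamedTuple):
    # True/False if the list above gives an answer, None for "it depends".
    restore_required: Optional[bool]
    questions: Tuple[str, ...]


# Hypothetical checklist built only from the three scenarios above.
TRIAGE = {
    "suspect": Triage(False, (
        "Did a drive fail or go offline?",
        "Did the DB outgrow the drive mid-transaction, leaving no room to roll back?",
    )),
    "corruption": Triage(None, (
        "What is the level and cause of the corruption?",
        "How long has the DB been corrupt?",
        "Would a restore bring the corruption back with it?",
    )),
    # Here the restore has already been decided on, for whatever reason.
    "restore": Triage(True, (
        "Were the backups tested ahead of time?",
        "Can the tail of the log be backed up first?",
    )),
}


def triage(symptom: str) -> Triage:
    """Return the questions to put to the DBA for a given symptom label."""
    try:
        return TRIAGE[symptom.lower()]
    except KeyError:
        raise ValueError(f"unknown symptom: {symptom!r}") from None


for question in triage("suspect").questions:
    print(question)
```

The point of the `None` for corruption is that there is no blanket answer; the questions are what matter.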

I think I've made my point.

So how is it partially your fault that you're in a bad position?
The hows and whys are really important, and you shouldn't just accept your DBA's word. When there's a situation, ask for details. Ask them what the root cause of the problem was, and how it could have been avoided. Get them to explain everything to you, and if you don't understand, don't stop until you do. I do a post-mortem after every outage. I've got a form I use that outlines the players, when it happened, when the DBAs were informed, when it got fixed, what the cause was, how it can be avoided, etc. This is very important in understanding your environment and can help you recognize major shortcomings in your plan.
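
A post-mortem form like that could be captured in a few lines of Python. This is a minimal sketch, assuming one record per outage; the class name `PostMortem`, its field names, and the sample values are all hypothetical and are not taken from the actual form. The one thing it adds is arithmetic on the timestamps, which makes it easy to see how long the DBAs went without being told and how long the fix took once they were.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class PostMortem:
    """One outage record; fields mirror the items listed above (names are assumed)."""
    players: List[str]
    occurred_at: datetime
    dbas_informed_at: datetime
    fixed_at: datetime
    root_cause: str
    prevention: str

    def time_to_notify(self) -> timedelta:
        # Gap between the outage starting and the DBAs hearing about it.
        return self.dbas_informed_at - self.occurred_at

    def time_to_fix(self) -> timedelta:
        # How long the fix took once the DBAs were informed.
        return self.fixed_at - self.dbas_informed_at

    def total_outage(self) -> timedelta:
        return self.fixed_at - self.occurred_at


# Made-up sample values, purely for illustration.
report = PostMortem(
    players=["on-call DBA", "storage admin"],
    occurred_at=datetime(2006, 7, 24, 2, 10),
    dbas_informed_at=datetime(2006, 7, 24, 2, 35),
    fixed_at=datetime(2006, 7, 24, 4, 5),
    root_cause="data drive filled during a large transaction",
    prevention="alert on free disk space before it runs out",
)

print("Time to notify:", report.time_to_notify())
print("Time to fix:   ", report.time_to_fix())
print("Total outage:  ", report.total_outage())
```

Keeping the records in a consistent shape also makes it possible to compare outages over time and spot the same root cause coming back.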

©1994-2009 Infoworld, Inc.