Find and fix 15 performance bottlenecks

Suffering slowdowns? Our experts have the experience to help you find the problem -- and offer a little advice on how to fix it, too

"Bottleneck" is a wonderfully descriptive term. It describes an artificial constraint on some form of communication, interaction, or transfer of information. And it leads one to believe that some magical combination of luck, money, and ingenuity can smash that bottleneck and let all good things flow.

The trouble with performance bottlenecks is that they can be tough to identify. Is it the CPU? The network? A clumsy bit of code? Often, the most obvious culprit is actually downstream of something larger and more mystifying. And when performance riddles remain unsolved, IT management may find itself faced with a Hobson's choice between admitting ignorance and making up excuses.

Fortunately, as with medical diagnoses or detective work, experience helps. Drawing on our years of sleuthing and experimentation, we've collected 15 of the most likely ailments -- and suggested remedies -- to help your IT operation track down and crack performance problems.

Some of these bottlenecks are more obvious than others. Most likely, you have something to say about some sneaky spoilers of your own (and we'd love to hear your tales about them). But by identifying common speed killers across IT disciplines, we hope to jumpstart your quest to create the highest performing infrastructure your resources will allow.

No. 1: It's probably not the servers

Server upgrades used to make all the difference, which is why the old saw "When all else fails, throw more hardware at it" persists today. That's still true in some cases. But how much of IT is really that compute-intensive? Generally, you can save a lot of time and money by turning your hairy eyeball away from server hardware. The lower end of the server spectrum has more than enough horsepower to handle everyday tasks.

Here's one concrete example. On a network of over 125 users, an elderly Windows domain controller appeared to be ripe for replacement. This server originally ran Windows 2000 Server and was upgraded to Windows Server 2003 some time ago, but the hardware remained unchanged. This HP ML330 with a 1Ghz CPU and 128MB of RAM was functioning as an Active Directory domain controller carrying all the AD FSMO roles, running DHCP and DNS services as well as running IAS (Internet Authentication Services).

Molasses, right? In fact, it actually did the job just fine. Its replacement was an HP DL360 G4 with a 3Ghz CPU, 1GB of RAM, and mirrored 72GB SCSI drives. Carrying all those services, it runs hardly any load at all -- and the performance difference is unnoticeable.

It’s easy to identify applications that will eat all your CPU and memory, but they tend to be pretty specialized. For almost everything else, the humble commodity box will do the trick.

No. 2: Speed up those queries

You can create the niftiest application in the world, but if access to back-end database servers creates a bottleneck, your end-users or customers won’t be happy. So fine-tune those database queries and maximize performance.

Three basic measures can help you improve query performance. First, most database products include tools (such as DB2 UDB for iSeries’ Visual Explain) that can dissect your query during development, providing feedback on syntax and the approximate timing of the various sections of the SQL statements. Using this information, locate the lengthiest portions of the query and break those down further to see how you might shorten the execution time. Some database products also include performance advice tools, like Oracle’s Automatic Database Diagnostic Monitor, that provide recommendations (such as suggesting you create a new index) to speed up queries.

Next, turn on database monitoring tools on a staging server. You might use a third-party monitoring product, such as Fidelia’s NetVigil, if your database lacks monitoring support. With the monitors enabled, generate traffic against the database server using load-testing scripts. Examine the data gathered to see how your queries performed while under load; this information may lead you to some further query tweaking.

If you have enough server resources to mimic your mixed workload production environment fairly closely, you can execute a third round of query tuning using a load testing tool, such as OpenSTA, plus database monitoring to see how your queries perform alongside other applications that hit the database.

As database conditions change -- with volume growth, record deletions, and so on -- keep testing and tuning. It’s often well worth the effort.

No. 3: What cost, virus protection?

Virus protection on critical servers is a basic requirement, especially for Windows servers. The impact can be painful, however. Some virus scanners are more obtrusive than others and can reduce server performance significantly.

Try running performance tests with and without your virus scanner running to determine the impact. If you see a marked improvement without the scanner, it's time to look for another vendor. Also check specific features. Disable real-time scans, and quite often you'll bring performance up.

No matter how well written your business logic, when you deploy it to the middle tier, you will need to tune the application server runtime environment to maximize performance.

Like a vintage stereo with oodles of knobs for tweaking sound quality, application servers from vendors such as BEA, IBM, and Oracle, supply a dizzying set of controls. The trick is to turn the knobs just the right way, depending on the attributes of your application.

No. 4: Maximizing the middle tier

No matter how well written your business logic, when you deploy it to the middle tier, you will need to tune the application server runtime environment to maximize performance.

Like a vintage stereo with oodles of knobs for tweaking sound quality, application servers from vendors such as BEA, IBM, and Oracle, supply a dizzying set of controls. The trick is to turn the knobs just the right way, depending on the attributes of your application.

For example, if your application is servlet-heavy, you’ll want to enable servlet caching. Likewise, if your application uses many SQL statements to support a large user base, you’ll want to enable prepared statement caching and set the maximum size of the cache so it’s large enough to support the intended workload.

One of the major areas where performance tuning can really help is with the database connection pool. Set the minimum or maximum connections too low and you’re certain to create a bottleneck. Set them too high and you’ll likely see a slowdown resulting from the added overhead needed to maintain the larger connection pool.

If you know the intended workload, tune the application server runtime by turning on performance monitoring tools such as IBM’s Tivoli Performance Viewer for WebSphere on a staging application server. Generate the amount of workload that you expect by using a load-generation tool, then save the monitoring results and play them back to analyze which knobs need adjusting.

When in production, it’s a good idea to turn on low-overhead, passive monitoring to keep tabs on the runtime. If your workload changes over time, you’ll want to execute a fresh performance review.

No. 5: Optimize network connectivity

Most mid-level enterprise servers now have dual gigabit NICs -- but most of them don’t use the second pipe. Moreover, gigabit switch prices have dropped through the floor. With a 120MBps link to your fileserver, a number of 100-megabit clients can achieve wire-rate file access simultaneously.

Even without gigabit switching, NIC bonding should be a staple. At its simplest, bonding two NICs gives you redundancy, but add transmit load-balancing, and you can effectively double the outbound bandwidth. Using switch-assisted teaming will provide the same effect on inbound traffic. Almost every major server vendor offers NIC teaming drivers -- and there are third-party utilities, too. It’s a big, cheap bandwidth boost.

No. 6: Winding up your Web servers

Is there really that much you can do to tune a Web server and maximize performance? In fact, there is -- mainly by adjusting a handful of critical settings to match the production traffic you expect.

For Web servers already in production, begin by collecting real-time Web server statistics (most major Web servers have that functionality built in). Then move to staging in order to determine which parameters, if any, need adjustment.

Activate the Web server's performance-monitoring tools on the staging server. Execute a load test and inspect relevant parameters, such as response time, bytes sent and received, and the number of requests and responses.

Key parameters you'll want to tune depending on the volume of traffic include caching, threading, and connection settings.

Enable caching for frequently used content; some Web servers allow you to cache files dynamically based on usage, whereas others demand that you specify what will be cached. Make certain your maximum cache size is sufficient for the traffic you expect. And if your Web server supports cache acceleration, enable that, too.

For the threading and connection settings, set the minimums and maximums in accord with expected workload. For connections, you'll also need to define the maximum number of requests per connection and the connection time-out setting. Don't set either of these values too small or too large, or slowdowns may result.

No. 7: The woe of the WAN

Think you need to reclaim WAN bandwidth? You can easily spend a bundle on traffic-shaping appliances or caching engines in an attempt to rein in WAN bandwidth utilization. But what if it’s not the pipe?

First things first: Before you buy anything, get a solid idea of what traffic is crossing the WAN. Network analysis tools such as Ethereal, ntop, Network Instrument’s Observer, or WildPacket’s EtherPeek NX can give you a fresh look at what’s really on the wire.

You may find that replication times for your Active Directory are set far too low and simply configuring longer replication intervals can buy you breathing room during the workday. Are some users in remote locations mapping shares to the wrong servers and pulling large files across the WAN without realizing it? Are the vestiges of a long-disabled IPX network still floating around? Some WAN problems boil down to application misconfiguration, where traffic is directed across the WAN when it should have stayed local. Regular reports on WAN traffic patterns will save money and headaches.

No. 8: Let’s play nice

All too often, applications, Web services, and Web sites from multiple departments across the enterprise compete for server resources. Although each of these components may be well-tuned in its own right, an application from another department that is also using the same production clusters may have a poorly tuned query or some other issue, which in turn affects your users or customers.

In the near term, all you can do is work with your system administrators and the department that is having the performance problem to obtain resolution for your users or customers. Longer term, create a community across all of the departments that use the production clusters where your objects are deployed. Work across teams to ensure that there is adequate funding for a staging environment that is truly representative of the mixed workload production environment. Ultimately, you’ll want to develop a series of benchmarks that can be used to validate mixed workload performance in the staging environment.

No. 9: Caching, shaping, limiting, oh my!

If your WAN is truly undersized -- and you can’t afford a long-haul frame-relay network -- traffic shaping and caching can help unclog the pipe.

Traffic-shaping configurations are more art than science. Prioritizing apps is often more political than technical but may have tremendous effects on perceived network performance.

Caching is a different beast altogether. It requires less work than traffic shaping, but the impact will likely be smaller. Caching engines store and serve up local copies of commonly accessed data to reduce WAN traffic.  The downside is that dynamic content can’t truly be cached, so e-mail won’t enjoy the same performance bump.

No. 10: Predictive patching

You arrive at work on Monday only to learn that a bunch of desktops are hung or that the performance of a critical application has slowed to a crawl. After investigating, you determine that a patch that was applied over the weekend is the cause.

That’s why you need tools that support patch rollbacks. Even better, include patch testing as part of your patch-management strategy. First, you must take regular inventory of the applications and technologies in play on desktops and servers. Most systems-management tools, such as Microsoft’s SMS, have the capability to take inventory for you automatically.

Next, replicate the applications and technologies into a staging environment. If your operating system and infrastructure software do not include patch testing tools, get a third-party tool such as FLEXnet AdminStudio or Wise Package Studio.

Alternatively, you can write some scripts to functionally exercise the platform or technology with the latest patches in play. You will need to repeat this scenario (and adjust scripts) as new patches arrive and as software changes are made.

1 2 Page 1
Page 1 of 2