Benchmarks, analyst reports, and other pretenses

Keep these three facts in mind when assessing how much you should trust “independent” evaluations

Industry benchmarks and analyst reports are bibles for many people in IT, providing independent assessment of vendors’ product and technology claims. But too many people take them literally. The fact is that they should be treated with much more skepticism than they typically get. The devil is in the details, or rather, in the methodology.

I know that from personal experience: I work at, and have worked for, multiple software vendors that analyst firms have covered frequently, both positively and negatively. I’ve personally helped tune for benchmarks, including Spec’s, to present my companies’ best face.

There’s nothing wrong with a vendor putting its best face forward and trying to encourage analysts to support its point of view. But sometimes it goes further than that. We’ve all heard rumors that some analyst reports’ conclusions are influenced by which vendors are those analysts’ customers. And we’ve all seen reports of habitual cheating on industry benchmarks in several segments of the hardware industry (it’s not just Volkswagen’s diesel engines): products detect the benchmarks in use and change their behavior accordingly to look better than they do in the real world.

So, when you see those analyst reports or benchmark results, make sure you keep the following facts in mind when assessing how much you should trust them.

1. Analysts don’t run the software

The methodology for many of these reports consists of asking vendors questions, reading the documentation, and watching a demo. It is rare that the analysts actually use the software.

If you’ve ever been sold a piece of software and then tried to use it, you know that vendor claims and actual usage are not the same thing.

Analysts’ common “ask the vendor” method results in a wide range of vendor claims with varying levels of veracity. It’s really a test of the vendor’s sales process more than of actual product capabilities.

2. Analysts chase buzzwords

Remember SOA (service-oriented architecture)? At JBoss, I worked with a group of smart software developers, but not one of us knew what the hell it was compared to just modular design and web services. Only the VP of marketing could explain what it meant.

So, we just started labeling some of the products as SOA and using the term in our marketing information. Eventually, people settled that it had something to do with an enterprise service bus, so it got stamped on our ESB stuff.

You can be the leader of XYZ one year, and the very next year the analysts say XYZ is irrelevant and it is now all about ABC. Never mind that ABC is mainly just a rebranding of XYZ with some minor technical differences that often don’t mean a hill of beans to customers.

Case in point: If SOA were so important, why isn’t there still a Gartner Magic Quadrant or Forrester Wave for it?

Analysts chase buzzwords, but they aren’t always aligned with real customer needs or projects.

3. The vendor “self-certifies”

Benchmarks like Spec’s are often self-certified vendor tests. Back in the day, JBoss got pelted on SPECjAppServer because ultimately it was more a test of the database and container-managed persistence (CMP) than anything else. CMP was a terrible monster created by the J2EE specifications that should never have existed (it flew in the face of how object-relational mapping worked, for no real reason).

Still, the benchmarks measured CMP, so you had to do well in them. The trick for doing that was to tune your CMP engine to get out of the way of the database and pump up database performance. Eventually, we got that trick working, but to my knowledge we didn't publish because by then no one cared about that benchmark.

Of course, these tests had nothing to do with anything. How the app server performed for a customer had nothing to do with those weird permutations of how the CMP engine handled particular joins, because in the real world customers didn’t do those things. Moreover, the run rules restricted common tricks used to improve performance, so the results didn’t reflect the real-world optimizations that customers actually used.

As a result, vendors spend big money to “win” on benchmarks that don’t reflect what the customers will actually experience. Ironically, as long as customers keep taking the benchmarks seriously, the vendors will keep focusing on them.

How to get valid information from analysts and benchmarks

Clearly you can’t know everything, which is why you look for outside expertise and validation. Despite the issues I’ve outlined, you can get valid insights from third-party analysis and benchmarks.

Here’s what to investigate:

Methodology: No analyst report that is based merely on the analyst asking vendors questions, seeing sales demos, and reading the documentation is worth a whole lot. If they didn’t use the software and validate those results, they’re really just passing on vendor claims.

Relevance: If you didn’t already think, for example, that service-oriented architecture was important to you and your firm, don’t start giving it importance just because an analyst now says it is relevant. Moreover, if analysts evaluate products based on their support of some random buzzword, be skeptical. Look for what’s clearly useful to you: Can it get my data in, apply a meaningful algorithm, and spit out decent results?

For example, if you are looking at messaging servers, don’t pick one based on service-oriented architecture or machine learning, even if the vendors or analysts say “this is the biggest thing ever.” Chances are that a couple of years from now they’ll have moved on to new buzzwords, but you’ll still be running a messaging server.

Yes, new concepts and technologies do emerge, and they become legitimate evaluation criteria. You’ll know when that happens because you’ll be able to see what’s clearly useful to you in those new areas. Until you get that specificity and clarity, they’re just buzzwords.

Third-party certification: Unless someone else has validated the benchmarks or tests as both accurate and relevant, don’t believe the vendor’s claims. They know how to comply with the run rules but still dance around the edges of honesty. That also means the third party can’t have been paid by the vendor.

Evaluation: If the benchmark’s runnables or the analysts’ evaluation criteria aren’t closely related to the things you care about, think twice. For a benchmark, you might reuse the harness but put in your own runnables. Favor clear, publicly available code over proprietary stuff you don’t understand. For an analyst report, either rework and reweight the criteria (Forrester Wave, for example, lets you play with its spreadsheet) or find another report that lets you do that.
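The “reuse the harness, swap in your own runnables” idea can be sketched in plain Java. Everything here is a hypothetical illustration, not any vendor’s or benchmark’s actual code; a real harness would also control for JIT warmup, GC pauses, and run-to-run variance far more carefully:

```java
// Hypothetical mini-harness: the point is that the measurement loop stays
// fixed while the workload (the "runnable") is yours to replace with
// something shaped like your actual traffic.
public class MiniHarness {
    // Time a workload: warm up first so JIT compilation skews results less,
    // then report average wall-clock nanoseconds per iteration.
    public static long averageNanos(Runnable workload, int warmup, int measured) {
        for (int i = 0; i < warmup; i++) workload.run();
        long start = System.nanoTime();
        for (int i = 0; i < measured; i++) workload.run();
        return (System.nanoTime() - start) / measured;
    }

    public static void main(String[] args) {
        // A vendor's runnable might exercise CMP joins nobody uses; this
        // placeholder stands in for a workload modeled on YOUR application.
        Runnable myWorkload = () -> {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 1_000; i++) sb.append(i);
            sb.toString();
        };
        System.out.println("avg ns/op: " + averageNanos(myWorkload, 100, 1_000));
    }
}
```

Even a toy like this makes the separation of concerns obvious: if you trust the harness but not the workload, you replace only the workload and keep the numbers comparable across your own candidates.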

The moral of the story is to trust but verify. Do your own tests, remember that buzzwords are nonsense, and know what you’re reading. Go beyond the summary page, double-check the methodology, and assume the headline is just an ad to get you to buy the report or view the benchmark.
