I’ve been meaning to write a short section on Performance Benchmark Testing for some time, and I’ve come across a few situations recently where it is the ideal solution. Benchmark testing is the most powerful and commonly used type of test in the performance tester’s toolkit. So let’s start with a definition of what benchmark testing is:
“A performance benchmark test will give a repeatable set of quantifiable results from which present and future releases for specific functionality can be reliably compared or baselined against.”
That short sentence neatly captures the essence of this type of test but fails to convey its enormous benefits. In plain English, it means the same test can be executed again and again and will produce a predictable set of metrics. This gives the performance tester a “benchmark”, or set of baselines. When a software product is updated, the benchmark test(s) can be reapplied and the effects measured. The test allows an objective measurement of the application’s performance: has it improved or degraded? It could be said that a benchmark test is the performance tester’s equivalent of a functional regression pack.
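To make “has it improved or degraded?” concrete, here is a minimal Python sketch that compares one benchmark metric from a new release against its stored baseline. It assumes a lower-is-better metric such as response time. The names `compare_to_baseline` and `Verdict` and the 5% noise tolerance are hypothetical choices for illustration, not part of any particular tool.

```python
from dataclasses import dataclass


@dataclass
class Verdict:
    metric: str
    baseline: float
    current: float
    change_pct: float
    status: str  # "improved", "degraded" or "unchanged"


def compare_to_baseline(metric: str, baseline: float, current: float,
                        tolerance_pct: float = 5.0) -> Verdict:
    """Classify a lower-is-better metric (e.g. response time) against its baseline.

    Changes within +/- tolerance_pct are treated as run-to-run noise.
    The 5% default is a hypothetical threshold; tune it to your test's variability.
    """
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    change_pct = (current - baseline) / baseline * 100.0
    if change_pct > tolerance_pct:
        status = "degraded"   # slower than baseline by more than the tolerance
    elif change_pct < -tolerance_pct:
        status = "improved"   # faster than baseline by more than the tolerance
    else:
        status = "unchanged"  # within the noise band
    return Verdict(metric, baseline, current, change_pct, status)


# Usage with illustrative numbers (not real measurements):
print(compare_to_baseline("login p95 (ms)", baseline=200.0, current=230.0).status)
# prints "degraded" (a 15% increase exceeds the 5% tolerance)
```

The tolerance band is the important design choice. Without it, ordinary run-to-run variation would flip metrics between “improved” and “degraded” on every build, which is why consistency and control matter so much.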
So what makes a good Performance Benchmark Test?
Consistency and control are the short answer. Here are some general rules for setting up good benchmark tests:
- Control the Transaction Rate
- Understand the architecture, e.g. warm up the system (its caches, for example) before you begin to test. In some instances you may even decide to turn off caching.
- Have a good breadth and depth of test data. Note: completely random data is ineffective if you cannot repeat the randomness reliably.
- Have a known quantity of Users with appropriate and well known profile settings
- Ensure all the initial static test data can be recreated for each subsequent test
- Have “reset” processes where necessary – if you keep adding data to the DB you may skew your own benchmarks.
- Engineer tests that hit the system at a transaction rate high enough to make it ‘sweat’. For example, is it effective to hit a system at 12 transactions per second (TPS) if the corresponding CPU usage is only 5%? That isn’t going to highlight differences.
- Achieve high statistical confidence in your derived metrics by driving a correspondingly high input transaction rate (or volume, for batch testing).
- Don’t be afraid to split the elements of a performance benchmark out into separate tests (I term this isolated benchmark testing). Discrete functionality can then be targeted by the separate benchmark tests.
- Layer isolated benchmark tests as required; this increases complexity and moves towards a more realistic overall performance model that can also be benchmarked.
- Measure not just the average response time but also the 95th percentile, the number of transactions and any associated CPU graphs for an indication of improvement or degradation (more on this in part two).
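Several of these rules can be sketched in a few lines of Python: repeatable randomness, a controlled transaction rate, and reporting more than the average. This is a minimal illustration under stated assumptions, not a load tool. `build_workload`, `pacing_schedule` and `summarise` are hypothetical helper names, the user IDs are placeholders, and the confidence interval uses a normal approximation that assumes roughly independent samples.

```python
import math
import random
import statistics


def build_workload(seed: int, user_ids: list[str], n_requests: int) -> list[str]:
    """Pick test-data keys pseudo-randomly but repeatably: same seed, same sequence."""
    rng = random.Random(seed)  # private generator, so other code can't disturb the sequence
    return [rng.choice(user_ids) for _ in range(n_requests)]


def pacing_schedule(tps: float, duration_s: float) -> list[float]:
    """Planned send times (seconds from start) for a fixed transaction rate.

    A load driver would wait until each offset before firing the next request,
    so the input rate stays constant regardless of how fast the system responds.
    """
    interval = 1.0 / tps
    return [i * interval for i in range(int(round(tps * duration_s)))]


def summarise(response_times_ms: list[float]) -> dict:
    """Count, mean, 95th percentile and an approximate 95% CI half-width for the mean."""
    n = len(response_times_ms)
    if n < 2:
        raise ValueError("need at least two samples")
    mean = statistics.fmean(response_times_ms)
    # quantiles(n=100) returns 99 cut points; index 94 is the 95th percentile
    p95 = statistics.quantiles(response_times_ms, n=100)[94]
    # Normal approximation: the half-width shrinks with sqrt(n), so a higher
    # transaction volume gives tighter, more trustworthy comparisons between runs
    half_width = 1.96 * statistics.stdev(response_times_ms) / math.sqrt(n)
    return {"count": n, "mean_ms": mean, "p95_ms": p95, "ci95_half_width_ms": half_width}


# Usage sketch with placeholder data: 12 TPS for 10 seconds is 120 planned requests
plan = pacing_schedule(tps=12, duration_s=10)
keys = build_workload(seed=42, user_ids=["user01", "user02", "user03"], n_requests=len(plan))
```

Keeping the seed fixed between releases means both runs exercise identical data in identical order. Any difference in the summary is then more plausibly due to the software than to the test. Reporting the confidence-interval half-width alongside the mean also shows whether a measured change is larger than the run’s own noise.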
Every system is different, so some of the above rules may not apply. The trick is to exert as much control over the benchmark test as possible so you can have 100 percent confidence in it. Why? Benchmark tests are unique: they are like an ill patient. When the results show a degradation there can be a multitude of non-obvious culprits, and it takes time, effort and precious project resources to diagnose the underlying causes. Pulling resources into an investigation puts performance testing in the spotlight, and the tests and every aspect of them become the subject of intense scrutiny. This can be viewed as a positive: rigorous inspection helps strengthen understanding and improve the quality of the constructed tests.
Performance Benchmark Testing – why is it so powerful?
In my experience, benchmark testing exposes the maturity of a company’s SDLC. If any aspect of the SDLC is weak, there will be problems with the benchmark results: software, OS and hardware deployment can all affect them. This is both a blessing and a time-consuming curse. Days can be spent diagnosing the effects, only to find someone has inadvertently set an obscure OS/DB parameter incorrectly. Benchmark testing also:
- Provides a reliable indicator of application performance for overnight builds
- Diagnoses expensive issues early and reliably before application release
- Promotes a move towards a Continuous Integration framework if daily baseline tests are desired. CI is one of the most effective ways to improve software application quality and delivery.
That’s performance benchmark testing in a nutshell. In my next benchmarking article I will describe how I solved the problem of running 15+ individual benchmark tests against each release and then quickly comparing the hundreds of accumulated metrics against previous results without having to inspect graphs manually.