Performance testing is an umbrella term covering highly transactional types of tests. It is a general term that encapsulates Stress, Volume, Load, Soak, Spike and Failover testing, to name a few. What follows is an attempt to define, in order of priority, the different types of performance test and their general importance:
Test Type: Benchmark Testing
Definition: A set of component tests that aim to measure key sets of business functionality either in isolation or collectively.
Objective: To help quickly identify any areas responsible for performance degradation.
Test Type: Load Testing
Definition: The simulation of anticipated live patterns of behaviour on the planned site.
Objective: To gain confidence that the system can go live with acceptable response times.
Test Type: SOAK Testing
Definition: The simulation of anticipated live patterns of behaviour on the planned site over an extended period.
Objective: To measure the stability of the application and check that response times do not degrade over an extended period.
Test Type: Volume Testing
Definition: The submission/simulation of a large number of entities entering the system as the result of a pre-determined job.
Objective: To measure the behaviour of the system when subjected to a high volume of entities pending within the system.
Test Type: Stress Testing
Definition: A test to find the point at which the system becomes unusable to the actors on the system.
Objective: To find the point, and the capacity, at which the system becomes unusable.
There are many other types of performance testing – reliability, resilience, spike and capacity testing, among others – but I’ve found that the above tests need only slight adaptation to cover them.
“Break it down and make it simpler.” I find this helps focus my attention when confronted with a complex scenario that needs to be simulated. Benchmark testing is my first priority. After I have agreed a picture of live behaviour on the system, I decompose the target load test into a series of isolated benchmark tests. This allows me to concentrate on creating repeatable performance tests for key areas of business functionality. I can then think about the effort required to set up test data and users, and quiz developers on the behaviour of the application: does it cache items? What are the cache timeouts?
The ultimate aim of benchmark testing is a completely repeatable set of consistent metrics. If you run the same test twice and the metrics deviate substantially, the test needs reworking. You need 100% confidence in your tests’ ability to produce repeatable metrics. Why? Because it isn’t only code that can change the behaviour of the system: system settings, DB configuration, lost indexes, a slightly incorrect deployment – any mistake can affect your tests. When you report a negative metric that requires effort on other people’s part to investigate, you will find that people suddenly take a strong interest in your tests. This is where confidence in your tests, and your credibility with the development team, is made. I’ll write more on benchmark testing in another article.
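The repeatability check above can be sketched in a few lines. This is a minimal illustration, not a prescribed method: the function names, the sample response times and the 10% deviation threshold are all hypothetical choices for the example.

```python
# Sketch: decide whether two identical benchmark runs agree closely enough
# to trust the test. The 10% threshold is an illustrative choice.
from statistics import mean

def deviation_pct(run_a, run_b):
    """Percentage difference between the mean response times of two runs."""
    a, b = mean(run_a), mean(run_b)
    return abs(a - b) / min(a, b) * 100

def is_repeatable(run_a, run_b, threshold_pct=10.0):
    """True if the two runs agree closely enough to rely on the metrics."""
    return deviation_pct(run_a, run_b) <= threshold_pct

run1 = [120.0, 118.0, 125.0, 121.0]   # ms, hypothetical samples from run 1
run2 = [119.0, 123.0, 122.0, 120.0]   # ms, hypothetical samples from run 2
print(is_repeatable(run1, run2))      # → True (both runs average 121 ms)
```

A real harness would of course collect the samples from actual test executions; the point is simply to make the “run it twice and compare” rule mechanical rather than a judgement call.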
Load Testing – what do we need to simulate to go live?
Once you have a set of benchmark tests it should take only a little work to layer them together, change the transaction rates and run a combined test. I call this the ‘layered approach’. The benchmark tests are a decomposition of a live model into a set of simpler, repeatable tests; layered back together, they should give you an accurate representation of live. I always attempt to achieve repeatable metrics with load tests – if you can’t measure accurately, you can’t report accurately. With load testing you should focus on simulating the volumes of transactions entering the system. Don’t get too caught up in precisely modelling live behaviour; an approximation is perfectly acceptable (more on this later). Companies often talk about load testing for traffic in the distant future – 6 months, 1 year, 2 years. It’s very easy to lose focus. I concentrate on the traffic we need to simulate in order to go live on the planned date – if that can’t be achieved, the future load certainly cannot.
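The layered approach can be sketched as scenarios running side by side, each at its own transaction rate. The scenario names and rates below are hypothetical, and the transaction itself is stubbed out – in a real test each thread would drive an actual business flow.

```python
# Sketch of the 'layered approach': each benchmark scenario keeps its own
# target transaction rate, and the load test is the scenarios run together.
import threading
import time

def run_scenario(name, tx_per_min, duration_s, results):
    """Fire one scenario's transactions at its own rate, independently."""
    interval = 60.0 / tx_per_min
    count = 0
    end = time.monotonic() + duration_s
    while time.monotonic() < end:
        # ... issue one business transaction here (HTTP call, message, etc.)
        count += 1
        time.sleep(interval)
    results[name] = count

# Hypothetical benchmark scenarios and their target rates (tx/minute).
benchmarks = {"search": 120, "checkout": 30, "login": 60}
results, threads = {}, []
for name, rate in benchmarks.items():
    t = threading.Thread(target=run_scenario, args=(name, rate, 2.0, results))
    threads.append(t)
    t.start()
for t in threads:
    t.join()
print(results)  # transactions achieved per layered scenario
```

Because each scenario is still the same repeatable benchmark test, changing the mix for a different load profile is just a matter of editing the rates.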
SOAK Testing – Think about transactional throughput
If the performance tests have been structured correctly, the SOAK test is simply your load test run for an extended period. This should highlight any stability issues associated with the build. Key to this type of test is monitoring CPU, memory and transaction response times. If the SOAK test is taking too long, adapt. I was performance consulting at a company that attempted to run a SOAK test over a period of three days. This simply wasn’t practical: it tied up resources, extended timelines substantially and was prone to error. On inspection of the test, we realised the CPU usage of the system was extremely low. The performance testers had concentrated too much on simulating live, e.g. one particular transaction type every 30 minutes. We took a look at the tests and substantially raised the transaction rates of the different business flows. Three days of transactions were comfortably squeezed into 10 hours while keeping CPU below 60%. A three-day test could now be run overnight, with all the associated benefits. The moral of the story: don’t get too hung up on simulating exact live volumes for extended periods. Think about the transactional volume you wish to simulate over the period and adapt the test sensibly. Remember, the goal of soak testing is to test the stability of the application.
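The compression is just arithmetic: hold the total transactional volume constant and raise the rate until the run fits the window you have. The numbers below mirror the example in the text (a transaction every 30 minutes, three days squeezed into 10 hours); they are illustrative, not a recommendation.

```python
# Back-of-the-envelope sketch of compressing a SOAK test: keep the total
# volume constant and raise the rate so the run fits overnight.
original_rate_per_hour = 2            # one transaction every 30 minutes
original_duration_h = 3 * 24          # the impractical three-day soak
total_volume = original_rate_per_hour * original_duration_h

compressed_duration_h = 10            # run it overnight instead
compressed_rate_per_hour = total_volume / compressed_duration_h

print(total_volume)                   # → 144 transactions over three days
print(compressed_rate_per_hour)       # → 14.4 per hour for the same volume
```

The sanity check is the one from the story: after raising the rate, confirm CPU stays comfortably below saturation (under 60% in the example), otherwise you are running a stress test, not a soak test.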
Volume Testing – Don’t forget the jobs!
This generally refers to the simulation of batch processes into or within the system – an end-of-day financial activity or intraday activity, for example. Talk to the business, find out when these jobs run and whether you need to simulate them. Run them in isolation and then layer them onto your load test – check for a difference in the metrics being reported. Key to this is ensuring you can re-run the volume test again and again – find any flags and database state that you may have to reset in order to re-run the test with the same volumes.
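One way to make that reset discipline concrete is to gather every flag and table you have to put back into a single routine run before each test. This is a sketch only: the table names (`batch_queue`, `job_flags`) and the reset statements are hypothetical placeholders, and an in-memory SQLite database stands in for the real system.

```python
# Sketch: one reset routine that restores the pre-test state, so every
# volume-test run processes the same volumes. Table/flag names are invented.
import sqlite3

RESET_STATEMENTS = [
    "DELETE FROM batch_queue",              # clear entities left by the last run
    "UPDATE job_flags SET processed = 0",   # re-arm the batch job's trigger flag
]

def reset_for_rerun(conn):
    """Put the system back into its pre-test state before each re-run."""
    for stmt in RESET_STATEMENTS:
        conn.execute(stmt)
    conn.commit()

# Demonstration against an in-memory database standing in for the real one.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE batch_queue (id INTEGER)")
conn.execute("CREATE TABLE job_flags (processed INTEGER)")
conn.execute("INSERT INTO batch_queue VALUES (1)")   # leftovers from a prior run
conn.execute("INSERT INTO job_flags VALUES (1)")
reset_for_rerun(conn)
print(conn.execute("SELECT COUNT(*) FROM batch_queue").fetchone()[0])  # → 0
```

Keeping the reset steps in one list means the knowledge of “what has to be re-set” lives in the test, not in someone’s head.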
Stress Testing – Useful, but not that important
This is the test that everyone seems to get fixated on – maybe it’s the name, maybe it’s just pure boyish interest in attempting to blow the system apart – but in my experience this tends to be the least useful of the tests outlined. Stress testing is often defined as “finding the point at which the system breaks or fails”. To me this is slightly misleading; stress testing should be defined as “finding the point at which the system becomes unusable to the actors on the system” – not the point at which it breaks down. Stress testing is simple if you have built a good load test: you simply keep increasing the number of users on the system until you observe unacceptable response times. Benefits of stress testing include:
- Finding capacity figures for the system so the business can plan ahead
- Finding the limits of the system when the business cannot give projected volumes
- Finding areas of application contention so development can plan for scalability
- Spike testing – measuring the behaviour of the application when subjected to an extreme increase in transactional volumes
I’m not saying stress testing isn’t useful – it just tends to be the performance test I execute least, yet the one that generates the most interest.
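The “keep increasing users until response times become unacceptable” loop can be sketched as below. The response-time model, the 1000 ms acceptability threshold and the step size are all invented for illustration – in a real stress test, `measure_response_time` would be a full load-test run at that user count.

```python
# Sketch of the stress-test loop: ramp users until response times become
# unacceptable to the actors, then report the last usable point.

def measure_response_time(users):
    """Hypothetical model: flat to 500 users, then response time (ms) climbs."""
    return 200 + max(0, users - 500) * 2

def find_usability_limit(sla_ms=1000, step=50):
    """Increase users in steps; return the last count that still met the SLA."""
    users = step
    while measure_response_time(users) <= sla_ms:
        users += step
    return users - step

print(find_usability_limit())  # → 900 users under this hypothetical model
```

Note the loop stops at unacceptable response times, not at system failure – matching the definition of stress testing argued for above.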