Regular performance testing of software releases yields a tremendous volume of comparative data that can be plotted to illustrate many trends in release-over-release performance and coverage. Consider the following plot:
*Figure: stacked release-over-release micro plots, one row per test cycle (top to bottom: release 6, switch DB, release 5, release 4, release 3, release 2, release 1).*
In this example we have represented six release cycles (labeled “release n”) and one extended tuning cycle associated with a switch of the primary database platform. The information encoded in this plot includes:
- Each vertical bar represents one comparison in a set of experiments: the current release’s result against the previous release’s baseline value.
- “Up” gray bars represent relative improvements in throughput of up to 20% (the maximum vertical value).
- “Down” red bars represent relative degradations in throughput of up to 20% (the minimum vertical value).
- Tan central bars mark changes within +/- 1%.
- The horizontal length of a test set represents test coverage: the approximate number of workload scenarios tested for a particular release (a sketch of this encoding in code follows the list).
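To make the encoding concrete, here is a minimal sketch of how a single release’s micro plot could be rendered with matplotlib. The workload names, throughput figures, and the `release_microplot` helper are illustrative assumptions, not the actual tooling behind these plots.

```python
import matplotlib.pyplot as plt
import numpy as np

def release_microplot(baseline, current, ax, cap=0.20, noise_band=0.01):
    """Plot per-workload relative throughput change of `current` vs `baseline`.

    baseline, current: dicts mapping workload name -> throughput (e.g. tx/sec).
    Bars are clipped to +/- cap (20%); changes within +/- noise_band (1%)
    are drawn in tan, improvements in gray, degradations in red.
    """
    workloads = sorted(set(baseline) & set(current))   # shared coverage
    deltas = np.array([(current[w] - baseline[w]) / baseline[w] for w in workloads])
    deltas = np.clip(deltas, -cap, cap)                 # cap bars at +/- 20%
    colors = ["tan" if abs(d) <= noise_band
              else ("gray" if d > 0 else "red") for d in deltas]
    ax.bar(range(len(deltas)), deltas, color=colors, width=1.0)
    ax.set_ylim(-cap, cap)
    ax.set_xticks([])                                   # plot width ~ coverage
    ax.axhline(0, linewidth=0.5, color="black")

# Hypothetical example: two workloads improved, one regressed, one unchanged.
baseline = {"oltp_read": 1000, "oltp_write": 800, "batch_load": 500, "report": 650}
current  = {"oltp_read": 1100, "oltp_write": 820, "batch_load": 440, "report": 652}
fig, ax = plt.subplots(figsize=(4, 1))
release_microplot(baseline, current, ax)
plt.show()
```

Rendering one such axis per test cycle and stacking them vertically reproduces the layout shown above, where plot width grows as coverage grows.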
Some interesting trends can be observed across years of testing. First, continuous investment in automation yielded steady improvements in product coverage, with new workload scenarios added to each successive release test cycle, widening each release’s micro plot. The gray up bars and red down bars allow a quick assessment of a particular release’s results. Components targeted for performance improvements appear as blocks of “up” bars, as in the latter half of the release 2 series. Negative results appear as patches of red, as in release 4, where a change introduced into the product for HA purposes added latency to a specific transaction, producing a corresponding drop in throughput. This degradation was expected.
This technique can also illustrate the effort invested in a major tuning exercise, such as changing the underlying supported database. In this example, “switch DB” represents the lab tuning cycles run against a reference workload as parameters of the new database were varied to find its optimal configuration. Once positive gains were recorded for the reference workload, the full test set was executed against that configuration (release 6). In that cycle, more workloads were run against the new database, yielding large areas of relative improvement as well as areas of degradation.
This small graphic summarizes years of test data spanning nearly 1,000 tests and thousands of lab hours.