An Evaluation of Open-Source Software Microbenchmark Suites for Continuous Performance Assessment (MSR 2018 - Technical Papers)

Who

Christoph Laaber, Philipp Leitner

Track

MSR 2018 Technical Papers

When

Mon 28 May 2018 11:00 - 11:17 at E4 room - CI and Release Engineering. Chair(s): Shane McIntosh

Abstract

Continuous integration (CI) emphasizes quick feedback to developers. This is at odds with the current practice of performance testing, which predominantly focuses on long-running tests against entire systems in production-like environments. Alternatively, software microbenchmarking attempts to establish a performance baseline for small code fragments in a short time. This paper investigates the quality of microbenchmark suites with a focus on their suitability for delivering quick performance feedback and for CI integration. We study ten open-source libraries written in Java and Go with benchmark-suite sizes ranging from 16 to 983 tests, and runtimes between 11 minutes and 8.75 hours. We show that our study subjects include benchmarks with result variability of 50% or higher, indicating that not all benchmarks are useful for reliable discovery of slowdowns. We further artificially inject actual slowdowns into public API methods of the study subjects and test whether the benchmark suites are able to discover them. We introduce a performance-test quality metric called the API benchmarking score (ABS). ABS represents a benchmark suite’s ability to find slowdowns among a set of defined core API methods. Resulting benchmarking scores (i.e., fraction of discovered slowdowns) vary between 10% and 100% for the study subjects. This paper’s methodology and results can be used to (1) assess the quality of existing microbenchmark suites, (2) select a set of tests to be run as part of CI, and (3) suggest or generate benchmarks for currently untested parts of an API.
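As an illustration of the ABS idea described in the abstract (the fraction of core API methods whose injected slowdown is discovered), the following is a minimal Python sketch. It is not the authors' implementation. The function name `api_benchmarking_score`, the method names, the benchmark names, and the detection data are all hypothetical, and the sketch assumes a slowdown counts as discovered when at least one benchmark flags it.

```python
from typing import Dict, Set


def api_benchmarking_score(
    core_methods: Set[str],
    detections: Dict[str, Set[str]],
) -> float:
    """Return the fraction of core API methods whose injected slowdown
    was flagged by at least one benchmark (an assumed reading of ABS).

    core_methods: the set of core API methods under study.
    detections:   maps a method name to the benchmarks that detected
                  the slowdown injected into that method.
    """
    if not core_methods:
        raise ValueError("core_methods must not be empty")
    # A method counts as covered if its detection set is non-empty.
    covered = {m for m in core_methods if detections.get(m)}
    return len(covered) / len(core_methods)


# Hypothetical example data, invented purely for illustration.
core = {"Parser.parse", "Parser.reset", "Encoder.encode", "Encoder.flush"}
detections = {
    "Parser.parse": {"BenchmarkParseSmall", "BenchmarkParseLarge"},
    "Encoder.encode": {"BenchmarkEncode"},
    "Encoder.flush": set(),  # slowdown injected, but no benchmark noticed
}

score = api_benchmarking_score(core, detections)
print(f"ABS = {score:.0%}")  # prints "ABS = 50%"
```

In this made-up example, two of the four core methods have their slowdown detected, so the score is 50%. The methods with no detecting benchmark are the natural candidates for use (3) in the abstract: suggesting or generating benchmarks for untested parts of an API.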

Link to Preprint

http://www.ifi.uzh.ch/dam/jcr:ccf1399a-2d57-4ff9-a3b0-59d69616d5d3/msr18-author-version.pdf

DOI

https://doi.org/10.1145/3196398.3196407

Christoph Laaber (Author)

University of Zurich

Switzerland

Philipp Leitner (Author)

Chalmers | University of Gothenburg

Replication Package