Software Verification for Weak Memory via Program Transformation


Multiprocessors implement weak memory models, but program verifiers often assume Sequential Consistency (SC), and thus may miss bugs due to weak memory. We propose a sound transformation of the program to verify, enabling SC tools to perform verification w.r.t. weak memory. We present experiments for a broad variety of models (from x86-TSO to Power) and a vast range of verification tools, quantify the additional cost of the transformation and highlight the cases when we can drastically reduce it. Our benchmarks include work-queue management code from PostgreSQL.

Dec 1st 2012Corrected machine definition on Coq proofs.
Oct 5th 2012New sound, optimised strategy for instrumenting critical cycles implemented in Goto-instrument
Jul 4th 2012We now analyse Apache's fdqueue module
Jun 24th 2012Concurrent CBMC added to the set of model checkers used for the experiments
Feb 22nd 2012We report distinct results for Poirot with integer vs. bitvector arithmetic
Feb 8th 2012SatAbs is now able to handle RCU!
Jan 23rd 2012Release of the tool

Paper and proofs

Paper (ESOP 2013), arXiv version and proofs of the paper, formalised in Coq.

Tool manual

Here is the tool, its manual and how to plug in a model checker (we have tried CheckFence, ESBMC, MMChecker, Poirot, SatAbs, Threader and our new CBMC-MT).

Case Study: Fixing a WMM bug in PostgreSQL

We show here how we can detect this bug for Power reported by PostgreSQL developers.

We then test their fix (the reader should search for the comment "XXX there really ought to be a memory barrier operation right here" in the diff). We show that this fix might not be sufficient, we provide a counter-example generated by our tool, and we propose an additional fix, which provably fixes the problem.

Summary of systematic testing for CBMC

Here are the average runtimes and overheads for CBMC, on all litmus tests. We provide the same data for all model checkers, on all litmus tests but also on C examples below.

The following scatter plot (click for more details and an enlarged version) depicts the execution time of running CBMC on a program whose all accesses are instrumented, as proposed by Atig et al., vs. execution times for the optimised instrumentation using weighted selection under TSO. Any dot below the diagonal line marks a case where full instrumentation is slower than the optimised one.

Experimental Results

Here are all our experimental data, for all model-checkers, on all litmus tests and C examples, including standard ones, borrowed ones, the PostgreSQL bug and RCU.

As we can independently compute the expected verification result for all litmus tests, we are able to assess the soundness of each of the verification tool. The following graphics provide an executive summary of this assessment (click to zoom in).