Institute of Theoretical Computer Science POPL'19 Project Page


Decoupling Lock-Free Data Structures from Memory Reclamation for Static Analysis

Roland Meyer and Sebastian Wolff, TU Braunschweig, Germany

Accepted for POPL'19.

Abstract. Verification of concurrent data structures is one of the most challenging tasks in software verification. The topic has received considerable attention over the course of the last decade. Nevertheless, human-driven techniques remain cumbersome and notoriously difficult while automated approaches suffer from limited applicability. The main obstacle for automation is the complexity of concurrent data structures. This is particularly true in the absence of garbage collection. The intricacy of lock-free memory management paired with the complexity of concurrent data structures makes automated verification prohibitive.
In this work we present a method for verifying concurrent data structures and their memory management separately. We suggest two simpler verification tasks that imply the correctness of the data structure. The first task establishes an over-approximation of the reclamation behavior of the memory management. The second task exploits this over-approximation to verify the data structure without the need to consider the implementation of the memory management itself. To make the resulting verification tasks tractable for automated techniques, we establish a second result. We show that a verification tool needs to consider only executions where a single memory location is reused. We implemented our approach and were able to verify linearizability of Michael&Scott's queue and the DGLM queue for both hazard pointers and epoch-based reclamation. To the best of our knowledge, we are the first to verify such implementations fully automatically.


CCS Concepts: • Theory of computation → Data structures design and analysis; Program verification; Shared memory algorithms; Program specifications; Program analysis;

Additional Key Words and Phrases: static analysis, lock-free data structures, verification, linearizability, safe memory reclamation, memory management

ACM Reference Format:
Roland Meyer and Sebastian Wolff. 2019. Decoupling Lock-Free Data Structures from Memory Reclamation for Static Analysis. Proc. ACM Program. Lang. 3, POPL, Article 58 (January 2019), 31 pages. https://doi.org/10.1145/3290371


Tool Overview

Our paper presents an analysis for verifying lock-free data structures that use safe memory reclamation (SMR) techniques. We follow a compositional approach which factors out the verification of the SMR implementation. Consequently, we have two distinct tools for:

The SMR specification is given in form of an observer automata (hard coded into our tools). For data structures, we verify liniearizability (we assume that programs are annotated with linearization points).

We submitted our tool to the POPL'19 artifact evaluation where it received the Artifact Evaluated -- Reusable badge.

In the this page we discuss how to reproduce our experiments. Therefore, we walk through the installation, compilation, and benchmarking. In the end, we briefly link the concepts and techniques from the paper to our implementation.

Tool Limitation

Our tools do not come with a front end (aka parser). To analyse programs not included in our benchmarks, one has to compile an appropriate binary first. If desired, consider the provided benchmark implementations. For example, consider:

Benchmark program code is constructed in those files using intuitive shortcuts (a complete list can be found in TMRexp-POPL19/src/prog.hpp, lines 655f).


Installation and Compilation

This section describes how to setup your system to repeat the benchmarks from the paper.

Quick Start. Download the Virtual Machine Snapshot (created with Virtual Box, credentials in ~/Desktop/pw.txt). The artifact is located at ~/popl19artifact/. The folder ~/popl19artifact/Binaries contains executables corresponding to the benchmarks from the paper. If you choose this option you can skip the remainder of this section.

Download

The source code can be downloaded from GitHub here:

For a unified description of the installation/compilation for both tools, we assume that unzipping yields a folder named TMRexp-POPL19.

System Requirements

To compile this project you need CMake, a C++17 compatible compiler, and Boost multi_array. Compilation was tested with Apple LLVM v.9.1.0, Apple LLVM v.10.0.0, Clang v.5.0, and Clang v.6.0.

Compilation

To build the project, simply type make in the root directory of the downloaded sources.


					cd path/to/TMRexp-POPL19
					make
			

This creates a folder build/bin containing all executables.

Alternatively, you can do the following (for example, to specify the compiler to be used):


					mkdir build
					cd build
					cmake .. -DCMAKE_CXX_COMPILER=/path/to/CXX-compiler
					make
			

Again, folder build/bin contains all executables.


Benchmarking chkds

We describe how to repeat the benchmarks for verifying lock-free data structures relative to an SMR implementation from the paper.

Quick Start. Using the VM from above, execute ~/popl19artifact/Binaries/<Program>_<SMR> to run the benchmark for Program using SMR. When compiling on your own, execute TMRexp-POPL19/build/bin/<Program>_<SMR> (for the DGLM Queue see the discussion below).

Benchmark Programs

We evaluated our tool on a Coarse Stack, a Coarse Queue, Treiber's Stack, Michael&Scott's Queue, and the DGLM queue using no SMR (N), No Reclamation/Garbage Collection (G), Hazard Pointers (H), and Epoch-Based Reclamation (E). C-style pseudo code is given in the following. We mark lines that are required only for a specific SMR by the aforementioned letters N, G, H, and E.

Running Benchmarks

To run the benchmarks got to the build/bin/ folder and execute the corresponding binary. The naming scheme is <DS>_<SMR> where <DS> is the name of the benchmark program and <SMR> is the reclamation scheme.

For example, to benchmark the running example from the paper, Michael&Scott's Queue using Hazard Pointers (HP), you have to do the following:


				cd path/to/TMRexp-POPL19/build/bin
				./MichaelScottQueue_HP
			

The DGLM Queue

As stated in the paper, the implemented shape abstraction requires hints to verify the DGLM Queue. To enable/disable hinting you have to set the corresponding flag in src/config.hpp and recompile the tool. More precisely, you have to change line 22 from


				#define DGLM_HINT false
			

to


				#define DGLM_HINT true
			

Beware: using DGLM_HINT=true with any benchmark other than the DGLM Queue exhibits undefined behavior!

For users of the VM image: the folder ~/popl19artifact/Binaries/ already contains the properly compiled binaries. The binaries where compiled from ~/popl19artifact/chkds/ and ~/popl19artifact/chkds-DGLM/ for all benchmarks except the DGLM Queue and the DGLM Queue, respectively.

Expected Benchmarks Results

As reported in the paper, running the benchmarks on an Intel Xeon X5650@2.67GHz with Ubuntu 16.04 and using Clang version 5.0 we expect the following running times and verification results.

Program SMR Time Verif. States ABAs Time ABA Linearizable
Coarse Stack GC 0.44s 300 0 0s yes
None 0.5s 300 0 0s yes
Coarse Queue GC 1.7s 300 0 0s yes
None 1.7s 300 0 0s yes
Treiber's Stack GC 3.2s 806 0 0s yes
EBR 16s 1822 0 0s yes
HP 19s 2606 186 0.06s yes
Opt. Treiber's Stack HP 0.8s - - - no*
Michael&Scott's Queue GC 414s 2202 0 0s yes
EBR 2630s 7613 0 0s yes
HP 7075s 19028 536 0.9s yes
DGLM Queue GC 714s 9934 0 0s yes**
EBR 3754s 27132 0 0s yes**
HP 7010s 41753 2824 26s yes**

* Pointer Race due to an ABA in push makes the verification fail.
** Imprecision in the heap abstraction required hinting that Head cannot overtake Tail by more than one node.


Benchmarking chksmr

We describe how to repeat the benchmarks for verifying SMR implementations from the paper.

Quick Start. Using the VM from above, execute ~/popl19artifact/Binaries/<SMR> to run the benchmark for the SMR implementation SMR. When compiling on your own, execute TMRexp-POPL19/build/bin/<SMR>.

Benchmark Programs

We evaluated our tool on an Hazard Pointer (HP) and an Epoch-based Reclamation (EBR) implementation both of which support dynamic joining/parting of threads. C-style pseudo code is given in the following.

Running Benchmarks

To run the benchmarks got to the build/bin/ folder and execute the corresponding binary. The naming scheme is <SMR>_<Info> where <SMR> is the (abbreviated) name of the benchmark program and <Info> is a description of the implementation (we have various implementations for each class).

For example, to benchmark the running example from the paper, the Hazard Pointer implementation, you have to do the following:


				cd path/to/TMRexp-POPL19/build/bin
				./HPimpl
			

Expected Benchmarks Results

As reported in the paper, running the benchmarks on an Intel Xeon X5650@2.67GHz with Ubuntu 16.04 and using Clang version 5.0 we expect the following running times and verification results.

Program Observer Time States Correct
Hazard Pointers OBase x OHP 1.5s 5437 yes
Epoch-based Reclamation OBase x OEBR 11.2s 11528 yes

Tool Output

When executing a binary from above, our tool prints the following:

  1. The input program
  2. Progress of the fixed point computation
  3. Verification verdict
  4. Summary (gist)

For an example, the output of chsmr HPimpl has the following form:


      post image...     [#enc] done! [#enc=0.9k, #step=0k, #steptotal=0k]
      interference...   [32 buckets][#bucketsize][198][129][130][129][4][59][59][59][105][105][70][70][4][59][59][59][59][59][17][17][105][105][59][59][101][37][70][70][17][17][37][67] done! [#enc=2.2k, #step=85k, #steptotal=40k]
      post image...     [#enc][2k][3k][4k][4k] done! [#enc=5.0k, #step=4k, #steptotal=5k]
      interference...   [32 buckets][#bucketsize][274][253][255][253][79][156][156][156][223][223][157][159][79][156][156][156][154][154][93][95][223][223][154][154][217][107][157][157][93][93][105][153] done! [#enc=5.2k, #step=172k, #steptotal=96k]
      post image...     [#enc] done! [#enc=5.4k, #step=0k, #steptotal=5k]
      interference...   [32 buckets][#bucketsize][274][253][255][253][156][156][156][156][223][223][159][159][156][156][156][156][154][154][95][95][223][223][154][154][217][135][157][157][93][93][133][153] done! [#enc=5.4k, #step=173k, #steptotal=151k]
      post image...     [#enc] done! [#enc=5.4k, #step=0k, #steptotal=5k]
      interference...   [32 buckets][#bucketsize][274][253][255][253][156][156][156][156][223][223][159][159][156][156][156][156][154][154][95][95][223][223][154][154][217][135][157][157][93][93][133][153] done! [#enc=5.4k, #step=173k, #steptotal=207k]
			

The output tells you that sequential steps (post image...) and interference steps (interference...) are executed in rounds. Sequential steps print the current size of the fixed point in regular intervals. For example, in the third line we have [#enc][2k] meaning that there are 2000 views at the time of the output. Interference steps print the number of buckets. The explored views are stored in buckets to reduce the search space of interference steps. Views from different buckets are guaranteed to not match, i.e., are guaranteed to not require interference. In the second line, for example, we have [32 buckets][#bucketsize][274]. This states that there are 32 different buckets and that the first bucket contains 274 views.

The remaining output is straight forward.


Notes on the Implementation

The following tables relate (informal) descriptions of the implemented analysis from the paper to the implementation.

General chkds chksmr

Paper Implementation Description
Views src/cfg.hpp Views are implemented by the class Cfg. They carry, among other things, a shape abstraction of the memory, validity information, and observer states.
Thread-modular fixed point src/fixpoint.cpp The fixed point computation maintains a collection of views the post image of which may not have been explored. The post image of these views is computed and added to the fixed point. The implementation proceeds in rounds. In each round sequential steps are performed exhaustively first and then interference steps are applied.
(There is an alternative fixed point computation using a work list approach. It can be activated by setting the WORKLIST_INTERFERENCE to true in the configuration file.)
Shape abstraction src/shape.hpp Class Shape implements a container for a shape abstraction of a specific memory layout. More details on the shape abstraction can be found in Abdulla et al. [2013,2017].
src/post/*.cpp Contains post-image computation for the shape abstraction for every supported statement type.
Interference src/fixp/interference.cpp Implements interference steps. Such a step considers all pairs of previously explored views, combines them into a two-thread view, computes a post image for one of the threads, and projects the result to the other thread. The resulting views are added to the fixed point.
Observers src/observer.hpp Class Observer implements observer automata as presented in the paper.
test/ObserverFactory.hpp Creates observer instances along the lines of the paper.

Data Structure Verification chkds

Paper Implementation Description
Views src/cfg.hpp Besides the above, also stores freedness information of the single reusable cell.
Elision support test/ObserverFactory.hpp The implementation relies on Lemma 6.1, thus does not check if the observer supports elision (Definition 5.14).
Environment free src/post/mem.cpp Following the semantics for environment frees, rule (Free), the implementation checks which frees can be executed by checking whether the observer reaches a final state upon consuming a free event.
Reallocation of a single address src/post/mem.cpp To identify the single address available for reuse, shapes carry a special ghost pointer to the reusable cell. Moreover, views carry a flag stating whether or not the cell is freed. If so, an allocation can yield a pointer to that cell. Otherwise, the allocation can only yield fresh cells.
Validity src/cfg.hpp Views (class Cfg) carry validity information about pointers.
src/post/*.cpp The post image operator updates the validity according to the definition in the paper.
Pointer race check src/post.cpp The post image operators check whether an unsafe access or a racy SMR call is performed. The latter relies on Lemma 6.2 and deems racy calls to retire that use an invalid pointer.
Harmful ABA check src/chkaware.cpp Checks for every view where an invalid pointer can be compared in an assert for harmful ABAs. The implementation follows the description in the paper.

SMR Verification chksmr

Paper Implementation Description
Views src/cfg.hpp Besides the above, also stores data (set) abstractions.
Retain records src/shape.hpp Stores the myRecord pointer of a non-view thread.
Last retire test/ObserverFactory.hpp Introduces redundant observer locations to judge whether or not the observed thread performed the last retire call.
src/fixp/interference.cpp Prunes interference among views for identical threads.
Data (set) abstraction src/post/*.cpp Contains post-image computation for the data abstraction for every supported statement type.