Decoupling Lock-Free Data Structures from Memory Reclamation for Static Analysis

Roland Meyer and Sebastian Wolff, TU Braunschweig, Germany

Presented at POPL'19.

Abstract. Verification of concurrent data structures is one of the most challenging tasks in software verification. The topic has received considerable attention over the course of the last decade. Nevertheless, human-driven techniques remain cumbersome and notoriously difficult while automated approaches suffer from limited applicability. The main obstacle for automation is the complexity of concurrent data structures. This is particularly true in the absence of garbage collection. The intricacy of lock-free memory management paired with the complexity of concurrent data structures makes automated verification prohibitive.
In this work we present a method for verifying concurrent data structures and their memory management separately. We suggest two simpler verification tasks that imply the correctness of the data structure. The first task establishes an over-approximation of the reclamation behavior of the memory management. The second task exploits this over-approximation to verify the data structure without the need to consider the implementation of the memory management itself. To make the resulting verification tasks tractable for automated techniques, we establish a second result. We show that a verification tool needs to consider only executions where a single memory location is reused. We implemented our approach and were able to verify linearizability of Michael&Scott's queue and the DGLM queue for both hazard pointers and epoch-based reclamation. To the best of our knowledge, we are the first to verify such implementations fully automatically.

Paper Slides Technical Report DOI Virtual Machine

CCS Concepts: • Theory of computation → Data structures design and analysis; Program verification; Shared memory algorithms; Program specifications; Program analysis;

Additional Key Words and Phrases: static analysis, lock-free data structures, verification, linearizability, safe memory reclamation, memory management

ACM Reference Format:
Roland Meyer and Sebastian Wolff. 2019. Decoupling Lock-Free Data Structures from Memory Reclamation for Static Analysis. Proc. ACM Program. Lang. 3, POPL, Article 58 (January 2019), 31 pages. https://doi.org/10.1145/3290371

Tool Overview

Our paper presents an analysis for verifying lock-free data structures that use safe memory reclamation (SMR) techniques. We follow a compositional approach which factors out the verification of the SMR implementation. Consequently, we have two distinct tools for:

chkds verifying lock-free data structures relative to an SMR specification, and
chksmr verifying an SMR implementation against an SMR specification.

The SMR specification is given in form of an observer automata (hard coded into our tools). For data structures, we verify liniearizability (we assume that programs are annotated with linearization points).

We submitted our tool to the POPL'19 artifact evaluation where it received the Artifact Evaluated -- Reusable badge.

In the this page we discuss how to reproduce our experiments. Therefore, we walk through the installation, compilation, and benchmarking. In the end, we briefly link the concepts and techniques from the paper to our implementation.

Tool Limitation

Our tools do not come with a front end (aka parser). To analyse programs not included in our benchmarks, one has to compile an appropriate binary first. If desired, consider the provided benchmark implementations. For example, consider:

chkds TMRexp-POPL19/test/HP/MichaelScottQueue.cpp, or
chksmr TMRexp-POPL19/test/HP/Stack.cpp.

Benchmark program code is constructed in those files using intuitive shortcuts (a complete list can be found in TMRexp-POPL19/src/prog.hpp, lines 655f).

Installation and Compilation

This section describes how to setup your system to repeat the benchmarks from the paper.

Quick Start. Download the Virtual Machine Snapshot (created with Virtual Box, credentials in ~/Desktop/pw.txt). The artifact is located at ~/popl19artifact/. The folder ~/popl19artifact/Binaries contains executables corresponding to the benchmarks from the paper. If you choose this option you can skip the remainder of this section.

Download

The source code can be downloaded from GitHub here:

chkds GitHub release with tag POPL19-chkds
chksmr GitHub release with tag POPL19-chksmr

For a unified description of the installation/compilation for both tools, we assume that unzipping yields a folder named TMRexp-POPL19.

System Requirements

To compile this project you need CMake, a C++17 compatible compiler, and Boost multi_array. Compilation was tested with Apple LLVM v.9.1.0, Apple LLVM v.10.0.0, Clang v.5.0, and Clang v.6.0.

Compilation

To build the project, simply type make in the root directory of the downloaded sources.


					cd path/to/TMRexp-POPL19
					make

This creates a folder build/bin containing all executables.

Alternatively, you can do the following (for example, to specify the compiler to be used):


					mkdir build
					cd build
					cmake .. -DCMAKE_CXX_COMPILER=/path/to/CXX-compiler
					make

Again, folder build/bin contains all executables.

Benchmarking `chkds`

We describe how to repeat the benchmarks for verifying lock-free data structures relative to an SMR implementation from the paper.

Quick Start. Using the VM from above, execute ~/popl19artifact/Binaries/<Program>_<SMR> to run the benchmark for Program using SMR. When compiling on your own, execute TMRexp-POPL19/build/bin/<Program>_<SMR> (for the DGLM Queue see the discussion below).

Benchmark Programs

We evaluated our tool on a Coarse Stack, a Coarse Queue, Treiber's Stack, Michael&Scott's Queue, and the DGLM queue using no SMR (N), No Reclamation/Garbage Collection (G), Hazard Pointers (H), and Epoch-Based Reclamation (E). C-style pseudo code is given in the following. We mark lines that are required only for a specific SMR by the aforementioned letters N, G, H, and E.


      // binary name: CoarseStack_GC, CoarseStack_NoSMR

      struct Node {
          data_t data;
          Node* next;
      };
      
      shared Node* ToS;
      
      atomic init() {
          ToS = NULL;
      }
      
      void push(data_t input) {
          Node* node = new Node();
          node->data = input;
          atomic {
              node->next = ToS;
              ToS = node;
          }
      }
      
      data_t pop() {
          atomic {
              Node* top = ToS;
              if (top == NULL) return EMPTY;
              data_t output = top->data;
              ToS = Tos->next;
  GN          retire(top);
              return output;
          }
      }


      // binary name: CoarseQueue_GC, CoarseQueue_NoSMR

      struct Node {
          data_t data;
          Node* next;
      };
      
      shared Node* Head, Tail;
      
      atomic init() {
          Head = new Node();
          Head->next = NULL;
          Tail = Head;
      }
      
      void enqueue(data_t input) {
          Node* node = new Node();
          node->data = input;
          node->next = NULL;
          atomic {
              Tail->next = node;
              Tail = node;
          }
      }
      
      data_t dequeue() {
          atomic {
              Node* head = Head;
              Node* next = head->next;
              if (next == NULL) return EMPTY;
              data_t output = next->data;
              Head = next;
  GN          retire(head);
              return output;
          }
      }


      // binary name: TreibersStack_GC, TreibersStack_EBR, TreibersStack_HP

      struct Node {
          data_t data;
          Node* next;
      };
      
      shared Node* ToS;
      
      atomic init() {
          ToS = NULL;
      }
      
      void push(data_t input) {
  E       leaveQ();
          Node* node = new Node();
          node->data = input;
          while (true) {
              Node* top = ToS;
  H           protect(top, 0);
  H           if (top != ToS) continue;
              node->next = top;
              if (CAS(&ToS, top, node))
                  break
          }
  H       unprotect(0);
  E       enterQ();
      }
      
      data_t pop() {
  E       leaveQ();
          while (true) {
              Node* top = ToS;
              if (top == NULL) return EMPTY;
  H           protect(top, 0);
  H           if (top != ToS) continue;
              Node* next = top->next;
              if (CAS(&ToS, top, next)) {
                  data_t output = top->data;
  HEG             retire(top);
  H               unprotect(0);
  E               enterQ();
                  return output;
              }
          }
      }


      // binary name: OptTreibersStack_HP

      struct Node {
          data_t data;
          Node* next;
      };
      
      shared Node* ToS;
      
      atomic init() {
          ToS = NULL;
      }
      
      void push(data_t input) {
          Node* node = new Node();
          node->data = input;
          while (true) {
              Node* top = ToS;
              node->next = top;
              if (CAS(&ToS, top, node))
                  break
          }
      }
      
      data_t pop() {
          while (true) {
              Node* top = ToS;
              if (top == NULL) return EMPTY;
  H           protect(top, 0);
  H           if (top != ToS) continue;
              Node* next = top->next;
              if (CAS(&ToS, top, next)) {
                  data_t output = top->data;
  H               retire(top);
  H               unprotect(0);
                  return output;
              }
          }
      }


      // binary name: MichaelScottQueue_GC, MichaelScottQueue_EBR, MichaelScottQueue_HP

      struct Node {
          data_t data;
          Node* next;
      };
      
      shared Node* Head, Tail;
      
      atomic init() {
          Head = new Node();
          Head->next = NULL;
          Tail = Head;
      }
      
      void enqueue(data_t input) {
  E       leaveQ;
          Node* node = new Node();
          node->data = input;
          node->next = NULL;
          while (true) {
              Node* tail = Tail;
  H           protect(tail, 0);
  H           if (tail != Tail) continue;
              Node* next = tail->next;
              if (tail != Tail) continue;
              if (next != NULL) {
                  CAS(&Tail, tail, next);
                  continue;
              }
              if (CAS(&tail->next, next, node))
                  break
          }
          CAS(&Tail, tail, node);
  E       enterQ;
  H       unprotect(0);
      }
      
      data_t dequeue() {
  E       leaveQ;
          while (true) {
              Node* head = Head;
  H           protect(head, 0);
  H           if (head != Head) continue;
              Node* tail = Tail;
              Node* next = head->next;
  H           protect(next, 1);
              if (head != Head) continue;
              if (next == NULL) return EMPTY;
              if (head == tail) {
                  CAS(&Tail, tail, next);
                  continue;
              } else {
                  data_t output = next->data;
                  if (CAS(&Head, head, next)) {
  HEG             retire(head);
  E                   enterQ;
  H                   unprotect(0);
  H                   unprotect(1);
                      return output;
                  }
              }
          }
      }


      // binary name: DGLMQueue_GC, DGLMQueue_EBR, DGLMQueue_HP

      struct Node {
          data_t data;
          Node* next;
      };
      
      shared Node* Head, Tail;
      
      atomic init() {
          Head = new Node();
          Head->next = NULL;
          Tail = Head;
      }
      
      void enqueue(data_t input) {
  E       leaveQ;
          Node* node = new Node();
          node->data = input;
          node->next = NULL;
          while (true) {
              Node* tail = Tail;
  H           protect(tail, 0);
  H           if (tail != Tail) continue;
              Node* next = tail->next;
              if (tail != Tail) continue;
              if (next != NULL) {
                  CAS(&Tail, tail, next);
                  continue;
              }
              if (CAS(&tail->next, next, node))
                  break
          }
          CAS(&Tail, tail,node);
  H       unprotect(0);
  E       enterQ;
      }
      
      data_t dequeue() {
  E       leaveQ;
          while (true) {
              Node* head = Head;
  H           protect(head, 0);
  H           if (head != Head) continue;
              Node* next = head->next;
  H           protect(next, 1);
              if (head != Head) continue;
              if (next == NULL) return EMPTY;
              data_t output = next->data;
              if (CAS(&Head, head, next)) {
                  Node* tail = Tail;
                  if (head == tail) {
                      CAS(Tail, tail, next);
                  }
  HEG             retire(head);
  H               unprotect(0);
  H               unprotect(1);
  E               enterQ;
                  return output;
              }
          }	
      }

Running Benchmarks

To run the benchmarks got to the build/bin/ folder and execute the corresponding binary. The naming scheme is <DS>_<SMR> where <DS> is the name of the benchmark program and <SMR> is the reclamation scheme.

For example, to benchmark the running example from the paper, Michael&Scott's Queue using Hazard Pointers (HP), you have to do the following:


				cd path/to/TMRexp-POPL19/build/bin
				./MichaelScottQueue_HP

The DGLM Queue

As stated in the paper, the implemented shape abstraction requires hints to verify the DGLM Queue. To enable/disable hinting you have to set the corresponding flag in src/config.hpp and recompile the tool. More precisely, you have to change line 22 from


				#define DGLM_HINT false


				#define DGLM_HINT true

Beware: using DGLM_HINT=true with any benchmark other than the DGLM Queue exhibits undefined behavior!

For users of the VM image: the folder ~/popl19artifact/Binaries/ already contains the properly compiled binaries. The binaries where compiled from ~/popl19artifact/chkds/ and ~/popl19artifact/chkds-DGLM/ for all benchmarks except the DGLM Queue and the DGLM Queue, respectively.

Expected Benchmarks Results

As reported in the paper, running the benchmarks on an Intel Xeon X5650@2.67GHz with Ubuntu 16.04 and using Clang version 5.0 we expect the following running times and verification results.

Program	SMR	Time Verif.	States	ABAs	Time ABA	Linearizable
Coarse Stack	GC	0.44s	300	0	0s	yes
Coarse Stack	None	0.5s	300	0	0s	yes
Coarse Queue	GC	1.7s	300	0	0s	yes
Coarse Queue	None	1.7s	300	0	0s	yes
Treiber's Stack	GC	3.2s	806	0	0s	yes
	EBR	16s	1822	0	0s	yes
	HP	19s	2606	186	0.06s	yes
Opt. Treiber's Stack	HP	0.8s	-	-	-	no^*
Michael&Scott's Queue	GC	414s	2202	0	0s	yes
	EBR	2630s	7613	0	0s	yes
	HP	7075s	19028	536	0.9s	yes
DGLM Queue	GC	714s	9934	0	0s	yes^**
	EBR	3754s	27132	0	0s	yes^**
	HP	7010s	41753	2824	26s	yes^**

^* Pointer Race due to an ABA in push makes the verification fail.
^** Imprecision in the heap abstraction required hinting that Head cannot overtake Tail by more than one node.

Benchmarking `chksmr`

We describe how to repeat the benchmarks for verifying SMR implementations from the paper.

Quick Start. Using the VM from above, execute ~/popl19artifact/Binaries/<SMR> to run the benchmark for the SMR implementation SMR. When compiling on your own, execute TMRexp-POPL19/build/bin/<SMR>.

Benchmark Programs

We evaluated our tool on an Hazard Pointer (HP) and an Epoch-based Reclamation (EBR) implementation both of which support dynamic joining/parting of threads. C-style pseudo code is given in the following.


      // binary name: HPimpl

      struct HPRec {
          HPRec* next;
          void* hp0;
          void* hp1;
      };
      
      shared HPRec* Records;
      threadlocal HPRec* myRec;
      threadlocal List<void*> retiredList;
      
      atomic init() {
          Records = NULL;
      }

      void join() {
          myRec = new HPRec();
          while (true) {
              HPRec* rec = Records;
              myRec->next = rec;
              if (CAS(Records, rec, myRec)) {
                  break;
              }
          }
          myRec->hp0 = NULL;
          myRec->hp1 = NULL;
      }

      void part() {
          unprotect(0);
          unprotect(1);
      }

      void protect0(void* ptr) {
          myRec->hp0 = ptr;
      }
      
      void protect1(void* ptr) {
          myRec->hp1 = ptr;
      }
      
      void unprotect0() {
          myRec->hp0 = NULL;
      }
      
      void unprotect1() {
          myRec->hp1 = NULL;
      }
      
      void retire(void* ptr) {
          if (ptr != NULL) {
              retiredList.add(ptr);
          }
          if (*) {
              reclaim();
          }
      }
      
      void reclaim() {
          List<void*> protectedList;
          HPRec* cur = Records;
          while (cur != NULL) {
              void* hp0 = cur->hp0;
              void* hp1 = cur->hp1;
              protectedList.add(hp0);
              protectedList.add(hp1);
              cur = cur->next();
          }
          List<void*> freeList = retiredList.copy();
          freeList = freeList.remove_all(protectedList);
          retiredList = retiredList.remove_all(freeList);
          free_all(freeList);
      }

      void free_all(List<void*> list) {
          for (void* ptr : list) {
              free(ptr);
          }
      }


      // binary name: EBRimpl

      enum epoch_t { 0, 1, 2 };

      struct EBRRec {
          EBRRec* next;
          epoch_t epoch;
      };
      
      shared EBRRec* Records;
      shared epoch_t Epoch;
      threadlocal EBRRec* myRec;
      threadlocal List<void*> bag0;
      threadlocal List<void*> bag1;
      threadlocal List<void*> bag2;
      
      atomic init() {
          Records = NULL;
          Epoch = 0;
      }

      void join() {
          myRec = new EBRRec();
          while (true) {
              EBRRec* rec = Records;
              myRec->next = rec;
              if (CAS(Records, rec, myRec)) {
                  break;
              }
          }
          myEpoch = Epoch;
          myRec->epoch = myEpoch;
      }

      void part() {
      }

      void enterQ() {
      }
      
      void leaveQ() {
          if (*) { 
              if (myEpoch != Epoch) { 
                  free_all(bag2);
                  bag2 = bag1;
                  bag1 = bag0;
                  bag0 = List<void*>::empty();
                  myEpoch = Epoch;
                  myRec->epoch = myEpoch;
              }

              EBRrec* cur = Records;
              while (cur != NULL) {
                  if (epoch != cur->epoch) {
                      break;
                  }
                  cur = cur->next;
              }
              CAS(Epoch, epoch, (epoch + 1) % 3);
          }

      }
      
      void retire(void* ptr) {
          bag0.add(ptr);
      }

      void free_all(List<void*> list) {
          for (void* ptr : list) {
              free(ptr);
          }
      }

Running Benchmarks

To run the benchmarks got to the build/bin/ folder and execute the corresponding binary. The naming scheme is <SMR>_<Info> where <SMR> is the (abbreviated) name of the benchmark program and <Info> is a description of the implementation (we have various implementations for each class).

For example, to benchmark the running example from the paper, the Hazard Pointer implementation, you have to do the following:


				cd path/to/TMRexp-POPL19/build/bin
				./HPimpl

Expected Benchmarks Results

As reported in the paper, running the benchmarks on an Intel Xeon X5650@2.67GHz with Ubuntu 16.04 and using Clang version 5.0 we expect the following running times and verification results.

Program	Observer	Time	States	Correct
Hazard Pointers	O_Base x O_HP	1.5s	5437	yes
Epoch-based Reclamation	O_Base x O_EBR	11.2s	11528	yes

Tool Output

When executing a binary from above, our tool prints the following:

The input program
Progress of the fixed point computation
Verification verdict
Summary (gist)

For an example, the output of chsmr HPimpl has the following form:


      post image...     [#enc] done! [#enc=0.9k, #step=0k, #steptotal=0k]
      interference...   [32 buckets][#bucketsize][198][129][130][129][4][59][59][59][105][105][70][70][4][59][59][59][59][59][17][17][105][105][59][59][101][37][70][70][17][17][37][67] done! [#enc=2.2k, #step=85k, #steptotal=40k]
      post image...     [#enc][2k][3k][4k][4k] done! [#enc=5.0k, #step=4k, #steptotal=5k]
      interference...   [32 buckets][#bucketsize][274][253][255][253][79][156][156][156][223][223][157][159][79][156][156][156][154][154][93][95][223][223][154][154][217][107][157][157][93][93][105][153] done! [#enc=5.2k, #step=172k, #steptotal=96k]
      post image...     [#enc] done! [#enc=5.4k, #step=0k, #steptotal=5k]
      interference...   [32 buckets][#bucketsize][274][253][255][253][156][156][156][156][223][223][159][159][156][156][156][156][154][154][95][95][223][223][154][154][217][135][157][157][93][93][133][153] done! [#enc=5.4k, #step=173k, #steptotal=151k]
      post image...     [#enc] done! [#enc=5.4k, #step=0k, #steptotal=5k]
      interference...   [32 buckets][#bucketsize][274][253][255][253][156][156][156][156][223][223][159][159][156][156][156][156][154][154][95][95][223][223][154][154][217][135][157][157][93][93][133][153] done! [#enc=5.4k, #step=173k, #steptotal=207k]

The output tells you that sequential steps (post image...) and interference steps (interference...) are executed in rounds. Sequential steps print the current size of the fixed point in regular intervals. For example, in the third line we have [#enc][2k] meaning that there are 2000 views at the time of the output. Interference steps print the number of buckets. The explored views are stored in buckets to reduce the search space of interference steps. Views from different buckets are guaranteed to not match, i.e., are guaranteed to not require interference. In the second line, for example, we have [32 buckets][#bucketsize][274]. This states that there are 32 different buckets and that the first bucket contains 274 views.

The remaining output is straight forward.

Notes on the Implementation

The following tables relate (informal) descriptions of the implemented analysis from the paper to the implementation.

General `chkds` `chksmr`

Paper	Implementation	Description
Views	`src/cfg.hpp`	Views are implemented by the class `Cfg`. They carry, among other things, a shape abstraction of the memory, validity information, and observer states.
Thread-modular fixed point	`src/fixpoint.cpp`	The fixed point computation maintains a collection of views the post image of which may not have been explored. The post image of these views is computed and added to the fixed point. The implementation proceeds in rounds. In each round sequential steps are performed exhaustively first and then interference steps are applied. (There is an alternative fixed point computation using a work list approach. It can be activated by setting the `WORKLIST_INTERFERENCE` to `true` in the configuration file.)
Shape abstraction	`src/shape.hpp`	Class `Shape` implements a container for a shape abstraction of a specific memory layout. More details on the shape abstraction can be found in Abdulla et al. [2013,2017].
Shape abstraction	`src/post/*.cpp`	Contains post-image computation for the shape abstraction for every supported statement type.
Interference	`src/fixp/interference.cpp`	Implements interference steps. Such a step considers all pairs of previously explored views, combines them into a two-thread view, computes a post image for one of the threads, and projects the result to the other thread. The resulting views are added to the fixed point.
Observers	`src/observer.hpp`	Class `Observer` implements observer automata as presented in the paper.
Observers	`test/ObserverFactory.hpp`	Creates observer instances along the lines of the paper.

Data Structure Verification `chkds`

Paper	Implementation	Description
Views	`src/cfg.hpp`	Besides the above, also stores freedness information of the single reusable cell.
Elision support	`test/ObserverFactory.hpp`	The implementation relies on Lemma 6.1, thus does not check if the observer supports elision (Definition 5.14).
Environment `free`	`src/post/mem.cpp`	Following the semantics for environment frees, rule (Free), the implementation checks which frees can be executed by checking whether the observer reaches a final state upon consuming a free event.
Reallocation of a single address	`src/post/mem.cpp`	To identify the single address available for reuse, shapes carry a special ghost pointer to the reusable cell. Moreover, views carry a flag stating whether or not the cell is freed. If so, an allocation can yield a pointer to that cell. Otherwise, the allocation can only yield fresh cells.
Validity	`src/cfg.hpp`	Views (class `Cfg`) carry validity information about pointers.
Validity	`src/post/*.cpp`	The post image operator updates the validity according to the definition in the paper.
Pointer race check	`src/post.cpp`	The post image operators check whether an unsafe access or a racy SMR call is performed. The latter relies on Lemma 6.2 and deems racy calls to `retire` that use an invalid pointer.
Harmful ABA check	`src/chkaware.cpp`	Checks for every view where an invalid pointer can be compared in an `assert` for harmful ABAs. The implementation follows the description in the paper.

SMR Verification `chksmr`

Paper	Implementation	Description
Views	`src/cfg.hpp`	Besides the above, also stores data (set) abstractions.
Retain records	`src/shape.hpp`	Stores the `myRecord` pointer of a non-view thread.
Last `retire`	`test/ObserverFactory.hpp`	Introduces redundant observer locations to judge whether or not the observed thread performed the last `retire` call.
Last `retire`	`src/fixp/interference.cpp`	Prunes interference among views for identical threads.
Data (set) abstraction	`src/post/*.cpp`	Contains post-image computation for the data abstraction for every supported statement type.

Decoupling Lock-Free Data Structures from Memory Reclamation for Static Analysis

Roland Meyer and Sebastian Wolff, TU Braunschweig, Germany

Presented at POPL'19.

Tool Overview

Tool Limitation

Installation and Compilation

Download

System Requirements

Compilation

Benchmarking chkds

Benchmark Programs

Running Benchmarks

The DGLM Queue

Expected Benchmarks Results

Benchmarking chksmr

Benchmark Programs

Running Benchmarks

Expected Benchmarks Results

Tool Output

Notes on the Implementation

General chkds chksmr

Data Structure Verification chkds

SMR Verification chksmr

Benchmarking `chkds`

Benchmarking `chksmr`

General `chkds` `chksmr`

Data Structure Verification `chkds`

SMR Verification `chksmr`