Benchmarking Sbuilder

This post examines TLA+tools model checker performance, as a number of states processed per second, when model checking formal models generated by Sbuilder. It finds the performance to be CPU bound, but scalable to match real world applications.

The blog entry demonstrates, how Setups in Sbuilder can be used to manage state space explosion, and associates setups with Use Case Slices to make the idea of managing state space size more comprehensible for developers. Ideally, a Use Case Slice, identified in development, is configured as an Sbuidler Setup and added into a repository for Regression Verification. The objective is to increase confidence in application correctness, and to make QA more efficient with faster feed back because a formal model

• can be executed without the need to have it installed on real production environment, and
• the model checker can effectively check all possible executions in the formal model.

Introduction

TLA stands for the Temporal Logic of Actions, a formal language to specify system behavior, but it has become a shorthand for referring to the TLA+ specification language and the PlusCal algorithm language, together with their associated TLA+tools, e.g PlusCal translator, and TLC model checker.

Sbuilder is a tool create formal models in TLA+ language. A model created by Sbuilder is translated using PlusCal translator, and model checked using TLC model checker. The bottleneck in this tool chain is in the model checking algorithm executed by TLC tool, particularly in state space explosion, and in performance decline, when processing large data sets, or formal models with large code volume.

Sbuilder is used to model IT systems, where the system architecture identifies

• services: units of functionality accessible over interfaces.
• database: persistent data store, which services may access, and modify.

To complete runnable formal model, Sbuilder uses setups. A setup defines environment model, which controls sequence of interface operations and their input parameter bindings. A setup also limits data model size using domain cardinalities for formal model variables.

For more details see sbuilder github repository, and sbuilder documentation.

TLA-tools Model Checker Performance

Time to model check a formal model depends on the number states in the spate space generated for the formal model, and on model checker performance, the average number of states processed in time unit (states/sec):

$time (sec) = StateSpaceSize (states) / ModelCheckerPerformance (states/sec)$

These benchmarks focus on model checker performance, for controlling state space size see Heuristics Used in Creating Runnable Formal Models.

Model checker performance is evaluated from two perspectives1, 2: on one hand, what is the impact of formal model code volume, and on the other hand, what is the impact of generated state space size, on model checker speed.

Code volume in a formal model created by Sbuilder is a result of two factors: number of interfaces in the model, and number of statements in service implementations.

Size of state space generated during model checking is controlled using setups in Sbuilder. A setup governs data model size using domain cardinalities, and how values are bound to interface parameters. A setup also controls length of execution trace by defining sequence of interfaces calls.

Benchmark Application

To analyze performance factors presented in the previous chapter this post uses a simple Ethrereum contract, outlined below as a Mustache template:

```// Iterate CONTRACT -times
{{#CONTRACT}}
Contract Benchmark_{{contract}} {

struct s {
string strVal;
int    intVal;
}

// Persistent state of a contract
s memberVariable;

// Contract constructor (mapped to an interface in Sbuilder)
Benchmark_contract( string inp1 ) {
// Iterate STMT times
{{#STMT}}
memberVariable.intVal = {{i}};
{{#STMT}}
}
}
{{/CONTRACT}}
```

Benchmark runs are configured using four parameters shown in the table below:

 Parameter Purpose Code volume `CONTRACT` Define number of interface Yes `STMT` Determines number of statements in services Yes `DOMAIN` Control cardinality of interface input parameter No `ITERATION` Control interface invocation No

Code volume in a benchmark formal model is determined by `CONTRACT` and `STMT` parameters. Parameter `CONTRACT` defines number of contracts in a benchmark application. Contract constructors are mapped to interfaces in Sbuilder formal model. Calling a constructor creates an entry in formal model persistent state, recording the value of contract `memberVariable`. Benchmark parameter `STMT` specifies number of assignment statements in the service body.

Parameter `DOMAIN` sets cardinality for the input variable `inp1` in the application data model. The setup in the benchmark allows model checker to use all variable values to bind interface input variable `inp1`. In effect, increasing cardinality of the (single input variable) domain results to (linear) increase of states in the generated state space for each (deterministic) service execution.

Benchmark variable `ITERATION` configures number of times a contract interface is called. Increasing `ITERATION` count increases also generated state space size. Number of state added to state space depends on benchmark parameters `ITERATION`, `CONTRACT` (number of interface calls is `ITERATION` count multiplied by `CONTRACT` count), `DOMAIN` (scheduler uses `DOMAIN` different parameter bindings on interface calls), and `STMT` (each assignment statement is mapped to a TLA action, generating a new state, when triggered).

Impact of Formal Model Code Volume

Impact of formal model code volume on model checking performance is analyzed from two perspectives:

Impact of Interfaces – Parameter `CONTRACT`

This sections explains, how model checking performance changes as a function on a number of interfaces in the benchmark model, controlled by benchmark parameter `CONTRACT`.

Each benchmark run calls exactly 10 contracts (invariant `ITERATION * CONTRACT` = 10) with each contract having 10 statements (`STMT` parameter value `10`). Number of contracts (parameter `CONTRACT`) is varied resulting to a formal model with “dead code”3, i.e. `CONTRACT-10` interfaces, which are never invoked in the formal model. Model checking performance is studied along two dimensions: on the number of model checker workers (`default` for 1 worker/ `config1` for 8 workers), and on domain cardinality of interface input data element ( parameter `DOMAIN` with value 1/10), resulting totally to four configurations of benchmark runs.

The line diagram below shows, how average model checking speed slows down, as the number contracts in the model increase. Highest value, 600 states/sec, is found for 8 workers with domain cardinality value 10. The line for 8 workers moves above of the line for 1 workers. When domain cardinality is set to value 1, the benefit of having more workers seems to diminish, as the distance between lines for 1/8 workers is narrower when compared to lines with domain cardinality value 10. Increasing domain cardinality impacts performance (lines for `DOMAIN` value `10` are above lines for `DOMAIN` value `1`), as TLA+tools model checker can bind value to formal model interface actions quite effectively, as demonstrated in chapter Impact of Data Model Size – Parameter `DOMAIN`.

Performance decrease is more due to growth in code volume than to state space size. The picture on the left shows number of lines in the formal model increasing with `CONTRACT` parameter value. However, for a given domain cardinality 1/10, number of states generated are constant for all `CONTRACT` values, and does not depend on number of workers (lines `default` / `DOMAIN 1` and `config1` / `DOMAIN 1` overlap, and well as lines `default` / `DOMAIN 10` and `config1` / `DOMAIN 10`), as shown in the picture on the right.

 Lines States

Impact of Statements – Parameter `STMT`

This sections documents model checking performance as a function of number of statements in the application, controlled by benchmark parameter `STMT`.

The benchmark runs invoke interfaces for 10 contracts with number of statements being varied. Runs are made using four different configurations outlined in the table below.

Configuration model checker workers Domain cardinality
config1, domain 1 8 1
config1, domain 10 8 10
default, domain 1 1 1
default, domain 10 1 10

Number of lines in the formal model, as shown in diagram on the left, increases linearly with the number of statements in the application. Line count is not affected by number of TLC workers, nor by the domain cardinality, hence the lines for all configurations overlap.

Diagram on the right present data space size of the benchmark runs. Increasing domain cardinality makes state space size growing faster with the number of statements in the model, that is, the slope of the line for `DOMAIN` value 10 is steeper than the slope for `DOMAIN` value 1. Number of model checker workers does not affect number of states being generated, and lines for `config1` and `default` overlap for a given `DOMAIN` value.

 Lines States

The line diagram below shows average model checking speed (states per second) on y-axis versus number of statements (benchmark parameter `STMT`) on x-axis for benchmark configurations:

Performance peaks just above 800 states/sec for a configuration with 8 workers (`config1`) and `DOMAIN` parameter with value `10` , when the peak for one worker (`default` configuration) is just above 500 states/sec. Having `DOMAIN` parameter with value `1` results to slower model checking speed, just above 200 states/second for `config1/DOMAIN 10`, even poorer for `default/DOMAIN 1`.

TLA+model checker can clearly benefit from having more workers running. Gains are realized as larger domain cardinality (`DOMAIN` value 10) allows non-deterministic value binding to interface inputs, ramifying generated state space, and creating opportunity for the model checker to distribute next state calculation to several workers (`config1` with 8 workers).

For a configuration `default` (1 worker), domain cardinality `DOMAIN` value 10, we can notice an initial peak when `STMT` value is less than 25 statements, but, for `STMT` values larger than 150, performance drops below configuration `config1` (8 workers) with `DOMAIN` value 1. We may understand this phenomenon, that initially the dominating factor is TLA+tools model checker binding values to formal model interface actions quite effectively (ref. chapter Impact of Data Model Size – Parameter `DOMAIN`), but for larger code volumes having more workers helps in maintaining performance level.

Impact of State Space Size

The benchmark analyzes impact of state space size on model checking performance from two perspectives

Impact of Data Model Size – Parameter `DOMAIN`

This section documents how varying data model cardinality (benchmark parameter `DOMAIN`) results in change of state space size, and how it impacts model checking performance. Other benchmark parameters are fixed, resulting to a formal model, where one interface is called with `DOMAIN` different variable bindings, each interface invocation executing a service comprising fixed number of statements (parameter `STMT` value 10).

Diagram on the left shows, that formal model line count does not depend on data model size. Diagram on the right shows, how the number states generated increases linearly with benchmark parameter `DOMAIN` from 3 states to apporimately to 150.000 states. Obviously, number of lines, or number of states generated, do not depend on number of workers in the configuration, i.e. lines for using 1 worker (configuration `default`) and for 8 workers (configuration `config1`), overlap in both of these diagrams.

 Lines States

In the benchmarks, TLA+ model checker was able to reach peak performance of 44.000 states/sec for 8 TLC workers, and 39.000 states/sec for 1 TLC worker. When domain cardinality as increased to 100.000 elements the performance was sloping gently downwards.

The formal model in these benchmarks had one action for a contract interface binding to a set with varying cardinality. This setup allows TLA+model checker to process states much faster (44.000 states/sec), when compared to situation in chapter Impact of Interfaces – Parameter `CONTRACT` (600 states/sec), where number of interface actions was varied.

Impact of Longer Execution Traces – Parameter `ITERATION`

This section documents, how model checking performance responds to larger state space due to longer execution traces, as controlled by benchmark parameter `ITERATION`.

In these benchmarks, `ITERATION` parameter controls how many times application interfaces are called. When code volume is kept fixed (one contract with 20 statements), as well as domain cardinality (fixed value 1 or value 10), varying `ITERATION` parameter has a linear impact on state space size, as shown in the picture on the right. Slope in the number of the states depends on domain cardinality, but is unaffected by the number of workers. Number of lines is constant for all configurations, as shown in the picture on the left.

 Lines States

The diagrams below performance peak, approximately 26.000 states/sec for a configuration with 8 workers with data cardinality 10. Performance peak is more than twice compared to the case with one worker. Noticeable, when domain cardinality is 1, the benefit from having more workers seems to vanish for larger `ITERATION` counts.

The line representing runs with 8 workers (`config1`) and 10 domain values (`DOMAIN` = 10) moves clearly above the lines for other configurations, indicating that, under right conditions, model checking performance can be effectively boosted using multiple workers.

Conclusions

TLA+tools model checking is CPU bound, at least for the formal models benchmarked for this blog posts. Its performance can be scaled up using multiple workers. TLA+tools model checking performance can be further increased by running it in distributed mode4 on Amazon Web Services (AWS).

Formal models created by Sbuilder should scale well as Sbuilder setups allows non-deterministic value binding to interface inputs, ramifying state space, and creating opportunity for the model checker to distribute next state calculation to workers.

In these benchmarks, using a laptop PC, 2.2GHz Intel i7 CPU with 4 Cores/8 Threads, and 12 GB memory, it was possible to model check applications with 60 interfaces at a speed of approximately 100 states/sec. Assuming faster CPU, with more cores, running on distributed hardware, it should be possible achieve better model checking speed, and to maintain performance level for larger code volume. In Sbuilder formal models, one state represents one application statement executed. These numbers are of order of magnitude, which makes Sbuilder usable for real world applications.

However, the real crux is in state space explosion. In Sbuilder, state space explosion, and the impact of model checking performance degrading with code volume (ref. chapter Impact of Formal Model Code Volume), can be counteracted using setups:

• Setups are arbitrarily long sequences of interface calls. A setup of length 0 results to a minimal state space. Sequence length can be increased at will, with increasing sequence length resulting to a larger state space. (ref. chapter Impact of Longer Execution Traces – Parameter `ITERATION`)
• Setups allow configuring domain cardinalities. By default cardinality is fixed to value 1, but may be set to any value, when need to explore behavior not covered for the default. Again, increasing domain size increase state space (ref. chapter Impact of Data Model Size – Parameter `DOMAIN`)
• Setups allow complete control on interface input variable bindings. Possibilities range from fixing input values on a setup, to allowing model checker non-deterministacally bind all variable values to interface inputs. Fixing input values on a setup is required if domain set is too large to be evaluated, which is the case for example for `Integer` domain, or when Cartesian product of base domains explodes the size of an input set too large.
• Formal model code volume is also controlled using setups, as Sbuilder prunes out parts from formal model un-reachable from the interfaces defined on a setup, (or from the correctness criteria used in verification). (ref. chapters Impact of Interfaces – Parameter `CONTRACT` and Impact of Statements – Parameter `STMT`)

To make the idea of managing state space growth more comprehensible for developers Sbuilder Setups can be associated with Use Case Slices. From this point of view, a Use Case is a set services in an application, and a Use Case Slice is the result of calling theses service with certain parameter bindings, which is implemented in Sbuilder as a Setup.

Ideally, Sbuidler Setups are stored in a repository controlling automated model checking process. This repository enables Regression Verification, i.e. running model checking after an application change. The objective is to increase confidence in application correctness, and to make QA more efficient with faster feed back because a formal model

• can be executed without the need to have it installed on real production environment, and
• the model checker can effectively check all possible executions in the formal model.

Fin

This blog post has documented, how TLA+tools model checker performs, when model checking formal models generated by Sbuilder. In addition, it has also demonstrated, how state space growth can be managed in Sbuilder.

Benchmarking model checker performance is one step in Sbuilder Roadmap. Next step includes a Case Study with the objective to understand applicability of creating formal models for real world applications.

Footnotes:

1

Result from benchmark runs were collected using hardware with: HP ENVY 17/Notebook PC, memory 12GB, CPU: Intel(R) Core(TM) i7-4702MQ CPU @ 2.20GHz, 4 Cores of 8 Threads

2

TLA+ tools versions: pcal.trans Version 1.8 of 2 Apr 2013, TLC2 Version 2.07 of 1 June 2015

3

Notice: In these benchmark runs, we are not using Sbuilder prune option, which would eliminate “dead code” from the formal model

4

Notice: distributed mode was not covered by this blog entry