Benchmarks
These benchmarks and tools are subject to change until their respective freeze
dates, as documented in the SMT-COMP rules.
Application track Benchmarks
- 2012 additions
- safari - derived from SMT queries of an infinite-state model-checking
tool (from R. Bruttomesso, SAFARI tool, CAV 2012)
- smtic3_QF_LRA - benchmarks derived from an IC3 algorithm applied to some software verification benchmarks (from A. Griggio)
- QF_UFLIA benchmarks, from Blast (128MB)
These are logs of the calls to the Simplify theorem prover made by
Blast when trying to model check some C programs (the name of the
program is reflected in the name of the benchmark; the original Simplify
traces can be made available for those who are interested)
Results
- QF_LRA BMC and k-Induction problems on networks of hybrid automata, from NuSMV (161MB)
These benchmarks were generated by unrolling NuSMV models of networks of
hybrid automata and checking them with BMC or k-Induction.
Results
- QF_BV BMC and k-Induction problems on SystemC designs, from NuSMV (134MB)
Unrollings of translations of some SystemC programs into NuSMV. The programs were those used, e.g., in the FMCAD'10 paper:
Verifying SystemC: a Software Model Checking Approach by Alessandro Cimatti, Andrea Micheli, Iman Narasamdya and Marco Roveri
This set and the QF_LIA set below are essentially the same, except for
the logic used.
Results
- QF_LIA BMC and k-Induction problems on SystemC designs, from NuSMV (346MB)
Unrollings of translations of some SystemC programs into
NuSMV. The programs were those used, e.g., in the FMCAD'10 paper:
Verifying SystemC: a Software Model Checking Approach by
Alessandro Cimatti, Andrea Micheli, Iman Narasamdya and Marco Roveri
This set and the QF_BV set above are essentially the same, except for
the logic used.
Results
- QF_LIA BMC and k-Induction problems on Lustre programs, from NuSMV (172MB)
These were obtained from a subset of the Lustre models also used for
the KIND set, except that they were generated by NuSMV itself from a
NuSMV version of the Lustre programs.
Results
- QF_UFLIA traces from KIND, postprocessed (for competition, 128MB).
These benchmarks were obtained from the KIND tool during the verification of Lustre programs.
Results
The original traces are also available (4.2MB); they are not in the restricted competition format (e.g., they use define-fun) and are mislabeled as QF_UFIDL.
- ASASP benchmarks (31MB)
ASASP (http://st.fbk.eu/ASASP) implements a symbolic reachability
procedure for the analysis
of administrative access control policies. A more detailed
description of the benchmarks can be found in the following paper:
Efficient Symbolic Automated Analysis of Administrative Attribute-based RBAC-Policies, by F.
Alberti, A. Armando, and S. Ranise.
http://st.fbk.eu/sites/st.fbk.eu/files/asiaccs174-alberti.pdf
Results (updated 26 June 2011)
Unsat core benchmarks
The unsat core benchmarks are the subset of the benchmarks from the indicated
logics that are unsatisfiable; the benchmarks themselves have been modified
to include names for the assertions. Per the SMT-LIB standard, the benchmarks
are allowed to contain a mix of named and unnamed formulae, though ordinarily
all top-level formulae will have names. The names may be scrambled by the
benchmark scrambler.
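For illustration only (this snippet is not taken from any actual benchmark, and the names a1, a2 are placeholders), a named top-level assertion in SMT-LIB 2 syntax looks like the following; a solver asked for an unsat core then answers with a subset of the names, e.g. (a1 a2):

  (set-option :produce-unsat-cores true)
  (set-logic QF_LIA)
  (declare-fun x () Int)
  (assert (! (> x 0) :named a1))
  (assert (! (< x 0) :named a2))
  (check-sat)
  (get-unsat-core)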
Format for results files
Each tarball contains one or two status files for each benchmark:
BENCHMARK.results.txt is always present, and
BENCHMARK.results_untrusted.txt might be present as well.
These files contain one status per line, corresponding to the series of
(check-sat) commands in the benchmarks.
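As a purely illustrative sketch (not an actual competition benchmark), an incremental benchmark containing three (check-sat) commands would therefore give rise to three status lines:

  (set-logic QF_LIA)
  (declare-fun x () Int)
  (assert (> x 0))
  (check-sat)
  (push 1)
  (assert (< x 0))
  (check-sat)
  (pop 1)
  (assert (> x 10))
  (check-sat)
  (exit)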
Each status line (where different from "unknown") has been determined by
running at least 3 different SMT solvers on the set of instances
resulting from "unfolding" each incremental benchmark using the
scrambler. For *.results.txt, all the results different from "unknown"
have been reported by at least 2 different solvers, whereas all the
status lines generated by a single solver only (because e.g. the others
timed out) have been replaced with "unknown". For
*.results_untrusted.txt, the single-solver answers are also included,
but they are marked with an "# untrusted" comment.
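Continuing the illustrative benchmark above, and assuming its third query was answered by only one solver, the two files could look as follows (the exact placement of the untrusted marker is an assumption here, not a specification):

  BENCHMARK.results.txt:
    sat
    unsat
    unknown

  BENCHMARK.results_untrusted.txt:
    sat
    unsat
    sat # untrusted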
Last modified: Tue 17 Feb 2015 14:57 UTC