User Tools

Site Tools


starter_projects

Starter Projects

This contains a list of simple projects which are a good introduction to the code base.

Instrumentation to Check Pointer Alignment

Some architectures only allow load instructions / pointer dereferencing when the address is a multiple of the word length (i.e. 4 or 8 bytes). This project would add a new instrumentation option for cbmc and goto-instrument that checks each pointer before dereference and makes sure it is correctly aligned.

Command line options to SAT solvers

There are options for which SMT solver to run and there are ways of building with multiple SAT solvers but no command line options to pick a SAT solver. It would be nice if there were.

César is working on this one.

Combine Uninitialized Variables Analysis with Alias Analysis

goto-instrument features an instrumentation to track cases of use-before-define, i.e., reading of uninitialized variables. At present, the analysis does not take into account aliasing information. As such, the following code will yield a spurious counterexample:

int main()
{
  int x;
  int *ptr;
  ptr=&x;
  *ptr=42;
  if(x < 42)
    return 1;
  return 0;
}

Constant Folding

With the recently added reaching definitions analysis, it is straightforward to implement a program transformation that performs constant folding. To ensure soundness in concurrent settings, it should be combined with the existing is_threaded analysis. (A concurrency-aware version of reaching definitions now exists and is awaiting a merge.) Along the way, assertions that become trivially true should be eliminated.

Compute Program Dependence Graphs

Program dependence graphs (PDGs) form an essential element for slicing. The reaching definitions analysis is particularly well suited to derive a clean and precise implementation of program dependence graphs, because it is field-sensitive. Michael has an implementation of both program dependence graphs and a working slicer.

Make SAMATE Tests Work

Read this paper and make the tests mentioned in there work.

Error Explanation

Re-implement the algorithm described in this paper.

LTL

Build an instrumenter in goto-instrument for checking LTL (or even CTL) properties on C code. E.g., look at this paper. (And please get in touch with Michael Tautschnig, who has done very similar work in FShell.)

Rajdeep is now looking at this one.

Test SMT Back Ends

CBMC allows to choose among a selection of SMT solvers to use as solving back end. As these tools evolve, CBMC's code for running them may need a fresh look from time to time. After a first round of manual experiments, these tests should become part of the routine build process on dkr-build. (Michael Tautschnig knows how to do this.)

Simple Test Harness Generation

CBMC currently outputs a trace of execution when it find a counter-example. Instead it should be possible to produce a C file that when compiled and run will cause the program to execute this trace. This will allow a faster verify/debug cycle. (To a certain extent this has been accomplished in FShell. Speak to Michael.)

Pascal is now looking at this one.

Improved Interpreter

CProver has an interpreter for goto programs but it could be improved, for example adding support for dynamic analysis, calls to external functions and possibly even mixed concrete and symbolic execution. It would also be good to be able to interpret traces found by BMC to catch mismatches / bugs between the logical model and the bit blasting models. This would also be useful for more advanced CEGAR like techniques (also see below).

"KLEE-PROVER"

Alter the CBMC symbolic execution engine to create a dynamic symbolic execution tool similar to KLEE. Possible features include:

  • Distributed and incremental support using the caching scheme here
  • A feature to run BMC as you go to explore the traces 'around' the execution traces
  • Use path exploration as the witness generation part of an ACDL process.
  • Perform precondition inference, possibly using ACDL.

See the work Björn and Alex have done with path-symex.

Interact with GDB

GDB implements a client-server protocol with a GDB instance acting as a server and the GUI as a client. (The documentation for that protocol may be found here.) By implementing this protocol in various place within CProver, there are various things that could be achieved.

To do some of these will require working out which functions are non-deterministic models of real functions and accounting for this. This would also be necessary for some of the preceeding ideas.

Trace as a GDB server

Implement a GDB server that uses a trace to 'execute' the program. This allows any GUI debugger that can talk to GDB to be used to graphically explore the trace. Particularly valuable for concurrent programs in which just reading the trace could be very hard. Pascal has something like this implemented for Eclipse.

Handling the SV-COMP counter-example format (ask Michael) would probably make this tool a lot more widely usable.

Trace as a GDB client

Conversely, implement a GDB client that uses the trace to control a debugger. If connected to a 'real' GDB then this would allow the debugger to be set to the position that shows the error. In the case of parallel programs this avoids the need to create custom schedulers, etc. and over-all it is a much better interface – rather than generate a trace, CProver simply gives you the debugger in the state that triggers the problem.

One option would be to convert the trace to a GDB script but this misses some of the value as it would be possible to query they GDB server as the program runs to check that values really are assigned correctly. This will catch the first point at which the model and the real system diverge and allows checked 'resimulation' of the traces on a 'real' platform. If the goto-binary contains the ELF binary then this could be used directly.

Interpreter as a GDB server

An enabling tech, this would allow us to run the interpretter as a debugger back end. Combine with the trace as a GDB client, this would allow resimulation of traces, which would be useful for catching encoding bugs in CProver and ideally should be run after the decode from the solver all of the time. Michael has a partial implementation of this.

Recording GDB server

Acts as a GDB server and passes commands back to another GDB server to actually execute. Records the state changes and locations visited. This allows a test case to be 'run' and the trace / conditions on the trace to be collected and (after generalisation) be used for test case driven analysis / test case generalisation / something “concolic” like.

Differential GDB server

Acts as a GDB server and then connects (as a client) to one or more GDB servers. Commands are executed on all of these and then the state / position compared (much of the infrastructure of the previous idea could be used). Divergences of behaviour can then be found, particularly root causes can be identified by the first divergence. This may not be CProver specific but can likely be used for various projects and ideas above.

"Concolic" GDB client

An interactive component that runs a program via a GDB server and query / interact with it. This would be useful for building a “concolic” tool, especially for running library functions that are not defined.

Other Ideas

Valgrind and the Linux kernel both have GDB servers – what could we do with these?

Static Detection of Data Races

It would be good to have a tool that checks for data races in a similar fashion to DRD / Helgrind. The current pointer analysis can over approximate shared variables, so possibly some combination of this, trace generation, DRD / Helgrind and then refinement of which traces to pick? Perhaps this is even something ACDL like?

CORAL / Q

Build something similar to the Q tool in SLAM.

CryptoMinisat

Try CryptoMinisat vs. MiniSat.

k-liveness

Implement k-liveness for software-liveness checking.

Hongyi is now looking at this.

Known Bugs on the Web

Turn the list of KNOWNBUGS from the CBMC regressions in the SVN into a webpage of known bugs of the CBMC trunk and the last CBMC release.

Coding Standards Checker

Build instrumenters for various coding standards, e.g., AutoSar, MISRA-C, CERT C HIC++, etc.

Front-End Testing

Run the regressions mentioned here and the GCC C Torture tests with goto-cc and a goto-program interpreter (mild dependency on the interpreter-related task above).

Dalvik Front-End

Connect to an existing unpacker or add to the Java byte code front-end so that it can handle Dalvik / Android binaries. Some work implementing the Android libraries will be needed to make this useful. Nassim was working on this at one point and one of Marta's post-docs was interested.

Unspecified C Behaviours

Build instrumentation to check for unspecified behaviours in C, as per Section J.2 (and maybe also J.3) of the C standard. Some of them may be possible to handle entirely in the front-end (for example, catching x = x++ as modifying a variable twice between sequence points could be done in side effect removal in ansi-c), while others may be best to be instrumentation (for example handling the unspecified nature of the order of evaluation of function arguments). Talk to Michael Tautschnig or Martin Brain for this item.

Reporting Software Errors with CWE Information

The Common Weakness Enumeration project seeks to classify software weaknesses and errors in a common dictionary. Reporting CWE information together with failed assertions, such as invalid arguments to free, would help both users of the tools as well as industrial partners. In the first instance, errors already caught by present instrumentation (such as the mentioned invalid free), should be augmented with additional information. In a second step, instrumentation for other problems listed in the CWE database should be developed.

Machine Learn Refiment Settings

The refinement option enables a tactic of under and over approximation to try to solve the generated formulae more quickly. There are a large number of possible options – which operations to approximate, how much to under or over approximate them, how much to add back if this doesn't work, etc. Rather than trying to manually pick the best strategy from these, it would be good to machine learn them. It would probably be worth asking Nando de Freitas' advice on the machine learning side of things.

Ruben is now looking at this one.

Cluster CBMC

Most users of CBMC have access to multi-core machines; many have clusters available. There are a number of different approaches and a number of different levels at which the task could be parallelised; per assertion, portfolio approaches of different back-ends, parallel solvers, etc. It would be good to be able to exploit these. The caching approach suggested under “KLEE-PROVER” might be an easy route to implementing this.

x86 to Goto

Look at various efforts to model x86 assembler, e.g., Andrew Kennedy, “Formalizing .EXEs, .DLLs, and all that”, Gang Tan, “Reusable tools for formal modeling of machine code”, and check whether this can turn into a translator from x86 to goto binary. Also look at Google's Native Client, and talk to Matt, who has tried Valgrind.

Furthermore, talk to Michael, who has got a student who made some progress on this.

And maybe have a read of Automated synthesis of symbolic instruction encodings from I/O samples.

CBMC for Solaris

Make an OpenCSW package for CBMC. Instructions are here.

An SMT-LIB front-end for our back-end

The flattening code and other wrappers around the SAT solver are pretty much identical to the internals of a simple SMT solver. Writing or integrating an SMT-LIB front-end (possible sources include Alberto's Parser, OpenSMT, CVC4 or any other open source solver) would allow an SMT-solver to be build from the CProver code base. This would allow fair comparison with other solvers, significantly increase the range of benchmarks available for the back-end and be a useful tool in it's own right.

Daniel has volunteered Ruben for this.

Branch Prediction

Implement branch prediction algorithms in path-symex.

An abstract interpreter for C

There is a lack for a robust, open source, abstract interpreter for C/C++. CPROVER contains most of the components for it, the key thing that would be needed is a front-end and some testing, plus possibly some more domains.

GCC Torture Tests

GCC has a set of 'torture tests' which are intended to stress test and find bugs in their C parser and front-end. It would be good to run these through goto-cc / add them to the nightly testing and fix any bugs that arise. (Possible overlap with “Front-End Testing” above.). The same is probably also true for CSmith and other compiler fuzzing tools.

Compiler loop optimisations for BMC

Try out some of the compiler loop optimisations (hoisting invariant code, fusing sequential independent loops, etc.) and see if they improve BMC, K-Induction or loop acceleration. These should be possible to implement in goto-instrument.

Michael has an undergraduate student looking at this…

Handling restrict in a smart way

C99 adds the restrict keyword to mark unaliased pointers. This can be discarded by the compiler and thus has no semantics, it is only an annotation. However there are a number of things we could do with:

  1. Insert checks in calling functions to make sure that the pointers really are unaliased.
  2. Add an instrumentation pass that adds assumptions that restricted pointers are not aliased. This may improve the quality of counter-examples (although at the cost of potentially missing some bugs, although not if combined with the previous technique).

C++ Exceptions

C++ has annotations for what exceptions a function can throw. It would be good to be able to check / generate / minimise these and it should be possible to do this with a reasonably simple abstract domain. The CVC4 project would be interested in using this if it exists.

Peter may have some thoughts on this.

Java Memory Model

CBMC's Java front-end is currently in development and will support the full Java bytecode instruction set soon. Beyond that, it would be great to support some more advanced aspects of the Java programming language semantics, such as the Java Memory Model. This will include adding models for a subset of the Java concurrency packages in the Java Runtime Library. Jade Alglave may have some additional input on this.

Deadlock Analysis

Michael outlined an instrumentation approach to deadlock / termination analysis on the support mailing list. It would be nice to turn this into an actual analysis.

starter_projects.txt · Last modified: 2016/01/28 14:38 by mbrain