# Student Projects

The Secure Software Lab offers a number of student projects. These projects are suitable for both undergraduate final-year projects and summer vacation internships.

Read up on individual project courses here, or check out all CECS projects here. Note that project ideas are often quite flexible, so if you have your own idea or would like to discuss a variant we’d love to hear from you.

# Vulnerability Research and Discovery

Vulnerability research (VR) is the process of analyzing a system to find, understand, or exploit one or more vulnerabilities. In the Secure Software Lab, we are interested in developing and applying advanced program analysis techniques to improve the automation of VR. These techniques include, but are not limited to, fuzzing (e.g., improving tools such as AFL or its successors, developing new program instrumentation using the LLVM compiler toolchain), symbolic execution (e.g., building upon angr and S2E), and source/binary static analysis (e.g., developing Ghidra plugins).

VR projects are undertaken in conjunction with the ANU/ASD CoLab and as such students must be Australian citizens. For more details please contact Adrian Herrera or the CoLab.

Hypothesis is an advanced and widely-used property-based testing framework, used by thousands of companies and open source projects. Zac Hatfield-Dodds has many related projects for students interested in advanced testing and verification, whether on the formal-methods or fuzzing sides of the field. All projects require strong Python skills; experience with open source workflows (git, CI, etc) are helpful but not required.

## Fuzzing with Protobuf or ASN.1 schemas

Many programs communicate via binary formats such as ASN.1 or protocol buffers, and so automatically generating valid messages is an attractive testing technique. In this project, you would write a function to take a schema, optionally with customised ‘strategies’ for sub-structures, and return a Hypothesis strategy which generates such messages.

To take this from a six-unit to a twelve-unit project, you could search for bugs, characterise how it combines with e.g. targeted PBT (like IJON or FuzzFactory) or the benefits of integrated test-case reduction, etc. - it’s well-defined to start with but there’s plenty of extensions if you finish the core quickly. hypothesis-010 and hypothesis-jsonschema are good examples of related projects.

## Extending the Hypothesis Ghostwriter

The Hypothesis Ghostwriter can automatically generate property-based tests for six different properties based on source-code introspection - without needing to execute the test like previous work (Randoop, Evosuite, QuickSpec, FuzzGen, etc).

Enhancements could add more properties, or use more sophisticated introspection logic to determine the required argument types for unannotated parameters. For a larger, probably Honours-scale, project you could implement an alternative mode which does include execution of the speculative tests - allowing users to trade away runtime and safety for improved test code.

## Data-Science Entomology

Essentially all scientific data analysis is now done in software, which implies almost any published results could be wrong due to software bugs - and we would probably not know if this was the case. Fortunately, property-based testing can help, and Python libraries like Numpy and Pandas have already added Hypothesis to their test suites - but so far only have a few property tests.

Writing tests - and patches for the discovered bugs - would be a significant contribution to the scientific community. Depending on the project and scientific domain this project may also require strong skills in C or C++; and depending on scope could be a six-unit or twelve-unit project. An Honours-sized extension to this project could involve identifying bugs which do in fact change the results of published papers. Such bugs certainly exist, but scientists’ code is often irreproducible even without bugs!

## Generating graph data-structures

There is significant interest in generating graph and graph-like data structures with Hypothesis (see e.g. hypothesis-geojson, hypothesis-geometry, and hypothesis-networkx). There is also a substantial literature on random generation of these structures, but this prior art generally focusses on generating uniform distributions (vs heuristic ‘bug-finding’ distributions). Designing the generation strategy for locality of mutation and efficient shrinking is also a novel-for-graphs problem, though there is plenty of prior art in Hypothesis for other objects.

This is likely to be a twelve-unit project to produce and ship useful open source code; though proof-of-concept and deeper projects are both possible.

## Did you implement the model you verified?

TLA+ formally verifies properties of your system design, but not the implementation. We can gain confidence that we have implemented the spec by checking execution traces from testing or even production traffic, or with a custom simulation harness.

A demonstration system connecting model-based testing to TLA+ verification would be a significant advance in the state of the art, suitable for an Honours student or as a twelve-unit project if you are already comfortable with TLA+.

## Generalising or abstracting failing inputs

Reporting generalised examples is as much a user-interface design problem as it is an implementation challenge: can we show more information about the cause of an error, without showing too much (or worse - wrong information!). I would recommend this as a twelve-unit project.

Async programming frameworks like Trio or the standard-library asyncio offer substantial speedups for IO-bound code, but may introduce new bugs that only manifest under unusual scheduler orders of the subtasks. rr chaos mode is interesting prior art, albeit for threads rather than an async task model.

So what if we could use a custom scheduler for tests which attempts to induce bugs? I think this would uncover many bugs, but raises some further research questions. For example: should we generate distinct inputs, or only task schedules? The former covers more states; the latter admits the property “same result regardless of schedule”. Are adversarial schedules useful on real-world programs, or do they just reveal uninteresting failures? And what about fault injection (adding delays or possible IO errors)? This could reveal many bugs, but requires substantial work to give an acceptable user experience.

This project could be a good twelve-unit or Honours project, or you could use a semester project to prototype and scope your Honours research.

## Repurposing PyPy’s tracing JIT to measure code coverage

Fuzzing in Python usually sees a 2x-70x slowdown due to the overhead of coverage measurement; so reducing this would obviously be very valuable. PyPy is a Python implementation based on a just-in-time compiler (JIT) - which offers a large speedup on repetitive workloads like fuzzing, but considerably worse relative performance with coverage. Having the JIT tracer output coverage measurements directly could eliminate this overhead entirely, improving fuzzing performance by two orders of magnitude.

This project would be good for an ambitious Honours student with an interest in programming language implementation, especially just-in-time compilation, and would likely lead to a published paper and widespready use if successfully implemented.

Updated:  16 Apr 2021/ Responsible Officer:  Director, Research School of Computer Science/ Page Contact:  Antony Hosking