The Secure Software Lab conducts both advanced research projects and student projects, all focused on improving the security of software systems.
- Research Projects
- Student Projects
Grail: Grammar Learning Project
Fuzzing is an important security-testing technique. Grammar-based fuzzing is a significantly more effective form of fuzzing in which input mutations are sensitive to the type and structure of the input. Compared to black-box mutational fuzzing, grammar fuzzing yields far more sophisticated testing. The problem is that for many targets of interest the grammar (file format, network protocol, etc.) is not available. Even when a grammar has been published, the target software's parser implementation may differ from it significantly.
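To illustrate what "sensitivity to the type and structure of the input" buys you, here is a minimal sketch of grammar-based generation. The toy grammar (a hypothetical key=value config line, not any real target format) is a dict mapping nonterminals to lists of alternative expansions:

```python
import random

# Hypothetical toy grammar: nonterminals are angle-bracketed names,
# everything else is a terminal emitted as-is.
GRAMMAR = {
    "<line>": [["<key>", "=", "<value>"]],
    "<key>": [["host"], ["port"], ["user"]],
    "<value>": [["<digit>", "<digit>"], ["<letter>", "<letter>"]],
    "<digit>": [[c] for c in "0123456789"],
    "<letter>": [[c] for c in "abcdef"],
}

def generate(symbol="<line>"):
    """Randomly expand a symbol into a concrete input string."""
    if symbol not in GRAMMAR:            # terminal: emit verbatim
        return symbol
    expansion = random.choice(GRAMMAR[symbol])
    return "".join(generate(s) for s in expansion)

for _ in range(3):
    print(generate())
```

Every generated input is structurally valid by construction, so the fuzzer spends its budget exploring deep program logic rather than being rejected by the first parsing check, which is where byte-level mutation of a seed usually ends up.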
The holy grail of many reverse-engineering tasks is to automatically recover the grammar of a network protocol or file format. Hence this project: Grail will research how we can automatically learn a parser's implemented grammar by observing it consume input.
Grail’s approach is to divide and conquer the complex parts of the problem. Instead of working with a large, complex parser, we develop discrete learning capabilities for individual grammar operators (concatenation, alternation, composition, etc.) and then compose them to learn harder, more realistic grammars. Success for us doesn’t require perfect learning: even modest, under-constrained grammars lead to significantly improved fuzzing campaigns.
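As a flavour of what learning a single grammar operator might look like, here is a sketch of recovering an alternation by probing a black-box parser. The `parser` function below is a hypothetical stand-in for the target (in Grail the oracle would be real, instrumented software), and the token list is made up for illustration:

```python
def parser(data: str) -> bool:
    """Hypothetical target: accepts 'GET ' or 'PUT ' followed by a path."""
    return data[:4] in ("GET ", "PUT ") and data[4:].startswith("/")

TOKENS = ["GET ", "PUT ", "POST ", "DELETE ", "XXXX"]

def learn_alternation(oracle, candidates, suffix="/index"):
    """Probe the oracle with each candidate prefix; the set of accepted
    candidates is the learned alternation for that grammar position."""
    return [t for t in candidates if oracle(t + suffix)]

print(learn_alternation(parser, TOKENS))
```

Each operator learner stays this small and testable on its own; the research challenge is composing such learners so they still work when the operators are nested inside a realistic grammar.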
This work is challenging and deep. It is suitable for advanced undergraduates and ambitious postgraduates. For more information, contact us.
Moonshine: Pre-Fuzzing Corpus Design and Construction
Mutational black-box fuzzing is an important class of fuzzing technology. It relies on a large collection of “seeds” (input files, network packets, configuration strings, …) that a specific target application accepts as input. The fuzzer selects seeds from this corpus and mutates them in the search for crashes, which hopefully reveal security vulnerabilities.
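One round of this select-and-mutate loop can be sketched in a few lines. The `target` function here is a hypothetical stand-in that "crashes" on a magic byte; a real campaign would run the actual application and watch for faults:

```python
import random

def target(data: bytes) -> bool:
    """Hypothetical target: pretends to crash on a magic byte."""
    return b"\xff" in data

def mutate(seed: bytes, n_flips: int = 3) -> bytes:
    """Flip a few random bytes of a seed to produce a mutant."""
    buf = bytearray(seed)
    for _ in range(n_flips):
        i = random.randrange(len(buf))
        buf[i] ^= random.randrange(1, 256)   # XOR guarantees the byte changes
    return bytes(buf)

corpus = [b"hello world", b"GET /index HTTP/1.0", b"%PDF-1.4 ..."]
crashes = [m for m in (mutate(random.choice(corpus)) for _ in range(1000))
           if target(m)]
print(f"{len(crashes)} crashing inputs found")
```

The quality of the seed corpus directly bounds what this loop can reach: mutants stay close to their seeds, so redundant seeds waste executions on the same behaviour.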
How do we do this efficiently?
For example, suppose you want to perform a security assessment of the application Adobe Acrobat - the target. Part of that assessment will involve a fuzzing campaign, so you crawl the web for as many PDF files as you can find. PDF files are very common, and you soon have a corpus of 100,000 files. Intuitively, many of these files will be functionally similar, so using all of them would make fuzzing very inefficient.
Can you choose a subset of seeds that somehow represents all the seeds you have collected?
Moonshine is a research project that seeks to optimise the design of fuzzing corpora for use by a fuzzer. We have already achieved almost perfect corpus design based on maximising code coverage; however, code coverage is only one way of designing a fuzzing corpus. What about other measures, like execution complexity or interesting library usage or …?
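Coverage-based corpus design can be framed as a set-cover problem: keep the smallest set of seeds whose combined coverage equals the whole corpus. Below is a minimal greedy sketch; the coverage sets are invented for illustration, whereas in practice they would come from instrumented runs of the target:

```python
# Hypothetical per-seed coverage sets (numbers stand for covered edges).
coverage = {
    "a.pdf": {1, 2, 3, 4},
    "b.pdf": {3, 4, 5},
    "c.pdf": {1, 2},
    "d.pdf": {5, 6},
}

def minimise(coverage):
    """Greedily pick the seed covering the most not-yet-covered edges."""
    remaining = set().union(*coverage.values())
    chosen = []
    while remaining:
        best = max(coverage, key=lambda s: len(coverage[s] & remaining))
        if not coverage[best] & remaining:
            break                       # nothing left adds coverage
        chosen.append(best)
        remaining -= coverage[best]
    return chosen

print(minimise(coverage))
```

The greedy heuristic is the standard approximation for set cover, which is NP-hard in general; swapping the coverage sets for a different measure (execution complexity, library usage, …) is exactly the kind of variation Moonshine explores.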
This work is deep enough to support several students, ranging from undergraduates to postgraduates. If you are interested, contact us.
See our student projects page.