Fuzzing has become one of the most successful and popular methods to automatically find bugs and security vulnerabilities in software. To do this, a fuzzer randomly generates data which is subsequently used as input for a test program. If the program crashes or triggers another failure condition, the fuzzer stores the respective input for further analysis. Naturally, researchers have proposed numerous techniques to improve the fuzzing process. However, due to the randomness of the fuzzing process and the general complexity of their evaluation, it is not trivial to compare different fuzzers and their performance. In our paper, we conduct extensive experiments to empirically quantify the influence of various evaluation parameters, including the employed seed set, different runtimes, the number of trials, and the test sets. We further present our framework SENF, which automatically applies statistical evaluation methods to calculate an overall score for each tested fuzzer.
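The core loop described above (generate a random input, run the target, keep inputs that trigger a failure) can be sketched in a few lines of Python. This is a deliberately minimal illustration, not SENF or any real fuzzer; the `target` function and the single-byte mutation strategy are hypothetical placeholders:

```python
import random

def target(data):
    # Hypothetical buggy program under test: it "crashes" (raises)
    # whenever the input starts with the magic bytes b"FUZZ".
    if data[:4] == b"FUZZ":
        raise RuntimeError("crash")

def mutate(seed):
    # Toy mutation strategy: overwrite one random byte of the seed.
    data = bytearray(seed)
    i = random.randrange(len(data))
    data[i] = random.randrange(256)
    return bytes(data)

def fuzz(seed, iterations=10_000):
    # Naive fuzzing loop: mutate the seed, run the target, and store
    # every input that triggered a failure for later analysis.
    crashes = []
    for _ in range(iterations):
        data = mutate(seed)
        try:
            target(data)
        except Exception:
            crashes.append(data)
    return crashes
```

Because the loop is driven by random mutations, two runs of the same fuzzer can produce very different results, which is exactly why repeated trials and statistical evaluation are needed when comparing fuzzers.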

If you have any further questions regarding our framework, please contact David Paaßen.

Our paper about our experiments and fuzzing evaluation framework will be published at ESORICS 2021:

Paaßen, David; Surminski, Sebastian; Rodler, Michael; Davi, Lucas: My Fuzzer Beats Them All! Developing a Framework for Fair Evaluation and Comparison of Fuzzers. In: Proc. of the 26th European Symposium on Research in Computer Security. Springer International Publishing, Darmstadt, 2021

The full version of our paper with an extended evaluation has been published on arXiv.

To enable the reproduction of our study and to support future research, we publish all data and our evaluation setup on GitHub.