Automatically identifying excessive assembly bloat
This project was performed in the context of the Automated Software Testing lecture held by Prof. Zhendong Su at ETH Zurich. I worked on it together with Marco Heiniger, under the supervision of Yann Girsberger.
Compiler flags allow programmers to optimize generated binaries for their specific use-case. Generating fast, easy to debug or short assembly files can be accomplished with flags such as -O3
, -Og
, -Os
. In this project we have focused on finding minimal instances of C code that, while using the space-optimizing flag -Os
result in larger binary size than with the use of different compilers or compilation flags.
This blog post will give a short outline of the process used to find these bloated instances. A detailed analysis as well as source code can be found on our Gitlab page
Assembly bloat example
How GCC bloats an empty printf:
#include <stdio.h>
int a;
int main() { printf("", a); }
GCC 13.1 [-Os]
.LC0:
.string ""
main:
push rax
mov esi, DWORD PTR a[rip]
mov edi, OFFSET FLAT:.LC0
xor eax, eax
call printf
xor eax, eax
pop rdx
ret
a:
.zero 4
Clang 16.0.0 -Os
main:
xor eax, eax
ret
a:
.long 0
More examples, as well as a short analysis can be found in our artifacts branch.
The method
We have conducted our analysis using Diopter, a python wrapper around the following tools:
csmith
: This tool has been used to automatically generate C code for our analysis. By passing different flags we can specify the kinds of structures that the generated code should contain (such as arrays, strings, …).C-Reduce
: This is the main tool used in our analysis. It performs the process of delta debugging. More specifically it takes as inputs C source code as well as a True/False test function. Creduce then attempts to remove lines from the initial C code, while ensuring that the test function always returns True. This test funciton is an “interestingness” test. In general it often checks if a specific bug gets triggered, while in our instance it ensures that the size of the assembly generated using -Os is larger than that generated via other flags such as -O3 or with other compilers (CLANG).c sanitization checker
: We use this after each creduce step, to ensure that we have generated valid C code.
Example of a test function
Following is an example of one of the test functions used. We noticed that one of the best metrics to quantify the amount of remaining C code was obtained by counting the number of nodes in its Abstract Syntax Tree, instead of the number of characters or lines left in the C file.
# Compute the ratio using AST to quantify number of lines
def test_4(self, program: SourceProgram) -> bool:
root = ast_parser.get_ast_tree(program.code)
ratio = helper.get_tree_ratio(program,self.Os,root)
return ratio > self.bestRatio
This test function ensures that the ratio between -O3 and -Os increases in each succesful C-Reduce iteration.