C compilation process

Have you ever wondered what occurs behind the scenes when a C program is compiled?

Everything begins with a single command: gcc main.c. But what exactly happens when you press Enter? In this post, we'll walk through the C compilation process step by step, from preprocessing to linking, so you can see what goes on under the hood and how your source code is turned into an executable program.

Let's roll up our sleeves and dive in!

GCC

You might be curious about what gcc is. gcc stands for GNU Compiler Collection. It's a toolchain that translates source code written in a programming language into machine code, and that translation (compilation) is what this article covers.

When we compile a C program with gcc, several things happen. Our program passes through four stages before we get an executable file: preprocessing, compiling, assembling, and linking.

Preprocessing

Here, the preprocessor works through our source code first. It discards all the comments, processes the preprocessor directives (e.g., #include, #define), pastes in the contents of the included header files, and expands every macro by replacing its name with its definition.

If you want to see the output of this stage, run gcc with the -E option, e.g., gcc -E main.c. The output is typically quite large, because it contains all the code pulled in from the included header files. If you don't want the full output, you can pipe it into the Linux tail command: gcc -E main.c | tail. This prints only the last 10 lines of the preprocessed code to stdout.

Compiling

The output of the preprocessing stage is then passed to the compiler proper, which translates it into assembly code for the target CPU. Assembly is a human-readable form of the machine instructions that correspond to your source code; it is not yet something the CPU can execute. To see the assembly code, run gcc with the -S option, which produces a .s file. So gcc -S main.c will by default produce a main.s file containing the assembly code for your program.

Assembling

At this stage, the assembler converts the assembly code in main.s into object code. Object code is machine code in binary form, but it is not yet a runnable program: references to functions and variables defined in other files or libraries are still unresolved. We can produce object code with gcc -c main.c, which will by default generate an object file, main.o, containing the same instructions in machine-readable, binary form.

Linking

This is the final stage.

While writing a program, you will probably spread your source code across multiple files and call some library functions. After these files go through the three previous stages, we are left with multiple object files. The job of the linker is to combine these object files with the libraries they use, resolve the references between them, and produce a single executable file.

By default, the executable is named a.out (a.exe on a Windows machine); this is what typing gcc main.c generates. If you want to choose the name of the executable, add -o followed by the name you prefer, e.g., gcc main.c -o myExeFile. This generates an executable file named myExeFile.

Conclusion

In conclusion, understanding the C compilation process is valuable for any programmer who wants more control over their code and wants to be able to optimize it for performance. The process involves four stages (preprocessing, compiling, assembling, and linking), each with its own purpose in converting C code into an executable program. The preprocessor removes comments, expands macros, and includes header files, while the compiler generates assembly code. The assembler then converts the assembly code into object code, and the linker combines all the object files and libraries into a single executable file. With a clear understanding of this process, developers can better troubleshoot their code, optimize their programs, and gain deeper insight into how their computer systems work.