Compilation Process in C


Overview

The compilation process in C transforms human-readable source code into a machine-readable format. In C, compilation happens before a program starts executing, and it checks the syntax and semantics of the code. The process involves four steps: pre-processing, compiling, assembling, and linking. We then run the resulting executable file to get output on the screen.

What is Compilation?

Before the formal definition, consider an example. Person A speaks only Hindi and wants to talk to person B, who knows only English. They need a translator to communicate with each other. This process is known as translation; in programming terms, it is known as compilation.

Compilation in C converts human-readable code into machine-readable code. Along the way, it checks the syntax and semantics of the code and reports any errors or warnings in our C program. Suppose we want to execute a C program written in an IDE (Integrated Development Environment). It has to go through several phases of compilation (translation) to become an executable file that a machine can understand.


Compilation process in C involves four steps:

  1. Preprocessing
  2. Compiling
  3. Assembling
  4. Linking


Now, let us see all the steps involved in a compilation process in C in detail.

The Compilation Process in C

a. Pre-Processing

Pre-processing is the first step in the compilation process in C. It is performed by the pre-processor (a pre-written program invoked by the system during compilation). The pre-processor handles all the statements starting with the # symbol in a C program and converts the program file into an intermediate file in which those directives have been resolved. The following tasks are performed during pre-processing:

i. Comments Removal

Comments in a C program give a general idea about a particular statement or part of the code. The pre-processor removes them (each comment is replaced by a single space), as they are of no use to the machine. Any comments in a program are gone by the time the pre-processing phase completes.

ii. Macros Expansion

Macros are constant values or expressions defined using the #define directive in C. A macro call leads to macro expansion: the pre-processor creates an intermediate file in which every use of the macro name is replaced by the tokens of its definition. This is a purely textual (token-level) substitution. No assembly instructions are generated at this stage, and the expanded text is not evaluated until the compiling phase.

Macros Examples:

Defining a value

Defining an expression

iii. File inclusion

File inclusion in C is the addition of another file containing pre-written code into our C program during pre-processing. It is done using the #include directive. File inclusion causes the entire content of filename to be added to the source code in place of the #include <filename> directive, creating a new intermediate file.

Example: If we have to use basic input/output functions like printf() and scanf() in our C program, we have to include the standard input/output header file, stdio.h.

iv. Conditional Compilation

Conditional compilation is including or excluding a block of code after checking whether a macro (a constant value or an expression defined using #define) is defined. The pre-processor evaluates the conditional directives, keeps only the code in the branch whose condition holds, discards the rest, and passes the expanded file to the compiler. Conditional compilation is performed using directives such as #ifdef, #ifndef, #if, #else, #elif, and #endif. Example:

  • Printing the AGE macro, if AGE macro is defined, else printing Not Defined and ending the conditional compilation block with an #endif directive.

OUTPUT:

Not Defined

Explanation:

The #ifdef directive checks whether the macro AGE is defined. As we have commented out the #define statement, the #ifdef AGE block is not compiled. The code in the #else block is compiled instead, and Not Defined is printed on the output screen. #endif marks the end of the conditional compilation block.

The figure below shows how the pre-processor converts our source code file into an intermediate file. The intermediate file has the extension .i and is the expanded form of our C program, containing the contents of the header files, the expanded macros, and the result of conditional compilation.

Figure: Pre-processing of a sample C program

b. Compiling

The compiling phase uses the compiler to convert the intermediate (.i) file into an assembly file (.s) containing assembly-level instructions (low-level code). While translating, the compiler can also optimize the code to boost the program's performance.

Assembly language is a low-level language that uses short, English-like mnemonics for machine instructions (it is used, for example, in micro-controller programs). The compiler parses the whole program (syntax analysis) in one go and reports any syntax errors or warnings in the source code through the terminal window.

The image below shows an example of how the compiling phase works.

Figure: How the compiling phase works

c. Assembling

Assembly-level code (.s file) is converted into machine-understandable code (binary, often displayed in hexadecimal form) using an assembler. The assembler is a pre-written program that translates assembly code into machine code. It takes the basic instructions from an assembly code file and converts them into binary code specific to the machine type, known as object code.

The file generated has the same name as the assembly file and is known as an object file, with the extension .obj on DOS and .o on UNIX.

The image below shows an example of how the assembling phase works. An assembly file area.s is translated to an object file area.o, which has the same name but a different extension.

Figure: Assembling of a sample program

d. Linking

Linking is the process of combining our object file with library files. Library files are predefined files that contain the definitions of functions in machine language; they typically have the extension .lib on DOS/Windows and .a or .so on UNIX. The object (.o/.obj) file contains references to functions, such as printf(), whose definitions it does not include. You can think of this as a book containing some words that you don't know, for which you use a dictionary to find their meaning. Similarly, the linker uses library files to resolve the unknown references in our object file. The linking process generates an executable file, which has the extension .exe on DOS/Windows. On UNIX, executables need no extension, and the default output name is a.out.

The image below shows an example of how the linking phase works: an object file containing machine-level code is passed through the linker, which links the library files with the object file to generate an executable file.

Figure: Linking of a sample program

Example

C program to display Hello World! on the output screen.

OUTPUT:

Hello World!

Note:

This tiny Hello World! program has to go through several steps of the compilation process to give us the output on the screen.

Explanation:

  • To compile the above code, use this command in the terminal: gcc hello.c -o hello
  • First, the pre-processing of our C program begins. Comments are removed from the program. As there are no macro directives in this program, macro expansion doesn't happen. We have included the stdio.h header file, so during pre-processing the declarations of standard input/output functions like printf(), scanf(), etc. are added to our C program.
  • During the compilation phase, all the statements are converted into assembly-level instructions by the compiler.
  • The assembly-level instructions for the above program are stored in a hello.s file. You can get this file using the command gcc -S hello.c in the terminal.
  • The hello.s file is converted into binary code by the assembler, which generates an object file: hello.obj on DOS and hello.o on UNIX.
  • Now, the linker adds the required definitions to the object file using the library files and generates an executable file: hello.exe on DOS/Windows and hello on UNIX (the name given with -o; without -o, gcc names it a.out).
  • When we run the executable, we get Hello World! as the output on the screen.

Flow Diagram of the Program

Let us look at the flow diagram of a program in the compilation process in C:

Figure: Compilation flow diagram

  • We have a C program file with the extension .c, i.e. the hello.c file.
  • Step 1 is pre-processing. All the statements starting with # (hash symbol) are processed and comments are removed by the pre-processor. It generates an intermediate file with the .i extension, i.e. a hello.i file.
  • Step 2 is the compilation of the hello.i file. The compiler translates hello.i into hello.s, which contains assembly-level instructions (low-level code).
  • Step 3 is assembling. The assembler converts the assembly-level instructions into machine-understandable code (binary form). The file generated is known as the object file, with the extension .obj or .o, i.e. a hello.obj or hello.o file.
  • Step 4 is linking. The linker links the library files with the object file to resolve the references it does not define. It generates an executable file, i.e. hello.exe on DOS/Windows or hello on UNIX (a.out if no output name is given).
  • Next, we can run the executable file to get the desired output on our output window, i.e., Hello World!.

Conclusion

  • The compilation process in C converts human-understandable code (a C program) into machine-understandable code (binary code).
  • The compilation process in C involves four steps: pre-processing, compiling, assembling, and linking.
  • The pre-processor handles comment removal, macro expansion, file inclusion, and conditional compilation. These tasks are carried out in the first step of the compilation process.
  • The compiler translates the intermediate file into an assembly file and can optimize the code to boost the program's performance.
  • The assembler converts the assembly file into an object file containing machine-level code.
  • The linker links the library files with the object file. It is the final step in compilation and generates an executable file.
