Memory Layout in C
After a C program is compiled, a binary executable file (for example, a .exe file on Windows) is created. When we execute the program, this binary file is loaded into RAM in an organized manner. Once loaded, the memory layout of a C program has six components: the text segment, the initialized data segment, the uninitialized data segment, command-line arguments, the stack, and the heap. Each segment stores a different part of the program and has its own read and write permissions. If a program tries to access a segment in a way its permissions do not allow, for example by writing to a read-only segment, the result is a segmentation fault.
Computers do not execute program instructions directly from secondary storage, because secondary storage has a much longer access time than RAM. RAM is faster but has limited capacity, so programmers need to use it efficiently. Knowing the memory layout of a C program helps programmers understand where each part of the program lives and estimate how much memory the program uses while it runs. Segmentation faults, caused by accessing a segment in a way that is not allowed, are also one of the most common reasons a C program crashes.
Diagram for memory structure of C
The diagram below shows how a C program is laid out in RAM as several segments.
Let us discuss each of these segments in detail.
Text segment
- After we compile the program, a binary file is generated, which is used to execute the program by loading it into RAM. This binary file contains the program's machine instructions, which are stored in the text segment of memory.
- The text segment is read-only, which prevents the program from accidentally modifying its own instructions.
- The text segment is shareable, so only a single copy needs to be kept in memory for frequently run programs such as text editors and shells, even when several instances are running.
Initialized data segment
The initialized data segment, or simply the data segment, is the part of a C program's virtual address space that holds all external, global, and static variables (including const ones) that are given an initial value in their declaration. Because the values of these variables can change during execution, this segment has read-write permission. The data segment can be further divided into a read-only area and a read-write area: const global variables go in the read-only area, and the remaining variables go in the read-write area. For example,
Here, the pointer variable hello is stored in the read-write area, while the string literal "Data segment" that it points to is stored in the read-only area of the initialized data segment.
In this example, the variables global_var and hello are declared outside main(), so they are stored in the read-write area of the initialized data segment. The global variable global_var2, however, is declared with the const keyword, so it is stored in the read-only area. Initialized static variables such as a are also stored in the read-write area.
Uninitialized data segment
The uninitialized data segment is also known as bss (block started by symbol). The program loader allocates memory for this segment when it loads the program. Before the program starts executing, every object in bss is set to arithmetic zero, and pointers are set to null pointers. On most systems, the kernel does this by providing zero-filled memory. Compilers typically also place static and global variables that are explicitly initialized to 0 in bss. Because the values of variables stored in bss can change, this segment has read-write permission.
Here, both global_variable and static_variables are uninitialized, so they are stored in the bss segment. Before the program starts executing, they are set to 0. This can be verified by printing their values, as shown in the program.
Stack segment
The stack segment follows a LIFO (Last In, First Out) structure. On most architectures it grows toward lower addresses, in the direction opposite to the heap, although the direction depends on the architecture. It stores local variables and the parameters passed to functions, along with bookkeeping information such as the return address, which is the address of the instruction to execute after the function call returns.
The stack pointer register tracks the top of the stack, and its value changes whenever values are pushed onto or popped off the stack. The block of data pushed onto the stack for a single function call is called a stack frame. A stack frame holds the function's local (automatic) variables together with bookkeeping information, such as the return address and details of the caller's environment like saved registers. Each time a function calls itself recursively, a new stack frame is created, so the variables of one call do not interfere with those of another active call of the same function. This is how recursive functions work.
Let us see an example to understand the variables stored in the stack memory segment.
Here, all the variables are stored in the stack segment because they are declared inside a function. They occupy memory only while their function is executing. In the code above, main() starts executing first, so a stack frame for main() is created and pushed onto the program stack, holding the variables local and name. When main calls foo, a separate stack frame is created and pushed for foo, holding the variables a and b. When foo finishes, its stack frame is popped and its variables are deallocated. When the program ends, main's stack frame is popped as well.
Heap segment
The heap is used for memory allocated at run time (dynamically allocated memory). It generally begins at the end of the bss segment and grows and shrinks in the direction opposite to the stack. Library functions such as malloc, calloc, realloc, and free manage allocations in the heap. On Unix-like systems, these functions obtain memory from the kernel through the brk and sbrk system calls, and often through mmap for large blocks. The heap is shared by all shared libraries and dynamically loaded modules in a process.
Here, we create a char variable by allocating 1 byte (sizeof(char) is always 1 in C) while the program is running. Because the memory is allocated dynamically, it resides in the heap segment.
Command-line arguments
When a program is run from the console with arguments, those arguments (accessed in main() through argc and argv) and the environment variables are stored in this segment. It is typically located at the top of the process's memory, above the stack.
This example explains how command-line arguments are passed to and used in a program. This segment stores the values of argc and argv. argc holds the number of arguments passed, and argv holds the arguments themselves. argv[0] is usually the name of the program file.
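The size command (part of GNU binutils) reports the sizes, in bytes, of the text, data, and bss segments of an executable. The stack and heap are created at run time, so they do not appear in its output. Let us look at some examples to visualize the memory layout in more detail.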
A simple C program
Let us now add an initialized global variable.
Adding one initialized global variable increased the size of the data segment (initialized data segment) by 4 bytes, which is the size of one int on typical platforms (sizeof(global_variable)).
Now let us add one uninitialized static variable, which should increase the memory occupied by bss.
But if we add a static variable with an initialized value, it will be stored in the data segment.
Similarly, if we add a global variable with an uninitialized value, it will be stored in bss.
As described earlier, the initialized data segment is divided into two parts:
- read-only area
- read-write area
Let us see two C programs to understand this classification.
In the first example, the global variable str is a character array, so its contents are stored in the read-write area of the data segment and can be modified. In the second example, str points to a string literal, which is stored in the read-only area. Trying to change a character of that string is undefined behavior and typically crashes the program with a segmentation fault.
- When a C program is executed, its binary code is loaded into RAM and divided into six areas: the text segment, initialized data segment, uninitialized data segment, command-line arguments, stack, and heap.
- Code instructions are stored in the text segment, which is shareable memory. If arguments are passed when the program is run from the console, their values are stored in the command-line arguments area of memory.
- The initialized data segment stores global, static, and external variables that are initialized in the program. The uninitialized data segment, or bss, contains all uninitialized global and static variables.
- The stack stores all local variables and function arguments. It also stores the return address of the instruction to be executed after a function call.
- Stack and heap grow opposite to each other.
- The heap stores all dynamically allocated memory in the program and is managed by functions such as malloc, calloc, realloc, and free.