What exactly is a "program" and what does it include?

In this tutorial, I want to talk about "program" and "process", two fundamental terms in computer science. The two terms are often used interchangeably, but technically they are quite different. Let's begin by defining them and then look closely at what a program contains:

  • A program is a file containing a range of information that describes how to construct a "process" at run time [1].
  • A process is an instance of an executing program [1].

These are the formal definitions. In my own words:

A program (or binary) is a file that contains machine code, metadata, and debug information (if the program was compiled with the -g flag). A process is an abstraction that the kernel creates for a running program and to which it allocates hardware resources such as RAM.

The first thing you should know is the program format. Today, two of the most common formats that compilers generate are:

  • ELF (Executable and Linkable Format)
  • PE (Portable Executable)

ELF Format

ELF is the standard format for UNIX/Linux systems and PE for Windows. I'm currently on Ubuntu 24.10 (x86_64), so I will explain and use the ELF format. As you might guess, I don't really know the PE format, but there is a good reference that you can look at: "Practical Binary Analysis" by Dennis Andriesse.

Now that we've covered the program formats, let's look inside a program:

In general, an ELF binary consists of these parts:

  • Executable header
  • Program headers
  • Sections
  • Section headers

Every ELF file starts with an executable header, which is just a structured series of bytes telling you that it's an ELF file, what kind of ELF file it is, and where in the file to find all the other contents [2].

You can see the content of this header with the readelf command:

$ readelf -h ./copy
ELF Header:
  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00 
  Class:                             ELF64
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              DYN (Position-Independent Executable file)
  Machine:                           Advanced Micro Devices X86-64
  Version:                           0x1
  Entry point address:               0x1160
  Start of program headers:          64 (bytes into file)
  Start of section headers:          17496 (bytes into file)
  Flags:                             0x0
  Size of this header:               64 (bytes)
  Size of program headers:           56 (bytes)
  Number of program headers:         13
  Size of section headers:           64 (bytes)
  Number of section headers:         37
  Section header string table index: 36

Here, the first four bytes of the Magic field (7f 45 4c 46, that is, 0x7f followed by the ASCII characters "ELF") identify the file as ELF. A Windows PE file starts with a different signature (the bytes 4d 5a, "MZ"). The remaining fields are mostly self-explanatory.
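To make the header fields less abstract, here is a minimal Python sketch that decodes the same values readelf -h prints. It assumes a 64-bit little-endian ELF file. The names parse_elf_header, ELF64_HEADER and FIELD_NAMES are my own choices for illustration; the field order follows the Elf64_Ehdr structure from elf.h.

```python
import struct
import sys

# Layout of Elf64_Ehdr for a little-endian file: 16-byte e_ident, then
# e_type, e_machine, e_version, e_entry, e_phoff, e_shoff, e_flags,
# e_ehsize, e_phentsize, e_phnum, e_shentsize, e_shnum, e_shstrndx.
ELF64_HEADER = struct.Struct("<16sHHIQQQIHHHHHH")

FIELD_NAMES = (
    "ident", "type", "machine", "version", "entry", "phoff", "shoff",
    "flags", "ehsize", "phentsize", "phnum", "shentsize", "shnum", "shstrndx",
)


def parse_elf_header(data):
    """Decode the executable header from the first 64 bytes of an ELF64 file."""
    if len(data) < ELF64_HEADER.size:
        raise ValueError("file is too short to hold an ELF64 header")
    if data[:4] != b"\x7fELF":
        raise ValueError("missing ELF magic (7f 45 4c 46)")
    if data[4] != 2 or data[5] != 1:
        # e_ident[4] is the class (2 = ELF64), e_ident[5] the data encoding
        # (1 = little endian). Other variants are out of scope for this sketch.
        raise ValueError("only 64-bit little-endian ELF is handled here")
    return dict(zip(FIELD_NAMES, ELF64_HEADER.unpack_from(data, 0)))


if __name__ == "__main__":
    # Pass a binary such as ./copy; the default is this process's own executable.
    path = sys.argv[1] if len(sys.argv) > 1 else "/proc/self/exe"
    try:
        with open(path, "rb") as f:
            header = parse_elf_header(f.read(ELF64_HEADER.size))
    except (OSError, ValueError) as err:
        print(f"{path}: {err}")
    else:
        print(f"entry point:     {header['entry']:#x}")
        print(f"program headers: {header['phnum']} at offset {header['phoff']}")
        print(f"section headers: {header['shnum']} at offset {header['shoff']}")
```

Run against the same binary you gave to readelf, the entry point and the header counts should agree with the readelf output.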

The code and data in an ELF program are logically divided into sections. Each section holds a specific kind of content, such as machine code, initialized data, or symbol information. The sections are described by the section header table. Some sections contain the machine instructions that are executed, and others hold auxiliary information (like the symbol table used by a debugger). Let's look at the section headers:

$ readelf --sections --wide ./copy
There are 37 section headers, starting at offset 0x4458:

Section Headers:
  [Nr] Name              Type            Address          Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            0000000000000000 000000 000000 00      0   0  0
  [ 1] .interp           PROGBITS        0000000000000318 000318 00001c 00   A  0   0  1
  [ 2] .note.gnu.property NOTE            0000000000000338 000338 000030 00   A  0   0  8
  [ 3] .note.gnu.build-id NOTE            0000000000000368 000368 000024 00   A  0   0  4
  [ 4] .note.ABI-tag     NOTE            000000000000038c 00038c 000020 00   A  0   0  4
  [ 5] .gnu.hash         GNU_HASH        00000000000003b0 0003b0 000028 00   A  6   0  8
  [ 6] .dynsym           DYNSYM          00000000000003d8 0003d8 000180 18   A  7   1  8
  [ 7] .dynstr           STRTAB          0000000000000558 000558 0000d3 00   A  0   0  1
  [ 8] .gnu.version      VERSYM          000000000000062c 00062c 000020 02   A  6   0  2
  [ 9] .gnu.version_r    VERNEED         0000000000000650 000650 000030 00   A  7   1  8
  [10] .rela.dyn         RELA            0000000000000680 000680 0000d8 18   A  6   0  8
  [11] .rela.plt         RELA            0000000000000758 000758 0000d8 18  AI  6  24  8
  [12] .init             PROGBITS        0000000000001000 001000 00001b 00  AX  0   0  4
  [13] .plt              PROGBITS        0000000000001020 001020 0000a0 10  AX  0   0 16
  [14] .plt.got          PROGBITS        00000000000010c0 0010c0 000010 10  AX  0   0 16
  [15] .plt.sec          PROGBITS        00000000000010d0 0010d0 000090 10  AX  0   0 16
  [16] .text             PROGBITS        0000000000001160 001160 00043a 00  AX  0   0 16
  [17] .fini             PROGBITS        000000000000159c 00159c 00000d 00  AX  0   0  4
  [18] .rodata           PROGBITS        0000000000002000 002000 000040 00   A  0   0  4
  [19] .eh_frame_hdr     PROGBITS        0000000000002040 002040 000034 00   A  0   0  4
  [20] .eh_frame         PROGBITS        0000000000002078 002078 0000a8 00   A  0   0  8
  [21] .init_array       INIT_ARRAY      0000000000003d78 002d78 000008 08  WA  0   0  8
  [22] .fini_array       FINI_ARRAY      0000000000003d80 002d80 000008 08  WA  0   0  8
  [23] .dynamic          DYNAMIC         0000000000003d88 002d88 0001f0 10  WA  7   0  8
  [24] .got              PROGBITS        0000000000003f78 002f78 000088 08  WA  0   0  8
  [25] .data             PROGBITS        0000000000004000 003000 000010 00  WA  0   0  8
  [26] .bss              NOBITS          0000000000004020 003010 000010 00  WA  0   0 32
  [27] .comment          PROGBITS        0000000000000000 003010 00002b 01  MS  0   0  1
  [28] .debug_aranges    PROGBITS        0000000000000000 00303b 000030 00      0   0  1
  [29] .debug_info       PROGBITS        0000000000000000 00306b 000472 00      0   0  1
  [30] .debug_abbrev     PROGBITS        0000000000000000 0034dd 000184 00      0   0  1
  [31] .debug_line       PROGBITS        0000000000000000 003661 00017e 00      0   0  1
  [32] .debug_str        PROGBITS        0000000000000000 0037df 00031a 01  MS  0   0  1
  [33] .debug_line_str   PROGBITS        0000000000000000 003af9 00012f 01  MS  0   0  1
  [34] .symtab           SYMTAB          0000000000000000 003c28 000438 18     35  18  8
  [35] .strtab           STRTAB          0000000000000000 004060 00028c 00      0   0  1
  [36] .shstrtab         STRTAB          0000000000000000 0042ec 00016a 00      0   0  1
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
  L (link order), O (extra OS processing required), G (group), T (TLS),
  C (compressed), x (unknown), o (OS specific), E (exclude),
  D (mbind), l (large), p (processor specific)

The main topic of this tutorial is the sections, so I want to explain the most important ones in more detail:

  • .init and .fini: The machine code in these sections runs before and after the main() function. We don't write these sections ourselves; the toolchain generates them to handle low-level initialization and cleanup work. I don't actually know in detail what they do!
  • .text: This is the main area we focus on in binary analysis and reverse engineering; it contains your code as machine instructions. If you look at it, you will see more instructions than you might expect even for a small program, because it also holds startup code added by the toolchain.
  • .bss, .data and .rodata: These sections hold the variables of your source code. .data and .bss are writable; .rodata is read-only (note the missing W flag in the readelf output above). .bss holds the uninitialized static and global variables and takes up no space in the file (its type is NOBITS), while .data holds the initialized static and global variables. .rodata holds constant data such as string literals and global variables defined with the const keyword.
  • .dynamic: This is the "road map" for the kernel and dynamic linker when loading and setting up an ELF binary for execution.
  • .init_array and .fini_array: These contain arrays of pointers to functions that are used as constructors/destructors. You may know how to create a constructor or destructor with a compiler-specific attribute, like this one: __attribute__((constructor)) void run_before_main(void);.
  • .shstrtab, .symtab, .strtab, .dynsym and .dynstr: The .shstrtab section is simply an array of NUL-terminated strings that contain the names of all the sections in the binary. The .symtab section contains a symbol table and .strtab contains the symbol names. The .dynsym and .dynstr sections are analogous to .symtab and .strtab, except that they contain the symbols and strings needed for dynamic linking rather than static linking.
  • .debug_*: These sections are used primarily by debuggers (such as GDB). They do not contain any machine code, only metadata about the program that the debugger can use later. If you don't compile the program with the -g option, these sections will not be there.
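The way .shstrtab ties section names to section headers can be sketched in a few lines of Python. This is only an illustration under some assumptions: a 64-bit little-endian ELF file, no extended section numbering, and only the W, A and X flags decoded. The helper name list_sections is hypothetical; the record layouts follow Elf64_Ehdr and Elf64_Shdr from elf.h.

```python
import struct
import sys

ELF64_HEADER = struct.Struct("<16sHHIQQQIHHHHHH")   # Elf64_Ehdr
SECTION_HEADER = struct.Struct("<IIQQQQIIQQ")       # Elf64_Shdr

# Flag bits from elf.h and the letters readelf prints for them.
FLAG_LETTERS = ((0x1, "W"), (0x2, "A"), (0x4, "X"))


def list_sections(data):
    """Return (name, flags) pairs for every section in an ELF64 image."""
    if data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError("only 64-bit little-endian ELF is handled here")
    fields = ELF64_HEADER.unpack_from(data, 0)
    shoff, shentsize, shnum, shstrndx = fields[6], fields[11], fields[12], fields[13]
    if shnum == 0:
        return []  # no section headers (or extended numbering, not handled)

    headers = [
        SECTION_HEADER.unpack_from(data, shoff + i * shentsize)
        for i in range(shnum)
    ]
    # sh_offset and sh_size of .shstrtab locate the table of section names.
    str_offset, str_size = headers[shstrndx][4], headers[shstrndx][5]
    names = data[str_offset:str_offset + str_size]

    sections = []
    for sh_name, _type, sh_flags, *_rest in headers:
        end = names.index(b"\x00", sh_name)  # each name is NUL-terminated
        letters = "".join(c for bit, c in FLAG_LETTERS if sh_flags & bit)
        sections.append((names[sh_name:end].decode("ascii"), letters))
    return sections


if __name__ == "__main__":
    # Pass a binary such as ./copy; the default is this process's own executable.
    path = sys.argv[1] if len(sys.argv) > 1 else "/proc/self/exe"
    try:
        with open(path, "rb") as f:
            sections = list_sections(f.read())
    except (OSError, ValueError, IndexError, struct.error) as err:
        print(f"{path}: {err}")
    else:
        for name, flags in sections:
            print(f"{name:<20} {flags}")
```

Each header stores only an offset (sh_name) into .shstrtab, which is why the string table has to be located first. The flags column also makes the difference between .data (WA) and .rodata (A) easy to spot.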

Below is part of the .text section:

$ objdump -j .text -d ./copy

./copy:     file format elf64-x86-64

(...)

Disassembly of section .text:

0000000000001249 <main>:
    1249:   f3 0f 1e fa          endbr64
    124d:   55                   push   %rbp
    124e:   48 89 e5             mov    %rsp,%rbp
    1251:   48 81 ec 40 04 00 00 sub    $0x440,%rsp
    1258:   89 bd cc fb ff ff    mov    %edi,-0x434(%rbp)
    125e:   48 89 b5 c0 fb ff ff mov    %rsi,-0x440(%rbp)
    1265:   64 48 8b 04 25 28 00 mov    %fs:0x28,%rax
    126c:   00 00 
    126e:   48 89 45 f8          mov    %rax,-0x8(%rbp)
    1272:   31 c0                xor    %eax,%eax
    1274:   83 bd cc fb ff ff 03 cmpl   $0x3,-0x434(%rbp)
    127b:   75 24                jne    12a1 <main+0x58>
    127d:   48 8b 85 c0 fb ff ff mov    -0x440(%rbp),%rax
    1284:   48 83 c0 08          add    $0x8,%rax
    1288:   48 8b 00             mov    (%rax),%rax
    128b:   48 8d 15 72 0d 00 00 lea    0xd72(%rip),%rdx        # 2004 <_IO_stdin_used+0x4>

(...)

Here we see three columns. On the left are the addresses of the machine instructions. At run time, the program counter (the instruction pointer, RIP on x86-64) holds the address of the next instruction to execute; for a position-independent executable like this one, the addresses shown are offsets that get shifted by the load address. In the center are the raw bytes of the machine instructions in hexadecimal. On the right is the assembly-language representation of those bytes (in AT&T syntax here). When reverse engineering, we inspect this column to understand what the program does.

Virtual Memory Layout

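Having discussed the ELF program format, I will now explain the virtual memory layout of a process (a running program instance). When you run a program, the kernel sets up a virtual address space for it. On Linux/x86-64 this typically contains, from low to high addresses, the program's text, data, and bss segments, then the heap, and near the top of the address space the stack.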

I've already explained the .text, .data, .bss, and .rodata sections. Apart from these, the layout contains the stack and heap areas.

Both are memory areas that the process uses for its data while it runs. The stack follows a LIFO (Last-In First-Out) model: each function call pushes a stack frame (return address, saved registers, local variables), and returning from the function pops it. On x86-64 the stack grows downward, toward lower addresses. The heap is completely different! It has no LIFO or FIFO discipline; blocks can be allocated and freed in any order. The program uses it for dynamic allocation through the malloc()/free() function family, and it grows upward, toward higher addresses. The current upper limit of the heap is called the program break, and it is moved with the brk() and sbrk() calls.
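On Linux you can inspect this layout for a live process through /proc/<pid>/maps. The sketch below is a minimal Python example that parses that file and picks out the [heap] and [stack] regions. The helper names parse_maps and find_region are hypothetical. When run directly it shows the layout of the Python interpreter process itself, not of the ./copy binary, and it assumes that /proc is available.

```python
def parse_maps(text):
    """Parse /proc/<pid>/maps text into (start, end, perms, name) tuples."""
    regions = []
    for line in text.splitlines():
        # Fields: address range, permissions, offset, device, inode, [pathname]
        parts = line.split(None, 5)
        if len(parts) < 5:
            continue  # skip blank or malformed lines
        start, end = (int(value, 16) for value in parts[0].split("-"))
        name = parts[5].strip() if len(parts) == 6 else ""
        regions.append((start, end, parts[1], name))
    return regions


def find_region(regions, name):
    """Return the first region with the given name, or None."""
    for region in regions:
        if region[3] == name:
            return region
    return None


if __name__ == "__main__":
    try:
        with open("/proc/self/maps") as f:
            regions = parse_maps(f.read())
    except OSError as err:
        print(f"cannot read /proc/self/maps: {err}")
    else:
        for label in ("[heap]", "[stack]"):
            region = find_region(regions, label)
            if region is not None:
                start, end, perms, _ = region
                print(f"{label:<8} {start:#x}-{end:#x} {perms}")
```

On a typical Linux/x86-64 system the [heap] region sits at much lower addresses than the [stack] region, which matches the layout described above.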

So far, I have tried to give you a general overview of programs and processes. If you want to dive deeper, take a look at the references below.

[1] Kerrisk, M., The Linux Programming Interface, No Starch Press, San Francisco, p. 113.

[2] Andriesse, D., Practical Binary Analysis, No Starch Press, San Francisco, p. 33.
