El: Advanced Digital Forensics

Abhilasha
Jul 14, 2024

Understanding ELF Files in Linux OS

Overview

ELF (Executable and Linking Format) is the primary executable file format used on Linux systems. It is used for user applications, shared libraries, kernel modules, and even the kernel itself. To effectively perform memory forensics and malware analysis on Linux systems, familiarity with the ELF format is essential.

ELF Header

The ELF header, located at the very beginning (offset 0) of a file, is represented by an Elf32_Ehdr or Elf64_Ehdr data structure for 32-bit or 64-bit files, respectively. Key structure members include:

e_ident: Holds file identification information.
The first four bytes are \x7fELF.
The fifth byte indicates whether the file is 32- or 64-bit.
The sixth byte indicates whether the file is little or big endian.
e_type: Indicates the file type (e.g., executable, relocatable image, shared library, core dump).
e_entry: Holds the program entry point address.
e_phoff, e_phentsize, e_phnum: Indicate the file offset, entry size, and number of program header entries.
e_shoff, e_shentsize, e_shnum: Indicate the file offset, entry size, and number of section header entries.
e_shstrndx: Holds the section header table index of the section name string table (typically .shstrtab), which maps each section's sh_name value to its name.
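
To make the header layout concrete, here is a minimal Python sketch that decodes the fields listed above from the raw bytes of a file. The function name parse_elf_header and the returned dictionary keys are illustrative choices, not part of any standard tool, and the sketch covers only the members discussed here.

```python
import struct

# Common e_type values from the ELF specification.
ELF_TYPES = {0: "NONE", 1: "REL", 2: "EXEC", 3: "DYN", 4: "CORE"}


def parse_elf_header(data: bytes) -> dict:
    """Decode the main ELF header fields from the start of a file (illustrative helper)."""
    if data[:4] != b"\x7fELF":
        raise ValueError("missing ELF magic bytes")
    ei_class, ei_data = data[4], data[5]  # fifth and sixth bytes of e_ident
    if ei_class not in (1, 2) or ei_data not in (1, 2):
        raise ValueError("unsupported EI_CLASS or EI_DATA value")
    endian = "<" if ei_data == 1 else ">"
    # e_entry, e_phoff, and e_shoff are 4 bytes wide in ELF32 and 8 bytes in ELF64.
    layout = "HHIIIIIHHHHHH" if ei_class == 1 else "HHIQQQIHHHHHH"
    (e_type, _machine, _version, e_entry, e_phoff, e_shoff, _flags, _ehsize,
     e_phentsize, e_phnum, e_shentsize, e_shnum, e_shstrndx) = struct.unpack_from(
        endian + layout, data, 16)  # the fields start right after the 16-byte e_ident
    return {
        "bits": 32 if ei_class == 1 else 64,
        "endianness": "little" if ei_data == 1 else "big",
        "type": ELF_TYPES.get(e_type, hex(e_type)),
        "entry": e_entry,
        "phoff": e_phoff, "phentsize": e_phentsize, "phnum": e_phnum,
        "shoff": e_shoff, "shentsize": e_shentsize, "shnum": e_shnum,
        "shstrndx": e_shstrndx,
    }
```

Reading the first 64 bytes of a file such as /bin/ls and passing them to parse_elf_header should give values that can be cross-checked against the output of readelf -h.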

Using readelf

The readelf command, distributed with binutils, is used to display information about ELF files.

readelf -h: Displays the ELF header information, including the ELF magic bytes (\x7fELF), the entry point, and the number of program and section headers.

    readelf -h /bin/ls

ELF Sections

An ELF binary is divided into multiple sections, each represented by an Elf32_Shdr or Elf64_Shdr structure.

sh_name: Index into the string table of section names.
sh_addr: Virtual address of where the section will be mapped.
sh_offset: Offset within the file.
sh_size: Size of the section in bytes.

readelf -S: Displays section headers, including name, type, address, offset, and size.

    readelf -S /bin/ls
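
The way sh_name resolves to a readable name is easier to see in code. The following sketch walks the section header table and looks each name up in the section name string table. It assumes a 64-bit little-endian file to stay short, and parse_section_headers is a hypothetical helper name.

```python
import struct


def parse_section_headers(data: bytes) -> list:
    """List the sections of a 64-bit little-endian ELF image (illustrative helper)."""
    if data[:6] != b"\x7fELF\x02\x01":
        raise ValueError("this sketch handles 64-bit little-endian ELF only")
    (e_shoff,) = struct.unpack_from("<Q", data, 0x28)
    e_shentsize, e_shnum, e_shstrndx = struct.unpack_from("<HHH", data, 0x3A)
    if e_shoff == 0 or e_shnum == 0:
        return []  # no section header table, as is common for packed files
    # Elf64_Shdr: sh_name, sh_type, sh_flags, sh_addr, sh_offset, sh_size,
    # sh_link, sh_info, sh_addralign, sh_entsize
    raw = [struct.unpack_from("<IIQQQQIIQQ", data, e_shoff + i * e_shentsize)
           for i in range(e_shnum)]
    # The entry at index e_shstrndx describes the section name string table.
    str_off, str_size = raw[e_shstrndx][4], raw[e_shstrndx][5]
    strtab = data[str_off:str_off + str_size]
    sections = []
    for sh_name, sh_type, _flags, sh_addr, sh_offset, sh_size, *_rest in raw:
        end = strtab.find(b"\0", sh_name)  # names are NUL-terminated
        if end == -1:
            end = len(strtab)
        sections.append({
            "name": strtab[sh_name:end].decode("ascii", "replace"),
            "type": sh_type, "addr": sh_addr, "offset": sh_offset, "size": sh_size,
        })
    return sections
```

The numeric type values correspond to the names readelf prints, for example 1 for PROGBITS, 3 for STRTAB, and 8 for NOBITS.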

Common ELF Sections

.text: Contains the executable code.
.data: Contains read/write data (variables).
.rodata: Contains read-only data.
.bss: Contains uninitialized variables, which are zero-filled at load time and occupy no space in the file.
.got: Contains the global offset table.

Common Section Types

PROGBITS: Holds data defined by the program, such as code (.text) or initialized data (.data), with contents stored in the file.
NOBITS: Sections without data in the file, but allocated in memory (e.g., .bss).
STRTAB: Holds a string table.
DYNAMIC: Holds dynamic linking information.
HASH: Contains a hash table of symbols.

Packed ELFs

Packed and obfuscated executables often lack section headers, making reverse engineering difficult. The upx tool can be used to pack executables.

Using UPX: The following commands copy ls, pack the copy, run it, and display its ELF header.

    cp /bin/ls ls_upx
    upx ls_upx
    ./ls_upx /etc
    readelf -h ls_upx
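
A quick triage check for packing can be sketched in Python. The snippet below reports two indicators: a missing section header table, and the presence of the "UPX!" marker that unmodified UPX output contains. Both are heuristics only, since the marker is easily removed and other tools also strip section headers. The function name packing_indicators is made up for this example.

```python
import struct


def packing_indicators(data: bytes) -> dict:
    """Report simple signs that an ELF file may be packed (heuristic sketch)."""
    if data[:4] != b"\x7fELF":
        raise ValueError("missing ELF magic bytes")
    is_64 = data[4] == 2
    endian = "<" if data[5] == 1 else ">"
    if is_64:
        (e_shoff,) = struct.unpack_from(endian + "Q", data, 0x28)
        (e_shnum,) = struct.unpack_from(endian + "H", data, 0x3C)
    else:
        (e_shoff,) = struct.unpack_from(endian + "I", data, 0x20)
        (e_shnum,) = struct.unpack_from(endian + "H", data, 0x30)
    return {
        # Packed files often have no section header table at all.
        "no_section_headers": e_shoff == 0 or e_shnum == 0,
        # Stock UPX leaves this marker in the file; modified packers may not.
        "upx_magic": b"UPX!" in data,
    }
```

Running it over both /bin/ls and the packed ls_upx copy from the commands above gives a side-by-side comparison of the two indicators.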

Program Headers

The program header table, which starts at file offset e_phoff, is an array of Elf32_Phdr or Elf64_Phdr structures. Each entry describes a segment, and the loader uses these entries to map the file's contents into memory at runtime.

p_type: Describes the segment type.
p_vaddr, p_offset: Virtual address at which the segment is mapped and its offset within the file.
p_filesz, p_memsz: Segment size on disk and in memory.

readelf -l: Displays the program headers.

    readelf -l /bin/ls
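
The same fields can be read directly from the file. This sketch parses the program header table of a 64-bit little-endian ELF image. The helper name parse_program_headers and the small PT_NAMES table are illustrative, and only the standard segment types are named.

```python
import struct

# Standard p_type values; anything else is shown as a hex number.
PT_NAMES = {0: "NULL", 1: "LOAD", 2: "DYNAMIC", 3: "INTERP", 4: "NOTE", 6: "PHDR", 7: "TLS"}


def parse_program_headers(data: bytes) -> list:
    """List the segments of a 64-bit little-endian ELF image (illustrative helper)."""
    if data[:6] != b"\x7fELF\x02\x01":
        raise ValueError("this sketch handles 64-bit little-endian ELF only")
    (e_phoff,) = struct.unpack_from("<Q", data, 0x20)
    e_phentsize, e_phnum = struct.unpack_from("<HH", data, 0x36)
    segments = []
    for i in range(e_phnum):
        # Elf64_Phdr: p_type, p_flags, p_offset, p_vaddr, p_paddr, p_filesz, p_memsz, p_align
        (p_type, p_flags, p_offset, p_vaddr, _paddr,
         p_filesz, p_memsz, _align) = struct.unpack_from(
            "<IIQQQQQQ", data, e_phoff + i * e_phentsize)
        segments.append({
            "type": PT_NAMES.get(p_type, hex(p_type)),
            "flags": p_flags, "offset": p_offset, "vaddr": p_vaddr,
            "filesz": p_filesz, "memsz": p_memsz,
        })
    return segments
```

A LOAD segment whose memsz is larger than its filesz has the difference zero-filled in memory, which is how space for .bss is provided.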

UPX Effects on Program Headers

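Packing with UPX typically strips the section headers and makes a dynamically linked binary appear statically linked, because the packed file no longer exposes the original dynamic linking information. Both changes show up in the readelf output: the ELF header reports no section headers, and the program headers no longer list the INTERP and DYNAMIC segments.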
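
One way to check this effect programmatically is to look for the INTERP and DYNAMIC segments, which a dynamically linked executable normally carries. The sketch below assumes a 64-bit little-endian file, and appears_statically_linked is a hypothetical helper name. A true result does not prove packing on its own, because genuinely static binaries look the same.

```python
import struct

PT_DYNAMIC, PT_INTERP = 2, 3  # p_type values from the ELF specification


def appears_statically_linked(data: bytes) -> bool:
    """Return True when no INTERP or DYNAMIC segment is present (heuristic sketch)."""
    if data[:6] != b"\x7fELF\x02\x01":
        raise ValueError("this sketch handles 64-bit little-endian ELF only")
    (e_phoff,) = struct.unpack_from("<Q", data, 0x20)
    e_phentsize, e_phnum = struct.unpack_from("<HH", data, 0x36)
    # p_type is the first 4-byte field of every program header entry.
    types = {struct.unpack_from("<I", data, e_phoff + i * e_phentsize)[0]
             for i in range(e_phnum)}
    return PT_INTERP not in types and PT_DYNAMIC not in types
```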

Shared Libraries

Shared libraries (.so files) can be dynamically loaded into applications. Attackers may inject shared libraries into processes for malicious purposes.

readelf -d /bin/bash | grep NEEDED: Lists required shared libraries.
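ldd /bin/bash: Lists the shared libraries the executable resolves at load time, along with their paths. On some systems ldd may execute the target program, so avoid running it on untrusted binaries.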
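
When checking many binaries, the NEEDED entries can be collected from a script. The following sketch parses the text that readelf -d prints, and the regular expression assumes the English output format (a line such as "(NEEDED) Shared library: [libc.so.6]"). The names parse_needed and needed_libraries are illustrative.

```python
import os
import re
import subprocess

# Matches lines such as: 0x0000000000000001 (NEEDED)  Shared library: [libc.so.6]
NEEDED_RE = re.compile(r"\(NEEDED\)\s+Shared library:\s+\[([^\]]+)\]")


def parse_needed(readelf_output: str) -> list:
    """Extract library names from the text printed by readelf -d."""
    return NEEDED_RE.findall(readelf_output)


def needed_libraries(path: str) -> list:
    """Run readelf -d on a file and return its NEEDED entries."""
    result = subprocess.run(
        ["readelf", "-d", path],
        capture_output=True, text=True, check=True,
        env={**os.environ, "LC_ALL": "C"},  # force untranslated output for the regex
    )
    return parse_needed(result.stdout)
```

Unlike ldd, this only reads the file's dynamic section, so it does not load or run the binary being examined.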

Linux Address Translation

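Virtual address translation on Linux differs from Windows and Mac OS X. The Linux kernel maps physical memory at a fixed virtual offset, so a kernel virtual address in that mapping can be converted to a physical address by subtracting a constant. This makes it easy to locate the kernel's initial page tables in a memory image and use them for further address translation.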
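
A small worked example may help here. One commonly cited reason the page tables are easy to find is that Linux maps the kernel at a fixed virtual offset, so the physical address of a kernel symbol can be computed by subtraction. The constants below are the usual x86 defaults and are assumptions for this sketch: they do not account for KASLR or a relocated kernel, and the function name is made up.

```python
# Assumed default offsets on x86 kernels without KASLR or relocation.
PAGE_OFFSET_X86_32 = 0xC0000000                # start of the 32-bit kernel direct mapping
START_KERNEL_MAP_X86_64 = 0xFFFFFFFF80000000   # base of the 64-bit kernel image mapping


def kernel_virt_to_phys(vaddr: int, shift: int) -> int:
    """Convert a kernel virtual address to a physical address by subtracting a fixed shift."""
    if vaddr < shift:
        raise ValueError("address is below the kernel mapping")
    return vaddr - shift


# A 32-bit kernel symbol at 0xC1000000 would sit at physical address 0x01000000.
assert kernel_virt_to_phys(0xC1000000, PAGE_OFFSET_X86_32) == 0x01000000
# A 64-bit kernel symbol at 0xFFFFFFFF81000000 maps to the same physical address.
assert kernel_virt_to_phys(0xFFFFFFFF81000000, START_KERNEL_MAP_X86_64) == 0x01000000
```

Applied to the address of the kernel's top-level page table symbol, the same subtraction gives a physical starting point for walking the page tables in a memory image.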

Processes in Memory

Each Linux process is represented by a task_struct structure, containing information such as file descriptors, memory maps, credentials, etc.

Important Members:
tasks: Reference to the list of active processes.
mm: Memory management data.
pid: Process ID.
parent: Reference to the parent process.
children: List of spawned processes.
cred: Credential information.
comm: Process name.
start_time: Process creation time.

Enumerating Processes

The kernel allocates task_struct structures from a dedicated kmem_cache, which is backed by either the SLAB or the SLUB allocator depending on how the kernel was built.

Active Process List

The kernel maintains a circular, doubly linked list of active processes anchored at the init_task variable, which is the task_struct of the swapper process (PID 0). Each process is linked into the list through its tasks member.
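
The list walk can be sketched against a raw memory buffer. In the kernel, the tasks member is a list_head embedded inside each task_struct, so each next pointer leads to the tasks member of the following process, and the structure base is recovered by subtracting the member's offset. The offsets below are hypothetical values chosen for illustration: real offsets depend on the kernel build and normally come from a profile. The buffer is also assumed to be indexed directly by address.

```python
import struct

# Hypothetical member offsets inside task_struct, for illustration only.
TASKS_OFFSET = 0x10   # tasks (struct list_head: next pointer, then prev pointer)
PID_OFFSET = 0x20     # pid (32-bit integer)
COMM_OFFSET = 0x24    # comm (NUL-padded process name)
COMM_LEN = 16         # TASK_COMM_LEN


def read_task(mem, task_addr: int) -> tuple:
    """Read (pid, comm) from a task_struct located at task_addr."""
    (pid,) = struct.unpack_from("<i", mem, task_addr + PID_OFFSET)
    raw = bytes(mem[task_addr + COMM_OFFSET:task_addr + COMM_OFFSET + COMM_LEN])
    return pid, raw.split(b"\0", 1)[0].decode("ascii", "replace")


def walk_task_list(mem, init_task_addr: int) -> list:
    """Follow tasks.next from init_task until the list wraps around."""
    node = init_task_addr + TASKS_OFFSET  # address of init_task.tasks
    seen = set()                          # also guards against corrupted loops
    found = []
    while node not in seen:
        seen.add(node)
        # Subtracting the member offset recovers the enclosing task_struct.
        found.append(read_task(mem, node - TASKS_OFFSET))
        (node,) = struct.unpack_from("<Q", mem, node)  # tasks.next
    return found
```

A process that has been unlinked from this list will not appear in such a walk, which is why list-based enumeration is usually compared against other sources during analysis.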

Volatility Framework: The linux_pslist plugin enumerates processes from this list.

    python vol.py --profile=LinuxDebian-3_2x64 -f debian.lime linux_pslist