Joe1sn's Cabin

[AV Evasion] PE File Format Parsing

WeChat official account: https://mp.weixin.qq.com/s/RrEZJDeKXdVqtjZPDs7AyQ



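Suppose we want to write our own loader for exe files, or you have ever wondered how disassemblers work. Both require understanding the PE (Portable Executable) file format that exe files use. This article uses the notepad.exe shipped with Windows 10 as the running example.

Microsoft's official documentation of the PE format is available at https://learn.microsoft.com/zh-cn/windows/win32/debug/pe-format for readers who want the full details.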

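How to tell whether a file is a PE file: the DOS header

A PE file begins with its file header, also called the DOS header. The structure is defined as follows: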

typedef struct _IMAGE_DOS_HEADER {      // DOS .EXE header
    WORD e_magic;   // Magic number
    //....
    LONG e_lfanew;  // File address of new exe header
} IMAGE_DOS_HEADER, *PIMAGE_DOS_HEADER;
  • e_magic: the magic number. If it equals MZ (0x5A4D), the DOS header is valid.
  • e_lfanew: the location of the next header, i.e. the NT headers, computed as $NT\_HEADER = FileStart + e\_lfanew$


const std::string filePath = "notepad.exe";
std::ifstream inputFile(filePath, std::ios::in | std::ios::binary);
if (!inputFile.is_open()) {
    std::cerr << "cannot open: " << filePath << std::endl;
    return 1;
}

// Read the whole file into memory.
std::ostringstream peFileString;
peFileString << inputFile.rdbuf();
std::string fileContent = peFileString.str();
inputFile.close();

// Make sure the buffer can hold a DOS header before casting.
if (fileContent.size() < sizeof(IMAGE_DOS_HEADER)) {
    std::cerr << "file too small to be a PE file\n";
    return 1;
}

IMAGE_DOS_HEADER* dosHeader =
    reinterpret_cast<PIMAGE_DOS_HEADER>(&fileContent[0]);
std::cout << std::hex;
std::cout << "PE e_magic : 0x" << dosHeader->e_magic << "\n";
std::cout << "PE e_lfanew : 0x" << dosHeader->e_lfanew << "\n";
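
The same check can be done without casting the buffer to a Windows structure. The following is only a minimal sketch, not the article's code: readAt and findNtHeaders are hypothetical helper names, the offset 0x3C (e_lfanew) and the constants 0x5A4D ("MZ") and 0x00004550 ("PE\0\0") come from the PE layout, and a little-endian host is assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// Copies a T out of the buffer at `offset`; returns false if the read
// would run past the end. Assumes a little-endian host (x86/x64).
template <typename T>
bool readAt(const std::string& buf, std::size_t offset, T& out) {
    if (offset > buf.size() || buf.size() - offset < sizeof(T)) return false;
    std::memcpy(&out, buf.data() + offset, sizeof(T));
    return true;
}

// Returns the file offset of the NT headers, or -1 if the buffer does
// not look like a PE file.
std::int64_t findNtHeaders(const std::string& buf) {
    std::uint16_t magic = 0;
    std::uint32_t lfanew = 0, signature = 0;
    if (!readAt(buf, 0, magic) || magic != 0x5A4D) return -1;       // "MZ"
    if (!readAt(buf, 0x3C, lfanew)) return -1;                       // e_lfanew
    if (!readAt(buf, lfanew, signature) || signature != 0x00004550)  // "PE\0\0"
        return -1;
    return lfanew;
}
```

Reading through memcpy with a bounds check avoids walking off the end of a truncated or malformed file, which the pointer-cast version cannot detect.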

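How to determine the program's bitness, locate the code section, and more

As described above, e_lfanew leads us to the NT headers. For now, assume the PE file is a 64-bit program.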

typedef struct _IMAGE_NT_HEADERS64 {
    DWORD Signature;                        // Signature marking this as the NT headers
    IMAGE_FILE_HEADER FileHeader;           // File header (important)
    IMAGE_OPTIONAL_HEADER64 OptionalHeader; // Optional header
} IMAGE_NT_HEADERS64, *PIMAGE_NT_HEADERS64;

Most of the work here is done through FileHeader:

typedef struct _IMAGE_FILE_HEADER {
    WORD Machine;
    WORD NumberOfSections;
    DWORD TimeDateStamp;
    DWORD PointerToSymbolTable;
    DWORD NumberOfSymbols;
    WORD SizeOfOptionalHeader;
    WORD Characteristics;
} IMAGE_FILE_HEADER, *PIMAGE_FILE_HEADER;

Determining the bitness: the NT headers

First, IMAGE_FILE_HEADER.Machine makes this easy to determine:

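IMAGE_NT_HEADERS64* ntTempHeader =
    reinterpret_cast<PIMAGE_NT_HEADERS64>(&fileContent[dosHeader->e_lfanew]);
SetConsoleOutputCP(CP_UTF8);
std::cout << "Bitness : ";
if (ntTempHeader->FileHeader.Machine == IMAGE_FILE_MACHINE_AMD64)      // 0x8664
    std::cout << "64-bit\n";
else if (ntTempHeader->FileHeader.Machine == IMAGE_FILE_MACHINE_I386)  // 0x014c
    std::cout << "32-bit\n";
else
    std::cout << "other (machine type 0x" << ntTempHeader->FileHeader.Machine << ")\n";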


Descriptions of the other machine types are at: https://learn.microsoft.com/zh-cn/windows/win32/debug/pe-format#machine-types
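
As a small illustration of the Machine check, the sketch below maps a few machine values to readable names. machineName and optionalHeaderKind are hypothetical helpers. The second one relies on the Magic field at the start of the optional header (0x10b for PE32, 0x20b for PE32+), which this article does not use but which gives an independent way to tell the two header layouts apart.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Maps IMAGE_FILE_HEADER.Machine to a readable name (a few common values only).
std::string machineName(std::uint16_t machine) {
    switch (machine) {
        case 0x014c: return "x86 (32-bit)";          // IMAGE_FILE_MACHINE_I386
        case 0x8664: return "x64 (64-bit)";          // IMAGE_FILE_MACHINE_AMD64
        case 0xAA64: return "ARM64 (64-bit)";        // IMAGE_FILE_MACHINE_ARM64
        case 0x01c4: return "ARM Thumb-2 (32-bit)";  // IMAGE_FILE_MACHINE_ARMNT
        default:     return "unknown";
    }
}

// Maps the optional header's Magic field to the header layout it selects.
std::string optionalHeaderKind(std::uint16_t magic) {
    if (magic == 0x10b) return "PE32";   // 32-bit optional header
    if (magic == 0x20b) return "PE32+";  // 64-bit optional header
    return "unknown";
}
```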

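Parsing the code section: the section headers

Since we already know this is a 64-bit program, the rest of the walkthrough continues with the 64-bit structures.

  1. Use IMAGE_FILE_HEADER.NumberOfSections to get the number of sections.
  2. Locate each section header with the formula $section[i] = NT\_HEADER + ntHeadSize + i \times sectionHeadSize$. The code uses sizeof(IMAGE_NT_HEADERS64) for ntHeadSize, which holds as long as FileHeader.SizeOfOptionalHeader equals sizeof(IMAGE_OPTIONAL_HEADER64), as is normally the case for 64-bit images.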
std::cout << "Number of sections: " << ntTempHeader->FileHeader.NumberOfSections << "\n";
int nt_head_file_start = dosHeader->e_lfanew;
for (size_t i = 0; i < ntTempHeader->FileHeader.NumberOfSections; i++)
{
    // File offset of the i-th IMAGE_SECTION_HEADER
    std::cout << " Section[" << i << "] address: 0x"
              << nt_head_file_start
                 + sizeof(IMAGE_NT_HEADERS64)
                 + i * sizeof(IMAGE_SECTION_HEADER)
              << "\n";
}


Parsing each IMAGE_SECTION_HEADER then yields the information about its section.

IMAGE_SECTION_HEADER is defined as follows:
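
The section-header formula can be written so that it does not depend on the optional header having its usual size. This is a sketch under stated assumptions: sectionHeaderOffset is a hypothetical helper, the sizes 4, 20, and 40 are the on-disk sizes of the signature, IMAGE_FILE_HEADER, and IMAGE_SECTION_HEADER, and the optional header size is taken from FileHeader.SizeOfOptionalHeader instead of sizeof.

```cpp
#include <cassert>
#include <cstdint>

// File offset of section header `index`, given the file offset of the NT
// headers (e_lfanew) and FileHeader.SizeOfOptionalHeader.
std::uint64_t sectionHeaderOffset(std::uint32_t ntHeadersOffset,
                                  std::uint16_t sizeOfOptionalHeader,
                                  std::uint16_t index) {
    const std::uint64_t kSignatureSize = 4;       // "PE\0\0"
    const std::uint64_t kFileHeaderSize = 20;     // sizeof(IMAGE_FILE_HEADER)
    const std::uint64_t kSectionHeaderSize = 40;  // sizeof(IMAGE_SECTION_HEADER)
    return ntHeadersOffset + kSignatureSize + kFileHeaderSize
         + sizeOfOptionalHeader
         + static_cast<std::uint64_t>(index) * kSectionHeaderSize;
}
```

With the made-up values e_lfanew = 0xF8 and SizeOfOptionalHeader = 0xF0 (240 bytes, the size of IMAGE_OPTIONAL_HEADER64), the first section header lands at 0xF8 + 4 + 20 + 240 = 0x200, which is the same result as adding sizeof(IMAGE_NT_HEADERS64) = 264.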

typedef struct _IMAGE_SECTION_HEADER {
    BYTE Name[IMAGE_SIZEOF_SHORT_NAME];
    union {
        DWORD PhysicalAddress;
        DWORD VirtualSize;
    } Misc;
    DWORD VirtualAddress;
    DWORD SizeOfRawData;
    DWORD PointerToRawData;
    DWORD PointerToRelocations;
    DWORD PointerToLinenumbers;
    WORD NumberOfRelocations;
    WORD NumberOfLinenumbers;
    DWORD Characteristics;
} IMAGE_SECTION_HEADER, *PIMAGE_SECTION_HEADER;

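Here we reach the first tricky point: VirtualAddress, abbreviated VA in this article (in standard PE terminology this field is an RVA, a relative virtual address). We explore it using the .text section, which is the first section.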

int nt_head_file_start = dosHeader->e_lfanew;
for (size_t i = 0; i < ntTempHeader->FileHeader.NumberOfSections; i++)
{
    DWORD64 sectionFileAddr =
        nt_head_file_start
        + sizeof(IMAGE_NT_HEADERS64)
        + i * sizeof(IMAGE_SECTION_HEADER);

    IMAGE_SECTION_HEADER* sectionHeader =
        reinterpret_cast<PIMAGE_SECTION_HEADER>(&fileContent[sectionFileAddr]);
    // Name is a fixed 8-byte field with no terminating NUL when the name
    // uses all 8 bytes, so bound the length explicitly (strnlen: <cstring>).
    const char* rawName = reinterpret_cast<const char*>(sectionHeader->Name);
    std::string sectionName(rawName, strnlen(rawName, IMAGE_SIZEOF_SHORT_NAME));
    std::cout << " Section[" << i << "] address: 0x" << sectionFileAddr << "\n";
    std::cout << " Section[" << i << "] name : " << sectionName << "\n";
    std::cout << " Section[" << i << "] VA : 0x" << sectionHeader->VirtualAddress << "\n";
}


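In CFF Explorer, the low-order part of each section's address matches the value we parsed. This shows that VA is the section's offset from the image base once the program has been loaded (and possibly rebased). Microsoft's documentation describes the field as follows:

The address of the first byte of the section when loaded into memory, relative to the image base. For object files, this is the address of the first byte before relocation is applied.

Similarly, the VirtualSize field gives the size of the section, from which the end of the section follows: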

Misc.VirtualSize

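The total size of the section when loaded into memory, in bytes. If this value is greater than the SizeOfRawData member, the section is filled with zeroes. This field is valid only for executable images and should be set to 0 for object files.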
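
The two quoted descriptions translate into very little arithmetic. The sketch below is illustrative only: SectionInfo, sectionEndRva, and zeroPaddedBytes are hypothetical names, and the struct mirrors just three fields of IMAGE_SECTION_HEADER.

```cpp
#include <cassert>
#include <cstdint>

// The three IMAGE_SECTION_HEADER fields needed for this calculation.
struct SectionInfo {
    std::uint32_t virtualAddress;  // offset of the first byte from the image base
    std::uint32_t virtualSize;     // size in memory
    std::uint32_t sizeOfRawData;   // size on disk
};

// Offset (from the image base) one past the last byte of the section in memory.
std::uint64_t sectionEndRva(const SectionInfo& s) {
    return static_cast<std::uint64_t>(s.virtualAddress) + s.virtualSize;
}

// Number of bytes the loader fills with zeroes because the in-memory size
// exceeds the data stored in the file.
std::uint32_t zeroPaddedBytes(const SectionInfo& s) {
    return s.virtualSize > s.sizeOfRawData ? s.virtualSize - s.sizeOfRawData : 0;
}
```

For a made-up data section with VirtualAddress 0x3000, VirtualSize 0x2800, and SizeOfRawData 0x1000, the section ends at 0x5800 in memory and its last 0x1800 bytes are zero-filled by the loader.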

The PointerToRawData field of IMAGE_SECTION_HEADER gives the location of the section's data within the file:

$FileAddr = section.PointerToRawData$, that is, the offset is counted directly from the very beginning of the file.

std::cout << "Number of sections: " << ntTempHeader->FileHeader.NumberOfSections << "\n";
int nt_head_file_start = dosHeader->e_lfanew;
for (size_t i = 0; i < ntTempHeader->FileHeader.NumberOfSections; i++)
{
    DWORD64 sectionFileAddr = nt_head_file_start + sizeof(IMAGE_NT_HEADERS64) + i * sizeof(IMAGE_SECTION_HEADER);
    IMAGE_SECTION_HEADER* sectionHeader = reinterpret_cast<PIMAGE_SECTION_HEADER>(&fileContent[sectionFileAddr]);
    // Name is 8 bytes and may lack a terminating NUL, so bound its length.
    const char* rawName = reinterpret_cast<const char*>(sectionHeader->Name);
    std::string sectionName(rawName, strnlen(rawName, IMAGE_SIZEOF_SHORT_NAME));
    std::cout << " Section[" << i << "] address: 0x" << sectionFileAddr << "\n";
    std::cout << " name : " << sectionName << "\n";
    std::cout << " VA : 0x" << sectionHeader->VirtualAddress << "\n";
    std::cout << " ptr2RawData : 0x" << sectionHeader->PointerToRawData << "\n";
}
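
Once PointerToRawData is known, pulling a section's bytes out of the file buffer is a bounded substring. This is a minimal sketch: sectionRawBytes is a hypothetical helper, and pairing the pointer with SizeOfRawData as the length is an assumption of this example, since the article only prints the pointer.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Returns the on-disk bytes of a section, or an empty string if the range
// [pointerToRawData, pointerToRawData + sizeOfRawData) is not inside the file.
std::string sectionRawBytes(const std::string& file,
                            std::uint32_t pointerToRawData,
                            std::uint32_t sizeOfRawData) {
    if (pointerToRawData > file.size() ||
        file.size() - pointerToRawData < sizeOfRawData) {
        return std::string();
    }
    return file.substr(pointerToRawData, sizeOfRawData);
}
```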


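How to find imported functions and DLLs: the import table

Programs call functions from other DLL files, for example kernel32.dll or a C runtime library DLL. A PE file indexes them through an import directory (loosely called the IAT in this article).

When writing a program we often need to embed resources in it, and the section walk above already showed that a file contains many sections. Each section stores a different kind of data, for example:

  • Code section: contains the executable program code.
  • Data section: contains initialized data.
  • Import table: specifies external symbols, telling the operating system which external functions are needed.
  • Export table: lists the functions the program exports for other programs to call.
  • Resource table: contains the resources the program uses, such as icons, dialogs, and strings.

Go back to the OptionalHeader part of the NT headers introduced at the start: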

typedef struct _IMAGE_OPTIONAL_HEADER64 {
    //...
    DWORD AddressOfEntryPoint;
    //...
    IMAGE_DATA_DIRECTORY DataDirectory[IMAGE_NUMBEROF_DIRECTORY_ENTRIES];
} IMAGE_OPTIONAL_HEADER64, *PIMAGE_OPTIONAL_HEADER64;
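  • AddressOfEntryPoint: a pointer to the entry point function, relative to the image base
  • DataDirectory[IMAGE_NUMBEROF_DIRECTORY_ENTRIES]: a pointer to the first IMAGE_DATA_DIRECTORY structure in the data directory

DataDirectory stores the location and size of each of these regions, so it acts as a "directory" or "index" that points to each piece of data. The array index for each kind of entry is documented at: https://learn.microsoft.com/zh-cn/windows/win32/api/winnt/ns-winnt-image_optional_header32

The IMAGE_DATA_DIRECTORY structure itself is: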

typedef struct _IMAGE_DATA_DIRECTORY {
    DWORD VirtualAddress;
    DWORD Size;
} IMAGE_DATA_DIRECTORY, *PIMAGE_DATA_DIRECTORY;

Let us look at a few of them:

std::cout << "---OptionalHeader.DataDirectories\n";
std::cout << "Base relocation table : 0x" << ntTempHeader->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_BASERELOC].VirtualAddress << "\n";
std::cout << "Global pointer RVA : 0x" << ntTempHeader->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_GLOBALPTR].VirtualAddress << "\n";
std::cout << "Import address table : 0x" << ntTempHeader->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_IAT].VirtualAddress << "\n";
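
To see where these entries sit in the raw file, the sketch below computes the file offset of a data directory entry by hand. It is an illustration under stated assumptions: dataDirectoryOffset and the DirectoryIndex enum are hypothetical names, the index values match the IMAGE_DIRECTORY_ENTRY_* constants, and the DataDirectory array is taken to start 112 bytes into a PE32+ optional header and 96 bytes into a PE32 one, with 8 bytes per entry.

```cpp
#include <cassert>
#include <cstdint>

// A few of the IMAGE_DIRECTORY_ENTRY_* index values.
enum DirectoryIndex : unsigned {
    kExport = 0,
    kImport = 1,
    kResource = 2,
    kBaseReloc = 5,
    kGlobalPtr = 8,
    kIat = 12
};

// File offset of DataDirectory[index], given the file offset of the NT headers.
std::uint64_t dataDirectoryOffset(std::uint32_t ntHeadersOffset,
                                  bool isPe32Plus,
                                  unsigned index) {
    // Signature (4 bytes) + IMAGE_FILE_HEADER (20 bytes) precede the optional header.
    const std::uint64_t optionalHeaderOffset = ntHeadersOffset + 4ull + 20ull;
    // Offset of the DataDirectory array inside the optional header.
    const std::uint64_t directoriesStart = isPe32Plus ? 112 : 96;
    // Each IMAGE_DATA_DIRECTORY is two DWORDs: VirtualAddress and Size.
    return optionalHeaderOffset + directoriesStart
         + static_cast<std::uint64_t>(index) * 8;
}
```

With a made-up e_lfanew of 0xF8 in a 64-bit image, the import directory entry would be at 0xF8 + 24 + 112 + 8 = 0x188.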


**Note:** IMAGE_DIRECTORY_ENTRY_IAT and IMAGE_DIRECTORY_ENTRY_IMPORT are two different things.


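The rest of this article uses IMAGE_DIRECTORY_ENTRY_IMPORT.

These values still appear to be virtual addresses in memory, so for now we call such a value a DVA (data directory virtual address).

So how do we get from IMAGE_OPTIONAL_HEADER64 to the IAT? The import information must be stored somewhere in the file, so computing an offset is unavoidable. This article calls that offset the RVA. (In standard PE terminology, RVA means an address relative to the image base, which is what this article calls VA and DVA; the value computed here is really a file offset.)

Combine this with the PointerToRawData field of the IMAGE_SECTION_HEADER that contains the DVA:

$RVA = DVA - VA + PtrToRawData$

Take the import table as an example: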

DWORD64 iatDVA = ntTempHeader->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_IMPORT].VirtualAddress;

DWORD64 iatVA = 0;
DWORD64 iatRVA = 0;
DWORD64 sectionFileAddr;
IMAGE_SECTION_HEADER* sectionHeader;
for (size_t i = 0; i < ntTempHeader->FileHeader.NumberOfSections; i++)
{
    sectionFileAddr = nt_head_file_start + sizeof(IMAGE_NT_HEADERS64) + i * sizeof(IMAGE_SECTION_HEADER);
    sectionHeader = reinterpret_cast<PIMAGE_SECTION_HEADER>(&fileContent[sectionFileAddr]);

    // Section headers are sorted by VirtualAddress, so the last section whose
    // VirtualAddress does not exceed iatDVA is the one that contains it.
    if (sectionHeader->VirtualAddress > iatDVA)
        break;
    iatVA = sectionHeader->VirtualAddress;
    iatRVA = iatDVA - iatVA + sectionHeader->PointerToRawData;
}
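
The conversion can be tried out on made-up numbers without a real PE file. The sketch below is not the article's implementation: Section and toFileOffset are hypothetical names, and unlike the loop above it also rejects addresses that fall outside every section's raw data instead of returning the last candidate.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// The IMAGE_SECTION_HEADER fields needed for the conversion.
struct Section {
    std::uint32_t virtualAddress;    // "VA" in this article
    std::uint32_t virtualSize;
    std::uint32_t pointerToRawData;
    std::uint32_t sizeOfRawData;
};

// Applies offset = DVA - VA + PtrToRawData using the section that contains
// `dva`. Returns -1 when no section's on-disk data covers the address.
std::int64_t toFileOffset(const std::vector<Section>& sections, std::uint32_t dva) {
    for (const Section& s : sections) {
        // Only bytes backed by raw data exist in the file.
        if (dva >= s.virtualAddress && dva - s.virtualAddress < s.sizeOfRawData) {
            return static_cast<std::int64_t>(dva - s.virtualAddress) + s.pointerToRawData;
        }
    }
    return -1;
}
```

For example, with a section at VA 0x3000 whose raw data starts at file offset 0x2400, the address 0x3010 maps to 0x3010 - 0x3000 + 0x2400 = 0x2410.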

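How is the RVA used?

The RVA is the base of the import descriptors, counted from the start of the file; every IMAGE_IMPORT_DESCRIPTOR structure is found from here. No field states how many descriptors there are, so we have to rely on the fields of IMAGE_IMPORT_DESCRIPTOR itself: the array ends with a zeroed descriptor, so the natural approach is to stop when Name is 0.

The IMAGE_IMPORT_DESCRIPTOR structure is: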

typedef struct _IMAGE_IMPORT_DESCRIPTOR {
    union {
        DWORD Characteristics;    // 0 for terminating null import descriptor
        DWORD OriginalFirstThunk; // RVA to original unbound IAT (PIMAGE_THUNK_DATA)
    } DUMMYUNIONNAME;
    DWORD TimeDateStamp;
    DWORD ForwarderChain;         // -1 if no forwarders
    DWORD Name;
    DWORD FirstThunk;             // RVA to IAT (if bound this IAT has actual addresses)
} IMAGE_IMPORT_DESCRIPTOR;
typedef IMAGE_IMPORT_DESCRIPTOR UNALIGNED *PIMAGE_IMPORT_DESCRIPTOR;

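Next we want to print the DLL name that each IMAGE_IMPORT_DESCRIPTOR refers to. Its file offset follows from the formula $NameOffset = Name - VA + PtrToRawData$. **This formula is important.** Then simply print the data at offset NameOffset in the file.

First, rewrite the VA-to-RVA conversion as a function: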

// Converts an address relative to the image base (VA/DVA in this article)
// into an offset from the start of the file (RVA in this article).
DWORD64 dwVAToRVA(DWORD64 StartVA, DWORD64 nt_head_file_start, std::string& fileContent, PIMAGE_NT_HEADERS64 ntTempHeader) {
    DWORD64 RVA = 0;
    for (size_t i = 0; i < ntTempHeader->FileHeader.NumberOfSections; i++)
    {
        DWORD64 sectionFileAddr = nt_head_file_start + sizeof(IMAGE_NT_HEADERS64) + i * sizeof(IMAGE_SECTION_HEADER);
        PIMAGE_SECTION_HEADER sectionHeader = reinterpret_cast<PIMAGE_SECTION_HEADER>(&fileContent[sectionFileAddr]);

        // Section headers are sorted by VirtualAddress: keep the last
        // section that starts at or before StartVA.
        if (sectionHeader->VirtualAddress > StartVA)
            break;
        RVA = StartVA - sectionHeader->VirtualAddress + sectionHeader->PointerToRawData;
    }
    return RVA;
}

Then print from the main function:

// Walk the descriptor array until the terminating entry (Name == 0).
PIMAGE_IMPORT_DESCRIPTOR temp = PIMAGE_IMPORT_DESCRIPTOR(&fileContent[0] + iatRVA);
for (; temp->Name != 0; temp++)
{
    // File offset of the NUL-terminated DLL name
    DWORD64 nameOffset = dwVAToRVA(temp->Name, nt_head_file_start, fileContent, ntTempHeader);
    std::cout << " DLL: " << &fileContent[nameOffset] << "\n";
}
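
The termination rule can be exercised on a hand-built buffer. This is a sketch under stated assumptions: importNameValues is a hypothetical helper, each IMAGE_IMPORT_DESCRIPTOR is taken to be 20 bytes with Name at byte offset 12 (five DWORD fields, Name being the fourth), and a little-endian host is assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Collects the Name field of every import descriptor starting at
// `firstDescriptorOffset`, stopping at the first descriptor whose Name is 0
// or when the buffer runs out.
std::vector<std::uint32_t> importNameValues(const std::string& file,
                                            std::size_t firstDescriptorOffset) {
    const std::size_t kDescriptorSize = 20;   // sizeof(IMAGE_IMPORT_DESCRIPTOR)
    const std::size_t kNameFieldOffset = 12;  // offset of Name inside it
    std::vector<std::uint32_t> names;
    for (std::size_t off = firstDescriptorOffset;
         off <= file.size() && file.size() - off >= kDescriptorSize;
         off += kDescriptorSize) {
        std::uint32_t name = 0;
        std::memcpy(&name, file.data() + off + kNameFieldOffset, sizeof name);
        if (name == 0) break;  // terminating descriptor
        names.push_back(name);
    }
    return names;
}
```

Each collected value still has to go through the VA-to-file-offset conversion before the DLL name string can be read from the file.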


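In the same way, FirstThunk in IMAGE_IMPORT_DESCRIPTOR can be used to print the imported function names: convert FirstThunk to an RVA (file offset), read the zero-terminated array of 64-bit thunk entries stored there, convert each entry the same way, and read the name string at the resulting position in the file. Entries whose highest bit (IMAGE_ORDINAL_FLAG64) is set import by ordinal and carry no name, so they have to be handled separately.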

PIMAGE_IMPORT_DESCRIPTOR temp = PIMAGE_IMPORT_DESCRIPTOR(&fileContent[0] + iatRVA);
for (; temp->Name != 0; temp++)
{
    DWORD64 nameOffset = dwVAToRVA(temp->Name, nt_head_file_start, fileContent, ntTempHeader);
    std::cout << " DLL: " << &fileContent[nameOffset] << "\n";

    // File offset of this DLL's thunk array (one 64-bit entry per import, 0-terminated)
    DWORD64 thunkOffset = dwVAToRVA(temp->FirstThunk, nt_head_file_start, fileContent, ntTempHeader);
    for (PDWORD64 thunk = PDWORD64(&fileContent[0] + thunkOffset); *thunk != 0; thunk++)
    {
        if (*thunk & IMAGE_ORDINAL_FLAG64) {
            // Imported by ordinal: there is no name to print
            std::cout << " -Ordinal 0x" << (*thunk & 0xFFFF) << std::endl;
            continue;
        }
        PIMAGE_IMPORT_BY_NAME IatName = PIMAGE_IMPORT_BY_NAME(&fileContent[0] + dwVAToRVA(*thunk, nt_head_file_start, fileContent, ntTempHeader));
        std::cout << " -Function " << IatName->Name << std::endl;
    }
}
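
How a single 64-bit thunk entry is interpreted can be shown in isolation. The sketch below is illustrative: ThunkInfo and decodeThunk64 are hypothetical names, and it assumes the PE32+ layout in which bit 63 is the ordinal flag, bits 15-0 hold the ordinal, and bits 30-0 hold the address of the hint/name entry (IMAGE_IMPORT_BY_NAME) relative to the image base.

```cpp
#include <cassert>
#include <cstdint>

// Decoded form of one 64-bit import thunk.
struct ThunkInfo {
    bool byOrdinal;           // true: imported by ordinal, no name available
    std::uint16_t ordinal;    // valid when byOrdinal is true
    std::uint32_t hintName;   // address of IMAGE_IMPORT_BY_NAME, valid otherwise
};

ThunkInfo decodeThunk64(std::uint64_t thunk) {
    const std::uint64_t kOrdinalFlag64 = 0x8000000000000000ull;  // IMAGE_ORDINAL_FLAG64
    ThunkInfo info = {false, 0, 0};
    if (thunk & kOrdinalFlag64) {
        info.byOrdinal = true;
        info.ordinal = static_cast<std::uint16_t>(thunk & 0xFFFF);
    } else {
        info.hintName = static_cast<std::uint32_t>(thunk & 0x7FFFFFFF);
    }
    return info;
}
```

For a name import, the function name string starts two bytes after the hintName address, because IMAGE_IMPORT_BY_NAME begins with a WORD Hint field.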


The other kinds of IMAGE_DATA_DIRECTORY entries can be located in the static binary file with the same approach.

Finally

Complete code:

#include <iostream>
#include <Windows.h>
#include <cstring>
#include <fstream>
#include <sstream>
#include <string>

// Converts an address relative to the image base (VA/DVA in this article)
// into an offset from the start of the file (RVA in this article).
DWORD64 dwVAToRVA(DWORD64 StartVA, DWORD64 nt_head_file_start, std::string& fileContent, PIMAGE_NT_HEADERS64 ntTempHeader) {
    DWORD64 RVA = 0;
    for (size_t i = 0; i < ntTempHeader->FileHeader.NumberOfSections; i++)
    {
        DWORD64 sectionFileAddr = nt_head_file_start + sizeof(IMAGE_NT_HEADERS64) + i * sizeof(IMAGE_SECTION_HEADER);
        PIMAGE_SECTION_HEADER sectionHeader = reinterpret_cast<PIMAGE_SECTION_HEADER>(&fileContent[sectionFileAddr]);

        // Section headers are sorted by VirtualAddress: keep the last
        // section that starts at or before StartVA.
        if (sectionHeader->VirtualAddress > StartVA)
            break;
        RVA = StartVA - sectionHeader->VirtualAddress + sectionHeader->PointerToRawData;
    }
    return RVA;
}

int main() {
    const std::string filePath = "notepad.exe";
    std::ifstream inputFile(filePath, std::ios::in | std::ios::binary);
    if (!inputFile.is_open()) {
        std::cerr << "cannot open: " << filePath << std::endl;
        return 1;
    }

    // Read the whole file into memory.
    std::ostringstream peFileString;
    peFileString << inputFile.rdbuf();
    std::string fileContent = peFileString.str();
    inputFile.close();

    // Make sure the buffer can hold a DOS header before casting.
    if (fileContent.size() < sizeof(IMAGE_DOS_HEADER)) {
        std::cerr << "file too small to be a PE file\n";
        return 1;
    }

    IMAGE_DOS_HEADER* dosHeader = reinterpret_cast<PIMAGE_DOS_HEADER>(&fileContent[0]);
    std::cout << std::hex;
    std::cout << "PE e_magic : 0x" << dosHeader->e_magic << "\n";
    std::cout << "PE e_lfanew : 0x" << dosHeader->e_lfanew << "\n";

    IMAGE_NT_HEADERS64* ntTempHeader = reinterpret_cast<PIMAGE_NT_HEADERS64>(&fileContent[dosHeader->e_lfanew]);

    SetConsoleOutputCP(CP_UTF8);
    std::cout << "Bitness : ";
    if (ntTempHeader->FileHeader.Machine == IMAGE_FILE_MACHINE_AMD64)      // 0x8664
        std::cout << "64-bit\n";
    else if (ntTempHeader->FileHeader.Machine == IMAGE_FILE_MACHINE_I386)  // 0x014c
        std::cout << "32-bit\n";
    if (ntTempHeader->FileHeader.Machine != IMAGE_FILE_MACHINE_AMD64) {
        // Everything below uses the 64-bit structures.
        std::cerr << "only 64-bit PE files are handled here\n";
        return 1;
    }

    std::cout << "Number of sections: " << ntTempHeader->FileHeader.NumberOfSections << "\n";
    int nt_head_file_start = dosHeader->e_lfanew;
    for (size_t i = 0; i < ntTempHeader->FileHeader.NumberOfSections; i++)
    {
        DWORD64 sectionFileAddr = nt_head_file_start + sizeof(IMAGE_NT_HEADERS64) + i * sizeof(IMAGE_SECTION_HEADER);
        IMAGE_SECTION_HEADER* sectionHeader = reinterpret_cast<PIMAGE_SECTION_HEADER>(&fileContent[sectionFileAddr]);
        // Name is 8 bytes and may lack a terminating NUL, so bound its length.
        const char* rawName = reinterpret_cast<const char*>(sectionHeader->Name);
        std::string sectionName(rawName, strnlen(rawName, IMAGE_SIZEOF_SHORT_NAME));
        std::cout << " Section[" << i << "] address: 0x" << sectionFileAddr << "\n";
        std::cout << " name : " << sectionName << "\n";
        std::cout << " VA : 0x" << sectionHeader->VirtualAddress << "\n";
        std::cout << " ptr2RawData : 0x" << sectionHeader->PointerToRawData << "\n";
    }

    std::cout << "---OptionalHeader.DataDirectories\n";
    std::cout << "Base relocation table : 0x" << ntTempHeader->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_BASERELOC].VirtualAddress << "\n";
    std::cout << "Global pointer RVA : 0x" << ntTempHeader->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_GLOBALPTR].VirtualAddress << "\n";
    std::cout << "Import address table : 0x" << ntTempHeader->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_IAT].VirtualAddress << "\n";

    // Locate the import descriptors in the file.
    DWORD64 iatDVA = ntTempHeader->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_IMPORT].VirtualAddress;
    DWORD64 iatRVA = dwVAToRVA(iatDVA, nt_head_file_start, fileContent, ntTempHeader);

    // Walk the descriptor array until the terminating entry (Name == 0).
    PIMAGE_IMPORT_DESCRIPTOR temp = PIMAGE_IMPORT_DESCRIPTOR(&fileContent[0] + iatRVA);
    for (; temp->Name != 0; temp++)
    {
        DWORD64 nameOffset = dwVAToRVA(temp->Name, nt_head_file_start, fileContent, ntTempHeader);
        std::cout << " DLL: " << &fileContent[nameOffset] << "\n";

        // File offset of this DLL's thunk array (one 64-bit entry per import, 0-terminated)
        DWORD64 thunkOffset = dwVAToRVA(temp->FirstThunk, nt_head_file_start, fileContent, ntTempHeader);
        for (PDWORD64 thunk = PDWORD64(&fileContent[0] + thunkOffset); *thunk != 0; thunk++)
        {
            if (*thunk & IMAGE_ORDINAL_FLAG64) {
                // Imported by ordinal: there is no name to print
                std::cout << " -Ordinal 0x" << (*thunk & 0xFFFF) << std::endl;
                continue;
            }
            PIMAGE_IMPORT_BY_NAME IatName = PIMAGE_IMPORT_BY_NAME(&fileContent[0] + dwVAToRVA(*thunk, nt_head_file_start, fileContent, ntTempHeader));
            std::cout << " -Function " << IatName->Name << std::endl;
        }
    }

    return 0;
}