WeChat Official Account: https://mp.weixin.qq.com/s/RrEZJDeKXdVqtjZPDs7AyQ
Our official account may have more content you'll find interesting.
Parsing the PE File Format
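Suppose we want to write our own loader for .exe files, or you have wondered how disassemblers work. Both require understanding the PE (Portable Executable) file format that .exe files use. This article uses the notepad.exe that ships with Windows 10 as its example.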
Microsoft's official documentation of the PE format is at https://learn.microsoft.com/zh-cn/windows/win32/debug/pe-format for readers who want more detail.
How to recognize a PE file - the DOS header
A PE file begins with a header called the DOS header. Its structure is defined as follows (abridged):
```cpp
typedef struct _IMAGE_DOS_HEADER {      // DOS .EXE header
    WORD   e_magic;                     // Magic number
    // ....
    LONG   e_lfanew;                    // File address of new exe header
} IMAGE_DOS_HEADER, *PIMAGE_DOS_HEADER;
```
`e_magic`: the magic number. If it is `MZ` (0x5A4D), the DOS header is valid.
`e_lfanew`: the file offset of the next header, the NT header. Its position is computed as:
$$NT\_HEADER = FileStart + e\_lfanew$$
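```cpp
#include <iostream>
#include <Windows.h>
#include <fstream>
#include <sstream>
#include <string>

int main()
{
    const std::string filePath = "notepad.exe";
    std::ifstream inputFile(filePath, std::ios::in | std::ios::binary);
    if (!inputFile.is_open()) {
        std::cerr << "cant open: " << filePath << std::endl;
        return 1;
    }
    // Read the whole file into memory
    std::ostringstream peFileString;
    peFileString << inputFile.rdbuf();
    std::string fileContent = peFileString.str();
    inputFile.close();
    if (fileContent.size() < sizeof(IMAGE_DOS_HEADER)) {
        std::cerr << "file too small\n";
        return 1;
    }

    // The DOS header sits at the very start of the file
    IMAGE_DOS_HEADER* dosHeader = \
        (PIMAGE_DOS_HEADER)(DWORD64)(&fileContent[0]);
    std::cout << std::hex;
    std::cout << "PE e_magic  : 0x" << dosHeader->e_magic << "\n";   // 0x5a4d == "MZ"
    std::cout << "PE e_lfanew : 0x" << dosHeader->e_lfanew << "\n";
    if (dosHeader->e_magic != IMAGE_DOS_SIGNATURE)
        return 1;
    // ... the following snippets continue inside main()
}
```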
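The snippet above trusts the buffer as soon as it is read. A minimal portable sketch of the same checks, assuming only the C++17 standard library (it reads the bytes directly instead of using `Windows.h`; `DosInfo`, `readLE` and `parseDosHeader` are hypothetical names), also confirms that the `PE\0\0` signature really sits at `e_lfanew`:

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical result type: the two DOS-header fields this article uses.
struct DosInfo {
    uint16_t e_magic;   // should be 0x5A4D ("MZ")
    uint32_t e_lfanew;  // file offset of the NT headers
};

// Read an n-byte little-endian integer at `off` (caller checks bounds).
static uint32_t readLE(const std::vector<uint8_t>& buf, size_t off, size_t n) {
    uint32_t v = 0;
    for (size_t k = 0; k < n; ++k) v |= uint32_t(buf[off + k]) << (8 * k);
    return v;
}

// Returns the DOS-header fields, or std::nullopt if this is not a plausible PE file.
std::optional<DosInfo> parseDosHeader(const std::vector<uint8_t>& file) {
    const size_t kDosHeaderSize = 64;  // sizeof(IMAGE_DOS_HEADER)
    if (file.size() < kDosHeaderSize) return std::nullopt;
    // e_magic is at offset 0, e_lfanew at offset 0x3C
    DosInfo info{uint16_t(readLE(file, 0, 2)), readLE(file, 0x3C, 4)};
    if (info.e_magic != 0x5A4D) return std::nullopt;  // bytes 'M', 'Z'
    // The NT headers start with the 4-byte signature "PE\0\0"
    if (info.e_lfanew > file.size() - 4) return std::nullopt;
    if (readLE(file, info.e_lfanew, 4) != 0x00004550) return std::nullopt;
    return info;
}
```

Returning `std::nullopt` instead of a pointer keeps later steps from dereferencing an offset taken from a damaged file.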
How to determine bitness, locate the code section, and more
As described above, `e_lfanew` leads us to the `NT Header`. For now, assume the file is a 64-bit PE file:
```cpp
typedef struct _IMAGE_NT_HEADERS64 {
    DWORD Signature;
    IMAGE_FILE_HEADER FileHeader;
    IMAGE_OPTIONAL_HEADER64 OptionalHeader;
} IMAGE_NT_HEADERS64, *PIMAGE_NT_HEADERS64;
```
`Signature` holds the bytes `PE\0\0`; most of what follows is done through `FileHeader`:
```cpp
typedef struct _IMAGE_FILE_HEADER {
    WORD  Machine;
    WORD  NumberOfSections;
    DWORD TimeDateStamp;
    DWORD PointerToSymbolTable;
    DWORD NumberOfSymbols;
    WORD  SizeOfOptionalHeader;
    WORD  Characteristics;
} IMAGE_FILE_HEADER, *PIMAGE_FILE_HEADER;
```
Determining bitness - the NT header
First, `IMAGE_FILE_HEADER.Machine` makes this easy:
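```cpp
// FileHeader sits at the same offset in IMAGE_NT_HEADERS32 and IMAGE_NT_HEADERS64,
// so reading Machine through the 64-bit struct is safe either way
IMAGE_NT_HEADERS64* ntTempHeader = \
    (PIMAGE_NT_HEADERS64)(DWORD64)(&fileContent[dosHeader->e_lfanew]);
SetConsoleOutputCP(CP_UTF8);
std::cout << "Bitness : ";
if (ntTempHeader->FileHeader.Machine == 0x8664)        // IMAGE_FILE_MACHINE_AMD64
    std::cout << "64-bit\n";
else if (ntTempHeader->FileHeader.Machine == 0x014c)   // IMAGE_FILE_MACHINE_I386
    std::cout << "32-bit\n";
```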
Other machine types are listed at: https://learn.microsoft.com/zh-cn/windows/win32/debug/pe-format#machine-types
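`Machine` describes the target CPU, while the layout of the optional header (and therefore whether `IMAGE_NT_HEADERS32` or `IMAGE_NT_HEADERS64` applies) is announced by `OptionalHeader.Magic`: 0x10B for PE32, 0x20B for PE32+. A small sketch of both lookups; the helper names are hypothetical and the machine list deliberately covers only a few common values:

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// A few IMAGE_FILE_HEADER.Machine values (same numbers as the
// IMAGE_FILE_MACHINE_* constants in winnt.h); not an exhaustive list.
std::string machineName(uint16_t machine) {
    switch (machine) {
        case 0x014C: return "x86 (32-bit)";
        case 0x8664: return "x64 (64-bit)";
        case 0xAA64: return "ARM64 (64-bit)";
        case 0x01C4: return "ARM Thumb-2 (32-bit)";
        default:     return "unknown";
    }
}

// OptionalHeader.Magic selects the header layout:
// 0x10B = PE32 (IMAGE_NT_HEADERS32), 0x20B = PE32+ (IMAGE_NT_HEADERS64).
// Returns 32, 64, or 0 for an unrecognized value.
int optionalHeaderBits(uint16_t magic) {
    if (magic == 0x10B) return 32;
    if (magic == 0x20B) return 64;
    return 0;
}
```

Checking `Magic` before casting to `IMAGE_NT_HEADERS64` avoids misreading a PE32 file's optional header with the 64-bit layout.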
Parsing sections - the section headers
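Since we now know the program is 64-bit, we continue with the 64-bit structures.
`IMAGE_FILE_HEADER.NumberOfSections` gives the number of sections.
Each section header is located at:
$$section[i] = NT\_HEADER + ntHeadSize + i \times sectionHeadSize$$
where ntHeadSize is `sizeof(IMAGE_NT_HEADERS64)` and sectionHeadSize is `sizeof(IMAGE_SECTION_HEADER)`. Strictly speaking, the section table starts right after the optional header, whose real size is `FileHeader.SizeOfOptionalHeader`, so this formula holds when that field equals `sizeof(IMAGE_OPTIONAL_HEADER64)`.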
```cpp
std::cout << "Number of sections: " << ntTempHeader->FileHeader.NumberOfSections << "\n";
int nt_head_file_start = dosHeader->e_lfanew;
for (size_t i = 0; i < ntTempHeader->FileHeader.NumberOfSections; i++)
{
    // Section headers are laid out back to back right after the NT headers
    std::cout << "  Section[" << i << "] address: 0x"
              << nt_head_file_start
                 + sizeof(IMAGE_NT_HEADERS64)
                 + i * sizeof(IMAGE_SECTION_HEADER) << "\n";
}
```
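A hedged sketch of the more robust form of the formula, which uses `FileHeader.SizeOfOptionalHeader` instead of assuming the optional header is exactly `sizeof(IMAGE_OPTIONAL_HEADER64)` bytes (the same idea as the `IMAGE_FIRST_SECTION` macro in winnt.h). The fixed sizes come from the PE specification; `sectionHeaderOffset` is a hypothetical helper:

```cpp
#include <cassert>
#include <cstdint>

// Sizes fixed by the PE specification.
constexpr uint64_t kSignatureSize     = 4;   // "PE\0\0"
constexpr uint64_t kFileHeaderSize    = 20;  // sizeof(IMAGE_FILE_HEADER)
constexpr uint64_t kSectionHeaderSize = 40;  // sizeof(IMAGE_SECTION_HEADER)

// File offset of section header `i`: the section table follows the optional
// header, whose actual size is FileHeader.SizeOfOptionalHeader.
uint64_t sectionHeaderOffset(uint32_t e_lfanew, uint16_t sizeOfOptionalHeader, uint32_t i) {
    return uint64_t(e_lfanew) + kSignatureSize + kFileHeaderSize
         + sizeOfOptionalHeader + uint64_t(i) * kSectionHeaderSize;
}
```

With `SizeOfOptionalHeader` = 0xF0 (240 bytes, the usual PE32+ value with 16 data directories) this matches the formula above, since `sizeof(IMAGE_NT_HEADERS64)` is 4 + 20 + 240 = 264 bytes. For an illustrative `e_lfanew` of 0x80, the first section header would be at 0x188.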
Parsing each `IMAGE_SECTION_HEADER` then yields that section's details.
`IMAGE_SECTION_HEADER` is defined as:
```cpp
typedef struct _IMAGE_SECTION_HEADER {
    BYTE  Name[IMAGE_SIZEOF_SHORT_NAME];
    union {
        DWORD PhysicalAddress;
        DWORD VirtualSize;
    } Misc;
    DWORD VirtualAddress;
    DWORD SizeOfRawData;
    DWORD PointerToRawData;
    DWORD PointerToRelocations;
    DWORD PointerToLinenumbers;
    WORD  NumberOfRelocations;
    WORD  NumberOfLinenumbers;
    DWORD Characteristics;
} IMAGE_SECTION_HEADER, *PIMAGE_SECTION_HEADER;
```
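This brings us to the first tricky point: `VirtualAddress`, abbreviated here as VA (in Microsoft's terminology this field is an RVA, a relative virtual address). Let's explore it using the `.text` section, which is the first section here: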
```cpp
int nt_head_file_start = dosHeader->e_lfanew;
for (size_t i = 0; i < ntTempHeader->FileHeader.NumberOfSections; i++)
{
    DWORD64 sectionFileAddr = \
        nt_head_file_start \
        + sizeof(IMAGE_NT_HEADERS64) \
        + i * sizeof(IMAGE_SECTION_HEADER);
    IMAGE_SECTION_HEADER* sectionHeader = \
        (PIMAGE_SECTION_HEADER)(DWORD64)(&fileContent[sectionFileAddr]);
    std::cout << "  Section[" << i << "] address: 0x" << sectionFileAddr << "\n";
    // Name is not NUL-terminated when it is exactly 8 characters long
    std::cout << "  Section[" << i << "] name   : "
              << std::string((char*)sectionHeader->Name, IMAGE_SIZEOF_SHORT_NAME).c_str() << "\n";
    std::cout << "  Section[" << i << "] VA     : " << sectionHeader->VirtualAddress << "\n";
}
```
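In CFF Explorer, the low-order digits of these sections' addresses match the values we parsed. This shows that VA is the section's offset relative to the image base once the program has been loaded. Microsoft's documentation describes it as:
"The address of the first byte of the section when loaded into memory, relative to the image base. For object files, this is the address of the first byte before relocation is applied."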
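Likewise, `VirtualSize` gives the section's size, and therefore where the section ends. The documentation says:
"The total size of the section when loaded into memory, in bytes. If this value is greater than SizeOfRawData, the section is zero-padded. This field is valid only for executable images and should be set to 0 for object files."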
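Together, `VirtualAddress` and `VirtualSize` describe the section's range in memory, [VA, VA + VirtualSize), and adding an RVA to the base the image was loaded at gives an absolute address. A minimal sketch; `SectionInfo` is a hypothetical stand-in for the two `IMAGE_SECTION_HEADER` fields, and 0x140000000 is only the usual default preferred base for 64-bit executables (with ASLR the actual base normally differs):

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the IMAGE_SECTION_HEADER fields used here.
struct SectionInfo {
    uint32_t virtualAddress;  // RVA of the section once loaded
    uint32_t virtualSize;     // bytes occupied in memory
};

// True if `rva` falls inside [VirtualAddress, VirtualAddress + VirtualSize).
bool sectionContainsRva(const SectionInfo& s, uint32_t rva) {
    return rva >= s.virtualAddress && rva - s.virtualAddress < s.virtualSize;
}

// Absolute address after loading = base the image was loaded at + RVA.
uint64_t loadedAddress(uint64_t imageBase, uint32_t rva) {
    return imageBase + rva;
}
```

Writing the upper-bound test as `rva - virtualAddress < virtualSize` (after the lower-bound check) avoids overflow when VA + VirtualSize would not fit in 32 bits.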
`IMAGE_SECTION_HEADER` also has a `PointerToRawData` field, which gives the location of the section's data in the file:
$$FileAddr = section.PointerToRawData$$
That is, it is an offset from the very start of the file, with no further adjustment needed.
```cpp
std::cout << "Number of sections: " << ntTempHeader->FileHeader.NumberOfSections << "\n";
int nt_head_file_start = dosHeader->e_lfanew;
for (size_t i = 0; i < ntTempHeader->FileHeader.NumberOfSections; i++)
{
    DWORD64 sectionFileAddr = nt_head_file_start + sizeof(IMAGE_NT_HEADERS64) + i * sizeof(IMAGE_SECTION_HEADER);
    IMAGE_SECTION_HEADER* sectionHeader = (PIMAGE_SECTION_HEADER)(DWORD64)(&fileContent[sectionFileAddr]);
    std::cout << "  Section[" << i << "] address: 0x" << sectionFileAddr << "\n";
    // Name is not NUL-terminated when it is exactly 8 characters long
    std::cout << "    name        : "
              << std::string((char*)sectionHeader->Name, IMAGE_SIZEOF_SHORT_NAME).c_str() << "\n";
    std::cout << "    VA          : 0x" << sectionHeader->VirtualAddress << "\n";
    std::cout << "    ptr2RawData : 0x" << sectionHeader->PointerToRawData << "\n";
}
```
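Because `PointerToRawData` is a plain file offset, a section's bytes on disk are [PointerToRawData, PointerToRawData + SizeOfRawData). `SizeOfRawData` is rounded up to the file alignment, so it can differ from `VirtualSize`. A sketch that copies those bytes out with a bounds check (`sectionRawBytes` is a hypothetical helper):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Copy a section's on-disk bytes: [PointerToRawData, PointerToRawData + SizeOfRawData).
// Returns an empty vector if the section has no raw data (e.g. uninitialized
// data) or if the range runs past the end of the file.
std::vector<uint8_t> sectionRawBytes(const std::vector<uint8_t>& file,
                                     uint32_t pointerToRawData, uint32_t sizeOfRawData) {
    if (sizeOfRawData == 0 || pointerToRawData > file.size() ||
        file.size() - pointerToRawData < sizeOfRawData)
        return {};
    auto begin = file.begin() + pointerToRawData;
    return std::vector<uint8_t>(begin, begin + sizeOfRawData);
}
```

The subtraction-based bound check avoids the overflow that `pointerToRawData + sizeOfRawData` could cause with corrupt header values.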
How to find imported functions and DLLs - the import table
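Programs call functions from other DLLs, such as `kernel32.dll` or a C runtime library DLL. A PE file indexes these through its import directory, whose entries in turn point to the Import Address Table (IAT).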
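When writing a program we often embed resources in it, and the section walk above already revealed many sections. Different sections store different kinds of data, for example:
Code section: contains the executable program code.
Data section: contains initialized data.
Import table: specifies external symbols, telling the operating system which external functions are needed.
Export table: lists the functions the program exports for other programs to call.
Resource table: contains the resources the program uses, such as icons, dialogs, and strings.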
Back to the `OptionalHeader` part of the NT header from the beginning (abridged):
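```cpp
typedef struct _IMAGE_OPTIONAL_HEADER64 {
    // ... (Magic, linker versions, code and data sizes)
    DWORD AddressOfEntryPoint;
    // ... (BaseOfCode, ImageBase, alignments, sizes, NumberOfRvaAndSizes, etc.)
    IMAGE_DATA_DIRECTORY DataDirectory[IMAGE_NUMBEROF_DIRECTORY_ENTRIES];
} IMAGE_OPTIONAL_HEADER64, *PIMAGE_OPTIONAL_HEADER64;
```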
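`AddressOfEntryPoint`: the address of the entry point function, relative to the image base.
`DataDirectory[IMAGE_NUMBEROF_DIRECTORY_ENTRIES]`: an array of `IMAGE_DATA_DIRECTORY` structures (`IMAGE_NUMBEROF_DIRECTORY_ENTRIES` is 16), one entry per kind of data.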
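`DataDirectory` records the location and size of each of these regions, so it acts as a "directory" or index telling you where each kind of data lives. The location is an address relative to the image base, not a file offset. For the meaning of each array index, see: https://learn.microsoft.com/zh-cn/windows/win32/api/winnt/ns-winnt-image_optional_header32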
The `IMAGE_DATA_DIRECTORY` structure is:
```cpp
typedef struct _IMAGE_DATA_DIRECTORY {
    DWORD VirtualAddress;
    DWORD Size;
} IMAGE_DATA_DIRECTORY, *PIMAGE_DATA_DIRECTORY;
```
Let's print a few of them:
```cpp
std::cout << "---OptionalHeader.DataDirectories\n";
std::cout << "Base relocation table : 0x"
          << ntTempHeader->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_BASERELOC].VirtualAddress << "\n";
std::cout << "Global pointer RVA    : 0x"
          << ntTempHeader->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_GLOBALPTR].VirtualAddress << "\n";
std::cout << "Import address table  : 0x"
          << ntTempHeader->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_IAT].VirtualAddress << "\n";
```
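**Note:** `IMAGE_DIRECTORY_ENTRY_IAT` (index 12, the import address table) and `IMAGE_DIRECTORY_ENTRY_IMPORT` (index 1, the import directory) are two different things.
What follows uses `IMAGE_DIRECTORY_ENTRY_IMPORT`. Its `VirtualAddress` is again an address in memory relative to the image base; we'll call it the `DVA` (data directory virtual address).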
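Each `DataDirectory` entry is 8 bytes, and in the PE32+ optional header the array starts at offset 112, right after `NumberOfRvaAndSizes` (offset 108), which says how many entries are actually present. A portable sketch of the lookup; the index values match the `IMAGE_DIRECTORY_ENTRY_*` constants in winnt.h, while `DataDirectory`, `dataDirectory` and the enum names are hypothetical:

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Same values as IMAGE_DIRECTORY_ENTRY_* in winnt.h (subset).
enum DirectoryIndex : uint32_t {
    kExport = 0, kImport = 1, kResource = 2, kBaseReloc = 5,
    kGlobalPtr = 8, kIat = 12,
};

struct DataDirectory {
    uint32_t virtualAddress;  // relative to the image base, not a file offset
    uint32_t size;
};

static uint32_t readU32(const std::vector<uint8_t>& b, size_t off) {
    return uint32_t(b[off]) | uint32_t(b[off + 1]) << 8 |
           uint32_t(b[off + 2]) << 16 | uint32_t(b[off + 3]) << 24;
}

// Reads DataDirectory[index] from the raw bytes of an IMAGE_OPTIONAL_HEADER64.
std::optional<DataDirectory> dataDirectory(const std::vector<uint8_t>& oh, uint32_t index) {
    constexpr size_t kNumberOfRvaAndSizesOff = 108;
    constexpr size_t kDataDirectoryOff = 112;
    if (oh.size() < kDataDirectoryOff) return std::nullopt;
    if (index >= readU32(oh, kNumberOfRvaAndSizesOff)) return std::nullopt;
    size_t off = kDataDirectoryOff + size_t(index) * 8;  // each entry is 8 bytes
    if (off + 8 > oh.size()) return std::nullopt;
    return DataDirectory{readU32(oh, off), readU32(oh, off + 4)};
}
```

Looking up `kImport` and `kIat` through the same function makes the difference in the note concrete: they are two separate slots of the same array.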
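So how do we locate the import table through `IMAGE_OPTIONAL_HEADER64`? The import information is stored in the file, so we inevitably have to compute a file offset. We'll call this offset the `RVA` (note that Microsoft's documentation uses "RVA" for an address relative to the image base; what is computed here is really a file offset).
Find the section whose range [VA, VA + VirtualSize) contains the DVA, then combine the DVA with that section's `VirtualAddress` (VA) and `PointerToRawData`:
$$RVA = DVA - VA + PtrToRawData$$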
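Taking the import directory as an example (the code reads `IMAGE_DIRECTORY_ENTRY_IMPORT`, although its variables are named `iat*`):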
```cpp
DWORD64 iatDVA = ntTempHeader->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_IMPORT].VirtualAddress;
DWORD64 iatVA = 0;
DWORD64 iatRVA = 0;
DWORD64 sectionFileAddr;
IMAGE_SECTION_HEADER* sectionHeader;
for (size_t i = 0; i < ntTempHeader->FileHeader.NumberOfSections; i++)
{
    sectionFileAddr = nt_head_file_start + sizeof(IMAGE_NT_HEADERS64) + i * sizeof(IMAGE_SECTION_HEADER);
    sectionHeader = (PIMAGE_SECTION_HEADER)(DWORD64)(&fileContent[sectionFileAddr]);
    // Sections are sorted by VirtualAddress: stop at the first one that starts
    // after the DVA, so the last section processed is the one containing it
    if (sectionHeader->VirtualAddress > iatDVA)
        break;
    iatVA = sectionHeader->VirtualAddress;
    iatRVA = iatDVA - iatVA + sectionHeader->PointerToRawData;  // RVA = DVA - VA + PtrToRawData
}
```
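The loop above keeps the last section whose VA does not exceed the DVA, which relies on the sections being sorted and never checks the section's upper bound. A sketch of the same `DVA - VA + PtrToRawData` conversion that also checks the upper bound and reports failure instead of returning 0. `Section` and `rvaToFileOffset` are hypothetical, and as a simplification, addresses inside the headers or in zero-filled padding are treated as having no file offset:

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Just the section-header fields needed for the conversion (hypothetical type).
struct Section {
    uint32_t virtualAddress;    // VA in this article's terms
    uint32_t pointerToRawData;
    uint32_t sizeOfRawData;
};

// File offset = DVA - section.VirtualAddress + section.PointerToRawData,
// using the section whose on-disk data actually covers the address.
std::optional<uint32_t> rvaToFileOffset(const std::vector<Section>& sections, uint32_t dva) {
    for (const Section& s : sections) {
        if (dva < s.virtualAddress) continue;
        uint32_t delta = dva - s.virtualAddress;
        if (delta < s.sizeOfRawData)  // only bytes backed by raw data exist in the file
            return s.pointerToRawData + delta;
    }
    return std::nullopt;
}
```

Returning `std::nullopt` makes a bad address visible to the caller, whereas a returned 0 would silently point at the DOS header.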
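How is the RVA used?
It is the file offset where the import directory begins, and every `IMAGE_IMPORT_DESCRIPTOR` is read starting from there. No field directly gives the number of descriptors, but the array ends with an all-zero descriptor, so the natural choice is to stop when `Name` is 0.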
The `IMAGE_IMPORT_DESCRIPTOR` structure:
```cpp
typedef struct _IMAGE_IMPORT_DESCRIPTOR {
    union {
        DWORD Characteristics;
        DWORD OriginalFirstThunk;
    } DUMMYUNIONNAME;
    DWORD TimeDateStamp;
    DWORD ForwarderChain;
    DWORD Name;
    DWORD FirstThunk;
} IMAGE_IMPORT_DESCRIPTOR;
typedef IMAGE_IMPORT_DESCRIPTOR UNALIGNED *PIMAGE_IMPORT_DESCRIPTOR;
```
Next, print the name of the DLL that each `IMAGE_IMPORT_DESCRIPTOR` refers to. `Name` is also a DVA, so it is converted with the same formula (this formula is important):
$$NameOffset = Name - VA + PtrToRawData$$
The DLL name is then simply the string stored at file offset `NameOffset`.
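As a worked example with made-up numbers: if `Name` = 0x3050 and the containing section has VA = 0x3000 and PtrToRawData = 0x200, then NameOffset = 0x3050 - 0x3000 + 0x200 = 0x250, and the DLL name is the NUL-terminated string starting at file offset 0x250. A sketch of that calculation plus a bounded string read (`nameOffset` and `readAsciiZ` are hypothetical helpers):

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// NameOffset = Name - VA + PtrToRawData, where VA and PtrToRawData belong to
// the section containing `nameDva` (assumes nameDva >= sectionVa).
size_t nameOffset(uint32_t nameDva, uint32_t sectionVa, uint32_t ptrToRawData) {
    return size_t(nameDva) - sectionVa + ptrToRawData;
}

// Reads a NUL-terminated string at `off`, stopping at the end of the buffer.
std::string readAsciiZ(const std::vector<uint8_t>& file, size_t off) {
    std::string s;
    while (off < file.size() && file[off] != 0) s.push_back(char(file[off++]));
    return s;
}
```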
First, wrap the VA-to-RVA conversion into a function:
```cpp
// Converts a DVA (relative to the image base) into a file offset, which this
// article calls an "RVA". Assumes the section headers are sorted by
// VirtualAddress and that StartVA lies inside one of the sections.
DWORD64 dwVAToRVA(DWORD64 StartVA, DWORD64 nt_head_file_start, std::string& fileContent, PIMAGE_NT_HEADERS64 ntTempHeader)
{
    DWORD64 iatDVA = StartVA;
    DWORD64 iatVA = 0;
    DWORD64 RVA = 0;
    DWORD64 sectionFileAddr;
    IMAGE_SECTION_HEADER* sectionHeader;
    for (size_t i = 0; i < ntTempHeader->FileHeader.NumberOfSections; i++)
    {
        sectionFileAddr = nt_head_file_start + sizeof(IMAGE_NT_HEADERS64) + i * sizeof(IMAGE_SECTION_HEADER);
        sectionHeader = (PIMAGE_SECTION_HEADER)(DWORD64)(&fileContent[sectionFileAddr]);
        if (sectionHeader->VirtualAddress > iatDVA)
            break;  // past the address: the previous section contains it
        iatVA = sectionHeader->VirtualAddress;
        RVA = iatDVA - iatVA + sectionHeader->PointerToRawData;
    }
    return RVA;
}
```
Then print the DLL names in `main`:
```cpp
size_t i = 0;
PIMAGE_IMPORT_DESCRIPTOR temp = PIMAGE_IMPORT_DESCRIPTOR(
    (DWORD64)&fileContent[0] + iatRVA + i * sizeof(IMAGE_IMPORT_DESCRIPTOR));
for (i = 1; temp->Name != 0; i++)
{
    DWORD64 VA = dwVAToRVA(temp->Name, nt_head_file_start, fileContent, ntTempHeader);
    std::cout << "  DLL: " << (char*)(&fileContent[VA]) << "\n";
    temp = PIMAGE_IMPORT_DESCRIPTOR(&fileContent[0] + iatRVA + i * sizeof(IMAGE_IMPORT_DESCRIPTOR));
}
```
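Similarly, `FirstThunk` in `IMAGE_IMPORT_DESCRIPTOR` can be used to print the imported function names. `FirstThunk` points to an array of 64-bit thunks terminated by 0; in the file, each thunk imported by name holds the DVA of an `IMAGE_IMPORT_BY_NAME` structure (a 2-byte `Hint` followed by the function name). Convert `FirstThunk` to an RVA, read each thunk, convert it to an RVA as well, and read the string at that position in the file. Thunks with the high bit set (`IMAGE_ORDINAL_FLAG64`) are imports by ordinal and have no name, so they must be handled separately.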
```cpp
size_t i = 0;
PIMAGE_IMPORT_DESCRIPTOR temp = PIMAGE_IMPORT_DESCRIPTOR(
    (DWORD64)&fileContent[0] + iatRVA + i * sizeof(IMAGE_IMPORT_DESCRIPTOR));
for (i = 1; temp->Name != 0; i++)
{
    // DLL name
    DWORD64 VA = dwVAToRVA(temp->Name, nt_head_file_start, fileContent, ntTempHeader);
    std::cout << "  DLL: " << (char*)(&fileContent[VA]) << "\n";

    // File offset of the thunk array that FirstThunk points to
    DWORD64 thunkRVA = dwVAToRVA(temp->FirstThunk, nt_head_file_start, fileContent, ntTempHeader);
    VA = *PDWORD64(&fileContent[0] + thunkRVA);
    for (size_t j = 1; VA != 0; j++)   // the thunk array ends with a 0 entry
    {
        if (IMAGE_SNAP_BY_ORDINAL64(VA))  // imported by ordinal: no name to read
            std::cout << "    -Ordinal " << IMAGE_ORDINAL64(VA) << std::endl;
        else
        {
            PIMAGE_IMPORT_BY_NAME IatName = PIMAGE_IMPORT_BY_NAME(
                &fileContent[0] + dwVAToRVA(VA, nt_head_file_start, fileContent, ntTempHeader));
            std::cout << "    -Function " << IatName->Name << std::endl;
        }
        VA = *PDWORD64(&fileContent[0] + thunkRVA + j * sizeof(DWORD64));
    }
    temp = PIMAGE_IMPORT_DESCRIPTOR(&fileContent[0] + iatRVA + i * sizeof(IMAGE_IMPORT_DESCRIPTOR));
}
```
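Each 64-bit thunk is either an ordinal import (bit 63, `IMAGE_ORDINAL_FLAG64`, set; the ordinal is in the low 16 bits) or the DVA of an `IMAGE_IMPORT_BY_NAME` record in its low 31 bits, whose name starts after a 2-byte `Hint`. A portable sketch of that decoding (`ThunkInfo`, `decodeThunk64` and `importNameRva` are hypothetical names). Per the PE specification, once an image is bound the IAT on disk may already hold real addresses, so `OriginalFirstThunk`, when non-zero, is the safer array to read names from:

```cpp
#include <cassert>
#include <cstdint>

// Bit 63 marks an import by ordinal (IMAGE_ORDINAL_FLAG64 in winnt.h).
constexpr uint64_t kOrdinalFlag64 = 0x8000000000000000ULL;

struct ThunkInfo {
    bool byOrdinal;
    uint16_t ordinal;      // valid when byOrdinal
    uint32_t hintNameRva;  // valid when !byOrdinal: DVA of IMAGE_IMPORT_BY_NAME
};

// Decodes one IMAGE_THUNK_DATA64 value as stored on disk.
ThunkInfo decodeThunk64(uint64_t thunk) {
    if (thunk & kOrdinalFlag64) return {true, uint16_t(thunk & 0xFFFF), 0};
    return {false, 0, uint32_t(thunk & 0x7FFFFFFF)};  // bits 30-0 hold the DVA
}

// The name string starts 2 bytes into IMAGE_IMPORT_BY_NAME (after Hint).
uint32_t importNameRva(uint32_t hintNameRva) { return hintNameRva + 2; }
```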
The other `IMAGE_DATA_DIRECTORY` entries can be located in the static binary file in the same way.
Finally
The complete code:
```cpp
#include <iostream>
#include <Windows.h>
#include <fstream>
#include <sstream>
#include <string>

// Converts a DVA (relative to the image base) into a file offset, which this
// article calls an "RVA". Assumes the section headers are sorted by
// VirtualAddress and that StartVA lies inside one of the sections.
DWORD64 dwVAToRVA(DWORD64 StartVA, DWORD64 nt_head_file_start, std::string& fileContent, PIMAGE_NT_HEADERS64 ntTempHeader)
{
    DWORD64 iatDVA = StartVA;
    DWORD64 iatVA = 0;
    DWORD64 RVA = 0;
    DWORD64 sectionFileAddr;
    IMAGE_SECTION_HEADER* sectionHeader;
    for (size_t i = 0; i < ntTempHeader->FileHeader.NumberOfSections; i++)
    {
        sectionFileAddr = nt_head_file_start + sizeof(IMAGE_NT_HEADERS64) + i * sizeof(IMAGE_SECTION_HEADER);
        sectionHeader = (PIMAGE_SECTION_HEADER)(DWORD64)(&fileContent[sectionFileAddr]);
        if (sectionHeader->VirtualAddress > iatDVA)
            break;  // past the address: the previous section contains it
        iatVA = sectionHeader->VirtualAddress;
        RVA = iatDVA - iatVA + sectionHeader->PointerToRawData;
    }
    return RVA;
}

int main()
{
    const std::string filePath = "notepad.exe";
    std::ifstream inputFile(filePath, std::ios::in | std::ios::binary);
    if (!inputFile.is_open())
    {
        std::cerr << "cant open: " << filePath << std::endl;
        return 1;
    }
    std::ostringstream peFileString;
    peFileString << inputFile.rdbuf();
    std::string fileContent = peFileString.str();
    inputFile.close();

    // DOS header: check the "MZ" magic before trusting e_lfanew
    if (fileContent.size() < sizeof(IMAGE_DOS_HEADER))
    {
        std::cerr << "file too small\n";
        return 1;
    }
    IMAGE_DOS_HEADER* dosHeader = (PIMAGE_DOS_HEADER)(DWORD64)(&fileContent[0]);
    std::cout << std::hex;
    std::cout << "PE e_magic  : 0x" << dosHeader->e_magic << "\n";
    std::cout << "PE e_lfanew : 0x" << dosHeader->e_lfanew << "\n";
    if (dosHeader->e_magic != IMAGE_DOS_SIGNATURE || dosHeader->e_lfanew < 0 ||
        (size_t)dosHeader->e_lfanew + sizeof(IMAGE_NT_HEADERS64) > fileContent.size())
    {
        std::cerr << "invalid DOS header\n";
        return 1;
    }

    // NT headers: "PE\0\0" signature, architecture, PE32+ check
    IMAGE_NT_HEADERS64* ntTempHeader = (PIMAGE_NT_HEADERS64)(DWORD64)(&fileContent[dosHeader->e_lfanew]);
    if (ntTempHeader->Signature != IMAGE_NT_SIGNATURE)
    {
        std::cerr << "invalid NT signature\n";
        return 1;
    }
    SetConsoleOutputCP(CP_UTF8);
    std::cout << "Bitness : ";
    if (ntTempHeader->FileHeader.Machine == 0x8664)
        std::cout << "64-bit\n";
    else if (ntTempHeader->FileHeader.Machine == 0x014c)
        std::cout << "32-bit\n";
    if (ntTempHeader->OptionalHeader.Magic != IMAGE_NT_OPTIONAL_HDR64_MAGIC)
    {
        std::cerr << "only PE32+ (64-bit) files are handled\n";
        return 1;
    }

    // Section headers
    std::cout << "Number of sections: " << ntTempHeader->FileHeader.NumberOfSections << "\n";
    int nt_head_file_start = dosHeader->e_lfanew;
    for (size_t i = 0; i < ntTempHeader->FileHeader.NumberOfSections; i++)
    {
        DWORD64 sectionFileAddr = nt_head_file_start + sizeof(IMAGE_NT_HEADERS64) + i * sizeof(IMAGE_SECTION_HEADER);
        IMAGE_SECTION_HEADER* sectionHeader = (PIMAGE_SECTION_HEADER)(DWORD64)(&fileContent[sectionFileAddr]);
        std::cout << "  Section[" << i << "] address: 0x" << sectionFileAddr << "\n";
        // Name is not NUL-terminated when it is exactly 8 characters long
        std::cout << "    name        : "
                  << std::string((char*)sectionHeader->Name, IMAGE_SIZEOF_SHORT_NAME).c_str() << "\n";
        std::cout << "    VA          : 0x" << sectionHeader->VirtualAddress << "\n";
        std::cout << "    ptr2RawData : 0x" << sectionHeader->PointerToRawData << "\n";
    }

    // A few data directories
    std::cout << "---OptionalHeader.DataDirectories\n";
    std::cout << "Base relocation table : 0x" << ntTempHeader->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_BASERELOC].VirtualAddress << "\n";
    std::cout << "Global pointer RVA    : 0x" << ntTempHeader->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_GLOBALPTR].VirtualAddress << "\n";
    std::cout << "Import address table  : 0x" << ntTempHeader->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_IAT].VirtualAddress << "\n";

    // File offset of the import directory (RVA = DVA - VA + PtrToRawData)
    DWORD64 iatDVA = ntTempHeader->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_IMPORT].VirtualAddress;
    if (iatDVA == 0)
    {
        std::cout << "no imports\n";
        return 0;
    }
    DWORD64 iatRVA = dwVAToRVA(iatDVA, nt_head_file_start, fileContent, ntTempHeader);

    // Walk the IMAGE_IMPORT_DESCRIPTOR array until the all-zero terminator
    size_t i = 0;
    PIMAGE_IMPORT_DESCRIPTOR temp = PIMAGE_IMPORT_DESCRIPTOR((DWORD64)&fileContent[0] + iatRVA + i * sizeof(IMAGE_IMPORT_DESCRIPTOR));
    for (i = 1; temp->Name != 0; i++)
    {
        DWORD64 VA = dwVAToRVA(temp->Name, nt_head_file_start, fileContent, ntTempHeader);
        std::cout << "  DLL: " << (char*)(&fileContent[VA]) << "\n";
        DWORD64 thunkRVA = dwVAToRVA(temp->FirstThunk, nt_head_file_start, fileContent, ntTempHeader);
        VA = *PDWORD64(&fileContent[0] + thunkRVA);
        for (size_t j = 1; VA != 0; j++)   // the thunk array ends with a 0 entry
        {
            if (IMAGE_SNAP_BY_ORDINAL64(VA))  // imported by ordinal: no name to read
                std::cout << "    -Ordinal " << IMAGE_ORDINAL64(VA) << std::endl;
            else
            {
                PIMAGE_IMPORT_BY_NAME IatName = PIMAGE_IMPORT_BY_NAME(&fileContent[0] + dwVAToRVA(VA, nt_head_file_start, fileContent, ntTempHeader));
                std::cout << "    -Function " << IatName->Name << std::endl;
            }
            VA = *PDWORD64(&fileContent[0] + thunkRVA + j * sizeof(DWORD64));
        }
        temp = PIMAGE_IMPORT_DESCRIPTOR(&fileContent[0] + iatRVA + i * sizeof(IMAGE_IMPORT_DESCRIPTOR));
    }
    return 0;
}
```