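A First Look: Studying the AFL Source Code

Last updated on the evening of January 11, 2025

AFL is the pioneer among fuzzing tools, and I think its source code is well worth studying. This post records my reading of the key parts and the overall flow of the AFL source. My understanding may contain mistakes; corrections are welcome.

Annotated source project:

`https://github.com/k3ppf0r/AFL`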

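Project structure

The files to focus on are the core programs in the AFL root directory, such as afl-fuzz.c, afl-gcc.c, and afl-tmin.c. The standalone llvm_mode and qemu_mode features live in their own directories. afl-fuzz.c is the heart of the project, at roughly 8k lines of code.

Main functions of the project modules:

Instrumentation module
The instrumentation module inserts code into the target program to collect information about execution paths, which is the foundation of AFL's fuzzing. AFL offers several instrumentation modes to suit different build environments and needs:
- Compile-time instrumentation: implemented by afl-as.h, afl-as.c, and afl-gcc.c; it requires source code, and the compiler can be gcc or clang;
- LLVM mode: supported by the files under the llvm_mode directory; it also requires source code and uses clang as the compiler;
- QEMU mode: implemented by the qemu_mode module; it instruments binaries, so already-compiled applications can be fuzzed without source code.
Fuzzer module
The core fuzzing logic is implemented in afl-fuzz.c. It is the main body of AFL and manages and drives the whole fuzzing process, including generating inputs, monitoring program behavior, and adjusting future test cases based on feedback.
Helper tools module
AFL also ships a set of helper tools that improve fuzzing results and analysis:
- afl-analyze: analyzes a given test case to help identify meaningful data fields
- afl-plot: generates charts that visualize the state and progress of fuzzing
- afl-tmin: minimizes a test case, removing unnecessary parts so that it is as small as possible while staying effective
- afl-cmin: trims a corpus, keeping the most representative samples to cut redundancy and make later fuzzing more efficient
- afl-showmap: traces the execution path of a single test case and shows the basic block coverage triggered during execution
- afl-whatsup: summarizes the results of multiple fuzzing instances running in parallel
- afl-gotcpu: checks the CPU usage of the current system
Header files
To support the features above, AFL relies on a few key header files:
- alloc-inl.h: defines error-checking memory allocation and free routines
- config.h: contains the definitions of the configuration options
- debug.h: provides macros for debugging output
- hash.h: implements the hash function
- types.h: defines commonly used data types and macros

The project's Makefile shows how the generated targets are named and configured: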


PROGNAME    = afl
# The # does not need to be escaped, but the escaped form still works, e.g. echo '#define VERSION "2.57b"' | grep '^\#\d\efine VERSION '
VERSION     = $(shell grep '^\#define VERSION ' config.h | cut -d '"' -f2)

PREFIX     ?= /usr/local
BIN_PATH    = $(PREFIX)/bin
HELPER_PATH = $(PREFIX)/lib/afl
DOC_PATH    = $(PREFIX)/share/doc/afl
MISC_PATH   = $(PREFIX)/share/afl

# PROGS intentionally omit afl-as, which gets installed elsewhere.

PROGS       = afl-gcc afl-fuzz afl-showmap afl-tmin afl-gotcpu afl-analyze
SH_PROGS    = afl-plot afl-cmin afl-whatsup

CFLAGS     ?= -O3 -funroll-loops
CFLAGS     += -Wall -D_FORTIFY_SOURCE=2 -g -Wno-pointer-sign \
	      -DAFL_PATH=\"$(HELPER_PATH)\" -DDOC_PATH=\"$(DOC_PATH)\" \
	      -DBIN_PATH=\"$(BIN_PATH)\"

ifneq "$(filter Linux GNU%,$(shell uname))" ""
  LDFLAGS  += -ldl
endif

ifeq "$(findstring clang, $(shell $(CC) --version 2>/dev/null))" ""
  TEST_CC   = afl-gcc
else
  TEST_CC   = afl-clang
endif

COMM_HDR    = alloc-inl.h config.h debug.h types.h

all: test_x86 $(PROGS) afl-as test_build all_done

ifndef AFL_NO_X86

test_x86:
	@echo "[*] Checking for the ability to compile x86 code..."
	@echo 'main() { __asm__("xorb %al, %al"); }' | $(CC) -w -x c - -o .test || ( echo; echo "Oops, looks like your compiler can't generate x86 code."; echo; echo "Don't panic! You can use the LLVM or QEMU mode, but see docs/INSTALL first."; echo "(To ignore this error, set AFL_NO_X86=1 and try again.)"; echo; exit 1 )
	@rm -f .test
	@echo "[+] Everything seems to be working, ready to compile."

else

test_x86:
	@echo "[!] Note: skipping x86 compilation checks (AFL_NO_X86 set)."

endif

afl-gcc: afl-gcc.c $(COMM_HDR) | test_x86
	$(CC) $(CFLAGS) $@.c -o $@ $(LDFLAGS)
	set -e; for i in afl-g++ afl-clang afl-clang++; do ln -sf afl-gcc $$i; done

afl-as: afl-as.c afl-as.h $(COMM_HDR) | test_x86
	$(CC) $(CFLAGS) $@.c -o $@ $(LDFLAGS)
	ln -sf afl-as as

afl-fuzz: afl-fuzz.c $(COMM_HDR) | test_x86
	$(CC) $(CFLAGS) $@.c -o $@ $(LDFLAGS)

afl-showmap: afl-showmap.c $(COMM_HDR) | test_x86
	$(CC) $(CFLAGS) $@.c -o $@ $(LDFLAGS)

afl-tmin: afl-tmin.c $(COMM_HDR) | test_x86
	$(CC) $(CFLAGS) $@.c -o $@ $(LDFLAGS)

afl-analyze: afl-analyze.c $(COMM_HDR) | test_x86
	$(CC) $(CFLAGS) $@.c -o $@ $(LDFLAGS)

afl-gotcpu: afl-gotcpu.c $(COMM_HDR) | test_x86
	$(CC) $(CFLAGS) $@.c -o $@ $(LDFLAGS)

ifndef AFL_NO_X86

test_build: afl-gcc afl-as afl-showmap
	@echo "[*] Testing the CC wrapper and instrumentation output..."
	unset AFL_USE_ASAN AFL_USE_MSAN; AFL_QUIET=1 AFL_INST_RATIO=100 AFL_PATH=. ./$(TEST_CC) $(CFLAGS) test-instr.c -o test-instr $(LDFLAGS)
	./afl-showmap -m none -q -o .test-instr0 ./test-instr < /dev/null
	echo 1 | ./afl-showmap -m none -q -o .test-instr1 ./test-instr
	@rm -f test-instr
	@cmp -s .test-instr0 .test-instr1; DR="$$?"; rm -f .test-instr0 .test-instr1; if [ "$$DR" = "0" ]; then echo; echo "Oops, the instrumentation does not seem to be behaving correctly!"; echo; echo "Please ping <lcamtuf@google.com> to troubleshoot the issue."; echo; exit 1; fi
	@echo "[+] All right, the instrumentation seems to be working!"

else

test_build: afl-gcc afl-as afl-showmap
	@echo "[!] Note: skipping build tests (you may need to use LLVM or QEMU mode)."

endif

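Symlinks are created so that afl-g++, afl-clang, and afl-clang++ all point to afl-gcc
A symlink is created so that as points to afl-as

Possible problems

The clangd plugin cannot find constants

Solution:
clangd can read the compile flags from compile_commands.json, which the following commands generate:

`sudo apt install bear`
`bear -- make`

VS Code shortcuts

F12: with the cursor on a variable or function, jump to its definition
Ctrl+click: jump to definition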

afl-gcc

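afl-gcc is a wrapper around GCC/clang. The Makefile shows that the other compiler names (afl-g++, afl-clang, afl-clang++) are symlinks that actually point to it. Internally it first locates afl-as (the assembler wrapper that preprocesses the assembly), then builds a customized set of compiler arguments from the environment variables and the arguments it was given, and finally invokes the downstream compiler, GCC or clang.

The core of main is: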

find_as(argv[0]);
edit_params(argc, argv);
execvp(cc_params[0], (char**)cc_params);

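It boils down to three calls:

find_as(argv[0]): locates the assembler. It checks, in order, the $AFL_PATH environment variable, the directory containing argv[0], and the compiled-in AFL_PATH setting; by default this ends up being afl-as in the local project directory
edit_params(argc, argv): processes the compiler arguments it was given, stores the result in the cc_params[] array, and uses argv[0] to decide which compiler to call
execvp(cc_params[0], (char**)cc_params): executes the real compiler, GCC or clang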

edit_params

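The logic is as follows:

First, choose the compiler:
- Take argv[0] and extract name (the string after the last /), then compare it against fixed strings such as afl-clang to decide which downstream compiler to call.
- For example, if argv[0] is /afl/afl-clang, the matching branch first reads the AFL_CC environment variable. If it is set, its value goes into cc_params[0]; otherwise cc_params[0] is set to clang. AFL thus lets users pick the downstream compiler themselves: when variables such as AFL_CC and AFL_CXX are set, they override the default compiler.
Next, a while loop walks the arguments starting from argv[1]:
- -B, which sets the compiler's search path, is skipped (find_as has already determined as_path)
- -integrated-as is skipped
- -pipe is skipped
- -fsanitize=address and -fsanitize=memory, which ask the compiler to check for memory access errors such as out-of-bounds array accesses, set asan_set = 1
- FORTIFY_SOURCE sets fortify_set = 1. FORTIFY_SOURCE mainly checks for buffer overflows in common functions such as memcpy, mempcpy, memmove, memset, strcpy, stpcpy, strncpy, strcat, strncat, sprintf, vsprintf, snprintf, and gets
After the loop, the remaining arguments are set:
- Add -B as_path
- In clang_mode, add -no-integrated-as
- If the AFL_HARDEN environment variable is set, add -fstack-protector-all; if fortify_set is not set, also append -D_FORTIFY_SOURCE=2
The sanitizer and optimization arguments are decided by several if / else if checks:
- if + else if:
  - If asan_set was set to 1 earlier, meaning -fsanitize=memory or -fsanitize=address was passed by hand, set the environment variable AFL_USE_ASAN="1"
  - If asan_set is not 1 and the AFL_USE_ASAN environment variable is set,
    add -U_FORTIFY_SOURCE -fsanitize=address
  - If AFL_USE_ASAN is not set but the AFL_USE_MSAN environment variable is,
    add -U_FORTIFY_SOURCE -fsanitize=memory
    AFL_USE_ASAN and AFL_USE_MSAN cannot be set together, and AFL_USE_MSAN cannot be combined with AFL_HARDEN either, because the result would run too slowly
- If the AFL_DONT_OPTIMIZE environment variable is not set,
  add -g -O3 -funroll-loops -D__AFL_COMPILER=1 -DFUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION=1
- If the AFL_NO_BUILTIN environment variable is set, the compiler's built-in versions of the string and memory comparison functions are disabled by
  adding
  -fno-builtin-strcmp -fno-builtin-strncmp -fno-builtin-strcasecmp -fno-builtin-strncasecmp -fno-builtin-memcmp -fno-builtin-strstr -fno-builtin-strcasestr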
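
The compiler selection step can be sketched in a few lines of C. This is a simplified illustration, not AFL's actual code: pick_compiler is a hypothetical helper, and its override_cc parameter stands in for the value of the AFL_CC / AFL_CXX environment variables so that the function stays free of global state.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of AFL: maps the wrapper's argv[0] to the
   downstream compiler. override_cc stands in for AFL_CC / AFL_CXX. */
static const char *pick_compiler(const char *argv0, const char *override_cc,
                                 int *clang_mode) {
  /* name = the part of argv[0] after the last '/' */
  const char *slash = strrchr(argv0, '/');
  const char *name = slash ? slash + 1 : argv0;

  *clang_mode = !strncmp(name, "afl-clang", 9);

  if (override_cc) return override_cc;   /* user-chosen compiler wins */

  if (*clang_mode) return !strcmp(name, "afl-clang++") ? "clang++" : "clang";
  return !strcmp(name, "afl-g++") ? "g++" : "gcc";
}
```

Because all four wrapper names are symlinks to the same binary, the name it was invoked under is the only thing that tells the wrapper which compiler family the user wanted.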

afl-as

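One purpose of the afl-gcc wrapper is to replace the native GNU as with afl-as.

afl-as is itself a wrapper around the native GNU as. It preprocesses the assembly files generated by GCC/clang and injects the instrumentation code contained in afl-as.h. The edit_params function described above already put it into the toolchain through the -B as_path argument.

The core of main is: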

u8* inst_ratio_str = getenv("AFL_INST_RATIO");
struct timeval tv;
struct timezone tz;
gettimeofday(&tv, &tz);
rand_seed = tv.tv_sec ^ tv.tv_usec ^ getpid();
// seed the random number generator
srandom(rand_seed);
edit_params(argc, argv);
// instrument the assembly instruction stream
if (!just_version) add_instrumentation();
// fork: the child process runs the real assembler
if (!(pid = fork())) {
  execvp(as_params[0], (char**)as_params);
  FATAL("Oops, failed to execute '%s' - check your PATH", as_params[0]);

}
if (pid < 0) PFATAL("fork() failed");
if (waitpid(pid, &status, 0) <= 0) PFATAL("waitpid() failed");
if (!getenv("AFL_KEEP_ASSEMBLY")) unlink(modified_file);
exit(WEXITSTATUS(status));

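How main works

gettimeofday(&tv, &tz) fetches the current time and time zone; the seed for srandom() is then derived from tv_sec, tv_usec, and the process ID
edit_params handles the arguments and decides the name of the as program. The default is GNU as, but the user can override it with AFL_AS. It also sets the path of the temporary file modified_file to /tmp/.afl-pid-timestamp.s
add_instrumentation() is called; this is the function that actually does the instrumentation
A child process is forked to run execvp(as_params[0], (char**)as_params);. The real assembler runs in a child because execvp replaces the program in the current process image with as_params[0]. If the real as were not run in a child process, afl-as would have no way to unlink modified_file after as finishes
waitpid(pid, &status, 0) waits for the child to finish
Finally AFL_KEEP_ASSEMBLY is checked. If this environment variable is not set, modified_file (the instrumented file) is unlinked. Setting it, for example to 1, stops afl-as from deleting the instrumented assembly file, so the file is kept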
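
The fork / exec / wait / cleanup pattern described above can be shown in isolation. The following is a minimal sketch under my own assumptions, not AFL's code: run_then_cleanup is a hypothetical helper, and it reports errors through its return value instead of AFL's FATAL/PFATAL macros.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper, not AFL's code: run argv[0] in a child process,
   wait for it, then delete tmp_file. Returns the child's exit status,
   or -1 on failure. */
static int run_then_cleanup(char *const argv[], const char *tmp_file) {
  int status = 0;
  pid_t pid = fork();

  if (pid < 0) return -1;            /* fork() failed */

  if (pid == 0) {
    execvp(argv[0], argv);           /* on success this never returns */
    _exit(127);                      /* only reached if exec failed */
  }

  /* Only the parent gets here, so it can still clean up after the child. */
  if (waitpid(pid, &status, 0) <= 0) return -1;
  unlink(tmp_file);

  return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Had the parent called execvp directly, the unlink line would never run, which is exactly the reason given above for the extra process.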

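add_instrumentation: the instrumentation logic

If input_file is not empty, the function tries to open it and assigns the resulting handle to inf, aborting on failure; if input_file is empty, inf is set to standard input;
It opens modified_file and assigns the fd to outfd, aborting on failure; it then checks that the file can be written to (opening it as the stream outf), again aborting on failure;

A while loop reads inf line by line into the line array, at most MAX_LINE = 8192 bytes per line (including the trailing '\0'), writes what it read from line to outf, and then enters the actual instrumentation logic. Note that instrumentation is only inserted into the .text section:

Labels, macros, and comments are skipped first;

Some key code helps explain this. The variable instr_ok is essentially a flag that records whether we are inside the .text section: 1 means we are, anything else means we are not. So when instr_ok is 1 the branch sites get instrumented; otherwise nothing is inserted.

The code first checks whether the line starts with '\t', which in effect matches the section directives in the .s file, and then whether line[1] is a dot: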

if (line[0] == '\t' && line[1] == '.') {
 
      /* OpenBSD puts jump tables directly inline with the code, which is
         a bit annoying. They use a specific format of p2align directives
         around them, so we use that as a signal. */
 
      if (!clang_mode && instr_ok && !strncmp(line + 2, "p2align ", 8) &&
          isdigit(line[10]) && line[11] == '\n') skip_next_label = 1;
 
      if (!strncmp(line + 2, "text\n", 5) ||
          !strncmp(line + 2, "section\t.text", 13) ||
          !strncmp(line + 2, "section\t__TEXT,__text", 21) ||
          !strncmp(line + 2, "section __TEXT,__text", 21)) {
        instr_ok = 1;
        continue;
      }
 
      if (!strncmp(line + 2, "section\t", 8) ||
          !strncmp(line + 2, "section ", 8) ||
          !strncmp(line + 2, "bss\n", 4) ||
          !strncmp(line + 2, "data\n", 5)) {
        instr_ok = 0;
        continue;
      }
 
    }

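If the line starts with '\t' and line[1] == '.', check whether it is a p2align directive; if so (outside clang_mode and with instr_ok set), set skip_next_label = 1;
Try to match any of "text\n", "section\t.text", "section\t__TEXT,__text", or "section __TEXT,__text"; on a match, set instr_ok = 1 (we are in the .text section) and continue to the next iteration;
Try to match any of "section\t", "section ", "bss\n", or "data\n"; on a match, set instr_ok = 0 (we are in some other section) and continue to the next iteration;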
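
To make the section tracking easier to experiment with, the same checks can be pulled into a standalone function. This is a sketch for illustration only: track_text_section is a hypothetical name, and it returns the new flag value instead of updating a global and calling continue.

```c
#include <string.h>

/* Hypothetical helper mirroring the checks above: given one assembly line
   and the current flag, return the new value of instr_ok. */
static int track_text_section(const char *line, int instr_ok) {
  if (line[0] != '\t' || line[1] != '.') return instr_ok;  /* not a directive */

  const char *d = line + 2;   /* directive name, without the leading "\t." */

  /* Directives that open a code section. */
  if (!strncmp(d, "text\n", 5) ||
      !strncmp(d, "section\t.text", 13) ||
      !strncmp(d, "section\t__TEXT,__text", 21) ||
      !strncmp(d, "section __TEXT,__text", 21))
    return 1;

  /* Any other section, or .bss / .data, switches instrumentation off. */
  if (!strncmp(d, "section\t", 8) ||
      !strncmp(d, "section ", 8) ||
      !strncmp(d, "bss\n", 4) ||
      !strncmp(d, "data\n", 5))
    return 0;

  return instr_ok;            /* unrelated directive: keep the flag */
}
```

The order of the two groups matters: a .text section is also a "section\t" line, so the code sections have to be tested first.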

Next, a few if checks set some flags, covering off-flavor assembly, Intel vs. AT&T syntax blocks, and ad-hoc __asm__ blocks;

/* Detect off-flavor assembly (rare, happens in gdb). When this is
   encountered, we set skip_csect until the opposite directive is
   seen, and we do not instrument. */
 
if (strstr(line, ".code")) {
 
  if (strstr(line, ".code32")) skip_csect = use_64bit;
  if (strstr(line, ".code64")) skip_csect = !use_64bit;
 
}
 
/* Detect syntax changes, as could happen with hand-written assembly.
   Skip Intel blocks, resume instrumentation when back to AT&T. */
 
if (strstr(line, ".intel_syntax")) skip_intel = 1;
if (strstr(line, ".att_syntax")) skip_intel = 0;
 
/* Detect and skip ad-hoc __asm__ blocks, likewise skipping them. */
 
if (line[0] == '#' || line[1] == '#') {
 
  if (strstr(line, "#APP")) skip_app = 1;
  if (strstr(line, "#NO_APP")) skip_app = 0;
 
}

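When instrumenting, AFL pays most attention to ^main, ^.L0, ^.LBB0_0, and ^\tjnz foo (function entry points such as main, the branch labels emitted by gcc and clang, and conditional jump instructions). These usually mark changes in control flow, so AFL concentrates its instrumentation there:

For instructions of the form \tj[^m]., that is, conditional jumps, if the random number produced by R(100) is below the instrumentation ratio inst_ratio, fprintf writes trampoline_fmt_64 or trampoline_fmt_32 (the instrumentation instructions) straight to outf. The format argument R(MAP_SIZE), a random number below MAP_SIZE, becomes the ID of this code location. The instrumentation counter ins_lines is then incremented, and continue moves on to the next iteration;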

/* If we're in the right mood for instrumenting, check for function
   names or conditional labels. This is a bit messy, but in essence,
   we want to catch:
 
     ^main:      - function entry point (always instrumented)
     ^.L0:       - GCC branch label
     ^.LBB0_0:   - clang branch label (but only in clang mode)
     ^\tjnz foo  - conditional branches
 
   ...but not:
 
     ^# BB#0:    - clang comments
     ^ # BB#0:   - ditto
     ^.Ltmp0:    - clang non-branch labels
     ^.LC0       - GCC non-branch labels
     ^.LBB0_0:   - ditto (when in GCC mode)
     ^\tjmp foo  - non-conditional jumps
 
   Additionally, clang and GCC on MacOS X follow a different convention
   with no leading dots on labels, hence the weird maze of #ifdefs
   later on.
 
 */
 
if (skip_intel || skip_app || skip_csect || !instr_ok ||
    line[0] == '#' || line[0] == ' ') continue;
 
/* Conditional branch instruction (jnz, etc). We append the instrumentation
   right after the branch (to instrument the not-taken path) and at the
   branch destination label (handled later on). */
 
if (line[0] == '\t') {
 
  if (line[1] == 'j' && line[2] != 'm' && R(100) < inst_ratio) {
 
    fprintf(outf, use_64bit ? trampoline_fmt_64 : trampoline_fmt_32,
            R(MAP_SIZE));
 
    ins_lines++;
 
  }
 
  continue;
 
}

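Labels need their own evaluation, since some of them may be branch destinations

The code first checks whether the line contains a :, then whether it starts with a .

If it starts with a ., the target is a branch label such as ^.L0: or ^.LBB0_0:, i.e. a .L0: / LBB0_0: style jump destination
1. Check whether line[2] is a digit or, in clang_mode, whether the three bytes starting at line[1] are LBB, and AND the result with R(100) < inst_ratio. If the result is true, set instrument_next = 1 (unless skip_next_label is set, in which case that flag is cleared instead);
Otherwise the label is a function: ^func: is a function entry point, so instrument_next = 1 is set directly (deferred mode).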

    /* Label of some sort. This may be a branch destination, but we need to
       tread carefully and account for several different formatting
       conventions. */
 
#ifdef __APPLE__
 
    /* Apple: L: */
 
    if ((colon_pos = strstr(line, ":"))) {
 
      if (line[0] == 'L' && isdigit(*(colon_pos - 1))) {
 
#else
 
    /* Everybody else: .L: */
 
    if (strstr(line, ":")) {
 
      if (line[0] == '.') {
 
#endif /* __APPLE__ */
 
        /* .L0: or LBB0_0: style jump destination */
 
#ifdef __APPLE__
 
        /* Apple: L / LBB */
 
        if ((isdigit(line[1]) || (clang_mode && !strncmp(line, "LBB", 3)))
            && R(100) < inst_ratio) {
 
#else
 
        /* Apple: .L / .LBB */
 
        if ((isdigit(line[2]) || (clang_mode && !strncmp(line + 1, "LBB", 3)))
            && R(100) < inst_ratio) {
 
#endif /* __APPLE__ */
 
          /* An optimization is possible here by adding the code only if the
             label is mentioned in the code in contexts other than call / jmp.
             That said, this complicates the code by requiring two-pass
             processing (messy with stdin), and results in a speed gain
             typically under 10%, because compilers are generally pretty good
             about not generating spurious intra-function jumps.
 
             We use deferred output chiefly to avoid disrupting
             .Lfunc_begin0-style exception handling calculations (a problem on
             MacOS X). */
 
          if (!skip_next_label) instrument_next = 1; else skip_next_label = 0;
 
        }
 
      } else {
 
        /* Function label (always instrumented, deferred mode). */
 
        instrument_next = 1;
 
      }
    }
  }

After the steps above, the while loop moves on to its next iteration. At the top of the loop, the locations marked for deferred mode get their actual instrumentation:

if (!pass_thru && !skip_intel && !skip_app && !skip_csect && instr_ok &&
    instrument_next && line[0] == '\t' && isalpha(line[1])) {
 
  fprintf(outf, use_64bit ? trampoline_fmt_64 : trampoline_fmt_32,
          R(MAP_SIZE));
 
  instrument_next = 0;
  ins_lines++;
 
}

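This checks that instr_ok and instrument_next are both 1, in other words that we are in the .text section and deferred instrumentation has been requested, and that the current line is an instruction (a tab followed by a letter). If so, the instrumentation is performed by writing trampoline_fmt_64/32.

That covers the main logic of the instrumentation function add_instrumentation.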

afl-as.h: the instrumentation stub logic

The best way to study this part is to compile with AFL_KEEP_ASSEMBLY=1 and compare the assembly before and after instrumentation

# afl-gcc
AFL_KEEP_ASSEMBLY=1 AFL_DONT_OPTIMIZE=1 ../afl-gcc test_as.c -o test_as -O0 -fno-asynchronous-unwind-tables
ls -al /tmp/.afl-16830-1734879283.s
cp /tmp/.afl-16830-1734879283.s ./
# native gcc
gcc -S test_as.c -O0 -fno-asynchronous-unwind-tables  -o test_as1.o

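The instrumented assembly has a snippet inserted at the entry of every basic block, and at the end of the program it contains the variable definitions and function implementations of the AFL stub code

Annotated instrumentation trampoline code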

/* --- AFL TRAMPOLINE (64-BIT) --- */

.align 4

leaq -(128+24)(%rsp), %rsp
movq %rdx,  0(%rsp)
movq %rcx,  8(%rsp)
movq %rax, 16(%rsp)
movq $0x0000eb91, %rcx // store in rcx the random ID that identifies this code block
call __afl_maybe_log
movq 16(%rsp), %rax
movq  8(%rsp), %rcx
movq  0(%rsp), %rdx
leaq (128+24)(%rsp), %rsp

// RCX: random ID that identifies the code block
__afl_maybe_log:
// load the flags into AH
// save the OF bit in AL
  lahf
  seto  %al

  /* Check if SHM region is already mapped. */

  movq  __afl_area_ptr(%rip), %rdx
  // bitwise AND to check whether rdx is 0; if the result is 0, ZF is set to 1
  testq %rdx, %rdx
  je    __afl_setup

__afl_store:

  /* Calculate and store hit for the code location specified in rcx. */
  // rcx becomes __afl_prev_loc(%rip) ^ rcx
  // __afl_prev_loc(%rip) becomes the original rcx, then is shifted right by 1
  xorq __afl_prev_loc(%rip), %rcx
  xorq %rcx, __afl_prev_loc(%rip)
  shrq $1, __afl_prev_loc(%rip)

  // [rdx + rcx*1] += 1, i.e. bump the hit count
  incb (%rdx, %rcx, 1)

__afl_return:

  addb $127, %al
  sahf
  ret

.align 8

__afl_setup:

  /* Do not retry setup if we had previous failures. */

  cmpb $0, __afl_setup_failure(%rip)
  jne __afl_return

  /* Check out if we have a global pointer on file. */

  movq  __afl_global_area_ptr@GOTPCREL(%rip), %rdx
  movq  (%rdx), %rdx
  // check whether rdx is 0
  testq %rdx, %rdx
  // if it is 0, jump to __afl_setup_first
  je    __afl_setup_first

  movq %rdx, __afl_area_ptr(%rip)
  jmp  __afl_store

__afl_setup_first:

  /* Save everything that is not yet saved and that may be touched by
     getenv() and several other libcalls we'll be relying on. */
  // 
  leaq -352(%rsp), %rsp

  movq %rax,   0(%rsp)
  movq %rcx,   8(%rsp)
  movq %rdi,  16(%rsp)
  movq %rsi,  32(%rsp)
  movq %r8,   40(%rsp)
  movq %r9,   48(%rsp)
  movq %r10,  56(%rsp)
  movq %r11,  64(%rsp)

  movq %xmm0,  96(%rsp)
  movq %xmm1,  112(%rsp)
  movq %xmm2,  128(%rsp)
  movq %xmm3,  144(%rsp)
  movq %xmm4,  160(%rsp)
  movq %xmm5,  176(%rsp)
  movq %xmm6,  192(%rsp)
  movq %xmm7,  208(%rsp)
  movq %xmm8,  224(%rsp)
  movq %xmm9,  240(%rsp)
  movq %xmm10, 256(%rsp)
  movq %xmm11, 272(%rsp)
  movq %xmm12, 288(%rsp)
  movq %xmm13, 304(%rsp)
  movq %xmm14, 320(%rsp)
  movq %xmm15, 336(%rsp)

  /* Map SHM, jumping to __afl_setup_abort if something goes wrong. */

  /* The 64-bit ABI requires 16-byte stack alignment. We'll keep the
     original stack ptr in the callee-saved r12. */

  // align the stack
  pushq %r12
  movq  %rsp, %r12
  subq  $16, %rsp
  andq  $0xfffffffffffffff0, %rsp
  // 0x5555555557d7 (.AFL_VARS) => getenv("__AFL_SHM_ID"), which returns 0 if the variable is not set
  leaq .AFL_SHM_ENV(%rip), %rdi
call getenv@PLT

  testq %rax, %rax
  je    __afl_setup_abort

  // convert __AFL_SHM_ID from a string to an integer
  movq  %rax, %rdi
call atoi@PLT

  xorq %rdx, %rdx   /* shmat flags    */
  xorq %rsi, %rsi   /* requested addr */
  // rax holds the return value
  movq %rax, %rdi   /* SHM ID         */
call shmat@PLT

  cmpq $-1, %rax
  je   __afl_setup_abort
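  /*
    If the attach fails, the code above enters AFL's error handling path. One question comes up here: a shared memory segment has to be created with shmget() before shmat() can map it into the virtual address space of the current program. So when is this segment created? The answer is that afl-fuzz calls shmget() in setup_shm() to create the shared memory and writes the shm id into the __AFL_SHM_ID environment variable. If the program is not run through afl-fuzz, the segment is never created and the __AFL_SHM_ID environment variable does not exist either.
  */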

  /* Store the address of the SHM region. */

  movq %rax, %rdx
  movq %rax, __afl_area_ptr(%rip)

  movq __afl_global_area_ptr@GOTPCREL(%rip), %rdx
  movq %rax, (%rdx)
  movq %rax, %rdx

__afl_forkserver:

  /* Enter the fork server mode to avoid the overhead of execve() calls. We
     push rdx (area ptr) twice to keep stack alignment neat. */

  pushq %rdx
  pushq %rdx

  /* Phone home and tell the parent that we're OK. (Note that signals with
     no SA_RESTART will mess it up). If this fails, assume that the fd is
     closed because we were execve()d from an instrumented binary, or because
     the parent doesn't want to use the fork server. */
  // tell the fuzzer we are ready by writing the 4 bytes of __afl_temp to the status pipe (fd 199)
  movq $4, %rdx               /* length    */
  leaq __afl_temp(%rip), %rsi /* data      */
  movq $(198 + 1), %rdi       /* file desc */
call write@PLT

  cmpq $4, %rax
  jne  __afl_fork_resume

__afl_fork_wait_loop:

  /* Wait for parent by reading from the pipe. Abort if read fails. */

  movq $4, %rdx               /* length    */
  leaq __afl_temp(%rip), %rsi /* data      */
  movq $198, %rdi             /* file desc */
call read@PLT
  cmpq $4, %rax
  jne  __afl_die

  /* Once woken up, create a clone of our process. This is an excellent use
     case for syscall(__NR_clone, 0, CLONE_PARENT), but glibc boneheadedly
     caches getpid() results and offers no way to update the value, breaking
     abort(), raise(), and a bunch of other things :-( */

call fork@PLT
  cmpq $0, %rax
  //jump if rax < 0
  jl   __afl_die
  // child
  je   __afl_fork_resume

  /* In parent process: write PID to pipe, then wait for child. */
  // save the child's pid in the __afl_fork_pid variable
  movl %eax, __afl_fork_pid(%rip)

  // write the child's pid to fd 199
  movq $4, %rdx                   /* length    */
  leaq __afl_fork_pid(%rip), %rsi /* data      */
  movq $(198 + 1), %rdi             /* file desc */
call write@PLT

  movq $0, %rdx                   /* no flags  */
  leaq __afl_temp(%rip), %rsi     /* status    */
  movq __afl_fork_pid(%rip), %rdi /* PID       */
call waitpid@PLT
  cmpq $0, %rax
  jle  __afl_die

  /* Relay wait status to pipe, then loop back. */
  // write to the status pipe to inform the fuzzer
  movq $4, %rdx               /* length    */
  leaq __afl_temp(%rip), %rsi /* data      */
  movq $(198 + 1), %rdi         /* file desc */
call write@PLT

  jmp  __afl_fork_wait_loop

__afl_fork_resume:

  /* In child process: close fds, resume execution. */

  movq $198, %rdi
call close@PLT

  movq $(198 + 1), %rdi
call close@PLT

  popq %rdx
  popq %rdx

  movq %r12, %rsp
  popq %r12

  movq  0(%rsp), %rax
  movq  8(%rsp), %rcx
  movq 16(%rsp), %rdi
  movq 32(%rsp), %rsi
  movq 40(%rsp), %r8
  movq 48(%rsp), %r9
  movq 56(%rsp), %r10
  movq 64(%rsp), %r11

  movq  96(%rsp), %xmm0
  movq 112(%rsp), %xmm1
  movq 128(%rsp), %xmm2
  movq 144(%rsp), %xmm3
  movq 160(%rsp), %xmm4
  movq 176(%rsp), %xmm5
  movq 192(%rsp), %xmm6
  movq 208(%rsp), %xmm7
  movq 224(%rsp), %xmm8
  movq 240(%rsp), %xmm9
  movq 256(%rsp), %xmm10
  movq 272(%rsp), %xmm11
  movq 288(%rsp), %xmm12
  movq 304(%rsp), %xmm13
  movq 320(%rsp), %xmm14
  movq 336(%rsp), %xmm15

  leaq 352(%rsp), %rsp

  jmp  __afl_store

__afl_die:

  xorq %rax, %rax
call _exit@PLT

__afl_setup_abort:

  /* Record setup failure so that we don't keep calling
     shmget() / shmat() over and over again. */

  incb __afl_setup_failure(%rip)

  movq %r12, %rsp
  popq %r12

  movq  0(%rsp), %rax
  movq  8(%rsp), %rcx
  movq 16(%rsp), %rdi
  movq 32(%rsp), %rsi
  movq 40(%rsp), %r8
  movq 48(%rsp), %r9
  movq 56(%rsp), %r10
  movq 64(%rsp), %r11

  movq  96(%rsp), %xmm0
  movq 112(%rsp), %xmm1
  movq 128(%rsp), %xmm2
  movq 144(%rsp), %xmm3
  movq 160(%rsp), %xmm4
  movq 176(%rsp), %xmm5
  movq 192(%rsp), %xmm6
  movq 208(%rsp), %xmm7
  movq 224(%rsp), %xmm8
  movq 240(%rsp), %xmm9
  movq 256(%rsp), %xmm10
  movq 272(%rsp), %xmm11
  movq 288(%rsp), %xmm12
  movq 304(%rsp), %xmm13
  movq 320(%rsp), %xmm14
  movq 336(%rsp), %xmm15

  leaq 352(%rsp), %rsp

  jmp __afl_return

.AFL_VARS:

  .lcomm   __afl_area_ptr, 8
  .lcomm   __afl_prev_loc, 8
  .lcomm   __afl_fork_pid, 4
  .lcomm   __afl_temp, 4
  .lcomm   __afl_setup_failure, 1
  .comm    __afl_global_area_ptr, 8, 8

.AFL_SHM_ENV:
  .asciz "__AFL_SHM_ID"

/* --- END --- */
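
The three instructions under __afl_store are easier to follow when written out in C. The sketch below is my own rendition under stated assumptions, not code from afl-as.h: log_edge and MAP_SIZE_SKETCH are hypothetical names, the map is assumed to hold 1 << 16 entries, and the state that the assembly keeps in __afl_area_ptr and __afl_prev_loc is passed in as parameters.

```c
#include <stdint.h>

#define MAP_SIZE_SKETCH (1u << 16)   /* assumed map size for this sketch */

/* Hypothetical C rendition of __afl_store: record one hit for the edge
   (previous block -> current block). */
static void log_edge(uint8_t *area, uint16_t *prev_loc, uint16_t cur_loc) {
  area[cur_loc ^ *prev_loc]++;   /* first xorq + incb: index = cur ^ prev  */
  *prev_loc = cur_loc >> 1;      /* second xorq + shrq: prev = cur >> 1    */
}
```

Shifting the stored location by one bit keeps the direction of an edge visible: A -> B and B -> A land in different map entries, and a block that jumps to itself does not collapse to index 0.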

afl-tmin

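afl-tmin is a simple test case minimizer. It takes an input file and tries to remove as much data as possible while keeping the binary in a crashing state or producing consistent instrumentation output (the mode is chosen automatically based on the behavior first observed)

In other words, afl-tmin shrinks a test case while preserving its ability to trigger the same code path or behavior

The core of main is: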

setup_shm();
setup_signal_handlers();
set_up_environment();

find_binary(argv[optind]);
detect_file_args(argv + optind);
read_initial_file();
run_target(use_argv, in_data, in_len, 1);
minimize(use_argv);
unlink(prog_in);

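How main works

setup_shm() initializes the shared memory (shm)
setup_signal_handlers() installs signal handlers so that the program can shut down gracefully on signals such as a user interrupt (Ctrl+C) instead of dying abruptly, which avoids data loss or file corruption
set_up_environment() configures the environment, such as the location of the temporary file, the options for the address/memory sanitizers (ASAN/MSAN) used to detect potential security problems, and possibly preloading a library (through the LD_PRELOAD environment variable)
find_binary(argv[optind]) locates and validates the target program (binary) to be run. argv[optind] is the element of the command-line argument array that holds the target program's path
detect_file_args(argv + optind); scans the command-line arguments and handles the @@ placeholder, replacing it with the actual input file location
read_initial_file(); reads the initial input file, whose content is the starting point of the minimization
run_target(use_argv, in_data, in_len, 1); performs a dry run of the target with the given arguments and input data, to check for timeouts or crashes and so make sure the later minimization is sound
minimize(use_argv); runs the core minimization logic, trying to shrink the input file without changing its ability to trigger the same code path or behavior
unlink(prog_in); finally deletes the temporary input file (prog_in)

The key function is clearly minimize. run_target also matters because it contains the hit count mechanism, so these two functions are analyzed in detail below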

run_target

/* Execute target application. Returns 0 if the changes are a dud, or
   1 if they should be kept. */

static u8 run_target(char** argv, u8* mem, u32 len, u8 first_run) {

  // preparation before the run
  static struct itimerval it;
  int status = 0;

  s32 prog_in_fd;
  u32 cksum;

  memset(trace_bits, 0, MAP_SIZE);
  MEM_BARRIER();

  prog_in_fd = write_to_file(prog_in, mem, len);

  child_pid = fork();

  if (child_pid < 0) PFATAL("fork() failed");

  if (!child_pid) {
    // child process code
    struct rlimit r;
    // stdin points to the input file (when use_stdin is set); stdout and stderr point to /dev/null
    if (dup2(use_stdin ? prog_in_fd : dev_null_fd, 0) < 0 ||
        dup2(dev_null_fd, 1) < 0 ||
        dup2(dev_null_fd, 2) < 0) {

      *(u32*)trace_bits = EXEC_FAIL_SIG;
      PFATAL("dup2() failed");
    }

    close(dev_null_fd);
    close(prog_in_fd);

    setsid();

    // set the memory limit
    if (mem_limit) {

      r.rlim_max = r.rlim_cur = ((rlim_t)mem_limit) << 20;

    #ifdef RLIMIT_AS
        setrlimit(RLIMIT_AS, &r); /* Ignore errors */
    #else
        setrlimit(RLIMIT_DATA, &r); /* Ignore errors */
    #endif /* ^RLIMIT_AS */

    }

    r.rlim_max = r.rlim_cur = 0;
    setrlimit(RLIMIT_CORE, &r); /* Ignore errors */

    // execv the target program
    execv(target_path, argv);

    // reaching this point means execv failed; set the first four bytes of shm to 0xfee1dead
    *(u32*)trace_bits = EXEC_FAIL_SIG;
    exit(0);
  }


  close(prog_in_fd);
  // set the timeout
  child_timed_out = 0;
  it.it_value.tv_sec = (exec_tmout / 1000);
  it.it_value.tv_usec = (exec_tmout % 1000) * 1000;
  // the SIGALRM handler was set up earlier to kill child_pid
  setitimer(ITIMER_REAL, &it, NULL);
  if (waitpid(child_pid, &status, 0) <= 0) FATAL("waitpid() failed");

  // cancel the timer
  child_pid = 0;
  it.it_value.tv_sec = 0;
  it.it_value.tv_usec = 0;
  setitimer(ITIMER_REAL, &it, NULL);

  MEM_BARRIER();
  // now analyze the shm

  // did execv succeed?
  if (*(u32*)trace_bits == EXEC_FAIL_SIG)
    FATAL("Unable to execute '%s'", argv[0]);

  // bucket the hit counts
  classify_counts(trace_bits);
  apply_mask((u32*)trace_bits, (u32*)mask_bitmap);
  total_execs++;

  if (stop_soon) {
    SAYF(cRST cLRD "\n+++ Minimization aborted by user +++\n" cRST);
    close(write_to_file(out_file, in_data, in_len));
    exit(1);
  }

  // on a timeout return 0, meaning this input should be ignored
  /* Always discard inputs that time out. */
  if (child_timed_out) {
    missed_hangs++;
    return 0;
  }


  if (WIFSIGNALED(status) ||
      (WIFEXITED(status) && WEXITSTATUS(status) == MSAN_ERROR) ||
      (WIFEXITED(status) && WEXITSTATUS(status) && exit_crash)) {
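    // the target program crashed

    // a crash on the very first run means crash mode should be used
    if (first_run) crash_mode = 1;

    if (crash_mode) {
      // in crash mode, if the crash path need not match the original input, report the input as valid right away and keep it
      if (!exact_mode) return 1;
    } else {
      // non-crash mode, but the target crashed: this input takes a different path from the original, so discard it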
      missed_crashes++;
      return 0;
    }

  } else if (crash_mode) {
    // the target did not crash but we are in crash mode: this input takes a different path from the original, so discard it
    missed_paths++;
    return 0;
  }

  // hash the shm; note that by now it has already been through hit count bucketing
  cksum = hash32(trace_bits, MAP_SIZE, HASH_CONST);

  if (first_run) orig_cksum = cksum;

  // if the shm hash of this input matches that of the original input, the input is valid and is kept
  if (orig_cksum == cksum) return 1;

  missed_paths++;
  return 0;
}

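The parent process creates a child, and the child executes the target program. After the child finishes, the parent reads the shm the child wrote to and applies the hit count mechanism to it in order to decide whether this input is valid: an invalid input is discarded, a valid one is kept. (shm is the region AFL uses to record which code paths were exercised during execution. Its size is MAP_SIZE, 64 KB by default. Each byte is a hit counter for one location: it holds the number of times that location was hit, and stays 0 if the location was never reached.)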
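
The keep-or-discard decision at the end of run_target can be condensed into a pure function, which makes the combinations easier to check. This is a simplified sketch, not the original code: keep_candidate is a hypothetical name, the flags are passed in as parameters instead of being globals, and the first_run bookkeeping (which sets crash_mode and orig_cksum) is left out.

```c
#include <stdint.h>

/* Hypothetical distillation of the tail of run_target(): decide whether a
   candidate input should be kept (1) or discarded (0). */
static int keep_candidate(int timed_out, int crashed, int crash_mode,
                          int exact_mode, uint32_t cksum, uint32_t orig_cksum) {
  if (timed_out) return 0;                 /* hangs are always discarded   */

  if (crashed) {
    if (!crash_mode) return 0;             /* original input did not crash */
    if (!exact_mode) return 1;             /* any crash is good enough     */
  } else if (crash_mode) {
    return 0;                              /* the crash was lost           */
  }

  return cksum == orig_cksum;              /* same bucketed coverage map?  */
}
```

Only two situations reach the checksum comparison: non-crash mode with a clean exit, and crash mode with exact_mode set.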

Hit count bucketing

Key code:

classify_counts(trace_bits);
apply_mask((u32*)trace_bits, (u32*)mask_bitmap);
// ...
cksum = hash32(trace_bits, MAP_SIZE, HASH_CONST);

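The code above produces the hash of a run. Start with classify_counts: it buckets the hit counts in shm, replacing each hit count with the value that stands for its bucket

AFL uses 8 hit count buckets. When an input exercises a location, hit counts falling in the ranges 1, 2, 3, 4-7, 8-15, 16-31, 32-127, and 128+ are treated as different kinds of hits and land in different buckets; a count of 0 means the location was not hit at all.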

/* Classify tuple counts. This is a slow & naive version, but good enough here. */

static const u8 count_class_lookup[256] = {

  [0]           = 0,
  [1]           = 1,
  [2]           = 2,
  [3]           = 4,
  [4 ... 7]     = 8,
  [8 ... 15]    = 16,
  [16 ... 31]   = 32,
  [32 ... 127]  = 64,
  [128 ... 255] = 128

};

static void classify_counts(u8* mem) {
  u32 i = MAP_SIZE;

  if (edges_only) {
    // with the -e switch on, any hit counts the same, no matter how many times it occurred
    while (i--) {
      if (*mem) *mem = 1;
      mem++;
    }
  } else {
    // replace the hit count with the value of its bucket
    while (i--) {
      *mem = count_class_lookup[*mem];
      mem++;
    }
  }
}

Finally, a digest (hash) of the shm is computed to represent it; this is done by the hash32 function.
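
The practical effect of bucketing is that small variations in loop counts do not change the hash. The sketch below illustrates this with two hypothetical helpers of my own, bucket_of and same_after_bucketing; bucket_of is a portable equivalent of count_class_lookup written without the GNU range designators.

```c
#include <stddef.h>
#include <stdint.h>

/* Portable equivalent of count_class_lookup (no GNU range designators). */
static uint8_t bucket_of(uint8_t hits) {
  if (hits == 0)   return 0;
  if (hits == 1)   return 1;
  if (hits == 2)   return 2;
  if (hits == 3)   return 4;
  if (hits <= 7)   return 8;
  if (hits <= 15)  return 16;
  if (hits <= 31)  return 32;
  if (hits <= 127) return 64;
  return 128;
}

/* Hypothetical helper: do two raw traces look identical after bucketing? */
static int same_after_bucketing(const uint8_t *a, const uint8_t *b, size_t n) {
  for (size_t i = 0; i < n; i++)
    if (bucket_of(a[i]) != bucket_of(b[i])) return 0;
  return 1;
}
```

A location hit 5 times in one run and 7 times in another falls into the same bucket, so both runs hash identically; going from 7 to 8 hits crosses a bucket boundary and counts as a change.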

minimize

Its structure is:

/* Actually minimize! */

static void minimize(char** argv) {

next_pass:
  // block deletion
  
  // alphabet minimization

  // character minimization

  if (changed_any) goto next_pass;

finalize_all:

  // print the results
}

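The techniques used by minimize:

Block normalization: the input is first divided into roughly 128 blocks (each block length is a power of two, except for the last block). For every block that is not already made up of '0' characters, afl-tmin tries replacing it with '0' characters, and keeps the change if the target program's behavior is unaffected.
Block deletion: the input is first divided into 16 blocks; the block size is then reduced step by step, removing as many chunks of data as possible as long as doing so does not change the target program's behavior.
Alphabet minimization: reduce the number of distinct characters (256 possible values) used in the input, replacing characters that do not affect program behavior with a fixed character such as '0'.
Character minimization: scan from front to back and check each character individually; whenever it can safely be replaced with a simple character such as '0', replace it.
Repeat block deletion, alphabet minimization, and character minimization until nothing changes any more, making sure every accepted modification leaves the file smaller or simpler while still triggering the same program behavior.
Final statistics and output: compute and report the effect of minimization on file size, how many characters were simplified, and how many test runs were executed.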
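
The block deletion stage is the one that actually shortens the file, so here is a rough sketch of it. This is a simplified illustration under my own assumptions, not the code from afl-tmin.c: delete_blocks and has_x are hypothetical names, the still_ok callback stands in for run_target, the scratch buffer is fixed at 4096 bytes, and the original's shortcut for skipping repeated identical blocks is omitted.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Smallest power of two >= val (returns 1 for 0). */
static size_t next_p2(size_t val) {
  size_t ret = 1;
  while (val > ret) ret <<= 1;
  return ret;
}

/* Hypothetical sketch of block deletion. still_ok() stands in for
   run_target(): it returns 1 when the shortened input behaves like the
   original. Shrinks buf in place and returns the new length. */
static size_t delete_blocks(uint8_t *buf, size_t len,
                            int (*still_ok)(const uint8_t *, size_t)) {
  uint8_t tmp[4096];                        /* sketch-only size limit */
  if (len > sizeof(tmp)) return len;

  for (size_t del_len = next_p2(len / 16); ; del_len /= 2) {
    size_t pos = 0;

    while (pos < len) {
      size_t end  = pos + del_len < len ? pos + del_len : len;
      size_t tail = len - end;

      memcpy(tmp, buf, pos);                /* head */
      memcpy(tmp + pos, buf + end, tail);   /* tail */

      if (still_ok(tmp, pos + tail)) {
        memcpy(buf, tmp, pos + tail);       /* keep the deletion */
        len = pos + tail;
      } else {
        pos += del_len;                     /* block is needed, move on */
      }
    }

    if (del_len <= 1) break;                /* finished the 1-byte pass */
  }
  return len;
}

/* Example predicate: the "behavior" to preserve is containing an 'X'. */
static int has_x(const uint8_t *p, size_t n) {
  return memchr(p, 'X', n) != NULL;
}
```

With has_x as the predicate, any input containing a single 'X' shrinks down to that one byte: large blocks are tried first to save executions, and the final one-byte pass removes whatever the coarser passes could not.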

References

Many thanks to those who wrote excellent analyses before me! They helped my study a great deal


https://k3ppf0r.github.io/2024/12/31/安全研究/Fuzz/初窥门径：AFL源码研究/

Author

k3ppf0r

Published on

December 31, 2024
