Mastering the Key Points: A Study of the AFL Source Code

Last updated in the early hours of January 1, 2025

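AFL is the pioneer among fuzzing tools, and I believe its source code is well worth studying. This article records my reading of the key points and the overall flow of the AFL source. My understanding may contain mistakes; corrections are welcome.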

Annotated source code project:

`https://github.com/k3ppf0r/AFL`

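Project structure

The focus should be on the core programs in the AFL root directory, such as afl-fuzz.c, afl-gcc.c, and afl-tmin.c. Standalone features such as llvm_mode and qemu_mode live in their own directories. afl-fuzz.c is the heart of the project, at roughly 8k lines of code.

Main functions of the project modules: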

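Instrumentation module
The instrumentation module inserts code into the target program to collect information about execution paths, which is the foundation of AFL's fuzzing. AFL offers several instrumentation modes to suit different build environments and needs:
- Source-based instrumentation: implemented by afl-as.h, afl-as.c, and afl-gcc.c; it instruments programs built from source, with either gcc or clang as the compiler;
- LLVM instrumentation mode: supported by the files under llvm_mode; it also instruments programs built from source, with clang as the compiler;
- QEMU instrumentation mode: implemented by the qemu_mode module; it instruments binaries, so already compiled applications can be fuzzed without source code.
Fuzzer module
The core fuzzing logic is implemented in afl-fuzz.c. It is the main body of AFL and manages and drives the whole fuzzing process, including generating inputs, monitoring program behavior, and adjusting future test cases based on feedback.
Auxiliary tools module
AFL also ships a set of auxiliary tools that improve fuzzing results and analysis:
- afl-analyze: analyzes a given test case to help identify meaningful data fields
- afl-plot: generates charts that visualize fuzzing status and progress
- afl-tmin: minimizes a test case by removing unnecessary parts, keeping it as small as possible while still effective
- afl-cmin: minimizes a corpus, keeping the most representative samples to reduce redundancy and speed up later fuzzing
- afl-showmap: traces the execution of a single test case and shows the coverage it triggers
- afl-whatsup: summarizes the results of multiple fuzzing instances running in parallel
- afl-gotcpu: checks the current CPU usage of the system
Header files
To support the features above, AFL relies on a few key header files:
- alloc-inl.h: memory allocation and release routines with built-in checks
- config.h: definitions of configuration options
- debug.h: macros related to debugging output
- hash.h: the hash function implementation
- types.h: commonly used data types and macros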

The project's Makefile shows how the build targets are named and configured:


PROGNAME    = afl
# The '#' does not need to be escaped, but escaping it still works, e.g. echo '#define VERSION "2.57b"' | grep '^\#\d\efine VERSION '
VERSION     = $(shell grep '^\#define VERSION ' config.h | cut -d '"' -f2)

PREFIX     ?= /usr/local
BIN_PATH    = $(PREFIX)/bin
HELPER_PATH = $(PREFIX)/lib/afl
DOC_PATH    = $(PREFIX)/share/doc/afl
MISC_PATH   = $(PREFIX)/share/afl

# PROGS intentionally omit afl-as, which gets installed elsewhere.

PROGS       = afl-gcc afl-fuzz afl-showmap afl-tmin afl-gotcpu afl-analyze
SH_PROGS    = afl-plot afl-cmin afl-whatsup

CFLAGS     ?= -O3 -funroll-loops
CFLAGS     += -Wall -D_FORTIFY_SOURCE=2 -g -Wno-pointer-sign \
	      -DAFL_PATH=\"$(HELPER_PATH)\" -DDOC_PATH=\"$(DOC_PATH)\" \
	      -DBIN_PATH=\"$(BIN_PATH)\"

ifneq "$(filter Linux GNU%,$(shell uname))" ""
  LDFLAGS  += -ldl
endif

ifeq "$(findstring clang, $(shell $(CC) --version 2>/dev/null))" ""
  TEST_CC   = afl-gcc
else
  TEST_CC   = afl-clang
endif

COMM_HDR    = alloc-inl.h config.h debug.h types.h

all: test_x86 $(PROGS) afl-as test_build all_done

ifndef AFL_NO_X86

test_x86:
	@echo "[*] Checking for the ability to compile x86 code..."
	@echo 'main() { __asm__("xorb %al, %al"); }' | $(CC) -w -x c - -o .test || ( echo; echo "Oops, looks like your compiler can't generate x86 code."; echo; echo "Don't panic! You can use the LLVM or QEMU mode, but see docs/INSTALL first."; echo "(To ignore this error, set AFL_NO_X86=1 and try again.)"; echo; exit 1 )
	@rm -f .test
	@echo "[+] Everything seems to be working, ready to compile."

else

test_x86:
	@echo "[!] Note: skipping x86 compilation checks (AFL_NO_X86 set)."

endif

afl-gcc: afl-gcc.c $(COMM_HDR) | test_x86
	$(CC) $(CFLAGS) $@.c -o $@ $(LDFLAGS)
	set -e; for i in afl-g++ afl-clang afl-clang++; do ln -sf afl-gcc $$i; done

afl-as: afl-as.c afl-as.h $(COMM_HDR) | test_x86
	$(CC) $(CFLAGS) $@.c -o $@ $(LDFLAGS)
	ln -sf afl-as as

afl-fuzz: afl-fuzz.c $(COMM_HDR) | test_x86
	$(CC) $(CFLAGS) $@.c -o $@ $(LDFLAGS)

afl-showmap: afl-showmap.c $(COMM_HDR) | test_x86
	$(CC) $(CFLAGS) $@.c -o $@ $(LDFLAGS)

afl-tmin: afl-tmin.c $(COMM_HDR) | test_x86
	$(CC) $(CFLAGS) $@.c -o $@ $(LDFLAGS)

afl-analyze: afl-analyze.c $(COMM_HDR) | test_x86
	$(CC) $(CFLAGS) $@.c -o $@ $(LDFLAGS)

afl-gotcpu: afl-gotcpu.c $(COMM_HDR) | test_x86
	$(CC) $(CFLAGS) $@.c -o $@ $(LDFLAGS)

ifndef AFL_NO_X86

test_build: afl-gcc afl-as afl-showmap
	@echo "[*] Testing the CC wrapper and instrumentation output..."
	unset AFL_USE_ASAN AFL_USE_MSAN; AFL_QUIET=1 AFL_INST_RATIO=100 AFL_PATH=. ./$(TEST_CC) $(CFLAGS) test-instr.c -o test-instr $(LDFLAGS)
	./afl-showmap -m none -q -o .test-instr0 ./test-instr < /dev/null
	echo 1 | ./afl-showmap -m none -q -o .test-instr1 ./test-instr
	@rm -f test-instr
	@cmp -s .test-instr0 .test-instr1; DR="$$?"; rm -f .test-instr0 .test-instr1; if [ "$$DR" = "0" ]; then echo; echo "Oops, the instrumentation does not seem to be behaving correctly!"; echo; echo "Please ping <lcamtuf@google.com> to troubleshoot the issue."; echo; exit 1; fi
	@echo "[+] All right, the instrumentation seems to be working!"

else

test_build: afl-gcc afl-as afl-showmap
	@echo "[!] Note: skipping build tests (you may need to use LLVM or QEMU mode)."

endif

Symbolic links are created so that afl-g++, afl-clang, and afl-clang++ all point to afl-gcc
A symbolic link is created so that as points to afl-as

Problems you may encounter

The clangd plugin cannot find constants

Solution:
clangd can read the compile flags from compile_commands.json, which the following commands generate:

`sudo apt install bear`
`bear -- make`

VS Code shortcuts

F12: with the cursor on a variable or function, jump to its definition
Ctrl+click: jump to definition

Key point 1: the compile-time instrumentation process

afl-gcc

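afl-gcc is a wrapper around GCC/clang. The Makefile shows that the variously named compiler commands (afl-g++, afl-clang, afl-clang++) all point to it. Internally it first locates afl-as (the wrapper that preprocesses assembly), then builds a customized set of compiler arguments from the environment variables and the arguments it was given, and finally invokes the downstream compiler, GCC or clang.

The core of the main function is as follows: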

find_as(argv[0]);
edit_params(argc, argv);
execvp(cc_params[0], (char**)cc_params);

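It comes down to three calls:

find_as(argv[0]): locates the assembler path, looking in turn at the environment variable $AFL_PATH, the directory containing argv[0], and the compile-time constant AFL_PATH; by default this resolves to afl-as in the local project directory
edit_params(argc, argv): processes the compiler arguments passed in, stores the result in the cc_params[] array, and decides from argv[0] which compiler to use
execvp(cc_params[0], (char**)cc_params): runs the real compiler, GCC or clang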
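
How the wrapper picks a downstream compiler from argv[0] can be sketched in a few lines. This is a minimal illustration rather than AFL's code: pick_compiler is a hypothetical helper, the afl-gcj case is left out, and the caller is assumed to pass in the values of the AFL_CC and AFL_CXX environment variables (NULL when unset).

```c
#include <string.h>

/* Hypothetical helper (not part of AFL): map the wrapper's own name to the
   downstream compiler. cc_override / cxx_override stand for the values of
   AFL_CC / AFL_CXX and may be NULL. */
const char *pick_compiler(const char *argv0,
                          const char *cc_override,
                          const char *cxx_override) {
  /* Keep only what follows the last '/', as edit_params does. */
  const char *name = strrchr(argv0, '/');
  name = name ? name + 1 : argv0;

  if (!strncmp(name, "afl-clang", 9)) {
    if (!strcmp(name, "afl-clang++"))
      return cxx_override ? cxx_override : "clang++";
    return cc_override ? cc_override : "clang";
  }

  if (!strcmp(name, "afl-g++"))
    return cxx_override ? cxx_override : "g++";

  /* Everything else (afl-gcc included) falls back to the C compiler. */
  return cc_override ? cc_override : "gcc";
}
```

A wrapper would call it as pick_compiler(argv[0], getenv("AFL_CC"), getenv("AFL_CXX")) and put the result in the first slot of the argument vector it hands to execvp.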

edit_params

Let's take a closer look at edit_params:
The source below omits the Apple-specific handling

static void edit_params(u32 argc, char** argv) {

  u8 fortify_set = 0, asan_set = 0;
  u8 *name;

#if defined(__FreeBSD__) && defined(__x86_64__)
  u8 m32_set = 0;
#endif

  cc_params = ck_alloc((argc + 128) * sizeof(u8*));

  name = strrchr(argv[0], '/');
  if (!name) name = argv[0]; else name++;

  if (!strncmp(name, "afl-clang", 9)) {

    clang_mode = 1;

    setenv(CLANG_ENV_VAR, "1", 1);

    if (!strcmp(name, "afl-clang++")) {
      u8* alt_cxx = getenv("AFL_CXX");
      cc_params[0] = alt_cxx ? alt_cxx : (u8*)"clang++";
    } else {
      u8* alt_cc = getenv("AFL_CC");
      cc_params[0] = alt_cc ? alt_cc : (u8*)"clang";
    }

  } else {

    /* With GCJ and Eclipse installed, you can actually compile Java! The
       instrumentation will work (amazingly). Alas, unhandled exceptions do
       not call abort(), so afl-fuzz would need to be modified to equate
       non-zero exit codes with crash conditions when working with Java
       binaries. Meh. */

    if (!strcmp(name, "afl-g++")) {
      u8* alt_cxx = getenv("AFL_CXX");
      cc_params[0] = alt_cxx ? alt_cxx : (u8*)"g++";
    } else if (!strcmp(name, "afl-gcj")) {
      u8* alt_cc = getenv("AFL_GCJ");
      cc_params[0] = alt_cc ? alt_cc : (u8*)"gcj";
    } else {
      u8* alt_cc = getenv("AFL_CC");
      cc_params[0] = alt_cc ? alt_cc : (u8*)"gcc";
    }

  }

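  // Dropped: -B (search path for the components of the compiler toolchain), -integrated-as (use the integrated assembler), -pipe (pass data between compilation stages through pipes)
  // -fsanitize=address: enables AddressSanitizer, which detects memory errors such as buffer overflows and use-after-free.
  // -fsanitize=memory: enables MemorySanitizer, which detects reads of uninitialized memory.
  // _FORTIFY_SOURCE is a macro that enables extra compile-time and run-time checks against common bugs such as buffer overflows.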
  while (--argc) {
    u8* cur = *(++argv);

    if (!strncmp(cur, "-B", 2)) {

      if (!be_quiet) WARNF("-B is already set, overriding");

      if (!cur[2] && argc > 1) { argc--; argv++; }
      continue;

    }

    if (!strcmp(cur, "-integrated-as")) continue;

    if (!strcmp(cur, "-pipe")) continue;

#if defined(__FreeBSD__) && defined(__x86_64__)
    if (!strcmp(cur, "-m32")) m32_set = 1;
#endif

    if (!strcmp(cur, "-fsanitize=address") ||
        !strcmp(cur, "-fsanitize=memory")) asan_set = 1;

    if (strstr(cur, "FORTIFY_SOURCE")) fortify_set = 1;
    // Everything else is appended to the argument list
    cc_params[cc_par_cnt++] = cur;

  }
  // Passing -B as_path makes the downstream compiler look in as_path for its tools, so afl-as replaces the native assembler during the assembly step
  cc_params[cc_par_cnt++] = "-B";
  cc_params[cc_par_cnt++] = as_path;

  if (clang_mode)
    cc_params[cc_par_cnt++] = "-no-integrated-as";

  if (getenv("AFL_HARDEN")) {

    cc_params[cc_par_cnt++] = "-fstack-protector-all";

    if (!fortify_set)
      cc_params[cc_par_cnt++] = "-D_FORTIFY_SOURCE=2";

  }

  if (asan_set) {

    /* Pass this on to afl-as to adjust map density. */

    setenv("AFL_USE_ASAN", "1", 1);

  } else if (getenv("AFL_USE_ASAN")) {

    if (getenv("AFL_USE_MSAN"))
      FATAL("ASAN and MSAN are mutually exclusive");

    if (getenv("AFL_HARDEN"))
      FATAL("ASAN and AFL_HARDEN are mutually exclusive");

    cc_params[cc_par_cnt++] = "-U_FORTIFY_SOURCE";
    cc_params[cc_par_cnt++] = "-fsanitize=address";

  } else if (getenv("AFL_USE_MSAN")) {

    if (getenv("AFL_USE_ASAN"))
      FATAL("ASAN and MSAN are mutually exclusive");

    if (getenv("AFL_HARDEN"))
      FATAL("MSAN and AFL_HARDEN are mutually exclusive");

    cc_params[cc_par_cnt++] = "-U_FORTIFY_SOURCE";
    cc_params[cc_par_cnt++] = "-fsanitize=memory";


  }

  if (!getenv("AFL_DONT_OPTIMIZE")) {

#if defined(__FreeBSD__) && defined(__x86_64__)

    /* On 64-bit FreeBSD systems, clang -g -m32 is broken, but -m32 itself
       works OK. This has nothing to do with us, but let's avoid triggering
       that bug. */

    if (!clang_mode || !m32_set)
      cc_params[cc_par_cnt++] = "-g";

#else

      cc_params[cc_par_cnt++] = "-g";

#endif

    cc_params[cc_par_cnt++] = "-O3";
    cc_params[cc_par_cnt++] = "-funroll-loops";

    /* Two indicators that you're building for fuzzing; one of them is
       AFL-specific, the other is shared with libfuzzer. */

    cc_params[cc_par_cnt++] = "-D__AFL_COMPILER=1";
    cc_params[cc_par_cnt++] = "-DFUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION=1";

  }
  // -fno-builtin-<func> tells the compiler not to replace <func> with its built-in version, so the library implementation (e.g. glibc) is called instead
  if (getenv("AFL_NO_BUILTIN")) {

    cc_params[cc_par_cnt++] = "-fno-builtin-strcmp";
    cc_params[cc_par_cnt++] = "-fno-builtin-strncmp";
    cc_params[cc_par_cnt++] = "-fno-builtin-strcasecmp";
    cc_params[cc_par_cnt++] = "-fno-builtin-strncasecmp";
    cc_params[cc_par_cnt++] = "-fno-builtin-memcmp";
    cc_params[cc_par_cnt++] = "-fno-builtin-strstr";
    cc_params[cc_par_cnt++] = "-fno-builtin-strcasestr";

  }

  cc_params[cc_par_cnt] = NULL;

}

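The logic is as follows:

First, the compiler is chosen:
- argv[0] is parsed and the variable name is extracted (the string after the last /). It is then compared against the fixed prefix afl-clang to decide which downstream compiler to call.
- For example, if argv[0] is /afl/afl-clang, the corresponding branch first reads the environment variable AFL_CC. If it is set, its value goes into cc_params[0]; otherwise cc_params[0] is set to clang. AFL thus lets users choose the downstream compiler: if AFL_CC, AFL_CXX, or a similar variable is set, it overrides the default compiler.
Next, the while loop walks through the arguments starting at argv[1]:
- -B: this option sets the compiler's search path and is dropped (find_as has already set as_path); when it is written as "-B path", the separate path argument is dropped too
- -integrated-as: dropped
- -pipe: dropped
- -fsanitize=address or -fsanitize=memory: these ask the compiler to check for memory access errors such as out-of-bounds array accesses; asan_set = 1 is set
- An argument containing FORTIFY_SOURCE: fortify_set = 1 is set. FORTIFY_SOURCE mainly checks for buffer overflows; commonly checked functions include memcpy, mempcpy, memmove, memset, strcpy, stpcpy, strncpy, strcat, strncat, sprintf, vsprintf, snprintf, gets, and others
- Every argument that was not dropped is appended to cc_params
After the while loop, the remaining parameters are set:
- -B as_path is added
- In clang_mode, -no-integrated-as is added
- If the environment variable AFL_HARDEN exists, -fstack-protector-all is added; if fortify_set is not set, -D_FORTIFY_SOURCE=2 is appended as well
Sanitizer and optimization parameters are decided by a chain of if / else if checks:
- if + else if:
  - If asan_set was set to 1 earlier, i.e. -fsanitize=memory or -fsanitize=address was passed by hand, the environment variable AFL_USE_ASAN="1" is set
  - If asan_set is not 1 and the AFL_USE_ASAN environment variable exists,
    -U_FORTIFY_SOURCE -fsanitize=address is added
  - If AFL_USE_ASAN does not exist but AFL_USE_MSAN does,
    -U_FORTIFY_SOURCE -fsanitize=memory is added
    AFL_USE_ASAN and AFL_USE_MSAN cannot be set together, and neither can be combined with AFL_HARDEN; the code aborts with FATAL in these cases (my understanding is that the combination would run too slowly)
- If the AFL_DONT_OPTIMIZE environment variable does not exist,
  -g -O3 -funroll-loops -D__AFL_COMPILER=1 -DFUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION=1 is added
- If the AFL_NO_BUILTIN environment variable exists, the compiler is told not to use its built-in versions of the following string and memory comparison functions, by adding
  -fno-builtin-strcmp -fno-builtin-strncmp -fno-builtin-strcasecmp -fno-builtin-strncasecmp -fno-builtin-memcmp -fno-builtin-strstr -fno-builtin-strcasestr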
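
The argument scan in the while loop can be illustrated with a small standalone sketch. The names filter_args and scan_result are hypothetical; only the comparisons themselves are taken from the source above. The caller is assumed to provide an output array with room for at least argc entries.

```c
#include <string.h>

/* Flags noticed while scanning (field names follow the source). */
struct scan_result {
  int asan_set;     /* saw -fsanitize=address or -fsanitize=memory */
  int fortify_set;  /* saw an argument mentioning FORTIFY_SOURCE   */
  int out_cnt;      /* number of arguments kept in out[]           */
};

/* Hypothetical helper: copy argv[1..argc-1] into out[], dropping the
   arguments the wrapper manages itself. */
struct scan_result filter_args(int argc, char **argv, const char **out) {
  struct scan_result r = {0, 0, 0};

  for (int i = 1; i < argc; i++) {
    const char *cur = argv[i];

    if (!strncmp(cur, "-B", 2)) {
      /* "-B dir" occupies two slots, "-Bdir" only one. */
      if (!cur[2] && i + 1 < argc) i++;
      continue;
    }

    if (!strcmp(cur, "-integrated-as") || !strcmp(cur, "-pipe")) continue;

    if (!strcmp(cur, "-fsanitize=address") ||
        !strcmp(cur, "-fsanitize=memory")) r.asan_set = 1;

    if (strstr(cur, "FORTIFY_SOURCE")) r.fortify_set = 1;

    /* Sanitizer and FORTIFY arguments are recorded but still passed on. */
    out[r.out_cnt++] = cur;
  }

  return r;
}
```

Keeping the scan separate from the code that appends the wrapper's own flags makes it easy to see which user arguments survive untouched.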

afl-as

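One purpose of the afl-gcc wrapper is to replace the native GNU as with afl-as.

afl-as is itself a wrapper around the native GNU as. It preprocesses the assembly files generated by GCC/clang and injects the instrumentation code contained in afl-as.h. In the edit_params function above, the -B as_path argument has already placed it into the compilation toolchain.

The core of the main function is as follows: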

u8* inst_ratio_str = getenv("AFL_INST_RATIO");
struct timeval tv;
struct timezone tz;
gettimeofday(&tv, &tz);
rand_seed = tv.tv_sec ^ tv.tv_usec ^ getpid();
// Initialize the random number seed
srandom(rand_seed);
edit_params(argc, argv);
// Instrument the assembly instruction stream
if (!just_version) add_instrumentation();
// Child process: run the real assembler
if (!(pid = fork())) {
  execvp(as_params[0], (char**)as_params);
  FATAL("Oops, failed to execute '%s' - check your PATH", as_params[0]);

}
if (pid < 0) PFATAL("fork() failed");
if (waitpid(pid, &status, 0) <= 0) PFATAL("waitpid() failed");
if (!getenv("AFL_KEEP_ASSEMBLY")) unlink(modified_file);
exit(WEXITSTATUS(status));

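Logic of the main function

gettimeofday(&tv, &tz) fetches the current time and time zone; the seed passed to srandom() is tv.tv_sec ^ tv.tv_usec ^ getpid()
edit_params is called to process the arguments. It determines the name of the as program (GNU as by default, though the user can override it with AFL_AS) and sets the temporary file path modified_file to /tmp/.afl-pid-timestamp.s
add_instrumentation() is called; this is the function that actually instruments
A child process is forked to run execvp(as_params[0], (char**)as_params). The real assembler runs in a child because execvp completely replaces the program in the current process with as_params[0]; if the real as were executed without a child process, nothing would be left to unlink modified_file once as has finished
waitpid(pid, &status, 0) waits for the child process to finish
If AFL_KEEP_ASSEMBLY is not set, modified_file (the instrumented file) is unlinked. Setting this environment variable, for example to 1, stops afl-as from deleting the instrumented assembly file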
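
The fork / execvp / waitpid / unlink sequence described above is a general POSIX pattern. The following is a minimal sketch under stated assumptions: run_then_cleanup is a hypothetical helper, and it reports errors through its return value instead of aborting as afl-as does.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv[0] in a child process, wait for it, then
   delete tmp_file unless keep is non-zero. Returns the child's exit code,
   or -1 if fork() or waitpid() fails or the child did not exit normally. */
int run_then_cleanup(char *const argv[], const char *tmp_file, int keep) {
  pid_t pid = fork();
  if (pid < 0) return -1;

  if (pid == 0) {
    /* Child: execvp only returns when it failed. */
    execvp(argv[0], argv);
    _exit(127);
  }

  /* Parent: it was not replaced by exec, so it can still clean up. */
  int status;
  if (waitpid(pid, &status, 0) <= 0) return -1;

  if (!keep) unlink(tmp_file);

  return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The keep parameter plays the role that the AFL_KEEP_ASSEMBLY check plays in afl-as.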

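add_instrumentation: instrumentation logic

If input_file is set, the function tries to open it and assigns the result to inf, aborting with a fatal error on failure; if input_file is not set, inf is set to standard input;
modified_file is opened and its fd is assigned to outfd, aborting on failure; the function then checks that the file can be written to, again aborting on failure;

A while loop reads the file behind inf line by line into the line array, at most MAX_LINE = 8192 bytes per line (including the trailing '\0'), writes the content of line to the file behind outf, and then enters the actual instrumentation logic. Note that instrumentation is only inserted into the .text section:

Labels, macros, and comments are skipped first;

The key code is explained below. The variable instr_ok is essentially a flag that says whether the current position is inside the .text section: 1 means inside .text, any other value means outside. So when instr_ok is 1 the instrumentation logic runs at branches; otherwise nothing is inserted.

The code first checks whether the line starts with '\t' and whether line[1] is '.', which in effect matches the section directives declared in the .s file: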

if (line[0] == '\t' && line[1] == '.') {
 
      /* OpenBSD puts jump tables directly inline with the code, which is
         a bit annoying. They use a specific format of p2align directives
         around them, so we use that as a signal. */
 
      if (!clang_mode && instr_ok && !strncmp(line + 2, "p2align ", 8) &&
          isdigit(line[10]) && line[11] == '\n') skip_next_label = 1;
 
      if (!strncmp(line + 2, "text\n", 5) ||
          !strncmp(line + 2, "section\t.text", 13) ||
          !strncmp(line + 2, "section\t__TEXT,__text", 21) ||
          !strncmp(line + 2, "section __TEXT,__text", 21)) {
        instr_ok = 1;
        continue;
      }
 
      if (!strncmp(line + 2, "section\t", 8) ||
          !strncmp(line + 2, "section ", 8) ||
          !strncmp(line + 2, "bss\n", 4) ||
          !strncmp(line + 2, "data\n", 5)) {
        instr_ok = 0;
        continue;
      }
 
    }

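If the line starts with '\t' and line[1] == '.', the code checks for a p2align directive (only outside clang mode and while instr_ok is set); on a match it sets skip_next_label = 1;
It then tries to match any of "text\n", "section\t.text", "section\t__TEXT,__text", "section __TEXT,__text"; on a match it sets instr_ok = 1, meaning the position is inside .text, and continues with the next iteration;
It then tries to match any of "section\t", "section ", "bss\n", "data\n"; on a match it sets instr_ok = 0, meaning the position is in some other section, and continues with the next iteration;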
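
The section tracking can be isolated into a tiny state-update function. This is a sketch only: update_instr_ok is a hypothetical name, the p2align handling is left out, and the string comparisons are copied from the excerpt above.

```c
#include <string.h>

/* Hypothetical helper: return the new value of instr_ok after seeing one
   line of assembly. */
int update_instr_ok(const char *line, int instr_ok) {
  /* Only lines of the form "\t.<directive>" can change the state. */
  if (line[0] != '\t' || line[1] != '.') return instr_ok;

  const char *d = line + 2;  /* directive name without the leading "\t." */

  if (!strncmp(d, "text\n", 5) ||
      !strncmp(d, "section\t.text", 13) ||
      !strncmp(d, "section\t__TEXT,__text", 21) ||
      !strncmp(d, "section __TEXT,__text", 21)) return 1;

  if (!strncmp(d, "section\t", 8) ||
      !strncmp(d, "section ", 8) ||
      !strncmp(d, "bss\n", 4) ||
      !strncmp(d, "data\n", 5)) return 0;

  return instr_ok;  /* any other directive leaves the state unchanged */
}
```

The order of the two groups matters: a .text section directive also starts with "section\t", so the .text patterns have to be tested first.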

Next, several if checks set flags that cover off-flavor assembly, Intel vs. AT&T syntax blocks, and ad-hoc __asm__ blocks:

/* Detect off-flavor assembly (rare, happens in gdb). When this is
   encountered, we set skip_csect until the opposite directive is
   seen, and we do not instrument. */
 
if (strstr(line, ".code")) {
 
  if (strstr(line, ".code32")) skip_csect = use_64bit;
  if (strstr(line, ".code64")) skip_csect = !use_64bit;
 
}
 
/* Detect syntax changes, as could happen with hand-written assembly.
   Skip Intel blocks, resume instrumentation when back to AT&T. */
 
if (strstr(line, ".intel_syntax")) skip_intel = 1;
if (strstr(line, ".att_syntax")) skip_intel = 0;
 
/* Detect and skip ad-hoc __asm__ blocks, likewise skipping them. */
 
if (line[0] == '#' || line[1] == '#') {
 
  if (strstr(line, "#APP")) skip_app = 1;
  if (strstr(line, "#NO_APP")) skip_app = 0;
 
}
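
The three skip flags form a small piece of state that is updated line by line. The sketch below groups them into a struct to make that explicit; skip_flags, update_skip_flags, and may_instrument are hypothetical names, and AFL itself keeps the flags as separate local variables.

```c
#include <string.h>

struct skip_flags {
  int skip_csect;  /* inside a .code32/.code64 block of the other width */
  int skip_intel;  /* inside an .intel_syntax block                     */
  int skip_app;    /* inside an inline-asm #APP ... #NO_APP block       */
};

/* Hypothetical helper: update the three flags from one assembly line. */
void update_skip_flags(const char *line, int use_64bit, struct skip_flags *s) {
  if (strstr(line, ".code")) {
    if (strstr(line, ".code32")) s->skip_csect = use_64bit;
    if (strstr(line, ".code64")) s->skip_csect = !use_64bit;
  }

  if (strstr(line, ".intel_syntax")) s->skip_intel = 1;
  if (strstr(line, ".att_syntax"))   s->skip_intel = 0;

  /* The extra line[0] test avoids reading past an empty string. */
  if (line[0] == '#' || (line[0] && line[1] == '#')) {
    if (strstr(line, "#APP"))    s->skip_app = 1;
    if (strstr(line, "#NO_APP")) s->skip_app = 0;
  }
}

/* A line is eligible for instrumentation only when no flag is set. */
int may_instrument(const struct skip_flags *s) {
  return !s->skip_csect && !s->skip_intel && !s->skip_app;
}
```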

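When instrumenting, AFL focuses on ^main, ^.L0, ^.LBB0_0, and ^\tjnz foo (function entry labels such as main, GCC and clang branch labels, and conditional jump instructions). These usually mark changes in control flow, so AFL instruments at these positions:

For instructions of the form \tj[^m]., i.e. conditional jumps, if the random number produced by R(100) is less than the instrumentation density inst_ratio, fprintf writes trampoline_fmt_64 or trampoline_fmt_32 (the instrumentation instructions, chosen by use_64bit) to the file behind outf. The format argument is R(MAP_SIZE), a random number below MAP_SIZE that identifies this instrumentation point. The counter ins_lines is then incremented. Whether or not it was instrumented, a line starting with '\t' ends with continue, and the loop moves on to the next iteration;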

/* If we're in the right mood for instrumenting, check for function
   names or conditional labels. This is a bit messy, but in essence,
   we want to catch:
 
     ^main:      - function entry point (always instrumented)
     ^.L0:       - GCC branch label
     ^.LBB0_0:   - clang branch label (but only in clang mode)
     ^\tjnz foo  - conditional branches
 
   ...but not:
 
     ^# BB#0:    - clang comments
     ^ # BB#0:   - ditto
     ^.Ltmp0:    - clang non-branch labels
     ^.LC0       - GCC non-branch labels
     ^.LBB0_0:   - ditto (when in GCC mode)
     ^\tjmp foo  - non-conditional jumps
 
   Additionally, clang and GCC on MacOS X follow a different convention
   with no leading dots on labels, hence the weird maze of #ifdefs
   later on.
 
 */
 
if (skip_intel || skip_app || skip_csect || !instr_ok ||
    line[0] == '#' || line[0] == ' ') continue;
 
/* Conditional branch instruction (jnz, etc). We append the instrumentation
   right after the branch (to instrument the not-taken path) and at the
   branch destination label (handled later on). */
 
if (line[0] == '\t') {
 
  if (line[1] == 'j' && line[2] != 'm' && R(100) < inst_ratio) {
 
    fprintf(outf, use_64bit ? trampoline_fmt_64 : trampoline_fmt_32,
            R(MAP_SIZE));
 
    ins_lines++;
 
  }
 
  continue;
 
}
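
The decision made for a tab-indented line can be sketched as two small functions. The names is_cond_branch, pick_trampoline_id, and MAP_SIZE_SKETCH are hypothetical; the sketch uses the standard rand() where AFL's R() macro uses random(), and it assumes the default map size of 64 kB.

```c
#include <stdlib.h>

#define MAP_SIZE_SKETCH (1 << 16)  /* assumption: AFL's default map size */

/* Hypothetical helper: "\tj" followed by anything except 'm', so every
   jump mnemonic qualifies apart from jmp. */
int is_cond_branch(const char *line) {
  return line[0] == '\t' && line[1] == 'j' && line[2] != 'm';
}

/* Hypothetical helper: returns 1 and stores a random ID in *id when a
   trampoline should follow this line, 0 otherwise. inst_ratio is a
   percentage in the range 0..100. */
int pick_trampoline_id(const char *line, unsigned inst_ratio, unsigned *id) {
  if (!is_cond_branch(line)) return 0;

  /* Instrument only inst_ratio percent of the candidates. */
  if ((unsigned)(rand() % 100) >= inst_ratio) return 0;

  *id = (unsigned)(rand() % MAP_SIZE_SKETCH);
  return 1;
}
```

With inst_ratio at 100 every conditional jump is instrumented, and with 0 none is, which matches the role of AFL_INST_RATIO in the Makefile's test_build target.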

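Labels need their own evaluation, because some of them may be branch destinations.

The code first checks whether the line contains ':' and then whether it starts with '.'.

If it starts with '.', the line is a candidate branch label such as ^.L0: or ^.LBB0_0:, i.e. a ".L0: or LBB0_0: style jump destination":
1. Check whether line[2] is a digit or, in clang_mode, whether the three bytes starting at line[1] are LBB; then AND that result with R(100) < inst_ratio. If the result is true, set instrument_next = 1 (unless skip_next_label is set, in which case only skip_next_label is cleared);
Otherwise the label is a function, ^func:, a function entry point, and instrument_next = 1 is set directly (deferred mode).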

    /* Label of some sort. This may be a branch destination, but we need to
       tread carefully and account for several different formatting
       conventions. */
 
#ifdef __APPLE__
 
    /* Apple: L<whatever><colon> */
 
    if ((colon_pos = strstr(line, ":"))) {
 
      if (line[0] == 'L' && isdigit(*(colon_pos - 1))) {
 
#else
 
    /* Everybody else: .L<whatever>: */
 
    if (strstr(line, ":")) {
 
      if (line[0] == '.') {
 
#endif /* __APPLE__ */
 
        /* .L0: or LBB0_0: style jump destination */
 
#ifdef __APPLE__
 
        /* Apple: L / LBB */
 
        if ((isdigit(line[1]) || (clang_mode && !strncmp(line, "LBB", 3)))
            && R(100) < inst_ratio) {
 
#else
 
        /* Apple: .L / .LBB */
 
        if ((isdigit(line[2]) || (clang_mode && !strncmp(line + 1, "LBB", 3)))
            && R(100) < inst_ratio) {
 
#endif /* __APPLE__ */
 
          /* An optimization is possible here by adding the code only if the
             label is mentioned in the code in contexts other than call / jmp.
             That said, this complicates the code by requiring two-pass
             processing (messy with stdin), and results in a speed gain
             typically under 10%, because compilers are generally pretty good
             about not generating spurious intra-function jumps.
 
             We use deferred output chiefly to avoid disrupting
             .Lfunc_begin0-style exception handling calculations (a problem on
             MacOS X). */
 
          if (!skip_next_label) instrument_next = 1; else skip_next_label = 0;
 
        }
 
      } else {
 
        /* Function label (always instrumented, deferred mode). */
 
        instrument_next = 1;
 
      }
    }
  }
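
The label handling for the non-Apple case boils down to a three-way classification. The enum and the function classify_label below are hypothetical names used for illustration; the sketch ignores inst_ratio and skip_next_label, and it assumes the line has already passed the earlier checks (it does not start with a tab, '#', or a space).

```c
#include <ctype.h>
#include <string.h>

enum label_kind {
  NOT_A_LABEL,      /* no ':' on the line                        */
  BRANCH_LABEL,     /* .L0: (or .LBB0_0: in clang mode)          */
  OTHER_DOT_LABEL,  /* .LC0:, .Ltmp0: and similar, never marked  */
  FUNCTION_LABEL    /* main: and friends, always marked          */
};

/* Hypothetical helper covering the non-Apple branch only. */
enum label_kind classify_label(const char *line, int clang_mode) {
  if (!strchr(line, ':')) return NOT_A_LABEL;
  if (line[0] != '.') return FUNCTION_LABEL;

  if (isdigit((unsigned char)line[2]) ||
      (clang_mode && !strncmp(line + 1, "LBB", 3)))
    return BRANCH_LABEL;

  return OTHER_DOT_LABEL;
}
```

In the real pass, BRANCH_LABEL and FUNCTION_LABEL both lead to instrument_next = 1, with the branch case additionally gated by the instrumentation ratio.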

Once this is done, the while loop moves on to its next iteration. At the top of the loop, the positions marked for deferred-mode instrumentation are actually instrumented:

if (!pass_thru && !skip_intel && !skip_app && !skip_csect && instr_ok &&
    instrument_next && line[0] == '\t' && isalpha(line[1])) {
 
  fprintf(outf, use_64bit ? trampoline_fmt_64 : trampoline_fmt_32,
          R(MAP_SIZE));
 
  instrument_next = 0;
  ins_lines++;
 
}

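This checks that instr_ok and instrument_next are both 1, i.e. that the position is inside the .text section and a deferred insertion is pending, that none of the skip flags is set, and that the current line is an instruction (a tab followed by a letter). If so, the instrumentation is performed by writing trampoline_fmt_64/32.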
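
To see how the immediate and deferred cases interact, here is a heavily simplified pass over an array of lines. It is a sketch under several assumptions: every line is taken to be inside .text, GCC mode, a 100% instrumentation ratio, no skip flags, and a "# <trampoline>" comment stands in for the real trampoline text. mark_insertion_points is a hypothetical name.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified pass: copies the lines to out and writes a
   marker wherever the real pass would emit a trampoline. Returns the
   number of markers written. */
int mark_insertion_points(const char **lines, int n, FILE *out) {
  int instrument_next = 0, count = 0;

  for (int i = 0; i < n; i++) {
    const char *line = lines[i];

    /* Deferred case: the first real instruction after a marked label. */
    if (instrument_next && line[0] == '\t' &&
        isalpha((unsigned char)line[1])) {
      fputs("# <trampoline>\n", out);
      instrument_next = 0;
      count++;
    }

    fputs(line, out);

    if (line[0] == '#' || line[0] == ' ') continue;

    if (line[0] == '\t') {
      /* Conditional jump: mark the fall-through path right away. */
      if (line[1] == 'j' && line[2] != 'm') {
        fputs("# <trampoline>\n", out);
        count++;
      }
      continue;
    }

    /* Labels: function labels and .L<digit> labels are deferred. */
    if (strchr(line, ':')) {
      if (line[0] != '.' || isdigit((unsigned char)line[2]))
        instrument_next = 1;
    }
  }

  return count;
}
```

Deferring the insertion until the next instruction means directives that directly follow a label, such as .cfi_startproc, stay attached to it.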

This completes the walkthrough of the main logic of the instrumentation function add_instrumentation.

afl-as.h: stub code logic

The best way to study this part is to compile with AFL_KEEP_ASSEMBLY=1 and compare the assembly before and after instrumentation

# afl-gcc
AFL_KEEP_ASSEMBLY=1 AFL_DONT_OPTIMIZE=1 ../afl-gcc test_as.c -o test_as -O0 -fno-asynchronous-unwind-tables
ls -al /tmp/.afl-16830-1734879283.s
cp /tmp/.afl-16830-1734879283.s ./
# native gcc
gcc -S test_as.c -O0 -fno-asynchronous-unwind-tables  -o test_as1.o

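The instrumented assembly has a snippet of code inserted at the entry of every basic block, and the end of the file contains the variable definitions and function implementations of the AFL stub code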
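
AFL's own technical notes (docs/technical_details.txt) summarize what each inserted snippet does as an update of a shared map indexed by the XOR of the current and previous block IDs. The C sketch below restates that pseudocode; the names trace_map, prev_location, on_basic_block, and MAP_SIZE_SKETCH are assumptions made for illustration, and a plain static array stands in for the shared memory region that the real stub attaches to.

```c
#include <stdint.h>

#define MAP_SIZE_SKETCH (1 << 16)  /* assumption: default 64 kB map */

static uint8_t  trace_map[MAP_SIZE_SKETCH];  /* stand-in for shared memory */
static uint16_t prev_location;               /* shifted ID of the last block */

/* What one trampoline amounts to; cur_location is the random ID that was
   baked into the block at instrumentation time. */
void on_basic_block(uint16_t cur_location) {
  trace_map[cur_location ^ prev_location]++;  /* count the edge prev -> cur */
  prev_location = cur_location >> 1;          /* shift so A->B differs from B->A */
}
```

The right shift is what lets the map tell the edge A to B apart from B to A, and a tight loop A to A apart from B to B.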

References

Many thanks to the authors of the excellent earlier analyses! They helped my study a great deal.


https://k3ppf0r.github.io/2024/12/31/安全研究/Fuzz/吃透重点：AFL源码研究/

Author: k3ppf0r

Published: December 31, 2024
