ble55ing's technical column: code analysis, fuzzing techniques and CTF

Instrumentation with clang

2019-04-19
ble55ing


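The clang + LLVM framework

clang is a compiler front end with a modular design, and LLVM (the name originally stood for "Low Level Virtual Machine") is the back end; together they form a complete compiler architecture. clang performs lexical and syntactic analysis and then hands the result to LLVM, which generates the target code.

clang's static analysis is largely expressed through the AST (Abstract Syntax Tree). clang's modular structure is clear and its code is relatively simple, which makes it well suited to extension, so clang was chosen here for source analysis and instrumentation.

The abstract syntax tree (AST)

clang's AST represents the structure and logic of a program well; it is the product of clang's analysis of the source program. The following example shows what this tree looks like in practice. The related files are stored at https://github.com/ble55ing/clang/tree/master/clang-insert

First, the source code. This is a source file containing two functions: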

#include "11.h"
#include <stdio.h>

int ab(int a){
    int b =0;
    if (a==b) printf("in ab");
    return 0;
}
int aa(int a){
    if (a==0) {

	for (;a<1;a++) printf("a==0");
	ab(a);
    }
    else if (a==1) printf("a==1");
    else if (a==2) printf("a==2");
    else if (a==3) printf("a==3");
    else if (a==4) printf("a==4");

    return 5;
}

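Next, the AST generated for it. A dump in this format can be produced with, for example, clang -Xclang -ast-dump -fsyntax-only 11.c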

|-FunctionDecl 0xcc95d58 prev 0xcc3ddb0 <11.c:4:1, line:8:1> line:4:5 used ab 'int (int)'
| |-ParmVarDecl 0xcc95cc8 <col:8, col:12> col:12 used a 'int'
| `-CompoundStmt 0xcc97138 <col:14, line:8:1>
|   |-DeclStmt 0xcc95e90 <line:5:5, col:13>
|   | `-VarDecl 0xcc95e10 <col:5, col:12> col:9 used b 'int' cinit
|   |   `-IntegerLiteral 0xcc95e70 <col:12> 'int' 0
|   |-IfStmt 0xcc96090 <line:6:5, col:29>
|   | |-<<<NULL>>>
|   | |-<<<NULL>>>
|   | |-BinaryOperator 0xcc95f28 <col:9, col:12> 'int' '=='
|   | | |-ImplicitCastExpr 0xcc95ef8 <col:9> 'int' <LValueToRValue>
|   | | | `-DeclRefExpr 0xcc95ea8 <col:9> 'int' lvalue ParmVar 0xcc95cc8 'a' 'int'
|   | | `-ImplicitCastExpr 0xcc95f10 <col:12> 'int' <LValueToRValue>
|   | |   `-DeclRefExpr 0xcc95ed0 <col:12> 'int' lvalue Var 0xcc95e10 'b' 'int'
|   | |-CallExpr 0xcc96030 <col:15, col:29> 'int'
|   | | |-ImplicitCastExpr 0xcc96018 <col:15> 'int (*)(const char *, ...)' <FunctionToPointerDecay>
|   | | | `-DeclRefExpr 0xcc95f50 <col:15> 'int (const char *, ...)' Function 0xcc87a10 'printf' 'int (const char *, ...)'
|   | | `-ImplicitCastExpr 0xcc96078 <col:22> 'const char *' <BitCast>
|   | |   `-ImplicitCastExpr 0xcc96060 <col:22> 'char *' <ArrayToPointerDecay>
|   | |     `-StringLiteral 0xcc95fb8 <col:22> 'char [6]' lvalue "in ab"
|   | `-<<<NULL>>>
|   `-ReturnStmt 0xcc97120 <line:7:5, col:12>
|     `-IntegerLiteral 0xcc97100 <col:12> 'int' 0
`-FunctionDecl 0xcc97210 prev 0xcc3dc28 <line:9:1, line:21:1> line:9:5 aa 'int (int)'
  |-ParmVarDecl 0xcc97180 <col:8, col:12> col:12 used a 'int'
  `-CompoundStmt 0xcc97d00 <col:14, line:21:1>
    |-IfStmt 0xcc97c90 <line:10:5, line:18:33>
    | |-<<<NULL>>>
    | |-<<<NULL>>>
    | |-BinaryOperator 0xcc97310 <line:10:9, col:12> 'int' '=='
    | | |-ImplicitCastExpr 0xcc972f8 <col:9> 'int' <LValueToRValue>
    | | | `-DeclRefExpr 0xcc972b0 <col:9> 'int' lvalue ParmVar 0xcc97180 'a' 'int'
    | | `-IntegerLiteral 0xcc972d8 <col:12> 'int' 0
    | |-CompoundStmt 0xcc97628 <col:15, line:14:5>
    | | |-ForStmt 0xcc97510 <line:12:2, col:30>
    | | | |-<<<NULL>>>
    | | | |-<<<NULL>>>
    | | | |-BinaryOperator 0xcc97398 <col:8, col:10> 'int' '<'
    | | | | |-ImplicitCastExpr 0xcc97380 <col:8> 'int' <LValueToRValue>
    | | | | | `-DeclRefExpr 0xcc97338 <col:8> 'int' lvalue ParmVar 0xcc97180 'a' 'int'
    | | | | `-IntegerLiteral 0xcc97360 <col:10> 'int' 1
    | | | |-UnaryOperator 0xcc973e8 <col:12, col:13> 'int' postfix '++'
    | | | | `-DeclRefExpr 0xcc973c0 <col:12> 'int' lvalue ParmVar 0xcc97180 'a' 'int'
    | | | `-CallExpr 0xcc974b0 <col:17, col:30> 'int'
    | | |   |-ImplicitCastExpr 0xcc97498 <col:17> 'int (*)(const char *, ...)' <FunctionToPointerDecay>
    | | |   | `-DeclRefExpr 0xcc97408 <col:17> 'int (const char *, ...)' Function 0xcc87a10 'printf' 'int (const char *, ...)'
    | | |   `-ImplicitCastExpr 0xcc974f8 <col:24> 'const char *' <BitCast>
    | | |     `-ImplicitCastExpr 0xcc974e0 <col:24> 'char *' <ArrayToPointerDecay>
    | | |       `-StringLiteral 0xcc97468 <col:24> 'char [5]' lvalue "a==0"
    | | `-CallExpr 0xcc975e0 <line:13:2, col:6> 'int'
    | |   |-ImplicitCastExpr 0xcc975c8 <col:2> 'int (*)(int)' <FunctionToPointerDecay>
    | |   | `-DeclRefExpr 0xcc97548 <col:2> 'int (int)' Function 0xcc95d58 'ab' 'int (int)'
    | |   `-ImplicitCastExpr 0xcc97610 <col:5> 'int' <LValueToRValue>
    | |     `-DeclRefExpr 0xcc97570 <col:5> 'int' lvalue ParmVar 0xcc97180 'a' 'int'
    | `-IfStmt 0xcc97c58 <line:15:10, line:18:33>
    |   |-<<<NULL>>>
    |   |-<<<NULL>>>
    |   |-BinaryOperator 0xcc976b0 <line:15:14, col:17> 'int' '=='
    |   | |-ImplicitCastExpr 0xcc97698 <col:14> 'int' <LValueToRValue>
    |   | | `-DeclRefExpr 0xcc97650 <col:14> 'int' lvalue ParmVar 0xcc97180 'a' 'int'
    |   | `-IntegerLiteral 0xcc97678 <col:17> 'int' 1
    |   |-CallExpr 0xcc97748 <col:20, col:33> 'int'
    |   | |-ImplicitCastExpr 0xcc97730 <col:20> 'int (*)(const char *, ...)' <FunctionToPointerDecay>
    |   | | `-DeclRefExpr 0xcc976d8 <col:20> 'int (const char *, ...)' Function 0xcc87a10 'printf' 'int (const char *, ...)'
    |   | `-ImplicitCastExpr 0xcc97790 <col:27> 'const char *' <BitCast>
    |   |   `-ImplicitCastExpr 0xcc97778 <col:27> 'char *' <ArrayToPointerDecay>
    |   |     `-StringLiteral 0xcc97700 <col:27> 'char [5]' lvalue "a==1"
    |   `-IfStmt 0xcc97c20 <line:16:10, line:18:33>
    |     |-<<<NULL>>>
    |     |-<<<NULL>>>
    |     |-BinaryOperator 0xcc97808 <line:16:14, col:17> 'int' '=='
    |     | |-ImplicitCastExpr 0xcc977f0 <col:14> 'int' <LValueToRValue>
    |     | | `-DeclRefExpr 0xcc977a8 <col:14> 'int' lvalue ParmVar 0xcc97180 'a' 'int'
    |     | `-IntegerLiteral 0xcc977d0 <col:17> 'int' 2
    |     |-CallExpr 0xcc978a0 <col:20, col:33> 'int'
    |     | |-ImplicitCastExpr 0xcc97888 <col:20> 'int (*)(const char *, ...)' <FunctionToPointerDecay>
    |     | | `-DeclRefExpr 0xcc97830 <col:20> 'int (const char *, ...)' Function 0xcc87a10 'printf' 'int (const char *, ...)'
    |     | `-ImplicitCastExpr 0xcc978e8 <col:27> 'const char *' <BitCast>
    |     |   `-ImplicitCastExpr 0xcc978d0 <col:27> 'char *' <ArrayToPointerDecay>
    |     |     `-StringLiteral 0xcc97858 <col:27> 'char [5]' lvalue "a==2"
    |     `-IfStmt 0xcc97be8 <line:17:10, line:18:33>
    |       |-<<<NULL>>>
    |       |-<<<NULL>>>
    |       |-BinaryOperator 0xcc97960 <line:17:14, col:17> 'int' '=='
    |       | |-ImplicitCastExpr 0xcc97948 <col:14> 'int' <LValueToRValue>
    |       | | `-DeclRefExpr 0xcc97900 <col:14> 'int' lvalue ParmVar 0xcc97180 'a' 'int'
    |       | `-IntegerLiteral 0xcc97928 <col:17> 'int' 3
    |       |-CallExpr 0xcc979f8 <col:20, col:33> 'int'
    |       | |-ImplicitCastExpr 0xcc979e0 <col:20> 'int (*)(const char *, ...)' <FunctionToPointerDecay>
    |       | | `-DeclRefExpr 0xcc97988 <col:20> 'int (const char *, ...)' Function 0xcc87a10 'printf' 'int (const char *, ...)'
    |       | `-ImplicitCastExpr 0xcc97a40 <col:27> 'const char *' <BitCast>
    |       |   `-ImplicitCastExpr 0xcc97a28 <col:27> 'char *' <ArrayToPointerDecay>
    |       |     `-StringLiteral 0xcc979b0 <col:27> 'char [5]' lvalue "a==3"
    |       `-IfStmt 0xcc97bb0 <line:18:10, col:33>
    |         |-<<<NULL>>>
    |         |-<<<NULL>>>
    |         |-BinaryOperator 0xcc97ab8 <col:14, col:17> 'int' '=='
    |         | |-ImplicitCastExpr 0xcc97aa0 <col:14> 'int' <LValueToRValue>
    |         | | `-DeclRefExpr 0xcc97a58 <col:14> 'int' lvalue ParmVar 0xcc97180 'a' 'int'
    |         | `-IntegerLiteral 0xcc97a80 <col:17> 'int' 4
    |         |-CallExpr 0xcc97b50 <col:20, col:33> 'int'
    |         | |-ImplicitCastExpr 0xcc97b38 <col:20> 'int (*)(const char *, ...)' <FunctionToPointerDecay>
    |         | | `-DeclRefExpr 0xcc97ae0 <col:20> 'int (const char *, ...)' Function 0xcc87a10 'printf' 'int (const char *, ...)'
    |         | `-ImplicitCastExpr 0xcc97b98 <col:27> 'const char *' <BitCast>
    |         |   `-ImplicitCastExpr 0xcc97b80 <col:27> 'char *' <ArrayToPointerDecay>
    |         |     `-StringLiteral 0xcc97b08 <col:27> 'char [5]' lvalue "a==4"
    |         `-<<<NULL>>>
    `-ReturnStmt 0xcc97ce8 <line:20:5, col:12>
      `-IntegerLiteral 0xcc97cc8 <col:12> 'int' 5

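Each of the two functions appears in the AST as a FunctionDecl node. The Stmt nodes of its body hang below it, and the indentation reflects how they nest. The else-if chain in aa, for example, shows up as IfStmt nodes nested one inside another, each one being the else branch of the previous one.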
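To make the nesting concrete, here is a minimal sketch in C that rebuilds the outline of ab from the dump above as a toy tree and prints it with one indentation step per level. The Node struct and the functions dump and dump_ab_outline are hypothetical names made up for this illustration; they are not clang's data structures, and most of the detail in the real dump (addresses, source ranges, types, implicit casts) is left out.

```c
#include <stdio.h>
#include <string.h>

/* Toy stand-in for an AST node: a label plus up to four children.
   This is NOT clang's representation, only a model of the nesting. */
struct Node {
    const char *kind;
    const struct Node *child[4];
    int nchild;
};

/* Append the subtree rooted at n to out, two spaces of indent per level. */
static void dump(const struct Node *n, int depth, char *out, size_t cap) {
    size_t used = strlen(out);
    if (used < cap)
        snprintf(out + used, cap - used, "%*s%s\n", depth * 2, "", n->kind);
    for (int i = 0; i < n->nchild; i++)
        dump(n->child[i], depth + 1, out, cap);
}

/* Build the outline of ab() as it appears in the dump and render it into out. */
void dump_ab_outline(char *out, size_t cap) {
    static const struct Node parm = {"ParmVarDecl a", {0}, 0};
    static const struct Node decl = {"DeclStmt", {0}, 0};
    static const struct Node cond = {"BinaryOperator ==", {0}, 0};
    static const struct Node call = {"CallExpr printf", {0}, 0};
    static const struct Node ifst = {"IfStmt", {&cond, &call}, 2};
    static const struct Node ret  = {"ReturnStmt", {0}, 0};
    static const struct Node body = {"CompoundStmt", {&decl, &ifst, &ret}, 3};
    static const struct Node func = {"FunctionDecl ab", {&parm, &body}, 2};
    if (cap == 0)
        return;
    out[0] = '\0';
    dump(&func, 0, out, cap);
}
```

Calling dump_ab_outline with a buffer of a few hundred bytes and printing the buffer gives an outline in which FunctionDecl sits at the left margin, ParmVarDecl and CompoundStmt are one level in, and the statements of the body are one level further, which mirrors the shape of the real dump.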

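If the dump above is hard to read, a visual graph of the AST can also be generated, as shown below:

[Figure: visualized AST graph of the sample]

AST-based instrumentation

The instrumentation is also driven by the AST: while the tree is traversed, instrumentation points are inserted at key positions in the source. In effect, the source file is rewritten on top of the RecursiveASTVisitor class. VisitFunctionDecl and VisitStmt are used to locate functions and to handle basic blocks, the instrumentation code is added at those points, and Rewriter then writes the instrumented code out to the file.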
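The rewrite step can be pictured as splicing probe text into the original source at positions found during the traversal. The following is a toy model of that idea in plain C, not clang's Rewriter: the Insertion struct, the apply_insertions function and the probe(0) text are all hypothetical, and the sketch assumes the insertions are already sorted by offset.

```c
#include <stddef.h>
#include <string.h>

/* One pending edit: put `text` in front of byte `offset` of the ORIGINAL source. */
struct Insertion {
    size_t offset;
    const char *text;
};

/* Copy src to out, splicing in every insertion at its original offset.
   ins must be sorted by ascending offset.
   Returns 0 on success, -1 if out is too small or an offset is out of range. */
int apply_insertions(const char *src, const struct Insertion *ins, size_t n,
                     char *out, size_t cap) {
    size_t len = strlen(src), pos = 0, w = 0;
    for (size_t i = 0; i <= n; i++) {
        /* Copy the untouched source up to the next insertion (or to the end). */
        size_t stop = (i < n) ? ins[i].offset : len;
        if (stop < pos || stop > len)
            return -1;
        size_t chunk = stop - pos;
        if (w + chunk >= cap)
            return -1;
        memcpy(out + w, src + pos, chunk);
        w += chunk;
        pos = stop;
        /* Then emit the inserted text itself. */
        if (i < n) {
            size_t tl = strlen(ins[i].text);
            if (w + tl >= cap)
                return -1;
            memcpy(out + w, ins[i].text, tl);
            w += tl;
        }
    }
    out[w] = '\0';
    return 0;
}
```

For example, inserting " probe(0);" right after the opening brace of "int f(int a){ return a; }" yields "int f(int a){ probe(0); return a; }". Because every offset refers to the original text, earlier insertions never shift the positions of later ones.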

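VisitFunctionDecl handles each function once it has been found. If f->hasBody() is true, the declaration has a definition and is analyzed further; if f->isMain() is true, the function is the main function. For the other member functions, see http://clang.llvm.org/doxygen/classclang_1_1FunctionDecl.html

VisitStmt traverses the Stmt nodes and handles each kind accordingly, for example IfStmt, WhileStmt, ForStmt, SwitchStmt, DeclStmt, CallExpr and ReturnStmt. For the full classification of node kinds, see http://clang.llvm.org/doxygen/classclang_1_1Stmt.html

References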
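As an illustration of what such a pass might produce, here is a hand-written, hypothetical instrumented version of the sample source. The post does not show the probe code that the author's tool actually inserts, so the probe function, the probe_hits array and the numbering of the probe points are assumptions; the include of 11.h is dropped because its contents are not shown.

```c
#include <stdio.h>

/* Hypothetical probe runtime: one hit counter per instrumentation point. */
#define PROBE_MAX 9
unsigned probe_hits[PROBE_MAX];

static void probe(int id) {
    if (id >= 0 && id < PROBE_MAX)
        probe_hits[id]++;
}

int ab(int a){
    probe(0);                                   /* entry of ab */
    int b =0;
    if (a==b) { probe(1); printf("in ab"); }    /* then-branch of the IfStmt */
    return 0;
}
int aa(int a){
    probe(2);                                   /* entry of aa */
    if (a==0) {
        probe(3);                               /* then-branch */
        for (;a<1;a++) { probe(4); printf("a==0"); }   /* body of the ForStmt */
        ab(a);
    }
    else if (a==1) { probe(5); printf("a==1"); }
    else if (a==2) { probe(6); printf("a==2"); }
    else if (a==3) { probe(7); printf("a==3"); }
    else if (a==4) { probe(8); printf("a==4"); }

    return 5;
}
```

One detail worth noting in this sketch: a branch or loop body that was a single statement has to be wrapped in braces when a probe is added, otherwise only the probe call would stay under the condition and the original statement would run unconditionally.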

https://www.ibm.com/developerworks/cn/opensource/os-cn-clang/

http://clang.llvm.org/

