什么是抽象语法树？「译」

更新日期: 2019-06-16 阅读: 3.2k 标签: 语法分享

AST 是抽象语法树的缩写词，表示编程语言的语句和表达式中生成的 token。有了 AST，解释器或编译器就可以生成机器码或者对一条指令求值。

小贴士: 通过使用 Bit，你可以将任意的 JS 代码转换为一个可在项目和应用中共享、使用和同步的 api，从而更快地构建并重用更多代码。试一下吧。

假设我们有下面这条简单的表达式：

1 + 2

用 AST 来表示的话，它是这样的：

+ BinaryExpression
 - type: +
 - left_value: 
  LiteralExpr:
   value: 1
 - right_vaue:
  LiteralExpr:
   value: 2

诸如 if 的语句则可以像下面这样表示：

if(2 > 6) {
    var d  = 90
    console.log(d)
}
IfStatement
 - condition
  + BinaryExpression
   - type: >
   - left_value: 2
   - right_value: 6
 - body
  [
    - Assign
        - left: 'd';
        - right: 
            LiteralExpr:
            - value: 90
    - MethodCall:
         - instanceName: console
         - methodName: log
         - args: [
         ]
  ]

这告诉解释器如何解释语句，同时告诉编译器如何生成语句对应的代码。

看看这条表达式： 1 + 2。我们的大脑判定这是一个将左值和右值相加的加法运算。现在，为了让计算机像我们的大脑那样工作，我们必须以类似于大脑看待它的形式来表示它。

我们用一个类来表示，其中的属性告诉解释器运算的全部内容、左值和右值。因为一个二元运算涉及两个值，所以我们给这个类命名为 Binary：

class Binary {  
    constructor(left, operator, right) {  
        this.left = left  
        this.operator = operator  
        this.right = right  
    }  
}

实例化期间，我们将会把 1 传给第一个属性，把 ADD 传给第二个属性，把 2 传给第三个属性：

new Binary('1', 'ADD', '2')

当我们把它传递给解释器的时候，解释器认为这是一个二元运算，接着检查操作符，认为这是一个加法运算，紧接着继续请求实例中的 left 值和 right 值，并将二者相加：

const binExpr = new Binary('1', 'ADD', '2')

if(binExpr.operator == 'ADD') {  
    return binExpr.left + binExpr.right  
}  
// 返回 `3`

看，AST 可以像大脑那样执行表达式和语句。

单数字、字符串、布尔值等都是表达式，它们可以在 AST 中表示并求值。

23343
false
true
"nnamdi"

拿 1 举例：

我们在 AST 的 Literal（字面量）类中来表示它。一个字面量就是一个单词或者数字，Literal 类用一个属性来保存它：

class Literal {  
    constructor(value) {  
        this.value = value  
    }  
}

我们可以像下面这样表示 Literal 中的 1：

new Literal(1)

当解释器对它求值时，它会请求 Literal 实例中 value 属性的值：

const oneLit = new Literal(1)  
oneLit.value  
// `1`

在我们的二元表达式中，我们直接传递了值

new Binary('1', 'ADD', '2')

这其实并不合理。因为正如我们在上面看到的，1 和 2 都是一条表达式，一条基本的表达式。作为字面量，它们同样需要被求值，并且用 Literal 类来表示。

const oneLit = new Literal('1')  
const twoLit = new Literal('2')

因此，二元表达式会将 oneLit 和 twoLit 分别作为左属性和右属性。

// ...  
new Binary(oneLit, 'ADD', twoLit)

在求值阶段，左属性和右属性同样需要进行求值，以获得各自的值：

const oneLit = new Literal('1')  
const twoLit = new Literal('2')  
const binExpr = new Binary(oneLit, 'ADD', twoLit)

if(binExpr.operator == 'ADD') {  
    return binExpr.left.value + binExpr.right.value  
}  
// 返回 `3`

其它语句在 AST 中也大多是用二元来表示的，例如 if 语句。

我们知道，在 if 语句中，只有条件为真的时候代码块才会执行。

if(9 > 7) {  
    log('Yay!!')  
}

上面的 if 语句中，代码块执行的条件是 9 必须大于 7，之后我们可以在终端上看到输出 Yay!!。

为了让解释器或者编译器这样执行，我们将会在一个包含 condition、 body 属性的类中来表示它。condition 保存着解析后必须为真的条件，body 则是一个数组，它包含着 if 代码块中的所有语句。解释器将会遍历该数组并执行里面的语句。

class IfStmt {  
    constructor(condition, body) {  
        this.condition = condition  
        this.body = body  
    }  
}

现在，让我们在 IfStmt 类中表示下面的语句

if(9 > 7) {  
    log('Yay!!')  
}

条件是一个二元运算，这将表示为：

const cond = new Binary(new Literal(9), "GREATER", new Literal(7))

就像之前一样，但愿你还记得？这回是一个 GREATER 运算。

if 语句的代码块只有一条语句：一个函数调用。函数调用同样可以在一个类中表示，它包含的属性有：用于指代所调用函数的 name 以及用于表示传递的参数的 args：

class FuncCall {  
    constructor(name, args) {  
        this.name = name  
        this.args = args  
    }  
}

因此，log("Yay!!") 调用可以表示为：

const logFuncCall = new FuncCall('log', [])

现在，把这些组合在一起，我们的 if 语句就可以表示为：

const cond = new Binary(new Literal(9), "GREATER", new Literal(7));  
const logFuncCall = new FuncCall('log', []);

const ifStmt = new IfStmt(cond, [  
    logFuncCall  
])

解释器可以像下面这样解释 if 语句：

const ifStmt = new IfStmt(cond, [  
    logFuncCall  
])

function interpretIfStatement(ifStmt) {  
    if(evalExpr(ifStmt.conditon)) {  
        for(const stmt of ifStmt.body) {  
            evalStmt(stmt)  
        }  
    }  
}

interpretIfStatement(ifStmt)

输出：

Yay!!

因为 9 > 7 :)

我们通过检查 condition 解析后是否为真来解释 if 语句。如果为真，我们遍历 body 数组并执行里面的语句。

执行 AST

使用访问者模式对 AST 进行求值。访问者模式是设计模式的一种，允许一组对象的算法在一个地方实现。

ASTs，Literal，Binary，IfStmnt 是一组相关的类，每一个类都需要携带方法以使解释器获得它们的值或者对它们求值。

访问者模式让我们能够创建单个类，并在类中编写 AST 的实现，将类提供给 AST。每个 AST 都有一个公有的方法，解释器会通过实现类实例对其进行调用，之后 AST 类将在传入的实现类中调用相应的方法，从而计算其 AST。

class Literal {  
    constructor(value) {  
        this.value = value  
    }

    visit(visitor) {  
        return visitor.visitLiteral(this)  
    }  
}

class Binary {  
    constructor(left, operator, right) {  
        this.left = left  
        this.operator = operator  
        this.right = right  
    }

    visit(visitor) {  
        return visitor.visitBinary(this)  
    }  
}

看，AST Literal 和 Binary 都有访问方法，但是在方法里面，它们调用访问者实例的方法来对自身求值。Literal 调用 visitLiteral，Binary 则调用 visitBinary。

现在，将 Vistor 作为实现类，它将实现 visitLiteral 和 visitBinary 方法：

class Visitor {

    visitBinary(binExpr) {  
        // ...  
        log('not yet implemented')  
    }

    visitLiteral(litExpr) {  
        // ...  
        log('not yet implemented')  
    }  
}

visitBinary 和 visitLiteral 在 Vistor 类中将会有自己的实现。因此，当一个解释器想要解释一个二元表达式时，它将调用二元表达式的访问方法，并传递 Vistor 类的实例：

const binExpr = new Binary(...)  
const visitor = new Visitor()

binExpr.visit(visitor)

访问方法将调用访问者的 visitBinary，并将其传递给方法，之后打印 not yet implemented。这称为双重分派。

调用 Binary 的访问方法。
它 (Binary) 反过来调用 Visitor 实例的visitBinary。

我们把 visitLiteral 的完整代码写一下。由于 Literal 实例的 value 属性保存着值，所以这里只需返回这个值就好：

class Visitor {

    visitBinary(binExpr) {  
        // ...  
        log('not yet implemented')  
    }

    visitLiteral(litExpr) {  
        return litExpr.value  
    }  
}

对于 visitBinary，我们知道 Binary 类有操作符、左属性和右属性。操作符表示将对左右属性进行的操作。我们可以编写实现如下：

class Visitor {

    visitBinary(binExpr) {  
        switch(binExpr.operator) {  
            case 'ADD':  
            // ...  
        }  
    }

    visitLiteral(litExpr) {  
        return litExpr.value  
    }  
}

注意，左值和右值都是表达式，可能是字面量表达式、二元表达式、调用表达式或者其它的表达式。我们并不能确保二元运算的左右两边总是字面量。每一个表达式必须有一个用于对表达式求值的访问方法，因此在上面的 visitBinary 方法中，我们通过调用各自对应的 visit 方法对 Binary 的左属性和右属性进行求值：

class Visitor {

    visitBinary(binExpr) {  
        switch(binExpr.operator) {  
            case 'ADD':  
                return binExpr.left.visit(this) + binExpr.right.visit(this)  
        }  
    }

    visitLiteral(litExpr) {  
        return litExpr.value  
    }  
}

因此，无论左值和右值保存的是哪一种表达式，最后都可以进行传递。

因此，如果我们有下面这些语句:

const oneLit = new Literal('1')  
const twoLit = new Literal('2')  
const binExpr = new Binary(oneLit, 'ADD', twoLit)  
const visitor = new Visitor()

binExpr.visit(visitor)

在这种情况下，二元运算保存的是字面量。

访问者的 visitBinary 将会被调用，同时将 binExpr 传入，在 Vistor 类中，visitBinary 将 oneLit 作为左值，将 twoLit 作为右值。由于 oneLit 和 twoLit 都是 Literal 的实例，因此它们的访问方法会被调用，同时将 Visitor 类传入。对于 oneLit，其 Literal 类内部又会调用 Vistor 类的 visitLiteral 方法，并将 oneLit 传入，而 Vistor 中的 visitLiteral 方法返回 Literal 类的 value 属性，也就是 1。同理，对于 twoLit 来说，返回的是 2。

因为执行了 switch 语句中的 case 'ADD'，所以返回的值会相加，最后返回 3。

如果我们将 binExpr.visit(visitor) 传给 console.log，它将会打印 3

console.log(binExpr.visit(visitor))  
// 3

如下，我们传递一个 3 分支的二元运算：

1 + 2 + 3

首先，我们选择 1 + 2，那么其结果将作为左值，即 + 3。

上述可以用 Binary 类表示为：

new Binary (new Literal(1), 'ADD', new Binary(new Literal(2), 'ADD', new Literal(3)))

可以看到，右值不是字面量，而是一个二元表达式。所以在执行加法运算之前，它必须先对这个二元表达式求值，并将其结果作为最终求值时的右值。

const oneLit = new Literal(1)  
const threeLit =new Literal(3)  
const twoLit = new Literal(2)

const binExpr2 = new Binary(twoLit, 'ADD', threeLit)  
const binExpr1 = new Binary (oneLit, 'ADD', binExpr2)

const visitor = new Visitor()

log(binExpr1.visit(visitor))

6

添加 if 语句

将 if 语句带到等式中。为了对一个 if 语句求值，我们将会给 IfStmt 类添加一个 visit 方法，之后它将调用 visitIfStmt 方法：

class IfStmt {  
    constructor(condition, body) {  
        this.condition = condition  
        this.body = body  
    }

    visit(visitor) {  
        return visitor.visitIfStmt(this)  
    }  
}

见识到访问者模式的威力了吗？我们向一些类中新增了一个类，对应地只需要添加相同的访问方法即可，而这将调用它位于 Vistor 类中的对应方法。这种方式将不会破坏或者影响到其它的相关类，访问者模式让我们遵循了开闭原则。

因此，我们在 Vistor 类中实现 visitIfStmt：

class Visitor {  
    // ...

    visitIfStmt(ifStmt) {  
        if(ifStmt.condition.visit(this)) {  
            for(const stmt of ifStmt.body) {  
                stmt.visit(this)  
            }  
        }  
    }  
}

因为条件是一个表达式，所以我们调用它的访问方法对其进行求值。我们使用 JS 中的 if 语句检查返回值，如果为真，则遍历语句的代码块 ifStmt.body，通过调用 visit 方法并传入 Vistor，对数组中每一条语句进行求值。

因此我们可以翻译出这条语句：

if(67 > 90)

添加函数调用和函数声明

接着来添加一个函数调用。我们已经有一个对应的类了：

class FuncCall {  
    constructor(name, args) {  
        this.name = name  
        this.args = args  
    }  
}

添加一个访问方法：

class FuncCall {  
    constructor(name, args) {  
        this.name = name  
        this.args = args  
    }

    visit(visitor) {  
        return visitor.visitFuncCall(this)  
    }  
}

给 Visitor 类添加 visitFuncCall 方法：

class Visitor {  
    // ...

    visitFuncCall(funcCall) {  
        const funcName = funcCall.name  
        const args = []  
        for(const expr of funcCall.args)  
            args.push(expr.visit(this))  
        // ...  
    }  
}

这里有一个问题。除了内置函数之外，还有自定义函数，我们需要为后者创建一个“容器”，并在里面通过函数名保存和引用该函数。

const FuncStore = (  
    class FuncStore {

        constructor() {  
            this.map = new Map()  
        }

        setFunc(name, body) {  
            this.map.set(name, body)  
        }

        getFunc(name) {  
            return this.map.get(name)  
        }  
    }  
    return new FuncStore()  
)()

FuncStore 保存着函数，并从一个 Map 实例中取回这些函数。

class Visitor {  
    // ...

    visitFuncCall(funcCall) {  
        const funcName = funcCall.name  
        const args = []  
        for(const expr of funcCall.args)  
            args.push(expr.visit(this))  
        if(funcName == "log")  
            console.log(...args)  
        if(FuncStore.getFunc(funcName))  
            FuncStore.getFunc(funcName).forEach(stmt => stmt.visit(this))  
    }  
}

看下我们做了什么。如果函数名 funcName（记住，FuncCall 类将函数名保存在 name 属性中）为 log，则运行 JS console.log(...)，并传参给它。如果我们在函数保存中找到了函数，那么就对该函数体进行遍历，依次访问并执行。

现在看看怎么把我们的函数声明放进函数保存中。

函数声明以 fucntion 开头。一般的函数结构是这样的：

function function_name(params) {  
    // function body  
}

因此，我们可以在一个类中用属性表示一个函数声明：name 保存函数函数名，body 则是一个数组，保存函数体中的语句：

class FunctionDeclaration {  
    constructor(name, body) {  
        this.name = name  
        this.body = body  
    }  
}

我们添加一个访问方法，该方法在 Vistor 中被称为 visitFunctionDeclaration：

class FunctionDeclaration {  
    constructor(name, body) {  
        this.name = name  
        this.body = body  
    }

    visit(visitor) {  
        return visitor.visitFunctionDeclaration(this)  
    }  
}

在 Visitor 中：

class Visitor {  
    // ...

    visitFunctionDeclaration(funcDecl) {  
        FuncStore.setFunc(funcDecl.name, funcDecl.body)  
    }  
}

将函数名作为键即可保存函数。

现在，假设我们有下面这个函数：

function addNumbers(a, b) {  
    log(a + b)  
}

function logNumbers() {  
    log(5)  
    log(6)  
}

它可以表示为：

const funcDecl = new FunctionDeclaration('logNumbers', [  
    new FuncCall('log', [new Literal(5)]),  
    new FuncCall('log', [new Literal(6)])  
])

visitor.visitFunctionDeclaration(funcDecl)

现在，我们来调用函数 logNumbers：

const funcCall = new FuncCall('logNumbers', [])  
visitor.visitFuncCall(funcCall)

控制台将会打印：

5
6

结论

理解 AST 的过程是令人望而生畏并且非常消耗脑力的。即使是编写最简单的解析器也需要大量的代码。

注意，我们并没有介绍扫描仪和解析器，而是先行解释了 ASTs 以展示它们的工作过程。如果你能够深入理解 AST 以及它所需要的内容，那么在你开始编写自己的编程语言时，自然就事半功倍了。

熟能生巧，你可以继续添加其它的编程语言特性，例如：

类和对象
方法调用
封装和继承
for-of 语句
while 语句
for-in 语句
其它任何你能想到的有趣特性

原文地址：What is an Abstract Syntax Tree，原文作者：Chidume Nnamdi
译文：https://segmentfault.com/a/1190000019491986

本文内容仅供个人学习、研究或参考使用，不构成任何形式的决策建议、专业指导或法律依据。未经授权，禁止任何单位或个人以商业售卖、虚假宣传、侵权传播等非学习研究目的使用本文内容。如需分享或转载，请保留原文来源信息，不得篡改、删减内容或侵犯相关权益。感谢您的理解与支持！

链接: https://fly63.com/article/detial/3738

上一页: CSS中cursor 鼠标指针光标样式（形状）下一页: 代码重构的基本认知

内容以共享、参考、研究为目的,不存在任何商业目的。其版权属原作者所有,如有侵权或违规,请与小编联系!情况属实本人将予以删除!