Grammar

JavaScript has one of the most challenging grammars to parse. This tutorial details all the sweat and tears I shed while learning it.

LL(1) Grammar

According to Wikipedia,

an LL grammar is a context-free grammar that can be parsed by an LL parser, which parses the input from Left to right

The first L means scanning the source from Left to right, and the second L means constructing a Leftmost derivation tree.

Context-free and the (1) in LL(1) mean that a tree can be constructed by peeking at just the next token and nothing else.
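To make the "peek at the next token" idea concrete, here is a minimal sketch of an LL(1) recursive descent parser for a toy grammar of nested number lists. The grammar and the names `Token`, `Value` and `Parser` are hypothetical and have nothing to do with JavaScript; the only point is that every parsing decision is made by looking at exactly one token.

```rust
// Toy grammar (an assumption for illustration):
//   Value := Number | '[' ( Value ( ',' Value )* )? ']'
#[derive(Debug, Clone, Copy, PartialEq)]
enum Token {
    Num(i64),
    LBracket,
    RBracket,
    Comma,
}

#[derive(Debug, PartialEq)]
enum Value {
    Num(i64),
    List(Vec<Value>),
}

struct Parser {
    tokens: Vec<Token>,
    pos: usize,
}

impl Parser {
    fn new(tokens: Vec<Token>) -> Self {
        Parser { tokens, pos: 0 }
    }

    /// Look at the next token without consuming it: the "(1)" in LL(1).
    fn peek(&self) -> Option<Token> {
        self.tokens.get(self.pos).copied()
    }

    /// Consume and return the next token.
    fn bump(&mut self) -> Option<Token> {
        let token = self.peek();
        self.pos += 1;
        token
    }

    fn parse_value(&mut self) -> Result<Value, String> {
        // One token of lookahead is enough to pick the production.
        match self.peek() {
            Some(Token::Num(n)) => {
                self.bump();
                Ok(Value::Num(n))
            }
            Some(Token::LBracket) => self.parse_list(),
            other => Err(format!("unexpected token {:?}", other)),
        }
    }

    fn parse_list(&mut self) -> Result<Value, String> {
        self.bump(); // consume '['
        let mut items = Vec::new();
        if self.peek() == Some(Token::RBracket) {
            self.bump(); // empty list
            return Ok(Value::List(items));
        }
        loop {
            items.push(self.parse_value()?);
            match self.bump() {
                Some(Token::Comma) => continue,
                Some(Token::RBracket) => return Ok(Value::List(items)),
                other => return Err(format!("expected ',' or ']', found {:?}", other)),
            }
        }
    }
}
```

JavaScript breaks this property in many places, which is what the rest of this page is about.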

LL grammars are of particular interest in academia because we are lazy human beings: we want to write programs that generate parsers automatically so that we don't need to write parsers by hand.

Unfortunately, most industrial programming languages do not have a nice LL(1) grammar, and this applies to JavaScript too.

INFO

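Mozilla started the jsparagus project a few years ago and wrote an LALR parser generator in Python. They haven't updated it much in the past two years, and they sent a strong message at the end of js-quirks.md:

What have we learned today?

  • Do not write a JS parser.
  • JavaScript has some syntactic horrors in it. But hey, you don't make the world's most widely used programming language by avoiding all mistakes. You do it by shipping a serviceable tool, in the right circumstances, for the right users.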

Because of the nature of its grammar, the only practical way to parse JavaScript is to write a recursive descent parser by hand, so let's learn all the quirks in the grammar before we shoot ourselves in the foot.

The list below starts simple and gradually becomes difficult to grasp, so please grab a coffee and take your time.

Identifiers

There are three types of identifiers defined in #sec-identifiers:

IdentifierReference[Yield, Await] :
BindingIdentifier[Yield, Await] :
LabelIdentifier[Yield, Await] :

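estree and some other ASTs do not distinguish the above identifiers, and the specification does not explain them in plain text.

A BindingIdentifier is a declaration, and an IdentifierReference is a reference to a binding identifier. For example, in var foo = bar, foo is a BindingIdentifier and bar is an IdentifierReference in the grammar: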

VariableDeclaration[In, Yield, Await] :
    BindingIdentifier[?Yield, ?Await] Initializer[?In, ?Yield, ?Await] opt

Initializer[In, Yield, Await] :
    = AssignmentExpression[?In, ?Yield, ?Await]

Following AssignmentExpression down into PrimaryExpression, we get

PrimaryExpression[Yield, Await] :
    IdentifierReference[?Yield, ?Await]

Declaring these identifiers as different nodes in the AST will greatly simplify downstream tools, especially semantic analysis.

rust
pub struct BindingIdentifier {
    pub node: Node,
    pub name: Atom,
}

pub struct IdentifierReference {
    pub node: Node,
    pub name: Atom,
}

Class and Strict Mode

ECMAScript classes were introduced after strict mode, so it was decided that everything inside a class must be strict mode code, for simplicity. It is stated as such in #sec-class-definitions with just a Note: A class definition is always strict mode code.

It is easy to track strict mode by associating it with function scopes, but a class declaration does not have a scope, so we need to keep an extra piece of state just for parsing classes.

rust
// https://github.com/swc-project/swc/blob/f9c4eff94a133fa497778328fa0734aa22d5697c/crates/swc_ecma_parser/src/parser/class_and_fn.rs#L85
fn parse_class_inner(
    &mut self,
    _start: BytePos,
    class_start: BytePos,
    decorators: Vec<Decorator>,
    is_ident_required: bool,
) -> PResult<(Option<Ident>, Class)> {
    self.strict_mode().parse_with(|p| {
        expect!(p, "class");

Legacy Octal and Use Strict

#sec-string-literals-early-errors disallows escaped legacy octal sequences inside strings such as "\01":

EscapeSequence ::
    LegacyOctalEscapeSequence
    NonOctalDecimalEscapeSequence

It is a Syntax Error if the source text matched by this production is strict mode code.

The best place to detect this is inside the lexer: it can ask the parser for the strict mode state and throw errors accordingly.

But this becomes impossible when mixed with directives:

javascript
https://github.com/tc39/test262/blob/747bed2e8aaafe8fdf2c65e8a10dd7ae64f66c47/test/language/literals/string/legacy-octal-escape-sequence-prologue-strict.js#L16-L19

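The use strict directive is declared after the escaped legacy octal, yet the syntax error still needs to be thrown. Fortunately, no real code uses directives with legacy octals ... unless you want to pass the test262 case above.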
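One way to handle this ordering problem is to let the lexer merely record that a string contains a legacy octal escape, and to defer the error until the whole directive prologue has been read. The sketch below assumes directives are handed over as raw string literals (quotes included); `has_legacy_octal_escape` and `check_directive_prologue` are hypothetical helper names, not taken from any real parser.

```rust
/// Returns true if a raw string literal (quotes included) contains a
/// LegacyOctalEscapeSequence or a NonOctalDecimalEscapeSequence.
fn has_legacy_octal_escape(raw: &str) -> bool {
    let bytes = raw.as_bytes();
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] != b'\\' {
            i += 1;
            continue;
        }
        match bytes.get(i + 1).copied() {
            // `\1`..`\7` are legacy octal, `\8` and `\9` are non-octal decimal.
            Some(b'1'..=b'9') => return true,
            // `\0` alone is fine; `\0` followed by a digit is legacy octal.
            Some(b'0') => {
                if matches!(bytes.get(i + 2).copied(), Some(b'0'..=b'9')) {
                    return true;
                }
            }
            _ => {}
        }
        i += 2; // skip the backslash and the escaped character
    }
    false
}

/// Checks a directive prologue given as raw string literals in source order.
/// Returns whether the prologue enables strict mode, or an error if a legacy
/// octal escape appears anywhere in a prologue that contains "use strict".
fn check_directive_prologue(directives: &[&str]) -> Result<bool, String> {
    let mut strict = false;
    let mut first_octal: Option<usize> = None;
    for (index, raw) in directives.iter().enumerate() {
        if *raw == "\"use strict\"" || *raw == "'use strict'" {
            strict = true;
        } else if first_octal.is_none() && has_legacy_octal_escape(raw) {
            first_octal = Some(index);
        }
    }
    match (strict, first_octal) {
        (true, Some(index)) => Err(format!(
            "legacy octal escape in directive {} is not allowed in strict mode",
            index
        )),
        _ => Ok(strict),
    }
}
```

The design choice here is to make the decision once, after the prologue, instead of asking the lexer to know about a strict mode state that has not been established yet.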


Non-simple Parameter and Strict Mode

Identical function parameters are allowed in non-strict mode, as in function foo(a, a) { }, and we can forbid this by adding use strict: function foo(a, a) { "use strict" }. Later, in ES6, more syntax was added to function parameters, for example function foo({ a }, b = c) {}.

Now, what happens if we write the following, where "\01" is a strict mode error?

javascript
function foo(
  value = (function() {
    return "\01";
  }()),
) {
  "use strict";
  return value;
}

More specifically, thinking from the parser's perspective, what should we do if there is a strict mode syntax error inside the parameters? The specification sidesteps the question: in #sec-function-definitions-static-semantics-early-errors, it simply bans the combination by stating

FunctionDeclaration :
FunctionExpression :

It is a Syntax Error if FunctionBodyContainsUseStrict of FunctionBody is true and IsSimpleParameterList of FormalParameters is false.

Chrome throws this error with a mysterious message: "Uncaught SyntaxError: Illegal 'use strict' directive in function with non-simple parameter list".
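The early error above is cheap to implement once the parameters have been parsed. The following is a minimal sketch under the assumption that parameters are reduced to a simplified, hypothetical `Param` node; `is_simple_parameter_list` mirrors the spec's IsSimpleParameterList only in spirit.

```rust
// Hypothetical, simplified parameter node (not the shape of any real AST).
enum Param {
    Identifier(String),  // `a`
    WithDefault(String), // `b = c`
    Destructured,        // `{ a }` or `[a]`
    Rest(String),        // `...rest`
}

/// A parameter list is "simple" only if every parameter is a plain identifier.
fn is_simple_parameter_list(params: &[Param]) -> bool {
    params.iter().all(|p| matches!(p, Param::Identifier(_)))
}

/// Applies the early error: a body containing "use strict" is only allowed
/// when the parameter list is simple.
fn check_use_strict(params: &[Param], body_contains_use_strict: bool) -> Result<(), &'static str> {
    if body_contains_use_strict && !is_simple_parameter_list(params) {
        return Err("Illegal 'use strict' directive in function with non-simple parameter list");
    }
    Ok(())
}
```

With this rule in place the parser never has to re-check already parsed parameters in strict mode, which is exactly the problem the specification avoids.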

A more in-depth explanation is given in this blog post by the author of ESLint.

INFO

Fun fact: the above rule does not apply if we are targeting es5 in TypeScript, because it transpiles to

javascript
function foo(a, b) {
  "use strict";
  if (b === void 0) b = "\01";
}

Parenthesized Expression

Parenthesized expressions are not supposed to have any semantic meaning, are they? For instance, the AST for ((x)) can just be a single IdentifierReference, not ParenthesizedExpression -> ParenthesizedExpression -> IdentifierReference. And this is indeed the case for the JavaScript grammar.

But ... who would have thought that parentheses can have a runtime meaning. As found in [this estree issue](https://github.com/estree/estree/issues/194):

javascript
> fn = function () {};
> fn.name
< "fn"

> (fn) = function () {};
> fn.name
< ''

So eventually acorn added the preserveParens option for compatibility, and Babel added a similar createParenthesizedExpressions option.


Function Declaration in If Statement

If we follow the grammar precisely in #sec-ecmascript-language-statements-and-declarations:

Statement[Yield, Await, Return] :
    ... lots of statements

Declaration[Yield, Await] :
    ... declarations

The Statement node we define for our AST would obviously not contain Declaration,

but Annex B #sec-functiondeclarations-in-ifstatement-statement-clauses allows a function declaration in the statement position of an if statement in non-strict mode:

javascript
if (x) {
  function foo() {}
} else function bar() {}

Label statement is legit

We have probably never written a single labelled statement, but it is legit in modern JavaScript and not banned by strict mode.

The following syntax is correct: the arrow function body contains a labelled statement (not an object literal).

javascript
<Foo
  bar={() => {
    baz: "quaz";
  }}
/>
//   ^^^^^^^^^^^ `LabelledStatement`

let is not a keyword

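let is not a keyword, so it is allowed to appear anywhere unless the grammar explicitly states that let is not allowed in that position. Parsers need to peek at the token after the let token and decide what it needs to be parsed into, e.g.: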

javascript
let a;
let = foo;
let instanceof x;
let + 1;
while (true) let;
a = let[0];
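The peek described above can be sketched as a small classification function. This is a simplified illustration that only covers a `let` at the start of a statement where a declaration is permitted (so `a = let[0]`, which is in expression position, never reaches it); `Kind`, `LetMeaning` and `classify_let` are hypothetical names, and real parsers handle more token kinds and line terminator rules.

```rust
// Hypothetical token kinds; only the ones needed for the examples above.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Kind {
    Identifier,
    LBracket,
    LBrace,
    Eq,
    Plus,
    Semicolon,
    Instanceof,
}

#[derive(Debug, PartialEq)]
enum LetMeaning {
    /// `let a;`, `let [a] = b;`, `let { a } = b;`
    Declaration,
    /// `let = foo;`, `let + 1;`, `let instanceof x;`, `let;`
    IdentifierExpression,
}

/// Decide what a statement-initial `let` means by peeking at the next token.
fn classify_let(next: Kind) -> LetMeaning {
    match next {
        Kind::Identifier | Kind::LBracket | Kind::LBrace => LetMeaning::Declaration,
        _ => LetMeaning::IdentifierExpression,
    }
}
```

Treating `let [` as a declaration matches the `[lookahead ∉ { ..., let [ }]` restriction on ExpressionStatement, which is why an expression statement can never start with `let[`.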

For-in / For-of and the [In] context

If we look at the grammar for for-in and for-of in #prod-ForInOfStatement, it is immediately confusing how these should be parsed.

There are two major obstacles for us to understand: the [lookahead ≠ let] part and the [+In] part.

If we have parsed up to for (let, we need to check that the next token:

  • is not in, so that for (let in) is not parsed as a let declaration
  • is {, [ or an identifier, to allow for (let {} = foo), for (let [] = foo) and for (let bar = foo)

Once we reach the of or in keyword, the right-hand side expression needs to be parsed with the correct [+In] context. The [In] parameter is what allows or disallows the two in expressions in #prod-RelationalExpression:

RelationalExpression[In, Yield, Await] :
    [+In] RelationalExpression[+In, ?Yield, ?Await] in ShiftExpression[?Yield, ?Await]
    [+In] PrivateIdentifier in ShiftExpression[?Yield, ?Await]

Note 2: The [In] grammar parameter is needed to avoid confusing the in operator in a relational expression with the in operator in a for statement.

And this is the only application for the [In] context in the entire specification.
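The effect of the [In] parameter can be sketched with a deliberately tiny relational-expression parser. This is an illustration under strong assumptions: tokens are plain strings, operands are single identifiers, the input is non-empty, and `parse_relational` is a hypothetical function that returns an S-expression string instead of an AST.

```rust
/// Parses `Ident ( in Ident )*` starting at `*pos`.
/// When `allow_in` is false (the `[~In]` context), the `in` token is left
/// untouched so that the enclosing `for` statement can consume it.
fn parse_relational(tokens: &[&str], pos: &mut usize, allow_in: bool) -> String {
    let mut left = tokens[*pos].to_string();
    *pos += 1;
    while allow_in && *pos + 1 < tokens.len() && tokens[*pos] == "in" {
        let right = tokens[*pos + 1];
        left = format!("(in {} {})", left, right);
        *pos += 2;
    }
    left
}
```

With `allow_in` set to true, the tokens `a in b` become one relational expression; with it set to false, parsing stops after `a`, which is the behaviour needed inside a `for (` head.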

Also note that the grammar [lookahead ∉ { let, async of }] forbids for (async of ...), and this needs to be explicitly guarded against.


Block-Level Function Declarations

In Annex B.3.2 #sec-block-level-function-declarations-web-legacy-compatibility-semantics, an entire page is dedicated to explaining how a FunctionDeclaration is supposed to behave in Block statements. It boils down to

javascript
https://github.com/acornjs/acorn/blob/11735729c4ebe590e406f952059813f250a4cbd1/acorn/src/scope.js#L30-L35

The name of a FunctionDeclaration needs to be treated the same as a var declaration if it is directly inside a function body. This code snippet fails with a re-declaration error since bar is inside a block scope:

javascript
function foo() {
  if (true) {
    var bar;
    function bar() {} // redeclaration error
  }
}

Meanwhile, the following does not error because it is directly inside a function scope, where function bar is treated as a var declaration:

javascript
function foo() {
  var bar;
  function bar() {}
}

Grammar Context

The syntactic grammar has 5 context parameters for allowing and disallowing certain constructs, namely [In], [Return], [Yield], [Await] and [Default].

It is best to keep a context during parsing, for example in Biome:

rust
// https://github.com/rome/tools/blob/5a059c0413baf1d54436ac0c149a829f0dfd1f4d/crates/rome_js_parser/src/state.rs#L404-L425

pub(crate) struct ParsingContextFlags: u8 {
    /// Whether the parser is in a generator function like `function* a() {}`
    /// Matches the `Yield` parameter in the ECMA spec
    const IN_GENERATOR = 1 << 0;
    /// Whether the parser is inside a function
    const IN_FUNCTION = 1 << 1;
    /// Whatever the parser is inside a constructor
    const IN_CONSTRUCTOR = 1 << 2;

    /// Is async allowed in this context. Either because it's an async function or top level await is supported.
    /// Equivalent to the `Async` generator in the ECMA spec
    const IN_ASYNC = 1 << 3;

    /// Whether the parser is parsing a top-level statement (not inside a class, function, parameter) or not
    const TOP_LEVEL = 1 << 4;

    /// Whether the parser is in an iteration or switch statement and
    /// `break` is allowed.
    const BREAK_ALLOWED = 1 << 5;

    /// Whether the parser is in an iteration statement and `continue` is allowed.
    const CONTINUE_ALLOWED = 1 << 6;

Then toggle and check these flags accordingly by following the grammar.
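Toggling and restoring such flags can be sketched without any external crate. The `Context` and `Parser` types below are hypothetical and much smaller than what a real parser needs; the sketch only shows the save, modify, restore pattern that follows the `[+Return]`, `[?Yield]` style annotations in the grammar.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
struct Context(u8);

impl Context {
    const IN: u8 = 1 << 0;
    const YIELD: u8 = 1 << 1;
    const AWAIT: u8 = 1 << 2;
    const RETURN: u8 = 1 << 3;

    /// Is the given flag set?
    fn has(self, flag: u8) -> bool {
        (self.0 & flag) != 0
    }

    /// Returns a copy of the context with `flag` switched on or off.
    fn with(self, flag: u8, on: bool) -> Context {
        if on {
            Context(self.0 | flag)
        } else {
            Context(self.0 & !flag)
        }
    }
}

struct Parser {
    ctx: Context,
}

impl Parser {
    /// Run `f` with a temporary context, restoring the previous one afterwards.
    fn with_context<T>(&mut self, ctx: Context, f: impl FnOnce(&mut Parser) -> T) -> T {
        let saved = self.ctx;
        self.ctx = ctx;
        let result = f(self);
        self.ctx = saved;
        result
    }

    /// Entering a function body: `return` becomes legal, while `yield` and
    /// `await` depend on the kind of function being parsed.
    fn parse_function_body<T>(
        &mut self,
        is_generator: bool,
        is_async: bool,
        f: impl FnOnce(&mut Parser) -> T,
    ) -> T {
        let ctx = self
            .ctx
            .with(Context::RETURN, true)
            .with(Context::YIELD, is_generator)
            .with(Context::AWAIT, is_async);
        self.with_context(ctx, f)
    }
}
```

Because the context is a small `Copy` value, saving and restoring it around nested constructs costs almost nothing.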

AssignmentPattern vs BindingPattern

In estree, the left-hand side of an AssignmentExpression is a Pattern:

extend interface AssignmentExpression {
    left: Pattern;
}

and the left-hand side of a VariableDeclarator is also a Pattern:

interface VariableDeclarator <: Node {
    type: "VariableDeclarator";
    id: Pattern;
    init: Expression | null;
}

A Pattern can be an Identifier, an ObjectPattern or an ArrayPattern:

interface Identifier <: Expression, Pattern {
    type: "Identifier";
    name: string;
}

interface ObjectPattern <: Pattern {
    type: "ObjectPattern";
    properties: [ AssignmentProperty ];
}

interface ArrayPattern <: Pattern {
    type: "ArrayPattern";
    elements: [ Pattern | null ];
}

But from the specification's perspective, we have the following JavaScript:

javascript
// AssignmentExpression:
{ foo } = bar;
  ^^^ IdentifierReference
[ foo ] = bar;
  ^^^ IdentifierReference

// VariableDeclarator
var { foo } = bar;
      ^^^ BindingIdentifier
var [ foo ] = bar;
      ^^^ BindingIdentifier

This starts to become confusing because we now have a situation where we cannot directly tell whether an Identifier inside a Pattern is a BindingIdentifier or an IdentifierReference:

rust
enum Pattern {
    Identifier, // Is this a `BindingIdentifier` or a `IdentifierReference`?
    ArrayPattern,
    ObjectPattern,
}

This leads to all sorts of unnecessary code further down the parser pipeline. For example, when setting up the scope for semantic analysis, we need to inspect the parents of this Identifier to determine whether we should bind it to the scope or not.

A better solution is to fully understand the specification and then decide what to do.

The grammars for AssignmentExpression and VariableDeclaration are defined as:

13.15 Assignment Operators

AssignmentExpression[In, Yield, Await] :
    LeftHandSideExpression[?Yield, ?Await] = AssignmentExpression[?In, ?Yield, ?Await]

13.15.5 Destructuring Assignment

In certain circumstances when processing an instance of the production
AssignmentExpression : LeftHandSideExpression = AssignmentExpression
the interpretation of LeftHandSideExpression is refined using the following grammar:

AssignmentPattern[Yield, Await] :
    ObjectAssignmentPattern[?Yield, ?Await]
    ArrayAssignmentPattern[?Yield, ?Await]

14.3.2 Variable Statement

VariableDeclaration[In, Yield, Await] :
    BindingIdentifier[?Yield, ?Await] Initializer[?In, ?Yield, ?Await]opt
    BindingPattern[?Yield, ?Await] Initializer[?In, ?Yield, ?Await]

The specification distinguishes these two grammars by defining them separately, with an AssignmentPattern and a BindingPattern.

So in situations like this, do not be afraid to deviate from estree and define extra AST nodes for our parser:

rust
enum BindingPattern {
    BindingIdentifier,
    ObjectBindingPattern,
    ArrayBindingPattern,
}

enum AssignmentPattern {
    IdentifierReference,
    ObjectAssignmentPattern,
    ArrayAssignmentPattern,
}
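To show why the split pays off downstream, here is a minimal sketch of how scope analysis might walk the two node types. The variant payloads (`String` names, nested vectors) are assumptions added for illustration and simplify away property keys and default values; `bound_names` and `referenced_names` are hypothetical helpers.

```rust
enum BindingPattern {
    BindingIdentifier(String),
    ObjectBindingPattern(Vec<BindingPattern>),
    // `None` stands for an elision such as the hole in `[, a]`.
    ArrayBindingPattern(Vec<Option<BindingPattern>>),
}

enum AssignmentPattern {
    IdentifierReference(String),
    ObjectAssignmentPattern(Vec<AssignmentPattern>),
    ArrayAssignmentPattern(Vec<Option<AssignmentPattern>>),
}

/// Every identifier in a BindingPattern declares a name: bind it to the scope.
fn bound_names(pattern: &BindingPattern, out: &mut Vec<String>) {
    match pattern {
        BindingPattern::BindingIdentifier(name) => out.push(name.clone()),
        BindingPattern::ObjectBindingPattern(props) => {
            for prop in props {
                bound_names(prop, out);
            }
        }
        BindingPattern::ArrayBindingPattern(elements) => {
            for element in elements.iter().flatten() {
                bound_names(element, out);
            }
        }
    }
}

/// Every identifier in an AssignmentPattern is a reference: resolve it instead.
fn referenced_names(pattern: &AssignmentPattern, out: &mut Vec<String>) {
    match pattern {
        AssignmentPattern::IdentifierReference(name) => out.push(name.clone()),
        AssignmentPattern::ObjectAssignmentPattern(props) => {
            for prop in props {
                referenced_names(prop, out);
            }
        }
        AssignmentPattern::ArrayAssignmentPattern(elements) => {
            for element in elements.iter().flatten() {
                referenced_names(element, out);
            }
        }
    }
}
```

Neither function needs to look at a parent node: the type of the pattern already says whether a name is being declared or referenced.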

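I was in a state of total confusion for a whole week until I finally reached enlightenment: we need to define an AssignmentPattern node and a BindingPattern node instead of a single Pattern node.

  • estree must be correct because people have been using it for years, so it cannot be wrong?
  • How are we going to cleanly distinguish the Identifiers inside the patterns without defining two separate nodes? I just cannot find where the grammar is.
  • After a whole day of navigating the specification ... the grammar for AssignmentPattern is in the 5th subsection of the main section "13.15 Assignment Operators", under the subtitle "Supplemental Syntax" 🤯 - this is really out of place, because all grammar is normally defined in the main section, not after the "Runtime Semantics" section like this one.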

TIP

The following cases are really difficult to grasp. Here be dragons.

Ambiguous Grammar

Let's first think like a parser and solve this problem: given the / token, is it a division operator or the start of a regular expression?

javascript
a / b;
a / / regex /;
a /= / regex /;
/ regex / / b;
/=/ / /=/;

It is almost impossible, isn't it? Let's break these down and follow the grammar.

The first thing we need to understand is that the syntactic grammar drives the lexical grammar, as stated in #sec-ecmascript-language-lexical-grammar:

There are several situations where the identification of lexical input elements is sensitive to the syntactic grammar context that is consuming the input elements.

This means that the parser is responsible for telling the lexer which token to return next. The example above shows that the lexer needs to return either a / token or a RegExp token. To choose between the / and the RegExp token, the specification says:

The InputElementRegExp goal symbol is used in all syntactic grammar contexts where a RegularExpressionLiteral is permitted ... In all other contexts, InputElementDiv is used as the lexical goal symbol.

And the syntax for InputElementDiv and InputElementRegExp is

InputElementDiv ::
    WhiteSpace
    LineTerminator
    Comment
    CommonToken
    DivPunctuator <---------- the `/` and `/=` token
    RightBracePunctuator

InputElementRegExp ::
    WhiteSpace
    LineTerminator
    Comment
    CommonToken
    RightBracePunctuator
    RegularExpressionLiteral <-------- the `RegExp` token

This means that whenever the grammar reaches RegularExpressionLiteral, / needs to be tokenized as a RegExp token (throwing an error if it does not have a matching /). In all other cases we tokenize / as a slash token.
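The "parser tells the lexer which goal to use" idea can be sketched as a lexer entry point that takes the goal symbol as a parameter. `Goal`, `Token` and `lex_slash` are hypothetical names, and the regular expression scanning is simplified: it handles escapes and character classes but accepts any ASCII letters as flags and does not validate the pattern itself.

```rust
/// The lexical goal chosen by the parser.
enum Goal {
    Div,    // InputElementDiv: `/` and `/=` are punctuators
    RegExp, // InputElementRegExp: `/` starts a regular expression literal
}

#[derive(Debug, PartialEq)]
enum Token {
    Div,
    DivAssign,
    RegExp(String),
}

/// Lexes the token starting at a `/`. Returns the token and its byte length.
fn lex_slash(src: &str, goal: Goal) -> Result<(Token, usize), String> {
    let bytes = src.as_bytes();
    if bytes.first() != Some(&b'/') {
        return Err("expected '/'".to_string());
    }
    match goal {
        Goal::Div => {
            if bytes.get(1) == Some(&b'=') {
                Ok((Token::DivAssign, 2))
            } else {
                Ok((Token::Div, 1))
            }
        }
        Goal::RegExp => {
            let mut i = 1;
            let mut in_class = false;
            while i < bytes.len() {
                match bytes[i] {
                    b'\\' => i += 1, // skip the escaped character
                    b'[' => in_class = true,
                    b']' => in_class = false,
                    b'\n' | b'\r' => break, // line terminators end the scan
                    b'/' if !in_class => {
                        // Consume trailing flags such as `g` or `i`.
                        let mut end = i + 1;
                        while end < bytes.len() && bytes[end].is_ascii_alphabetic() {
                            end += 1;
                        }
                        return Ok((Token::RegExp(src[..end].to_string()), end));
                    }
                    _ => {}
                }
                i += 1;
            }
            Err("unterminated regular expression literal".to_string())
        }
    }
}
```

The same input `/=/ / /=/` therefore lexes differently depending on the goal: as a `/=` punctuator under `Goal::Div`, and as the regular expression `/=/` under `Goal::RegExp`.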

Let's walk through an example:

a / / regex /
^ ------------ PrimaryExpression:: IdentifierReference
  ^ ---------- MultiplicativeExpression: MultiplicativeExpression MultiplicativeOperator ExponentiationExpression
    ^^^^^^^^ - PrimaryExpression: RegularExpressionLiteral

This statement does not match the start of any other Statement, so it goes down the ExpressionStatement route:

ExpressionStatement --> Expression --> AssignmentExpression --> ... --> MultiplicativeExpression --> ... --> MemberExpression --> PrimaryExpression --> IdentifierReference.

We stopped at IdentifierReference and not at RegularExpressionLiteral, so the statement "In all other contexts, InputElementDiv is used as the lexical goal symbol" applies. The first slash is a DivPunctuator token.

Since this is a DivPunctuator token, the grammar MultiplicativeExpression: MultiplicativeExpression MultiplicativeOperator ExponentiationExpression is matched, and the right-hand side is expected to be an ExponentiationExpression.

Now we are at the second slash in a / /. By following ExponentiationExpression, we reach PrimaryExpression: RegularExpressionLiteral, because RegularExpressionLiteral is the only matching grammar that starts with a /:

RegularExpressionLiteral ::
    / RegularExpressionBody / RegularExpressionFlags

This second / will be tokenized as RegExp because the specification states "The InputElementRegExp goal symbol is used in all syntactic grammar contexts where a RegularExpressionLiteral is permitted".

INFO

As an exercise, try to follow the grammar for /=/ / /=/.


Cover Grammar

Read the V8 blog post on this topic first.

To summarize, the specification defines the following three cover grammars:

CoverParenthesizedExpressionAndArrowParameterList

PrimaryExpression[Yield, Await] :
    CoverParenthesizedExpressionAndArrowParameterList[?Yield, ?Await]

When processing an instance of the production
PrimaryExpression[Yield, Await] : CoverParenthesizedExpressionAndArrowParameterList[?Yield, ?Await]
    the interpretation of CoverParenthesizedExpressionAndArrowParameterList is refined using the following grammar:

ParenthesizedExpression[Yield, Await] :
    ( Expression[+In, ?Yield, ?Await] )

ArrowFunction[In, Yield, Await] :
    ArrowParameters[?Yield, ?Await] [no LineTerminator here] => ConciseBody[?In]

ArrowParameters[Yield, Await] :
    BindingIdentifier[?Yield, ?Await]
    CoverParenthesizedExpressionAndArrowParameterList[?Yield, ?Await]

These definitions define:

javascript
let foo = (a, b, c); // SequenceExpression
let bar = (a, b, c) => {}; // ArrowExpression
          ^^^^^^^^^ CoverParenthesizedExpressionAndArrowParameterList

A simple but cumbersome approach to solving this problem is to parse it as a Vec<Expression> first, then write a converter function that turns it into an ArrowParameters node, i.e. each individual Expression needs to be converted to a BindingPattern.
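The converter mentioned above can be sketched as follows. The `Expression` and `BindingPattern` enums here are hypothetical and heavily reduced (no object patterns, rest elements or elisions); the sketch only shows the shape of the conversion and where it has to fail.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Expression {
    Identifier(String),
    Number(f64),
    Array(Vec<Expression>),
    Assign(Box<Expression>, Box<Expression>), // `left = right`
}

#[derive(Debug, Clone, PartialEq)]
enum BindingPattern {
    Identifier(String),
    Array(Vec<BindingPattern>),
    WithDefault(Box<BindingPattern>, Expression), // `pattern = default`
}

/// Reinterprets an already parsed expression as a binding pattern.
fn to_binding_pattern(expr: Expression) -> Result<BindingPattern, String> {
    match expr {
        Expression::Identifier(name) => Ok(BindingPattern::Identifier(name)),
        Expression::Array(items) => {
            let patterns = items
                .into_iter()
                .map(to_binding_pattern)
                .collect::<Result<Vec<_>, _>>()?;
            Ok(BindingPattern::Array(patterns))
        }
        Expression::Assign(left, right) => Ok(BindingPattern::WithDefault(
            Box::new(to_binding_pattern(*left)?),
            *right,
        )),
        // Anything else, e.g. the `1` in `(a, 1) => {}`, cannot be a parameter.
        other => Err(format!("invalid arrow function parameter: {:?}", other)),
    }
}

/// Converts the comma-separated expressions of `( ... )` once `=>` is seen.
fn to_arrow_parameters(exprs: Vec<Expression>) -> Result<Vec<BindingPattern>, String> {
    exprs.into_iter().map(to_binding_pattern).collect()
}
```

The conversion only runs once the parser has seen the `=>` token; if no arrow follows, the original expressions are kept as a parenthesized or sequence expression.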

It should be noted that if we are building the scope tree within the parser, i.e. creating a scope for an arrow expression during parsing but not for a sequence expression, it is not obvious how to do this. esbuild solves this problem by creating a temporary scope first, and then dropping it if the construct turns out not to be an ArrowExpression.

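This is stated in its architecture document:

This is mostly pretty straightforward except for a few places where the parser has pushed a scope and is in the middle of parsing a declaration only to discover that it's not a declaration after all. This happens in TypeScript when a function is forward-declared without a body, and in JavaScript when it's ambiguous whether a parenthesized expression is an arrow function or not until we reach the => token afterwards. This would be solved by doing three passes instead of two so we finish parsing before starting to set up scopes and declare symbols, but we're trying to do this in just two passes. So instead we call popAndDiscardScope() or popAndFlattenScope() instead of popScope() to modify the scope tree later if our assumptions turn out to be incorrect.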


CoverCallExpressionAndAsyncArrowHead

CallExpression :
    CoverCallExpressionAndAsyncArrowHead

When processing an instance of the production
CallExpression : CoverCallExpressionAndAsyncArrowHead
the interpretation of CoverCallExpressionAndAsyncArrowHead is refined using the following grammar:

CallMemberExpression[Yield, Await] :
    MemberExpression[?Yield, ?Await] Arguments[?Yield, ?Await]

AsyncArrowFunction[In, Yield, Await] :
    CoverCallExpressionAndAsyncArrowHead[?Yield, ?Await] [no LineTerminator here] => AsyncConciseBody[?In]

CoverCallExpressionAndAsyncArrowHead[Yield, Await] :
    MemberExpression[?Yield, ?Await] Arguments[?Yield, ?Await]

When processing an instance of the production
AsyncArrowFunction : CoverCallExpressionAndAsyncArrowHead => AsyncConciseBody
the interpretation of CoverCallExpressionAndAsyncArrowHead is refined using the following grammar:

AsyncArrowHead :
    async [no LineTerminator here] ArrowFormalParameters[~Yield, +Await]

These definitions define:

javascript
async (a, b, c); // CallExpression
async (a, b, c) => {} // AsyncArrowFunction
^^^^^^^^^^^^^^^ CoverCallExpressionAndAsyncArrowHead

This looks strange because async is not a keyword. The first async is a function name.


CoverInitializedName

13.2.5 Object Initializer

ObjectLiteral[Yield, Await] :
    ...

PropertyDefinition[Yield, Await] :
    CoverInitializedName[?Yield, ?Await]

Note 3: In certain contexts, ObjectLiteral is used as a cover grammar for a more restricted secondary grammar.
The CoverInitializedName production is necessary to fully cover these secondary grammars. However, use of this production results in an early Syntax Error in normal contexts where an actual ObjectLiteral is expected.

13.2.5.1 Static Semantics: Early Errors

In addition to describing an actual object initializer the ObjectLiteral productions are also used as a cover grammar for ObjectAssignmentPattern and may be recognized as part of a CoverParenthesizedExpressionAndArrowParameterList. When ObjectLiteral appears in a context where ObjectAssignmentPattern is required the following Early Error rules are not applied. In addition, they are not applied when initially parsing a CoverParenthesizedExpressionAndArrowParameterList or CoverCallExpressionAndAsyncArrowHead.

PropertyDefinition : CoverInitializedName
    * It is a Syntax Error if any source text is matched by this production.

13.15.1 Static Semantics: Early Errors

AssignmentExpression : LeftHandSideExpression = AssignmentExpression
If LeftHandSideExpression is an ObjectLiteral or an ArrayLiteral, the following Early Error rules are applied:
    * LeftHandSideExpression must cover an AssignmentPattern.

These definitions define:

javascript
({ prop = value } = {}); // ObjectAssignmentPattern
({ prop = value }); // ObjectLiteral with SyntaxError

Parsers need to parse ObjectLiteral with CoverInitializedName allowed, and throw the syntax error if the literal is not followed by the = that would make it an ObjectAssignmentPattern.
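This deferred check can be sketched as a refinement step that runs after the closing brace. `Property`, `Refined` and `refine_object` are hypothetical names, and the sketch deliberately ignores the other validations a real pattern conversion needs, as well as the arrow parameter case where the error is also suppressed.

```rust
#[derive(Debug, PartialEq)]
enum Property {
    KeyValue(String, String),             // `prop: value`
    Shorthand(String),                    // `prop`
    CoverInitializedName(String, String), // `prop = value`, only valid in a pattern
}

#[derive(Debug, PartialEq)]
enum Refined {
    ObjectLiteral,
    ObjectAssignmentPattern,
}

/// Decides what a parsed `{ ... }` is, given whether the next token is `=`.
fn refine_object(props: &[Property], followed_by_assign: bool) -> Result<Refined, String> {
    if followed_by_assign {
        // `{ prop = value } = ...`: the cover grammar is refined to a pattern.
        return Ok(Refined::ObjectAssignmentPattern);
    }
    // A real object literal: any `prop = value` is now an early error.
    for prop in props {
        if let Property::CoverInitializedName(name, _) = prop {
            return Err(format!("invalid shorthand property initializer `{} = ...`", name));
        }
    }
    Ok(Refined::ObjectLiteral)
}
```

In practice a parser might only record the position of the first `prop = value` while parsing and report it later, which avoids walking the properties a second time.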

As an exercise, which of the following = signs should throw a syntax error?

javascript
let { x = 1 } = ({ x = 1 } = { x: 1 });