12 ECMAScript Language: Lexical Grammar
The source text of an ECMAScript Script or Module is first converted into a sequence of input elements, which are tokens, line terminators, comments, or white space. The source text is scanned from left to right, repeatedly taking the longest possible sequence of code points as the next input element.
There are several situations where the identification of lexical input elements is sensitive to the syntactic grammar context that is consuming the input elements. This requires multiple goal symbols for the lexical grammar. The InputElementHashbangOrRegExp goal is used at the start of a Script or Module. The InputElementRegExpOrTemplateTail goal is used in syntactic grammar contexts where a RegularExpressionLiteral, a TemplateMiddle, or a TemplateTail is permitted. The InputElementRegExp goal symbol is used in all syntactic grammar contexts where a RegularExpressionLiteral is permitted but neither a TemplateMiddle, nor a TemplateTail is permitted. The InputElementTemplateTail goal is used in all syntactic grammar contexts where a TemplateMiddle or a TemplateTail is permitted but a RegularExpressionLiteral is not permitted. In all other contexts, InputElementDiv is used as the lexical goal symbol.
Note
The use of multiple lexical goals ensures that there are no lexical ambiguities that would affect automatic semicolon insertion. For example, there are no syntactic grammar contexts where both a leading division or division-assignment, and a leading RegularExpressionLiteral are permitted. This is not affected by semicolon insertion (see 12.10); in examples such as the following:
a = b
/hi/g.exec(c).map(d);
where the first non-whitespace, non-comment code point after a LineTerminator is U+002F (SOLIDUS) and the syntactic context allows division or division-assignment, no semicolon is inserted at the LineTerminator. That is, the above example is interpreted in the same way as:
a = b / hi / g.exec(c).map(d);
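The interpretation described in this Note can be observed directly. The following is an illustrative sketch, not normative text: the `run` wrapper and the stub object bound to `g` are hypothetical, chosen only so that the division reading yields a visible value.

```javascript
// Hypothetical harness: the example source, wrapped so that `a` can be returned.
const source = "let a;\na = b\n/hi/g.exec(c).map(d);\nreturn a;";
const run = new Function("b", "hi", "g", "c", "d", source);

// For this stub, g.exec(c).map(d) evaluates to 2, so a = 12 / 3 / 2.
const g = { exec: () => ({ map: () => 2 }) };
const a = run(12, 3, g, null, null);
console.log(a); // 2
```

If a semicolon had been inserted after `a = b`, `a` would have been 12 and the second line would have been scanned as a regular expression literal.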
Syntax
InputElementDiv ::
WhiteSpace
LineTerminator
Comment
CommonToken
DivPunctuator
RightBracePunctuator
InputElementRegExp ::
WhiteSpace
LineTerminator
Comment
CommonToken
RightBracePunctuator
RegularExpressionLiteral
InputElementRegExpOrTemplateTail ::
WhiteSpace
LineTerminator
Comment
CommonToken
RegularExpressionLiteral
TemplateSubstitutionTail
InputElementTemplateTail ::
WhiteSpace
LineTerminator
Comment
CommonToken
DivPunctuator
TemplateSubstitutionTail
InputElementHashbangOrRegExp ::
WhiteSpace
LineTerminator
Comment
CommonToken
HashbangComment
RegularExpressionLiteral
12.1 Unicode Format-Control Characters
The Unicode format-control characters (i.e., the characters in category “Cf” in the Unicode Character Database such as LEFT-TO-RIGHT MARK or RIGHT-TO-LEFT MARK) are control codes used to control the formatting of a range of text in the absence of higher-level protocols for this (such as mark-up languages).
It is useful to allow format-control characters in source text to facilitate editing and display. All format control characters may be used within comments, and within string literals, template literals, and regular expression literals.
U+FEFF (ZERO WIDTH NO-BREAK SPACE) is a format-control character used primarily at the start of a text to mark it as Unicode and to allow detection of the text's encoding and byte order. <ZWNBSP> characters intended for this purpose can sometimes also appear after the start of a text, for example as a result of concatenating files. In ECMAScript source text <ZWNBSP> code points are treated as white space characters (see 12.2) outside of comments, string literals, template literals, and regular expression literals.
12.2 White Space
White space code points are used to improve source text readability and to separate tokens (indivisible lexical units) from each other, but are otherwise insignificant. White space code points may occur between any two tokens and at the start or end of input. White space code points may occur within a StringLiteral, a RegularExpressionLiteral, a Template, or a TemplateSubstitutionTail where they are considered significant code points forming part of a literal value. They may also occur within a Comment, but cannot appear within any other kind of token.
The ECMAScript white space code points are listed in Table 30.
Table 30: White Space Code Points
| Code Points | Name | Abbreviation |
| U+0009 | CHARACTER TABULATION | <TAB> |
| U+000B | LINE TABULATION | <VT> |
| U+000C | FORM FEED (FF) | <FF> |
| U+FEFF | ZERO WIDTH NO-BREAK SPACE | <ZWNBSP> |
| any code point in general category “Space_Separator” | | <USP> |
Note 1
U+0020 (SPACE) and U+00A0 (NO-BREAK SPACE) code points are part of <USP>.
Note 2
Other than for the code points listed in Table 30, ECMAScript WhiteSpace intentionally excludes all code points that have the Unicode “White_Space” property but which are not classified in general category “Space_Separator” (“Zs”).
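The effect of Table 30 and of Notes 1 and 2 can be probed from script code. This is a minimal sketch under stated assumptions: `separatesTokens` is a hypothetical helper, and it relies on `new Function` reporting a SyntaxError for source text it cannot scan.

```javascript
// Hypothetical helper: true when `sep` is accepted between tokens.
function separatesTokens(sep) {
  try {
    return new Function("return 1" + sep + "+" + sep + "2;")() === 3;
  } catch (err) {
    if (err instanceof SyntaxError) return false;
    throw err;
  }
}

// <TAB>, <VT>, <FF>, <ZWNBSP>, and three <USP> code points (SPACE, NO-BREAK SPACE, EM SPACE).
const accepted = ["\t", "\v", "\f", "\uFEFF", " ", "\u00A0", "\u2003"].every(separatesTokens);

// U+0085 (NEXT LINE) has the White_Space property but is neither in "Zs" nor in Table 30.
const nelAccepted = separatesTokens("\u0085");

// Inside a string literal, <ZWNBSP> is a significant code point rather than white space.
const keptInString = new Function("return '\uFEFF'.length;")();
```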
Syntax
WhiteSpace ::
<TAB>
<VT>
<FF>
<ZWNBSP>
<USP>
12.3 Line Terminators
Like white space code points, line terminator code points are used to improve source text readability and to separate tokens (indivisible lexical units) from each other. However, unlike white space code points, line terminators have some influence over the behaviour of the syntactic grammar. In general, line terminators may occur between any two tokens, but there are a few places where they are forbidden by the syntactic grammar. Line terminators also affect the process of automatic semicolon insertion (12.10). A line terminator cannot occur within any token except a StringLiteral, Template, or TemplateSubstitutionTail. <LF> and <CR> line terminators cannot occur within a StringLiteral token except as part of a LineContinuation.
A line terminator can occur within a MultiLineComment but cannot occur within a SingleLineComment.
Line terminators are included in the set of white space code points that are matched by the \s class in regular expressions.
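The following sketch illustrates the preceding statements about line terminators, string literals, and the \s class. It is illustrative only; `parses` is a hypothetical helper that reports whether a function body is accepted.

```javascript
// Hypothetical helper: does `body` parse as a function body?
function parses(body) {
  try {
    new Function(body);
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) return false;
    throw err;
  }
}

// All four line terminator code points are matched by \s.
const allMatchS = ["\n", "\r", "\u2028", "\u2029"].every((ch) => /^\s$/.test(ch));

const rawLfInString = parses("return 'a\nb';");          // a raw <LF> inside a StringLiteral
const rawLsInString = parses("return 'a\u2028b';");      // a raw <LS> inside a StringLiteral
const continuation = new Function("return 'a\\\nb';")(); // <LF> as part of a LineContinuation
```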
The ECMAScript line terminator code points are listed in Table 31.
Table 31: Line Terminator Code Points
| Code Point | Unicode Name | Abbreviation |
| U+000A | LINE FEED (LF) | <LF> |
| U+000D | CARRIAGE RETURN (CR) | <CR> |
| U+2028 | LINE SEPARATOR | <LS> |
| U+2029 | PARAGRAPH SEPARATOR | <PS> |
Only the Unicode code points in Table 31 are treated as line terminators. Other new line or line breaking Unicode code points are not treated as line terminators but are treated as white space if they meet the requirements listed in Table 30. The sequence <CR><LF> is commonly used as a line terminator. It should be considered a single SourceCharacter for the purpose of reporting line numbers.
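One way a tool might follow the recommendation about <CR><LF> when reporting line numbers is sketched below. The function name `countLines` and the regular expression are assumptions made for this illustration; they are not defined by this specification.

```javascript
// Matches one line terminator sequence; <CR><LF> is tried first so it counts once.
const LINE_TERMINATOR_SEQUENCE = /\r\n|[\n\r\u2028\u2029]/g;

// Hypothetical helper: number of lines in `source`, counting from 1.
function countLines(source) {
  const matches = source.match(LINE_TERMINATOR_SEQUENCE);
  return (matches ? matches.length : 0) + 1;
}

const lines = countLines("a\r\nb\rc\nd\u2028e");
console.log(lines); // 5
```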
Syntax
LineTerminator ::
<LF>
<CR>
<LS>
<PS>
LineTerminatorSequence ::
<LF>
<CR> [lookahead ≠ <LF>]
<LS>
<PS>
<CR> <LF>
12.4 Comments
Comments can be either single or multi-line. Multi-line comments cannot nest.
Because a single-line comment can contain any Unicode code point except a LineTerminator code point, and because of the general rule that a token is always as long as possible, a single-line comment always consists of all code points from the // marker to the end of the line. However, the LineTerminator at the end of the line is not considered to be part of the single-line comment; it is recognized separately by the lexical grammar and becomes part of the stream of input elements for the syntactic grammar. This point is very important, because it implies that the presence or absence of single-line comments does not affect the process of automatic semicolon insertion (see 12.10).
Comments behave like white space and are discarded except that, if a MultiLineComment contains a line terminator code point, then the entire comment is considered to be a LineTerminator for purposes of parsing by the syntactic grammar.
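The interaction between comments and automatic semicolon insertion can be observed with a return statement, which does not permit a LineTerminator before its expression. The snippet is an illustrative sketch; the variable names are hypothetical.

```javascript
// The comment contains no line terminator, so it behaves like white space.
const sameLine = new Function("return /* no line break */ 1;")();

// The comment contains a line terminator, so it is treated as a LineTerminator
// and a semicolon is inserted after `return`.
const withBreak = new Function("return /* line\nbreak */ 1;")();

// A single-line comment does not absorb the LineTerminator that ends its line;
// both of these are parsed as `return 1 + 1;`.
const withComment = new Function("return 1 // trailing comment\n+ 1;")();
const withoutComment = new Function("return 1\n+ 1;")();
```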
Syntax
Comment ::
MultiLineComment
SingleLineComment
MultiLineComment ::
/* MultiLineCommentChars_opt */
MultiLineCommentChars ::
MultiLineNotAsteriskChar MultiLineCommentChars_opt
* PostAsteriskCommentChars_opt
PostAsteriskCommentChars ::
MultiLineNotForwardSlashOrAsteriskChar MultiLineCommentChars_opt
* PostAsteriskCommentChars_opt
MultiLineNotAsteriskChar ::
SourceCharacter but not *
MultiLineNotForwardSlashOrAsteriskChar ::
SourceCharacter but not one of / or *
SingleLineComment ::
// SingleLineCommentChars_opt
SingleLineCommentChars ::
SingleLineCommentChar SingleLineCommentChars_opt
SingleLineCommentChar ::
SourceCharacter but not LineTerminator
A number of productions in this section are given alternative definitions in section B.1.1.
12.5 Hashbang Comments
Hashbang Comments are location-sensitive and, like other types of comments, are discarded from the stream of input elements for the syntactic grammar.
Syntax
HashbangComment ::
#! SingleLineCommentChars_opt
12.6 Tokens
Syntax
CommonToken ::
IdentifierName
PrivateIdentifier
Punctuator
NumericLiteral
StringLiteral
Template
Note
The DivPunctuator, RegularExpressionLiteral, RightBracePunctuator, and TemplateSubstitutionTail productions derive additional tokens that are not included in the CommonToken production.
12.7 Names and Keywords
IdentifierName and ReservedWord are tokens that are interpreted according to the Default Identifier Syntax given in Unicode Standard Annex #31, Identifier and Pattern Syntax, with some small modifications. ReservedWord is an enumerated subset of IdentifierName. The syntactic grammar defines Identifier as an IdentifierName that is not a ReservedWord. The Unicode identifier grammar is based on character properties specified by the Unicode Standard. The Unicode code points in the specified categories in the latest version of the Unicode Standard must be treated as in those categories by all conforming ECMAScript implementations. ECMAScript implementations may recognize identifier code points defined in later editions of the Unicode Standard.
Note 1
This standard specifies specific code point additions: U+0024 (DOLLAR SIGN) and U+005F (LOW LINE) are permitted anywhere in an IdentifierName.
Syntax
PrivateIdentifier ::
# IdentifierName
IdentifierName ::
IdentifierStart
IdentifierName IdentifierPart
IdentifierStart ::
IdentifierStartChar
\ UnicodeEscapeSequence
IdentifierPart ::
IdentifierPartChar
\ UnicodeEscapeSequence
IdentifierStartChar ::
UnicodeIDStart
$
_
IdentifierPartChar ::
UnicodeIDContinue
$
AsciiLetter :: one of a b c d e f g h i j k l m n o p q r s t u v w x y z A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
UnicodeIDStart ::
any Unicode code point with the Unicode property “ID_Start”
UnicodeIDContinue ::
any Unicode code point with the Unicode property “ID_Continue”
The definition of the nonterminal UnicodeEscapeSequence is given in 12.9.4.
Note 2
The nonterminal IdentifierPart derives _ via UnicodeIDContinue.
Note 3
The sets of code points with Unicode properties “ID_Start” and “ID_Continue” include, respectively, the code points with Unicode properties “Other_ID_Start” and “Other_ID_Continue”.
12.7.1 Identifier Names
Unicode escape sequences are permitted in an IdentifierName, where they contribute a single Unicode code point equal to the IdentifierCodePoint of the UnicodeEscapeSequence. The \ preceding the UnicodeEscapeSequence does not contribute any code points. A UnicodeEscapeSequence cannot be used to contribute a code point to an IdentifierName that would otherwise be invalid. In other words, if a \ UnicodeEscapeSequence sequence were replaced by the SourceCharacter it contributes, the result must still be a valid IdentifierName that has the exact same sequence of SourceCharacter elements as the original IdentifierName. All interpretations of IdentifierName within this specification are based upon their actual code points regardless of whether or not an escape sequence was used to contribute any particular code point.
Two IdentifierNames that are canonically equivalent according to the Unicode Standard are not equal unless, after replacement of each UnicodeEscapeSequence, they are represented by the exact same sequence of code points.
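The rules of this subclause can be illustrated as follows. This is a sketch for exposition only; the variable names are hypothetical, and each source string is passed to `new Function` so that acceptance or rejection can be observed.

```javascript
// \u0061 contributes the code point for "a", so the declared name is abc.
const viaEscape = new Function("var \\u0061bc = 1; return abc;")();

// U+00E9 and "e" followed by U+0301 are canonically equivalent,
// but they are different sequences of code points and so name different variables.
const distinct = new Function(
  "var \u00E9 = 1; var e\u0301 = 2; return [\u00E9, e\u0301];"
)();

// An escape cannot contribute a code point (here U+0020) that would otherwise be invalid.
let rejected = false;
try {
  new Function("var a\\u0020b = 1;");
} catch (err) {
  rejected = err instanceof SyntaxError;
}
```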
12.7.1.1 Static Semantics: Early Errors
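IdentifierStart ::
\ UnicodeEscapeSequence
- It is a Syntax Error if the IdentifierCodePoint of UnicodeEscapeSequence is not some code point matched by the IdentifierStartChar lexical grammar production.
IdentifierPart ::
\ UnicodeEscapeSequence
- It is a Syntax Error if the IdentifierCodePoint of UnicodeEscapeSequence is not some code point matched by the IdentifierPartChar lexical grammar production.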
12.7.1.2 Static Semantics: IdentifierCodePoints
The syntax-directed operation IdentifierCodePoints takes no arguments and returns a List of code points. It is defined piecewise over the following productions:
IdentifierName ::
IdentifierStart
- Let cp be the IdentifierCodePoint of IdentifierStart.
- Return « cp ».
IdentifierName ::
IdentifierName IdentifierPart
- Let cps be the IdentifierCodePoints of the derived IdentifierName.
- Let cp be the IdentifierCodePoint of IdentifierPart.
- Return the list-concatenation of cps and « cp ».
12.7.1.3 Static Semantics: IdentifierCodePoint
The syntax-directed operation IdentifierCodePoint takes no arguments and returns a code point. It is defined piecewise over the following productions:
IdentifierStart ::
IdentifierStartChar
- Return the code point matched by IdentifierStartChar.
IdentifierPart ::
IdentifierPartChar
- Return the code point matched by IdentifierPartChar.
UnicodeEscapeSequence ::
u Hex4Digits
- Return the code point whose numeric value is the MV of Hex4Digits.
UnicodeEscapeSequence ::
u{ CodePoint }
- Return the code point whose numeric value is the MV of CodePoint.
12.7.2 Keywords and Reserved Words
A keyword is a token that matches IdentifierName, but also has a syntactic use; that is, it appears literally, in a fixed width font, in some syntactic production. The keywords of ECMAScript include if, while, async, await, and many others.
A reserved word is an IdentifierName that cannot be used as an identifier. Many keywords are reserved words, but some are not, and some are reserved only in certain contexts. if and while are reserved words. await is reserved only inside async functions and modules. async is not reserved; it can be used as a variable name or statement label without restriction.
This specification uses a combination of grammatical productions and early error rules to specify which names are valid identifiers and which are reserved words. All tokens in the ReservedWord list below, except for await and yield, are unconditionally reserved. Exceptions for await and yield are specified in 13.1, using parameterized syntactic productions. Lastly, several early error rules restrict the set of valid identifiers. See 13.1.1, 14.3.1.1, 14.7.5.1, and 15.7.1. In summary, there are five categories of identifier names:
- Those that are always allowed as identifiers, and are not keywords, such as Math, window, toString, and _;
- Those that are never allowed as identifiers, namely the ReservedWords listed below except await and yield;
- Those that are contextually allowed as identifiers, namely await and yield;
- Those that are contextually disallowed as identifiers, in strict mode code: let, static, implements, interface, package, private, protected, and public;
- Those that are always allowed as identifiers, but also appear as keywords within certain syntactic productions, at places where Identifier is not allowed: as, async, from, get, meta, of, set, and target.
The term conditional keyword, or contextual keyword, is sometimes used to refer to the keywords that fall in the last three categories, and thus can be used as identifiers in some contexts and as keywords in others.
Syntax
ReservedWord :: one of await break case catch class const continue debugger default delete do else enum export extends false finally for function if import in instanceof new null return super switch this throw true try typeof var void while with yield
Note 1
Per 5.1.5, keywords in the grammar match literal sequences of specific SourceCharacter elements. A code point in a keyword cannot be expressed by a \ UnicodeEscapeSequence.
An IdentifierName can contain \ UnicodeEscapeSequences, but it is not possible to declare a variable named "else" by spelling it els\u{65}. The early error rules in 13.1.1 rule out identifiers with the same StringValue as a reserved word.
Note 2
enum is not currently used as a keyword in this specification. It is a future reserved word, set aside for use as a keyword in future language extensions.
Similarly, implements, interface, package, private, protected, and public are future reserved words in strict mode code.
Note 3
The names arguments and eval are not keywords, but they are subject to some restrictions in strict mode code. See 13.1.1, 8.6.4, 15.2.1, 15.5.1, 15.6.1, and 15.8.1.
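The categories of identifier names described in this subclause can be probed as sketched below. The helper `parses` is hypothetical. The bodies are parsed by `new Function`, that is, as non-strict, non-async, non-generator function code unless a Use Strict Directive is present.

```javascript
// Hypothetical helper: does `body` parse as a function body?
function parses(body) {
  try {
    new Function(body);
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) return false;
    throw err;
  }
}

const results = {
  async: parses("var async = 1;"),                    // never reserved
  await: parses("var await = 1;"),                    // contextually allowed here
  yield: parses("var yield = 1;"),                    // contextually allowed here
  letSloppy: parses("var let = 1;"),                  // allowed outside strict mode code
  letStrict: parses("'use strict'; var let = 1;"),    // disallowed in strict mode code
  yieldStrict: parses("'use strict'; var yield = 1;"),
  ifWord: parses("var if = 1;"),                      // unconditionally reserved
  enumWord: parses("var enum = 1;"),                  // future reserved word
  escapedElse: parses("var els\\u{65} = 1;"),         // same StringValue as else
};
```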
12.8 Punctuators
Syntax
Punctuator ::
OptionalChainingPunctuator
OtherPunctuator
OptionalChainingPunctuator ::
?. [lookahead ∉ DecimalDigit]
OtherPunctuator :: one of { ( ) [ ] . ... ; , < > <= >= == != === !== + - * % ** ++ -- << >> >>> & | ^ ! ~ && || ?? ? : = += -= *= %= **= <<= >>= >>>= &= |= ^= &&= ||= ??= =>
DivPunctuator ::
/
/=
RightBracePunctuator ::
}
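The lookahead restriction on OptionalChainingPunctuator keeps a conditional expression whose middle operand begins with a decimal point from being scanned as an optional chain. The following sketch is illustrative only; the function names are hypothetical.

```javascript
// "?." followed by a DecimalDigit is not an OptionalChainingPunctuator,
// so this is the conditional expression  a ? .5 : 1
const conditional = new Function("a", "return a?.5:1;");

// "?." followed by an identifier is an optional chain.
const chain = new Function("a", "return a?.b;");

console.log(conditional(true));  // 0.5
console.log(chain({ b: 7 }));    // 7
```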
12.9 Literals
12.9.1 Null Literals
Syntax
NullLiteral ::
null
12.9.2 Boolean Literals
Syntax
BooleanLiteral ::
true
false
12.9.3 Numeric Literals
Syntax
NumericLiteralSeparator ::
_
NumericLiteral ::
DecimalLiteral
DecimalBigIntegerLiteral
NonDecimalIntegerLiteral[+Sep]
NonDecimalIntegerLiteral[+Sep] BigIntLiteralSuffix
LegacyOctalIntegerLiteral
DecimalBigIntegerLiteral ::
0 BigIntLiteralSuffix
NonZeroDigit DecimalDigits[+Sep]_opt BigIntLiteralSuffix
NonZeroDigit NumericLiteralSeparator DecimalDigits[+Sep] BigIntLiteralSuffix
NonDecimalIntegerLiteral[Sep] ::
BinaryIntegerLiteral[?Sep]
OctalIntegerLiteral[?Sep]
HexIntegerLiteral[?Sep]
BigIntLiteralSuffix ::
n
DecimalLiteral ::
DecimalIntegerLiteral . DecimalDigits[+Sep]_opt ExponentPart[+Sep]_opt
. DecimalDigits[+Sep] ExponentPart[+Sep]_opt
DecimalIntegerLiteral ExponentPart[+Sep]_opt
DecimalIntegerLiteral ::
0
NonZeroDigit
NonZeroDigit NumericLiteralSeparator_opt DecimalDigits[+Sep]
NonOctalDecimalIntegerLiteral
DecimalDigits[Sep] ::
DecimalDigit
DecimalDigits[?Sep] DecimalDigit
[+Sep] DecimalDigits[+Sep] NumericLiteralSeparator DecimalDigit
DecimalDigit :: one of 0 1 2 3 4 5 6 7 8 9
NonZeroDigit :: one of 1 2 3 4 5 6 7 8 9
ExponentPart[Sep] ::
ExponentIndicator SignedInteger[?Sep]
ExponentIndicator :: one of e E
SignedInteger[Sep] ::
DecimalDigits[?Sep]
+ DecimalDigits[?Sep]
- DecimalDigits[?Sep]
BinaryIntegerLiteral[Sep] ::
0b BinaryDigits[?Sep]
0B BinaryDigits[?Sep]
BinaryDigits[Sep] ::
BinaryDigit
BinaryDigits[?Sep] BinaryDigit
[+Sep] BinaryDigits[+Sep] NumericLiteralSeparator BinaryDigit
BinaryDigit :: one of 0 1
OctalIntegerLiteral[Sep] ::
0o OctalDigits[?Sep]
0O OctalDigits[?Sep]
OctalDigits[Sep] ::
OctalDigit
OctalDigits[?Sep] OctalDigit
[+Sep] OctalDigits[+Sep] NumericLiteralSeparator OctalDigit
LegacyOctalIntegerLiteral ::
0 OctalDigit
LegacyOctalIntegerLiteral OctalDigit
NonOctalDecimalIntegerLiteral ::
0 NonOctalDigit
LegacyOctalLikeDecimalIntegerLiteral NonOctalDigit
NonOctalDecimalIntegerLiteral DecimalDigit
LegacyOctalLikeDecimalIntegerLiteral ::
0 OctalDigit
LegacyOctalLikeDecimalIntegerLiteral OctalDigit
OctalDigit :: one of 0 1 2 3 4 5 6 7
NonOctalDigit :: one of 8 9
HexIntegerLiteral[Sep] ::
0x HexDigits[?Sep]
0X HexDigits[?Sep]
HexDigits[Sep] ::
HexDigit
HexDigits[?Sep] HexDigit
[+Sep] HexDigits[+Sep] NumericLiteralSeparator HexDigit
HexDigit :: one of 0 1 2 3 4 5 6 7 8 9 a b c d e f A B C D E F
The SourceCharacter immediately following a NumericLiteral must not be an IdentifierStart or DecimalDigit.
Note
For example: 3in is an error and not the two input elements 3 and in.
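The restriction above, together with the placement rules for NumericLiteralSeparator in the grammar, can be checked as sketched here. This is an illustrative, non-normative example; `parses` is a hypothetical helper.

```javascript
// Hypothetical helper: does `body` parse as a function body?
function parses(body) {
  try {
    new Function(body);
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) return false;
    throw err;
  }
}

const separated = 1_000_000;            // separators between digits are permitted
const radices = [0b1010, 0o17, 0xFF];   // binary, octal, and hexadecimal literals

const results = {
  inOperator: parses("return 3 in [1];"),   // two input elements: 3 and in
  noSpace: parses("return 3in [1];"),       // IdentifierStart directly after the literal
  doubleSeparator: parses("return 1__0;"),
  trailingSeparator: parses("return 1_;"),
  afterLeadingZero: parses("return 0_1;"),
};
```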
12.9.3.1 Static Semantics: Early Errors
NumericLiteral ::
LegacyOctalIntegerLiteral
DecimalIntegerLiteral ::
NonOctalDecimalIntegerLiteral
- It is a Syntax Error if IsStrict(this production) is true.
Note
In non-strict code, this syntax is Legacy.
12.9.3.2 Static Semantics: MV
A numeric literal stands for a value of the Number type or the BigInt type.
- The MV of DecimalLiteral :: DecimalIntegerLiteral . DecimalDigits is the MV of DecimalIntegerLiteral plus (the MV of DecimalDigits × 10^{-n}), where n is the number of code points in DecimalDigits, excluding all occurrences of NumericLiteralSeparator.
- The MV of DecimalLiteral :: DecimalIntegerLiteral . ExponentPart is the MV of DecimalIntegerLiteral × 10^{e}, where e is the MV of ExponentPart.
- The MV of DecimalLiteral :: DecimalIntegerLiteral . DecimalDigits ExponentPart is (the MV of DecimalIntegerLiteral plus (the MV of DecimalDigits × 10^{-n})) × 10^{e}, where n is the number of code points in DecimalDigits, excluding all occurrences of NumericLiteralSeparator, and e is the MV of ExponentPart.
- The MV of DecimalLiteral :: . DecimalDigits is the MV of DecimalDigits × 10^{-n}, where n is the number of code points in DecimalDigits, excluding all occurrences of NumericLiteralSeparator.
- The MV of DecimalLiteral :: . DecimalDigits ExponentPart is the MV of DecimalDigits × 10^{e - n}, where n is the number of code points in DecimalDigits, excluding all occurrences of NumericLiteralSeparator, and e is the MV of ExponentPart.
- The MV of DecimalLiteral :: DecimalIntegerLiteral ExponentPart is the MV of DecimalIntegerLiteral × 10^{e}, where e is the MV of ExponentPart.
- The MV of DecimalIntegerLiteral :: 0 is 0.
- The MV of DecimalIntegerLiteral :: NonZeroDigit NumericLiteralSeparator_opt DecimalDigits is (the MV of NonZeroDigit × 10^{n}) plus the MV of DecimalDigits, where n is the number of code points in DecimalDigits, excluding all occurrences of NumericLiteralSeparator.
- The MV of DecimalDigits :: DecimalDigits DecimalDigit is (the MV of DecimalDigits × 10) plus the MV of DecimalDigit.
- The MV of DecimalDigits :: DecimalDigits NumericLiteralSeparator DecimalDigit is (the MV of DecimalDigits × 10) plus the MV of DecimalDigit.
- The MV of ExponentPart :: ExponentIndicator SignedInteger is the MV of SignedInteger.
- The MV of SignedInteger :: - DecimalDigits is the negative of the MV of DecimalDigits.
- The MV of DecimalDigit :: 0 or of HexDigit :: 0 or of OctalDigit :: 0 or of LegacyOctalEscapeSequence :: 0 or of BinaryDigit :: 0 is 0.
- The MV of DecimalDigit :: 1 or of NonZeroDigit :: 1 or of HexDigit :: 1 or of OctalDigit :: 1 or of BinaryDigit :: 1 is 1.
- The MV of DecimalDigit :: 2 or of NonZeroDigit :: 2 or of HexDigit :: 2 or of OctalDigit :: 2 is 2.
- The MV of DecimalDigit :: 3 or of NonZeroDigit :: 3 or of HexDigit :: 3 or of OctalDigit :: 3 is 3.
- The MV of DecimalDigit :: 4 or of NonZeroDigit :: 4 or of HexDigit :: 4 or of OctalDigit :: 4 is 4.
- The MV of DecimalDigit :: 5 or of NonZeroDigit :: 5 or of HexDigit :: 5 or of OctalDigit :: 5 is 5.
- The MV of DecimalDigit :: 6 or of NonZeroDigit :: 6 or of HexDigit :: 6 or of OctalDigit :: 6 is 6.
- The MV of DecimalDigit :: 7 or of NonZeroDigit :: 7 or of HexDigit :: 7 or of OctalDigit :: 7 is 7.
- The MV of DecimalDigit :: 8 or of NonZeroDigit :: 8 or of NonOctalDigit :: 8 or of HexDigit :: 8 is 8.
- The MV of DecimalDigit :: 9 or of NonZeroDigit :: 9 or of NonOctalDigit :: 9 or of HexDigit :: 9 is 9.
- The MV of HexDigit :: a or of HexDigit :: A is 10.
- The MV of HexDigit :: b or of HexDigit :: B is 11.
- The MV of HexDigit :: c or of HexDigit :: C is 12.
- The MV of HexDigit :: d or of HexDigit :: D is 13.
- The MV of HexDigit :: e or of HexDigit :: E is 14.
- The MV of HexDigit :: f or of HexDigit :: F is 15.
- The MV of BinaryDigits :: BinaryDigits BinaryDigit is (the MV of BinaryDigits × 2) plus the MV of BinaryDigit.
- The MV of BinaryDigits :: BinaryDigits NumericLiteralSeparator BinaryDigit is (the MV of BinaryDigits × 2) plus the MV of BinaryDigit.
- The MV of OctalDigits :: OctalDigits OctalDigit is (the MV of OctalDigits × 8) plus the MV of OctalDigit.
- The MV of OctalDigits :: OctalDigits NumericLiteralSeparator OctalDigit is (the MV of OctalDigits × 8) plus the MV of OctalDigit.
- The MV of LegacyOctalIntegerLiteral :: LegacyOctalIntegerLiteral OctalDigit is (the MV of LegacyOctalIntegerLiteral times 8) plus the MV of OctalDigit.
- The MV of NonOctalDecimalIntegerLiteral :: LegacyOctalLikeDecimalIntegerLiteral NonOctalDigit is (the MV of LegacyOctalLikeDecimalIntegerLiteral times 10) plus the MV of NonOctalDigit.
- The MV of NonOctalDecimalIntegerLiteral :: NonOctalDecimalIntegerLiteral DecimalDigit is (the MV of NonOctalDecimalIntegerLiteral times 10) plus the MV of DecimalDigit.
- The MV of LegacyOctalLikeDecimalIntegerLiteral :: LegacyOctalLikeDecimalIntegerLiteral OctalDigit is (the MV of LegacyOctalLikeDecimalIntegerLiteral times 10) plus the MV of OctalDigit.
- The MV of HexDigits :: HexDigits HexDigit is (the MV of HexDigits × 16) plus the MV of HexDigit.
- The MV of HexDigits :: HexDigits NumericLiteralSeparator HexDigit is (the MV of HexDigits × 16) plus the MV of HexDigit.
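The recursive digit rules above amount to a left-to-right accumulation in which NumericLiteralSeparator contributes nothing. The function below is a minimal sketch of that accumulation, not an algorithm defined by this specification; `mvOfDigits` is a hypothetical name, and BigInt is used only to keep the arithmetic exact.

```javascript
// Hypothetical helper: the MV of a run of digits in the given base (2, 8, 10, or 16).
function mvOfDigits(digits, base) {
  let mv = 0n;
  for (const ch of digits) {
    if (ch === "_") continue; // NumericLiteralSeparator does not affect the MV
    // (MV of the digits so far × base) plus the MV of the next digit
    mv = mv * BigInt(base) + BigInt(parseInt(ch, base));
  }
  return mv;
}

// Legacy forms, evaluated as non-strict code:
const legacyOctal = new Function("return 017;")();      // LegacyOctalIntegerLiteral, base 8
const octalLikeDecimal = new Function("return 089;")(); // NonOctalDecimalIntegerLiteral, base 10
```

Under these assumptions, `mvOfDigits("1_000", 10)` is `1000n` and `mvOfDigits("ff", 16)` is `255n`.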
12.9.3.3 Static Semantics: NumericValue
The syntax-directed operation NumericValue takes no arguments and returns a Number or a BigInt. It is defined piecewise over the following productions:
NumericLiteral ::
DecimalLiteral
- Return RoundMVResult(MV of DecimalLiteral).
NumericLiteral ::
NonDecimalIntegerLiteral
- Return 𝔽(MV of NonDecimalIntegerLiteral).
NumericLiteral ::
LegacyOctalIntegerLiteral
- Return 𝔽(MV of LegacyOctalIntegerLiteral).
NumericLiteral ::
NonDecimalIntegerLiteral BigIntLiteralSuffix
- Return the BigInt value for the MV of NonDecimalIntegerLiteral.
DecimalBigIntegerLiteral ::
0 BigIntLiteralSuffix
- Return 0ℤ.
DecimalBigIntegerLiteral ::
NonZeroDigit BigIntLiteralSuffix
- Return the BigInt value for the MV of NonZeroDigit.
DecimalBigIntegerLiteral ::
NonZeroDigit DecimalDigits BigIntLiteralSuffix
NonZeroDigit NumericLiteralSeparator DecimalDigits BigIntLiteralSuffix
- Let n be the number of code points in DecimalDigits, excluding all occurrences of NumericLiteralSeparator.
- Let mv be (the MV of NonZeroDigit × 10^{n}) plus the MV of DecimalDigits.
- Return ℤ(mv).
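The difference between the Number and BigInt results of NumericValue is visible when the MV is not exactly representable as a Number. The following is an illustrative sketch; the variable names are hypothetical.

```javascript
// The MV of both literals is 2^53 + 1.
const asNumber = 9007199254740993;  // rounded to the nearest Number value
const asBigInt = 9007199254740993n; // represented exactly

const types = [typeof 0x10, typeof 0x10n];

console.log(asNumber === 9007199254740992); // true
console.log(asBigInt === 2n ** 53n + 1n);   // true
```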
12.9.4 String Literals
Note 1
A string literal is 0 or more Unicode code points enclosed in single or double quotes. Unicode code points may also be represented by an escape sequence. All code points may appear literally in a string literal except for the closing quote code points, U+005C (REVERSE SOLIDUS), U+000D (CARRIAGE RETURN), and U+000A (LINE FEED). Any code points may appear in the form of an escape sequence. String literals evaluate to ECMAScript String values. When generating these String values Unicode code points are UTF-16 encoded as defined in 11.1.1. Code points belonging to the Basic Multilingual Plane are encoded as a single code unit element of the string. All other code points are encoded as two code unit elements of the string.
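The UTF-16 encoding described in this Note can be observed through the length of the resulting String values. The snippet is a small illustrative sketch; the variable names are hypothetical.

```javascript
const bmp = "\u00E9";       // a code point in the Basic Multilingual Plane
const astral = "\u{1F600}"; // a code point outside the Basic Multilingual Plane

const units = [bmp.length, astral.length];                   // [1, 2]
const surrogates = [astral.charCodeAt(0), astral.charCodeAt(1)];
const codePoint = astral.codePointAt(0);
```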
Syntax
StringLiteral ::
" DoubleStringCharacters_opt "
' SingleStringCharacters_opt '
DoubleStringCharacters ::
DoubleStringCharacter DoubleStringCharacters_opt
SingleStringCharacters ::
SingleStringCharacter SingleStringCharacters_opt
DoubleStringCharacter ::
SourceCharacter but not one of " or \ or LineTerminator
<LS>
<PS>
\ EscapeSequence
LineContinuation
SingleStringCharacter ::
SourceCharacter but not one of ' or \ or LineTerminator
<LS>
<PS>
\ EscapeSequence
LineContinuation
LineContinuation ::
\ LineTerminatorSequence
EscapeSequence ::
CharacterEscapeSequence
0 [lookahead ∉ DecimalDigit]
LegacyOctalEscapeSequence
NonOctalDecimalEscapeSequence
HexEscapeSequence
UnicodeEscapeSequence
CharacterEscapeSequence ::
SingleEscapeCharacter
NonEscapeCharacter
SingleEscapeCharacter :: one of ' " \ b f n r t v
NonEscapeCharacter ::
SourceCharacter but not one of EscapeCharacter or LineTerminator
EscapeCharacter ::
SingleEscapeCharacter
DecimalDigit
x
u
LegacyOctalEscapeSequence ::
0 [lookahead ∈ { 8, 9 }]
NonZeroOctalDigit [lookahead ∉ OctalDigit]
ZeroToThree OctalDigit [lookahead ∉ OctalDigit]
FourToSeven OctalDigit
ZeroToThree OctalDigit OctalDigit
NonZeroOctalDigit ::
OctalDigit but not 0
ZeroToThree :: one of 0 1 2 3
FourToSeven :: one of 4 5 6 7
NonOctalDecimalEscapeSequence :: one of 8 9
HexEscapeSequence ::
x HexDigit HexDigit
UnicodeEscapeSequence ::
u Hex4Digits
u{ CodePoint }
Hex4Digits ::
HexDigit HexDigit HexDigit HexDigit
The definition of the nonterminal HexDigit is given in 12.9.3. SourceCharacter is defined in 11.1.
Note 2
<LF> and <CR> cannot appear in a string literal, except as part of a LineContinuation to produce the empty code points sequence. The proper way to include either in the String value of a string literal is to use an escape sequence such as \n or \u000A.
12.9.4.1 Static Semantics: Early Errors
EscapeSequence ::
LegacyOctalEscapeSequence
NonOctalDecimalEscapeSequence
- It is a Syntax Error if IsStrict(this production) is true.
Note 1
In non-strict code, this syntax is Legacy.
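The early error for legacy octal escapes can be illustrated as follows, including the case in which the Use Strict Directive follows the literal within the same Directive Prologue. This is a sketch for exposition; `parses` is a hypothetical helper.

```javascript
// Hypothetical helper: does `body` parse as a function body?
function parses(body) {
  try {
    new Function(body);
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) return false;
    throw err;
  }
}

const sloppy = parses('"\\7";');                      // non-strict code: Legacy syntax accepted
const strictBefore = parses('"use strict"; "\\7";');  // directive precedes the literal
const strictAfter = parses('"\\7"; "use strict";');   // directive follows the literal

// In non-strict code the escape denotes the code unit with value 7.
const value = new Function('return "\\7".charCodeAt(0);')();
```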
Note 2
It is possible for string literals to precede a Use Strict Directive that places the enclosing code in strict mode, and implementations must take care to enforce the above rules for such literals. For example, the following source text contains a Syntax Error:
function invalid() { "\7"; "use strict"; }
12.9.4.2 Static Semantics: SV
The syntax-directed operation SV takes no arguments and returns a String.
A string literal stands for a value of the String type. SV produces String values for string literals through recursive application on the various parts of the string literal. As part of this process, some Unicode code points within the string literal are interpreted as having a mathematical value, as described below or in 12.9.3.
Table 32: String Single Character Escape Sequences
| Escape Sequence | Code Unit Value | Unicode Character Name | Symbol |
| \b | 0x0008 | BACKSPACE | <BS> |
| \t | 0x0009 | CHARACTER TABULATION | <HT> |
| \n | 0x000A | LINE FEED (LF) | <LF> |
| \v | 0x000B | LINE TABULATION | <VT> |
| \f | 0x000C | FORM FEED (FF) | <FF> |
| \r | 0x000D | CARRIAGE RETURN (CR) | <CR> |
| \" | 0x0022 | QUOTATION MARK | " |
| \' | 0x0027 | APOSTROPHE | ' |
| \\ | 0x005C | REVERSE SOLIDUS | \ |
12.9.4.3 Static Semantics: MV
12.9.5 Regular Expression Literals
Note 1
A regular expression literal is an input element that is converted to a RegExp object (see 22.2) each time the literal is evaluated. Two regular expression literals in a program evaluate to regular expression objects that never compare as === to each other even if the two literals' contents are identical. A RegExp object may also be created at runtime by new RegExp or calling the RegExp constructor as a function (see 22.2.4).
The productions below describe the syntax for a regular expression literal and are used by the input element scanner to find the end of the regular expression literal. The source text comprising the RegularExpressionBody and the RegularExpressionFlags are subsequently parsed again using the more stringent ECMAScript Regular Expression grammar (22.2.1).
An implementation may extend the ECMAScript Regular Expression grammar defined in 22.2.1, but it must not extend the RegularExpressionBody and RegularExpressionFlags productions defined below or the productions used by these productions.
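The statement in Note 1 that each evaluation of a regular expression literal produces a distinct object can be observed as sketched below. The function name `makeLiteral` is hypothetical.

```javascript
// Each call evaluates the same literal again and yields a new RegExp object.
function makeLiteral() {
  return /ab+c/g;
}

const first = makeLiteral();
const second = makeLiteral();

// RegExp objects created at runtime, with and without `new`.
const viaNew = new RegExp("ab+c", "g");
const viaCall = RegExp("ab+c", "g");

console.log(first === second);               // false
console.log(first.source === second.source); // true
```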
Syntax
RegularExpressionLiteral ::
/ RegularExpressionBody / RegularExpressionFlags
RegularExpressionBody ::
RegularExpressionFirstChar RegularExpressionChars
RegularExpressionChars ::
[empty]
RegularExpressionChars RegularExpressionChar
RegularExpressionFirstChar ::
RegularExpressionNonTerminator but not one of * or \ or / or [
RegularExpressionBackslashSequence
RegularExpressionClass
RegularExpressionChar ::
RegularExpressionNonTerminator but not one of \ or / or [
RegularExpressionBackslashSequence
RegularExpressionClass
RegularExpressionBackslashSequence ::
\ RegularExpressionNonTerminator
RegularExpressionNonTerminator ::
SourceCharacter but not LineTerminator
RegularExpressionClass ::
[ RegularExpressionClassChars ]
RegularExpressionClassChars ::
[empty]
RegularExpressionClassChars RegularExpressionClassChar
RegularExpressionClassChar ::
RegularExpressionNonTerminator but not one of ] or \
RegularExpressionBackslashSequence
RegularExpressionFlags ::
[empty]
RegularExpressionFlags IdentifierPartChar
Note 2
Regular expression literals may not be empty; instead of representing an empty regular expression literal, the code unit sequence // starts a single-line comment. To specify an empty regular expression, use: /(?:)/.
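The following sketch illustrates Note 2. It is illustrative only; the variable names are hypothetical.

```javascript
// The recommended spelling of an empty regular expression.
const empty = /(?:)/;
const matchesAnything = empty.test("anything");

// In source text, // starts a single-line comment rather than an empty literal,
// so this body is parsed as `return 4;`.
const commentResult = new Function("return 4 // 2\n;")();

// A RegExp constructed from the empty string reports the same non-empty source.
const constructedSource = new RegExp("").source;
```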
12.9.5.1 Static Semantics: BodyText
The syntax-directed operation BodyText takes no arguments and returns source text. It is defined piecewise over the following productions:
RegularExpressionLiteral :: / RegularExpressionBody / RegularExpressionFlags
- Return the source text that was recognized as RegularExpressionBody.
12.9.5.2 Static Semantics: FlagText
The syntax-directed operation FlagText takes no arguments and returns source text. It is defined piecewise over the following productions:
RegularExpressionLiteral :: / RegularExpressionBody / RegularExpressionFlags
- Return the source text that was recognized as RegularExpressionFlags.
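The following non-normative sketch shows one way an input element scanner might use the productions of 12.9.5 to find the end of a regular expression literal and recover the text that BodyText and FlagText return. The names scanRegExpLiteral, LINE_TERMINATORS, and FLAG_CHAR are hypothetical. The sketch assumes the caller has already determined that a RegularExpressionLiteral is permitted at the starting position, it approximates IdentifierPartChar with the Unicode ID_Continue property plus $, <ZWNJ>, and <ZWJ>, and it does not validate the body against the grammar of 22.2.1.

```javascript
const LINE_TERMINATORS = new Set(["\n", "\r", "\u2028", "\u2029"]);
// Assumption: approximation of IdentifierPartChar.
const FLAG_CHAR = /[$\u200C\u200D\p{ID_Continue}]/u;

// Returns { bodyText, flagText, end } or null if no literal starts at `start`.
function scanRegExpLiteral(source, start = 0) {
  if (source[start] !== "/") return null;
  let i = start + 1;
  // RegularExpressionFirstChar excludes * and /, so /* and // remain comments.
  if (source[i] === "*" || source[i] === "/") return null;

  let inClass = false;
  for (;;) {
    const ch = source[i];
    // RegularExpressionNonTerminator excludes LineTerminator; end of input also fails.
    if (ch === undefined || LINE_TERMINATORS.has(ch)) return null;
    if (ch === "\\") {
      // RegularExpressionBackslashSequence: \ followed by a non-terminator.
      const next = source[i + 1];
      if (next === undefined || LINE_TERMINATORS.has(next)) return null;
      i += 2;
      continue;
    }
    if (inClass) {
      // Inside RegularExpressionClass only ] (or a backslash) is special.
      if (ch === "]") inClass = false;
    } else if (ch === "[") {
      inClass = true;
    } else if (ch === "/") {
      break; // closing solidus of the literal
    }
    i += 1;
  }

  const bodyText = source.slice(start + 1, i);
  let j = i + 1;
  while (j < source.length) {
    const cp = String.fromCodePoint(source.codePointAt(j));
    if (!FLAG_CHAR.test(cp)) break;
    j += cp.length;
  }
  return { bodyText, flagText: source.slice(i + 1, j), end: j };
}

const scanned = scanRegExpLiteral("/a[/]b\\/c/gi.test(x)");
```

In this usage, scanned.bodyText is the text a[/]b\/c and scanned.flagText is gi. The solidus inside the class and the escaped solidus do not terminate the literal.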
12.9.6 Template Literal Lexical Components
Syntax
Template ::
  NoSubstitutionTemplate
  TemplateHead
NoSubstitutionTemplate ::
  ` TemplateCharacters_opt `
TemplateHead ::
  ` TemplateCharacters_opt ${
TemplateSubstitutionTail ::
  TemplateMiddle
  TemplateTail
TemplateMiddle ::
  } TemplateCharacters_opt ${
TemplateTail ::
  } TemplateCharacters_opt `
TemplateCharacters ::
  TemplateCharacter TemplateCharacters_opt
TemplateCharacter ::
  $ [lookahead ≠ {]
  \ TemplateEscapeSequence
  \ NotEscapeSequence
  LineContinuation
  LineTerminatorSequence
  SourceCharacter but not one of ` or \ or $ or LineTerminator
TemplateEscapeSequence ::
  CharacterEscapeSequence
  0 [lookahead ∉ DecimalDigit]
  HexEscapeSequence
  UnicodeEscapeSequence
NotEscapeSequence ::
  0 DecimalDigit
  DecimalDigit but not 0
  x [lookahead ∉ HexDigit]
  x HexDigit [lookahead ∉ HexDigit]
  u [lookahead ∉ HexDigit] [lookahead ≠ {]
  u HexDigit [lookahead ∉ HexDigit]
  u HexDigit HexDigit [lookahead ∉ HexDigit]
  u HexDigit HexDigit HexDigit [lookahead ∉ HexDigit]
  u { [lookahead ∉ HexDigit]
  u { NotCodePoint [lookahead ∉ HexDigit]
  u { CodePoint [lookahead ∉ HexDigit] [lookahead ≠ }]
NotCodePoint ::
  HexDigits[~Sep] but only if the MV of HexDigits > 0x10FFFF
CodePoint ::
  HexDigits[~Sep] but only if the MV of HexDigits ≤ 0x10FFFF
Note
TemplateSubstitutionTail is used by the InputElementTemplateTail alternative lexical goal.
12.9.6.1 Static Semantics: TV
The syntax-directed operation TV takes no arguments and returns a String or undefined. A template literal component is interpreted by TV as a value of the String type. TV is used to construct the indexed components of a template object (colloquially, the template values). In TV, escape sequences are replaced by the UTF-16 code unit(s) of the Unicode code point represented by the escape sequence.
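The following non-normative sketch shows where the two possible results of TV become observable. The tag function cookedAndRaw is hypothetical. It relies on the fact that a tagged template exposes the template values as the elements of its first argument and the template raw values through that argument's raw property.

```javascript
// Hypothetical tag function that copies out both views of the template object.
function cookedAndRaw(strings) {
  return { cooked: [...strings], raw: [...strings.raw] };
}

// Valid escapes: the template value holds the code units the escapes denote.
const valid = cookedAndRaw`\u0041\n`;

// \u followed by a non-hex character matches NotEscapeSequence, so the
// template value is undefined; only a tagged template may contain it.
const invalid = cookedAndRaw`\unicode`;
```

Here valid.cooked[0] is the two-code-unit String consisting of A and LINE FEED, while invalid.cooked[0] is undefined. In both cases the raw value preserves the escape as written.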
12.9.6.2 Static Semantics: TRV
The syntax-directed operation TRV takes no arguments and returns a String. A template literal component is interpreted by TRV as a value of the String type. TRV is used to construct the raw components of a template object (colloquially, the template raw values). TRV is similar to TV with the difference being that in TRV, escape sequences are interpreted as they appear in the literal.
- The TRV of NoSubstitutionTemplate :: ` ` is the empty String.
- The TRV of TemplateHead :: ` ${ is the empty String.
- The TRV of TemplateMiddle :: } ${ is the empty String.
- The TRV of TemplateTail :: } ` is the empty String.
- The TRV of TemplateCharacters :: TemplateCharacter TemplateCharacters is the string-concatenation of the TRV of TemplateCharacter and the TRV of TemplateCharacters.
- The TRV of TemplateCharacter :: SourceCharacter but not one of ` or \ or $ or LineTerminator is the result of performing UTF16EncodeCodePoint on the code point matched by SourceCharacter.
- The TRV of TemplateCharacter :: $ is the String value consisting of the code unit 0x0024 (DOLLAR SIGN).
- The TRV of TemplateCharacter :: \ TemplateEscapeSequence is the string-concatenation of the code unit 0x005C (REVERSE SOLIDUS) and the TRV of TemplateEscapeSequence.
- The TRV of TemplateCharacter :: \ NotEscapeSequence is the string-concatenation of the code unit 0x005C (REVERSE SOLIDUS) and the TRV of NotEscapeSequence.
- The TRV of TemplateEscapeSequence :: 0 is the String value consisting of the code unit 0x0030 (DIGIT ZERO).
- The TRV of NotEscapeSequence :: 0 DecimalDigit is the string-concatenation of the code unit 0x0030 (DIGIT ZERO) and the TRV of DecimalDigit.
- The TRV of NotEscapeSequence :: x [lookahead ∉ HexDigit] is the String value consisting of the code unit 0x0078 (LATIN SMALL LETTER X).
- The TRV of NotEscapeSequence :: x HexDigit [lookahead ∉ HexDigit] is the string-concatenation of the code unit 0x0078 (LATIN SMALL LETTER X) and the TRV of HexDigit.
- The TRV of NotEscapeSequence :: u [lookahead ∉ HexDigit] [lookahead ≠ {] is the String value consisting of the code unit 0x0075 (LATIN SMALL LETTER U).
- The TRV of NotEscapeSequence :: u HexDigit [lookahead ∉ HexDigit] is the string-concatenation of the code unit 0x0075 (LATIN SMALL LETTER U) and the TRV of HexDigit.
- The TRV of NotEscapeSequence :: u HexDigit HexDigit [lookahead ∉ HexDigit] is the string-concatenation of the code unit 0x0075 (LATIN SMALL LETTER U), the TRV of the first HexDigit, and the TRV of the second HexDigit.
- The TRV of NotEscapeSequence :: u HexDigit HexDigit HexDigit [lookahead ∉ HexDigit] is the string-concatenation of the code unit 0x0075 (LATIN SMALL LETTER U), the TRV of the first HexDigit, the TRV of the second HexDigit, and the TRV of the third HexDigit.
- The TRV of NotEscapeSequence :: u { [lookahead ∉ HexDigit] is the string-concatenation of the code unit 0x0075 (LATIN SMALL LETTER U) and the code unit 0x007B (LEFT CURLY BRACKET).
- The TRV of NotEscapeSequence :: u { NotCodePoint [lookahead ∉ HexDigit] is the string-concatenation of the code unit 0x0075 (LATIN SMALL LETTER U), the code unit 0x007B (LEFT CURLY BRACKET), and the TRV of NotCodePoint.
- The TRV of NotEscapeSequence :: u { CodePoint [lookahead ∉ HexDigit] [lookahead ≠ }] is the string-concatenation of the code unit 0x0075 (LATIN SMALL LETTER U), the code unit 0x007B (LEFT CURLY BRACKET), and the TRV of CodePoint.
- The TRV of DecimalDigit :: one of 0 1 2 3 4 5 6 7 8 9 is the result of performing UTF16EncodeCodePoint on the single code point matched by this production.
- The TRV of CharacterEscapeSequence :: NonEscapeCharacter is the SV of NonEscapeCharacter.
- The TRV of SingleEscapeCharacter :: one of ' " \ b f n r t v is the result of performing UTF16EncodeCodePoint on the single code point matched by this production.
- The TRV of HexEscapeSequence :: x HexDigit HexDigit is the string-concatenation of the code unit 0x0078 (LATIN SMALL LETTER X), the TRV of the first HexDigit, and the TRV of the second HexDigit.
- The TRV of UnicodeEscapeSequence :: u Hex4Digits is the string-concatenation of the code unit 0x0075 (LATIN SMALL LETTER U) and the TRV of Hex4Digits.
- The TRV of UnicodeEscapeSequence :: u{ CodePoint } is the string-concatenation of the code unit 0x0075 (LATIN SMALL LETTER U), the code unit 0x007B (LEFT CURLY BRACKET), the TRV of CodePoint, and the code unit 0x007D (RIGHT CURLY BRACKET).
- The TRV of Hex4Digits :: HexDigit HexDigit HexDigit HexDigit is the string-concatenation of the TRV of the first HexDigit, the TRV of the second HexDigit, the TRV of the third HexDigit, and the TRV of the fourth HexDigit.
- The TRV of HexDigits :: HexDigits HexDigit is the string-concatenation of the TRV of HexDigits and the TRV of HexDigit.
- The TRV of HexDigit :: one of 0 1 2 3 4 5 6 7 8 9 a b c d e f A B C D E F is the result of performing UTF16EncodeCodePoint on the single code point matched by this production.
- The TRV of LineContinuation :: \ LineTerminatorSequence is the string-concatenation of the code unit 0x005C (REVERSE SOLIDUS) and the TRV of LineTerminatorSequence.
- The TRV of LineTerminatorSequence :: <LF> is the String value consisting of the code unit 0x000A (LINE FEED).
- The TRV of LineTerminatorSequence :: <CR> is the String value consisting of the code unit 0x000A (LINE FEED).
- The TRV of LineTerminatorSequence :: <LS> is the String value consisting of the code unit 0x2028 (LINE SEPARATOR).
- The TRV of LineTerminatorSequence :: <PS> is the String value consisting of the code unit 0x2029 (PARAGRAPH SEPARATOR).
- The TRV of LineTerminatorSequence :: <CR> <LF> is the String value consisting of the code unit 0x000A (LINE FEED).
Note
TV excludes the code units of LineContinuation while TRV includes them. <CR><LF> and <CR> LineTerminatorSequences are normalized to <LF> for both TV and TRV. An explicit TemplateEscapeSequence is needed to include a <CR> or <CR><LF> sequence.
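The following non-normative sketch makes this note observable. Because an editor may silently change line endings, the sketch assembles the source text in a String and compiles it with the Function constructor so that the <CR> and <LF> code points are explicit. All variable names are hypothetical.

```javascript
// Source text whose template contains a real <CR><LF> and a real <CR>.
const rawNormalized = new Function("return String.raw`a\r\nb\rc`;")();

// Source text whose template contains a LineContinuation (backslash then <LF>).
const [cookedContinuation, rawContinuation] =
  new Function("return ((s) => [s[0], s.raw[0]])`a\\\nb`;")();

// An explicit TemplateEscapeSequence is the way to obtain <CR><LF> in the value.
const explicitCRLF = `a\r\nb`;
```

Here rawNormalized contains only LINE FEED separators, cookedContinuation is "ab", rawContinuation keeps the backslash and the LINE FEED, and explicitCRLF contains CARRIAGE RETURN followed by LINE FEED.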
12.10 Automatic Semicolon Insertion
Most ECMAScript statements and declarations must be terminated with a semicolon. Such semicolons may always appear explicitly in the source text. For convenience, however, such semicolons may be omitted from the source text in certain situations. These situations are described by saying that semicolons are automatically inserted into the source code token stream in those situations.
12.10.1 Rules of Automatic Semicolon Insertion
In the following rules, “token” means the actual recognized lexical token determined using the current lexical goal symbol as described in clause 12.
There are three basic rules of semicolon insertion:
- When, as the source text is parsed from left to right, a token (called the offending token) is encountered that is not allowed by any production of the grammar, then a semicolon is automatically inserted before the offending token if one or more of the following conditions is true:
  - The offending token is separated from the previous token by at least one LineTerminator.
  - The offending token is }.
  - The previous token is ) and the inserted semicolon would then be parsed as the terminating semicolon of a do-while statement (14.7.2).
- When, as the source text is parsed from left to right, the end of the input stream of tokens is encountered and the parser is unable to parse the input token stream as a single instance of the goal nonterminal, then a semicolon is automatically inserted at the end of the input stream.
- When, as the source text is parsed from left to right, a token is encountered that is allowed by some production of the grammar, but the production is a restricted production and the token would be the first token for a terminal or nonterminal immediately following the annotation “[no LineTerminator here]” within the restricted production (and therefore such a token is called a restricted token), and the restricted token is separated from the previous token by at least one LineTerminator, then a semicolon is automatically inserted before the restricted token.
However, there is an additional overriding condition on the preceding rules: a semicolon is never inserted automatically if the semicolon would then be parsed as an empty statement or if that semicolon would become one of the two semicolons in the header of a for statement (see 14.7.4).
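The conditions above can be probed with a small non-normative sketch. The helper parses is hypothetical; it assumes a host that permits the Function constructor and uses it only to parse a function body, never to run it.

```javascript
// Hypothetical helper: true if `source` parses as a function body.
function parses(source) {
  try {
    new Function(source); // compiles the body without executing it
    return true;
  } catch (error) {
    if (error instanceof SyntaxError) return false;
    throw error;
  }
}

const results = {
  // Rule 1: offending token preceded by a LineTerminator.
  lineTerminator: parses("a = 1\nb = 2"),
  // Rule 1: offending token is }.
  closingBrace: parses("{ a = 1 }"),
  // Rule 1: previous token is ) closing a do-while statement.
  doWhile: parses("do a = 1; while (false) b = 2"),
  // Rule 2: end of the input stream.
  endOfInput: parses("a = 1"),
  // No condition holds: both statements are on one line.
  sameLine: parses("a = 1 b = 2"),
  // Overriding condition: the semicolon would be an empty statement.
  emptyStatement: parses("if (a)\nelse b = 2"),
  // Overriding condition: the semicolon would belong to a for header.
  forHeader: parses("for (a; b\n) ;"),
};
```

The first four entries are true and the last three are false.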
Note
The following are the only restricted productions in the grammar:
UpdateExpression[Yield, Await] :
  LeftHandSideExpression[?Yield, ?Await] [no LineTerminator here] ++
  LeftHandSideExpression[?Yield, ?Await] [no LineTerminator here] --
ContinueStatement[Yield, Await] :
  continue ;
  continue [no LineTerminator here] LabelIdentifier[?Yield, ?Await] ;
BreakStatement[Yield, Await] :
  break ;
  break [no LineTerminator here] LabelIdentifier[?Yield, ?Await] ;
ReturnStatement[Yield, Await] :
  return ;
  return [no LineTerminator here] Expression[+In, ?Yield, ?Await] ;
ThrowStatement[Yield, Await] :
  throw [no LineTerminator here] Expression[+In, ?Yield, ?Await] ;
YieldExpression[In, Await] :
  yield
  yield [no LineTerminator here] AssignmentExpression[?In, +Yield, ?Await]
  yield [no LineTerminator here] * AssignmentExpression[?In, +Yield, ?Await]
ArrowFunction[In, Yield, Await] :
  ArrowParameters[?Yield, ?Await] [no LineTerminator here] => ConciseBody[?In]
AsyncFunctionDeclaration[Yield, Await, Default] :
  async [no LineTerminator here] function BindingIdentifier[?Yield, ?Await] ( FormalParameters[~Yield, +Await] ) { AsyncFunctionBody }
  [+Default] async [no LineTerminator here] function ( FormalParameters[~Yield, +Await] ) { AsyncFunctionBody }
AsyncFunctionExpression :
  async [no LineTerminator here] function BindingIdentifier[~Yield, +Await]_opt ( FormalParameters[~Yield, +Await] ) { AsyncFunctionBody }
AsyncMethod[Yield, Await] :
  async [no LineTerminator here] ClassElementName[?Yield, ?Await] ( UniqueFormalParameters[~Yield, +Await] ) { AsyncFunctionBody }
AsyncGeneratorDeclaration[Yield, Await, Default] :
  async [no LineTerminator here] function * BindingIdentifier[?Yield, ?Await] ( FormalParameters[+Yield, +Await] ) { AsyncGeneratorBody }
  [+Default] async [no LineTerminator here] function * ( FormalParameters[+Yield, +Await] ) { AsyncGeneratorBody }
AsyncGeneratorExpression :
  async [no LineTerminator here] function * BindingIdentifier[+Yield, +Await]_opt ( FormalParameters[+Yield, +Await] ) { AsyncGeneratorBody }
AsyncGeneratorMethod[Yield, Await] :
  async [no LineTerminator here] * ClassElementName[?Yield, ?Await] ( UniqueFormalParameters[+Yield, +Await] ) { AsyncGeneratorBody }
AsyncArrowFunction[In, Yield, Await] :
  async [no LineTerminator here] AsyncArrowBindingIdentifier[?Yield] [no LineTerminator here] => AsyncConciseBody[?In]
  CoverCallExpressionAndAsyncArrowHead[?Yield, ?Await] [no LineTerminator here] => AsyncConciseBody[?In]
AsyncArrowHead :
  async [no LineTerminator here] ArrowFormalParameters[~Yield, +Await]
The practical effect of these restricted productions is as follows:
- When a ++ or -- token is encountered where the parser would treat it as a postfix operator, and at least one LineTerminator occurred between the preceding token and the ++ or -- token, then a semicolon is automatically inserted before the ++ or -- token.
- When a continue, break, return, throw, or yield token is encountered and a LineTerminator is encountered before the next token, a semicolon is automatically inserted after the continue, break, return, throw, or yield token.
- When arrow function parameter(s) are followed by a LineTerminator before a => token, a semicolon is automatically inserted and the punctuator causes a syntax error.
- When an async token is followed by a LineTerminator before a function or IdentifierName or ( token, a semicolon is automatically inserted and the async token is not treated as part of the same expression or class element as the following tokens.
- When an async token is followed by a LineTerminator before a * token, a semicolon is automatically inserted and the punctuator causes a syntax error.
The resulting practical advice to ECMAScript programmers is:
- A postfix ++ or -- operator should be on the same line as its operand.
- An Expression in a return or throw statement or an AssignmentExpression in a yield expression should start on the same line as the return, throw, or yield token.
- A LabelIdentifier in a break or continue statement should be on the same line as the break or continue token.
- The end of an arrow function's parameter(s) and its => should be on the same line.
- The async token preceding an asynchronous function or method should be on the same line as the immediately following token.
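The advice concerning return can be illustrated with the following non-normative sketch. The function names brokenSum and wrappedSum are hypothetical.

```javascript
// A LineTerminator follows `return`, so a semicolon is inserted after it
// and the expression on the next line is never evaluated as the result.
function brokenSum(a, b) {
  return
  a + b;
}

// The Expression starts on the same line as `return` (with the opening
// parenthesis), so it may then continue across several lines.
function wrappedSum(a, b) {
  return (
    a + b
  );
}

const brokenResult = brokenSum(1, 2);
const wrappedResult = wrappedSum(1, 2);
```

Here brokenResult is undefined and wrappedResult is 3.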
12.10.2 Examples of Automatic Semicolon Insertion
This section is non-normative.
The source
{
1 2 }
3
is not a valid sentence in the ECMAScript grammar, even with the automatic semicolon insertion rules. In contrast, the source
{
1
2 }
3
is also not a valid ECMAScript sentence, but is transformed by automatic semicolon insertion into the following:
{
1
;
2 ;}
3;
which is a valid ECMAScript sentence.
The source
for (a; b
)
is not a valid ECMAScript sentence and is not altered by automatic semicolon insertion because the semicolon is needed for the header of a for statement. Automatic semicolon insertion never inserts one of the two semicolons in the header of a for statement.
The source
return
a + b
is transformed by automatic semicolon insertion into the following:
return;
a + b;
Note 1
The expression a + b is not treated as a value to be returned by the return statement, because a LineTerminator separates it from the token return.
The source
a = b
++c
is transformed by automatic semicolon insertion into the following:
a = b;
++c;
Note 2
The token ++ is not treated as a postfix operator applying to the variable b, because a LineTerminator occurs between b and ++.
The source
if (a > b)
else c = d
is not a valid ECMAScript sentence and is not altered by automatic semicolon insertion before the else token, even though no production of the grammar applies at that point, because an automatically inserted semicolon would then be parsed as an empty statement.
The source
a = b + c
(d + e).
print()
is not transformed by automatic semicolon insertion, because the parenthesized expression that begins the second line can be interpreted as an argument list for a function call:
a = b +
c(d + e).
print()
In the circumstance that an assignment statement must begin with a left parenthesis, it is a good idea for the programmer to provide an explicit semicolon at the end of the preceding statement rather than to rely on automatic semicolon insertion.
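The last example can be made executable with the following non-normative sketch. The bindings are hypothetical: c is given a function value that records its argument and returns an object with a print method, so that the call interpretation is observable.

```javascript
const received = [];
const b = 1, d = 2, e = 3;
// Hypothetical callee: records its argument and returns an object with print().
const c = (arg) => {
  received.push(arg);
  return { print: () => 10 };
};

let a;
// No semicolon is inserted after `c`; the next line is its argument list.
a = b + c
(d + e).print();
```

After evaluation, received contains the single value 5 and a is 11, which confirms that the two lines were parsed as a = b + c(d + e).print().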
12.10.3 Interesting Cases of Automatic Semicolon Insertion
This section is non-normative.
ECMAScript programs can be written in a style with very few semicolons by relying on automatic semicolon insertion. As described above, semicolons are not inserted at every newline, and automatic semicolon insertion can depend on multiple tokens across line terminators.
As new syntactic features are added to ECMAScript, additional grammar productions could be added that cause lines relying on automatic semicolon insertion preceding them to change grammar productions when parsed.
For the purposes of this section, a case of automatic semicolon insertion is considered interesting if it is a place where a semicolon may or may not be inserted, depending on the source text which precedes it. The rest of this section describes a number of interesting cases of automatic semicolon insertion in this version of ECMAScript.
12.10.3.1 Interesting Cases of Automatic Semicolon Insertion in Statement Lists
In a StatementList, many StatementListItems end in semicolons, which may be omitted using automatic semicolon insertion. As a consequence of the rules above, at the end of a line ending an expression, a semicolon is required if the following line begins with any of the following:
- An opening parenthesis ((). Without a semicolon, the two lines together are treated as a CallExpression.
- An opening square bracket ([). Without a semicolon, the two lines together are treated as property access, rather than an ArrayLiteral or ArrayAssignmentPattern.
- A template literal (`). Without a semicolon, the two lines together are interpreted as a tagged Template (13.3.11), with the previous expression as the MemberExpression.
- Unary + or -. Without a semicolon, the two lines together are interpreted as a usage of the corresponding binary operator.
- A RegExp literal. Without a semicolon, the two lines together may be parsed instead as the / MultiplicativeOperator, for example if the RegExp has flags.
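Four of the cases listed above are illustrated by the following non-normative sketch, in which each pair of lines is deliberately written without a separating semicolon. All bindings are hypothetical.

```javascript
const table = { 1: "one", 2: "two" };
const key = 2;
const tag = (strings) => strings[0].toUpperCase();
const g = 2;

// Opening square bracket: parsed as the property access table[key].
const picked = table
[key];

// Template literal: parsed as the tagged template tag`hello`.
const shouted = tag
`hello`;

// Leading -: parsed as the binary expression 5 - 2.
const difference = 5
-2;

// Leading /: parsed as the division 8 / 2 / g, not as a RegExp literal with flag g.
const quotient = 8
/2/g;
```

Here picked is "two", shouted is "HELLO", difference is 3, and quotient is 2.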
12.10.3.2 Cases of Automatic Semicolon Insertion and “[no LineTerminator here]”
This section is non-normative.
ECMAScript contains grammar productions which include “[no LineTerminator here]”. These productions are sometimes a means to have optional operands in the grammar. Introducing a LineTerminator in these locations would change the grammar production of a source text by using the grammar production without the optional operand.
The rest of this section describes a number of productions using “[no LineTerminator here]” in this version of ECMAScript.
12.10.3.2.1 List of Grammar Productions with Optional Operands and “[no LineTerminator here]”