Syntax Highlight Guide | Visual Studio Code Extension API

Syntax Highlight Guide

Syntax highlighting determines the color and style of source code displayed in the Visual Studio Code editor. It is responsible for coloring keywords such as if or for in JavaScript differently from strings, comments, and variable names.

There are two components to syntax highlighting:

  • Tokenization: Breaking text into a list of tokens
  • Theming: Using themes or user settings to map the tokens to specific colors and styles

Before diving into the details, a good way to start is to experiment with the scope inspector tool and explore which tokens are present in a source file and which theme rules they match. To see both semantic and syntax tokens, use a built-in theme (for example, Dark+) on a TypeScript file.

Tokenization

Tokenization breaks the text into segments and classifies each segment with a token type.

VS Code's tokenization engine is powered by TextMate grammars. A TextMate grammar is a structured collection of regular expressions, written as a plist (XML) or JSON file. VS Code extensions can contribute grammars through the grammars contribution point.

The TextMate tokenization engine runs in the same process as the renderer, and tokens are updated as the user types. Tokens are used for syntax highlighting, but also to classify the source code into areas of comments, strings, and regular expressions.

Starting with release 1.43, VS Code also allows extensions to provide tokenization through a Semantic Token Provider. Semantic providers are typically implemented by language servers that have a deeper understanding of the source file and can resolve symbols in the context of the project. For example, a constant variable name can be rendered using constant highlighting throughout the project, not just at the place of its declaration.

Highlighting based on semantic tokens is considered an addition to the TextMate-based syntax highlighting: semantic highlighting is applied on top of the syntax highlighting. Because language servers can take a while to load and analyze a project, semantic token highlighting may appear after a short delay.

This article focuses on the TextMate-based tokenization. Semantic tokenization and theming are explained in the Semantic Highlighting guide.

TextMate grammars

VS Code uses TextMate grammars as the syntax tokenization engine. Invented for the TextMate editor, they have been adopted by many other editors and IDEs due to the large number of language bundles created and maintained by the open source community.

TextMate grammars rely on Oniguruma regular expressions and are typically written as a plist or JSON. You can find a good introduction to TextMate grammars here, and you can take a look at existing TextMate grammars to learn more about how they work.

TextMate tokens and scopes

Tokens are one or more characters that are part of the same program element. Example tokens include operators such as + and *, variable names such as myVar, or strings such as "my string".

Each token is associated with a scope that defines the context of the token. A scope is a dot-separated list of identifiers that specify the context of the current token. The + operator in JavaScript, for example, has the scope keyword.operator.arithmetic.js.

Themes map scopes to colors and styles to provide syntax highlighting. TextMate provides a list of common scopes that many themes target. To have your grammar as broadly supported as possible, try to build on existing scopes rather than defining new ones.

Scopes nest so that each token is also associated with a list of parent scopes. The example below uses the scope inspector to show the scope hierarchy for the + operator in a simple JavaScript function. The most specific scope is listed at the top, with more general parent scopes listed below:

Parent scope information is also used for theming. When a theme targets a scope, all tokens with that parent scope will be colorized unless the theme also provides a more specific colorization for their individual scopes.
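
As a rough illustration of how parent scopes affect theming, the user-settings sketch below uses the editor.tokenColorCustomizations setting with two textMateRules entries. The colors are arbitrary placeholders, and the scope names are derived from the + operator example above with the language suffix dropped. Under these assumptions, the first rule would color every token that has keyword.operator among its scopes, while the second, more specific rule would take precedence for arithmetic operators.

```json
{
  "editor.tokenColorCustomizations": {
    "textMateRules": [
      {
        "scope": "keyword.operator",
        "settings": { "foreground": "#C586C0" }
      },
      {
        "scope": "keyword.operator.arithmetic",
        "settings": { "foreground": "#D4D4D4", "fontStyle": "bold" }
      }
    ]
  }
}
```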

Configure bracket matching scopes

Some languages include tokens that should not participate in bracket matching, even though they visually resemble brackets.

There are two properties for configuring bracket matching behavior:

  • balancedBracketScopes: defines which scopes participate in bracket matching. By default, all scopes are included.
  • unbalancedBracketScopes: defines scopes that should be excluded from bracket matching.
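
Both properties are set on a grammar entry in package.json and take a list of scope names. The fragment below is only a sketch for a hypothetical abc language: the "*" value standing for all scopes and the excluded scope meta.case-pattern.abc are illustrative assumptions, not taken from a real grammar.

```json
{
  "contributes": {
    "grammars": [
      {
        "language": "abc",
        "scopeName": "source.abc",
        "path": "./syntaxes/abc.tmLanguage.json",
        "balancedBracketScopes": ["*"],
        "unbalancedBracketScopes": ["meta.case-pattern.abc"]
      }
    ]
  }
}
```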
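
Contributing a basic grammar

Grammars are contributed through the grammars contribution point in the extension's package.json. Each contribution names the language the grammar applies to, the top-level scope name for the grammar's tokens, and the relative path to the grammar file. The example below contributes a grammar for a fictional abc language:

{
  "contributes": {
    "languages": [
      {
        "id": "abc",
        "extensions": [".abc"]
      }
    ],
    "grammars": [
      {
        "language": "abc",
        "scopeName": "source.abc",
        "path": "./syntaxes/abc.tmGrammar.json"
      }
    ]
  }
}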

The grammar file itself consists of a top-level rule. This is typically split into a patterns section that lists the top-level elements of the program and a repository that defines each of the elements. Other rules in the grammar can reference elements from the repository using { "include": "#id" }.

The example abc grammar marks the letters a, b, and c as keywords, and nestings of parens as expressions.
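
A grammar matching this description could look like the sketch below. The root scope source.abc comes from the contribution above. The rule ids (expression, letter, paren-expression) and the scope names keyword.letter, expression.group, punctuation.paren.open, and punctuation.paren.close are assumptions chosen for illustration. The top-level patterns section includes a single repository rule, and the parenthesis rule includes that rule again so that groups can nest.

```json
{
  "comment": "Illustrative sketch; rule ids and scope names other than source.abc are assumed",
  "scopeName": "source.abc",
  "patterns": [{ "include": "#expression" }],
  "repository": {
    "expression": {
      "patterns": [
        { "include": "#letter" },
        { "include": "#paren-expression" }
      ]
    },
    "letter": {
      "comment": "Marks the letters a, b, and c as keywords",
      "match": "a|b|c",
      "name": "keyword.letter"
    },
    "paren-expression": {
      "comment": "A parenthesized group; including #expression allows nesting",
      "begin": "\\(",
      "end": "\\)",
      "beginCaptures": {
        "0": { "name": "punctuation.paren.open" }
      },
      "endCaptures": {
        "0": { "name": "punctuation.paren.close" }
      },
      "name": "expression.group",
      "patterns": [{ "include": "#expression" }]
    }
  }
}
```

With these assumed names, a top-level a would carry the scopes keyword.letter, source.abc, while a b inside one pair of parentheses would carry keyword.letter, expression.group, source.abc. Text that matches no rule simply inherits the scopes of its enclosing context.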

For a simple program such as:

a
(
    b
)
x
(
    (
        c
        xyz
    )
)
(
a

The example grammar produces the following scopes (listed left-to-right from most specific to least specific scope):

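Embedded languages

If your grammar includes embedded languages within the parent language, the embeddedLanguages property of the grammar contribution tells VS Code to treat the embedded language as distinct from the parent language, so that features such as commenting and snippets behave correctly inside it. The example below maps the meta.embedded.block.javascript scope to the javascript language:

{
  "contributes": {
    "grammars": [
      {
        "path": "./syntaxes/abc.tmLanguage.json",
        "scopeName": "source.abc",
        "embeddedLanguages": {
          "meta.embedded.block.javascript": "javascript"
        }
      }
    ]
  }
}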

Now if you try to comment code or trigger snippets inside a set of tokens marked meta.embedded.block.javascript, they will get the correct JavaScript-style // comment and the correct JavaScript snippets.
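
For the mapping to take effect, the grammar itself has to assign the meta.embedded.block.javascript scope to the embedded region. The fragment below is a minimal sketch of one way to do that, assuming a hypothetical language in which JavaScript appears between <script> and </script> delimiters. The rule id script-block and the delimiters are assumptions. The contentName property applies the scope to the text between the begin and end matches, and the include of source.js delegates tokenization of that text to the JavaScript grammar.

```json
{
  "comment": "Illustrative sketch; the delimiters and the rule id are assumed",
  "scopeName": "source.abc",
  "patterns": [{ "include": "#script-block" }],
  "repository": {
    "script-block": {
      "begin": "<script>",
      "end": "</script>",
      "contentName": "meta.embedded.block.javascript",
      "patterns": [{ "include": "source.js" }]
    }
  }
}
```

A production grammar usually needs extra care so that the included grammar does not consume the closing delimiter; this sketch ignores that concern.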

Developing a new grammar extension

To quickly create a new grammar extension, use VS Code's Yeoman templates to run yo code and select the New Language option:

Yeoman will walk you through some basic questions to scaffold the new extension. The important questions for creating a new grammar are:

  • Language id - A unique identifier for your language.
  • Language name - A human readable name for your language.
  • Scope names - Root TextMate scope name for your grammar.

The generator assumes that you want to define both a new language and a new grammar for that language. If you are creating a grammar for an existing language, just fill these in with your target language's information and be sure to delete the languages contribution point in the generated package.json.

After answering all the questions, Yeoman will create a new extension with the structure:

Remember, if you are contributing a grammar to a language that VS Code already knows about, be sure to delete the languages contribution point in the generated package.json.
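
For instance, a package.json that only contributes a grammar for a language VS Code already knows might be reduced to a fragment like the one below. This is a sketch: the language id xml, the scope name text.xml, and the file path are illustrative values, and the point is simply that no languages section remains.

```json
{
  "contributes": {
    "grammars": [
      {
        "language": "xml",
        "scopeName": "text.xml",
        "path": "./syntaxes/xml.tmLanguage.json"
      }
    ]
  }
}
```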

Converting an existing TextMate grammar

yo code can also help convert an existing TextMate grammar to a VS Code extension. Again, start by running yo code and selecting Language extension. When asked for an existing grammar file, give it the full path to either a .tmLanguage or .json TextMate grammar file:

Using YAML to write a grammar

As a grammar grows more complex, it can become difficult to understand and maintain as JSON. If you find yourself writing complex regular expressions or needing to add comments to explain aspects of the grammar, consider using YAML to define your grammar instead.

YAML grammars have exactly the same structure as JSON-based grammars but allow you to use YAML's more concise syntax, along with features such as multi-line strings and comments.

VS Code can only load JSON grammars, so YAML-based grammars must be converted to JSON. The js-yaml package and command-line tool make this easy.

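Injection grammars

An injection grammar extends an existing grammar by injecting its rules into a specific scope of that grammar. It is contributed through package.json like a regular grammar, but instead of a language it uses injectTo to list the target language scopes. The example below injects a grammar that highlights TODO in comments into JavaScript files (source.js):

{
  "contributes": {
    "grammars": [
      {
        "path": "./syntaxes/injection.json",
        "scopeName": "todo-comment.injection",
        "injectTo": ["source.js"]
      }
    ]
  }
}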

The grammar itself is a standard TextMate grammar except for the top level injectionSelector entry. The injectionSelector is a scope selector that specifies which scopes the injected grammar should be applied in. For our example, we want to highlight the word TODO in all // comments. Using the scope inspector, we find that JavaScript's double slash comments have the scope comment.line.double-slash, so our injection selector is L:comment.line.double-slash:
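
A grammar file along these lines could look like the sketch below. The scope name and the injection selector come from the text above; the rule id todo-keyword and the scope keyword.todo are assumed names used for illustration.

```json
{
  "comment": "Illustrative sketch; the rule id and the keyword.todo scope are assumed",
  "scopeName": "todo-comment.injection",
  "injectionSelector": "L:comment.line.double-slash",
  "patterns": [{ "include": "#todo-keyword" }],
  "repository": {
    "todo-keyword": {
      "match": "TODO",
      "name": "keyword.todo"
    }
  }
}
```

The L: prefix in the selector asks for the injected rules to be applied to the left of, that is, before, the existing rules of the target grammar.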

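An injection grammar can also contribute embedded languages to its parent grammar through embeddedLanguages. For example, an extension that highlights SQL queries in JavaScript strings can map the meta.embedded.inline.sql scope to the sql language:

{
  "contributes": {
    "grammars": [
      {
        "path": "./syntaxes/injection.json",
        "scopeName": "sql-string.injection",
        "injectTo": ["source.js"],
        "embeddedLanguages": {
          "meta.embedded.inline.sql": "sql"
        }
      }
    ]
  }
}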

Token types and embedded languages

There is one additional complication for injection grammars that embed languages: by default, VS Code treats all tokens within a string as string content and all tokens within a comment as comment content. Since features such as bracket matching and auto-closing pairs are disabled inside strings and comments, these features will also be disabled in an embedded language that appears inside a string or comment.

To override this behavior, you can use a meta.embedded.* scope to reset VS Code's marking of tokens as string or comment content. It is a good idea to always wrap an embedded language in a meta.embedded.* scope to make sure VS Code treats the embedded language properly.

If you can't add a meta.embedded.* scope to your grammar, you can alternatively use tokenTypes in the grammar's contribution point to map specific scopes to content mode. The tokenTypes section below ensures that any content in the my.sql.template.string scope is treated as source code:
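
A contribution of this kind might look like the sketch below. It reuses the injection contribution from earlier and assumes that the grammar marks embedded SQL with the my.sql.template.string scope; the token type other is the value assumed here to mean ordinary source code rather than string or comment content.

```json
{
  "contributes": {
    "grammars": [
      {
        "path": "./syntaxes/injection.json",
        "scopeName": "sql-string.injection",
        "injectTo": ["source.js"],
        "embeddedLanguages": {
          "my.sql.template.string": "sql"
        },
        "tokenTypes": {
          "my.sql.template.string": "other"
        }
      }
    ]
  }
}
```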

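Scope inspector

VS Code's built-in scope inspector tool helps debug grammars and semantic tokens. It displays the scopes and semantic tokens at the current position in a file, along with the theme rules that apply. Trigger it from the Command Palette with the Developer: Inspect Editor Tokens and Scopes command, or create a keybinding for it:

{
  "key": "cmd+alt+shift+i",
  "command": "editor.action.inspectTMScopes"
}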

The scope inspector displays the following information:

  1. The current token.
  2. Metadata about the token and information about its computed appearance. If you are working with embedded languages, the important entries here are language and token type.
  3. The semantic token section is shown when a semantic token provider is available for the current language and when the current theme supports semantic highlighting. It shows the current semantic token type and modifiers along with the theme rules that match the semantic token type and modifiers.
  4. The TextMate section shows the scope list for the current TextMate token, with the most specific scope at the top. It also shows the most specific theme rules that match the scopes. Only the theme rules responsible for the token's current style are shown; overridden rules are not. If semantic tokens are present, the theme rules are only shown when they differ from the rule matching the semantic token.