
Customizing Library Models for JavaScript

Beta Notice - Unstable API

Library customization using data extensions is currently in beta and subject to change.

Breaking changes to this format may occur while in beta.

JavaScript analysis can be customized by adding library models in data extension files.

A data extension for JavaScript is a YAML file of the form:

extensions:
  - addsTo:
      pack: codeql/javascript-all
      extensible: <name of extensible predicate>
    data:
      - <tuple1>
      - <tuple2>
      - ...

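The CodeQL library for JavaScript exposes the following extensible predicates, each of which is used in this article: sourceModel(type, path, kind), sinkModel(type, path, kind), typeModel(type1, type2, path), summaryModel(type, path, input, output, kind), barrierModel(type, path, kind), and barrierGuardModel(type, path, acceptingValue, kind).

We’ll show how to use these predicates through a few examples, and provide some reference material at the end of this article.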

Example: Taint sink in the ‘execa’ package

In this example, we’ll show how to add the following argument, passed to execa, as a command-line injection sink:

import { shell } from "execa";
shell(cmd); // <-- add 'cmd' as a taint sink

Note that this sink is already recognized by the CodeQL JS analysis, but for this example, you could add a tuple to the sinkModel(type, path, kind) extensible predicate by updating a data extension file.

extensions:
  - addsTo:
      pack: codeql/javascript-all
      extensible: sinkModel
    data:
      - ["execa", "Member[shell].Argument[0]", "command-injection"]

Example: Taint sources from window ‘message’ events

In this example, we’ll show how the event.data expression below could be marked as a remote flow source:

window.addEventListener("message", function (event) {
  let data = event.data; // <-- add 'event.data' as a taint source
});

Note that this source is already recognized by the CodeQL JS analysis, but for this example, you could add a tuple to the sourceModel(type, path, kind) extensible predicate by updating a data extension file.

extensions:
  - addsTo:
      pack: codeql/javascript-all
      extensible: sourceModel
    data:
      - [
          "global",
          "Member[addEventListener].Argument[1].Parameter[0].Member[data]",
          "remote",
        ]

In the next section, we’ll show how to restrict the model to recognize events of a specific type.

Continued example: Restricting the event type

The model above treats all events as sources of remote flow, not just message events. For example, it would also pick up this irrelevant source:

window.addEventListener("onclick", function (event) {
  let data = event.data; // <-- 'event.data' became a spurious taint source
});

We can refine the model by adding the WithStringArgument component to restrict the set of calls being considered:

extensions:
  - addsTo:
      pack: codeql/javascript-all
      extensible: sourceModel
    data:
      - [
          "global",
          "Member[addEventListener].WithStringArgument[0=message].Argument[1].Parameter[0].Member[data]",
          "remote",
        ]

The WithStringArgument[0=message] component here selects the subset of calls to addEventListener where the first argument is a string literal with the value "message".
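The effect of such a call site filter can be pictured with a toy model. The sketch below is not how CodeQL evaluates the component; the `calls` records and the `withStringArgument` helper are hypothetical and exist only to show which call sites would be kept.

```javascript
// Toy description of two call sites: each argument as it appears in the source.
const calls = [
  {
    callee: "addEventListener",
    args: [{ kind: "string", value: "message" }, { kind: "function" }],
  },
  {
    callee: "addEventListener",
    args: [{ kind: "string", value: "onclick" }, { kind: "function" }],
  },
];

// Hypothetical helper: keep only calls whose argument at `index`
// is a string literal equal to `expected`.
function withStringArgument(callSites, index, expected) {
  return callSites.filter((call) => {
    const arg = call.args[index];
    return arg !== undefined && arg.kind === "string" && arg.value === expected;
  });
}

const selected = withStringArgument(calls, 0, "message");
console.log(selected.length); // 1
console.log(selected[0].args[0].value); // message
```

Only the first call site survives the filter, so the later components of the path are evaluated for the "message" listener alone.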

Example: Using types to add MySQL injection sinks

In this example, we’ll show how to add the following SQL injection sink:

import { Connection } from "mysql";

function submit(connection: Connection, q: string) {
  connection.query(q); // <-- add 'q' as a SQL injection sink
}

We need to add a tuple to the sinkModel(type, path, kind) extensible predicate by updating a data extension file.

extensions:
  - addsTo:
      pack: codeql/javascript-all
      extensible: sinkModel
    data:
      - ["mysql.Connection", "Member[query].Argument[0]", "sql-injection"]

This works in this example because the connection parameter has a type annotation that matches what the model is looking for.

Note that there is a significant difference between the following two rows:

data:
  - ["mysql.Connection", "", ...]
  - ["mysql", "Member[Connection]", ...]

The first row matches instances of mysql.Connection, which are objects that encapsulate a MySQL connection. The second row would match something like require('mysql').Connection, which is not itself a connection object.
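The distinction between the exported class and its instances can be seen in plain JavaScript. The sketch below uses a local stand-in object, `mysqlStandIn`, instead of the real mysql package so that it runs on its own; the stand-in and its simplified `Connection` class are assumptions made for illustration.

```javascript
// Local stand-in for the "mysql" package (not the real library).
class Connection {
  query(sql) {
    return `executed: ${sql}`;
  }
}

const mysqlStandIn = {
  Connection,
  createConnection() {
    return new Connection();
  },
};

// Corresponds to ["mysql", "Member[Connection]"]: the exported class itself.
const exportedClass = mysqlStandIn.Connection;

// Corresponds to the type "mysql.Connection": an object that represents a connection.
const connection = mysqlStandIn.createConnection();

console.log(typeof exportedClass); // function
console.log(typeof exportedClass.query); // undefined (query is not a static member)
console.log(typeof connection.query); // function
```

A sink on Member[query].Argument[0] only makes sense for the second kind of value, which is why the sink model uses the type mysql.Connection.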

In the next section, we’ll show how to generalize the model to handle the absence of type annotations.

Continued example: Dealing with untyped code

Suppose we want the model from above to detect the sink in this snippet:

import { getConnection } from "@example/db";
let connection = getConnection();
connection.query(q); // <-- add 'q' as a SQL injection sink

There is no type annotation on connection, and there is no indication of what getConnection() returns. By adding a tuple to the typeModel(type1, type2, path) extensible predicate we can tell our model that this function returns an instance of mysql.Connection:

extensions:
  - addsTo:
      pack: codeql/javascript-all
      extensible: typeModel
    data:
      - ["mysql.Connection", "@example/db", "Member[getConnection].ReturnValue"]

The new model states that the return value of getConnection() has type mysql.Connection. Combining this with the sink model we added earlier, the sink in the example is detected by the model.

The mechanism used here is how library models work for both TypeScript and plain JavaScript. A good library model contains typeModel tuples to ensure it works even in codebases without type annotations. For example, the mysql model that is included with the CodeQL JS analysis includes this type definition (among many others):

- ["mysql.Connection", "mysql", "Member[createConnection].ReturnValue"]

Example: Using fuzzy models to simplify modeling

In this example, we’ll show how to add the following SQL injection sink using a “fuzzy” model:

import * as mysql from 'mysql';
const pool = mysql.createPool({...});
pool.getConnection((err, conn) => {
  conn.query(q, (err, rows) => {...}); // <-- add 'q' as a SQL injection sink
});

We need to add a tuple for a fuzzy model to the sinkModel(type, path, kind) extensible predicate by updating a data extension file.

extensions:
  - addsTo:
      pack: codeql/javascript-all
      extensible: sinkModel
    data:
      - ["mysql", "Fuzzy.Member[query].Argument[0]", "sql-injection"]

For reference, a more detailed model might look like this, as described in the preceding examples:

extensions:
  - addsTo:
      pack: codeql/javascript-all
      extensible: sinkModel
    data:
      - ["mysql.Connection", "Member[query].Argument[0]", "sql-injection"]
  - addsTo:
      pack: codeql/javascript-all
      extensible: typeModel
    data:
      - ["mysql.Pool", "mysql", "Member[createPool].ReturnValue"]
      - ["mysql.Connection", "mysql.Pool", "Member[getConnection].Argument[0].Parameter[1]"]

The model using the Fuzzy component is simpler, at the cost of being approximate. This technique is useful when modeling a large or complex library, where it is difficult to write a detailed model.

Example: Adding flow through ‘decodeURIComponent’

In this example, we’ll show how to add flow through calls to decodeURIComponent:

let y = decodeURIComponent(x); // add taint flow from 'x' to 'y'

Note that this flow is already recognized by the CodeQL JS analysis, but for this example, you could add a tuple to the summaryModel(type, path, input, output, kind) extensible predicate by updating a data extension file.

extensions:
  - addsTo:
      pack: codeql/javascript-all
      extensible: summaryModel
    data:
      - [
          "global",
          "Member[decodeURIComponent]",
          "Argument[0]",
          "ReturnValue",
          "taint",
        ]
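To see why a "taint" summary is a reasonable fit here, consider what decoding does at run time. In the short sketch below, the input string is a hypothetical attacker-controlled value; decoding changes its representation, but the result is still derived from that input.

```javascript
// Hypothetical attacker-controlled input, percent-encoded as it might appear in a URL.
const x = "%3Cimg%20src%3Dx%20onerror%3Dalert(1)%3E";

// Decoding produces a different string, but one that is fully derived from x.
const y = decodeURIComponent(x);

console.log(y); // <img src=x onerror=alert(1)>
```

The output is not the same value as the input, which is why the summary uses the "taint" kind rather than "value".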

Example: Adding flow through ‘underscore.forEach’

In this example, we’ll show how to add flow through calls to forEach from the underscore package:

require('underscore').forEach([x, y], (v) => { ... }); // add value flow from 'x' and 'y' to 'v'

Note that this flow is already recognized by the CodeQL JS analysis, but for this example, you could add a tuple to the summaryModel(type, path, input, output, kind) extensible predicate by updating a data extension file.

extensions:
  - addsTo:
      pack: codeql/javascript-all
      extensible: summaryModel
    data:
      - [
          "underscore",
          "Member[forEach]",
          "Argument[0].ArrayElement",
          "Argument[1].Parameter[0]",
          "value",
        ]
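The flow described by this summary can be illustrated with a simplified stand-in for forEach. The function below is an assumption written for this example, not underscore's actual source; it only shows that each array element reaches the callback's first parameter unchanged.

```javascript
// Simplified stand-in for underscore's forEach, for illustration only.
function forEach(list, callback) {
  for (let i = 0; i < list.length; i++) {
    // Argument[0].ArrayElement flows to Argument[1].Parameter[0].
    callback(list[i], i, list);
  }
  return list;
}

const x = { name: "x" };
const y = { name: "y" };
const seen = [];

forEach([x, y], (v) => {
  seen.push(v);
});

console.log(seen[0] === x && seen[1] === y); // true
```

Because the callback receives the very same objects that were stored in the array, the summary uses the "value" kind.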

Example: Modeling properties injected by a middleware function

In this example, we’ll show how to model a hypothetical middleware function that adds a tainted value to incoming request objects:

const express = require('express')
const app = express()

app.use(require('@example/middleware').injectData())

app.get('/foo', (req, res) => {
  req.data; // <-- mark 'req.data' as a taint source
});

We need to add a tuple to the sourceModel(type, path, kind) extensible predicate by updating a data extension file.

extensions:
  - addsTo:
      pack: codeql/javascript-all
      extensible: sourceModel
    data:
      - [
          "@example/middleware",
          "Member[injectData].ReturnValue.GuardedRouteHandler.Parameter[0].Member[data]",
          "remote",
        ]
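The run-time behavior that this path describes can be sketched without Express. Everything below is hypothetical: `injectData` stands in for the function exported by @example/middleware, `rawBody` is an invented field, and `runHandlers` is a minimal imitation of how handlers run in sequence on a shared request object.

```javascript
// Hypothetical stand-in for require('@example/middleware').injectData.
function injectData() {
  return function (req, res, next) {
    req.data = req.rawBody; // copies untrusted input onto the request
    next();
  };
}

// Minimal imitation of a router calling handlers in order on one request object.
function runHandlers(handlers, req, res) {
  let index = 0;
  function next() {
    const handler = handlers[index++];
    if (handler) handler(req, res, next);
  }
  next();
}

const request = { rawBody: "untrusted input" };
let observed;

runHandlers(
  [
    injectData(),
    (req, res) => {
      observed = req.data; // the route handler sees the injected property
    },
  ],
  request,
  {}
);

console.log(observed); // untrusted input
```

The route handler never calls the middleware directly, yet its first parameter carries the injected property, which is the relationship the GuardedRouteHandler.Parameter[0].Member[data] part of the path appears to capture.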

Example: Taint barrier using the ‘encodeURIComponent’ function

In this example, we’ll show how to add the return value of encodeURIComponent as a barrier for XSS.

let escaped = encodeURIComponent(input); // The return value of this method is safe for XSS.
document.body.innerHTML = escaped;

We need to add a tuple to the barrierModel(type, path, kind) extensible predicate by updating a data extension file.

extensions:
  - addsTo:
      pack: codeql/javascript-all
      extensible: barrierModel
    data:
      - ["global", "Member[encodeURIComponent].ReturnValue", "html-injection"]
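A quick check of what encodeURIComponent does to HTML metacharacters shows why its return value is treated as a barrier here. The input string below is a hypothetical payload chosen for illustration.

```javascript
// Hypothetical untrusted input containing HTML markup.
const input = "<img src=x onerror=alert(1)>";

const escaped = encodeURIComponent(input);

console.log(escaped); // %3Cimg%20src%3Dx%20onerror%3Dalert(1)%3E
console.log(/[<>"&]/.test(escaped)); // false
```

Note that encodeURIComponent leaves a few characters unescaped, including the single quote and parentheses, so whether a barrier like this is appropriate depends on where the value ends up.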

Example: Add a barrier guard

This example shows how to model a barrier guard that stops the flow of taint when a conditional check is performed on data. Consider a function called isValid which returns true when the data is considered safe.

if (isValid(userInput)) { // The check guards the use, so the input is safe.
  db.query(userInput); // This is safe.
}

We need to add a tuple to the barrierGuardModel(type, path, acceptingValue, kind) extensible predicate by updating a data extension file.

extensions:
  - addsTo:
      pack: codeql/javascript-all
      extensible: barrierGuardModel
    data:
      - ["my-package", "Member[isValid].Argument[0]", "true", "sql-injection"]
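As a concrete picture of what such a guard might look like, here is a minimal sketch. The validation rule inside `isValid`, the `db` stand-in, and the `lookup` function are all assumptions made for this example; the model itself only states that the "true" outcome of the check makes Argument[0] safe.

```javascript
// Hypothetical validator: accepts only short alphanumeric identifiers.
function isValid(value) {
  return typeof value === "string" && /^[A-Za-z0-9_]{1,32}$/.test(value);
}

// Stand-in for a database client that records what would be executed.
const executed = [];
const db = {
  query(sql) {
    executed.push(sql);
  },
};

function lookup(userInput) {
  if (isValid(userInput)) {
    db.query(userInput); // reached only when isValid returned true
    return true;
  }
  return false;
}

lookup("alice_01");
lookup("x' OR '1'='1");

console.log(executed.length); // 1
console.log(executed[0]); // alice_01
```

The acceptingValue column ("true" in the tuple above) records which outcome of the check corresponds to the guarded branch.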

Reference material

The following sections provide reference material for extensible predicates, access paths, types, and kinds.

Extensible predicates

sourceModel(type, path, kind)

Adds a new taint source. Most taint-tracking queries will use the new source.

Example:

extensions:
  - addsTo:
      pack: codeql/javascript-all
      extensible: sourceModel
    data:
      - ["global", "Member[user].Member[name]", "remote"]

sinkModel(type, path, kind)

Adds a new taint sink. Sinks are query-specific and will typically affect one or two queries.

Example:

extensions:
  - addsTo:
      pack: codeql/javascript-all
      extensible: sinkModel
    data:
      - ["global", "Member[eval].Argument[0]", "code-injection"]

summaryModel(type, path, input, output, kind)

Adds flow through a function call.

Example:

extensions:
  - addsTo:
      pack: codeql/javascript-all
      extensible: summaryModel
    data:
      - [
          "global",
          "Member[decodeURIComponent]",
          "Argument[0]",
          "ReturnValue",
          "taint",
        ]

typeModel(type1, type2, path)

Adds a new definition of a type.

Example:

extensions:
  - addsTo:
      pack: codeql/javascript-all
      extensible: typeModel
    data:
      - [
          "mysql.Connection",
          "@example/db",
          "Member[getConnection].ReturnValue",
        ]

Types

A type is a string that identifies a set of values. In each of the extensible predicates mentioned in the previous section, the first column is always the name of a type. A type can be defined by adding typeModel tuples for that type. Additionally, the following built-in types are available:

Access paths

The path, input, and output columns consist of a .-separated list of components, which is evaluated from left to right, with each step selecting a new set of values derived from the previous set of values.
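The structure of an access path can be made concrete with a small parser. This is an illustrative sketch, not CodeQL's implementation: `parseAccessPath` is a hypothetical helper that splits a path on dots outside of brackets and separates each component's name from its optional operand.

```javascript
// Illustrative parser for access path strings (not CodeQL's implementation).
function parseAccessPath(path) {
  const parts = [];
  let depth = 0;
  let current = "";

  // Split on "." only when outside [...] so operands may contain dots.
  for (const ch of path) {
    if (ch === "[") depth++;
    if (ch === "]") depth--;
    if (ch === "." && depth === 0) {
      parts.push(current);
      current = "";
    } else {
      current += ch;
    }
  }
  if (current !== "") parts.push(current);

  // Each component is a name followed by an optional [operand].
  return parts.map((text) => {
    const match = /^([^\[]+)(?:\[(.*)\])?$/.exec(text);
    if (!match) throw new Error(`Malformed component: ${text}`);
    return { name: match[1], operand: match[2] ?? null };
  });
}

const components = parseAccessPath(
  "Member[addEventListener].WithStringArgument[0=message].Argument[1].Parameter[0].Member[data]"
);

// Components are listed in the order in which they would be evaluated.
console.log(components.map((c) => c.name).join(" -> "));
// Member -> WithStringArgument -> Argument -> Parameter -> Member
```

Reading the result from left to right mirrors the evaluation order described above: each component narrows or transforms the set of values selected by the one before it.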

The following components are supported:

The following components are called “call site filters”. They select a subset of the previously-selected calls, if the call fits certain criteria:

Components related to decorators:

Additionally there is a component related to middleware functions:

Additional notes about the syntax of operands:

Kinds

Source kinds

See also Threat models.

Sink kinds

Unlike sources, sinks tend to be highly query-specific, rarely affecting more than one or two queries. Not every query supports customizable sinks. If the following sinks are not suitable for your use case, you should add a new query.

Summary kinds

Threat models

Note

Threat models are currently in beta and subject to change. During the beta, threat models are supported only by Java, C#, Python and JavaScript/TypeScript analysis.

A threat model is a named class of dataflow sources that can be enabled or disabled independently. Threat models allow you to control the set of dataflow sources that you want to consider unsafe. For example, one codebase may only consider remote HTTP requests to be tainted, whereas another may also consider data from local files to be unsafe. You can use threat models to ensure that the relevant taint sources are used in a CodeQL analysis.

The kind property of the sourceModel determines which threat model a source is associated with. There are two main categories:

Note that subcategories can be included or excluded separately, so you can specify local without database, or just commandargs and environment without the rest of local.

The less commonly used categories are:

When running a CodeQL analysis, the remote threat model is included by default. You can optionally include other threat models as appropriate when using the CodeQL CLI and in GitHub code scanning. For more information, see Analyzing your code with CodeQL queries and Customizing your advanced setup for code scanning.