
Customizing Library Models for Python

Beta Notice - Unstable API

Library customization using data extensions is currently in beta and subject to change.

Breaking changes to this format may occur while in beta.

Python analysis can be customized by adding library models in data extension files.

A data extension for Python is a YAML file of the form:

```yaml
extensions:
  - addsTo:
      pack: codeql/python-all
      extensible: <name of extensible predicate>
    data:
      - <tuple1>
      - <tuple2>
      - ...
```
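Because a data extension is ordinary YAML, it can also be generated from a script. The following is a minimal sketch, assuming you want to build extension files programmatically: `make_extension` and `ARITY` are hypothetical names (not part of CodeQL), the column counts are taken from the predicate signatures used in this article, and the helper only produces the nested dictionaries and lists of the layout above.

```python
from typing import Any

# Number of columns per extensible predicate, taken from the signatures
# used in this article (hypothetical table, for illustration only).
ARITY = {
    "sourceModel": 3,
    "sinkModel": 3,
    "typeModel": 3,
    "summaryModel": 5,
    "barrierModel": 3,
    "barrierGuardModel": 4,
}


def make_extension(extensible: str, rows: list[list[str]]) -> dict[str, Any]:
    """Hypothetical helper: build the nested data of one data extension file."""
    expected = ARITY[extensible]
    for row in rows:
        # Catch tuples with the wrong number of columns early.
        if len(row) != expected:
            raise ValueError(f"{extensible} expects {expected} columns, got {len(row)}")
    return {
        "extensions": [
            {
                "addsTo": {"pack": "codeql/python-all", "extensible": extensible},
                "data": [list(row) for row in rows],
            }
        ]
    }


ext = make_extension("sinkModel", [["builtins", "Member[exec].Argument[0]", "code-injection"]])
print(ext["extensions"][0]["addsTo"]["extensible"])  # sinkModel
```

Serializing the resulting dictionary with a YAML library (for example PyYAML's `yaml.safe_dump`) should give a file equivalent to the form above, although the exact layout of the `data` rows may differ.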

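The CodeQL library for Python exposes the following extensible predicates:

- sourceModel(type, path, kind)
- sinkModel(type, path, kind)
- typeModel(type1, type2, path)
- summaryModel(type, path, input, output, kind)
- barrierModel(type, path, kind)
- barrierGuardModel(type, path, acceptingValue, kind)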

We’ll explain how to use these with a few examples, and provide some reference material at the end of this article.

Example: Taint sink in the ‘fabric’ package

In this example, we’ll show how to add the following argument, passed to sudo from the fabric package, as a command-line injection sink:

```python
from fabric.operations import sudo

sudo(cmd)  # <-- add 'cmd' as a taint sink
```

Note that this sink is already recognized by the CodeQL Python analysis, but for this example, you could add a tuple to the sinkModel(type, path, kind) extensible predicate by updating a data extension file.

```yaml
extensions:
  - addsTo:
      pack: codeql/python-all
      extensible: sinkModel
    data:
      - ["fabric", "Member[operations].Member[sudo].Argument[0]", "command-injection"]
```

Example: Taint sink in the ‘invoke’ package

Often sinks are found as arguments to methods rather than functions. In this example, we’ll show how to add the following argument, passed to run from the invoke package, as a command-line injection sink:

```python
import invoke

c = invoke.Context()
c.run(cmd)  # <-- add 'cmd' as a taint sink
```

Note that this sink is already recognized by the CodeQL Python analysis, but for this example, you could add a tuple to the sinkModel(type, path, kind) extensible predicate by updating a data extension file.

```yaml
extensions:
  - addsTo:
      pack: codeql/python-all
      extensible: sinkModel
    data:
      - ["invoke", "Member[Context].Instance.Member[run].Argument[0]", "command-injection"]
```

Note that the Instance component is used to select instances of a class, including instances of its subclasses. Since methods on instances are common targets, we have a more compact syntax for selecting them. The first column, the type, is allowed to contain a dotted path ending in a class name. This will begin the search at instances of that class. Using this syntax, the previous example could be written as:

```yaml
extensions:
  - addsTo:
      pack: codeql/python-all
      extensible: sinkModel
    data:
      - ["invoke.Context", "Member[run].Argument[0]", "command-injection"]
```
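To make the equivalence between the two forms concrete, here is a small sketch that rewrites a dotted type into the longer package-plus-path form of the previous example. `expand_dotted_type` is a hypothetical helper, not how CodeQL itself resolves types: it assumes the first dotted component is the package, the last is a class, and anything in between is reached with `Member`. Types ending in `!`, as in the Django example later in this article, are not handled.

```python
def expand_dotted_type(type_name: str, path: str) -> tuple[str, str]:
    """Rewrite a dotted type such as "invoke.Context" into the long form.

    Hypothetical helper: assumes the first component is the package name
    and the last one is a class, with plain modules in between.
    """
    package, *members = type_name.split(".")
    if not members:
        # A bare package name such as "invoke" is already in the long form.
        return package, path
    steps = [f"Member[{name}]" for name in members]
    steps.append("Instance")  # select instances of the class
    if path:
        steps.append(path)
    return package, ".".join(steps)


print(expand_dotted_type("invoke.Context", "Member[run].Argument[0]"))
# ('invoke', 'Member[Context].Instance.Member[run].Argument[0]')
```

The output matches the tuple in the first `invoke` example, which is why the compact row and the long row describe the same sink.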

Continued example: Multiple ways to obtain a type

The invoke package provides multiple ways to obtain a Context instance. The following example shows how to add a new way to obtain a Context instance:

```python
from invoke import context

c = context.Context()
c.run(cmd)  # <-- add 'cmd' as a taint sink
```

Compared to the previous Python snippet, the Context class is now found as invoke.context.Context instead of invoke.Context. We could add a data extension similar to the previous one, but with the type invoke.context.Context. However, we can also use the typeModel(type1, type2, path) extensible predicate to describe how to reach invoke.Context from invoke.context.Context:

```yaml
extensions:
  - addsTo:
      pack: codeql/python-all
      extensible: typeModel
    data:
      - ["invoke.Context", "invoke.context.Context", ""]
```

Combined with the sink model we added earlier, this type model lets the analysis detect the sink in the example.
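One way to picture a `typeModel` row is as an edge in a graph of types: values of `type2` (reached through `path`) also count as values of `type1`. The sketch below is a toy model under that assumption, restricted to rows with an empty path; `types_reaching` and `TYPE_MODEL` are hypothetical names, and the real analysis works on access paths rather than on a dictionary of strings.

```python
from collections import defaultdict

# (type1, type2, path) rows, as in the typeModel extension above.
TYPE_MODEL = [
    ("invoke.Context", "invoke.context.Context", ""),
]


def types_reaching(target: str, rows: list[tuple[str, str, str]]) -> set[str]:
    """Return every type whose values also count as `target` (empty paths only)."""
    sources: dict[str, set[str]] = defaultdict(set)
    for type1, type2, path in rows:
        if path == "":
            sources[type1].add(type2)
    # Transitive closure: if C reaches B and B reaches A, then C reaches A.
    seen = {target}
    stack = [target]
    while stack:
        current = stack.pop()
        for other in sources[current]:
            if other not in seen:
                seen.add(other)
                stack.append(other)
    return seen


print(sorted(types_reaching("invoke.Context", TYPE_MODEL)))
# ['invoke.Context', 'invoke.context.Context']
```

In this picture, the `sinkModel` row written for `invoke.Context` applies to instances obtained through either import path.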

Example: Taint sources from Django ‘upload_to’ argument

This example is a bit more advanced, involving both a callback function and a class constructor. The Django web framework allows you to specify a function that determines the path where uploaded files are stored (see the Django documentation). This function is passed as an argument to the FileField constructor. The function is called with two arguments: the instance of the model and the filename of the uploaded file. This filename is what we want to mark as a taint source. An example use looks as follows:

```python
from django.db import models


def user_directory_path(instance, filename):  # <-- add 'filename' as a taint source
    # file will be uploaded to MEDIA_ROOT/user_<id>/<filename>
    return "user_{0}/{1}".format(instance.user.id, filename)


class MyModel(models.Model):
    upload = models.FileField(upload_to=user_directory_path)  # <-- the 'upload_to' parameter defines our custom function
```

Note that this source is already recognized by the CodeQL Python analysis, but for this example, you could add a tuple to the sourceModel(type, path, kind) extensible predicate by updating a data extension file.

```yaml
extensions:
  - addsTo:
      pack: codeql/python-all
      extensible: sourceModel
    data:
      - [
          "django.db.models.FileField!",
          "Call.Argument[0,upload_to:].Parameter[1]",
          "remote",
        ]
```
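The final `Parameter[1]` step selects the callback's second parameter, which is where the framework passes the uploaded filename. The sketch below illustrates that calling convention without Django: `FakeFileField` and `generate_path` are hypothetical stand-ins that only mimic the documented `callback(instance, filename)` call, not Django's actual `FileField` implementation.

```python
from types import SimpleNamespace


def user_directory_path(instance, filename):
    # Same callback as in the Django example above.
    return "user_{0}/{1}".format(instance.user.id, filename)


class FakeFileField:
    """Hypothetical stand-in for django.db.models.FileField.

    It only stores the upload_to callback and calls it the way the Django
    documentation describes: upload_to(instance, filename).
    """

    def __init__(self, upload_to):
        self.upload_to = upload_to

    def generate_path(self, instance, filename):
        # Parameter[0] receives the model instance, Parameter[1] the filename.
        return self.upload_to(instance, filename)


field = FakeFileField(upload_to=user_directory_path)
instance = SimpleNamespace(user=SimpleNamespace(id=42))
print(field.generate_path(instance, "uploaded.txt"))
# user_42/uploaded.txt
```

The filename originates from the uploading client, which is why the model marks it as a `remote` source.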

Example: Adding flow through ‘re.compile’

In this example, we’ll show how to add flow through calls to re.compile. re.compile returns a compiled regular expression object for efficient matching, and the pattern string it was compiled from is stored in the pattern attribute of that object.

```python
import re

y = re.compile(pattern=x)  # add value flow from 'x' to 'y.pattern'
```

Note that this flow is already recognized by the CodeQL Python analysis, but for this example, you could add a tuple to the summaryModel(type, path, input, output, kind) extensible predicate by updating a data extension file.

```yaml
extensions:
  - addsTo:
      pack: codeql/python-all
      extensible: summaryModel
    data:
      - [
          "re",
          "Member[compile]",
          "Argument[0,pattern:]",
          "ReturnValue.Attribute[pattern]",
          "value",
        ]
```
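The runtime behavior behind this summary can be checked directly. The snippet below uses only the standard `re` module and shows that the compiled object exposes the original string unchanged, whether the pattern is passed positionally (the `0` operand) or by keyword (the `pattern:` operand); this is why the kind is `value` rather than `taint`.

```python
import re

x = r"\d+"  # stands in for a pattern string that may be tainted
y = re.compile(pattern=x)  # keyword form, matched by 'pattern:'
z = re.compile(x)  # positional form, matched by '0'

# Both compiled objects expose the original string unchanged.
print(y.pattern == x, z.pattern == x)  # True True
```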

Example: Adding flow through ‘sorted’

In this example, we’ll show how to add flow through calls to the built-in function sorted:

```python
y = sorted(x)  # add taint flow from 'x' to 'y'
```

Note that this flow is already recognized by the CodeQL Python analysis, but for this example, you could add a tuple to the summaryModel(type, path, input, output, kind) extensible predicate by updating a data extension file.

```yaml
extensions:
  - addsTo:
      pack: codeql/python-all
      extensible: summaryModel
    data:
      - [
          "builtins",
          "Member[sorted]",
          "Argument[0]",
          "ReturnValue",
          "taint",
        ]
```

We might also provide a summary stating that the elements of the input list are preserved in the output list:

```yaml
extensions:
  - addsTo:
      pack: codeql/python-all
      extensible: summaryModel
    data:
      - [
          "builtins",
          "Member[sorted]",
          "Argument[0].ListElement",
          "ReturnValue.ListElement",
          "value",
        ]
```

The tracking of list elements is imprecise in that the analysis does not know where in the list the tracked value is found. So this summary simply states that if the value is found somewhere in the input list, it will also be found somewhere in the output list, unchanged.
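The following snippet illustrates why a position-insensitive summary fits `sorted`: the output list contains the very same element objects, but generally at different indices. It only demonstrates Python's runtime behavior; the `tainted` string is a made-up placeholder and the code is not part of the CodeQL model.

```python
tainted = "user-controlled"  # hypothetical untrusted value
x = [tainted, "beta", "alpha"]
y = sorted(x)

# The very same object appears in the output list (value flow) ...
print(any(item is tainted for item in y))  # True
# ... but at a different index, so the summary only says "some element".
print(x.index(tainted), y.index(tainted))  # 0 2
```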

Example: Taint barrier using the ‘escape’ function

In this example, we’ll show how to add the return value of html.escape as a barrier for XSS.

```python
import html

escaped = html.escape(unknown)  # The return value of this function is safe for XSS.
```

We need to add a tuple to the barrierModel(type, path, kind) extensible predicate by updating a data extension file.

```yaml
extensions:
  - addsTo:
      pack: codeql/python-all
      extensible: barrierModel
    data:
      - ["html", "Member[escape].ReturnValue", "html-injection"]
```
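The reason this return value can act as a barrier is visible at runtime: `html.escape` replaces `&`, `<` and `>` (and, with the default `quote=True`, both quote characters) with HTML character references, so the result cannot open a new tag. A minimal check with the standard library:

```python
import html

unknown = '<script>alert("x")</script>'  # example untrusted input
escaped = html.escape(unknown)  # quote=True by default, so " and ' are escaped too
print(escaped)
# &lt;script&gt;alert(&quot;x&quot;)&lt;/script&gt;
```

Escaping is context-dependent: this output is safe in HTML text and quoted attribute values, not, for instance, inside a `<script>` block.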

Example: Add a barrier guard

This example shows how to model a barrier guard that stops the flow of taint when a conditional check is performed on data. A barrier guard model is used when a function returns a boolean that indicates whether the data is safe to use. Consider the function url_has_allowed_host_and_scheme from the django.utils.http module, which returns True when the URL is in a safe domain.

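```python
from django.shortcuts import redirect
from django.utils.http import url_has_allowed_host_and_scheme

if url_has_allowed_host_and_scheme(url, allowed_hosts=...):  # The check guards the use of 'url', so it is safe.
    redirect(url)  # This is safe.
```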

We need to add a tuple to the barrierGuardModel(type, path, acceptingValue, kind) extensible predicate by updating a data extension file.

```yaml
extensions:
  - addsTo:
      pack: codeql/python-all
      extensible: barrierGuardModel
    data:
      - [
          "django",
          "Member[utils].Member[http].Member[url_has_allowed_host_and_scheme].Argument[0,url:]",
          "true",
          "url-redirection",
        ]
```
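The third column, `true`, states which result of the check marks the argument as safe. The sketch below mimics the guarded pattern with a hypothetical, much simplified check built on `urllib.parse`. `is_allowed_url` and `safe_redirect_target` are made-up names; this is not Django's `url_has_allowed_host_and_scheme`, which handles many more edge cases, and it should not be used as a production allow-list.

```python
from urllib.parse import urlsplit


def is_allowed_url(url: str, allowed_hosts: set[str]) -> bool:
    """Hypothetical, simplified stand-in for a URL allow-list check."""
    if "\\" in url or url.startswith("//"):
        # Browsers may treat these as pointing to another host.
        return False
    parts = urlsplit(url)
    if not parts.scheme and not parts.netloc:
        # Absolute path on the current site, e.g. "/dashboard".
        return url.startswith("/")
    return parts.scheme in ("http", "https") and parts.hostname in allowed_hosts


def safe_redirect_target(url: str, allowed_hosts: set[str], fallback: str = "/") -> str:
    # 'url' is only used on the branch where the guard returned True,
    # which is what the acceptingValue "true" expresses in the model above.
    if is_allowed_url(url, allowed_hosts):
        return url
    return fallback


print(safe_redirect_target("https://evil.example/", {"example.com"}))  # /
```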

Reference material

The following sections provide reference material for extensible predicates, access paths, types, and kinds.

Extensible predicates

sourceModel(type, path, kind)

Adds a new taint source. Most taint-tracking queries will use the new source.

Example:

```yaml
extensions:
  - addsTo:
      pack: codeql/python-all
      extensible: sourceModel
    data:
      - ["flask", "Member[request]", "remote"]
```

sinkModel(type, path, kind)

Adds a new taint sink. Sinks are query-specific and will typically affect one or two queries.

Example:

```yaml
extensions:
  - addsTo:
      pack: codeql/python-all
      extensible: sinkModel
    data:
      - ["builtins", "Member[exec].Argument[0]", "code-injection"]
```

summaryModel(type, path, input, output, kind)

Adds flow through a function call.

Example:

```yaml
extensions:
  - addsTo:
      pack: codeql/python-all
      extensible: summaryModel
    data:
      - [
          "builtins",
          "Member[reversed]",
          "Argument[0]",
          "ReturnValue",
          "taint",
        ]
```

typeModel(type1, type2, path)

A description of how to reach type1 from type2. If this is the only way to reach type1, for instance if type1 is a name we made up to represent the inner workings of a library, we think of this as a definition of type1. In the context of instances, this describes how to obtain an instance of type1 from an instance of type2.

Example:

```yaml
extensions:
  - addsTo:
      pack: codeql/python-all
      extensible: typeModel
    data:
      - [
          "flask.Response",
          "flask",
          "Member[jsonify].ReturnValue",
        ]
```

Types

A type is a string that identifies a set of values. In each of the extensible predicates mentioned in previous section, the first column is always the name of a type. A type can be defined by adding typeModel tuples for that type. Additionally, the following built-in types are available:

Access paths

The path, input, and output columns consist of a .-separated list of components, which is evaluated from left to right, with each step selecting a new set of values derived from the previous set of values.
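To make the left-to-right reading concrete, the sketch below splits an access path into its components and their bracketed operands. `split_access_path` is a hypothetical helper written for illustration; it only handles the syntax seen in this article's examples (dots between components, commas between operands inside brackets) and does not check that component names are valid.

```python
def split_access_path(path: str) -> list[tuple[str, list[str]]]:
    """Split e.g. 'Member[compile].Argument[0,pattern:]' into
    [('Member', ['compile']), ('Argument', ['0', 'pattern:'])]."""
    components: list[tuple[str, list[str]]] = []
    depth = 0
    current = ""
    for ch in path + ".":  # trailing '.' flushes the last component
        if ch == "." and depth == 0:
            if current:
                components.append(_parse_component(current))
            current = ""
            continue
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        current += ch
    return components


def _parse_component(text: str) -> tuple[str, list[str]]:
    # 'Argument[0,pattern:]' -> ('Argument', ['0', 'pattern:']); 'Call' -> ('Call', [])
    if "[" not in text:
        return text, []
    name, _, rest = text.partition("[")
    return name, [op.strip() for op in rest[:-1].split(",")]


print(split_access_path("Member[compile].Argument[0,pattern:]"))
# [('Member', ['compile']), ('Argument', ['0', 'pattern:'])]
```

An empty path, as used in the `typeModel` row for `invoke.context.Context`, yields an empty list of components.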

The following components are supported:

Additional notes about the syntax of operands:

Kinds

Source kinds

See the Threat models section below.

Sink kinds

Unlike sources, sinks tend to be highly query-specific, rarely affecting more than one or two queries. Not every query supports customizable sinks. If the following sinks are not suitable for your use case, you should add a new query.

Summary kinds

Threat models

Note

Threat models are currently in beta and subject to change. During the beta, threat models are supported only by Java, C#, Python and JavaScript/TypeScript analysis.

A threat model is a named class of dataflow sources that can be enabled or disabled independently. Threat models allow you to control the set of dataflow sources that you want to consider unsafe. For example, one codebase may only consider remote HTTP requests to be tainted, whereas another may also consider data from local files to be unsafe. You can use threat models to ensure that the relevant taint sources are used in a CodeQL analysis.

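The kind property of the sourceModel determines which threat model a source is associated with. There are two main categories:

- remote, which represents data received over the network, such as remote HTTP requests.
- local, which represents data from local sources, such as files, databases (database), command-line arguments (commandargs), and environment variables (environment).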

Note that subcategories can be included or excluded separately, so you can specify local without database, or just commandargs and environment without the rest of local.
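These inclusion rules can be pictured as set operations over source kinds. The sketch below is a toy model, not CodeQL's implementation: the `CATEGORIES` table is an assumption that lists only kinds mentioned in this article (with `file` as a hypothetical name for local files), `enabled_kinds` is a made-up helper, and it simply applies inclusions before exclusions, starting from `remote`, which is included by default.

```python
from collections.abc import Iterable

# Assumed category table for illustration; "file" is a hypothetical kind name.
CATEGORIES: dict[str, set[str]] = {
    "local": {"file", "database", "commandargs", "environment"},
}


def expand(name: str) -> set[str]:
    # A category expands to its subcategories; anything else is a single kind.
    return set(CATEGORIES.get(name, {name}))


def enabled_kinds(include: Iterable[str] = (), exclude: Iterable[str] = ()) -> set[str]:
    """Start from the default "remote" model, add inclusions, then drop exclusions."""
    kinds = {"remote"}
    for name in include:
        kinds |= expand(name)
    for name in exclude:
        kinds -= expand(name)
    return kinds


# "local without database"
print(sorted(enabled_kinds(include=["local"], exclude=["database"])))
# ['commandargs', 'environment', 'file', 'remote']
```

A source whose kind is in the resulting set would be used by the analysis; the others would be ignored.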

The less commonly used categories are:

When running a CodeQL analysis, the remote threat model is included by default. You can optionally include other threat models as appropriate when using the CodeQL CLI and in GitHub code scanning. For more information, see Analyzing your code with CodeQL queries and Customizing your advanced setup for code scanning.