Customizing library models for C#

You can model the methods and callables that control data flow in any framework or library. This is especially useful for custom frameworks or niche libraries that are not supported by the standard CodeQL libraries.

Beta Notice - Unstable API

Library customization using data extensions is currently in beta and subject to change.

Breaking changes to this format may occur while in beta.

About this article

This article contains reference material about how to define custom models for sources, sinks, and flow summaries for C# dependencies in data extension files.

About data extensions

You can customize analysis by defining models (summaries, sinks, and sources) of your code’s C#/.NET dependencies in data extension files. Each model defines the behavior of one or more elements of your library or framework, such as methods, properties, and callables. When you run dataflow analysis, these models expand the potential sources and sinks tracked by dataflow analysis and improve the precision of results.

Most of the security queries search for paths from a source of untrusted input to a sink that represents a vulnerability. This is known as taint tracking. Each source is a starting point for dataflow analysis to track tainted data and each sink is an end point.

Taint tracking queries also need to know how data can flow through elements that are not included in the source code. These are modeled as summaries. A summary model enables queries to synthesize the flow behavior through elements in dependency code that is not stored in your repository.

Syntax used to define an element in an extension file

Each model of an element is defined using a data extension where each tuple constitutes a model. A data extension file to extend the standard C# queries included with CodeQL is a YAML file with the form:

    extensions:
      - addsTo:
          pack: codeql/csharp-all
          extensible: <name of extensible predicate>
        data:
          - <tuple1>
          - <tuple2>
          - ...

Each YAML file may contain one or more top-level extensions.
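As a minimal sketch of a file with more than one extension, the following combines two tuples that appear in the examples later in this article, one for sourceModel and one for sinkModel. The layout is an illustration of the form shown above, not a file taken from the standard pack.

```yaml
extensions:
  # First extension: adds a tuple to the sourceModel extensible predicate.
  - addsTo:
      pack: codeql/csharp-all
      extensible: sourceModel
    data:
      - ["System.Net.Sockets", "TcpClient", False, "GetStream", "()", "", "ReturnValue", "remote", "manual"]
  # Second extension in the same file: adds a tuple to sinkModel.
  - addsTo:
      pack: codeql/csharp-all
      extensible: sinkModel
    data:
      - ["System.Data.SqlClient", "SqlCommand", False, "SqlCommand", "(System.String,System.Data.SqlClient.SqlConnection)", "", "Argument[0]", "sql-injection", "manual"]
```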

Data extensions use union semantics, which means that the tuples of all extensions for a single extensible predicate are combined, duplicates are removed, and all of the remaining tuples are queryable by referencing the extensible predicate.
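To illustrate union semantics, the sketch below defines two extensions for the same extensible predicate, reusing the String.Concat tuples from the examples in this article. The first tuple of the second extension duplicates the tuple in the first extension. Under the semantics described above, summaryModel would end up with two distinct tuples, not three.

```yaml
extensions:
  - addsTo:
      pack: codeql/csharp-all
      extensible: summaryModel
    data:
      - ["System", "String", False, "Concat", "(System.Object,System.Object)", "", "Argument[0]", "ReturnValue", "taint", "manual"]
  - addsTo:
      pack: codeql/csharp-all
      extensible: summaryModel
    data:
      # Duplicate of the tuple above: removed when the extensions are combined.
      - ["System", "String", False, "Concat", "(System.Object,System.Object)", "", "Argument[0]", "ReturnValue", "taint", "manual"]
      # New tuple: kept.
      - ["System", "String", False, "Concat", "(System.Object,System.Object)", "", "Argument[1]", "ReturnValue", "taint", "manual"]
```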

Publish data extension files in a CodeQL model pack to share

You can group one or more data extension files into a CodeQL model pack and publish it to the GitHub Container Registry. This makes it easy for anyone to download the model pack and use it to extend their analysis. For more information, see Creating a CodeQL model pack and Publishing and using CodeQL packs in the CodeQL CLI documentation.
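For orientation, a model pack is described by a qlpack.yml file. The sketch below shows roughly what one might look like for C# models. The pack name, version, version range, and directory layout are placeholders chosen for illustration, and the field names reflect our understanding of the model pack format, so check them against the model pack documentation referenced above.

```yaml
# qlpack.yml (illustrative sketch; all values are placeholders)
name: my-org/my-csharp-models      # hypothetical scope and pack name
version: 0.0.1
library: true
# The pack(s) whose extensible predicates the data extensions add to.
extensionTargets:
  codeql/csharp-all: "*"
# Where the data extension files live inside the pack (assumed layout).
dataExtensions:
  - models/**/*.yml
```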

Extensible predicates used to create custom models in C#

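The CodeQL library for C# analysis exposes the following extensible predicates:

sourceModel(namespace, type, subtypes, name, signature, ext, output, kind, provenance). This is used to model sources of potentially tainted data.

sinkModel(namespace, type, subtypes, name, signature, ext, input, kind, provenance). This is used to model sinks where tainted data may be used in a way that makes the code vulnerable.

summaryModel(namespace, type, subtypes, name, signature, ext, input, output, kind, provenance). This is used to model flow through elements.

barrierModel(namespace, type, subtypes, name, signature, ext, output, kind, provenance). This is used to model elements that stop the flow of taint for a specific kind of query.

barrierGuardModel(namespace, type, subtypes, name, signature, ext, input, acceptingValue, kind, provenance). This is used to model conditional checks that stop the flow of taint for a specific kind of query.

neutralModel(namespace, type, name, signature, kind, provenance). This is used to model callables that have no flow.

The extensible predicates are populated using the models defined in data extension files.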

Examples of custom model definitions

The examples in this section are taken from the standard CodeQL C# query pack published by GitHub. They demonstrate how to add tuples to extend extensible predicates that are used by the standard queries.

Example: Taint sink in the System.Data.SqlClient namespace

This example shows how the C# query pack models the argument of the SqlCommand constructor as a SQL injection sink. This is the constructor of the SqlCommand class, which is located in the System.Data.SqlClient namespace.

    public static void TaintSink(SqlConnection conn, string query) {
        SqlCommand command = new SqlCommand(query, conn); // The first argument to this constructor is a SQL injection sink.
        ...
    }

We need to add a tuple to the sinkModel(namespace, type, subtypes, name, signature, ext, input, kind, provenance) extensible predicate by updating a data extension file.

    extensions:
      - addsTo:
          pack: codeql/csharp-all
          extensible: sinkModel
        data:
          - ["System.Data.SqlClient", "SqlCommand", False, "SqlCommand", "(System.String,System.Data.SqlClient.SqlConnection)", "", "Argument[0]", "sql-injection", "manual"]

The first five values identify the callable (in this case a constructor) to be modeled as a sink.

The sixth value should be left empty and is out of scope for this documentation. The remaining values are used to define the access-path, the kind, and the provenance (origin) of the sink.
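The same pattern applies to a library that the standard pack does not cover. The sketch below assumes a hypothetical in-house class MyCompany.Data.QueryRunner with a method Execute(string sql) that runs its argument as SQL. The namespace, type, and method are invented for illustration and do not refer to a real library.

```yaml
extensions:
  - addsTo:
      pack: codeql/csharp-all
      extensible: sinkModel
    data:
      # Hypothetical library: MyCompany.Data.QueryRunner.Execute(string sql).
      # Argument[0] (the SQL text) is modeled as a SQL injection sink.
      - ["MyCompany.Data", "QueryRunner", False, "Execute", "(System.String)", "", "Argument[0]", "sql-injection", "manual"]
```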

Example: Taint source from the System.Net.Sockets namespace

This example shows how the C# query pack models the return value from the GetStream method as a remote source. This is the GetStream method in the TcpClient class, which is located in the System.Net.Sockets namespace.

    public static void Tainted(TcpClient client) {
        NetworkStream stream = client.GetStream(); // The return value of this method is a remote source of taint.
        ...
    }

We need to add a tuple to the sourceModel(namespace, type, subtypes, name, signature, ext, output, kind, provenance) extensible predicate by updating a data extension file.

    extensions:
      - addsTo:
          pack: codeql/csharp-all
          extensible: sourceModel
        data:
          - ["System.Net.Sockets", "TcpClient", False, "GetStream", "()", "", "ReturnValue", "remote", "manual"]

The first five values identify the callable (in this case a method) to be modeled as a source.

The sixth value should be left empty and is out of scope for this documentation. The remaining values are used to define the access-path, the kind, and the provenance (origin) of the source.

Example: Add flow through the Concat method

This example shows how the C# query pack models flow through a method for a simple case. This pattern covers many of the cases where we need to summarize flow through a method that is stored in a library or framework outside the repository.

    public static void TaintFlow(string s1, string s2) {
        string t = String.Concat(s1, s2); // There is taint flow from s1 and s2 to t.
        ...
    }

We need to add tuples to the summaryModel(namespace, type, subtypes, name, signature, ext, input, output, kind, provenance) extensible predicate by updating a data extension file:

    extensions:
      - addsTo:
          pack: codeql/csharp-all
          extensible: summaryModel
        data:
          - ["System", "String", False, "Concat", "(System.Object,System.Object)", "", "Argument[0]", "ReturnValue", "taint", "manual"]
          - ["System", "String", False, "Concat", "(System.Object,System.Object)", "", "Argument[1]", "ReturnValue", "taint", "manual"]

Each tuple defines flow from one argument to the return value. The first row defines flow from the first argument (s1 in the example) to the return value (t in the example) and the second row defines flow from the second argument (s2 in the example) to the return value (t in the example).

The first five values identify the callable (in this case a method) to be modeled as a summary. These are the same for both of the rows above as we are adding two summaries for the same method.

The sixth value should be left empty and is out of scope for this documentation. The remaining values are used to define the access-path, the kind, and the provenance (origin) of the summary.

It would also be possible to merge the two rows into one by using a comma-separated list in the seventh value. This would be useful if the method has many arguments and the flow is the same for all of them.

    extensions:
      - addsTo:
          pack: codeql/csharp-all
          extensible: summaryModel
        data:
          - ["System", "String", False, "Concat", "(System.Object,System.Object)", "", "Argument[0,1]", "ReturnValue", "taint", "manual"]

This row defines flow from both the first and the second argument to the return value. The seventh value Argument[0,1] is shorthand for specifying an access path to both Argument[0] and Argument[1].
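The shorthand pays off as the number of arguments grows. As a sketch, assuming the comma-separated list extends to more than two indices in the same way, the three-argument overload String.Concat(object, object, object) could be summarized in a single row. This tuple is written by analogy with the one above and is not quoted from the standard pack.

```yaml
extensions:
  - addsTo:
      pack: codeql/csharp-all
      extensible: summaryModel
    data:
      # One row standing in for three: Argument[0], Argument[1], and Argument[2]
      # each have taint flow to the return value.
      - ["System", "String", False, "Concat", "(System.Object,System.Object,System.Object)", "", "Argument[0,1,2]", "ReturnValue", "taint", "manual"]
```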

Example: Add flow through the Trim method

This example shows how the C# query pack models flow through a method for a simple case.

    public static void TaintFlow(string s) {
        string t = s.Trim(); // There is taint flow from s to t.
        ...
    }

We need to add a tuple to the summaryModel(namespace, type, subtypes, name, signature, ext, input, output, kind, provenance) extensible predicate by updating a data extension file:

    extensions:
      - addsTo:
          pack: codeql/csharp-all
          extensible: summaryModel
        data:
          - ["System", "String", False, "Trim", "()", "", "Argument[this]", "ReturnValue", "taint", "manual"]

The tuple defines flow from the qualifier of the method call (s in the example) to the return value (t in the example).

The first five values identify the callable (in this case a method) to be modeled as a summary.

The sixth value should be left empty and is out of scope for this documentation. The remaining values are used to define the access-path, the kind, and the provenance (origin) of the summary.

Example: Add flow through the Select method

This example shows how the C# query pack models a more complex flow through a method. It covers flow through higher-order methods and collection types, and shows how to handle extension methods and generics.

    public static void TaintFlow(IEnumerable<string> stream) {
        IEnumerable<string> lines = stream.Select(item => item + "\n");
        ...
    }

We need to add tuples to the summaryModel(namespace, type, subtypes, name, signature, ext, input, output, kind, provenance) extensible predicate by updating a data extension file:

    extensions:
      - addsTo:
          pack: codeql/csharp-all
          extensible: summaryModel
        data:
          - ["System.Linq", "Enumerable", False, "Select<TSource,TResult>", "(System.Collections.Generic.IEnumerable<TSource>,System.Func<TSource,TResult>)", "", "Argument[0].Element", "Argument[1].Parameter[0]", "value", "manual"]
          - ["System.Linq", "Enumerable", False, "Select<TSource,TResult>", "(System.Collections.Generic.IEnumerable<TSource>,System.Func<TSource,TResult>)", "", "Argument[1].ReturnValue", "ReturnValue.Element", "value", "manual"]

Each tuple defines part of the flow that comprises the total flow through the Select method. The first five values identify the callable (in this case a method) to be modeled as a summary. These are the same for both of the rows above as we are adding two summaries for the same method.

The sixth value should be left empty and is out of scope for this documentation. The remaining values are used to define the access-path, the kind, and the provenance (origin) of the summary definition.

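For the first row, the seventh value (Argument[0].Element) is the input: an access path to the elements of the first argument, which is the enumerable that Select is called on. The eighth value (Argument[1].Parameter[0]) is the output: an access path to the first parameter of the System.Func<TSource,TResult> passed as the second argument (the lambda parameter item in the example).

For the second row, the seventh value (Argument[1].ReturnValue) is the input: an access path to the return value of the function passed as the second argument. The eighth value (ReturnValue.Element) is the output: an access path to the elements of the enumerable returned by Select.

For the remaining values for both rows, the ninth value (value) is the kind of flow, here value flow in contrast to the taint kind used in the previous examples, and the tenth value (manual) is the provenance (origin) of the summary.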

That is, the first row specifies that values can flow from the elements of the qualifier enumerable into the first argument of the function provided to Select. The second row specifies that values can flow from the return value of the function to the elements of the enumerable returned from Select.
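The same building blocks can describe other higher-order extension methods. As a sketch written by analogy with the Select rows, Enumerable.Where could be summarized as follows: elements flow into the predicate's parameter, and, because Where returns a subset of its input, elements also flow directly to the elements of the result. These rows are an illustration and are not quoted from the standard pack, which may already contain an equivalent model.

```yaml
extensions:
  - addsTo:
      pack: codeql/csharp-all
      extensible: summaryModel
    data:
      # Elements of the input enumerable flow into the first parameter of the predicate.
      - ["System.Linq", "Enumerable", False, "Where<TSource>", "(System.Collections.Generic.IEnumerable<TSource>,System.Func<TSource,System.Boolean>)", "", "Argument[0].Element", "Argument[1].Parameter[0]", "value", "manual"]
      # Elements of the input enumerable flow to the elements of the returned enumerable.
      - ["System.Linq", "Enumerable", False, "Where<TSource>", "(System.Collections.Generic.IEnumerable<TSource>,System.Func<TSource,System.Boolean>)", "", "Argument[0].Element", "ReturnValue.Element", "value", "manual"]
```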

Example: Add a barrier for the RawUrl property

This example shows how we can model a property as a barrier for a specific kind of query. A barrier model is used to define that the flow of taint stops at the modeled element for the specified kind of query. Here we model the getter of the RawUrl property of the HttpRequest class as a barrier for URL redirection queries. The RawUrl property returns the raw URL of the current request, which is considered safe for URL redirects because it is the URL of the current request and cannot be manipulated by an attacker.

    public static void TaintBarrier(HttpRequest request) {
        string url = request.RawUrl; // The return value of this property is considered safe for URL redirects.
        Response.Redirect(url); // This is not a URL redirection vulnerability.
    }

We need to add a tuple to the barrierModel(namespace, type, subtypes, name, signature, ext, output, kind, provenance) extensible predicate by updating a data extension file.

    extensions:
      - addsTo:
          pack: codeql/csharp-all
          extensible: barrierModel
        data:
          - ["System.Web", "HttpRequest", False, "get_RawUrl", "()", "", "ReturnValue", "url-redirection", "manual"]

The first five values identify the callable (in this case the getter of a property) to be modeled as a barrier.

The sixth value should be left empty and is out of scope for this documentation. The remaining values are used to define the access-path, the kind, and the provenance (origin) of the barrier.

Example: Add a barrier guard for the IsAbsoluteUri property

This example shows how we can model a property as a barrier guard for a specific kind of query. A barrier guard model is used to stop the flow of taint when a conditional check is performed on data. Here we model the getter of the IsAbsoluteUri property of the Uri class as a barrier guard for URL redirection queries. When the IsAbsoluteUri property returns false, the URL is relative and therefore safe for URL redirects because it cannot redirect to an external site controlled by an attacker.

    public static void TaintBarrierGuard(Uri uri) {
        if (!uri.IsAbsoluteUri) { // The check guards the redirect, so the URL is safe.
            Response.Redirect(uri.ToString()); // This is not a URL redirection vulnerability.
        }
    }

We need to add a tuple to the barrierGuardModel(namespace, type, subtypes, name, signature, ext, input, acceptingValue, kind, provenance) extensible predicate by updating a data extension file.

    extensions:
      - addsTo:
          pack: codeql/csharp-all
          extensible: barrierGuardModel
        data:
          - ["System", "Uri", False, "get_IsAbsoluteUri", "()", "", "Argument[this]", "false", "url-redirection", "manual"]

The first five values identify the callable (in this case the getter of a property) to be modeled as a barrier guard.

The sixth value should be left empty and is out of scope for this documentation. The remaining values are used to define the access-path, the accepting-value, the kind, and the provenance (origin) of the barrier guard.

Example: Add a neutral method

This example shows how we can model a method as being neutral with respect to flow. We will also cover how to model a property by modeling the getter of the Now property of the DateTime class as neutral. A neutral model is used to define that there is no flow through a method.

    public static void TaintFlow() {
        System.DateTime t = System.DateTime.Now; // There is no flow from Now to t.
        ...
    }

We need to add a tuple to the neutralModel(namespace, type, name, signature, kind, provenance) extensible predicate by updating a data extension file.

    extensions:
      - addsTo:
          pack: codeql/csharp-all
          extensible: neutralModel
        data:
          - ["System", "DateTime", "get_Now", "()", "summary", "manual"]

The first four values identify the callable (in this case the getter of the Now property) to be modeled as a neutral, the fifth value is the kind, and the sixth value is the provenance (origin) of the neutral.

Threat models

Note

Threat models are currently in beta and subject to change. During the beta, threat models are supported only by Java, C#, Python and JavaScript/TypeScript analysis.

A threat model is a named class of dataflow sources that can be enabled or disabled independently. Threat models allow you to control the set of dataflow sources that you want to consider unsafe. For example, one codebase may only consider remote HTTP requests to be tainted, whereas another may also consider data from local files to be unsafe. You can use threat models to ensure that the relevant taint sources are used in a CodeQL analysis.

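The kind property of the sourceModel determines which threat model a source is associated with. There are two main categories: remote, which represents requests and responses from the network, and local, which represents data from sources local to the machine, such as files (file), command-line arguments (commandargs), databases (database), and environment variables (environment).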

Note that subcategories can be included or excluded separately, so you can specify local without database, or just commandargs and environment without the rest of local.
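To make the link between kind and threat model concrete, the sketch below models a source whose kind is commandargs. The library is hypothetical: MyCompany.Cli.ArgParser and its GetRawArgs method are invented for illustration. Because commandargs is a subcategory of local, this source would not be considered under the default remote threat model and would only take effect when local, or commandargs specifically, is enabled.

```yaml
extensions:
  - addsTo:
      pack: codeql/csharp-all
      extensible: sourceModel
    data:
      # Hypothetical library: MyCompany.Cli.ArgParser.GetRawArgs() returns raw
      # command-line arguments. The kind "commandargs" ties the source to the
      # commandargs subcategory of the local threat model.
      - ["MyCompany.Cli", "ArgParser", False, "GetRawArgs", "()", "", "ReturnValue", "commandargs", "manual"]
```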

The less commonly used categories are:

When running a CodeQL analysis, the remote threat model is included by default. You can optionally include other threat models as appropriate when using the CodeQL CLI and in GitHub code scanning. For more information, see Analyzing your code with CodeQL queries and Customizing your advanced setup for code scanning.
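As a rough sketch of what enabling an extra threat model might look like in a code scanning configuration file, the fragment below adds local on top of the default remote model. The threat-models key is stated here as an assumption based on our reading of the advanced setup documentation, so confirm the exact key and accepted values in the articles referenced above before relying on it.

```yaml
# CodeQL configuration file for code scanning (sketch; the key name is an assumption)
# Adds the local threat model to the default remote threat model.
threat-models: local
```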