diff --git a/cpp/ql/lib/change-notes/2026-03-20-data-extensions-barriers.md b/cpp/ql/lib/change-notes/2026-03-20-data-extensions-barriers.md new file mode 100644 index 000000000000..30f0092a4e95 --- /dev/null +++ b/cpp/ql/lib/change-notes/2026-03-20-data-extensions-barriers.md @@ -0,0 +1,4 @@ +--- +category: feature +--- +* Data flow barriers and barrier guards can now be added using data extensions. For more information, see [Customizing library models for C and C++](https://codeql.github.com/docs/codeql-language-guides/customizing-library-models-for-cpp/). diff --git a/csharp/ql/lib/change-notes/2026-03-20-data-extensions-barriers.md b/csharp/ql/lib/change-notes/2026-03-20-data-extensions-barriers.md new file mode 100644 index 000000000000..6408acc7dae8 --- /dev/null +++ b/csharp/ql/lib/change-notes/2026-03-20-data-extensions-barriers.md @@ -0,0 +1,4 @@ +--- +category: feature +--- +* Data flow barriers and barrier guards can now be added using data extensions. For more information, see [Customizing library models for C#](https://codeql.github.com/docs/codeql-language-guides/customizing-library-models-for-csharp/). 
diff --git a/docs/codeql/codeql-language-guides/customizing-library-models-for-actions.rst b/docs/codeql/codeql-language-guides/customizing-library-models-for-actions.rst index 2bf452b5a90b..0b78b37359f4 100644 --- a/docs/codeql/codeql-language-guides/customizing-library-models-for-actions.rst +++ b/docs/codeql/codeql-language-guides/customizing-library-models-for-actions.rst @@ -24,26 +24,26 @@ The CodeQL library for GitHub Actions exposes the following extensible predicate Customizing data flow and taint tracking: -- **actionsSourceModel**\(action, version, output, kind, provenance) -- **actionsSinkModel**\(action, version, input, kind, provenance) -- **actionsSummaryModel**\(action, version, input, output, kind, provenance) +- ``actionsSourceModel(action, version, output, kind, provenance)`` +- ``actionsSinkModel(action, version, input, kind, provenance)`` +- ``actionsSummaryModel(action, version, input, output, kind, provenance)`` Customizing Actions-specific analysis: -- **argumentInjectionSinksDataModel**\(regexp, command_group, argument_group) -- **contextTriggerDataModel**\(trigger, context_prefix) -- **externallyTriggerableEventsDataModel**\(event) -- **immutableActionsDataModel**\(action) -- **poisonableActionsDataModel**\(action) -- **poisonableCommandsDataModel**\(regexp) -- **poisonableLocalScriptsDataModel**\(regexp, group) -- **repositoryDataModel**\(visibility, default_branch_name) -- **trustedActionsOwnerDataModel**\(owner) -- **untrustedEventPropertiesDataModel**\(property, kind) -- **untrustedGhCommandDataModel**\(cmd_regex, flag) -- **untrustedGitCommandDataModel**\(cmd_regex, flag) -- **vulnerableActionsDataModel**\(action, vulnerable_version, vulnerable_sha, fixed_version) -- **workflowDataModel**\(path, trigger, job, secrets_source, permissions, runner) +- ``argumentInjectionSinksDataModel(regexp, command_group, argument_group)`` +- ``contextTriggerDataModel(trigger, context_prefix)`` +- ``externallyTriggerableEventsDataModel(event)`` +- 
``immutableActionsDataModel(action)`` +- ``poisonableActionsDataModel(action)`` +- ``poisonableCommandsDataModel(regexp)`` +- ``poisonableLocalScriptsDataModel(regexp, group)`` +- ``repositoryDataModel(visibility, default_branch_name)`` +- ``trustedActionsOwnerDataModel(owner)`` +- ``untrustedEventPropertiesDataModel(property, kind)`` +- ``untrustedGhCommandDataModel(cmd_regex, flag)`` +- ``untrustedGitCommandDataModel(cmd_regex, flag)`` +- ``vulnerableActionsDataModel(action, vulnerable_version, vulnerable_sha, fixed_version)`` +- ``workflowDataModel(path, trigger, job, secrets_source, permissions, runner)`` Examples of custom model definitions ------------------------------------ @@ -62,9 +62,9 @@ To allow any Action from the publisher ``octodemo``, such as ``octodemo/3rd-part .. code-block:: yaml extensions: - - addsTo: + - addsTo: pack: codeql/actions-all - extensible: trustedActionsOwnerDataModel + extensible: trustedActionsOwnerDataModel data: - ["octodemo"] diff --git a/docs/codeql/codeql-language-guides/customizing-library-models-for-cpp.rst b/docs/codeql/codeql-language-guides/customizing-library-models-for-cpp.rst index 29e8be5a4ae4..7ca619632272 100644 --- a/docs/codeql/codeql-language-guides/customizing-library-models-for-cpp.rst +++ b/docs/codeql/codeql-language-guides/customizing-library-models-for-cpp.rst @@ -58,6 +58,8 @@ The CodeQL library for CPP analysis exposes the following extensible predicates: - ``sourceModel(namespace, type, subtypes, name, signature, ext, output, kind, provenance)``. This is used to model sources of potentially tainted data. The ``kind`` of the sources defined using this predicate determine which threat model they are associated with. Different threat models can be used to customize the sources used in an analysis. For more information, see ":ref:`Threat models `." - ``sinkModel(namespace, type, subtypes, name, signature, ext, input, kind, provenance)``. 
This is used to model sinks where tainted data may be used in a way that makes the code vulnerable. - ``summaryModel(namespace, type, subtypes, name, signature, ext, input, output, kind, provenance)``. This is used to model flow through elements. +- ``barrierModel(namespace, type, subtypes, name, signature, ext, output, kind, provenance)``. This is used to model barriers, which are elements that stop the flow of taint. +- ``barrierGuardModel(namespace, type, subtypes, name, signature, ext, input, acceptingValue, kind, provenance)``. This is used to model barrier guards, which are elements that can stop the flow of taint depending on a conditional check. The extensible predicates are populated using the models defined in data extension files. @@ -75,7 +77,7 @@ This example shows how the CPP query pack models the return value from the ``rea boost::asio::read_until(socket, recv_buffer, '\0', error); -We need to add a tuple to the ``sourceModel``\(namespace, type, subtypes, name, signature, ext, output, kind, provenance) extensible predicate by updating a data extension file. +We need to add a tuple to the ``sourceModel(namespace, type, subtypes, name, signature, ext, output, kind, provenance)`` extensible predicate by updating a data extension file. .. code-block:: yaml @@ -86,12 +88,11 @@ We need to add a tuple to the ``sourceModel``\(namespace, type, subtypes, name, data: - ["boost::asio", "", False, "read_until", "", "", "Argument[*1]", "remote", "manual"] -Since we are adding a new source, we need to add a tuple to the ``sourceModel`` extensible predicate. The first five values identify the callable (in this case a free function) to be modeled as a source. - The first value ``"boost::asio"`` is the namespace name. -- The second value ``""`` is the name of the type (class) that contains the method. Because we're modelling a free function, the type is left blank. 
-- The third value ``False`` is a flag that indicates whether or not the sink also applies to all overrides of the method. For a free function, this should be ``False``. +- The second value ``""`` is the name of the type (class) that contains the method. Because we're modeling a free function, the type is left blank. +- The third value ``False`` is a flag that indicates whether or not the model also applies to all overrides of the method. For a free function, this should be ``False``. - The fourth value ``"read_until"`` is the function name. - The fifth value is the function input type signature, which can be used to narrow down between functions that have the same name. In this case, we want the model to include all functions in ``boost::asio`` called ``read_until``. @@ -111,7 +112,7 @@ This example shows how the CPP query pack models the second argument of the ``bo boost::asio::write(socket, send_buffer, error); -We need to add a tuple to the ``sinkModel``\(namespace, type, subtypes, name, signature, ext, input, kind, provenance) extensible predicate by updating a data extension file. +We need to add a tuple to the ``sinkModel(namespace, type, subtypes, name, signature, ext, input, kind, provenance)`` extensible predicate by updating a data extension file. .. code-block:: yaml @@ -122,12 +123,11 @@ We need to add a tuple to the ``sinkModel``\(namespace, type, subtypes, name, si data: - ["boost::asio", "", False, "write", "", "", "Argument[*1]", "remote-sink", "manual"] -Since we want to add a new sink, we need to add a tuple to the ``sinkModel`` extensible predicate. The first five values identify the callable (in this case a free function) to be modeled as a sink. - The first value ``"boost::asio"`` is the namespace name. -- The second value ``""`` is the name of the type (class) that contains the method. Because we're modelling a free function, the type is left blank. 
-- The third value ``False`` is a flag that indicates whether or not the sink also applies to all overrides of the method. For a free function, this should be ``False``. +- The second value ``""`` is the name of the type (class) that contains the method. Because we're modeling a free function, the type is left blank. +- The third value ``False`` is a flag that indicates whether or not the model also applies to all overrides of the method. For a free function, this should be ``False``. - The fourth value ``"write"`` is the function name. - The fifth value is the function input type signature, which can be used to narrow down between functions that have the same name. In this case, we want the model to include all functions in ``boost::asio`` called ``write``. @@ -147,7 +147,7 @@ This example shows how the CPP query pack models flow through a function for a s boost::asio::write(socket, boost::asio::buffer(send_str), error); -We need to add tuples to the ``summaryModel``\(namespace, type, subtypes, name, signature, ext, input, output, kind, provenance) extensible predicate by updating a data extension file: +We need to add tuples to the ``summaryModel(namespace, type, subtypes, name, signature, ext, input, output, kind, provenance)`` extensible predicate by updating a data extension file: .. code-block:: yaml @@ -158,13 +158,11 @@ We need to add tuples to the ``summaryModel``\(namespace, type, subtypes, name, data: - ["boost::asio", "", False, "buffer", "", "", "Argument[*0]", "ReturnValue", "taint", "manual"] -Since we are adding flow through a function, we need to add tuples to the ``summaryModel`` extensible predicate. - The first five values identify the callable (in this case free function) to be modeled as a summary. - The first value ``"boost::asio"`` is the namespace name. -- The second value ``""`` is the name of the type (class) that contains the method. Because we're modelling a free function, the type is left blank. 
-- The third value ``False`` is a flag that indicates whether or not the sink also applies to all overrides of the method. For a free function, this should be ``False``. +- The second value ``""`` is the name of the type (class) that contains the method. Because we're modeling a free function, the type is left blank. +- The third value ``False`` is a flag that indicates whether or not the model also applies to all overrides of the method. For a free function, this should be ``False``. - The fourth value ``"buffer"`` is the function name. - The fifth value is the function input type signature, which can be used to narrow down between functions that have the same name. In this case, we want the model to include all functions in ``boost::asio`` called ``buffer``. @@ -176,6 +174,86 @@ The remaining values are used to define the input and output specifications, the - The ninth value ``"taint"`` is the kind of the flow. ``taint`` means that taint is propagated through the call. - The tenth value ``"manual"`` is the provenance of the summary, which is used to identify the origin of the summary model. +Example: Taint barrier using the ``mysql_real_escape_string`` function +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +This example shows how the CPP query pack models the ``mysql_real_escape_string`` function as a barrier for SQL injection. +This function escapes special characters in a string for use in an SQL statement, which prevents SQL injection attacks. + +.. code-block:: cpp + + char *query = "SELECT * FROM users WHERE name = '%s'"; + char *name = get_untrusted_input(); + char *escaped_name = new char[2 * strlen(name) + 1]; + mysql_real_escape_string(mysql, escaped_name, name, strlen(name)); // The escaped_name is safe for SQL injection. 
+ sprintf(query_buffer, query, escaped_name); + +We need to add a tuple to the ``barrierModel(namespace, type, subtypes, name, signature, ext, output, kind, provenance)`` extensible predicate by updating a data extension file. + +.. code-block:: yaml + + extensions: + - addsTo: + pack: codeql/cpp-all + extensible: barrierModel + data: + - ["", "", False, "mysql_real_escape_string", "", "", "Argument[*1]", "sql-injection", "manual"] + +The first five values identify the callable (in this case a free function) to be modeled as a barrier. + +- The first value ``""`` is the namespace name. +- The second value ``""`` is the name of the type (class) that contains the method. Because we're modeling a free function, the type is left blank. +- The third value ``False`` is a flag that indicates whether or not the model also applies to all overrides of the method. For a free function, this should be ``False``. +- The fourth value ``"mysql_real_escape_string"`` is the function name. +- The fifth value is the function input type signature, which can be used to narrow down between functions that have the same name. + +The sixth value should be left empty and is out of scope for this documentation. +The remaining values are used to define the output specification, the ``kind``, and the ``provenance`` (origin) of the barrier. + +- The seventh value ``"Argument[*1]"`` is the output specification, which means in this case that the barrier is the first indirection (or pointed-to value, ``*``) of the second argument (``Argument[1]``) passed to the function. +- The eighth value ``"sql-injection"`` is the kind of the barrier. The barrier kind is used to define the queries where the barrier is in scope. +- The ninth value ``"manual"`` is the provenance of the barrier, which is used to identify the origin of the barrier model. 
+ +Example: Add a barrier guard +~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +This example shows how to model a barrier guard that stops the flow of taint when a conditional check is performed on data. +A barrier guard model is used when a function returns a boolean that indicates whether the data is safe to use. +Consider a function called ``is_safe`` which returns ``true`` when the data is considered safe. + +.. code-block:: cpp + + if (is_safe(user_input)) { // The check guards the use, so the input is safe. + mysql_query(user_input); // This is safe. + } + +We need to add a tuple to the ``barrierGuardModel(namespace, type, subtypes, name, signature, ext, input, acceptingValue, kind, provenance)`` extensible predicate by updating a data extension file. + +.. code-block:: yaml + + extensions: + - addsTo: + pack: codeql/cpp-all + extensible: barrierGuardModel + data: + - ["", "", False, "is_safe", "", "", "Argument[*0]", "true", "sql-injection", "manual"] + +The first five values identify the callable (in this case a free function) to be modeled as a barrier guard. + +- The first value ``""`` is the namespace name. +- The second value ``""`` is the name of the type (class) that contains the method. Because we're modeling a free function, the type is left blank. +- The third value ``False`` is a flag that indicates whether or not the model also applies to all overrides of the method. For a free function, this should be ``False``. +- The fourth value ``"is_safe"`` is the function name. +- The fifth value is the function input type signature, which can be used to narrow down between functions that have the same name. + +The sixth value should be left empty and is out of scope for this documentation. +The remaining values are used to define the input specification, the ``accepting-value``, the ``kind``, and the ``provenance`` (origin) of the barrier guard. + +- The seventh value ``Argument[*0]`` is the input specification (the value being validated). 
In this case, the first indirection (or pointed-to value, ``*``) of the first argument (``Argument[0]``) passed to the function. +- The eighth value ``true`` is the accepting value of the barrier guard. This is the value that the conditional check must return for the barrier to apply. +- The ninth value ``sql-injection`` is the kind of the barrier guard. The barrier guard kind is used to define the queries where the barrier guard is in scope. +- The tenth value ``manual`` is the provenance of the barrier guard, which is used to identify the origin of the barrier guard. + .. _threat-models-cpp: Threat models diff --git a/docs/codeql/codeql-language-guides/customizing-library-models-for-csharp.rst b/docs/codeql/codeql-language-guides/customizing-library-models-for-csharp.rst index 39b5ee30ee47..a4b0e26d1bc8 100644 --- a/docs/codeql/codeql-language-guides/customizing-library-models-for-csharp.rst +++ b/docs/codeql/codeql-language-guides/customizing-library-models-for-csharp.rst @@ -58,6 +58,8 @@ The CodeQL library for C# analysis exposes the following extensible predicates: - ``sourceModel(namespace, type, subtypes, name, signature, ext, output, kind, provenance)``. This is used to model sources of potentially tainted data. The ``kind`` of the sources defined using this predicate determine which threat model they are associated with. Different threat models can be used to customize the sources used in an analysis. For more information, see ":ref:`Threat models `." - ``sinkModel(namespace, type, subtypes, name, signature, ext, input, kind, provenance)``. This is used to model sinks where tainted data may be used in a way that makes the code vulnerable. - ``summaryModel(namespace, type, subtypes, name, signature, ext, input, output, kind, provenance)``. This is used to model flow through elements. +- ``barrierModel(namespace, type, subtypes, name, signature, ext, output, kind, provenance)``. This is used to model barriers, which are elements that stop the flow of taint. 
+- ``barrierGuardModel(namespace, type, subtypes, name, signature, ext, input, acceptingValue, kind, provenance)``. This is used to model barrier guards, which are elements that can stop the flow of taint depending on a conditional check. - ``neutralModel(namespace, type, name, signature, kind, provenance)``. This is similar to a summary model but used to model the flow of values that have only a minor impact on the dataflow analysis. Manual neutral models (those with a provenance such as ``manual`` or ``ai-manual``) can be used to override generated summary models (those with a provenance such as ``df-generated``), so that the summary model will be ignored. Other than that, neutral models have no effect. The extensible predicates are populated using the models defined in data extension files. @@ -91,19 +93,18 @@ We need to add a tuple to the ``sinkModel``\(namespace, type, subtypes, name, si data: - ["System.Data.SqlClient", "SqlCommand", False, "SqlCommand", "(System.String,System.Data.SqlClient.SqlConnection)", "", "Argument[0]", "sql-injection", "manual"] -Since we want to add a new sink, we need to add a tuple to the ``sinkModel`` extensible predicate. The first five values identify the callable (in this case a method) to be modeled as a sink. - The first value ``System.Data.SqlClient`` is the namespace name. - The second value ``SqlCommand`` is the name of the class (type) that contains the method. -- The third value ``False`` is a flag that indicates whether or not the sink also applies to all overrides of the method. +- The third value ``False`` is a flag that indicates whether or not the model also applies to all overrides of the method. - The fourth value ``SqlCommand`` is the method name. Constructors are named after the class. - The fifth value ``(System.String,System.Data.SqlClient.SqlConnection)`` is the method input type signature. The type names must be fully qualified. The sixth value should be left empty and is out of scope for this documentation. 
-The remaining values are used to define the ``access path``, the ``kind``, and the ``provenance`` (origin) of the sink. +The remaining values are used to define the ``access-path``, the ``kind``, and the ``provenance`` (origin) of the sink. -- The seventh value ``Argument[0]`` is the ``access path`` to the first argument passed to the method, which means that this is the location of the sink. +- The seventh value ``Argument[0]`` is the ``access-path`` to the first argument passed to the method, which means that this is the location of the sink. - The eighth value ``sql-injection`` is the kind of the sink. The sink kind is used to define the queries where the sink is in scope. In this case - the SQL injection queries. - The ninth value ``manual`` is the provenance of the sink, which is used to identify the origin of the sink. @@ -119,7 +120,7 @@ This is the ``GetStream`` method in the ``TcpClient`` class, which is located in ... } -We need to add a tuple to the ``sourceModel``\(namespace, type, subtypes, name, signature, ext, output, kind, provenance) extensible predicate by updating a data extension file. +We need to add a tuple to the ``sourceModel(namespace, type, subtypes, name, signature, ext, output, kind, provenance)`` extensible predicate by updating a data extension file. .. code-block:: yaml @@ -130,18 +131,16 @@ We need to add a tuple to the ``sourceModel``\(namespace, type, subtypes, name, data: - ["System.Net.Sockets", "TcpClient", False, "GetStream", "()", "", "ReturnValue", "remote", "manual"] - -Since we are adding a new source, we need to add a tuple to the ``sourceModel`` extensible predicate. The first five values identify the callable (in this case a method) to be modeled as a source. - The first value ``System.Net.Sockets`` is the namespace name. - The second value ``TcpClient`` is the name of the class (type) that contains the source. 
-- The third value ``False`` is a flag that indicates whether or not the source also applies to all overrides of the method. +- The third value ``False`` is a flag that indicates whether or not the model also applies to all overrides of the method. - The fourth value ``GetStream`` is the method name. - The fifth value ``()`` is the method input type signature. The sixth value should be left empty and is out of scope for this documentation. -The remaining values are used to define the ``access path``, the ``kind``, and the ``provenance`` (origin) of the source. +The remaining values are used to define the ``access-path``, the ``kind``, and the ``provenance`` (origin) of the source. - The seventh value ``ReturnValue`` is the access path to the return of the method, which means that it is the return value that should be considered a source of tainted input. - The eighth value ``remote`` is the kind of the source. The source kind is used to define the threat model where the source is in scope. ``remote`` applies to many of the security related queries as it means a remote source of untrusted data. As an example the SQL injection query uses ``remote`` sources. For more information, see ":ref:`Threat models `." @@ -159,7 +158,7 @@ This pattern covers many of the cases where we need to summarize flow through a ... } -We need to add tuples to the ``summaryModel``\(namespace, type, subtypes, name, signature, ext, input, output, kind, provenance) extensible predicate by updating a data extension file: +We need to add tuples to the ``summaryModel(namespace, type, subtypes, name, signature, ext, input, output, kind, provenance)`` extensible predicate by updating a data extension file: .. 
code-block:: yaml @@ -171,7 +170,6 @@ We need to add tuples to the ``summaryModel``\(namespace, type, subtypes, name, - ["System", "String", False, "Concat", "(System.Object,System.Object)", "", "Argument[0]", "ReturnValue", "taint", "manual"] - ["System", "String", False, "Concat", "(System.Object,System.Object)", "", "Argument[1]", "ReturnValue", "taint", "manual"] -Since we are adding flow through a method, we need to add tuples to the ``summaryModel`` extensible predicate. Each tuple defines flow from one argument to the return value. The first row defines flow from the first argument (``s1`` in the example) to the return value (``t`` in the example) and the second row defines flow from the second argument (``s2`` in the example) to the return value (``t`` in the example). @@ -180,12 +178,12 @@ These are the same for both of the rows above as we are adding two summaries for - The first value ``System`` is the namespace name. - The second value ``String`` is the class (type) name. -- The third value ``False`` is a flag that indicates whether or not the summary also applies to all overrides of the method. +- The third value ``False`` is a flag that indicates whether or not the model also applies to all overrides of the method. - The fourth value ``Concat`` is the method name. - The fifth value ``(System.Object,System.Object)`` is the method input type signature. The sixth value should be left empty and is out of scope for this documentation. -The remaining values are used to define the ``access path``, the ``kind``, and the ``provenance`` (origin) of the summary. +The remaining values are used to define the ``access-path``, the ``kind``, and the ``provenance`` (origin) of the summary. - The seventh value is the access path to the input (where data flows from). ``Argument[0]`` is the access path to the first argument (``s1`` in the example) and ``Argument[1]`` is the access path to the second argument (``s2`` in the example). 
- The eighth value ``ReturnValue`` is the access path to the output (where data flows to), in this case ``ReturnValue``, which means that the input flows to the return value. @@ -216,7 +214,7 @@ This example shows how the C# query pack models flow through a method for a simp ... } -We need to add a tuple to the ``summaryModel``\(namespace, type, subtypes, name, signature, ext, input, output, kind, provenance) extensible predicate by updating a data extension file: +We need to add a tuple to the ``summaryModel(namespace, type, subtypes, name, signature, ext, input, output, kind, provenance)`` extensible predicate by updating a data extension file: .. code-block:: yaml @@ -227,7 +225,6 @@ We need to add a tuple to the ``summaryModel``\(namespace, type, subtypes, name, data: - ["System", "String", False, "Trim", "()", "", "Argument[this]", "ReturnValue", "taint", "manual"] -Since we are adding flow through a method, we need to add tuples to the ``summaryModel`` extensible predicate. Each tuple defines flow from one argument to the return value. The first row defines flow from the qualifier of the method call (``s1`` in the example) to the return value (``t`` in the example). @@ -236,12 +233,12 @@ These are the same for both of the rows above as we are adding two summaries for - The first value ``System`` is the namespace name. - The second value ``String`` is the class (type) name. -- The third value ``False`` is a flag that indicates whether or not the summary also applies to all overrides of the method. +- The third value ``False`` is a flag that indicates whether or not the model also applies to all overrides of the method. - The fourth value ``Trim`` is the method name. - The fifth value ``()`` is the method input type signature. The sixth value should be left empty and is out of scope for this documentation. -The remaining values are used to define the ``access path``, the ``kind``, and the ``provenance`` (origin) of the summary. 
+The remaining values are used to define the ``access-path``, the ``kind``, and the ``provenance`` (origin) of the summary. - The seventh value is the access path to the input (where data flows from). ``Argument[this]`` is the access path to the qualifier (``s`` in the example). - The eighth value ``ReturnValue`` is the access path to the output (where data flows to), in this case ``ReturnValue``, which means that the input flows to the return value. @@ -260,7 +257,7 @@ Here we model flow through higher order methods and collection types, as well as ... } -We need to add tuples to the ``summaryModel``\(namespace, type, subtypes, name, signature, ext, input, output, kind, provenance) extensible predicate by updating a data extension file: +We need to add tuples to the ``summaryModel(namespace, type, subtypes, name, signature, ext, input, output, kind, provenance)`` extensible predicate by updating a data extension file: .. code-block:: yaml @@ -272,20 +269,18 @@ We need to add tuples to the ``summaryModel``\(namespace, type, subtypes, name, - ["System.Linq", "Enumerable", False, "Select", "(System.Collections.Generic.IEnumerable,System.Func)", "", "Argument[0].Element", "Argument[1].Parameter[0]", "value", "manual"] - ["System.Linq", "Enumerable", False, "Select", "(System.Collections.Generic.IEnumerable,System.Func)", "", "Argument[1].ReturnValue", "ReturnValue.Element", "value", "manual"] - -Since we are adding flow through a method, we need to add tuples to the ``summaryModel`` extensible predicate. Each tuple defines part of the flow that comprises the total flow through the ``Select`` method. The first five values identify the callable (in this case a method) to be modeled as a summary. These are the same for both of the rows above as we are adding two summaries for the same method. - The first value ``System.Linq`` is the namespace name. - The second value ``Enumerable`` is the class (type) name. 
-- The third value ``False`` is a flag that indicates whether or not the summary also applies to all overrides of the method. +- The third value ``False`` is a flag that indicates whether or not the model also applies to all overrides of the method. - The fourth value ``Select`` is the method name, along with the type parameters for the method. The names of the generic type parameters provided in the model must match the names of the generic type parameters in the method signature in the source code. - The fifth value ``(System.Collections.Generic.IEnumerable,System.Func)`` is the method input type signature. The generics in the signature must match the generics in the method signature in the source code. The sixth value should be left empty and is out of scope for this documentation. -The remaining values are used to define the ``access path``, the ``kind``, and the ``provenance`` (origin) of the summary definition. +The remaining values are used to define the ``access-path``, the ``kind``, and the ``provenance`` (origin) of the summary definition. - The seventh value is the access path to the ``input`` (where data flows from). - The eighth value is the access path to the ``output`` (where data flows to). @@ -307,6 +302,88 @@ For the remaining values for both rows: That is, the first row specifies that values can flow from the elements of the qualifier enumerable into the first argument of the function provided to ``Select``. The second row specifies that values can flow from the return value of the function to the elements of the enumerable returned from ``Select``. +Example: Add a barrier for the ``RawUrl`` property +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +This example shows how we can model a property as a barrier for a specific kind of query. +A barrier model is used to define that the flow of taint stops at the modeled element for the specified kind of query. 
+Here we model the getter of the ``RawUrl`` property of the ``HttpRequest`` class as a barrier for URL redirection queries. +The ``RawUrl`` property returns the raw URL of the current request, which is considered safe for URL redirects because it is the URL of the current request and cannot be manipulated by an attacker. + +.. code-block:: csharp + + public static void TaintBarrier(HttpRequest request) { + string url = request.RawUrl; // The return value of this property is considered safe for URL redirects. + Response.Redirect(url); // This is not a URL redirection vulnerability. + } + +We need to add a tuple to the ``barrierModel(namespace, type, subtypes, name, signature, ext, output, kind, provenance)`` extensible predicate by updating a data extension file. + +.. code-block:: yaml + + extensions: + - addsTo: + pack: codeql/csharp-all + extensible: barrierModel + data: + - ["System.Web", "HttpRequest", False, "get_RawUrl", "()", "", "ReturnValue", "url-redirection", "manual"] + +The first five values identify the callable (in this case the getter of a property) to be modeled as a barrier. + +- The first value ``System.Web`` is the namespace name. +- The second value ``HttpRequest`` is the class (type) name. +- The third value ``False`` is a flag that indicates whether or not the model also applies to all overrides of the method. +- The fourth value ``get_RawUrl`` is the method name. Getter and setter methods are named ``get_`` and ``set_`` respectively. +- The fifth value ``()`` is the method input type signature. + +The sixth value should be left empty and is out of scope for this documentation. +The remaining values are used to define the ``access-path``, the ``kind``, and the ``provenance`` (origin) of the barrier. + +- The seventh value ``ReturnValue`` is the access path to the return value of the property getter, which means that the return value is considered safe. +- The eighth value ``url-redirection`` is the kind of the barrier. 
The barrier kind is used to define the queries where the barrier is in scope. In this case - the URL redirection queries. +- The ninth value ``manual`` is the provenance of the barrier, which is used to identify the origin of the barrier. + +Example: Add a barrier guard for the ``IsAbsoluteUri`` property +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +This example shows how we can model a property as a barrier guard for a specific kind of query. +A barrier guard model is used to stop the flow of taint when a conditional check is performed on data. +Here we model the getter of the ``IsAbsoluteUri`` property of the ``Uri`` class as a barrier guard for URL redirection queries. +When the ``IsAbsoluteUri`` property returns ``false``, the URL is relative and therefore safe for URL redirects because it cannot redirect to an external site controlled by an attacker. + +.. code-block:: csharp + + public static void TaintBarrierGuard(Uri uri) { + if (!uri.IsAbsoluteUri) { // The check guards the redirect, so the URL is safe. + Response.Redirect(uri.ToString()); // This is not a URL redirection vulnerability. + } + } + +We need to add a tuple to the ``barrierGuardModel(namespace, type, subtypes, name, signature, ext, input, acceptingValue, kind, provenance)`` extensible predicate by updating a data extension file. + +.. code-block:: yaml + + extensions: + - addsTo: + pack: codeql/csharp-all + extensible: barrierGuardModel + data: + - ["System", "Uri", False, "get_IsAbsoluteUri", "()", "", "Argument[this]", "false", "url-redirection", "manual"] + +The first five values identify the callable (in this case the getter of a property) to be modeled as a barrier guard. + +- The first value ``System`` is the namespace name. +- The second value ``Uri`` is the class (type) name. +- The third value ``False`` is a flag that indicates whether or not the model also applies to all overrides of the method. +- The fourth value ``get_IsAbsoluteUri`` is the method name.
Getter and setter methods are named ``get_`` and ``set_`` respectively. +- The fifth value ``()`` is the method input type signature. + +The sixth value should be left empty and is out of scope for this documentation. +The remaining values are used to define the ``access-path``, the ``accepting-value``, the ``kind``, and the ``provenance`` (origin) of the barrier guard. + +- The seventh value ``Argument[this]`` is the access path to the input whose flow is blocked. In this case, the qualifier of the property access (``uri`` in the example). +- The eighth value ``false`` is the accepting value of the barrier guard. This is the value that the conditional check must return for the barrier to apply. In this case, when ``IsAbsoluteUri`` is ``false``, the URL is relative and considered safe. +- The ninth value ``url-redirection`` is the kind of the barrier guard. The barrier guard kind is used to define the queries where the barrier guard is in scope. In this case - the URL redirection queries. +- The tenth value ``manual`` is the provenance of the barrier guard, which is used to identify the origin of the barrier guard. + Example: Add a ``neutral`` method ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ This example shows how we can model a method as being neutral with respect to flow. We will also cover how to model a property by modeling the getter of the ``Now`` property of the ``DateTime`` class as neutral. @@ -319,7 +396,7 @@ A neutral model is used to define that there is no flow through a method. ... } -We need to add a tuple to the ``neutralModel``\(namespace, type, name, signature, kind, provenance) extensible predicate by updating a data extension file. +We need to add a tuple to the ``neutralModel(namespace, type, name, signature, kind, provenance)`` extensible predicate by updating a data extension file. .. 
code-block:: yaml @@ -330,8 +407,6 @@ We need to add a tuple to the ``neutralModel``\(namespace, type, name, signature data: - ["System", "DateTime", "get_Now", "()", "summary", "manual"] - -Since we are adding a neutral model, we need to add tuples to the ``neutralModel`` extensible predicate. The first four values identify the callable (in this case the getter of the ``Now`` property) to be modeled as a neutral, the fifth value is the kind, and the sixth value is the provenance (origin) of the neutral. - The first value ``System`` is the namespace name. diff --git a/docs/codeql/codeql-language-guides/customizing-library-models-for-go.rst b/docs/codeql/codeql-language-guides/customizing-library-models-for-go.rst index c5b74ccd73ae..2eb9446459f4 100644 --- a/docs/codeql/codeql-language-guides/customizing-library-models-for-go.rst +++ b/docs/codeql/codeql-language-guides/customizing-library-models-for-go.rst @@ -58,6 +58,8 @@ The CodeQL library for Go analysis exposes the following extensible predicates: - ``sourceModel(package, type, subtypes, name, signature, ext, output, kind, provenance)``. This is used to model sources of potentially tainted data. The ``kind`` of the sources defined using this predicate determine which threat model they are associated with. Different threat models can be used to customize the sources used in an analysis. For more information, see ":ref:`Threat models `." - ``sinkModel(package, type, subtypes, name, signature, ext, input, kind, provenance)``. This is used to model sinks where tainted data may be used in a way that makes the code vulnerable. - ``summaryModel(package, type, subtypes, name, signature, ext, input, output, kind, provenance)``. This is used to model flow through elements. +- ``barrierModel(package, type, subtypes, name, signature, ext, output, kind, provenance)``. This is used to model barriers, which are elements that stop the flow of taint. 
+- ``barrierGuardModel(package, type, subtypes, name, signature, ext, input, acceptingValue, kind, provenance)``. This is used to model barrier guards, which are elements that can stop the flow of taint depending on a conditional check. - ``neutralModel(package, type, name, signature, kind, provenance)``. This is similar to a summary model but used to model the flow of values that have only a minor impact on the dataflow analysis. Manual neutral models (those with a provenance such as ``manual`` or ``ai-manual``) can be used to override generated summary models (those with a provenance such as ``df-generated``), so that the summary model will be ignored. Other than that, neutral models have no effect. The extensible predicates are populated using the models defined in data extension files. @@ -91,19 +93,18 @@ We need to add a tuple to the ``sinkModel``\(package, type, subtypes, name, sign data: - ["database/sql", "DB", True, "Prepare", "", "", "Argument[0]", "sql-injection", "manual"] -Since we want to add a new sink, we need to add a tuple to the ``sinkModel`` extensible predicate. The first five values identify the function (in this case a method) to be modeled as a sink. - The first value ``database/sql`` is the package name. - The second value ``DB`` is the name of the type that the method is associated with. -- The third value ``True`` is a flag that indicates whether or not the sink also applies to subtypes. This includes when the subtype embeds the given type, so that the method or field is promoted to be a method or field of the subtype. For interface methods it also includes types which implement the interface type. +- The third value ``True`` is a flag that indicates whether or not the model also applies to subtypes. This includes when the subtype embeds the given type, so that the method or field is promoted to be a method or field of the subtype. For interface methods it also includes types which implement the interface type. 
- The fourth value ``Prepare`` is the method name. - The fifth value ``""`` is the input type signature. For Go it should always be an empty string. It is needed for other languages where multiple functions may have the same name and they need to be distinguished by the number and types of the arguments. The sixth value should be left empty and is out of scope for this documentation. -The remaining values are used to define the ``access path``, the ``kind``, and the ``provenance`` (origin) of the sink. +The remaining values are used to define the ``access-path``, the ``kind``, and the ``provenance`` (origin) of the sink. -- The seventh value ``Argument[0]`` is the ``access path`` to the first argument passed to the method, which means that this is the location of the sink. +- The seventh value ``Argument[0]`` is the ``access-path`` to the first argument passed to the method, which means that this is the location of the sink. - The eighth value ``sql-injection`` is the kind of the sink. The sink kind is used to define the queries where the sink is in scope. In this case - the SQL injection queries. - The ninth value ``manual`` is the provenance of the sink, which is used to identify the origin of the sink. @@ -120,7 +121,7 @@ This is the ``FormValue`` method of the ``Request`` type which is located in the } -We need to add a tuple to the ``sourceModel``\(package, type, subtypes, name, signature, ext, output, kind, provenance) extensible predicate by updating a data extension file. +We need to add a tuple to the ``sourceModel(package, type, subtypes, name, signature, ext, output, kind, provenance)`` extensible predicate by updating a data extension file. .. 
code-block:: yaml @@ -131,18 +132,16 @@ We need to add a tuple to the ``sourceModel``\(package, type, subtypes, name, si data: - ["net/http", "Request", True, "FormValue", "", "", "ReturnValue", "remote", "manual"] - -Since we are adding a new source, we need to add a tuple to the ``sourceModel`` extensible predicate. The first five values identify the function to be modeled as a source. - The first value ``net/http`` is the package name. - The second value ``Request`` is the type name, since the function is a method of the ``Request`` type. -- The third value ``True`` is a flag that indicates whether or not the sink also applies to subtypes. This includes when the subtype embeds the given type, so that the method or field is promoted to be a method or field of the subtype. For interface methods it also includes types which implement the interface type. +- The third value ``True`` is a flag that indicates whether or not the model also applies to subtypes. This includes when the subtype embeds the given type, so that the method or field is promoted to be a method or field of the subtype. For interface methods it also includes types which implement the interface type. - The fourth value ``FormValue`` is the function name. - The fifth value ``""`` is the input type signature. For Go it should always be an empty string. It is needed for other languages where multiple functions may have the same name and they need to be distinguished by the number and types of the arguments. The sixth value should be left empty and is out of scope for this documentation. -The remaining values are used to define the ``access path``, the ``kind``, and the ``provenance`` (origin) of the source. +The remaining values are used to define the ``access-path``, the ``kind``, and the ``provenance`` (origin) of the source. - The seventh value ``ReturnValue`` is the access path to the return of the method, which means that it is the return value that should be considered a source of tainted input. 
- The eighth value ``remote`` is the kind of the source. The source kind is used to define the threat model where the source is in scope. ``remote`` applies to many of the security related queries as it means a remote source of untrusted data. As an example the SQL injection query uses ``remote`` sources. For more information, see ":ref:`Threat models `." @@ -162,7 +161,7 @@ This pattern covers many of the cases where we need to summarize flow through a ... } -We need to add a tuple to the ``summaryModel``\(package, type, subtypes, name, signature, ext, input, output, kind, provenance) extensible predicate by updating a data extension file: +We need to add a tuple to the ``summaryModel(package, type, subtypes, name, signature, ext, input, output, kind, provenance)`` extensible predicate by updating a data extension file: .. code-block:: yaml @@ -171,21 +170,20 @@ We need to add a tuple to the ``summaryModel``\(package, type, subtypes, name, s pack: codeql/go-all extensible: summaryModel data: - - ["slices", "", False, "Max", "", "", "Argument[0].ArrayElement", "ReturnValue", "value", "manual"] + - ["slices", "", False, "Max", "", "", "Argument[0].ArrayElement", "ReturnValue", "value", "manual"] -Since we are adding flow through a method, we need to add tuples to the ``summaryModel`` extensible predicate. The first row defines flow from the first argument (``a`` in the example) to the return value (``max`` in the example). The first five values identify the function to be modeled as a summary. - The first value ``slices`` is the package name. - The second value ``""`` is left blank, since the function is not a method of a type. -- The third value ``False`` is a flag that indicates whether or not the sink also applies to subtypes. This has no effect for non-method functions. +- The third value ``False`` is a flag that indicates whether or not the model also applies to subtypes. This has no effect for non-method functions. 
- The fourth value ``Max`` is the function name. - The fifth value ``""`` is the input type signature. For Go it should always be an empty string. It is needed for other languages where multiple functions may have the same name and they need to be distinguished by the number and types of the arguments. The sixth value should be left empty and is out of scope for this documentation. -The remaining values are used to define the ``access path``, the ``kind``, and the ``provenance`` (origin) of the summary. +The remaining values are used to define the ``access-path``, the ``kind``, and the ``provenance`` (origin) of the summary. - The seventh value is the access path to the input (where data flows from). ``Argument[0].ArrayElement`` is the access path to the array elements of the first argument (the elements of the slice in the example). - The eighth value ``ReturnValue`` is the access path to the output (where data flows to), in this case ``ReturnValue``, which means that the input flows to the return value. @@ -207,7 +205,7 @@ This pattern covers many of the cases where we need to summarize flow through a ... } -We need to add a tuple to the ``summaryModel``\(package, type, subtypes, name, signature, ext, input, output, kind, provenance) extensible predicate by updating a data extension file: +We need to add a tuple to the ``summaryModel(package, type, subtypes, name, signature, ext, input, output, kind, provenance)`` extensible predicate by updating a data extension file: .. code-block:: yaml @@ -218,19 +216,18 @@ We need to add a tuple to the ``summaryModel``\(package, type, subtypes, name, s data: - ["slices", "", False, "Concat", "", "", "Argument[0].ArrayElement.ArrayElement", "ReturnValue.ArrayElement", "value", "manual"] -Since we are adding flow through a method, we need to add tuples to the ``summaryModel`` extensible predicate. The first row defines flow from the arguments (``a`` and ``b`` in the example) to the return value (``c`` in the example). 
The first five values identify the function to be modeled as a summary. - The first value ``slices`` is the package name. - The second value ``""`` is left blank, since the function is not a method of a type. -- The third value ``False`` is a flag that indicates whether or not the sink also applies to subtypes. This has no effect for non-method functions. +- The third value ``False`` is a flag that indicates whether or not the model also applies to subtypes. This has no effect for non-method functions. - The fourth value ``Max`` is the function name. - The fifth value ``""`` is the input type signature. For Go it should always be an empty string. It is needed for other languages where multiple functions may have the same name and they need to be distinguished by the number and types of the arguments. The sixth value should be left empty and is out of scope for this documentation. -The remaining values are used to define the ``access path``, the ``kind``, and the ``provenance`` (origin) of the summary. +The remaining values are used to define the ``access-path``, the ``kind``, and the ``provenance`` (origin) of the summary. - The seventh value is the access path to the input (where data flows from). ``Argument[0].ArrayElement.ArrayElement`` is the access path to the array elements of the array elements of the first argument. Note that a variadic parameter of type `...T` is treated as if it has type `[]T` and arguments corresponding to the variadic parameter are accessed as elements of this slice. - The eighth value ``ReturnValue.ArrayElement`` is the access path to the output (where data flows to), in this case ``ReturnValue.ArrayElement``, which means that the input flows to the array elements of the return value. @@ -251,7 +248,7 @@ This pattern covers many of the cases where we need to summarize flow through a ... 
} -We need to add tuples to the ``summaryModel``\(package, type, subtypes, name, signature, ext, input, output, kind, provenance) extensible predicate by updating a data extension file: +We need to add tuples to the ``summaryModel(package, type, subtypes, name, signature, ext, input, output, kind, provenance)`` extensible predicate by updating a data extension file: .. code-block:: yaml @@ -263,7 +260,6 @@ We need to add tuples to the ``summaryModel``\(package, type, subtypes, name, si - ["strings", "", False, "Join", "", "", "Argument[0]", "ReturnValue", "taint", "manual"] - ["strings", "", False, "Join", "", "", "Argument[1]", "ReturnValue", "taint", "manual"] -Since we are adding flow through a method, we need to add tuples to the ``summaryModel`` extensible predicate. Each tuple defines flow from one argument to the return value. The first row defines flow from the first argument (``elems`` in the example) to the return value (``t`` in the example) and the second row defines flow from the second argument (``sep`` in the example) to the return value (``t`` in the example). @@ -272,12 +268,12 @@ These are the same for both of the rows above as we are adding two summaries for - The first value ``strings`` is the package name. - The second value ``""`` is left blank, since the function is not a method of a type. -- The third value ``False`` is a flag that indicates whether or not the sink also applies to subtypes. This has no effect for non-method functions. +- The third value ``False`` is a flag that indicates whether or not the model also applies to subtypes. This has no effect for non-method functions. - The fourth value ``Join`` is the function name. - The fifth value ``""`` is the input type signature. For Go it should always be an empty string. It is needed for other languages where multiple functions may have the same name and they need to be distinguished by the number and types of the arguments. 
The sixth value should be left empty and is out of scope for this documentation. -The remaining values are used to define the ``access path``, the ``kind``, and the ``provenance`` (origin) of the summary. +The remaining values are used to define the ``access-path``, the ``kind``, and the ``provenance`` (origin) of the summary. - The seventh value is the access path to the input (where data flows from). ``Argument[0]`` is the access path to the first argument (``elems`` in the example) and ``Argument[1]`` is the access path to the second argument (``sep`` in the example). - The eighth value ``ReturnValue`` is the access path to the output (where data flows to), in this case ``ReturnValue``, which means that the input flows to the return value. @@ -307,8 +303,8 @@ This example shows how the Go query pack models flow through a method for a simp host := u.Hostname() // There is taint flow from u to host. ... } - -We need to add a tuple to the ``summaryModel``\(package, type, subtypes, name, signature, ext, input, output, kind, provenance) extensible predicate by updating a data extension file: + +We need to add a tuple to the ``summaryModel(package, type, subtypes, name, signature, ext, input, output, kind, provenance)`` extensible predicate by updating a data extension file: .. code-block:: yaml @@ -319,7 +315,6 @@ We need to add a tuple to the ``summaryModel``\(package, type, subtypes, name, s data: - ["net/url", "URL", True, "Hostname", "", "", "Argument[receiver]", "ReturnValue", "taint", "manual"] -Since we are adding flow through a method, we need to add tuples to the ``summaryModel`` extensible predicate. Each tuple defines flow from one argument to the return value. The first row defines flow from the qualifier of the method call (``u`` in the example) to the return value (``host`` in the example). @@ -327,18 +322,98 @@ The first five values identify the function (in this case a method) to be modele - The first value ``net/url`` is the package name. 
- The second value ``URL`` is the receiver type. -- The third value ``True`` is a flag that indicates whether or not the sink also applies to subtypes. This includes when the subtype embeds the given type, so that the method or field is promoted to be a method or field of the subtype. For interface methods it also includes types which implement the interface type. +- The third value ``True`` is a flag that indicates whether or not the model also applies to subtypes. This includes when the subtype embeds the given type, so that the method or field is promoted to be a method or field of the subtype. For interface methods it also includes types which implement the interface type. - The fourth value ``Hostname`` is the method name. - The fifth value ``""`` is the input type signature. For Go it should always be an empty string. It is needed for other languages where multiple functions may have the same name and they need to be distinguished by the number and types of the arguments. The sixth value should be left empty and is out of scope for this documentation. -The remaining values are used to define the ``access path``, the ``kind``, and the ``provenance`` (origin) of the summary. +The remaining values are used to define the ``access-path``, the ``kind``, and the ``provenance`` (origin) of the summary. - The seventh value is the access path to the input (where data flows from). ``Argument[receiver]`` is the access path to the receiver (``u`` in the example). - The eighth value ``ReturnValue`` is the access path to the output (where data flows to), in this case ``ReturnValue``, which means that the input flows to the return value. When there are multiple return values, use ``ReturnValue[i]`` to refer to the ``i`` th return value (starting from 0). - The ninth value ``taint`` is the kind of the flow. ``taint`` means that taint is propagated through the call. 
- The tenth value ``manual`` is the provenance of the summary, which is used to identify the origin of the summary. +Example: Add a barrier using the ``Htmlquote`` function +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +This example shows how the Go query pack models a barrier that stops the flow of taint. +The ``Htmlquote`` function from the ``beego`` framework HTML-escapes a string, which prevents HTML injection attacks. + +.. code-block:: go + + func Render(w http.ResponseWriter, r *http.Request) { + name := r.FormValue("name") + safe := beego.Htmlquote(name) // The return value of this function is safe to use in HTML. + ... + } + +We need to add a tuple to the ``barrierModel(package, type, subtypes, name, signature, ext, output, kind, provenance)`` extensible predicate by updating a data extension file. + +.. code-block:: yaml + + extensions: + - addsTo: + pack: codeql/go-all + extensible: barrierModel + data: + - ["group:beego", "", True, "Htmlquote", "", "", "ReturnValue", "html-injection", "manual"] + +The first five values identify the function to be modeled as a barrier. + +- The first value ``group:beego`` is the package group name. The ``group:`` prefix indicates that this is a package group, which is used to match multiple package paths that refer to the same package. +- The second value ``""`` is left blank since the function is not a method of a type. +- The third value ``True`` is a flag that indicates whether or not the model also applies to subtypes. This has no effect for non-method functions. +- The fourth value ``Htmlquote`` is the function name. +- The fifth value ``""`` is the input type signature. For Go it should always be an empty string. + +The sixth value should be left empty and is out of scope for this documentation. +The remaining values are used to define the ``access-path``, the ``kind``, and the ``provenance`` (origin) of the barrier.
+ +- The seventh value ``ReturnValue`` is the access path to the output of the barrier, which means that the return value is considered sanitized. +- The eighth value ``html-injection`` is the kind of the barrier. The barrier kind must match the kind used in the query where the barrier should take effect. In this case, it matches the ``html-injection`` sink kind used by XSS queries. +- The ninth value ``manual`` is the provenance of the barrier, which is used to identify the origin of the barrier. + +Example: Add a barrier guard +~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +This example shows how to model a barrier guard that stops the flow of taint when a conditional check is performed on data. +A barrier guard model is used when a function returns a boolean that indicates whether the data is safe to use. +Consider a function called ``IsSafe`` which returns ``true`` when the data is considered safe to use in a SQL query. + +.. code-block:: go + + func Query(db *sql.DB, input string) { + if example.IsSafe(input) { // The check guards the query, so the input is safe. + db.Query(input) // This is not a SQL injection vulnerability. + } + } + +We need to add a tuple to the ``barrierGuardModel(package, type, subtypes, name, signature, ext, input, acceptingValue, kind, provenance)`` extensible predicate by updating a data extension file. + +.. code-block:: yaml + + extensions: + - addsTo: + pack: codeql/go-all + extensible: barrierGuardModel + data: + - ["example.com/example", "", False, "IsSafe", "", "", "Argument[0]", "true", "sql-injection", "manual"] + +The first five values identify the function to be modeled as a barrier guard. + +- The first value ``example.com/example`` is the package name. +- The second value ``""`` is left blank since the function is not a method of a type. +- The third value ``False`` is a flag that indicates whether or not the model also applies to subtypes. This has no effect for non-method functions. +- The fourth value ``IsSafe`` is the function name.
+- The fifth value ``""`` is the input type signature. For Go it should always be an empty string. + +The sixth value should be left empty and is out of scope for this documentation. +The remaining values are used to define the ``access-path``, the ``accepting-value``, the ``kind``, and the ``provenance`` (origin) of the barrier guard. + +- The seventh value ``Argument[0]`` is the access path to the input whose flow is blocked. In this case, the first argument to the function (``input`` in the example). +- The eighth value ``true`` is the accepting value of the barrier guard. This is the value that the conditional check must return for the barrier to apply. In this case, when ``IsSafe`` returns ``true``, the input is considered safe. +- The ninth value ``sql-injection`` is the kind of the barrier guard. The barrier guard kind is used to define the queries where the barrier guard is in scope. In this case - the SQL injection queries. +- The tenth value ``manual`` is the provenance of the barrier guard, which is used to identify the origin of the barrier guard. + Example: Accessing the ``Body`` field of an HTTP request ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ This example shows how we can model a field read as a source of tainted data. @@ -350,7 +425,7 @@ This example shows how we can model a field read as a source of tainted data. ... } -We need to add a tuple to the ``sourceModel``\(package, type, subtypes, name, signature, ext, output, kind, provenance) extensible predicate by updating a data extension file. +We need to add a tuple to the ``sourceModel(package, type, subtypes, name, signature, ext, output, kind, provenance)`` extensible predicate by updating a data extension file. .. 
code-block:: yaml @@ -361,17 +436,16 @@ We need to add a tuple to the ``sourceModel``\(package, type, subtypes, name, si data: - ["net/http", "Request", True, "Body", "", "", "", "remote", "manual"] -Since we are adding a new source, we need to add a tuple to the ``sourceModel`` extensible predicate. The first five values identify the field to be modeled as a source. - The first value ``net/http`` is the package name. - The second value ``Request`` is the name of the type that the field is associated with. -- The third value ``True`` is a flag that indicates whether or not the sink also applies to subtypes. For fields this means when the field is accessed as a promoted field in another type. +- The third value ``True`` is a flag that indicates whether or not the model also applies to subtypes. For fields this means when the field is accessed as a promoted field in another type. - The fourth value ``Body`` is the field name. - The fifth value ``""`` is the input type signature. For Go it should always be an empty string. It is needed for other languages where multiple functions may have the same name and they need to be distinguished by the number and types of the arguments. The sixth value should be left empty and is out of scope for this documentation. -The remaining values are used to define the ``access path``, the ``kind``, and the ``provenance`` (origin) of the source. +The remaining values are used to define the ``access-path``, the ``kind``, and the ``provenance`` (origin) of the source. - The seventh value ``""`` is left blank. Leaving the access path of a source model blank indicates that it is a field access. - The eighth value ``remote`` is the source kind. This indicates that the source is a remote source of untrusted data. 
@@ -387,7 +461,7 @@ Note that packages hosted at ``gopkg.in`` use a slightly different syntax: the m To write models that only apply to ``github.com/couchbase/gocb/v2``, it is sufficient to include the major version suffix (``/v2``) in the package column. To write models that only apply to ``github.com/couchbase/gocb``, you may prefix the package column with ``fixed-version:``. For example, here are two models for a method that has changed name from v1 to v2. .. code-block:: yaml - + extensions: - addsTo: pack: codeql/go-all diff --git a/docs/codeql/codeql-language-guides/customizing-library-models-for-java-and-kotlin.rst b/docs/codeql/codeql-language-guides/customizing-library-models-for-java-and-kotlin.rst index 7f0a41b3040e..203213b94255 100644 --- a/docs/codeql/codeql-language-guides/customizing-library-models-for-java-and-kotlin.rst +++ b/docs/codeql/codeql-language-guides/customizing-library-models-for-java-and-kotlin.rst @@ -63,6 +63,8 @@ The CodeQL library for Java and Kotlin analysis exposes the following extensible - ``sourceModel(package, type, subtypes, name, signature, ext, output, kind, provenance)``. This is used to model sources of potentially tainted data. The ``kind`` of the sources defined using this predicate determine which threat model they are associated with. Different threat models can be used to customize the sources used in an analysis. For more information, see ":ref:`Threat models `." - ``sinkModel(package, type, subtypes, name, signature, ext, input, kind, provenance)``. This is used to model sinks where tainted data maybe used in a way that makes the code vulnerable. - ``summaryModel(package, type, subtypes, name, signature, ext, input, output, kind, provenance)``. This is used to model flow through elements. +- ``barrierModel(package, type, subtypes, name, signature, ext, output, kind, provenance)``. This is used to model barriers, which are elements that stop the flow of taint.
+- ``barrierGuardModel(namespace, type, subtypes, name, signature, ext, input, acceptingValue, kind, provenance)``. This is used to model barrier guards, which are elements that can stop the flow of taint depending on a conditional check. - ``neutralModel(package, type, name, signature, kind, provenance)``. This is similar to a summary model but used to model the flow of values that have only a minor impact on the dataflow analysis. Manual neutral models (those with a provenance such as ``manual`` or ``ai-manual``) override generated summary models (those with a provenance such as ``df-generated``) so that the summary will be ignored. Other than that, neutral models have a slight impact on the dataflow dispatch logic, which is out of scope for this documentation. The extensible predicates are populated using the models defined in data extension files. @@ -85,7 +87,7 @@ This is the ``execute`` method in the ``Statement`` class, which is located in t stmt.execute(query); // The argument to this method is a SQL injection sink. } -We need to add a tuple to the ``sinkModel``\(package, type, subtypes, name, signature, ext, input, kind, provenance) extensible predicate by updating a data extension file. +We need to add a tuple to the ``sinkModel(package, type, subtypes, name, signature, ext, input, kind, provenance)`` extensible predicate by updating a data extension file. .. code-block:: yaml @@ -96,20 +98,18 @@ We need to add a tuple to the ``sinkModel``\(package, type, subtypes, name, sign data: - ["java.sql", "Statement", True, "execute", "(String)", "", "Argument[0]", "sql-injection", "manual"] - -Since we want to add a new sink, we need to add a tuple to the ``sinkModel`` extensible predicate. The first five values identify the callable (in this case a method) to be modeled as a sink. - The first value ``java.sql`` is the package name. - The second value ``Statement`` is the name of the class (type) that contains the method. 
-- The third value ``True`` is a flag that indicates whether or not the sink also applies to all overrides of the method. +- The third value ``True`` is a flag that indicates whether or not the model also applies to all overrides of the method. - The fourth value ``execute`` is the method name. - The fifth value ``(String)`` is the method input type signature. The sixth value should be left empty and is out of scope for this documentation. -The remaining values are used to define the ``access path``, the ``kind``, and the ``provenance`` (origin) of the sink. +The remaining values are used to define the ``access-path``, the ``kind``, and the ``provenance`` (origin) of the sink. -- The seventh value ``Argument[0]`` is the ``access path`` to the first argument passed to the method, which means that this is the location of the sink. +- The seventh value ``Argument[0]`` is the ``access-path`` to the first argument passed to the method, which means that this is the location of the sink. - The eighth value ``sql-injection`` is the kind of the sink. The sink kind is used to define the queries where the sink is in scope. In this case - the SQL injection queries. - The ninth value ``manual`` is the provenance of the sink, which is used to identify the origin of the sink. @@ -125,7 +125,7 @@ This is the ``getInputStream`` method in the ``Socket`` class, which is located ... } -We need to add a tuple to the ``sourceModel``\(package, type, subtypes, name, signature, ext, output, kind, provenance) extensible predicate by updating a data extension file. +We need to add a tuple to the ``sourceModel(package, type, subtypes, name, signature, ext, output, kind, provenance)`` extensible predicate by updating a data extension file. .. 
code-block:: yaml @@ -136,18 +136,16 @@ We need to add a tuple to the ``sourceModel``\(package, type, subtypes, name, si data: - ["java.net", "Socket", False, "getInputStream", "()", "", "ReturnValue", "remote", "manual"] - -Since we are adding a new source, we need to add a tuple to the ``sourceModel`` extensible predicate. The first five values identify the callable (in this case a method) to be modeled as a source. - The first value ``java.net`` is the package name. - The second value ``Socket`` is the name of the class (type) that contains the source. -- The third value ``False`` is a flag that indicates whether or not the source also applies to all overrides of the method. +- The third value ``False`` is a flag that indicates whether or not the model also applies to all overrides of the method. - The fourth value ``getInputStream`` is the method name. - The fifth value ``()`` is the method input type signature. The sixth value should be left empty and is out of scope for this documentation. -The remaining values are used to define the ``access path``, the ``kind``, and the ``provenance`` (origin) of the source. +The remaining values are used to define the ``access-path``, the ``kind``, and the ``provenance`` (origin) of the source. - The seventh value ``ReturnValue`` is the access path to the return of the method, which means that it is the return value that should be considered a source of tainted input. - The eighth value ``remote`` is the kind of the source. The source kind is used to define the threat model where the source is in scope. ``remote`` applies to many of the security related queries as it means a remote source of untrusted data. As an example the SQL injection query uses ``remote`` sources. For more information, see ":ref:`Threat models `." @@ -165,7 +163,7 @@ This pattern covers many of the cases where we need to summarize flow through a ... 
} -We need to add tuples to the ``summaryModel``\(package, type, subtypes, name, signature, ext, input, output, kind, provenance) extensible predicate by updating a data extension file: +We need to add tuples to the ``summaryModel(package, type, subtypes, name, signature, ext, input, output, kind, provenance)`` extensible predicate by updating a data extension file: .. code-block:: yaml @@ -177,7 +175,6 @@ We need to add tuples to the ``summaryModel``\(package, type, subtypes, name, si - ["java.lang", "String", False, "concat", "(String)", "", "Argument[this]", "ReturnValue", "taint", "manual"] - ["java.lang", "String", False, "concat", "(String)", "", "Argument[0]", "ReturnValue", "taint", "manual"] -Since we are adding flow through a method, we need to add tuples to the ``summaryModel`` extensible predicate. Each tuple defines flow from one argument to the return value. The first row defines flow from the qualifier (``s1`` in the example) to the return value (``t`` in the example) and the second row defines flow from the first argument (``s2`` in the example) to the return value (``t`` in the example). @@ -186,12 +183,12 @@ These are the same for both of the rows above as we are adding two summaries for - The first value ``java.lang`` is the package name. - The second value ``String`` is the class (type) name. -- The third value ``False`` is a flag that indicates whether or not the summary also applies to all overrides of the method. +- The third value ``False`` is a flag that indicates whether or not the model also applies to all overrides of the method. - The fourth value ``concat`` is the method name. - The fifth value ``(String)`` is the method input type signature. The sixth value should be left empty and is out of scope for this documentation. -The remaining values are used to define the ``access path``, the ``kind``, and the ``provenance`` (origin) of the summary. 
+The remaining values are used to define the ``access-path``, the ``kind``, and the ``provenance`` (origin) of the summary. - The seventh value is the access path to the input (where data flows from). ``Argument[this]`` is the access path to the qualifier (``s1`` in the example) and ``Argument[0]`` is the access path to the first argument (``s2`` in the example). - The eighth value ``ReturnValue`` is the access path to the output (where data flows to), in this case ``ReturnValue``, which means that the input flows to the return value. @@ -210,7 +207,7 @@ Here we model flow through higher order methods and collection types. ... } -We need to add tuples to the ``summaryModel``\(package, type, subtypes, name, signature, ext, input, output, kind, provenance) extensible predicate by updating a data extension file: +We need to add tuples to the ``summaryModel(package, type, subtypes, name, signature, ext, input, output, kind, provenance)`` extensible predicate by updating a data extension file: .. code-block:: yaml @@ -222,20 +219,18 @@ We need to add tuples to the ``summaryModel``\(package, type, subtypes, name, si - ["java.util.stream", "Stream", True, "map", "(Function)", "", "Argument[this].Element", "Argument[0].Parameter[0]", "value", "manual"] - ["java.util.stream", "Stream", True, "map", "(Function)", "", "Argument[0].ReturnValue", "ReturnValue.Element", "value", "manual"] - -Since we are adding flow through a method, we need to add tuples to the ``summaryModel`` extensible predicate. Each tuple defines part of the flow that comprises the total flow through the ``map`` method. The first five values identify the callable (in this case a method) to be modeled as a summary. These are the same for both of the rows above as we are adding two summaries for the same method. - The first value ``java.util.stream`` is the package name. - The second value ``Stream`` is the class (type) name. 
-- The third value ``True`` is a flag that indicates whether or not the summary also applies to all overrides of the method. +- The third value ``True`` is a flag that indicates whether or not the model also applies to all overrides of the method. - The fourth value ``map`` is the method name. - The fifth value ``Function`` is the method input type signature. The sixth value should be left empty and is out of scope for this documentation. -The remaining values are used to define the ``access path``, the ``kind``, and the ``provenance`` (origin) of the summary definition. +The remaining values are used to define the ``access-path``, the ``kind``, and the ``provenance`` (origin) of the summary definition. - The seventh value is the access path to the ``input`` (where data flows from). - The eighth value is the access path to the ``output`` (where data flows to). @@ -257,6 +252,87 @@ For the remaining values for both rows: That is, the first row specifies that values can flow from the elements of the qualifier stream into the first argument of the function provided to ``map``. The second row specifies that values can flow from the return value of the function to the elements of the stream returned from ``map``. +Example: Taint barrier in the ``java.io`` package +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +This example shows how the Java query pack models the return value from the ``getName`` method as a barrier for path injection. +This is the ``getName`` method in the ``File`` class, which is located in the ``java.io`` package. The method returns only the final component of a path, which means that it protects against path injection vulnerabilities. + +.. code-block:: java + + public static void barrier(File file) { + String name = file.getName(); // The return value of this method is a barrier for path injection. + ... 
+ } + +We need to add a tuple to the ``barrierModel(package, type, subtypes, name, signature, ext, output, kind, provenance)`` extensible predicate by updating a data extension file. + +.. code-block:: yaml + + extensions: + - addsTo: + pack: codeql/java-all + extensible: barrierModel + data: + - ["java.io", "File", True, "getName", "()", "", "ReturnValue", "path-injection", "manual"] + +The first five values identify the callable (in this case a method) to be modeled as a barrier. + +- The first value ``java.io`` is the package name. +- The second value ``File`` is the name of the class (type) that contains the method. +- The third value ``True`` is a flag that indicates whether or not the model also applies to all overrides of the method. +- The fourth value ``getName`` is the method name. +- The fifth value ``()`` is the method input type signature. + +The sixth value should be left empty and is out of scope for this documentation. +The remaining values are used to define the ``access-path``, the ``kind``, and the ``provenance`` (origin) of the barrier. + +- The seventh value ``ReturnValue`` is the access path to the return of the method, which means that it is the return value that should be considered a barrier. +- The eighth value ``path-injection`` is the kind of the barrier. The barrier kind is used to define the queries where the barrier is in scope. In this case - the path injection queries. +- The ninth value ``manual`` is the provenance of the barrier, which is used to identify the origin of the barrier. + +Example: Taint barrier guard in the ``java.net`` package +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +This example shows how the Java query pack models the ``isAbsolute`` method as a barrier guard for request forgery. +This is the ``isAbsolute`` method in the ``URI`` class, which is located in the ``java.net`` package. +A barrier guard model is used to stop the flow of taint when a conditional check is performed on data. 
+When the ``isAbsolute`` method returns ``false``, the URI is relative and therefore safe with respect to request forgery because it cannot redirect to an external server controlled by an attacker. + +.. code-block:: java + + public static void barrierguard(URI uri) throws IOException { + if (!uri.isAbsolute()) { // The check guards the request, so the URI is safe. + URL url = uri.toURL(); + url.openConnection(); // This is not a request forgery vulnerability. + } + } + +We need to add a tuple to the ``barrierGuardModel(package, type, subtypes, name, signature, ext, input, acceptingValue, kind, provenance)`` extensible predicate by updating a data extension file. + +.. code-block:: yaml + + extensions: + - addsTo: + pack: codeql/java-all + extensible: barrierGuardModel + data: + - ["java.net", "URI", True, "isAbsolute", "()", "", "Argument[this]", "false", "request-forgery", "manual"] + +The first five values identify the callable (in this case a method) to be modeled as a barrier guard. + +- The first value ``java.net`` is the package name. +- The second value ``URI`` is the name of the class (type) that contains the method. +- The third value ``True`` is a flag that indicates whether or not the model also applies to all overrides of the method. +- The fourth value ``isAbsolute`` is the method name. +- The fifth value ``()`` is the method input type signature. + +The sixth value should be left empty and is out of scope for this documentation. +The remaining values are used to define the ``access-path``, the ``accepting-value``, the ``kind``, and the ``provenance`` (origin) of the barrier guard. + +- The seventh value ``Argument[this]`` is the access path to the input whose flow is blocked. In this case, the qualifier of the method call (``uri`` in the example). +- The eighth value ``false`` is the accepting value of the barrier guard. This is the value that the conditional check must return for the barrier to apply. 
In this case, when ``isAbsolute`` is ``false``, the URI is relative and considered safe. +- The ninth value ``request-forgery`` is the kind of the barrier guard. The barrier guard kind is used to define the queries where the barrier guard is in scope. In this case - the request forgery queries. +- The tenth value ``manual`` is the provenance of the barrier guard, which is used to identify the origin of the barrier guard. + Example: Add a ``neutral`` method ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ This example shows how the Java query pack models the ``now`` method as being neutral with respect to flow. @@ -269,7 +345,7 @@ A neutral model is used to define that there is no flow through a method. ... } -We need to add a tuple to the ``neutralModel``\(package, type, name, signature, kind, provenance) extensible predicate by updating a data extension file. +We need to add a tuple to the ``neutralModel(package, type, name, signature, kind, provenance)`` extensible predicate by updating a data extension file. .. code-block:: yaml @@ -280,8 +356,6 @@ We need to add a tuple to the ``neutralModel``\(package, type, name, signature, data: - ["java.time", "Instant", "now", "()", "summary", "manual"] - -Since we are adding a neutral model, we need to add tuples to the ``neutralModel`` extensible predicate. The first four values identify the callable (in this case a method) to be modeled as a neutral, the fifth value is the kind, and the sixth value is the provenance (origin) of the neutral. - The first value ``java.time`` is the package name. 
diff --git a/docs/codeql/codeql-language-guides/customizing-library-models-for-javascript.rst b/docs/codeql/codeql-language-guides/customizing-library-models-for-javascript.rst index b8f064c75747..a0702289cefb 100644 --- a/docs/codeql/codeql-language-guides/customizing-library-models-for-javascript.rst +++ b/docs/codeql/codeql-language-guides/customizing-library-models-for-javascript.rst @@ -22,24 +22,27 @@ A data extension for JavaScript is a YAML file of the form: The CodeQL library for JavaScript exposes the following extensible predicates: -- **sourceModel**\(type, path, kind) -- **sinkModel**\(type, path, kind) -- **typeModel**\(type1, type2, path) -- **summaryModel**\(type, path, input, output, kind) +- ``sourceModel(type, path, kind)`` +- ``sinkModel(type, path, kind)`` +- ``typeModel(type1, type2, path)`` +- ``summaryModel(type, path, input, output, kind)`` +- ``barrierModel(type, path, kind)`` +- ``barrierGuardModel(type, path, acceptingValue, kind)`` We'll explain how to use these using a few examples, and provide some reference material at the end of this article. Example: Taint sink in the 'execa' package ------------------------------------------ -In this example, we'll show how to add the following argument, passed to **execa**, as a command-line injection sink: +In this example, we'll show how to add the following argument, passed to ``execa``, as a command-line injection sink: .. code-block:: js import { shell } from "execa"; shell(cmd); // <-- add 'cmd' as a taint sink -Note that this sink is already recognized by the CodeQL JS analysis, but for this example, you could use the following data extension: +Note that this sink is already recognized by the CodeQL JS analysis, but for this example, you could add a tuple to the +``sinkModel(type, path, kind)`` extensible predicate by updating a data extension file. .. 
code-block:: yaml @@ -50,21 +53,19 @@ Note that this sink is already recognized by the CodeQL JS analysis, but for thi data: - ["execa", "Member[shell].Argument[0]", "command-injection"] - -- Since we're adding a new sink, we add a tuple to the **sinkModel** extensible predicate. -- The first column, **"execa"**, identifies a set of values from which to begin the search for the sink. - The string **"execa"** means we start at the places where the codebase imports the NPM package **execa**. +- The first column, ``"execa"``, identifies a set of values from which to begin the search for the sink. + The string ``"execa"`` means we start at the places where the codebase imports the NPM package ``execa``. - The second column is an access path that is evaluated from left to right, starting at the values that were identified by the first column. - - **Member[shell]** selects accesses to the **shell** member of the **execa** package. - - **Argument[0]** selects the first argument to calls to that member. + - ``Member[shell]`` selects accesses to the ``shell`` member of the ``execa`` package. + - ``Argument[0]`` selects the first argument to calls to that member. -- **command-injection** indicates that this is considered a sink for the command injection query. +- ``command-injection`` indicates that this is considered a sink for the command injection query. Example: Taint sources from window 'message' events --------------------------------------------------- -In this example, we'll show how the **event.data** expression below could be marked as a remote flow source: +In this example, we'll show how the ``event.data`` expression below could be marked as a remote flow source: .. 
code-block:: js @@ -72,7 +73,8 @@ In this example, we'll show how the **event.data** expression below could be mar let data = event.data; // <-- add 'event.data' as a taint source }); -Note that this source is already known by the CodeQL JS analysis, but for this example, you could use the following data extension: +Note that this source is already recognized by the CodeQL JS analysis, but for this example, you could add a tuple to the +``sourceModel(type, path, kind)`` extensible predicate by updating a data extension file. .. code-block:: yaml @@ -87,21 +89,19 @@ Note that this source is already known by the CodeQL JS analysis, but for this e "remote", ] - -- Since we're adding a new taint source, we add a tuple to the **sourceModel** extensible predicate. -- The first column, **"global"**, begins the search at references to the global object (also known as **window** in browser contexts). This is a special JavaScript object that contains all global variables and methods. -- **Member[addEventListener]** selects accesses to the **addEventListener** member. -- **Argument[1]** selects the second argument of calls to that member (the argument containing the callback). -- **Parameter[0]** selects the first parameter of the callback (the parameter named **event**). -- **Member[data]** selects accesses to the **data** property of the event object. -- Finally, the kind **remote** indicates that this is considered a source of remote flow. +- The first column, ``"global"``, begins the search at references to the global object (also known as ``window`` in browser contexts). This is a special JavaScript object that contains all global variables and methods. +- ``Member[addEventListener]`` selects accesses to the ``addEventListener`` member. +- ``Argument[1]`` selects the second argument of calls to that member (the argument containing the callback). +- ``Parameter[0]`` selects the first parameter of the callback (the parameter named ``event``). 
+- ``Member[data]`` selects accesses to the ``data`` property of the event object. +- Finally, the kind ``remote`` indicates that this is considered a source of remote flow. In the next section, we'll show how to restrict the model to recognize events of a specific type. Continued example: Restricting the event type --------------------------------------------- -The model above treats all events as sources of remote flow, not just **message** events. +The model above treats all events as sources of remote flow, not just ``message`` events. For example, it would also pick up this irrelevant source: .. code-block:: js @@ -111,7 +111,7 @@ For example, it would also pick up this irrelevant source: }); -We can refine the model by adding the **WithStringArgument** component to restrict the set of calls being considered: +We can refine the model by adding the ``WithStringArgument`` component to restrict the set of calls being considered: .. code-block:: yaml @@ -126,7 +126,7 @@ We can refine the model by adding the **WithStringArgument** component to restri "remote", ] -The **WithStringArgument[0=message]** component here selects the subset of calls to **addEventListener** where the first argument is a string literal with the value **"message"**. +The ``WithStringArgument[0=message]`` component here selects the subset of calls to ``addEventListener`` where the first argument is a string literal with the value ``"message"``. Example: Using types to add MySQL injection sinks ------------------------------------------------- @@ -141,7 +141,7 @@ In this example, we'll show how to add the following SQL injection sink: connection.query(q); // <-- add 'q' as a SQL injection sink } -We can recognize this using the following extension: +We need to add a tuple to the ``sinkModel(type, path, kind)`` extensible predicate by updating a data extension file. .. 
code-block:: yaml @@ -152,14 +152,13 @@ We can recognize this using the following extension: data: - ["mysql.Connection", "Member[query].Argument[0]", "sql-injection"] +- The first column, ``"mysql.Connection"``, begins the search at any expression whose value is known to be an instance of + the ``Connection`` type from the ``mysql`` package. This will select the ``connection`` parameter above because of its type annotation. +- ``Member[query]`` selects the ``query`` member from the connection object. +- ``Argument[0]`` selects the first argument of a call to that member. +- ``sql-injection`` indicates that this is considered a sink for the SQL injection query. -- The first column, **"mysql.Connection"**, begins the search at any expression whose value is known to be an instance of - the **Connection** type from the **mysql** package. This will select the **connection** parameter above because of its type annotation. -- **Member[query]** selects the **query** member from the connection object. -- **Argument[0]** selects the first argument of a call to that member. -- **sql-injection** indicates that this is considered a sink for the SQL injection query. - -This works in this example because the **connection** parameter has a type annotation that matches what the model is looking for. +This works in this example because the ``connection`` parameter has a type annotation that matches what the model is looking for. Note that there is a significant difference between the following two rows: @@ -169,8 +168,8 @@ Note that there is a significant difference between the following two rows: - ["mysql.Connection", "", ...] - ["mysql", "Member[Connection]", ...] -The first row matches instances of **mysql.Connection**, which are objects that encapsulate a MySQL connection. -The second row would match something like **require('mysql').Connection**, which is not itself a connection object. 
+The first row matches instances of ``mysql.Connection``, which are objects that encapsulate a MySQL connection. +The second row would match something like ``require('mysql').Connection``, which is not itself a connection object. In the next section, we'll show how to generalize the model to handle the absence of type annotations. @@ -185,8 +184,9 @@ Suppose we want the model from above to detect the sink in this snippet: let connection = getConnection(); connection.query(q); // <-- add 'q' as a SQL injection sink -There is no type annotation on **connection**, and there is no indication of what **getConnection()** returns. -Using a **typeModel** tuple we can tell our model that this function returns an instance of **mysql.Connection**: +There is no type annotation on ``connection``, and there is no indication of what ``getConnection()`` returns. +By adding a tuple to the ``typeModel(type1, type2, path)`` extensible predicate we can tell our model that +this function returns an instance of ``mysql.Connection``: .. code-block:: yaml @@ -197,19 +197,17 @@ Using a **typeModel** tuple we can tell our model that this function returns an data: - ["mysql.Connection", "@example/db", "Member[getConnection].ReturnValue"] +- The first column, ``"mysql.Connection"``, names the type that we're adding a new definition for. +- The second column, ``"@example/db"``, begins the search at imports of the hypothetical NPM package ``@example/db``. +- ``Member[getConnection]`` selects references to the ``getConnection`` member from that package. +- ``ReturnValue`` selects the return value from a call to that member. -- Since we're providing type information, we add a tuple to the **typeModel** extensible predicate. -- The first column, **"mysql.Connection"**, names the type that we're adding a new definition for. -- The second column, **"@example/db"**, begins the search at imports of the hypothetical NPM package **@example/db**. 
-- **Member[getConnection]** selects references to the **getConnection** member from that package. -- **ReturnValue** selects the return value from a call to that member. - -The new model states that the return value of **getConnection()** has type **mysql.Connection**. +The new model states that the return value of ``getConnection()`` has type ``mysql.Connection``. Combining this with the sink model we added earlier, the sink in the example is detected by the model. The mechanism used here is how library models work for both TypeScript and plain JavaScript. -A good library model contains **typeModel** tuples to ensure it works even in codebases without type annotations. -For example, the **mysql** model that is included with the CodeQL JS analysis includes this type definition (among many others): +A good library model contains ``typeModel`` tuples to ensure it works even in codebases without type annotations. +For example, the ``mysql`` model that is included with the CodeQL JS analysis includes this type definition (among many others): .. code-block:: yaml @@ -228,7 +226,7 @@ In this example, we'll show how to add the following SQL injection sink using a conn.query(q, (err, rows) => {...}); // <-- add 'q' as a SQL injection sink }); -We can recognize this using a fuzzy model, as shown in the following extension: +We need to add a tuple for a fuzzy model to the ``sinkModel(type, path, kind)`` extensible predicate by updating a data extension file. .. code-block:: yaml @@ -239,13 +237,13 @@ We can recognize this using a fuzzy model, as shown in the following extension: data: - ["mysql", "Fuzzy.Member[query].Argument[0]", "sql-injection"] -- The first column, **"mysql"**, begins the search at places where the `mysql` package is imported. -- **Fuzzy** selects all objects that appear to originate from the `mysql` package, such as the `pool`, `conn`, `err`, and `rows` objects. -- **Member[query]** selects the **query** member from any of those objects. 
In this case, the only such member is `conn.query`. +- The first column, ``"mysql"``, begins the search at places where the `mysql` package is imported. +- ``Fuzzy`` selects all objects that appear to originate from the `mysql` package, such as the `pool`, `conn`, `err`, and `rows` objects. +- ``Member[query]`` selects the ``query`` member from any of those objects. In this case, the only such member is `conn.query`. In principle, this would also find expressions such as `pool.query` and `err.query`, but in practice such expressions are not likely to occur, because the `pool` and `err` objects do not have a member named `query`. -- **Argument[0]** selects the first argument of a call to the selected member, that is, the `q` argument to `conn.query`. -- **sql-injection** indicates that this is considered as a sink for the SQL injection query. +- ``Argument[0]`` selects the first argument of a call to the selected member, that is, the `q` argument to `conn.query`. +- ``sql-injection`` indicates that this is considered as a sink for the SQL injection query. For reference, a more detailed model might look like this, as described in the preceding examples: @@ -265,7 +263,7 @@ For reference, a more detailed model might look like this, as described in the p - ["mysql.Pool", "mysql", "Member[createPool].ReturnValue"] - ["mysql.Connection", "mysql.Pool", "Member[getConnection].Argument[0].Parameter[1]"] -The model using the **Fuzzy** component is simpler, at the cost of being approximate. +The model using the ``Fuzzy`` component is simpler, at the cost of being approximate. This technique is useful when modeling a large or complex library, where it is difficult to write a detailed model. 
Example: Adding flow through 'decodeURIComponent' @@ -277,7 +275,8 @@ In this example, we'll show how to add flow through calls to `decodeURIComponent let y = decodeURIComponent(x); // add taint flow from 'x' to 'y' -Note that this flow is already recognized by the CodeQL JS analysis, but for this example, you could use the following data extension: +Note that this flow is already recognized by the CodeQL JS analysis, but for this example, you could add a tuple to the +``summaryModel(type, path, input, output, kind)`` extensible predicate by updating a data extension file. .. code-block:: yaml @@ -294,28 +293,27 @@ Note that this flow is already recognized by the CodeQL JS analysis, but for thi "taint", ] - -- Since we're adding flow through a function call, we add a tuple to the **summaryModel** extensible predicate. -- The first column, **"global"**, begins the search for relevant calls at references to the global object. +- The first column, ``"global"``, begins the search for relevant calls at references to the global object. In JavaScript, global variables are properties of the global object, so this lets us access global variables or functions. -- The second column, **Member[decodeURIComponent]**, is a path leading to the function calls we wish to model. - In this case, we select references to the **decodeURIComponent** member from the global object, that is, - the global variable named **decodeURIComponent**. -- The third column, **Argument[0]**, indicates the input of the flow. In this case, the first argument to the function call. -- The fourth column, **ReturnValue**, indicates the output of the flow. In this case, the return value of the function call. -- The last column, **taint**, indicates the kind of flow to add. The value **taint** means the output is not necessarily equal +- The second column, ``Member[decodeURIComponent]``, is a path leading to the function calls we wish to model. 
+ In this case, we select references to the ``decodeURIComponent`` member from the global object, that is, + the global variable named ``decodeURIComponent``. +- The third column, ``Argument[0]``, indicates the input of the flow. In this case, the first argument to the function call. +- The fourth column, ``ReturnValue``, indicates the output of the flow. In this case, the return value of the function call. +- The last column, ``taint``, indicates the kind of flow to add. The value ``taint`` means the output is not necessarily equal to the input, but was derived from the input in a taint-preserving way. Example: Adding flow through 'underscore.forEach' ------------------------------------------------- -In this example, we'll show how to add flow through calls to **forEach** from the **underscore** package: +In this example, we'll show how to add flow through calls to ``forEach`` from the ``underscore`` package: .. code-block:: js require('underscore').forEach([x, y], (v) => { ... }); // add value flow from 'x' and 'y' to 'v' -Note that this flow is already recognized by the CodeQL JS analysis, but for this example, you could use the following data extension: +Note that this flow is already recognized by the CodeQL JS analysis, but for this example, you could add a tuple to the +``summaryModel(type, path, input, output, kind)`` extensible predicate by updating a data extension file. .. code-block:: yaml @@ -332,21 +330,19 @@ Note that this flow is already recognized by the CodeQL JS analysis, but for thi "value", ] - -- Since we're adding flow through a function call, we add a tuple to the **summaryModel** extensible predicate. -- The first column, **"underscore"**, begins the search for relevant calls at places where the **underscore** package is imported. -- The second column, **Member[forEach]**, selects references to the **forEach** member from the **underscore** package. 
+- The first column, ``"underscore"``, begins the search for relevant calls at places where the ``underscore`` package is imported. +- The second column, ``Member[forEach]``, selects references to the ``forEach`` member from the ``underscore`` package. - The third column specifies the input of the flow: - - **Argument[0]** selects the first argument of **forEach**, which is the array being iterated over. - - **ArrayElement** selects the elements of that array (the expressions **x** and **y**). + - ``Argument[0]`` selects the first argument of ``forEach``, which is the array being iterated over. + - ``ArrayElement`` selects the elements of that array (the expressions ``x`` and ``y``). - The fourth column specifies the output of the flow: - - **Argument[1]** selects the second argument of **forEach** (the argument containing the callback function). - - **Parameter[0]** selects the first parameter of the callback function (the parameter named **v**). + - ``Argument[1]`` selects the second argument of ``forEach`` (the argument containing the callback function). + - ``Parameter[0]`` selects the first parameter of the callback function (the parameter named ``v``). -- The last column, **value**, indicates the kind of flow to add. The value **value** means the input value is unchanged as +- The last column, ``value``, indicates the kind of flow to add. The value ``value`` means the input value is unchanged as it flows to the output. @@ -367,7 +363,7 @@ on the incoming request objects: req.data; // <-- mark 'req.data' as a taint source }); -This can be achieved with the following data extension: +We need to add a tuple to the ``sourceModel(type, path, kind)`` extensible predicate by updating a data extension file. .. code-block:: yaml @@ -382,14 +378,66 @@ This can be achieved with the following data extension: "remote", ] -- Since we're adding a new taint source, we add a tuple to the **sourceModel** extensible predicate. 
-- The first column, **"@example/middleware"**, begins the search at imports of the hypothetical NPM package **@example/middleware**. -- **Member[injectData]** selects accesses to the **injectData** member. -- **ReturnValue** selects the return value of the call to **injectData**. -- **GuardedRouteHandler** interprets the current value as a middleware function and selects all route handlers guarded by that middleware. Since the current value is passd to **app.use()**, the callback subsequently passed to **app.get()** is seen as a guarded route handler. -- **Parameter[0]** selects the first parameter of the callback (the parameter named **req**). -- **Member[data]** selects accesses to the **data** property of the **req** object. -- Finally, the kind **remote** indicates that this is considered a source of remote flow. +- The first column, ``"@example/middleware"``, begins the search at imports of the hypothetical NPM package ``@example/middleware``. +- ``Member[injectData]`` selects accesses to the ``injectData`` member. +- ``ReturnValue`` selects the return value of the call to ``injectData``. +- ``GuardedRouteHandler`` interprets the current value as a middleware function and selects all route handlers guarded by that middleware. Since the current value is passed to ``app.use()``, the callback subsequently passed to ``app.get()`` is seen as a guarded route handler. +- ``Parameter[0]`` selects the first parameter of the callback (the parameter named ``req``). +- ``Member[data]`` selects accesses to the ``data`` property of the ``req`` object. +- Finally, the kind ``remote`` indicates that this is considered a source of remote flow. + +Example: Taint barrier using the 'encodeURIComponent' function +-------------------------------------------------------------- + +In this example, we'll show how to add the return value of ``encodeURIComponent`` as a barrier for XSS. + +..
code-block:: js + + let escaped = encodeURIComponent(input); // The return value of this method is safe for XSS. + document.body.innerHTML = escaped; + +We need to add a tuple to the ``barrierModel(type, path, kind)`` extensible predicate by updating a data extension file. + +.. code-block:: yaml + + extensions: + - addsTo: + pack: codeql/javascript-all + extensible: barrierModel + data: + - ["global", "Member[encodeURIComponent].ReturnValue", "html-injection"] + +- The first column, ``"global"``, begins the search for relevant calls at references to the global object. +- The second column, ``Member[encodeURIComponent].ReturnValue``, selects the return value of the ``encodeURIComponent`` function. +- The third column, ``"html-injection"``, is the kind of the barrier. + +Example: Add a barrier guard +---------------------------- + +This example shows how to model a barrier guard that stops the flow of taint when a conditional check is performed on data. +Consider a function called `isValid` which returns `true` when the data is considered safe. + +.. code-block:: js + + if (isValid(userInput)) { // The check guards the use, so the input is safe. + db.query(userInput); // This is safe. + } + +We need to add a tuple to the ``barrierGuardModel(type, path, acceptingValue, kind)`` extensible predicate by updating a data extension file. + +.. code-block:: yaml + + extensions: + - addsTo: + pack: codeql/javascript-all + extensible: barrierGuardModel + data: + - ["my-package", "Member[isValid].Argument[0]", "true", "sql-injection"] + +- The first column, ``"my-package"``, begins the search at imports of the hypothetical NPM package ``my-package``. +- The second column, ``Member[isValid].Argument[0]``, selects the first argument of the `isValid` function. This is the value being validated. +- The third column, ``"true"``, is the accepting value of the barrier guard. This is the value that the conditional check must return for the barrier to apply. 
+- The fourth column, ``"sql-injection"``, is the kind of the barrier guard. Reference material ------------------ @@ -404,9 +452,9 @@ sourceModel(type, path, kind) Adds a new taint source. Most taint-tracking queries will use the new source. -- **type**: Name of a type from which to evaluate **path**. -- **path**: Access path leading to the source. -- **kind**: Kind of source to add. See the section on source kinds for a list of supported kinds. +- ``type``: Name of a type from which to evaluate ``path``. +- ``path``: Access path leading to the source. +- ``kind``: Kind of source to add. See the section on source kinds for a list of supported kinds. Example: @@ -424,9 +472,9 @@ sinkModel(type, path, kind) Adds a new taint sink. Sinks are query-specific and will typically affect one or two queries. -- **type**: Name of a type from which to evaluate **path**. -- **path**: Access path leading to the sink. -- **kind**: Kind of sink to add. See the section on sink kinds for a list of supported kinds. +- ``type``: Name of a type from which to evaluate ``path``. +- ``path``: Access path leading to the sink. +- ``kind``: Kind of sink to add. See the section on sink kinds for a list of supported kinds. Example: @@ -444,11 +492,11 @@ summaryModel(type, path, input, output, kind) Adds flow through a function call. -- **type**: Name of a type from which to evaluate **path**. -- **path**: Access path leading to a function call. -- **input**: Path relative to the function call that leads to input of the flow. -- **output**: Path relative to the function call leading to the output of the flow. -- **kind**: Kind of summary to add. Can be **taint** for taint-propagating flow, or **value** for value-preserving flow. +- ``type``: Name of a type from which to evaluate ``path``. +- ``path``: Access path leading to a function call. +- ``input``: Path relative to the function call that leads to input of the flow. 
+- ``output``: Path relative to the function call leading to the output of the flow. +- ``kind``: Kind of summary to add. Can be ``taint`` for taint-propagating flow, or ``value`` for value-preserving flow. Example: @@ -472,9 +520,9 @@ typeModel(type1, type2, path) Adds a new definition of a type. -- **type1**: Name of the type to define. -- **type2**: Name of the type from which to evaluate **path**. -- **path**: Access path leading from **type2** to **type1**. +- ``type1``: Name of the type to define. +- ``type2``: Name of the type from which to evaluate ``path``. +- ``path``: Access path leading from ``type2`` to ``type1``. Example: @@ -496,56 +544,56 @@ Types A type is a string that identifies a set of values. In each of the extensible predicates mentioned in previous section, the first column is always the name of a type. -A type can be defined by adding **typeModel** tuples for that type. Additionally, the following built-in types are available: -- The name of an NPM package matches imports of that package. For example, the type **express** matches the expression **require("express")**. If the package name includes dots, it must be surrounded by single quotes, such as in **'lodash.escape'**. -- The type **global** identifies the global object, also known as **window**. In JavaScript, global variables are properties of the global object, so global variables can be identified using this type. (This type also matches imports of the NPM package named **global**, which is a package that happens to export the global object.) -- A qualified type name of form **<package>.<type>** identifies expressions of type **<type>** from **<package>**. For example, **mysql.Connection** identifies expression of type **Connection** from the **mysql** package.
Note that this only works if type annotations are present in the codebase, or if sufficient **typeModel** tuples have been provided for that type. +- The name of an NPM package matches imports of that package. For example, the type ``express`` matches the expression ``require("express")``. If the package name includes dots, it must be surrounded by single quotes, such as in ``'lodash.escape'``. +- The type ``global`` identifies the global object, also known as ``window``. In JavaScript, global variables are properties of the global object, so global variables can be identified using this type. (This type also matches imports of the NPM package named ``global``, which is a package that happens to export the global object.) +- A qualified type name of form ``<package>.<type>`` identifies expressions of type ``<type>`` from ``<package>``. For example, ``mysql.Connection`` identifies expressions of type ``Connection`` from the ``mysql`` package. Note that this only works if type annotations are present in the codebase, or if sufficient ``typeModel`` tuples have been provided for that type. Access paths ------------ -The **path**, **input**, and **output** columns consist of a **.**-separated list of components, which is evaluated from left to right, with each step selecting a new set of values derived from the previous set of values. +The ``path``, ``input``, and ``output`` columns consist of a ``.``-separated list of components, which is evaluated from left to right, with each step selecting a new set of values derived from the previous set of values. The following components are supported: -- **Argument[**\ `number`\ **]** selects the argument at the given index. -- **Argument[this]** selects the receiver of a method call. -- **Parameter[**\ `number`\ **]** selects the parameter at the given index. -- **Parameter[this]** selects the **this** parameter of a function. -- **ReturnValue** selects the return value of a function or call.
-- **Member[**\ `name`\ **]** selects the property with the given name. -- **AnyMember** selects any property regardless of name. -- **ArrayElement** selects an element of an array. -- **MapValue** selects a value of a map object. -- **Awaited** selects the value of a promise. -- **Instance** selects instances of a class, including instances of its subclasses. -- **Fuzzy** selects all values that are derived from the current value through a combination of the other operations described in this list. +- ``Argument[``\ `number`\ ``]`` selects the argument at the given index. +- ``Argument[this]`` selects the receiver of a method call. +- ``Parameter[``\ `number`\ ``]`` selects the parameter at the given index. +- ``Parameter[this]`` selects the ``this`` parameter of a function. +- ``ReturnValue`` selects the return value of a function or call. +- ``Member[``\ `name`\ ``]`` selects the property with the given name. +- ``AnyMember`` selects any property regardless of name. +- ``ArrayElement`` selects an element of an array. +- ``MapValue`` selects a value of a map object. +- ``Awaited`` selects the value of a promise. +- ``Instance`` selects instances of a class, including instances of its subclasses. +- ``Fuzzy`` selects all values that are derived from the current value through a combination of the other operations described in this list. For example, this can be used to find all values that appear to originate from a particular package. This can be useful for finding method calls from a known package, but where the receiver type is not known or is difficult to model. The following components are called "call site filters". They select a subset of the previously-selected calls, if the call fits certain criteria: -- **WithArity[**\ `number`\ **]** selects the subset of calls that have the given number of arguments. 
-- **WithStringArgument[**\ `number`\ **=**\ `value`\ **]** selects the subset of calls where the argument at the given index is a string literal with the given value. +- ``WithArity[``\ `number`\ ``]`` selects the subset of calls that have the given number of arguments. +- ``WithStringArgument[``\ `number`\ ``=``\ `value`\ ``]`` selects the subset of calls where the argument at the given index is a string literal with the given value. Components related to decorators: -- **DecoratedClass** selects a class that has the current value as a decorator. For example, **Member[Component].DecoratedClass** selects any class that is decorated with **@Component**. -- **DecoratedParameter** selects a parameter that is decorated by the current value. -- **DecoratedMember** selects a method, field, or accessor that is decorated by the current value. +- ``DecoratedClass`` selects a class that has the current value as a decorator. For example, ``Member[Component].DecoratedClass`` selects any class that is decorated with ``@Component``. +- ``DecoratedParameter`` selects a parameter that is decorated by the current value. +- ``DecoratedMember`` selects a method, field, or accessor that is decorated by the current value. Additionally there is a component related to middleware functions: -- **GuardedRouteHandler** interprets the current value as a middleware function, and selects any route handler function that comes after it in the routing hierarchy. - This can be used to model properties injected onto request and response objects, such as **req.db** after a middleware that injects a database connection. +- ``GuardedRouteHandler`` interprets the current value as a middleware function, and selects any route handler function that comes after it in the routing hierarchy. + This can be used to model properties injected onto request and response objects, such as ``req.db`` after a middleware that injects a database connection. 
Note that this currently over-approximates the set of route handlers but may be made more accurate in the future. Additional notes about the syntax of operands: -- Multiple operands may be given to a single component, as a shorthand for the union of the operands. For example, **Member[foo,bar]** matches the union of **Member[foo]** and **Member[bar]**. -- Numeric operands to **Argument**, **Parameter**, and **WithArity** may be given as an interval. For example, **Argument[0..2]** matches argument 0, 1, or 2. -- **Argument[N-1]** selects the last argument of a call, and **Parameter[N-1]** selects the last parameter of a function, with **N-2** being the second-to-last and so on. +- Multiple operands may be given to a single component, as a shorthand for the union of the operands. For example, ``Member[foo,bar]`` matches the union of ``Member[foo]`` and ``Member[bar]``. +- Numeric operands to ``Argument``, ``Parameter``, and ``WithArity`` may be given as an interval. For example, ``Argument[0..2]`` matches argument 0, 1, or 2. +- ``Argument[N-1]`` selects the last argument of a call, and ``Parameter[N-1]`` selects the last parameter of a function, with ``N-2`` being the second-to-last and so on. Kinds ----- @@ -553,14 +601,14 @@ Kinds Source kinds ~~~~~~~~~~~~ -- **remote**: A general source of remote flow. -- **browser**: A source in the browser environment that does not fit a more specific browser kind. -- **browser-url-query**: A source derived from the query parameters of the browser URL, such as ``location.search``. -- **browser-url-fragment**: A source derived from the fragment part of the browser URL, such as ``location.hash``. -- **browser-url-path**: A source derived from the pathname of the browser URL, such as ``location.pathname``. -- **browser-url**: A source derived from the browser URL, where the untrusted part is prefixed by trusted data such as the scheme and hostname. 
-- **browser-window-name**: A source derived from the window name, such as ``window.name``. -- **browser-message-event**: A source derived from cross-window message passing, such as ``event`` in ``window.onmessage = event => {...}``. +- ``remote``: A general source of remote flow. +- ``browser``: A source in the browser environment that does not fit a more specific browser kind. +- ``browser-url-query``: A source derived from the query parameters of the browser URL, such as ``location.search``. +- ``browser-url-fragment``: A source derived from the fragment part of the browser URL, such as ``location.hash``. +- ``browser-url-path``: A source derived from the pathname of the browser URL, such as ``location.pathname``. +- ``browser-url``: A source derived from the browser URL, where the untrusted part is prefixed by trusted data such as the scheme and hostname. +- ``browser-window-name``: A source derived from the window name, such as ``window.name``. +- ``browser-message-event``: A source derived from cross-window message passing, such as ``event`` in ``window.onmessage = event => {...}``. See also :ref:`Threat models <threat-models-javascript>`. @@ -569,22 +617,22 @@ Sink kinds Unlike sources, sinks tend to be highly query-specific, rarely affecting more than one or two queries. Not every query supports customizable sinks. If the following sinks are not suitable for your use case, you should add a new query. -- **code-injection**: A sink that can be used to inject code, such as in calls to **eval**. -- **command-injection**: A sink that can be used to inject shell commands, such as in calls to **child_process.spawn**. -- **path-injection**: A sink that can be used for path injection in a file system access, such as in calls to **fs.readFile**. -- **sql-injection**: A sink that can be used for SQL injection, such as in a MySQL **query** call. -- **nosql-injection**: A sink that can be used for NoSQL injection, such as in a MongoDB **findOne** call.
-- **html-injection**: A sink that can be used for HTML injection, such as in a jQuery **$()** call. -- **request-forgery**: A sink that controls the URL of a request, such as in a **fetch** call. -- **url-redirection**: A sink that can be used to redirect the user to a malicious URL. -- **unsafe-deserialization**: A deserialization sink that can lead to code execution or other unsafe behaviour, such as an unsafe YAML parser. -- **log-injection**: A sink that can be used for log injection, such as in a **console.log** call. +- ``code-injection``: A sink that can be used to inject code, such as in calls to ``eval``. +- ``command-injection``: A sink that can be used to inject shell commands, such as in calls to ``child_process.spawn``. +- ``path-injection``: A sink that can be used for path injection in a file system access, such as in calls to ``fs.readFile``. +- ``sql-injection``: A sink that can be used for SQL injection, such as in a MySQL ``query`` call. +- ``nosql-injection``: A sink that can be used for NoSQL injection, such as in a MongoDB ``findOne`` call. +- ``html-injection``: A sink that can be used for HTML injection, such as in a jQuery ``$()`` call. +- ``request-forgery``: A sink that controls the URL of a request, such as in a ``fetch`` call. +- ``url-redirection``: A sink that can be used to redirect the user to a malicious URL. +- ``unsafe-deserialization``: A deserialization sink that can lead to code execution or other unsafe behaviour, such as an unsafe YAML parser. +- ``log-injection``: A sink that can be used for log injection, such as in a ``console.log`` call. Summary kinds ~~~~~~~~~~~~~ -- **taint**: A summary that propagates taint. This means the output is not necessarily equal to the input, but it was derived from the input in an unrestrictive way. An attacker who controls the input will have significant control over the output as well. 
-- **value**: A summary that preserves the value of the input or creates a copy of the input such that all of its object properties are preserved. +- ``taint``: A summary that propagates taint. This means the output is not necessarily equal to the input, but it was derived from the input in an unrestrictive way. An attacker who controls the input will have significant control over the output as well. +- ``value``: A summary that preserves the value of the input or creates a copy of the input such that all of its object properties are preserved. .. _threat-models-javascript: diff --git a/docs/codeql/codeql-language-guides/customizing-library-models-for-python.rst b/docs/codeql/codeql-language-guides/customizing-library-models-for-python.rst index 30888f7b6092..ee4565caff3e 100644 --- a/docs/codeql/codeql-language-guides/customizing-library-models-for-python.rst +++ b/docs/codeql/codeql-language-guides/customizing-library-models-for-python.rst @@ -9,7 +9,7 @@ Python analysis can be customized by adding library models in data extension fil A data extension for Python is a YAML file of the form: -.. code-block:: yaml +.. code-block:: yaml extensions: - addsTo: @@ -22,24 +22,27 @@ A data extension for Python is a YAML file of the form: The CodeQL library for Python exposes the following extensible predicates: -- **sourceModel**\(type, path, kind) -- **sinkModel**\(type, path, kind) -- **typeModel**\(type1, type2, path) -- **summaryModel**\(type, path, input, output, kind) +- ``sourceModel(type, path, kind)`` +- ``sinkModel(type, path, kind)`` +- ``typeModel(type1, type2, path)`` +- ``summaryModel(type, path, input, output, kind)`` +- ``barrierModel(type, path, kind)`` +- ``barrierGuardModel(type, path, acceptingValue, kind)`` We'll explain how to use these using a few examples, and provide some reference material at the end of this article. 
Example: Taint sink in the 'fabric' package ------------------------------------------- -In this example, we'll show how to add the following argument, passed to **sudo** from the **fabric** package, as a command-line injection sink: +In this example, we'll show how to add the following argument, passed to ``sudo`` from the ``fabric`` package, as a command-line injection sink: .. code-block:: python from fabric.operations import sudo sudo(cmd) # <-- add 'cmd' as a taint sink -Note that this sink is already recognized by the CodeQL Python analysis, but for this example, you could use the following data extension: +Note that this sink is already recognized by the CodeQL Python analysis, but for this example, you could add a tuple to the +``sinkModel(type, path, kind)`` extensible predicate by updating a data extension file. .. code-block:: yaml @@ -50,22 +53,20 @@ Note that this sink is already recognized by the CodeQL Python analysis, but for data: - ["fabric", "Member[operations].Member[sudo].Argument[0]", "command-injection"] - -- Since we're adding a new sink, we add a tuple to the **sinkModel** extensible predicate. -- The first column, **"fabric"**, identifies a set of values from which to begin the search for the sink. - The string **"fabric"** means we start at the places where the codebase imports the package **fabric**. +- The first column, ``"fabric"``, identifies a set of values from which to begin the search for the sink. + The string ``"fabric"`` means we start at the places where the codebase imports the package ``fabric``. - The second column is an access path that is evaluated from left to right, starting at the values that were identified by the first column. - - **Member[operations]** selects accesses to the **operations** module. - - **Member[sudo]** selects accesses to the **sudo** function in the **operations** module. - - **Argument[0]** selects the first argument to calls to that function. 
+ - ``Member[operations]`` selects accesses to the ``operations`` module. + - ``Member[sudo]`` selects accesses to the ``sudo`` function in the ``operations`` module. + - ``Argument[0]`` selects the first argument to calls to that function. -- **"command-injection"** indicates that this is considered a sink for the command injection query. +- ``"command-injection"`` indicates that this is considered a sink for the command injection query. Example: Taint sink in the 'invoke' package ------------------------------------------- -Often sinks are found as arguments to methods rather than functions. In this example, we'll show how to add the following argument, passed to **run** from the **invoke** package, as a command-line injection sink: +Often sinks are found as arguments to methods rather than functions. In this example, we'll show how to add the following argument, passed to ``run`` from the ``invoke`` package, as a command-line injection sink: .. code-block:: python @@ -73,7 +74,8 @@ Often sinks are found as arguments to methods rather than functions. In this exa c = invoke.Context() c.run(cmd) # <-- add 'cmd' as a taint sink -Note that this sink is already recognized by the CodeQL Python analysis, but for this example, you could use the following data extension: +Note that this sink is already recognized by the CodeQL Python analysis, but for this example, you could add a tuple to the +``sinkModel(type, path, kind)`` extensible predicate by updating a data extension file. .. code-block:: yaml @@ -84,17 +86,17 @@ Note that this sink is already recognized by the CodeQL Python analysis, but for data: - ["invoke", "Member[Context].Instance.Member[run].Argument[0]", "command-injection"] -- The first column, **"invoke"**, begins the search at places where the codebase imports the package **invoke**. +- The first column, ``"invoke"``, begins the search at places where the codebase imports the package ``invoke``. 
- The second column is an access path that is evaluated from left to right, starting at the values that were identified by the first column. - - **Member[Context]** selects accesses to the **Context** class. - - **Instance** selects instances of the **Context** class. - - **Member[run]** selects accesses to the **run** method in the **Context** class. - - **Argument[0]** selects the first argument to calls to that method. + - ``Member[Context]`` selects accesses to the ``Context`` class. + - ``Instance`` selects instances of the ``Context`` class. + - ``Member[run]`` selects accesses to the ``run`` method in the ``Context`` class. + - ``Argument[0]`` selects the first argument to calls to that method. -- **"command-injection"** indicates that this is considered a sink for the command injection query. +- ``"command-injection"`` indicates that this is considered a sink for the command injection query. -Note that the **Instance** component is used to select instances of a class, including instances of its subclasses. +Note that the ``Instance`` component is used to select instances of a class, including instances of its subclasses. Since methods on instances are common targets, we have a more compact syntax for selecting them. The first column, the type, is allowed to contain a dotted path ending in a class name. This will begin the search at instances of that class. Using this syntax, the previous example could be written as: @@ -110,7 +112,7 @@ This will begin the search at instances of that class. Using this syntax, the pr Continued example: Multiple ways to obtain a type ------------------------------------------------- -The invoke package provides multiple ways to obtain a **Context** instance. The following example shows how to add a new way to obtain a **Context** instance: +The invoke package provides multiple ways to obtain a ``Context`` instance. The following example shows how to add a new way to obtain a ``Context`` instance: .. 
code-block:: python @@ -118,8 +120,9 @@ The invoke package provides multiple ways to obtain a **Context** instance. The c = context.Context() c.run(cmd) # <-- add 'cmd' as a taint sink -Comparing to the previous Python snippet, the **Context** class is now found as **invoke.context.Context** instead of **invoke.Context**. -We could add a data extension similar to the previous one, but with the type **invoke.context.Context**. However, we can also use the **typeModel** extensible predicate to describe how to reach **invoke.Context** from **invoke.context.Context**: +Comparing to the previous Python snippet, the ``Context`` class is now found as ``invoke.context.Context`` instead of ``invoke.Context``. +We could add a data extension similar to the previous one, but with the type ``invoke.context.Context``. +However, we can also use the ``typeModel(type1, type2, path)`` extensible predicate to describe how to reach ``invoke.Context`` from ``invoke.context.Context``: .. code-block:: yaml @@ -130,9 +133,9 @@ We could add a data extension similar to the previous one, but with the type **i data: - ["invoke.Context", "invoke.context.Context", ""] -- The first column, **"invoke.Context"**, is the name of the type to reach. -- The second column, **"invoke.context.Context"**, is the name of the type from which to evaluate the path. -- The third column is just an empty string, indicating that any instance of **invoke.context.Context** is also an instance of **invoke.Context**. +- The first column, ``"invoke.Context"``, is the name of the type to reach. +- The second column, ``"invoke.context.Context"``, is the name of the type from which to evaluate the path. +- The third column is just an empty string, indicating that any instance of ``invoke.context.Context`` is also an instance of ``invoke.Context``. Combining this with the sink model we added earlier, the sink in the example is detected by the model. 
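As a rough sketch of how the two models fit together, both tuples could live in a single data extension file. This assumes the pack name ``codeql/python-all`` and reuses the compact sink model from the previous example; adjust both if your setup differs.

.. code-block:: yaml

   extensions:
     # Sink model: the first argument of 'run' on any 'invoke.Context' instance.
     - addsTo:
         pack: codeql/python-all
         extensible: sinkModel
       data:
         - ["invoke.Context", "Member[run].Argument[0]", "command-injection"]
     # Type model: any 'invoke.context.Context' instance is also an 'invoke.Context'.
     - addsTo:
         pack: codeql/python-all
         extensible: typeModel
       data:
         - ["invoke.Context", "invoke.context.Context", ""]

Keeping the sink on ``invoke.Context`` and describing the alternative import path with a type model means that any further ways of obtaining a ``Context`` only need one extra ``typeModel`` tuple, not a duplicated sink.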
@@ -141,7 +144,7 @@ Example: Taint sources from Django 'upload_to' argument This example is a bit more advanced, involving both a callback function and a class constructor. The Django web framework allows you to specify a function that determines the path where uploaded files are stored (see the `Django documentation `_). -This function is passed as an argument to the **FileField** constructor. +This function is passed as an argument to the ``FileField`` constructor. The function is called with two arguments: the instance of the model and the filename of the uploaded file. This filename is what we want to mark as a taint source. An example use looks as follows: @@ -156,7 +159,8 @@ This filename is what we want to mark as a taint source. An example use looks as class MyModel(models.Model): upload = models.FileField(upload_to=user_directory_path) # <-- the 'upload_to' parameter defines our custom function -Note that this source is already known by the CodeQL Python analysis, but for this example, you could use the following data extension: +Note that this source is already recognized by the CodeQL Python analysis, but for this example, you could add a tuple to the +``sourceModel(type, path, kind)`` extensible predicate by updating a data extension file. .. code-block:: yaml @@ -171,18 +175,16 @@ Note that this source is already known by the CodeQL Python analysis, but for th "remote", ] - -- Since we're adding a new taint source, we add a tuple to the **sourceModel** extensible predicate. -- The first column, **"django.db.models.FileField!"**, is a dotted path to the **FileField** class from the **django.db.models** package. - The **!** at the end of the type name indicates that we are looking for the class itself rather than instances of this class. +- The first column, ``"django.db.models.FileField!"``, is a dotted path to the ``FileField`` class from the ``django.db.models`` package. 
+ The ``!`` at the end of the type name indicates that we are looking for the class itself rather than instances of this class. - The second column is an access path that is evaluated from left to right, starting at the values that were identified by the first column. - - **Call** selects calls to the class. That is, constructor calls. - - **Argument[0,upload_to:]** selects the first positional argument, or the named argument named **upload_to**. Note that the colon at the end of the argument name indicates that we are looking for a named argument. - - **Parameter[1]** selects the second parameter of the callback function, which is the parameter receiving the filename. -- Finally, the kind **"remote"** indicates that this is considered a source of remote flow. + - ``Call`` selects calls to the class. That is, constructor calls. + - ``Argument[0,upload_to:]`` selects the first positional argument, or the named argument named ``upload_to``. Note that the colon at the end of the argument name indicates that we are looking for a named argument. + - ``Parameter[1]`` selects the second parameter of the callback function, which is the parameter receiving the filename. + +- Finally, the kind ``"remote"`` indicates that this is considered a source of remote flow. Example: Adding flow through 're.compile' ---------------------------------------------- @@ -196,7 +198,8 @@ In this example, we'll show how to add flow through calls to ``re.compile``. y = re.compile(pattern = x) # add value flow from 'x' to 'y.pattern' -Note that this flow is already recognized by the CodeQL Python analysis, but for this example, you could use the following data extension: +Note that this flow is already recognized by the CodeQL Python analysis, but for this example, you could add a tuple to the +``summaryModel(type, path, input, output, kind)`` extensible predicate by updating a data extension file. ..
code-block:: yaml @@ -213,26 +216,25 @@ Note that this flow is already recognized by the CodeQL Python analysis, but for "value", ] - -- Since we're adding flow through a function call, we add a tuple to the **summaryModel** extensible predicate. -- The first column, **"re"**, begins the search for relevant calls at places where the **re** package is imported. -- The second column, **"Member[compile]"**, is a path leading to the function calls we wish to model. - In this case, we select references to the **compile** function from the ``re`` package. -- The third column, **"Argument[0,pattern:]"**, indicates the input of the flow. In this case, either the first argument to the function call or the argument named **pattern**. -- The fourth column, **"ReturnValue.Attribute[pattern]"**, indicates the output of the flow. In this case, the ``pattern`` attribute of the return value of the function call. -- The last column, **"value"**, indicates the kind of flow to add. The value **value** means the input value is unchanged as +- The first column, ``"re"``, begins the search for relevant calls at places where the ``re`` package is imported. +- The second column, ``"Member[compile]"``, is a path leading to the function calls we wish to model. + In this case, we select references to the ``compile`` function from the ``re`` package. +- The third column, ``"Argument[0,pattern:]"``, indicates the input of the flow. In this case, either the first argument to the function call or the argument named ``pattern``. +- The fourth column, ``"ReturnValue.Attribute[pattern]"``, indicates the output of the flow. In this case, the ``pattern`` attribute of the return value of the function call. +- The last column, ``"value"``, indicates the kind of flow to add. The value ``value`` means the input value is unchanged as it flows to the output. 
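The runtime behavior that this summary describes can be checked with a short standard-library snippet. It is only a sketch to motivate the ``value`` kind; the variable names and the sample pattern are illustrative.

```python
import re

x = r"\d+"                 # the value we want to track
y = re.compile(pattern=x)  # same call shape as in the example above

# The compiled pattern object exposes the original string unchanged, which is
# why the summary uses kind "value" with output "ReturnValue.Attribute[pattern]".
assert y.pattern == x
```

Because the string reaches ``y.pattern`` unmodified, ``value`` is a better fit here than ``taint``.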
Example: Adding flow through 'sorted' ------------------------------------------------- -In this example, we'll show how to add flow through calls to the built-in function **sorted**: +In this example, we'll show how to add flow through calls to the built-in function ``sorted``: .. code-block:: python y = sorted(x) # add taint flow from 'x' to 'y' -Note that this flow is already recognized by the CodeQL Python analysis, but for this example, you could use the following data extension: +Note that this flow is already recognized by the CodeQL Python analysis, but for this example, you could add a tuple to the +``summaryModel(type, path, input, output, kind)`` extensible predicate by updating a data extension file. .. code-block:: yaml @@ -249,14 +251,12 @@ Note that this flow is already recognized by the CodeQL Python analysis, but for "taint", ] - -- Since we're adding flow through a function call, we add a tuple to the **summaryModel** extensible predicate. -- The first column, **"builtins"**, begins the search for relevant calls among references to the built-in names. - In Python, many built-in functions are available. Technically, most of these are part of the **builtins** package, but they can be accessed without an explicit import. When we write **builtins** in the first column, we will find both the implicit and explicit references to the built-in functions. -- The second column, **"Member[sorted]"**, selects references to the **sorted** function from the **builtins** package; that is, the built-in function **sorted**. -- The third column, **"Argument[0]"**, indicates the input of the flow. In this case, the first argument to the function call. -- The fourth column, **"ReturnValue"**, indicates the output of the flow. In this case, the return value of the function call. -- The last column, **"taint"**, indicates the kind of flow to add. 
The value **taint** means the output is not necessarily equal +- The first column, ``"builtins"``, begins the search for relevant calls among references to the built-in names. + In Python, many built-in functions are available. Technically, most of these are part of the ``builtins`` package, but they can be accessed without an explicit import. When we write ``builtins`` in the first column, we will find both the implicit and explicit references to the built-in functions. +- The second column, ``"Member[sorted]"``, selects references to the ``sorted`` function from the ``builtins`` package; that is, the built-in function ``sorted``. +- The third column, ``"Argument[0]"``, indicates the input of the flow. In this case, the first argument to the function call. +- The fourth column, ``"ReturnValue"``, indicates the output of the flow. In this case, the return value of the function call. +- The last column, ``"taint"``, indicates the kind of flow to add. The value ``taint`` means the output is not necessarily equal to the input, but was derived from the input in a taint-preserving way. We might also provide a summary stating that the elements of the input list are preserved in the output list: @@ -279,6 +279,64 @@ We might also provide a summary stating that the elements of the input list are The tracking of list elements is imprecise in that the analysis does not know where in the list the tracked value is found. So this summary simply states that if the value is found somewhere in the input list, it will also be found somewhere in the output list, unchanged. +Example: Taint barrier using the 'escape' function +-------------------------------------------------- + +In this example, we'll show how to add the return value of ``html.escape`` as a barrier for XSS. + +.. code-block:: python + + import html + escaped = html.escape(unknown) # The return value of this function is safe for XSS. 
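As a quick check of why this return value can be treated as a barrier, the sketch below runs ``html.escape`` on a sample payload. The payload string is made up for illustration; the point is only that the characters needed to open a tag or close an attribute do not survive.

```python
import html

# Hypothetical attacker-controlled input, for illustration only.
unknown = '<script>alert("x")</script>'
escaped = html.escape(unknown)

# Angle brackets and double quotes are replaced by HTML entities.
assert "<" not in escaped
assert ">" not in escaped
assert '"' not in escaped
```

Note that this only motivates the ``html-injection`` kind; the escaped string is not automatically safe in other contexts, such as SQL queries or shell commands.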
+ +We need to add a tuple to the ``barrierModel(type, path, kind)`` extensible predicate by updating a data extension file. + +.. code-block:: yaml + + extensions: + - addsTo: + pack: codeql/python-all + extensible: barrierModel + data: + - ["html", "Member[escape].ReturnValue", "html-injection"] + +- The first column, ``"html"``, begins the search at places where the ``html`` module is imported. +- The second column, ``"Member[escape].ReturnValue"``, selects the return value of the ``escape`` function from the ``html`` module. +- The third column, ``"html-injection"``, is the kind of the barrier. + +Example: Add a barrier guard +---------------------------- + +This example shows how to model a barrier guard that stops the flow of taint when a conditional check is performed on data. +A barrier guard model is used when a function returns a boolean that indicates whether the data is safe to use. +Consider the function ``url_has_allowed_host_and_scheme`` from the ``django.utils.http`` module, which returns ``True`` when the URL uses an allowed host and a safe scheme. + +.. code-block:: python + + if url_has_allowed_host_and_scheme(url, allowed_hosts=...): # The check guards the use of 'url', so it is safe. + redirect(url) # This is safe. + +We need to add a tuple to the ``barrierGuardModel(type, path, acceptingValue, kind)`` extensible predicate by updating a data extension file. + +.. code-block:: yaml + + extensions: + - addsTo: + pack: codeql/python-all + extensible: barrierGuardModel + data: + - [ + "django", + "Member[utils].Member[http].Member[url_has_allowed_host_and_scheme].Argument[0,url:]", + "true", + "url-redirection", + ] + +- The first column, ``"django"``, begins the search at places where the ``django`` package is imported.
+- The second column, ``Member[utils].Member[http].Member[url_has_allowed_host_and_scheme].Argument[0,url:]``, selects the first argument (or the keyword argument ``url``) of the ``url_has_allowed_host_and_scheme`` function in the ``django.utils.http`` module. This is the value being validated. +- The third column, ``"true"``, is the accepting value of the barrier guard. This is the value that the conditional check must return for the barrier to apply. +- The fourth column, ``"url-redirection"``, is the kind of the barrier guard. The barrier guard kind is used to define the queries where the barrier guard is in scope. + Reference material ------------------ @@ -292,9 +350,9 @@ sourceModel(type, path, kind) Adds a new taint source. Most taint-tracking queries will use the new source. -- **type**: Name of a type from which to evaluate **path**. -- **path**: Access path leading to the source. -- **kind**: Kind of source to add. Currently only **remote** is used. +- ``type``: Name of a type from which to evaluate ``path``. +- ``path``: Access path leading to the source. +- ``kind``: Kind of source to add. Currently only ``remote`` is used. Example: @@ -312,9 +370,9 @@ sinkModel(type, path, kind) Adds a new taint sink. Sinks are query-specific and will typically affect one or two queries. -- **type**: Name of a type from which to evaluate **path**. -- **path**: Access path leading to the sink. -- **kind**: Kind of sink to add. See the section on sink kinds for a list of supported kinds. +- ``type``: Name of a type from which to evaluate ``path``. +- ``path``: Access path leading to the sink. +- ``kind``: Kind of sink to add. See the section on sink kinds for a list of supported kinds. Example: @@ -332,11 +390,11 @@ summaryModel(type, path, input, output, kind) Adds flow through a function call. -- **type**: Name of a type from which to evaluate **path**. -- **path**: Access path leading to a function call. 
-- **input**: Path relative to the function call that leads to input of the flow. -- **output**: Path relative to the function call leading to the output of the flow. -- **kind**: Kind of summary to add. Can be **taint** for taint-propagating flow, or **value** for value-preserving flow. +- ``type``: Name of a type from which to evaluate ``path``. +- ``path``: Access path leading to a function call. +- ``input``: Path relative to the function call that leads to input of the flow. +- ``output``: Path relative to the function call leading to the output of the flow. +- ``kind``: Kind of summary to add. Can be ``taint`` for taint-propagating flow, or ``value`` for value-preserving flow. Example: @@ -358,13 +416,13 @@ Example: typeModel(type1, type2, path) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -A description of how to reach **type1** from **type2**. -If this is the only way to reach **type1**, for instance if **type1** is a name we made up to represent the inner workings of a library, we think of this as a definition of **type1**. -In the context of instances, this describes how to obtain an instance of **type1** from an instance of **type2**. +A description of how to reach ``type1`` from ``type2``. +If this is the only way to reach ``type1``, for instance if ``type1`` is a name we made up to represent the inner workings of a library, we think of this as a definition of ``type1``. +In the context of instances, this describes how to obtain an instance of ``type1`` from an instance of ``type2``. -- **type1**: Name of the type to reach. -- **type2**: Name of the type from which to evaluate **path**. -- **path**: Access path leading from **type2** to **type1**. +- ``type1``: Name of the type to reach. +- ``type2``: Name of the type from which to evaluate ``path``. +- ``path``: Access path leading from ``type2`` to ``type1``. Example: @@ -386,40 +444,40 @@ Types A type is a string that identifies a set of values. 
In each of the extensible predicates mentioned in previous section, the first column is always the name of a type. -A type can be defined by adding **typeModel** tuples for that type. Additionally, the following built-in types are available: +A type can be defined by adding ``typeModel`` tuples for that type. Additionally, the following built-in types are available: -- The name of a package matches imports of that package. For example, the type **django** matches the expression **import django**. -- The type **builtins** identifies the builtins package. In Python, all built-in values are found in this package, so they can be identified using this type. -- A dotted path ending in a class name identifies instances of that class. If the suffix **!** is added, the type refers to the class itself. +- The name of a package matches imports of that package. For example, the type ``django`` matches the expression ``import django``. +- The type ``builtins`` identifies the builtins package. In Python, all built-in values are found in this package, so they can be identified using this type. +- A dotted path ending in a class name identifies instances of that class. If the suffix ``!`` is added, the type refers to the class itself. Access paths ------------ -The **path**, **input**, and **output** columns consist of a **.**-separated list of components, which is evaluated from left to right, with each step selecting a new set of values derived from the previous set of values. +The ``path``, ``input``, and ``output`` columns consist of a ``.``-separated list of components, which is evaluated from left to right, with each step selecting a new set of values derived from the previous set of values. The following components are supported: -- **Argument[**\ ``number``\ **]** selects the argument at the given index. -- **Argument[**\ ``name``:\ **]** selects the argument with the given name. -- **Argument[this]** selects the receiver of a method call. 
-- **Parameter[**\ ``number``\ **]** selects the parameter at the given index. -- **Parameter[**\ ``name``:\ **]** selects the named parameter with the given name. -- **Parameter[this]** selects the **this** parameter of a function. -- **ReturnValue** selects the return value of a function or call. -- **Member[**\ ``name``\ **]** selects the function/method/class/value with the given name. -- **Instance** selects instances of a class, including instances of its subclasses. -- **Attribute[**\ ``name``\ **]** selects the attribute with the given name. -- **ListElement** selects an element of a list. -- **SetElement** selects an element of a set. -- **TupleElement[**\ ``number``\ **]** selects the subscript at the given index. -- **DictionaryElement[**\ ``name``\ **]** selects the subscript at the given name. +- ``Argument[``\ ``number``\ ``]`` selects the argument at the given index. +- ``Argument[``\ ``name``:\ ``]`` selects the argument with the given name. +- ``Argument[this]`` selects the receiver of a method call. +- ``Parameter[``\ ``number``\ ``]`` selects the parameter at the given index. +- ``Parameter[``\ ``name``:\ ``]`` selects the named parameter with the given name. +- ``Parameter[this]`` selects the ``this`` parameter of a function. +- ``ReturnValue`` selects the return value of a function or call. +- ``Member[``\ ``name``\ ``]`` selects the function/method/class/value with the given name. +- ``Instance`` selects instances of a class, including instances of its subclasses. +- ``Attribute[``\ ``name``\ ``]`` selects the attribute with the given name. +- ``ListElement`` selects an element of a list. +- ``SetElement`` selects an element of a set. +- ``TupleElement[``\ ``number``\ ``]`` selects the subscript at the given index. +- ``DictionaryElement[``\ ``name``\ ``]`` selects the subscript at the given name. 
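To make the component structure concrete, here is a minimal sketch of how an access path string breaks down into components and operands. The helper ``parse_access_path`` is hypothetical and is not CodeQL's own parser; it only mirrors the ``.``-separated, bracketed-operand shape described above.

```python
import re

# Hypothetical helper, not part of CodeQL: splits an access path into
# (component, operands) pairs purely for illustration.
_COMPONENT = re.compile(r"([A-Za-z]+)(?:\[([^\]]*)\])?")


def parse_access_path(path):
    """Split e.g. 'Member[run].Argument[0]' into [('Member', ['run']), ...]."""
    components = []
    pos = 0
    while pos < len(path):
        match = _COMPONENT.match(path, pos)
        if match is None:
            raise ValueError(f"cannot parse access path at offset {pos}: {path!r}")
        name, operands = match.group(1), match.group(2)
        # A component without brackets (such as ReturnValue) has no operands.
        components.append((name, operands.split(",") if operands is not None else []))
        pos = match.end()
        if pos < len(path):
            if path[pos] != ".":
                raise ValueError(f"expected '.' at offset {pos}: {path!r}")
            pos += 1  # skip the separator between components
    return components


print(parse_access_path("Member[compile].ReturnValue.Attribute[pattern]"))
```

Reading the result from left to right matches the evaluation order: each pair narrows or transforms the set of values selected by the previous one.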
Additional notes about the syntax of operands: -- Multiple operands may be given to a single component, as a shorthand for the union of the operands. For example, **Member[foo,bar]** matches the union of **Member[foo]** and **Member[bar]**. -- Numeric operands to **Argument**, **Parameter**, and **WithArity** may be given as an interval. For example, **Argument[0..2]** matches argument 0, 1, or 2. -- **Argument[N-1]** selects the last argument of a call, and **Parameter[N-1]** selects the last parameter of a function, with **N-2** being the second-to-last and so on. +- Multiple operands may be given to a single component, as a shorthand for the union of the operands. For example, ``Member[foo,bar]`` matches the union of ``Member[foo]`` and ``Member[bar]``. +- Numeric operands to ``Argument``, ``Parameter``, and ``WithArity`` may be given as an interval. For example, ``Argument[0..2]`` matches argument 0, 1, or 2. +- ``Argument[N-1]`` selects the last argument of a call, and ``Parameter[N-1]`` selects the last parameter of a function, with ``N-2`` being the second-to-last and so on. Kinds ----- @@ -434,21 +492,21 @@ Sink kinds Unlike sources, sinks tend to be highly query-specific, rarely affecting more than one or two queries. Not every query supports customizable sinks. If the following sinks are not suitable for your use case, you should add a new query. -- **code-injection**: A sink that can be used to inject code, such as in calls to **exec**. -- **command-injection**: A sink that can be used to inject shell commands, such as in calls to **os.system**. -- **path-injection**: A sink that can be used for path injection in a file system access, such as in calls to **flask.send_from_directory**. -- **sql-injection**: A sink that can be used for SQL injection, such as in a MySQL **query** call. -- **html-injection**: A sink that can be used for HTML injection, such as a server response body. 
-- **js-injection**: A sink that can be used for JS injection, such as a server response body. -- **url-redirection**: A sink that can be used to redirect the user to a malicious URL. -- **unsafe-deserialization**: A deserialization sink that can lead to code execution or other unsafe behavior, such as an unsafe YAML parser. -- **log-injection**: A sink that can be used for log injection, such as in a **logging.info** call. +- ``code-injection``: A sink that can be used to inject code, such as in calls to ``exec``. +- ``command-injection``: A sink that can be used to inject shell commands, such as in calls to ``os.system``. +- ``path-injection``: A sink that can be used for path injection in a file system access, such as in calls to ``flask.send_from_directory``. +- ``sql-injection``: A sink that can be used for SQL injection, such as in a MySQL ``query`` call. +- ``html-injection``: A sink that can be used for HTML injection, such as a server response body. +- ``js-injection``: A sink that can be used for JS injection, such as a server response body. +- ``url-redirection``: A sink that can be used to redirect the user to a malicious URL. +- ``unsafe-deserialization``: A deserialization sink that can lead to code execution or other unsafe behavior, such as an unsafe YAML parser. +- ``log-injection``: A sink that can be used for log injection, such as in a ``logging.info`` call. Summary kinds ~~~~~~~~~~~~~ -- **taint**: A summary that propagates taint. This means the output is not necessarily equal to the input, but it was derived from the input in an unrestrictive way. An attacker who controls the input will have significant control over the output as well. -- **value**: A summary that preserves the value of the input or creates a copy of the input such that all of its object properties are preserved. +- ``taint``: A summary that propagates taint. This means the output is not necessarily equal to the input, but it was derived from the input in an unrestrictive way. 
An attacker who controls the input will have significant control over the output as well. +- ``value``: A summary that preserves the value of the input or creates a copy of the input such that all of its object properties are preserved. .. _threat-models-python: diff --git a/docs/codeql/codeql-language-guides/customizing-library-models-for-ruby.rst b/docs/codeql/codeql-language-guides/customizing-library-models-for-ruby.rst index 23a6bd419f5d..db041a521514 100644 --- a/docs/codeql/codeql-language-guides/customizing-library-models-for-ruby.rst +++ b/docs/codeql/codeql-language-guides/customizing-library-models-for-ruby.rst @@ -23,24 +23,27 @@ A data extension for Ruby is a YAML file of the form: The CodeQL library for Ruby exposes the following extensible predicates: -- **sourceModel**\(type, path, kind) -- **sinkModel**\(type, path, kind) -- **typeModel**\(type1, type2, path) -- **summaryModel**\(type, path, input, output, kind) +- ``sourceModel(type, path, kind)`` +- ``sinkModel(type, path, kind)`` +- ``typeModel(type1, type2, path)`` +- ``summaryModel(type, path, input, output, kind)`` +- ``barrierModel(type, path, kind)`` +- ``barrierGuardModel(type, path, acceptingValue, kind)`` We'll explain how to use these using a few examples, and provide some reference material at the end of this article. Example: Taint sink in the 'tty-command' gem -------------------------------------------- -In this example, we'll show how to add the following argument, passed to **tty-command**, as a command-line injection sink: +In this example, we'll show how to add the following argument, passed to ``tty-command``, as a command-line injection sink: .. code-block:: ruby tty = TTY::Command.new tty.run(cmd) # <-- add 'cmd' as a taint sink -For this example, you can use the following data extension: + +We need to add a tuple to the ``sinkModel(type, path, kind)`` extensible predicate by updating a data extension file. .. 
code-block:: yaml @@ -52,15 +55,14 @@ For this example, you can use the following data extension: - ["TTY::Command", "Method[run].Argument[0]", "command-injection"] -- Since we're adding a new sink, we add a tuple to the **sinkModel** extensible predicate. -- The first column, **"TTY::Command"**, identifies a set of values from which to begin the search for the sink. - The string **"TTY::Command""** means we start at the places where the codebase constructs instances of the class **TTY::Command**. +- The first column, ``"TTY::Command"``, identifies a set of values from which to begin the search for the sink. + The string ``"TTY::Command"`` means we start at the places where the codebase constructs instances of the class ``TTY::Command``. - The second column is an access path that is evaluated from left to right, starting at the values that were identified by the first column. - - **Method[run]** selects calls to the **run** method of the **TTY::Command** class. - - **Argument[0]** selects the first argument to calls to that member. + - ``Method[run]`` selects calls to the ``run`` method of the ``TTY::Command`` class. + - ``Argument[0]`` selects the first argument to calls to that method. -- **command-injection** indicates that this is considered a sink for the command injection query. +- ``command-injection`` indicates that this is considered a sink for the command injection query. Example: Taint sources from 'sinatra' block parameters ------------------------------------------------------ @@ -75,7 +77,7 @@ In this example, we'll show how the 'x' parameter below could be marked as a rem end end -For this example you could use the following data extension: +We need to add a tuple to the ``sourceModel(type, path, kind)`` extensible predicate by updating a data extension file. ..
code-block:: yaml @@ -90,13 +92,12 @@ For this example you could use the following data extension: "remote", ] -- Since we're adding a new taint source, we add a tuple to the **sourceModel** extensible predicate. -- The first column, **"Sinatra::Base!"**, begins the search at references to the **Sinatra::Base** class. - The **!** suffix indicates that we want to search for references to the class itself, rather than instances of the class. -- **Method[get]** selects calls to the **get** method of the **Sinatra::Base** class. -- **Argument[block]** selects the block argument to the **get** method call. -- **Parameter[0]** selects the first parameter of the block argument (the parameter named **x**). -- Finally, the kind **remote** indicates that this is considered a source of remote flow. +- The first column, ``"Sinatra::Base!"``, begins the search at references to the ``Sinatra::Base`` class. + The ``!`` suffix indicates that we want to search for references to the class itself, rather than instances of the class. +- ``Method[get]`` selects calls to the ``get`` method of the ``Sinatra::Base`` class. +- ``Argument[block]`` selects the block argument to the ``get`` method call. +- ``Parameter[0]`` selects the first parameter of the block argument (the parameter named ``x``). +- Finally, the kind ``remote`` indicates that this is considered a source of remote flow. Example: Using types to add MySQL injection sinks ------------------------------------------------- @@ -110,7 +111,7 @@ In this example, we'll show how to add the following SQL injection sink: client.query(q) # <-- add 'q' as a SQL injection sink end -We can recognize this using the following extension: +We need to add a tuple to the ``sinkModel(type, path, kind)`` extensible predicate by updating a data extension file. .. 
code-block:: yaml @@ -121,16 +122,16 @@ We can recognize this using the following extension: data: - ["Mysql2::Client", "Method[query].Argument[0]", "sql-injection"] -- The first column, **"Mysql2::Client"**, begins the search at any instance of the **Mysql2::Client** class. -- **Method[query]** selects any call to the **query** method on that instance. -- **Argument[0]** selects the first argument to the method call. -- **sql-injection** indicates that this is considered a sink for the SQL injection query. +- The first column, ``"Mysql2::Client"``, begins the search at any instance of the ``Mysql2::Client`` class. +- ``Method[query]`` selects any call to the ``query`` method on that instance. +- ``Argument[0]`` selects the first argument to the method call. +- ``sql-injection`` indicates that this is considered a sink for the SQL injection query. Continued example: Using type models ------------------------------------ Consider this variation on the previous example, the mysql2 EventMachine API is used. -The client is obtained via a call to **Mysql2::EM::Client.new**. +The client is obtained via a call to ``Mysql2::EM::Client.new``. .. code-block:: ruby @@ -139,10 +140,10 @@ The client is obtained via a call to **Mysql2::EM::Client.new**. client.query(q) end -So far we have only one model for **Mysql2::Client**, but in the real world we -may have many models for the various methods available. Because **Mysql2::EM::Client** is a subclass of **Mysql2::Client**, it inherits all of the same methods. -Instead of updating all our models to include both classes, we can add a type -model to indicate that **Mysql2::EM::Client** is a subclass of **Mysql2::Client**: +So far we have only one model for ``Mysql2::Client``, but in the real world we +may have many models for the various methods available. Because ``Mysql2::EM::Client`` is a subclass of ``Mysql2::Client``, it inherits all of the same methods. 
+Instead of updating all our models to include both classes, we can add a tuple to the ``typeModel(type1, type2, path)`` extensible predicate to indicate that +``Mysql2::EM::Client`` is a subclass of ``Mysql2::Client``: .. code-block:: yaml @@ -162,7 +163,7 @@ In this example, we'll show how to add flow through calls to 'URI.decode_uri_com y = URI.decode_uri_component(x); # add taint flow from 'x' to 'y' -We can model this using the following data extension: +We need to add a tuple to the ``summaryModel(type, path, input, output, kind)`` extensible predicate by updating a data extension file. .. code-block:: yaml @@ -179,28 +180,26 @@ We can model this using the following data extension: "taint", ] - -- Since we're adding flow through a method call, we add a tuple to the **summaryModel** extensible predicate. -- The first column, **"URI!"**, begins the search for relevant calls at references to the **URI** class. -- The **!** suffix indicates that we are looking for the class itself, rather than instances of the class. -- The second column, **Method[decode_uri_component]**, is a path leading to the method calls we wish to model. - In this case, we select references to the **decode_uri_component** method from the **URI** class. -- The third column, **Argument[0]**, indicates the input of the flow. In this case, the first argument to the method call. -- The fourth column, **ReturnValue**, indicates the output of the flow. In this case, the return value of the method call. -- The last column, **taint**, indicates the kind of flow to add. The value **taint** means the output is not necessarily equal +- The first column, ``"URI!"``, begins the search for relevant calls at references to the ``URI`` class. + The ``!`` suffix indicates that we are looking for the class itself, rather than instances of the class. +- The second column, ``Method[decode_uri_component]``, is a path leading to the method calls we wish to model.
+ In this case, we select references to the ``decode_uri_component`` method from the ``URI`` class. +- The third column, ``Argument[0]``, indicates the input of the flow. In this case, the first argument to the method call. +- The fourth column, ``ReturnValue``, indicates the output of the flow. In this case, the return value of the method call. +- The last column, ``taint``, indicates the kind of flow to add. The value ``taint`` means the output is not necessarily equal to the input, but was derived from the input in a taint-preserving way. Example: Adding flow through 'File#each' ---------------------------------------- -In this example, we'll show how to add flow through calls to **File#each** from the standard library, which iterates over the lines of a file: +In this example, we'll show how to add flow through calls to ``File#each`` from the standard library, which iterates over the lines of a file: .. code-block:: ruby f = File.new("example.txt") f.each { |line| ... } # add taint flow from `f` to `line` -We can model this using the following data extension: +We need to add a tuple to the ``summaryModel(type, path, input, output, kind)`` extensible predicate by updating a data extension file. .. code-block:: yaml @@ -217,18 +216,73 @@ We can model this using the following data extension: "taint", ] - -- Since we're adding flow through a method call, we add a tuple to the **summaryModel** extensible predicate. -- The first column, **"File"**, begins the search for relevant calls at places where the **File** class is used. -- The second column, **Method[each]**, selects references to the **each** method on the **File** class. -- The third column specifies the input of the flow. **Argument[self]** selects the **self** argument of **each**, which is the **File** instance being iterated over. +- The first column, ``"File"``, begins the search for relevant calls at places where the ``File`` class is used. 
+- The second column, ``Method[each]``, selects references to the ``each`` method on the ``File`` class. +- The third column specifies the input of the flow. ``Argument[self]`` selects the ``self`` argument of ``each``, which is the ``File`` instance being iterated over. - The fourth column specifies the output of the flow: - - **Argument[block]** selects the block argument of **each** (the block which is executed for each line in the file). - - **Parameter[0]** selects the first parameter of the block (the parameter named **line**). + - ``Argument[block]`` selects the block argument of ``each`` (the block which is executed for each line in the file). + - ``Parameter[0]`` selects the first parameter of the block (the parameter named ``line``). + +- The last column, ``taint``, indicates the kind of flow to add. + +Example: Taint barrier using the 'escape' method +------------------------------------------------ + +In this example, we'll show how to add the return value of ``Mysql2::Client#escape`` as a barrier for SQL injection. + +.. code-block:: ruby + + client = Mysql2::Client.new + escaped = client.escape(input) # The return value of this method is safe for SQL injection. + client.query("SELECT * FROM users WHERE name = '#{escaped}'") + +We need to add a tuple to the ``barrierModel(type, path, kind)`` extensible predicate by updating a data extension file. + +.. code-block:: yaml + + extensions: + - addsTo: + pack: codeql/ruby-all + extensible: barrierModel + data: + - ["Mysql2::Client!", "Method[escape].ReturnValue", "sql-injection"] + +- The first column, ``"Mysql2::Client!"``, begins the search for relevant calls at references to the ``Mysql2::Client`` class. + The ``!`` suffix indicates that we want to search for references to the class itself, rather than instances of the class. +- The second column, ``"Method[escape].ReturnValue"``, selects the return value of the ``escape`` method. +- The third column, ``"sql-injection"``, is the kind of the barrier. 
+ +Example: Add a barrier guard +---------------------------- + +This example shows how to model a barrier guard that stops the flow of taint when a conditional check is performed on data. +Consider a validation method ``Validator.is_safe`` which returns ``true`` when the data is considered safe. + +.. code-block:: ruby + + if Validator.is_safe(user_input) + # The check guards the use, so the input is safe. + client.query("SELECT * FROM users WHERE name = '#{user_input}'") + end + +We need to add a tuple to the ``barrierGuardModel(type, path, acceptingValue, kind)`` extensible predicate by updating a data extension file. + +.. code-block:: yaml + + extensions: + - addsTo: + pack: codeql/ruby-all + extensible: barrierGuardModel + data: + - ["Validator!", "Method[is_safe].Argument[0]", "true", "sql-injection"] -- The last column, **taint**, indicates the kind of flow to add. +- The first column, ``"Validator!"``, begins the search at references to the ``Validator`` class. + The ``!`` suffix indicates that we want to search for references to the class itself, rather than instances of the class. +- The second column, ``"Method[is_safe].Argument[0]"``, selects the first argument of the ``is_safe`` method. This is the value being validated. +- The third column, ``"true"``, is the accepting value of the barrier guard. This is the value that the conditional check must return for the barrier to apply. +- The fourth column, ``"sql-injection"``, is the kind of the barrier guard. Reference material ------------------ @@ -243,9 +297,9 @@ sourceModel(type, path, kind) Adds a new taint source. Most taint-tracking queries will use the new source. -- **type**: Name of a type from which to evaluate **path**. -- **path**: Access path leading to the source. -- **kind**: Kind of source to add. Currently only **remote** is used. +- ``type``: Name of a type from which to evaluate ``path``. +- ``path``: Access path leading to the source. +- ``kind``: Kind of source to add. 
Currently only ``remote`` is used. Example: @@ -263,9 +317,9 @@ sinkModel(type, path, kind) Adds a new taint sink. Sinks are query-specific and will typically affect one or two queries. -- **type**: Name of a type from which to evaluate **path**. -- **path**: Access path leading to the sink. -- **kind**: Kind of sink to add. See the section on sink kinds for a list of supported kinds. +- ``type``: Name of a type from which to evaluate ``path``. +- ``path``: Access path leading to the sink. +- ``kind``: Kind of sink to add. See the section on sink kinds for a list of supported kinds. Example: @@ -283,11 +337,11 @@ summaryModel(type, path, input, output, kind) Adds flow through a method call. -- **type**: Name of a type from which to evaluate **path**. -- **path**: Access path leading to a method call. -- **input**: Path relative to the method call that leads to input of the flow. -- **output**: Path relative to the method call leading to the output of the flow. -- **kind**: Kind of summary to add. Can be **taint** for taint-propagating flow, or **value** for value-preserving flow. +- ``type``: Name of a type from which to evaluate ``path``. +- ``path``: Access path leading to a method call. +- ``input``: Path relative to the method call that leads to input of the flow. +- ``output``: Path relative to the method call leading to the output of the flow. +- ``kind``: Kind of summary to add. Can be ``taint`` for taint-propagating flow, or ``value`` for value-preserving flow. Example: @@ -311,9 +365,9 @@ typeModel(type1, type2, path) Adds a new definition of a type. -- **type1**: Name of the type to define. -- **type2**: Name of the type from which to evaluate **path**. -- **path**: Access path leading from **type2** to **type1**. +- ``type1``: Name of the type to define. +- ``type2``: Name of the type from which to evaluate ``path``. +- ``path``: Access path leading from ``type2`` to ``type1``. 
Example: @@ -335,44 +389,44 @@ Types A type is a string that identifies a set of values. In each of the extensible predicates mentioned in previous section, the first column is always the name of a type. -A type can be defined by adding **typeModel** tuples for that type. +A type can be defined by adding ``typeModel`` tuples for that type. Access paths ------------ -The **path**, **input**, and **output** columns consist of a **.**-separated list of components, which is evaluated from left to right, +The ``path``, ``input``, and ``output`` columns consist of a ``.``-separated list of components, which is evaluated from left to right, with each step selecting a new set of values derived from the previous set of values. The following components are supported: -- **Argument[**\ `number`\ **]** selects the argument at the given index. -- **Argument[**\ `string`:\ **]** selects the keyword argument with the given name. -- **Argument[self]** selects the receiver of a method call. -- **Argument[block]** selects the block argument. -- **Argument[any]** selects any argument, except self or block arguments. -- **Argument[any-named]** selects any keyword argument. -- **Argument[hash-splat]** selects a special argument representing all keyword arguments passed in the method call. -- **Parameter[**\ `number`\ **]** selects the argument at the given index. -- **Parameter[**\ `string`:\ **]** selects the keyword argument with the given name. -- **Parameter[self]** selects the **self** parameter of a method. -- **Parameter[block]** selects the block parameter. -- **Parameter[any]** selects any parameter, except self or block parameters. -- **Parameter[any-named]** selects any keyword parameter. -- **Parameter[hash-splat]** selects the hash splat parameter, often written as **\*\*kwargs**. -- **ReturnValue** selects the return value of a call. -- **Method[**\ `name`\ **]** selects a call to the method with the given name. -- **Element[any]** selects any element of an array or hash. 
-- **Element[**\ `number`\ **]** selects an array element at the given index. -- **Element[**\ `string`\ **]** selects a hash element at the given key. -- **Field[@**\ `string`\ **]** selects an instance variable with the given name. -- **Fuzzy** selects all values that are derived from the current value through a combination of the other operations described in this list. +- ``Argument[``\ `number`\ ``]`` selects the argument at the given index. +- ``Argument[``\ `string`:\ ``]`` selects the keyword argument with the given name. +- ``Argument[self]`` selects the receiver of a method call. +- ``Argument[block]`` selects the block argument. +- ``Argument[any]`` selects any argument, except self or block arguments. +- ``Argument[any-named]`` selects any keyword argument. +- ``Argument[hash-splat]`` selects a special argument representing all keyword arguments passed in the method call. +- ``Parameter[``\ `number`\ ``]`` selects the parameter at the given index. +- ``Parameter[``\ `string`:\ ``]`` selects the keyword parameter with the given name. +- ``Parameter[self]`` selects the ``self`` parameter of a method. +- ``Parameter[block]`` selects the block parameter. +- ``Parameter[any]`` selects any parameter, except self or block parameters. +- ``Parameter[any-named]`` selects any keyword parameter. +- ``Parameter[hash-splat]`` selects the hash splat parameter, often written as ``**kwargs``. +- ``ReturnValue`` selects the return value of a call. +- ``Method[``\ `name`\ ``]`` selects a call to the method with the given name. +- ``Element[any]`` selects any element of an array or hash. +- ``Element[``\ `number`\ ``]`` selects an array element at the given index. +- ``Element[``\ `string`\ ``]`` selects a hash element at the given key. +- ``Field[@``\ `string`\ ``]`` selects an instance variable with the given name. +- ``Fuzzy`` selects all values that are derived from the current value through a combination of the other operations described in this list.
For example, this can be used to find all values that appear to originate from a particular class. This can be useful for finding method calls from a known class, but where the receiver type is not known or is difficult to model. Additional notes about the syntax of operands: -- Multiple operands may be given to a single component, as a shorthand for the union of the operands. For example, **Method[foo,bar]** matches the union of **Method[foo]** and **Method[bar]**. -- Numeric operands to **Argument**, **Parameter**, and **Element** may be given as a lower bound. For example, **Argument[1..]** matches all arguments except 0. +- Multiple operands may be given to a single component, as a shorthand for the union of the operands. For example, ``Method[foo,bar]`` matches the union of ``Method[foo]`` and ``Method[bar]``. +- Numeric operands to ``Argument``, ``Parameter``, and ``Element`` may be given as a lower bound. For example, ``Argument[1..]`` matches all arguments except 0. Kinds ----- @@ -380,7 +434,7 @@ Kinds Source kinds ~~~~~~~~~~~~ -- **remote**: A generic source of remote flow. Most taint-tracking queries will use such a source. Currently this is the only supported source kind. +- ``remote``: A generic source of remote flow. Most taint-tracking queries will use such a source. Currently this is the only supported source kind. Sink kinds ~~~~~~~~~~ @@ -388,15 +442,15 @@ Sink kinds Unlike sources, sinks tend to be highly query-specific, rarely affecting more than one or two queries. Not every query supports customizable sinks. If the following sinks are not suitable for your use case, you should add a new query. -- **code-injection**: A sink that can be used to inject code, such as in calls to **eval**. -- **command-injection**: A sink that can be used to inject shell commands, such as in calls to **Process.spawn**. -- **path-injection**: A sink that can be used for path injection in a file system access, such as in calls to **File.open**. 
-- **sql-injection**: A sink that can be used for SQL injection, such as in an ActiveRecord **where** call. -- **url-redirection**: A sink that can be used to redirect the user to a malicious URL. -- **log-injection**: A sink that can be used for log injection, such as in a **Rails.logger** call. +- ``code-injection``: A sink that can be used to inject code, such as in calls to ``eval``. +- ``command-injection``: A sink that can be used to inject shell commands, such as in calls to ``Process.spawn``. +- ``path-injection``: A sink that can be used for path injection in a file system access, such as in calls to ``File.open``. +- ``sql-injection``: A sink that can be used for SQL injection, such as in an ActiveRecord ``where`` call. +- ``url-redirection``: A sink that can be used to redirect the user to a malicious URL. +- ``log-injection``: A sink that can be used for log injection, such as in a ``Rails.logger`` call. Summary kinds ~~~~~~~~~~~~~ -- **taint**: A summary that propagates taint. This means the output is not necessarily equal to the input, but it was derived from the input in an unrestrictive way. An attacker who controls the input will have significant control over the output as well. -- **value**: A summary that preserves the value of the input or creates a copy of the input such that all of its object properties are preserved. +- ``taint``: A summary that propagates taint. This means the output is not necessarily equal to the input, but it was derived from the input in an unrestrictive way. An attacker who controls the input will have significant control over the output as well. +- ``value``: A summary that preserves the value of the input or creates a copy of the input such that all of its object properties are preserved. 
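Reviewer note on the access-path syntax documented in the Ruby guide above: a path such as ``Method[escape].ReturnValue`` is a ``.``-separated list of components, each with optional bracketed operands. The sketch below is only a toy tokenizer to make that structure concrete. It is not how CodeQL parses access paths, and ``parse_access_path`` and ``COMPONENT`` are hypothetical names introduced here for illustration.

```ruby
# Illustrative only: a toy tokenizer for the access-path syntax described in
# the guide. This is NOT CodeQL's own parser; `parse_access_path` and
# `COMPONENT` are hypothetical names used for this sketch.

# A component is a name, optionally followed by bracketed operands.
# Matching the brackets as a unit keeps operands such as `1..` intact,
# which a naive split on "." would break apart.
COMPONENT = /([A-Za-z]+)(?:\[([^\]]*)\])?/

def parse_access_path(path)
  path.scan(COMPONENT).map do |name, operands|
    # Comma-separated operands are shorthand for a union of operands.
    { name: name, operands: operands ? operands.split(",") : [] }
  end
end

# Print the components of the path used in the barrier example.
parse_access_path("Method[escape].ReturnValue").each do |component|
  puts "#{component[:name]} #{component[:operands].inspect}"
end
```

Reading the components left to right mirrors how the guide describes evaluation: each step selects a new set of values derived from the previous set.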
diff --git a/docs/ql-libraries/dataflow/dataflow.md b/docs/ql-libraries/dataflow/dataflow.md index ff7c71abfaec..7227f7cfd0fd 100644 --- a/docs/ql-libraries/dataflow/dataflow.md +++ b/docs/ql-libraries/dataflow/dataflow.md @@ -279,7 +279,7 @@ Content ContentSet::getAReadContent(); which means that a `storeStep(n1, cs, n2)` will be interpreted as storing into _any_ of `cs.getAStoreContent()`, and dually that a `readStep(n1, cs, n2)` will be interpreted as reading from _any_ of `cs.getAReadContent()`. In most cases, there -will be a simple bijection between `ContentSet` and `Content`, but when modelling +will be a simple bijection between `ContentSet` and `Content`, but when modeling for example flow through arrays it can be more involved (see [Example 4](#example-4)). It generally makes sense for stores to target `PostUpdateNode`s, but this is not a strict diff --git a/go/ql/lib/change-notes/2026-03-20-data-extensions-barriers.md b/go/ql/lib/change-notes/2026-03-20-data-extensions-barriers.md new file mode 100644 index 000000000000..ee1b51de861f --- /dev/null +++ b/go/ql/lib/change-notes/2026-03-20-data-extensions-barriers.md @@ -0,0 +1,4 @@ +--- +category: feature +--- +* Data flow barriers and barrier guards can now be added using data extensions. For more information see [Customizing library models for Go](https://codeql.github.com/docs/codeql-language-guides/customizing-library-models-for-go/). diff --git a/java/ql/lib/change-notes/2026-03-20-data-extensions-barriers.md b/java/ql/lib/change-notes/2026-03-20-data-extensions-barriers.md new file mode 100644 index 000000000000..f8bcbb1fcb2a --- /dev/null +++ b/java/ql/lib/change-notes/2026-03-20-data-extensions-barriers.md @@ -0,0 +1,4 @@ +--- +category: feature +--- +* Data flow barriers and barrier guards can now be added using data extensions. 
For more information see [Customizing library models for Java and Kotlin](https://codeql.github.com/docs/codeql-language-guides/customizing-library-models-for-java-and-kotlin/). diff --git a/javascript/ql/lib/change-notes/2026-03-20-data-extensions-barriers.md b/javascript/ql/lib/change-notes/2026-03-20-data-extensions-barriers.md new file mode 100644 index 000000000000..d849f4c0c698 --- /dev/null +++ b/javascript/ql/lib/change-notes/2026-03-20-data-extensions-barriers.md @@ -0,0 +1,4 @@ +--- +category: feature +--- +* Data flow barriers and barrier guards can now be added using data extensions. For more information see [Customizing library models for JavaScript](https://codeql.github.com/docs/codeql-language-guides/customizing-library-models-for-javascript/). diff --git a/python/ql/lib/change-notes/2026-03-20-data-extensions-barriers.md b/python/ql/lib/change-notes/2026-03-20-data-extensions-barriers.md new file mode 100644 index 000000000000..522801a0e46d --- /dev/null +++ b/python/ql/lib/change-notes/2026-03-20-data-extensions-barriers.md @@ -0,0 +1,4 @@ +--- +category: feature +--- +* Data flow barriers and barrier guards can now be added using data extensions. For more information see [Customizing library models for Python](https://codeql.github.com/docs/codeql-language-guides/customizing-library-models-for-python/). diff --git a/ruby/ql/docs/flow_summaries.md b/ruby/ql/docs/flow_summaries.md index bb5fe5d71787..e588bdcaf26b 100644 --- a/ruby/ql/docs/flow_summaries.md +++ b/ruby/ql/docs/flow_summaries.md @@ -39,7 +39,7 @@ If `preservesValue = true` then value flow is propagated. If it is `false` then only taint flow is propagated. Any call to `chomp` in the database will be translated, in the dataflow graph, -to a call to this fake definition. +to a call to this fake definition. `input` and `output` define the "from" and "to" locations in the flow summary. 
They use a custom string-based syntax which is similar to that used in `path` @@ -232,7 +232,7 @@ preceding access path. It takes the same specifiers as `WithElement` and `Element`. It is only valid in an input path. This component has the effect of excluding the relevant elements when copying -from input to output. It is useful for modelling methods that remove elements +from input to output. It is useful for modeling methods that remove elements from a collection. For example to model a method that removes the first element from the receiver, we can do so like this: diff --git a/ruby/ql/lib/change-notes/2026-03-20-data-extensions-barriers.md b/ruby/ql/lib/change-notes/2026-03-20-data-extensions-barriers.md new file mode 100644 index 000000000000..da53d584e11d --- /dev/null +++ b/ruby/ql/lib/change-notes/2026-03-20-data-extensions-barriers.md @@ -0,0 +1,4 @@ +--- +category: feature +--- +* Data flow barriers and barrier guards can now be added using data extensions. For more information see [Customizing library models for Ruby](https://codeql.github.com/docs/codeql-language-guides/customizing-library-models-for-ruby/). diff --git a/rust/ql/lib/change-notes/2026-03-20-data-extensions-barriers.md b/rust/ql/lib/change-notes/2026-03-20-data-extensions-barriers.md new file mode 100644 index 000000000000..5e97a1533a9e --- /dev/null +++ b/rust/ql/lib/change-notes/2026-03-20-data-extensions-barriers.md @@ -0,0 +1,4 @@ +--- +category: feature +--- +* Data flow barriers and barrier guards can now be added using data extensions.