Pipeline Rules

Rules are the cornerstone of processing pipelines. They contain the logic about how to change, enrich, route, and drop messages.

To avoid the complexities of a complete programming language, Graylog supports a small rule language to express processing logic. The rule language is intentionally limited to allow for easier understanding, faster learning, and better runtime optimization.

The real work of rules is done in functions, which are completely pluggable. Graylog already ships with a great number of built-in functions, providing data conversion, string manipulation, data retrieval using lookup tables, JSON parsing, and much more.

Often special purpose functions are written and shared by the community, enabling faster innovation and problem solving than previously possible.

Rule Structure

Building upon the example in the Pipeline documentation, let’s look at similar example rules.

Example 1

Copy
rule "has firewall fields"
when
    has_field("src_ip") && has_field("dst_ip")
then
end

Example 2

Copy
rule "from firewall subnet"
when
    cidr_match("10.10.10.0/24", to_ip($message.gl2_remote_ip))
then
end

Firstly, the rule structure follows a simple when, then pattern (except for the rule naming). In the when clause we specify a boolean expression which is evaluated in the context of the current message in the pipeline. These are the conditions used by the pipeline processor to determine whether to run a rule, and collectively (when evaluating the containing stage’s match all or match any requirement) whether to continue in a pipeline.

Note that the has firewall fields rule uses the built-in function has_field to check whether the message has the src_ip and dst_ip fields, as we want to use them in a later stage of the pipeline. This rule has no actions to run in its then clause, since we only want to use it to determine whether subsequent stages should run.

The second rule, from firewall subnet uses the built-in function, which takes a CIDR pattern and an IP address. In this case, we reference a field from the currently-processed message using the message reference syntax $message.

Graylog always sets the gl2_remote_ip field on messages, so we don’t need to check whether that field exists. If we wanted to use a field that might not exist on all messages we’d first use the has_field function to ensure its presence.

Note the call to to_ip around the gl2_remote_ip field reference. This is necessary since the field is stored as a string internally, and cidr_match requires an IP address object for its ip parameter.

Requiring an explicit conversion to an IP address object demonstrates an important feature of Graylog’s rule language, which is enforcement of type safety to ensure that you end up with the data in the correct format. All too often everything is treated as a string, which wastes enormous amounts of cycles on data conversion and prevents proper analysis of the data.

We again have no actions to run since we’re just using the rule to manage the pipeline’s flow, so the then block is empty.

You might be wondering why we didn’t just combine the has firewall fields and from firewall subnet rules, since they seem to be serving the same purpose. While we could absolutely do so, recall that rules are intended to be reusable building blocks. Imagine you have another pipeline for a different firewall subnet. Rather than duplicating the logic to check for src_ip and dst_ip, and updating each rule if anything ever changes (e.g. additional fields), you can simply add the has firewall fields rule to your new stage. With this approach you only need to update a single rule, with the change immediately taking effect for all pipelines referencing it. Nice!

Rule Simulation

This optional feature allows you to test a single rule in isolation from the pipeline. It allows for more focused examination while developing or debugging rules. For example, while debugging, it is easier to pinpoint a problem if each rule can be examined and modified independently.

In order to use the rule simulator:

  1. Navigate to Pipelines in the System menu.

  2. Go to the Manage Rules page and click on an existing rule.

  3. Scroll down and click on Run Rule Simulation .

  4. Enter a single message string in the box under Rule Simulation.

  5. The rule will be applied to the given input. It will provide a processed message for your reference.

  6. You may then either reset the simulator to run another message string or update the rule if you have chosen to make modifications.

Data Types

As we have seen in the previous section, we need to make sure to use the proper data types when calling functions.

Graylog’s rule language parser rejects invalid use of types, making it safe to write rules.

The six built-in types in Graylog are string (a UTF-8 string), double (corresponds to Java’s Double), long (Java’s Long), boolean (Boolean), void (indicating a function has no return value to prevent it being used in a condition), and ip (a subset of InetAddress), but plugins are free to add additional types as they see fit. The rule processor takes care of ensuring that values and functions agree on the types being used.

By convention, functions that convert types start with the prefix to_. Please refer to the functions index for a full list.

Hint: Before using the value of a message field, always convert it to the intended type with one of the to_ functions .

Additionally, if you have pipeline rule arguments with special characters, you can escape them with the back tick operator.

Example

Copy
set_field("timestamp", to_string(`$message.@extracted_timestamp`));

Conditions

In Graylog’s rules the when clause is a boolean expression, which is evaluated against the processed message.

Expressions support the common boolean operators AND (or &&), OR (||), NOT (!), and comparison operators (<;, <=, >, >=,==, !=).

Any function that returns a value can be called in the when clause, but it must eventually evaluate to a boolean. For example: we were able to use to_ip in the from firewall subnet since it was being passed to cidr_match, which returns a boolean, but could not use route_to_stream since it doesn’t return a value.

The condition must not be empty, but can simply consist of the boolean literal true. This is useful when you always want to execute a rule’s actions.

If a condition calls a function which is not present (perhaps due to a typo or missing plugin) the call evaluates to false.

Hint: Comparing two fields can be done when you use the same data type, e.g. to_string($message.src_ip)==to_string($message.dst_ip) will compare the two strings and will become true on match. Comparing different data types evaluates to false.

Actions

A rule’s then clause contains a list of actions which are evaluated in the order they appear.

There are two different types of actions:

  • Function calls
  • Variable assignments

Function calls look exactly like they do in conditions. All functions, including those which do not return a value, may be used in the then clause.

Variable assignments have the following form:

Copy
let name = value;

Variables are useful to avoid recomputing expensive parsing of data, holding on to temporary values, or making rules more readable.

Variables need to be defined before they can be used. Their fields (if any) can be accessed using the name.field notation in any place where a value of the field’s type is required.

Reserved Words

The following literals (case insensitive) should not be used as variable names in a pipeline rule as they are reserved tokens in the rule language parser:

All
Either
Pass
And
Or
Not
Pipeline
Rule
During
Stage
When
Then
End
Let
Match

For example, using this statement let match = regex(a,b); will result in an error due to the use of the variable name match.

The list of actions can be empty, in which case the rule is essentially a pluggable condition to help manage a pipeline’s processing flow.