Pipeline Rule Logic

Pipelines are made up of pipeline rules, which are logical expressions that allow you to inspect, transform, and route log data according to specified criteria before the data is stored or indexed.

Graylog supports a domain-specific language (DSL) to express processing logic. The language is intentionally limited in scope, which makes rules easier to understand and allows better runtime optimization. Each pipeline rule expresses a condition and an action, which together determine how incoming messages should be processed. Additionally, understanding data types is crucial when writing conditions and actions in pipeline rules. Data types refer to the kind of value a field or variable holds and how it can be manipulated within the rule's logic.

The building blocks of pipeline rules are functions, which are methods to perform actions or transformations on log messages, allowing you to customize pipeline rules. Graylog supports many built-in functions, providing data conversion, string manipulation, data retrieval using lookup tables, JSON parsing, and much more.

Pipeline rules are built primarily via the rule builder interface, which can also aid you in testing rules before deployment.

Hint: You may also create rules manually by using the source code editor.

In this article, we will review a sample pipeline and pipeline rule structure, how to apply conditions and actions to a rule, and what data types apply to pipeline rules. Once you understand the basic structure and application of pipeline rules, we recommend you review Build Pipeline Rules for specific instructions on how to create new pipeline rules in Graylog.

Rule Structure

To understand how pipelines and pipeline rules are built, let's explore an example pipeline and break down the rules that compose it.

Example Pipeline

Internally, pipelines and the rules they are built on are represented as a logical sequence of actions. Let’s look at an example and review what each action does:

pipeline "My new pipeline"
stage 1 match all
  rule "has firewall fields";
  rule "from firewall subnet";
stage 2 match either
  rule "geocode IPs";
  rule "anonymize source IPs";
end

This pipeline declares a new pipeline named My new pipeline, which has two stages.

Stages are run in the order of their given priority. Stage priorities can be any integer, positive or negative, you prefer. In our example, the first stage has a priority of 1, and the second stage a priority of 2. Staging based upon priority gives you the ability to run certain rules before or after others, which might exist in other connected pipelines, without modifying those other connected pipelines. This is particularly handy when dealing with changing data formats.
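For instance, a pipeline that has to normalize field names before any other connected pipeline processes the same messages could use a low (even negative) stage priority, as in the following sketch. The pipeline and rule names here are placeholders:

// sketch only: pipeline and rule names are placeholders
pipeline "Normalize firewall logs"
stage -5 match all
  rule "normalize field names";
stage 10 match either
  rule "geocode IPs";
end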

Stages then list which rule you want to be executed as well as the rules’ conditions that must be satisfied to continue running the pipeline. In our example, imagine that the rule has firewall fields checks for the presence of message fields src_ip and dst_ip but does not have any actions to run. For a message without both fields, the rule’s condition would evaluate to false, and the pipeline would abort after stage 1 as the stage requires all rules be satisfied (i.e. match all). With the pipeline aborted, stage 2 would not run.

match either acts as an OR operator, only requiring a single rule’s condition to evaluate to true in order to continue processing. Note that actions still run for all matching rules in the stage, even if it is the final stage in the pipeline.

Rules are referred to by their unique names and can therefore be shared among many different pipelines. The intent is to enable you to create reusable building blocks, making it easier to process the data specific to your organization or use case.

Now, let's explore the individual rules themselves and their syntax.

Example Pipeline Rules

Consider the following example rules contained in the above pipeline: 

Example Rule 1

rule "has firewall fields"
when
    has_field("src_ip") && has_field("dst_ip")
then
end

Example Rule 2

rule "from firewall subnet"
when
    cidr_match("10.10.10.0/24", to_ip($message.gl2_remote_ip))
then
end

First, notice that aside from its name declaration, a rule follows a simple "when, then" pattern. In the when clause we specify a Boolean expression that is evaluated in the context of the current message in the pipeline. These are the conditions used by the pipeline processor to determine whether to run a rule and, when evaluating the containing stage’s match all or match either requirement, whether to continue in a pipeline.

Note that the has firewall fields rule uses the built-in function has_field to check whether the message has the src_ip and dst_ip fields as we want to use them in a later stage of the pipeline. This rule has no actions to run in its then clause since we only want to use it to determine whether subsequent stages should run.

The second rule, from firewall subnet, uses the built-in function cidr_match, which takes a CIDR pattern and an IP address. In this case, we refer to a field from the currently processed message using the message reference syntax $message.

Graylog always applies the gl2_remote_ip field on messages, so we do not need to check whether that field exists. If we wanted to use a field that might not exist on all messages, we would first use the has_field function to ensure its presence.
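For instance, a rule that enriches a field which may be absent can guard on its presence in the when clause before touching it in the then clause. The field name and the geoip lookup table below are hypothetical:

rule "enrich optional source IP"
when
  has_field("src_ip")
then
  // "geoip" is a hypothetical lookup table name; adjust it to your own setup
  let geo = lookup_value("geoip", to_string($message.src_ip));
  set_field("src_ip_geolocation", geo);
end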

Hint: The call to to_ip is around the gl2_remote_ip field reference. This is necessary since the field is stored as a string internally, and cidr_match requires an IP address object for its ip parameter.

Requiring an explicit conversion to an IP address object demonstrates an important feature of Graylog’s rule language, which is enforcement of type safety to ensure that you end up with the data in the correct format.

We again have no actions to run since we are just using the rule to manage the pipeline’s flow, so the then clause is empty.

While we could instead combine the has firewall fields and from firewall subnet rules for the same purpose, rules are intended to be reusable building blocks. Imagine you have another pipeline for a different firewall subnet. Rather than duplicating the logic to check for src_ip and dst_ip and updating each rule if anything changes (e.g. additional fields), you can simply add the has firewall fields rule to your new stage. With this approach you only need to update a single rule, with the change immediately taking effect for all pipelines referring to it.
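As a sketch, a second pipeline for a different subnet could reference the shared rule by name; only the from DMZ subnet rule (hypothetical here) would need to be written:

// sketch only: "from DMZ subnet" is a hypothetical rule
pipeline "DMZ firewall traffic"
stage 1 match all
  rule "has firewall fields";
  rule "from DMZ subnet";
stage 2 match either
  rule "geocode IPs";
  rule "anonymize source IPs";
end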

Conditions

In Graylog’s rules the when clause is a Boolean expression, which is evaluated against the processed message.

Expressions support the common Boolean operators AND (or &&), OR (||), NOT (!), and the comparison operators (<, <=, >, >=, ==, !=).

Any function that returns a value can be called in the when clause, but it must eventually evaluate to a Boolean. For example, we were able to use to_ip in the from firewall subnet since it was being passed to cidr_match, which returns a Boolean, but we could not use route_to_stream as it does not return a value.
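For example, a condition may chain several operators and conversion functions as long as the whole expression is Boolean. The dst_port field below is illustrative:

rule "high port from outside the subnet"
when
  // dst_port is an illustrative field name
  has_field("dst_port") &&
  to_long($message.dst_port) >= 1024 &&
  ! cidr_match("10.10.10.0/24", to_ip($message.gl2_remote_ip))
then
end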

The condition must not be empty, but it can simply consist of the Boolean literal true. This is useful when you always want to execute a rule’s actions.
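For example, a rule that should run its actions on every message can use true as its condition. The field name and value here are illustrative:

rule "tag all messages"
when
  true
then
  // illustrative field name and value
  set_field("processed_by", "my new pipeline");
end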

If a condition calls a function that is not present, the call evaluates to false.

Hint: When comparing two fields, you must use the same data type, e.g. to_string($message.src_ip) == to_string($message.dst_ip) compares the two strings and is true on match. Comparing different data types evaluates to false.

Actions

The then clause contains a list of actions that are evaluated in the order they appear.

There are two different types of actions:

  • Function calls
  • Variable assignments

Function calls look exactly like they do in conditions. All functions, including those that do not return a value, may be used in the then clause.
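For example, a then clause can set a field and then route the message, which a condition could not do because route_to_stream returns no value. The stream name is hypothetical:

rule "route firewall traffic"
when
  has_field("src_ip") && has_field("dst_ip")
then
  set_field("log_source", "firewall");
  // "Firewall Logs" is a hypothetical stream name
  route_to_stream(name: "Firewall Logs");
end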

Variable assignments have the following syntax:

let name = value;

Variables are useful for avoiding expensive recomputation when parsing data, for holding temporary values, and for making rules more readable. Variables must be defined before they can be used. Their fields (if any) can be accessed using the name.field notation in any place where a value of the field’s type is required.
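For example, a rule can parse a JSON payload once, keep the result in a variable, and reuse it instead of calling parse_json again. The payload field is illustrative:

rule "parse firewall payload"
when
  // "payload" is an illustrative field name
  has_field("payload")
then
  let parsed = parse_json(to_string($message.payload));
  set_fields(to_map(parsed));
end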

Reserved Words

The following literals (case insensitive) should not be used as variable names in a pipeline rule as they are reserved tokens in the rule language parser:

  • All
  • Either
  • Pass
  • And
  • Or
  • Not
  • Pipeline
  • Rule
  • During
  • Stage
  • When
  • Then
  • End
  • Let
  • Match

For example, the statement let match = regex(a, b); will result in an error due to the use of the reserved word match as a variable name.

Hint: The list of actions can simply be empty, in which case the rule is essentially a pluggable condition to help manage a pipeline’s processing flow.

Data Types

It is important to use the correct data types when building and applying rules. The Graylog rule language parser rejects invalid use of types, so rules are easier to apply correctly.

There are six built-in data types:

Data Type   Description
string      A UTF-8 string
double      Corresponds to Java's Double
long        Corresponds to Java's Long
boolean     A Boolean value, either true or false
void        Indicates that a function has no return value, which prevents it from being used in a condition
ip          A subset of InetAddress

Plugins are free to add additional types as they see fit. The rule processor ensures that values and functions agree on the types being used.

By convention, functions that convert types start with the prefix to_. Refer to the functions index for a full list.

Hint: Before using the value of a message field, always convert it to the intended type with one of the to_ functions.
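For example, numeric values often arrive as strings and should be converted before being compared or used in arithmetic. The bytes_sent field and the default value of 0 are illustrative:

rule "convert bytes_sent to a number"
when
  // bytes_sent is an illustrative field name
  has_field("bytes_sent")
then
  // to_long falls back to the supplied default if the conversion fails
  let bytes = to_long($message.bytes_sent, 0);
  set_field("bytes_sent", bytes);
end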

Additionally, if a pipeline rule argument contains special characters, you can escape it with the backtick operator. For example:

set_field("timestamp", to_string(`$message.@extracted_timestamp`));