Pipeline Rule Logic
Pipelines are made up of pipeline rules, which are logical expressions that allow you to inspect, transform, and route log data according to specified criteria before the data is stored or indexed.
Graylog supports a domain-specific language (DSL) to express processing logic. This language is highly controlled for easier understanding and better runtime optimization. Each pipeline rule expresses a condition and an action, which together determine how incoming messages should be processed. Additionally, understanding data types is crucial when writing conditions and actions in pipeline rules. Data types refer to the kind of value a field or variable holds and how it can be manipulated within the rule's logic.
The building blocks of pipeline rules are functions, which are methods to perform actions or transformations on log messages, allowing you to customize pipeline rules. Graylog supports many built-in functions, providing data conversion, string manipulation, data retrieval using lookup tables, JSON parsing, and much more.
Pipeline rules are built primarily via the rule builder interface, which can also aid you in testing rules before deployment.
In this article, we will review a sample pipeline and pipeline rule structure, how to apply conditions and actions to a rule, and what data types apply to pipeline rules. Once you understand the basic structure and application of pipeline rules, we recommend you review Build Pipeline Rules for specific instructions on how to create new pipeline rules in Graylog.
Rule Structure
To understand how pipelines and pipeline rules are built, let's explore an example pipeline and break down the rules that compose it.
Example Pipeline
Internally, pipelines and the rules they are built on are represented as a logical sequence of actions. Let’s look at an example and review what each action does:
```
pipeline "My new pipeline"
stage 1 match all
    rule "has firewall fields";
    rule "from firewall subnet";
stage 2 match either
    rule "geocode IPs";
    rule "anonymize source IPs";
end
```
This pipeline declares a new pipeline named `My new pipeline`, which has two stages.
Stages are run in the order of their given priority. Stage priorities can be any integer you prefer, positive or negative. In our example, the first stage has a priority of 1 and the second stage a priority of 2. Staging based upon priority gives you the ability to run certain rules before or after others, which might exist in other connected pipelines, without modifying those other pipelines. This is particularly handy when dealing with changing data formats.
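As an illustrative sketch (the pipeline and rule names here are hypothetical), a separate normalization pipeline could use a negative stage priority to guarantee that its rules run before stage 1 of any other connected pipeline:

```
pipeline "Normalize field names"
stage -10 match all
    rule "rename legacy fields";
end
```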
Stages then list which rules you want to be executed, as well as the rules’ conditions that must be satisfied to continue running the pipeline. In our example, imagine that the rule `has firewall fields` checks for the presence of the message fields `src_ip` and `dst_ip` but does not have any actions to run. For a message without both fields, the rule’s condition would evaluate to `false`, and the pipeline would abort after stage 1, as the stage requires all rules to be satisfied (i.e. `match all`). With the pipeline aborted, stage 2 would not run.
`match either` acts as an OR operator, only requiring a single rule’s condition to evaluate to `true` in order to continue processing. Note that actions still run for all matching rules in the stage, even if it is the final stage in the pipeline.
Rules are referred to by their unique names and can therefore be shared among many different pipelines. The intent is to enable you to create reusable building blocks, making it easier to process the data specific to your organization or use case.
Now, let's explore the individual rules themselves and their syntax.
Example Pipeline Rules
Consider the following example rules contained in the above pipeline:
Example Rule 1
```
rule "has firewall fields"
when
    has_field("src_ip") && has_field("dst_ip")
then
end
```
Example Rule 2
```
rule "from firewall subnet"
when
    cidr_match("10.10.10.0/24", to_ip($message.gl2_remote_ip))
then
end
```
The rule structure follows a simple "when, then" pattern, preceded only by the rule’s name. In the `when` clause we specify a Boolean expression that is evaluated in the context of the current message in the pipeline. These are the conditions used by the pipeline processor to determine whether to run a rule and, when evaluating the containing stage’s `match all` or `match either` requirement, whether to continue in a pipeline.
Note that the `has firewall fields` rule uses the built-in function `has_field` to check whether the message has the `src_ip` and `dst_ip` fields, as we want to use them in a later stage of the pipeline. This rule has no actions to run in its `then` clause, since we only want to use it to determine whether subsequent stages should run.
The second rule, `from firewall subnet`, uses the built-in function `cidr_match`, which takes a CIDR pattern and an IP address. In this case, we refer to a field from the currently processed message using the message reference syntax `$message`.
Graylog always sets the `gl2_remote_ip` field on messages, so we do not need to check whether that field exists. If we wanted to use a field that might not exist on all messages, we would first use the `has_field` function to ensure its presence.
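As a sketch of that guard pattern, a condition can first check for an optional field before using it (the rule name and the `src_ip` field follow the earlier examples; the subnet is illustrative):

```
rule "tag internal source"
when
    has_field("src_ip") && cidr_match("10.0.0.0/8", to_ip($message.src_ip))
then
    set_field("src_internal", true);
end
```

Because `&&` short-circuits, `to_ip` is only evaluated when the field is actually present.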
Note that the conversion function `to_ip` is wrapped around the `gl2_remote_ip` field reference. This is necessary since the field is stored as a string internally, and `cidr_match` requires an IP address object for its `ip` parameter.
Requiring an explicit conversion to an IP address object demonstrates an important feature of Graylog’s rule language, which is enforcement of type safety to ensure that you end up with the data in the correct format.
We again have no actions to run since we are just using the rule to manage the pipeline’s flow, so the `then` clause is empty.
While we could instead combine the `has firewall fields` and `from firewall subnet` rules for the same purpose, rules are intended to be reusable building blocks. Imagine you have another pipeline for a different firewall subnet. Rather than duplicating the logic to check for `src_ip` and `dst_ip` and updating each rule if anything changes (e.g. additional fields), you can simply add the `has firewall fields` rule to your new stage. With this approach you only need to update a single rule, with the change immediately taking effect for all pipelines referring to it.
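As an illustration of that reuse (the second pipeline and its subnet rule are hypothetical), another pipeline simply references the shared rule by name:

```
pipeline "Branch office firewalls"
stage 1 match all
    rule "has firewall fields";
    rule "from branch subnet";
end
```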
Conditions
In Graylog’s rules, the `when` clause is a Boolean expression, which is evaluated against the processed message.

Expressions support the common Boolean operators AND (or `&&`), OR (`||`), NOT (`!`), and the comparison operators `<`, `<=`, `>`, `>=`, `==`, and `!=`.
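Assuming the firewall fields from the earlier examples plus a hypothetical `dst_port` field, these operators can be combined into a single condition, as in this sketch:

```
rule "suspicious firewall traffic"
when
    has_field("dst_port") &&
    to_long($message.dst_port) < 1024 &&
    ! cidr_match("10.0.0.0/8", to_ip($message.src_ip))
then
    set_field("suspicious", true);
end
```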
Any function that returns a value can be called in the `when` clause, but the expression must eventually evaluate to a Boolean. For example, we were able to use `to_ip` in the `from firewall subnet` rule since its result was passed to `cidr_match`, which returns a Boolean, but we could not use `route_to_stream` as it does not return a value.
The condition must not be empty, but it can simply consist of the Boolean literal `true`. This is useful when you always want to execute a rule’s actions.
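A minimal always-on rule (the rule and field names here are illustrative) looks like this:

```
rule "mark all messages"
when
    true
then
    set_field("processed_by_pipeline", true);
end
```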
If a condition calls a function that is not present, the call evaluates to `false`.
For example, `to_string($message.src_ip) == to_string($message.dst_ip)` compares the two strings and is `true` on match. Comparing different data types evaluates to `false`.
Actions
The `then` clause contains a list of actions that are evaluated in the order they appear.
There are two different types of actions:
- Function calls
- Variable assignments
Function calls look exactly like they do in conditions. All functions, including those that do not return a value, may be used in the `then` clause.
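For example, a `then` clause may mix value-returning and void functions (the stream name below is hypothetical):

```
rule "route firewall traffic"
when
    has_field("src_ip")
then
    set_field("source", "firewall");
    route_to_stream(name: "Firewall Logs");
end
```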
Variable assignments have the following syntax:
```
let name = value;
```
Variables are beneficial for avoiding expensive recomputation while parsing data, holding on to temporary values, and making rules more readable. However, variables need to be defined before they can be used. Their fields (if any) can be accessed using the `name.field` notation in any place where a value of the field’s type is required.
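As a sketch of that pattern (the `payload` field is assumed for illustration), a variable can hold a parsed JSON payload so the parse happens only once:

```
rule "parse payload once"
when
    has_field("payload")
then
    // parse once, then reuse the variable instead of calling parse_json again
    let parsed = parse_json(to_string($message.payload));
    set_fields(to_map(parsed));
end
```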
Reserved Words
The following literals (case insensitive) should not be used as variable names in a pipeline rule as they are reserved tokens in the rule language parser:
All
Either
Pass
And
Or
Not
Pipeline
Rule
During
Stage
When
Then
End
Let
Match
For example, the statement `let match = regex(a, b);` will result in an error due to the use of the reserved word `match` as a variable name.
Data Types
It is important to use the correct data types when building and applying rules. The Graylog rule language parser rejects invalid use of types, making rules easier to apply correctly.
There are six built-in data types:
| Data Type | Description |
|---|---|
| `string` | A UTF-8 string |
| `double` | Corresponds to Java's `Double` |
| `long` | Corresponds to Java's `Long` |
| `boolean` | A Boolean value, either `true` or `false` |
| `void` | Indicates that a function has no return value, which prevents it from being used in a condition |
| `ip` | A subset of `InetAddress` |
Plugins are free to add additional types as they see fit. The rule processor ensures that values and functions agree on the types being used.
By convention, functions that convert types start with the prefix `to_`. Refer to the functions index for a full list of `to_` functions.
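As an illustrative sketch (the `took_ms` field is hypothetical), conversion functions such as `to_long` also accept a default value to fall back on when the conversion fails:

```
rule "normalize response time"
when
    has_field("took_ms")
then
    // to_long falls back to the default (0) if the field cannot be converted
    set_field("took_ms", to_long($message.took_ms, 0));
end
```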
Additionally, if you have pipeline rule arguments with special characters, you can escape them with the backtick operator. For example:

```
set_field("timestamp", to_string(`$message.@extracted_timestamp`));
```