Pipelines
Pipelines are an essential part of log message processing in Graylog: they tie together all the processing steps applied to your data and define how incoming log messages are evaluated, modified, and routed.
Each pipeline is composed of rules and can be connected to one or more streams, giving you fine-grained control over how messages are processed and making it straightforward to build customized processing workflows.
Pipeline rules consist of conditions and corresponding actions that determine how messages are processed. To keep this organized, pipelines use stages: stages group rules and execute them in a defined sequence, so messages move through the pipeline in a logical, orderly way.
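To make the split between conditions and actions concrete, here is a minimal rule sketch in the pipeline rule language (the rule name, field names, and values are hypothetical; has_field, to_string, and set_field are built-in functions):

rule "tag firewall messages"
when
  // Condition: the message has a "vendor" field whose value is "firewall".
  has_field("vendor") && to_string($message.vendor) == "firewall"
then
  // Action: add a field to the message.
  set_field("source_type", "firewall");
end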
Stages with the same priority execute simultaneously across all connected pipelines. Stages also provide the control flow that determines whether the subsequent stages of a pipeline run at all, ensuring a logical and efficient processing sequence.
Pipeline Structure
Internally, pipelines are represented as code. Let’s look at a simple example and walk through what each part does:
pipeline "My new pipeline"
stage 1 match all
rule "has firewall fields";
rule "from firewall subnet";
stage 2 match either
rule "geocode IPs";
rule "anonymize source IPs";
end
This code snippet declares a new pipeline named “My new pipeline”, which has two stages.
Stages are run in the order of their given priority and are not otherwise named. Stage priorities can be any integer you prefer, positive or negative. In our example, the first stage has a priority of 1 and the second a priority of 2, but -99 and 42 would work just as well. Ordering by stage priority lets you run certain rules before or after rules that exist in other connected pipelines, without modifying those pipelines. This is particularly handy when dealing with changing data formats.
For example, if a second pipeline were declared with a stage of priority 0, that stage’s rules would run before either of the stages in the example (priorities 1 and 2, respectively). Note that the order in which stages are declared is irrelevant; they are sorted by priority.
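Such a second pipeline could look like the following sketch (the pipeline and rule names are hypothetical); if it were connected to the same stream, its stage 0 would run before stages 1 and 2 of “My new pipeline”:

pipeline "Normalize formats"
stage 0 match all
  rule "normalize timestamps";
end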
Each stage then lists the rules it references, as well as whether any or all of those rules’ conditions must be satisfied for the pipeline to continue running.
In our example, imagine that the rule “has firewall fields” checks for the presence of the message fields src_ip and dst_ip, but does not have any actions to run. For a message without both fields, the rule’s condition would evaluate to false and the pipeline would abort after stage 1, as the stage requires all rules to be satisfied (match all). With the pipeline aborted, stage 2 would not run.
match either acts as an OR operator, requiring only a single rule’s condition to evaluate to true in order to continue pipeline processing. Note that actions are still run for all matching rules in the stage, even if it is the final stage in the pipeline.
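As a sketch, the “has firewall fields” rule described above might be written with the built-in has_field function, for example like this (the exact rule body is an assumption for illustration):

rule "has firewall fields"
when
  // Condition only: satisfied when both fields are present.
  has_field("src_ip") && has_field("dst_ip")
then
  // No actions to run.
end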
Rules are referenced by their names, and can therefore be shared among many different pipelines. The intention is to enable creation of reusable building blocks, making it easier to process the data specific to your organization or use case.
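For example, two otherwise unrelated pipelines (hypothetical names below) can reference the same rule by name:

pipeline "Firewall enrichment"
stage 1 match all
  rule "has firewall fields";
end

pipeline "Security audit"
stage 1 match all
  rule "has firewall fields";
end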
Read more about Rules in the next section.