Manage Your Log Data

Effectively managing log data is critical to ensuring observability, security, and performance within your environment. This section introduces key components and strategies that allow you to control how data flows through Graylog—from ingestion and enrichment to storage and retrieval.

The components covered here are the key building blocks of Graylog's data management architecture. Together, they offer scalable and efficient ways to organize, route, process, and search your log data.

Graylog Data Node

Graylog Data Node simplifies the management of your search backend. It reduces operational complexity and enforces version compatibility and secure configuration.

The Data Node strengthens the security of Graylog’s data layer by managing certificates and controlling cluster membership. It also ensures that the correct version of OpenSearch and its required extensions are installed to support full Graylog functionality.

Data Routing

Once logs have been ingested into Graylog, Data Routing is the process by which they are filtered, enriched, and routed to a destination. Data Routing operates at the stream level, applying rules and filters to move data where you want it to go.

Streams

Ingested logs are assigned to specific streams, which are the mechanism for moving logs through and out of Graylog. You can define pipeline rules to determine which log messages are routed to which stream, and multiple streams can receive the same log data, giving you different views and rules for the same subset of information.
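
For example, a minimal routing rule might look like the sketch below, which uses the built-in route_to_stream pipeline function; the stream name "Firewall Logs" and the hostname are hypothetical placeholders:

    rule "route firewall logs"
    when
      has_field("source") && to_string($message.source) == "fw-01.example.org"
    then
      // Assign the matching message to the named stream.
      route_to_stream(name: "Firewall Logs");
    end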

Pipelines

Pipelines provide a flexible way to transform and enrich messages after they are routed into streams. A pipeline is a sequence of processing stages through which messages pass. Each stage can apply one or more pipeline rules, whose functions perform specific operations on log messages, such as filtering, transforming, or routing. Pipeline rules can be built with the rule builder in the Graylog interface.
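
As an illustration, the sketch below shows a rule that normalizes a field using the built-in set_field, remove_field, and to_string functions; the field names are hypothetical:

    rule "normalize source IP field"
    when
      has_field("src_ip")
    then
      // Copy the value into a consistently named field...
      set_field("source_ip", to_string($message.src_ip));
      // ...and drop the original so downstream rules see one field name.
      remove_field("src_ip");
    end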

Destinations

Once logs have been channeled into streams and pipeline rules are applied, you can route the logs to three specific destinations: a Data Lake, the search backend (via the index model), and Outputs.

Using specific filter rules, Data Routing lets you send logs to one or more of these destinations based on the filters applied to the log data.
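
Destination selection itself is configured per stream, but the same message can reach multiple destinations if it is routed into multiple streams. The sketch below (the stream names and the event_type field are hypothetical) fans one message out to two streams, each of which can carry its own destination configuration:

    rule "fan out audit events"
    when
      to_string($message.event_type) == "audit"
    then
      // Deliver the same message to two streams; each stream's own
      // Data Routing configuration decides its final destination.
      route_to_stream(name: "Audit - Search");
      route_to_stream(name: "Audit - Data Lake");
    end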

Data Lakes

This is a Graylog Enterprise feature. A valid Graylog Enterprise license is required.

Data Lakes are centralized repositories that let you store large amounts of log data without writing it to your search backend, such as OpenSearch. The log data is compressed for storage and can be retrieved later for search and analytics. A Data Lake can use either an Amazon S3 bucket or network storage as its backend.

Index Model

Data from a stream can also be written directly to one or more indices. The Graylog index model allows you to apply configurations to index sets, which are predetermined collections of indices. Through index sets you manage the lifecycle of indices, including rotation and retention, which storage backend is used, and when indices are archived (if desired).

You can also apply index set templates, which provide predefined configuration settings that balance your desired performance against maintenance cost. These templates, and all index set configuration settings, are based on the data tiers Graylog offers to match your specific performance parameters.

Outputs

An output is a mechanism that allows Graylog to send logs to external systems or destinations, such as a specific database or another Graylog instance, after they have been collected. Graylog instances with an Enterprise license can also use the Enterprise Output Framework, which forwards messages to external systems through a structured approach and can apply additional pipeline rules to filter and enrich the log data before delivery.
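
As a sketch of such a filter rule, the example below drops debug-level messages so they are never forwarded; it assumes a syslog-style numeric level field and uses the standard drop_message pipeline function:

    rule "drop debug messages before forwarding"
    when
      has_field("level") && to_long($message.level) >= 7
    then
      // Discard debug-level (syslog severity 7) messages so they are
      // never sent to the external destination.
      drop_message();
    end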