Data Tiering

Graylog gives you the option to store and manage your index set data in tiers. When you set up index sets, you establish tier properties based on your data retention requirements. This approach lets you balance storage performance against storage cost.

This article explains how Graylog implements Data Tiering and what you need to know to establish effective policies for your index sets.

Hint: Data Tiering is not available for Graylog Cloud customers because Graylog handles advanced index configuration and storage for Cloud customers as part of the managed service.

Prerequisites

Before proceeding, ensure that the following prerequisites are met:

  • You must have an Enterprise license to set up and use a warm tier.

  • You must be a Graylog administrator to set up and manage index sets and warm tiers.

Why Use Data Tiering?

With Graylog Data Tiering, each tier serves as a storage repository for data that needs to be handled in a similar way. Data tiers can be thought of as levels of data with the same storage and management specifications. Data is grouped based on how often it is used and the level of search performance required. Each tier has its own storage and accessibility criteria, meaning that you can move your data to a tier that best aligns with your needs.

Data Tiering can help lower storage costs because less frequently accessed data can be stored in a lower-cost tier. For example, data that only needs to be retained for compliance checks can be stored in a tier that does not provide high-performance storage and is therefore less expensive. You can reserve the costlier, high-performance tier for more recent and more frequently searched data.

Data can be classified into tiers based on:

  • performance requirements

  • frequency of use

  • cost efficiency

We recommend Data Tiering as a cost-effective way of storing data for self-managed installations. Data Tiering provides three tiers for data storage: the hot tier, the warm tier, and the archive. Each of these tiers is described in detail below.

The Hot Tier

New indices that are part of a data stream are automatically allocated to the hot tier. In the context of Data Tiering, the hot tier is the search backend cluster you use for storage, and it acts as the default for all incoming data. Data in the hot tier is easy to access and search, but operating costs are generally higher because of the resources that must be allocated to maintain it.

The Warm Tier

The following section exclusively pertains to a Graylog Enterprise feature. To learn more about obtaining an Enterprise license, please contact the Graylog Sales team.

Data in the warm tier is searchable, but search performance is lower than in the hot tier. Warm tier data is stored in searchable snapshots and not directly in index sets. When a search includes the warm tier, the system loads the data it needs from the searchable snapshot into the search backend cluster. The warm tier is suitable for storing data that does not require frequent access, such as logs from recent weeks.

Warning: The warm tier is searchable, but adding any warm indices to a search can slow down the search process. Note that a lower search speed can also hinder the performance of any search-based features such as widgets and dashboards.

A searchable snapshot index reads from the repository and does not download all data to the cluster at restore time. This method makes the warm tier a cost-effective storage solution. Snapshots are stored in a warm storage repository, which can be either an Amazon S3 bucket or a local file system. Searchable snapshots remain in the repository in snapshot format and are read-only.

Warm Tier Backup

Warning: Utilizing an existing OpenSearch node in your cluster as a warm tier search node impacts overall cluster performance!

Warm tier data is stored in snapshots on OpenSearch. Therefore, if you opt to use a warm tier, then backing up your data on OpenSearch requires careful consideration. If you use snapshots directly on OpenSearch as a backup strategy and you want to use the warm tier option, then you should use a dedicated search node for warm tier data. This method might require you to reconfigure your OpenSearch cluster to create a dedicated search node.

Archiving

The following section exclusively pertains to a Graylog Enterprise feature. To learn more about obtaining an Enterprise license, please contact the Graylog Sales team.

Graylog offers archiving for less critical data, making it a lower-cost option for storing compliance and historical data.

The archive stores messages until you need to re-process them into Graylog for analysis. You can instruct Graylog to automatically archive log messages to compressed flat files on the local file system or on S3-compatible object storage. Messages are stored before retention cleaning begins, and they are not deleted from the search backend.

Hint: Currently, you can utilize both a Data Lake and archives to preserve your log data long term. Both features perform similar functions. However, there are benefits to using a Data Lake for less immediately valuable data. Retrieving logs from a Data Lake is a faster process because log retrieval is granular. Additionally, the data in a Data Lake is compressed, so it is generally a lower-cost option for data storage.

Set Up Data Tiering

You can establish Data Tiering when you create or update index sets. In fact, you can set different tier policies for each index set, based on the requirements of the data they contain.

If you intend to use Data Tiering with a warm tier, you first need to configure your backend storage. You can choose either an Amazon S3 bucket or a local file system.

Hint: You must complete backend setup for each type of backend before you can create a backend of that type. Only backend types for which you have completed the prerequisite steps are available to select when you create a warm tier backend.

With any storage backend, the prerequisites and setup process vary based on your Graylog installation. Choose the correct path for your environment:

Monitor System Resources of the Warm Tier

After the initial setup of the warm tier, we recommend that you monitor the system resource utilization of the Graylog warm tier nodes to determine the optimal amount of disk space for their file caches. In particular, closely observe the active and used percentage metrics, along with the total, active, used, and evicted bytes of the file cache.

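The percentage of used file cache should be less than the percentage of active file cache, and both values are ideally 70% or less. Note that the two percentages have different bases: in the example response later in this article, active_percent (70) matches active bytes as a share of used bytes, and used_percent (55) matches used bytes as a share of total bytes. Importantly, the number of active bytes in the file cache should be less than the number of used bytes, and the used bytes should be less than the total bytes of the file cache, so: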

Active bytes < used bytes < total bytes
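The checks described above can be sketched in a few lines of Python. This is only an illustration: the FileCacheStats class, the check_file_cache function, and the percent_limit parameter are hypothetical names introduced here, not part of Graylog or OpenSearch. The sample values are copied from the example response later in this article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileCacheStats:
    """Subset of the file cache fields reported by the node stats API."""
    active_in_bytes: int
    used_in_bytes: int
    total_in_bytes: int
    active_percent: int
    used_percent: int


def check_file_cache(stats, percent_limit=70):
    """Return a list of warnings; an empty list means every check passed."""
    warnings = []
    # Active bytes < used bytes < total bytes
    if not stats.active_in_bytes < stats.used_in_bytes:
        warnings.append("active bytes should be less than used bytes")
    if not stats.used_in_bytes < stats.total_in_bytes:
        warnings.append("used bytes should be less than total bytes")
    # Used percentage should stay below the active percentage.
    if not stats.used_percent < stats.active_percent:
        warnings.append("used percent should be less than active percent")
    # Assumption: the 70% guideline is applied to the larger (active) value.
    if stats.active_percent > percent_limit:
        warnings.append(f"active percent is above {percent_limit}%")
    return warnings


# Sample values copied from the example response later in this article.
sample = FileCacheStats(
    active_in_bytes=50066073022,
    used_in_bytes=71497646618,
    total_in_bytes=128849018880,
    active_percent=70,
    used_percent=55,
)

for warning in check_file_cache(sample):
    print(warning)
```

With the sample values, every check passes and nothing is printed. A non-empty result would suggest revisiting the disk space allocated to the file cache.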

Use with Graylog Data Node

If you enable Data Tiering with the Graylog Data Node, then you must first issue a certificate authority for the third-party tool you use to query the OpenSearch node API. For details on issuing a certificate authority, see the Data Node documentation.

If you are using self-managed OpenSearch, proceed to the following section.

Retrieve File Cache Metrics

The OpenSearch node stats API allows you to retrieve statistics about your cluster. Here is an example cURL command that can be used to retrieve the file cache metrics discussed in the previous section. (This may be useful in cases where OpenSearch metrics are not being captured and stored within a time-series datastore such as InfluxDB or Prometheus.)

$ curl -s -XGET "http://admin:password@10.0.1.229:9200/_nodes/stats/file_cache?pretty"

The following snippet shows part of the response to this command.

"jW4Q6SuXQASt8lM796CBHg" : {
      "timestamp" : 1712857680055,
      "name" : "10.0.1.229",
      "transport_address" : "10.0.1.229:9300",
      "host" : "10.0.1.229",
      "ip" : "10.0.1.229:9300",
      "roles" : [
        "search"
      ],
      "attributes" : {
        "shard_indexing_pressure_enabled" : "true"
      },
      "file_cache" : {
        "timestamp" : 1712857680055,
        "active_in_bytes" : 50066073022,
        "total_in_bytes" : 128849018880,
        "used_in_bytes" : 71497646618,
        "evictions_in_bytes" : 0,
        "active_percent" : 70,
        "used_percent" : 55,
        "hit_count" : 83077,
        "miss_count" : 877
      }
    }
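If you want to process this response in a script rather than read it by eye, a minimal Python sketch might look like the following. It assumes the full response wraps the per-node objects in a top-level "nodes" object keyed by node ID, which is how the node stats API normally structures its output. The function names file_cache_by_node and hit_ratio are hypothetical, and the hit ratio is an extra derived value that is not reported by the API itself.

```python
import json

# Trimmed sample response, based on the snippet above.
SAMPLE_RESPONSE = """
{
  "nodes": {
    "jW4Q6SuXQASt8lM796CBHg": {
      "name": "10.0.1.229",
      "roles": ["search"],
      "file_cache": {
        "active_in_bytes": 50066073022,
        "total_in_bytes": 128849018880,
        "used_in_bytes": 71497646618,
        "evictions_in_bytes": 0,
        "active_percent": 70,
        "used_percent": 55,
        "hit_count": 83077,
        "miss_count": 877
      }
    }
  }
}
"""


def file_cache_by_node(response_text):
    """Map each node name to its file_cache statistics."""
    body = json.loads(response_text)
    result = {}
    for node_id, node in body.get("nodes", {}).items():
        cache = node.get("file_cache")
        if cache is not None:  # skip nodes that report no file cache
            result[node.get("name", node_id)] = cache
    return result


def hit_ratio(cache):
    """Fraction of file cache lookups that were hits, or None if there were none."""
    lookups = cache["hit_count"] + cache["miss_count"]
    return cache["hit_count"] / lookups if lookups else None


for name, cache in file_cache_by_node(SAMPLE_RESPONSE).items():
    print(name, cache["active_percent"], cache["used_percent"], hit_ratio(cache))
```

In practice you would pass the body returned by the cURL command to file_cache_by_node instead of the embedded sample.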

For your reference, the following are the available OpenSearch metrics that may be retrieved via the node stats API:

  • nodes.stats.file_cache.active_in_bytes

  • nodes.stats.file_cache.total_in_bytes

  • nodes.stats.file_cache.used_in_bytes

  • nodes.stats.file_cache.evictions_in_bytes

  • nodes.stats.file_cache.active_percent

  • nodes.stats.file_cache.used_percent

  • nodes.stats.file_cache.hit_count

  • nodes.stats.file_cache.miss_count
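If you forward these values to a time-series datastore, one possible approach is to map each field of a node's file_cache object to the metric names listed above. The following Python sketch is only an illustration: the to_metric_samples function and the tuple layout are hypothetical, and the actual write to your datastore is left out because it depends on the tool you use.

```python
# Field names as they appear in the file_cache object of the response.
FILE_CACHE_FIELDS = (
    "active_in_bytes",
    "total_in_bytes",
    "used_in_bytes",
    "evictions_in_bytes",
    "active_percent",
    "used_percent",
    "hit_count",
    "miss_count",
)


def to_metric_samples(node_name, file_cache):
    """Return (metric_name, node_name, value) tuples for the known fields."""
    return [
        (f"nodes.stats.file_cache.{field}", node_name, file_cache[field])
        for field in FILE_CACHE_FIELDS
        if field in file_cache  # ignore fields the node did not report
    ]


# Example input: a partial file_cache object for one node.
samples = to_metric_samples(
    "10.0.1.229",
    {"hit_count": 83077, "miss_count": 877, "timestamp": 1712857680055},
)
for metric_name, node, value in samples:
    print(metric_name, node, value)
```

Fields that are not in the list, such as timestamp, are skipped, so the output contains only the metrics named in this article.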