Data Tiering
Graylog gives you the option to store and manage data in tiers. Each tier serves as a storage repository for data that needs to be handled in a similar way. Data tiers can be thought of as levels of data with the same storage and management specifications. This data is grouped based on how often it is used and the level of search performance required. Each tier has its own storage and accessibility criteria, meaning that you can move your data to a tier that better aligns with your needs.
Data tiering can help lower storage costs by letting you organize data according to how it is used. Less frequently accessed data can be stored in a lower-cost tier. For example, data that only needs to be retained for compliance checks can be stored in a tier that does not provide high-performance storage and is therefore less expensive. You can reserve the costlier, high-performance tier for more frequently searched data.
Data may be classified into tiers based on:
- performance requirements
- frequency of use
- cost efficiency
We recommend data tiering as a cost-effective way of storing data for self-managed installations.
Explore the Tiers
In a data tiering model, there are essentially three tiers for data storage: the hot tier, the warm tier, and the archive.
The Hot Tier
New indices that are part of a data stream are automatically allocated to the hot tier, which in the case of tiering refers to the search backend cluster you use for storage and acts as the default for all incoming data. Data in the hot tier is easy to access and search, but operating costs are generally higher because of the resources that must be allocated to maintain it.
The Warm Tier
The following section exclusively pertains to a Graylog Enterprise feature. To learn more about obtaining an Enterprise license, please contact the Graylog Sales team.
Data in the warm tier is searchable, but search performance is lower compared to the hot tier. Warm data is stored in searchable snapshots and not directly in index sets. Once a search is triggered in the warm tier, the system loads the data from this restored layer into the search backend cluster. The warm tier is suitable for storing data that does not require frequent access, such as logs from recent weeks.
A searchable snapshot index reads from the repository and does not download all data to the cluster at restore time. This method makes the warm tier a cost-effective storage solution. Snapshots are stored in a warm storage repository, which may be either an AWS S3 bucket or a local file system. Searchable snapshots remain in the repository in snapshot format and are read-only.
Snapshots on OpenSearch
Because warm tier data is stored in snapshots on OpenSearch, enabling a warm tier affects how you back up your OpenSearch data. If you already use snapshots directly on OpenSearch as a backup strategy and you want to use the warm tier, consider using a dedicated search node for warm tier data. This may require you to rearchitect your OpenSearch cluster to create that node.
Archiving
The following section exclusively pertains to a Graylog Enterprise feature. To learn more about obtaining an Enterprise license, please contact the Graylog Sales team.
Graylog offers archiving for less critical data, making it a lower cost option for storing compliance and historical data.
The archive stores messages until you need to re-process them into Graylog for analysis. You can instruct Graylog to automatically archive log messages to compressed flat files on the local file system or to S3-compatible object storage. Messages are stored before retention cleaning begins, and they are not deleted from the search backend.
Prepare Your Environment for a Warm Tier
Prerequisites
- A valid Graylog Enterprise license is required.
- You must utilize Graylog with OpenSearch 2.12+ OR with Graylog Data Node.
Confirm Compatibility
Graylog displays a warning on the Indices and Index Sets page and the Index Set Overview page if data tiering cannot be used because:
- your version of the search backend is not compatible
- your Security or Enterprise license has expired
Install the S3 OpenSearch Plugin and Add Keys to Keystore
If you are using S3 as your data storage, follow OpenSearch guidance on installing the S3 plugin and adding your AWS access and secret keys to the OpenSearch keystore.
Create a Repository
You must create at least one storage repository to store snapshots. We recommend that you locate your warm data in an S3 bucket (please note applicable security settings); however, you may also choose to store this data in any supported file system repository according to your preference.
You may create multiple repositories or split your storage between S3 and the file system. Repositories can be created through Graylog:
- Navigate to System > Indices and locate the desired index set.
- Click the Edit button found on the right side of the screen.
- Toggle to select Data Tiering.
- Scroll to the Rotation and Retention section.
- Click Create new repository.
- Select either S3 or FS (your local file system) as a repository type.
- Give your repository a unique name.
- Choose a location from the drop-down menu. (The selections are locations that are detected in your OpenSearch configuration file.)
- Click Create.
For the remainder of this article, we assume that you are utilizing an S3 bucket for data storage. Please follow the recommendations provided by your storage vendor if you opt to use another storage method. Note that for other file system repositories, you must at a minimum add a file system path to the OpenSearch configuration file using the path.repo property:
path.repo: ["/mnt/snapshots"]
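For orientation, a repository of either type corresponds to a snapshot repository registration in OpenSearch. The following is a minimal sketch that only builds the JSON body accepted by the OpenSearch snapshot repository API (PUT _snapshot/<name>). Graylog normally performs this step for you through the interface, and the helper name, bucket name, and base path below are hypothetical placeholders.

```python
import json


def repository_body(repo_type: str, **settings: str) -> dict:
    """Build a request body for registering a snapshot repository.

    Hypothetical helper for illustration; it sends nothing to OpenSearch.
    """
    if repo_type == "s3":
        required = "bucket"      # S3 repositories are identified by a bucket
    elif repo_type == "fs":
        required = "location"    # must sit under a path listed in path.repo
    else:
        raise ValueError(f"unsupported repository type: {repo_type}")
    if required not in settings:
        raise ValueError(f"{repo_type} repositories need a '{required}' setting")
    return {"type": repo_type, "settings": dict(settings)}


# Placeholder values, not taken from a real deployment.
s3_repo = repository_body("s3", bucket="my-warm-tier-bucket", base_path="graylog/warm")
fs_repo = repository_body("fs", location="/mnt/snapshots")

print(json.dumps(s3_repo, indent=2))
print(json.dumps(fs_repo, indent=2))
```

The fs location in this sketch matches the path.repo entry shown above, since OpenSearch only accepts file system repositories located under a registered path.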
Configure the OpenSearch Configuration File
Assign the search Role
OpenSearch nodes used for the warm tier must have the search role. Assign this via OpenSearch's configuration file:
node.roles: [search]
If you are not sure whether your configuration file includes this role, you can use the _cat/nodes API endpoint:
curl "http://127.0.0.1:9200/_cat/nodes?v&h=ip,name,node.role,node.roles"
For example, the response may look like this:
ip            name node.role node.roles
192.168.0.153 glwn s         search
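If you need to check several nodes, the same output can be parsed with a script. The sketch below assumes the whitespace-separated column layout shown above and that node.roles lists roles separated by commas; the function name and the second row in the sample are hypothetical.

```python
def nodes_with_search_role(cat_output: str) -> list[str]:
    """Return the names of nodes whose node.roles column includes 'search'."""
    rows = [line.split() for line in cat_output.strip().splitlines() if line.strip()]
    header, body = rows[0], rows[1:]
    name_idx = header.index("name")
    roles_idx = header.index("node.roles")
    return [
        row[name_idx]
        for row in body
        if "search" in row[roles_idx].split(",")
    ]


# First data row is from the example above; the second is a made-up data node.
sample_output = """
ip            name node.role node.roles
192.168.0.153 glwn s         search
192.168.0.154 gldn di        data,ingest
"""

print(nodes_with_search_role(sample_output))
```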
Modify the Node Cache Size for Searchable Snapshots
Verify that the node.search.cache.size parameter is included in the OpenSearch configuration file. If it is not, you must add it.
Set the value to 10gb:
node.search.cache.size: 10gb
Monitoring the performance of your warm tier is critical to optimizing this value. See the section on monitoring your system performance for more detail.
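As a rough aid for this check, the sketch below looks for the setting in the text of a configuration file and converts the value to bytes. It assumes the setting is written as a flat dotted key on its own line (not as nested YAML) and that size units are binary multiples, as OpenSearch byte-size values are; both function names are hypothetical.

```python
import re
from typing import Optional

# Binary multiples, matching how OpenSearch interprets byte-size units.
_UNITS = {"b": 1, "kb": 1024, "mb": 1024**2, "gb": 1024**3, "tb": 1024**4}


def size_to_bytes(value: str) -> int:
    """Convert a size such as '10gb' to a number of bytes."""
    match = re.fullmatch(r"\s*(\d+)\s*(b|kb|mb|gb|tb)\s*", value.lower())
    if match is None:
        raise ValueError(f"unrecognized size value: {value!r}")
    number, unit = match.groups()
    return int(number) * _UNITS[unit]


def search_cache_size(config_text: str) -> Optional[str]:
    """Return the configured node.search.cache.size value, or None if absent."""
    for line in config_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and padding
        if line.startswith("node.search.cache.size:"):
            return line.split(":", 1)[1].strip()
    return None


sample_config = """
node.roles: [search]
node.search.cache.size: 10gb
"""

configured = search_cache_size(sample_config)
print(configured, size_to_bytes(configured))
```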
Set Up a Warm Tier
Once your environment is ready for data tiering, you can enable the warm tier for both new and existing index sets.
Enable the Warm Tier for a New Index Set
- Navigate to System > Indices and click Create index set.
- Scroll down to the Rotation and Retention section and select Data Tiering.
- Enter the minimum and maximum number of days you want your data to be stored. You may choose to save an index after the maximum time limit by checking the Archive before deletion box.
- Select the Enable warm tier check box and enter the minimum number of days to keep your data in the hot tier. The visual updates as you make your selections to show how long your data will be kept in each tier.
- Select the repository you want your data stored in from the Repository drop-down menu. The menu includes any repositories you created earlier.
- Click Create index set.
Enable the Warm Tier for an Existing Index Set
- Navigate to System > Indices and locate the desired index set.
- Click the Edit Index Set button.
- Scroll down to the Rotation and Retention section and select Data Tiering.
- Enter the minimum and maximum number of days you want your data to be stored.
- Select the Enable warm tier check box and enter the minimum number of days to keep your data in the hot tier. The visual updates as you make your selections to show how long your data will be kept in each tier.
- Select the repository you want your data stored in from the Repository drop-down menu. The menu includes any repositories you created earlier.
- Click Update index set.
View Data Tiering Configuration
Once you have created or updated your index set:
- Navigate to System > Indices and Index Sets.
- Click on the desired index set.
Here you can see warm displayed in the index title after the index prefix.
You may perform searches in the warm tier, and you can verify that your search results include the warm tier by checking the Stored in index section. You will see warm in the index set title.
Monitor System Resources of the Warm Tier
After your initial setup of the warm tier, we recommend that you monitor the system resource utilization of the Graylog warm tier nodes to determine the optimal amount of disk space for their file caches. In particular, closely observe the active and used percentage metrics, along with the total, active, used, and evicted bytes of the file cache.
The used file cache percentage should be lower than the active file cache percentage, which should ideally stay at 70% or less. Importantly, the number of active bytes in the file cache should be less than the used bytes, and the used bytes should be less than the total bytes of the file cache, so:
Active bytes < used bytes < total bytes
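These guidelines can be expressed as a small check. The following is a sketch only: the function name is hypothetical, the 70% threshold is applied to the active percentage as an interpretation of the guidance above, and the sample values are the ones from the example response in the Retrieve File Cache Metrics section.

```python
def check_file_cache(fc: dict, max_percent: int = 70) -> list[str]:
    """Return a list of guideline violations for one node's file_cache metrics."""
    problems = []
    if not fc["active_in_bytes"] < fc["used_in_bytes"] < fc["total_in_bytes"]:
        problems.append("expected active bytes < used bytes < total bytes")
    if not fc["used_percent"] < fc["active_percent"]:
        problems.append("used percent should be lower than active percent")
    if fc["active_percent"] > max_percent:
        problems.append(f"active percent is above {max_percent}%")
    return problems


sample = {
    "active_in_bytes": 50066073022,
    "total_in_bytes": 128849018880,
    "used_in_bytes": 71497646618,
    "active_percent": 70,
    "used_percent": 55,
}

print(check_file_cache(sample) or "all guidelines met")
```

In the sample values, the reported percentages match active bytes divided by used bytes (about 70%) and used bytes divided by total bytes (about 55%), which is why a used percentage below the active percentage can coexist with active bytes below used bytes.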
Use with Graylog Data Node
If you enable data tiering with the Graylog Data Node, then you must first issue a certificate authority for the third-party tool you use to query OpenSearch Node's API. Details on issuing a certificate authority may be found in the Data Node documentation.
If you are using self-managed OpenSearch, proceed to the following section.
Retrieve File Cache Metrics
The OpenSearch node stats API allows you to retrieve statistics about your cluster. Here is an example cURL command that can be used to retrieve the file cache metrics discussed in the previous section. (This may be useful in cases where OpenSearch metrics are not being captured and stored within a time-series datastore such as InfluxDB or Prometheus.)
$ curl -s -XGET "http://admin:password@10.0.1.229:9200/_nodes/stats/file_cache?pretty"
The following is an example snippet from the response to the above command.
"jW4Q6SuXQASt8lM796CBHg" : {
  "timestamp" : 1712857680055,
  "name" : "10.0.1.229",
  "transport_address" : "10.0.1.229:9300",
  "host" : "10.0.1.229",
  "ip" : "10.0.1.229:9300",
  "roles" : [
    "search"
  ],
  "attributes" : {
    "shard_indexing_pressure_enabled" : "true"
  },
  "file_cache" : {
    "timestamp" : 1712857680055,
    "active_in_bytes" : 50066073022,
    "total_in_bytes" : 128849018880,
    "used_in_bytes" : 71497646618,
    "evictions_in_bytes" : 0,
    "active_percent" : 70,
    "used_percent" : 55,
    "hit_count" : 83077,
    "miss_count" : 877
  }
}
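If you prefer a script to cURL, the same request can be made from Python's standard library. This is a minimal sketch, assuming HTTP basic authentication as in the cURL example and the node stats response layout in which each node entry sits under a top-level nodes object keyed by node ID; both function names are hypothetical.

```python
import base64
import json
import urllib.request


def fetch_node_stats(base_url: str, username: str, password: str) -> dict:
    """Request file cache statistics from the node stats API with basic auth."""
    request = urllib.request.Request(f"{base_url}/_nodes/stats/file_cache")
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    request.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def file_cache_by_node(stats: dict) -> dict:
    """Map each node name to its file_cache section, skipping nodes without one."""
    return {
        node["name"]: node["file_cache"]
        for node in stats.get("nodes", {}).values()
        if "file_cache" in node
    }


# Trimmed copy of the example response, wrapped in the top-level "nodes" object.
example = {
    "nodes": {
        "jW4Q6SuXQASt8lM796CBHg": {
            "name": "10.0.1.229",
            "roles": ["search"],
            "file_cache": {"active_percent": 70, "used_percent": 55},
        }
    }
}

for name, cache in file_cache_by_node(example).items():
    print(name, cache["active_percent"], cache["used_percent"])
```

Against a live cluster, you would pass the result of fetch_node_stats("http://10.0.1.229:9200", "admin", "password") to file_cache_by_node instead of the example dictionary.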
For your reference, the following are the available OpenSearch metrics that may be retrieved via the node stats API:
- nodes.stats.file_cache.active_in_bytes
- nodes.stats.file_cache.total_in_bytes
- nodes.stats.file_cache.used_in_bytes
- nodes.stats.file_cache.evictions_in_bytes
- nodes.stats.file_cache.active_percent
- nodes.stats.file_cache.used_percent
- nodes.stats.file_cache.hit_count
- nodes.stats.file_cache.miss_count
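If you forward these values to a time-series datastore yourself, it can help to flatten a node's file_cache section into the dotted names listed above. The sketch below is one way to do that; the function name is hypothetical, and how your collector actually names metrics is an assumption you should verify.

```python
def flatten_file_cache(file_cache: dict, prefix: str = "nodes.stats.file_cache") -> dict:
    """Turn a file_cache section into dotted metric names, leaving out the timestamp."""
    return {
        f"{prefix}.{key}": value
        for key, value in file_cache.items()
        if key != "timestamp"
    }


# Values taken from the example response earlier in this section.
file_cache = {
    "timestamp": 1712857680055,
    "active_in_bytes": 50066073022,
    "total_in_bytes": 128849018880,
    "used_in_bytes": 71497646618,
    "evictions_in_bytes": 0,
    "active_percent": 70,
    "used_percent": 55,
    "hit_count": 83077,
    "miss_count": 877,
}

for metric, value in flatten_file_cache(file_cache).items():
    print(metric, value)
```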