Create a Warm Tier on Self-Managed OpenSearch

The following article exclusively pertains to a Graylog Enterprise feature or functionality. To learn more about obtaining an Enterprise license, please contact the Graylog Sales team.

When you use Graylog Data Tiering, you set data storage retention policies when you create or update index sets. You can set different tier policies for each index set, based on the requirements of the data they contain. Establishing a retention policy with a warm tier lets you keep less frequently searched data in lower-cost, lower-performance storage. See Data Tiering for complete information.

If you intend to use data tiering with a warm tier, you first need to configure your backend storage. This article explains how to prepare your environment for a warm tier on a Graylog installation with self-managed OpenSearch, and also includes the steps to enable a warm tier for both new and existing index sets.

Hint: If you are using Graylog with Data Node, the requirements and procedures are different. See Create a Warm Tier on Data Node for information.

Prerequisites

Before proceeding, ensure that the following prerequisites are met:

You must be a Graylog administrator to set up a warm tier backend and enable a warm tier.
You must use Graylog with an OpenSearch version from 2.12 through 2.19.4. If your search backend cluster is not compatible with Data Tiering, Graylog displays a warning on the Indices and Index Sets page (System > Indices).
For Amazon S3, Google Cloud Storage (GCS), and Apache Hadoop Distributed File System (HDFS) storage backends, you must have appropriate credentials to configure that storage.
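
The version requirement above can be checked from the command line. The following is a minimal sketch, not an official Graylog tool: the function name `version_in_range` is hypothetical, and it assumes a `sort` that supports the `-V` (version sort) option, as GNU coreutils does.

```shell
# Sketch: succeed when VERSION lies between MIN and MAX (inclusive).
# Assumes "sort -V" is available; version_in_range is a hypothetical helper.
version_in_range() {
  version="$1"; min="$2"; max="$3"
  # The smaller of (min, version) must be min.
  lowest="$(printf '%s\n' "$min" "$version" | sort -V | head -n 1)"
  # The larger of (version, max) must be max.
  highest="$(printf '%s\n' "$version" "$max" | sort -V | tail -n 1)"
  [ "$lowest" = "$min" ] && [ "$highest" = "$max" ]
}
```

For example, `version_in_range 2.15.0 2.12.0 2.19.4 && echo compatible` prints `compatible`. The running version is reported in the `version.number` field of the response from the OpenSearch root endpoint (for instance, `curl -s "http://127.0.0.1:9200"`).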

Prepare Your Environment for a Warm Tier

Before you can enable a warm tier for your installation, you must set up a storage repository. We recommend that you locate your warm tier data in an S3 or GCS bucket, but you can also choose to store this data in any supported file system repository.

The initial steps for enabling a repository are completed outside of the Graylog web interface and differ depending on the storage backend you choose:

For Amazon S3, follow these steps:

Install the OpenSearch snapshot repository plugin on all OpenSearch nodes:
```
sudo ./bin/opensearch-plugin install repository-s3
```
Add your Amazon Web Services (AWS) access and secret keys to the OpenSearch keystore on all search nodes. See the OpenSearch documentation for instructions.
Update the OpenSearch configuration file on all search nodes as described in Configure the OpenSearch Configuration File.
Restart all search nodes for the changes to take effect.
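
The keystore step above might look like the following sketch. It assumes the S3 repository uses the plugin's `default` client, so the setting names are `s3.client.default.access_key` and `s3.client.default.secret_key`; confirm them against the OpenSearch documentation for your version. The function name and the `KEYSTORE_BIN` override are hypothetical conveniences, not part of OpenSearch.

```shell
# Sketch: store AWS credentials in the OpenSearch keystore on one node.
# KEYSTORE_BIN is a hypothetical override; the default path assumes the
# current directory is the OpenSearch home directory.
add_s3_credentials() {
  keystore="${KEYSTORE_BIN:-./bin/opensearch-keystore}"
  # Each "add" call prompts interactively for the secret value.
  for setting in s3.client.default.access_key s3.client.default.secret_key; do
    "$keystore" add "$setting" || return 1
  done
}
```

Run `add_s3_credentials` on every search node as a user that is allowed to modify the keystore, then continue with the remaining steps.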

For Google Cloud Storage (GCS), follow these steps:

Install the OpenSearch snapshot repository plugin on all OpenSearch nodes:
```
sudo ./bin/opensearch-plugin install repository-gcs
```
Add the service account key JSON file to the OpenSearch keystore on all search nodes. See the Google documentation for information about creating service account keys.
Update the OpenSearch configuration file on all search nodes as described in Configure the OpenSearch Configuration File.
Restart all search nodes for the changes to take effect.
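
The keystore step above could be scripted roughly as follows. This sketch assumes the GCS repository uses the plugin's `default` client, so the setting name is `gcs.client.default.credentials_file`; verify it against the OpenSearch documentation for your version. The function name and the `KEYSTORE_BIN` override are hypothetical.

```shell
# Sketch: add a GCS service account key file to the OpenSearch keystore.
# KEYSTORE_BIN is a hypothetical override; the default path assumes the
# current directory is the OpenSearch home directory.
add_gcs_credentials() {
  key_file="$1"
  keystore="${KEYSTORE_BIN:-./bin/opensearch-keystore}"
  # Fail early if the key file is missing or unreadable.
  [ -r "$key_file" ] || { echo "cannot read key file: $key_file" >&2; return 1; }
  "$keystore" add-file gcs.client.default.credentials_file "$key_file"
}
```

Call it with the path to your key file, for example `add_gcs_credentials /path/to/service-account.json` (a placeholder path), on every search node.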

Configure the OpenSearch Configuration File

As part of setup for a Data Tiering warm tier, you need to update or add configuration settings to the OpenSearch configuration file, opensearch.yml.

Hint: After you make any change to opensearch.yml, you must restart the node for the change to take effect. If you update multiple nodes, be sure to apply the same changes to every node.

Assign the `search` Role

OpenSearch nodes used for the warm tier must have the search role to enable searchable snapshots. Assign this role by adding the following line to opensearch.yml on each node used for warm tier data:

node.roles: [search]

If you are not sure whether your configuration file includes this role, you can use the _cat/nodes API endpoint:

curl "http://127.0.0.1:9200/_cat/nodes?v&h=ip,name,node.role,node.roles"

For example, a response might look like this:

ip            name node.role node.roles
192.168.0.153 glwn s         search
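
On a cluster with many nodes, scanning this output by eye is error-prone. As a rough sketch, the hypothetical helper below reads the `_cat/nodes` output in the column order shown above (ip, name, node.role, node.roles) and prints the names of nodes that do not have the search role. It assumes node names contain no spaces.

```shell
# Sketch: list nodes that lack the "search" role.
# Reads _cat/nodes output (columns: ip name node.role node.roles) on stdin.
nodes_missing_search_role() {
  awk 'NR > 1 {
    # node.roles is a comma-separated list in the fourth column.
    n = split($4, roles, ",")
    found = 0
    for (i = 1; i <= n; i++) if (roles[i] == "search") found = 1
    if (!found) print $2
  }'
}
```

Pipe the request shown earlier into the function: `curl -s "http://127.0.0.1:9200/_cat/nodes?v&h=ip,name,node.role,node.roles" | nodes_missing_search_role`. Only the nodes you intend to use for warm tier data need to be absent from the result.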

Modify the Node Cache Size for Searchable Snapshots

Verify that the node.search.cache.size parameter is included in opensearch.yml for each warm tier node. If not, you must add it to the file.

Set the value to 10gb:

node.search.cache.size: 10gb

Monitoring the performance of your warm tier is critical to optimizing this value. See the section on monitoring your system performance for details.
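
To verify whether a setting such as node.search.cache.size is already present, a small helper like the one below can be used. This is a minimal sketch: the function name `get_setting` is hypothetical, it only recognizes uncommented `key: value` lines that start at the beginning of a line, and the file path in the usage example is an assumption that depends on how OpenSearch was installed.

```shell
# Sketch: print the value of a top-level "key: value" setting from a config
# file, or return non-zero if no uncommented line defines it.
get_setting() {
  config_file="$1"; key="$2"
  awk -v key="$key" '
    index($0, key ":") == 1 {
      value = substr($0, length(key) + 2)
      sub(/^[ \t]+/, "", value)   # drop whitespace after the colon
      print value
      found = 1
      exit
    }
    END { exit found ? 0 : 1 }
  ' "$config_file"
}
```

For example, `get_setting /etc/opensearch/opensearch.yml node.search.cache.size || echo "setting is missing"` prints the configured value, or the fallback message when the line must still be added.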

Add a File System Location

For file system repositories, you must at a minimum add a file system path to opensearch.yml using the path.repo property:

path.repo: ["/mnt/snapshots"]

You might need to complete additional configuration, depending on your operating system. Review the OpenSearch documentation for specific requirements.

Create a Repository

After you complete all the prerequisite steps, you must create at least one storage repository to store snapshots. Remember, we recommend that you locate your warm tier data in an S3 or GCS bucket, but you can also choose to store this data in a local file system repository or Apache HDFS.

Create a warm tier storage repository in the Graylog web interface as follows:

Navigate to System > Indices.
Click Edit for the desired index set.
In the Rotation and Retention section, select Data Tiering.
Click Create new warm storage repository.
Select your Warm Storage Repository Location at the top of the form.

Hint: Only backend types for which you have completed the prerequisite setup steps are available to select when you create a warm tier backend. Be sure to complete the prerequisite setup for any type you want to use.

For Amazon S3:
1. Give your repository a unique name.
2. Enter the name of the S3 bucket in which logs will be stored.
3. Enter the required base path where the archives should be stored within the S3 bucket.
  
  You can use a single bucket for multiple purposes. For instance, you could use the same bucket for a Data Lake backend and a warm tier snapshot backend. However, if you do, it is important to use different sub folders for each specific use. The base path you set here determines the sub folder structure for this backend.
4. Click Create.

For Google Cloud Storage:
1. Give your repository a unique name.
2. Enter the name of the GCS bucket in which logs will be stored.
3. Enter the required base path where the archives should be stored within the GCS bucket.
  
  You can use a single bucket for multiple purposes. For instance, you could use the same bucket for a Data Lake backend and a warm tier snapshot backend. However, if you do, it is important to use different sub folders for each specific use. The base path you set here determines the sub folder structure for this backend.
4. Click Create.

For Apache HDFS:
1. Give your repository a unique name.
2. Enter the URI of the Apache HDFS cluster in which logs will be stored.
3. In the HDFS cluster, create a directory for the snapshot repository. This directory must be owned by opensearch:superuser and have write permissions.
4. In Graylog, enter the path of the warm storage repository created in the previous step.
5. (Optional): Add HDFS client configuration properties if necessary for use with the HDFS cluster.
6. Click Create.

For File System:
1. Give your repository a unique name.
2. Select your file system location from the dropdown.
3. Click Create.
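
After you create a repository, you may want to confirm that OpenSearch knows about it. The sketch below assumes that Graylog registers the repository as an OpenSearch snapshot repository under the name you entered, which this article does not state explicitly. The hypothetical helper reads the output of the `_cat/repositories` API, whose first column is the repository ID, and succeeds when the given name is listed.

```shell
# Sketch: succeed if a repository name appears in _cat/repositories output.
# Reads the output (header line, then "id type" rows) on stdin.
repository_registered() {
  repo_name="$1"
  awk -v name="$repo_name" '
    NR > 1 && $1 == name { found = 1 }
    END { exit found ? 0 : 1 }
  '
}
```

For example, with a repository named `my-warm-repo` (a placeholder): `curl -s "http://127.0.0.1:9200/_cat/repositories?v" | repository_registered my-warm-repo && echo registered`.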

Set Up a Warm Tier

When your environment is ready for Data Tiering, you can enable the warm tier for both new and existing index sets.

Enable the Warm Tier for a New Index Set

You can select warm tier for a new index set when you create it. You can create a new index set based on built-in index set templates provided by Graylog or you can create custom templates for your environment.

For complete information about creating index sets and index set templates, including how to enable the warm tier, see Index Set Templates.

Enable the Warm Tier for an Existing Index Set

To choose warm tier storage for an existing index set, follow these steps:

Navigate to System > Indices.
Click Edit for the desired index set.
In the Rotation and Retention section, select Data Tiering.
Enter the minimum and maximum number of days you want your data to be stored.
Select the Enable warm tier checkbox, then enter the minimum number of days to keep your data in the hot tier. As you make your selections, the visual updates to show how long your data will be kept in each tier.
Select the repository you want your data stored in from the Repository dropdown. The menu includes any repositories you created earlier.
Click Update index set.

View Data Tiering Configuration

After you have created or updated your index set, you can view configuration information as follows:

Navigate to System > Indices.
Select the desired index set to view the index set overview page.

Here you can see warm displayed in the index title after the index prefix.

You can perform searches in the warm tier, and you can verify that your search results include the warm tier by checking the Stored in index section. You will see warm in the index set title.

Warning: If the warm tier is disabled, you may still be able to perform searches in the warm tier, but data no longer rolls over from the hot tier to the warm tier. This limitation can cause performance issues.
