Archiving

The following article exclusively pertains to a Graylog Enterprise feature or functionality. To learn more about obtaining an Enterprise license, please contact the Graylog Sales team.

Storing extensive amounts of data in your search backend can be costly. Graylog allows you to store inactive data in the Graylog archive to help lower storage costs and maximize retention as needed. When archiving index sets, you can set a retention period based on a variety of factors. Archived indices are then deleted after the retention period is complete.

In Graylog, you can archive index sets to compressed flat files on the local file system, to S3-compatible object storage, or to Google Cloud Storage (GCS). Note that archived index sets are stored before retention cleaning begins so that no data is lost.

Hint: Archived data is inactive and cannot be searched unless restored. If necessary, archived indices can be re-imported through the user interface. After the data is restored, you can search through and analyze this data via the web interface. See Restore an Archive for more information.

Configure the Graylog Archive

You can configure the archive directly via the Graylog web interface:

Navigate to Enterprise > Archives.
Select the Configuration tab.

The following configuration options are available in this menu:

Name	Description
Backend	Backend on the node where the archived files are stored.
Enable multithreading	Enables faster archiving by using multiple threads per search node.
Max segment size	Maximum size (in bytes) of archive segment files.
Compression type	Compression type used to compress the archives.
Checksum type	Algorithm used to calculate the checksum for archives.
Restore index batch size	OpenSearch batch size when restoring archive files.
Streams to archive	Streams included in the archive.

Choose a Backend

Archived indices are stored in a backend. Choose one of the following backend types based on your environment or preference:

Configure a File System Backend

File system is the default backend. The initial server startup triggers the creation of a backend, but you can adjust the backend location in your Graylog configuration .yml file.

You can use a backend to store archived data. Graylog supports only a single file system backend type.

The archived indices are stored in the Output base path directory. This directory needs to exist and be writable for the Graylog server process to store the files.

The archiving process runs on the leader node, so only the leader node needs access to the output base path directory. We recommend housing the output base path directory on a separate disk or partition to avoid any negative impacts on message processing when the archive completely fills a disk.
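The requirements above (the directory exists, is writable, and has room on its disk) can be checked before you create the backend. The following is a minimal sketch, not part of Graylog; the function name and the example path are hypothetical, and the script should be run as the same user as the Graylog server process on the leader node.

```python
import os
import shutil


def check_output_base_path(path: str) -> dict:
    """Report whether `path` exists, is writable, and how much space is free."""
    exists = os.path.isdir(path)
    # os.access checks the permissions of the user running this script,
    # so run it as the user that owns the Graylog server process.
    writable = exists and os.access(path, os.W_OK | os.X_OK)
    free_bytes = shutil.disk_usage(path).free if exists else 0
    return {"exists": exists, "writable": writable, "free_bytes": free_bytes}


if __name__ == "__main__":
    # Hypothetical path; replace it with your configured Output base path.
    print(check_output_base_path("/data/graylog-archive"))
```

If `exists` or `writable` is false, fix the directory before activating the backend.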

You can edit archive backend configuration options via Enterprise > Archives under Manage Backends.

File System Backend Configuration Options

To configure a file system backend on your local network storage:

Navigate to Enterprise > Archives, then select the Manage Backends tab.
Click Create backend.
Select File system from the Backend Type dropdown.

Enter configuration details for your network storage backend:

Title	Enter a unique and descriptive name for the backend.
Description	Enter a description of the backend.
Output Base Path	Enter the base path where the archives should be stored. Use a simple directory path string, or you can create a template string to build dynamic paths, as described in Template Strings.

Click Create backend.
Activate the backend as described in Activate the Backend.

Configure an S3 Backend

The S3 archiving backend is built to work with AWS and can be used to upload archives to an Amazon S3 object storage service. The S3 backend option is also compatible with other object storage implementations, such as MinIO, Ceph, and DigitalOcean Spaces.

S3 Backend Configuration Options

To configure an S3 backend:

Navigate to Enterprise > Archives, then select the Manage Backends tab.
Click Create backend.
Select S3 from the Backend Type dropdown.

Enter the configuration options for your S3 backend:

Title	Enter a unique and descriptive name for the backend.
Description	Enter a description of the backend.
S3 Endpoint URL	Enter the URL that provides the location of the S3 server.
AWS Authentication Type	Choose between automatic or key and secret authentication. For more information, see the AWS credential configuration documentation.
AWS Assume Role (ARN) (optional)	Enter the Amazon Resource Name (ARN) with required cross-account permission.
S3 Bucket Name	Enter the name of the S3 bucket in which logs will be stored.
Spool directory	Enter a directory where archiving data is stored temporarily before it is uploaded. Ensure that the directory is writable and has sufficient space for 10 times the Max Segment Size. Adjust the segment size on the Configuration page, if required.
AWS Region	Select the physical location for your cluster data center.
S3 Output Base Path	Enter the base path where the archives should be stored within the S3 bucket. Use a simple directory path string, or you can create a template string to build dynamic paths, as described in Template Strings. You can use a single bucket for multiple purposes. For instance, you could use the same bucket for a Data Lake backend and a warm tier snapshot backend. However, if you do, it is important to use different subfolders for each specific use. The base path you set here determines the subfolder structure for this backend.

If you are using a non-Amazon S3 bucket, complete the fields that best suit your choice of archive.

Click Create backend.
Activate the backend as described in Activate the Backend.
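The Spool directory requirement (space for 10 times the Max Segment Size) is simple arithmetic that can be checked before you create the backend. The sketch below is illustrative only; the function names are hypothetical, and the 500 MiB segment size in the usage note is an example value, not a Graylog default.

```python
import shutil

SPOOL_FACTOR = 10  # the documentation asks for 10 times the Max Segment Size


def required_spool_bytes(max_segment_size_bytes: int) -> int:
    """Return the free space the spool directory should have, in bytes."""
    return SPOOL_FACTOR * max_segment_size_bytes


def spool_has_room(spool_dir: str, max_segment_size_bytes: int) -> bool:
    """Compare the free space on the spool directory's disk with the requirement."""
    free = shutil.disk_usage(spool_dir).free
    return free >= required_spool_bytes(max_segment_size_bytes)
```

For example, with a Max Segment Size of 524288000 bytes (500 MiB), `required_spool_bytes` returns 5242880000 bytes, so the spool directory needs about 5 GiB of free space.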

Apply AWS Security Permissions

When writing AWS security policies, make them as restrictive as possible. It is best practice to enable specific actions needed by the application rather than allowing all actions.

These permissions are required for Graylog to successfully make use of the S3 bucket:

Permission	Description
CreateBucket	Creates an S3 bucket.
HeadBucket	Determines if a bucket exists and if you have permission to access it.
PutObject	Adds an object to a bucket.
CreateMultipartUpload	Initiates a multipart upload and returns an upload ID.
CompleteMultipartUpload	Completes a multipart upload by assembling previously uploaded parts.
UploadPart	Uploads a part in a multipart upload.
AbortMultipartUpload	Aborts a multipart upload.
GetObject	Retrieves objects from Amazon S3.
HeadObject	Retrieves metadata from an object without returning the object itself.
ListObjects	Returns some or all (up to 1,000) of the objects in a bucket with each request.
DeleteObjects	Enables you to delete multiple objects from a bucket using a single HTTP request.
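The names in the table are S3 API operations, while IAM policies are written with `s3:`-prefixed actions, and several operations share one action. The sketch below builds a restrictive policy for a single bucket. The mapping in the comments reflects our reading of the AWS action reference and should be verified against the AWS documentation; the function name, statement IDs, and bucket name are hypothetical.

```python
import json


def archive_bucket_policy(bucket: str) -> dict:
    """Build an IAM policy document limited to one archive bucket."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ArchiveBucketLevel",
                "Effect": "Allow",
                # Assumed mapping: s3:ListBucket authorizes HeadBucket and ListObjects.
                "Action": ["s3:CreateBucket", "s3:ListBucket"],
                "Resource": bucket_arn,
            },
            {
                "Sid": "ArchiveObjectLevel",
                "Effect": "Allow",
                # Assumed mapping: s3:PutObject covers the multipart create,
                # upload-part, and complete calls; s3:GetObject covers HeadObject;
                # s3:DeleteObject covers DeleteObjects.
                "Action": [
                    "s3:PutObject",
                    "s3:AbortMultipartUpload",
                    "s3:GetObject",
                    "s3:DeleteObject",
                ],
                "Resource": f"{bucket_arn}/*",
            },
        ],
    }


if __name__ == "__main__":
    # Hypothetical bucket name.
    print(json.dumps(archive_bucket_policy("my-graylog-archive"), indent=2))
```

If the bucket already exists, `s3:CreateBucket` can likely be dropped to tighten the policy further. Non-AWS S3 implementations may handle policies differently.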

Configure a Google Cloud Storage (GCS) Backend

Before you can establish Google Cloud Storage (GCS) as a backend, you must complete setup on your Google Cloud account.

Create a GCS bucket. Follow Google's documentation on buckets to complete this process. Note the following:
- To create a bucket, you must have the Storage Admin IAM role assigned for the project.
- The bucket name must be globally unique, and you cannot change this name after the bucket is created. Make sure to note your bucket name as you need to provide it in the backend setup process in Graylog.
- The default Standard storage class is recommended. However, depending on your use case, you might determine a different class is a better fit. Make sure that you understand cost implications with Google for each choice.
- When setting access control and data protection and retention, be sure to follow your company guidelines and security best practices. Also, be aware that your choices can have cost implications from Google.
Create a Google Cloud service account. Follow Google's documentation on service accounts to complete this process. Set permissions for this account such that it can read, write, and delete from the bucket.
In Google Cloud, set up Application Default Credentials (ADC). Follow Google's documentation on ADC to complete this process. Depending on your environment, the steps might be as follows:
1. Download your service account key file from the Google Cloud console.
2. On every Graylog node, set the GOOGLE_APPLICATION_CREDENTIALS environment variable to point to the key file with a command like the following:
```
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/your/key.json
```
  Be sure to update the path in the above command to the location for your service account key file.
3. To configure ADC with your Google account, run the following command in the Google Cloud CLI:
```
gcloud auth application-default login
```
Hint: You must complete ADC setup on all Graylog nodes in your environment!

Google provides instructions for setting up ADC on multiple environment types, including development, on-premises, cloud, and containerized. Use the instructions that match your Graylog deployment.
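Because ADC must be set up on every node, a quick per-node check of the key-file route (steps 1 and 2 above) can catch a missing variable or a wrong path early. The following is a minimal sketch under that assumption; the function name is hypothetical. It does not cover credentials created with `gcloud auth application-default login`, which are stored elsewhere and are of type `authorized_user`.

```python
import json
import os


def check_adc_key_file(env=os.environ) -> str:
    """Return the service account e-mail from the key file, or raise ValueError."""
    path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        raise ValueError("GOOGLE_APPLICATION_CREDENTIALS is not set")
    if not os.path.isfile(path):
        raise ValueError(f"key file not found: {path}")
    with open(path, encoding="utf-8") as fh:
        key = json.load(fh)
    # Service account key files downloaded from the console carry this type.
    if key.get("type") != "service_account":
        raise ValueError("key file is not a service account key")
    return key.get("client_email", "")
```

Run the check in the same environment as the Graylog server process, since a variable exported in an interactive shell is not necessarily visible to a service.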

GCS Backend Configuration Options

To configure a GCS backend:

Navigate to Enterprise > Archives, then select the Manage Backends tab.
Click Create backend.
Select GCS from the Backend Type dropdown.

Enter the configuration options for your GCS backend:

Title	Enter a unique and descriptive name for the backend.
Description	Enter a description of the backend.
GCS Bucket	Enter the name of the GCS bucket in which logs will be stored.
Project ID (optional)	Enter your Google Cloud project’s unique identifier. The Google Cloud Project ID is a user-selected unique name that you can use to reference the Google project from Graylog. However, additional configuration is required, such as enabling the Google Cloud API, granting appropriate IAM roles, and additional authentication and authorization steps. Consult the Google Cloud documentation for complete information.
Endpoint URI (optional)	Enter a custom endpoint for accessing the storage. Leave this field blank to use the default value.
Spool directory	Enter a directory where archiving data is stored temporarily before it is uploaded. Ensure that the directory is writable and has sufficient space for 10 times the Max Segment Size. Adjust the segment size on the Configuration page, if required.
GCS Output base path	Enter the base path where the archives should be stored within the GCS bucket. Use a simple directory path string, or you can create a template string to build dynamic paths, as described in Template Strings. You can use a single bucket for multiple purposes. For instance, you could use the same bucket for a Data Lake backend and a warm tier snapshot backend. However, if you do, it is important to use different subfolders for each specific use. The base path you set here determines the subfolder structure for this backend.

Click Create backend.
Activate the backend as described in Activate the Backend.

Template String for Storage

For the Output base path, you can define the path with a template string to help organize your data. For example, you can add the month, year, or day along with the index name to create subfolders based on these variables.

You can use the following variables to construct a dynamic value for each archive and give it structure:

Name	Description
`${year}`	Archival date year (e.g. `2025`).
`${month}`	Archival date month (e.g. `04`).
`${day}`	Archival date day (e.g. `01`).
`${hour}`	Archival date hour (e.g. `23`).
`${minute}`	Archival date minute (e.g. `24`).
`${second}`	Archival date second (e.g. `59`).
`${index-name}`	Name of the archived index (e.g. `graylog_0`).

The following example shows a defined template and what the resulting path would be:

# Template

/data/graylog-archive/${year}/${month}/${day}

# Result

/data/graylog-archive/2025/06/01/graylog_0
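To preview how a template expands before saving it, the substitution can be approximated in a few lines. This is a sketch of the variable replacement only, not Graylog's implementation; the function name is hypothetical, and leaving unknown variables untouched is a choice made here for illustration.

```python
import re
from datetime import datetime

# Matches ${name}, where name may contain lowercase letters and hyphens.
_VARIABLE = re.compile(r"\$\{([a-z-]+)\}")


def expand_template(template: str, archived_at: datetime, index_name: str) -> str:
    """Replace the documented template variables with zero-padded values."""
    values = {
        "year": f"{archived_at.year:04d}",
        "month": f"{archived_at.month:02d}",
        "day": f"{archived_at.day:02d}",
        "hour": f"{archived_at.hour:02d}",
        "minute": f"{archived_at.minute:02d}",
        "second": f"{archived_at.second:02d}",
        "index-name": index_name,
    }
    # Unknown variables are left untouched so typos stay visible in the result.
    return _VARIABLE.sub(lambda m: values.get(m.group(1), m.group(0)), template)


base = expand_template(
    "/data/graylog-archive/${year}/${month}/${day}",
    datetime(2025, 6, 1),
    "graylog_0",
)
print(base)  # /data/graylog-archive/2025/06/01
```

Note that the documented result above additionally ends in the index name (`graylog_0`); the sketch expands only the variables that appear in the template.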

Activate the Backend

After you configure your backend, you still need to make that backend active so data is routed to it for archive storage. To activate a backend:

Navigate to Enterprise > Archives, then select the Configuration tab.
Select the backend you want to activate from the Backend dropdown.
You can choose to change configurations or use the defaults provided at this time.
Select Update configuration.

This action navigates you back to the Archives page on the Overview tab.

Enable Multithread Archiving

Multithread archiving allows for the use of more than one Java thread per search node. It is a faster option for creating archives because it expands write capacity by running a thread for each shard.

Multithread archiving is enabled by default on new installations. To enable this feature on existing instances, complete the following steps:

Navigate to Enterprise > Archives.
Select the Configuration tab.
Select the Enable multithreading check box.

Warning: Although multithreading may increase archiving speed, it may also increase the load on your OpenSearch cluster and affect its performance.

Select a Max Segment Size

When you archive an index, the archive job writes the data into segments. The Max Segment Size setting limits the size of each segment file, which lets you process the files with tools that have a file-size limit.

Once the size limit is reached, a new segment file is started. For example:

/path/to/archive/
  graylog_201/
    archive-metadata.json
    archive-segment-0.gz
    archive-segment-1.gz
    archive-segment-2.gz
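The rollover behavior can be illustrated with a small planning function. This is a conceptual sketch, not Graylog's implementation: it assumes a new segment starts whenever the next record would push the current one past the limit, and it does not model whether the limit applies before or after compression.

```python
def plan_segments(record_sizes, max_segment_size):
    """Group record sizes (in bytes) into segment files that respect the limit."""
    segments, current, current_size = [], [], 0
    for size in record_sizes:
        # Start a new segment when this record would exceed the limit.
        # A single oversized record still gets a segment of its own.
        if current and current_size + size > max_segment_size:
            segments.append(current)
            current, current_size = [], 0
        current.append(size)
        current_size += size
    if current:
        segments.append(current)
    return {f"archive-segment-{i}.gz": seg for i, seg in enumerate(segments)}
```

With five 40-byte records and a 100-byte limit, the plan contains three segments named `archive-segment-0.gz` through `archive-segment-2.gz`, holding two, two, and one record respectively.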

Select a Compression Type

Archives are compressed with gzip by default, but you can switch to a different compression type.

The selected compression type significantly impacts the time it takes to archive an index. For example, gzip is generally slower but produces smaller archives. Snappy and LZ4 are quicker, but the archives will be larger.

Here is a general comparison between the available compression algorithms with select test data. Results will vary based on your data!

Type	Index Size	Archive Size	Duration
gzip	1 GB	134 MB	15 minutes, 23 seconds
Zstandard	1 GB	225.2 MB	5 minutes, 55 seconds
Snappy	1 GB	291 MB	2 minutes, 31 seconds
LZ4	1 GB	266 MB	2 minutes, 25 seconds

Warning: The current implementation of LZ4 is not compatible with LZ4 CLI tools, so it is currently impossible to decompress LZ4 archives outside of Graylog.
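To compare the trade-off in the table above on a common scale, the archive size can be expressed as a percentage of the index size and the duration as throughput. The sketch below only re-derives numbers from the table; it assumes the "1 GB" index size means 1000 MB, and the names are hypothetical.

```python
INDEX_SIZE_MB = 1000  # assumption: the table's "1 GB" is read as 1000 MB

RESULTS = {
    # type: (archive size in MB, duration in seconds)
    "gzip": (134.0, 15 * 60 + 23),
    "Zstandard": (225.2, 5 * 60 + 55),
    "Snappy": (291.0, 2 * 60 + 31),
    "LZ4": (266.0, 2 * 60 + 25),
}


def summarize(results=RESULTS, index_size_mb=INDEX_SIZE_MB):
    """Return archive size as a percentage of the index and MB archived per second."""
    summary = {}
    for name, (archive_mb, seconds) in results.items():
        summary[name] = {
            "archive_percent": round(100 * archive_mb / index_size_mb, 1),
            "mb_per_second": round(index_size_mb / seconds, 2),
        }
    return summary


if __name__ == "__main__":
    for name, row in summarize().items():
        print(name, row)
```

In this test data, gzip yields the smallest archive (about 13% of the index) at roughly 1 MB per second, while LZ4 and Snappy process more than 6 MB per second with archives about twice as large.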

Select a Checksum Type

When Graylog writes archives, it computes a CRC32 checksum over the files by default. You can select a different checksum algorithm if needed.

To find the most appropriate type of checksum, consider that CRC32 and MD5 are quick to compute and are a reasonable choice to detect damaged files, but neither is suitable for protection against malicious changes in the files. Graylog also supports SHA-1 and SHA-256 checksums, which are cryptographic hashes and can be used to verify that the files were not modified.

When selecting a checksum type, we recommend that you determine:

Whether the necessary system tools to compute them are installed (SHA-256 utility, for example).
The speed of checksum calculation for larger files.
Security considerations.
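If the system tools for a given algorithm are not installed, a checksum can also be computed with a short script. The following is a minimal sketch using the Python standard library; the function name is hypothetical, and it does not cover where or in what format Graylog records the checksum, so compare the result with the recorded value yourself.

```python
import hashlib
import zlib


def file_checksum(path: str, algorithm: str = "sha256", chunk_size: int = 1 << 20) -> str:
    """Return the hex checksum of a file, reading it in chunks to bound memory use."""
    if algorithm == "crc32":
        crc = 0
        with open(path, "rb") as fh:
            while chunk := fh.read(chunk_size):
                crc = zlib.crc32(chunk, crc)
        return f"{crc:08x}"
    digest = hashlib.new(algorithm)  # for example "md5", "sha1", or "sha256"
    with open(path, "rb") as fh:
        while chunk := fh.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()
```

Timing this function on one of your larger segment files with each algorithm gives a rough idea of the speed difference on your hardware.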

Set the Restore Index Batch Size

The Restore index batch size setting controls the batch size for re-indexing archive data into OpenSearch. When set to 1000, the restore job re-indexes the archived data in document batches of 1000.

Use this setting to control the speed of the restore process and the amount of load it generates on the OpenSearch cluster. The higher the batch size, the faster the restore progresses, and the more load is put on your OpenSearch cluster beyond the normal message processing.

Tune this setting carefully to avoid any negative impact on your message indexing throughput and search speed.
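The effect of the batch size can be pictured as splitting the archived documents into fixed-size groups, each sent as one indexing request. This is a conceptual sketch, not Graylog's restore code; the function name is hypothetical.

```python
from itertools import islice


def batches(documents, batch_size):
    """Yield lists of at most `batch_size` documents from any iterable."""
    iterator = iter(documents)
    while batch := list(islice(iterator, batch_size)):
        yield batch


sizes = [len(batch) for batch in batches(range(2500), 1000)]
print(sizes)  # [1000, 1000, 500]
```

A larger batch size means fewer but heavier requests for the same archive, which is why it speeds up the restore and raises the load at the same time.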

Configure Index Retention with Data Tiering

Hint: Our model for rotation and retention of indices is referred to as data tiering. Data tiering is offered as an option to store and manage data in tiers for specific purposes. See Data Tiering for more information.

You can determine how long you want to retain the data in a new or existing index set. To do so:

Navigate to the Rotation and Retention section on the index set configuration page.
Toggle to the Data Tiering option.
Select the minimum and maximum number of days you want to store your data.

Data is deleted once the maximum number of days is reached. If you wish to keep this data, you may select Archive before deletion.

If you have an Enterprise license, you may also select Enable Warm Tier. You can then enter the minimum number of days you want the data to stay in the hot tier before moving it to warm. You can also designate a warm storage repository to store the index set in. If none exist, you can create a new warm storage repository in this menu.

Configure Index Retention (Legacy)

Warning: We strongly recommend utilizing data tiering for the rotation and retention of index sets. The following legacy strategies will be deprecated in the near future.

Graylog uses configurable index retention strategies to delete old indices. By default, indices can be closed or deleted if they exceed the configured limit.

The Graylog archive offers a separate index retention strategy that you can configure to automatically archive an index before closing or deleting it. Select Archive index to enable this feature. (See Index Model for details on these strategies.)

As with the other index retention strategies, you can configure a maximum number of OpenSearch indices. When there are more indices than the configured limit, the oldest indices are archived in the backend and closed or deleted. You can also choose to do nothing after archiving an index by selecting NONE. In that case, no cleanup of old indices will happen, and you will be able to manage the archive yourself.
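The count-based selection described above can be sketched as follows. This is an illustration rather than Graylog's implementation: the function name is hypothetical, it assumes that a higher numeric suffix means a newer index, and it does not model how the active write index is treated.

```python
def indices_over_limit(index_names, max_indices):
    """Return the oldest indices beyond `max_indices`, oldest first."""
    # Assumption: a higher numeric suffix means a newer index (graylog_0, graylog_1, ...).
    ordered = sorted(index_names, key=lambda name: int(name.rsplit("_", 1)[1]))
    excess = len(ordered) - max_indices
    return ordered[:excess] if excess > 0 else []
```

For example, with the indices `graylog_0` through `graylog_3` and a limit of 2, the function returns `graylog_0` and `graylog_1` as the candidates to archive.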

Select Streams To Archive

The Streams to Archive setting is part of the archive configuration. It allows you to archive only important data as determined by your streams, rather than everything that is brought into Graylog.

Hint: New streams are archived automatically. If you create a new stream and don’t want it to be archived, disable it in this configuration dialog.

Next Steps

Once you have completed setup for Graylog archiving, you can archive data for a specified index set!
