Create a Data Lake Backend on Amazon S3
The following article exclusively pertains to a Graylog Enterprise feature or functionality. To learn more about obtaining an Enterprise license, please contact the Graylog Sales team.
Before you can start routing Graylog data to a Data Lake, you must first set up your backend storage.
You can also use Google Cloud Storage (GCS) or local network storage to set up your backend.
Prerequisites
Before proceeding, ensure that the following prerequisites are met:
- You must be a Graylog administrator to set up and manage a Data Lake.
- To use Amazon S3, you must have an existing Amazon S3 bucket and appropriate access credentials.
Create an Amazon S3 Storage Backend
To create an S3 storage backend for your Data Lake:
- Navigate to Data Lake > Setup. If you have existing backends, select the Backend tab. Any existing storage backends are displayed here.
- Select Create Data Lake Backend.
- Select S3 from the dropdown as the Backend Type.
- Enter the configuration details for your S3 backend. The fields are described in the table that follows these steps.
- Click Create to complete configuration of the storage backend.
- Click Activate to make this the active storage backend. You must activate the backend before it can be used for storage. You can have multiple backends defined, but only one can be active. See the warning below about data loss if you are switching from an existing storage backend.
Field | Description
Title | Enter a unique and descriptive name for the backend.
Description | Enter a description of the backend.
S3 Endpoint URL | Enter the URL that provides the location of the S3 server.
AWS Authentication Type | Choose between automatic or key and secret authentication. For more information, see the AWS credential configuration documentation.
AWS Assume Role (ARN) (optional) | Enter the Amazon Resource Name (ARN) of the role that has the required cross-account permission.
S3 Bucket Name | Enter the name of the S3 bucket in which logs will be stored.
AWS Region | Select the physical location for your cluster data center.
S3 Output Base Path | Enter the base path where the archives should be stored within the S3 bucket. You can use a single bucket for multiple purposes. For instance, you could use the same bucket for a Data Lake backend and a warm tier snapshot backend. However, if you do, it is important to use different subfolders for each specific use. The base path you set here determines the subfolder structure for this backend.
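When one bucket serves several purposes, it can help to check the planned base paths against each other before entering them. The following is a minimal sketch, not part of Graylog: the function names and the example paths are hypothetical, and it assumes base paths are plain `/`-separated prefixes. It compares whole path segments, so `graylog/data` and `graylog/data-lake` count as separate subfolders.

```python
from itertools import combinations


def normalize_base_path(path: str) -> tuple[str, ...]:
    """Split a base path into its non-empty '/'-separated segments."""
    return tuple(part for part in path.strip().split("/") if part)


def paths_overlap(a: str, b: str) -> bool:
    """Return True if one base path equals, or is nested inside, the other."""
    seg_a, seg_b = normalize_base_path(a), normalize_base_path(b)
    shorter = min(len(seg_a), len(seg_b))
    # An empty base path (bucket root) overlaps with every other path.
    return seg_a[:shorter] == seg_b[:shorter]


def find_conflicts(base_paths: dict[str, str]) -> list[tuple[str, str]]:
    """Return pairs of uses whose base paths share a subfolder tree."""
    return [
        (use_a, use_b)
        for (use_a, path_a), (use_b, path_b) in combinations(base_paths.items(), 2)
        if paths_overlap(path_a, path_b)
    ]


if __name__ == "__main__":
    # Hypothetical plan for a single shared bucket.
    planned = {
        "data-lake": "graylog/data-lake",
        "warm-tier": "graylog/warm-tier",
    }
    conflicts = find_conflicts(planned)
    print("conflicts:", conflicts if conflicts else "none")
```

An empty result means each use has its own subfolder tree. Any pair that is returned should be given distinct base paths before the backend is created, because the base path cannot be changed after the initial save.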
If you need to update settings for the Data Lake backend, such as changing access credentials, click Edit. You are presented with the same options as during initial creation. You can update any setting except the S3 Output Base Path, which cannot be changed after your initial save.
Change Your Storage Backend
To change your active storage backend:
- Create a new storage backend or select one you have previously created.
- Click Activate. Graylog prompts you to confirm you want to change your storage backend. Graylog recommends you do not change your storage backend! All the data written to the previous storage backend must be deleted before you can switch.
  Warning: Deleting Data Lake data requires you to first stop routing data to the Data Lake. Note that if the affected streams are routing only to the Data Lake, you risk losing new data until you complete the process and start routing again with the new storage backend.
- Click Confirm to proceed.
The storage backend has now been switched. As new logs arrive, they are routed to the newly activated Data Lake storage backend.
Delete Backend Data
Before you can switch a storage backend, you must delete any data in the old storage backend. It is recommended that you delete this data with the following steps:
- Navigate to the Overview tab of Data Lake > Setup.
- Disable the Data Lake for each stream that is routing data to this backend. Click Data Routing, then toggle the Data Lake to Disabled.
- Delete the data from each stream.
  - Select More > Delete.
  - Select the Full Delete check box.
  - Click Delete.
- Verify that the message count for all streams hits 0.
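The checklist above can also be tracked outside the UI, for example when many streams are involved. The sketch below is illustrative only and does not call any Graylog API: `StreamStatus` and `blockers` are hypothetical names, and the values are assumed to be copied by hand from the Overview tab.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StreamStatus:
    """Hypothetical snapshot of one stream, noted from the Overview tab."""
    name: str
    data_lake_enabled: bool  # state of the Data Lake toggle under Data Routing
    message_count: int       # messages still stored in the old backend


def blockers(streams: list[StreamStatus]) -> list[str]:
    """List the reasons the old backend is not yet ready to be switched."""
    problems = []
    for stream in streams:
        if stream.data_lake_enabled:
            problems.append(f"{stream.name}: Data Lake routing is still enabled")
        if stream.message_count != 0:
            problems.append(f"{stream.name}: {stream.message_count} messages remain")
    return problems


if __name__ == "__main__":
    # Example values are made up for illustration.
    snapshot = [
        StreamStatus("firewall", data_lake_enabled=False, message_count=0),
        StreamStatus("auth", data_lake_enabled=True, message_count=1200),
    ]
    for problem in blockers(snapshot):
        print(problem)
```

An empty list corresponds to the state the steps aim for: routing disabled on every stream and a message count of 0 everywhere.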