Data Warehouses
This article covers a Graylog Enterprise feature. To learn more about obtaining an Enterprise license, please contact the Graylog Sales team.
A Data Warehouse is a repository for log data that you want to retain but do not immediately need for search and analysis in Graylog. Logs can be routed to a Data Warehouse as part of the Data Routing function in Graylog. You can use Amazon S3 or another network storage solution as backend storage for your Data Warehouses.
Routing logs to a Data Warehouse is enabled per stream, and all logs routed from that stream to the Data Warehouse are written there immediately after processing. Log data stored in a Data Warehouse can be retrieved later for search and analysis, event and alert monitoring, dashboards, and reports.
Data Warehouse vs. Archive
Currently, you can use both Data Warehouses and archives to preserve your log data long term, as the two features perform similar functions. However, a Data Warehouse has some benefits for data that is less immediately valuable. Retrieving logs from a Data Warehouse is faster because log retrieval is granular. Additionally, the data in a Data Warehouse is compressed, so it is generally a lower-cost option for data storage.
Create a New Storage Backend
To create a new storage backend for your Data Warehouse:
- Navigate to Enterprise > Data Warehouse > Backend. Any existing storage backends are displayed here.
- Select Create Data Warehouse Backend.
- Select either S3 (preferred) or File system.
  - For Amazon S3, the following configuration options are available:
    - Title: A unique and descriptive name for the backend.
    - Description: A short description of the backend.
    - S3 Endpoint URL: The URL of the S3 server.
    - AWS Authentication Type: Choose between automatic authentication and key and secret authentication. For more information, see the AWS credential configuration documentation.
    - AWS Assume Role (ARN) (optional): The Amazon Resource Name (ARN) of a role with the required cross-account permission.
    - S3 Bucket Name: The name of the S3 bucket in which logs will be stored.
    - AWS Region: The physical location for your cluster data center.
    - S3 Output Base Path: The base path within the S3 bucket where the Data Warehouse data should be stored.
      Warning: This value can only be set when the backend is created and cannot be changed later!
  - For File system, the following configuration options are available:
    - Title: A unique and descriptive name for the backend.
    - Description: A short description of the backend.
    - Output Base Path: The base path where the Data Warehouse data should be stored.
      Warning: This value can only be set when the backend is created and cannot be changed later!
- Click Create to complete configuration of the storage backend.
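Before filling in the S3 form, it can help to sanity-check the values you plan to enter. The following is a minimal sketch in Python, not part of Graylog: the class `S3BackendSettings`, the function `validate`, the field names, and the literal values `"automatic"` and `"key_secret"` are all hypothetical stand-ins for the form fields described above. Treating the endpoint URL as required is also an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional
from urllib.parse import urlparse


# Hypothetical model of the S3 backend form; not Graylog's API or file format.
# frozen=True stands in for the warning that the base path cannot be changed
# after creation (it freezes every field, which is stricter than the UI).
@dataclass(frozen=True)
class S3BackendSettings:
    title: str
    endpoint_url: str
    bucket_name: str
    region: str
    output_base_path: str
    auth_type: str = "automatic"          # assumed values: "automatic" or "key_secret"
    access_key: Optional[str] = None
    secret_key: Optional[str] = None
    assume_role_arn: Optional[str] = None  # optional in the form
    description: str = ""


def validate(settings: S3BackendSettings) -> List[str]:
    """Return a list of problems; an empty list means the values look usable."""
    problems = []
    if not settings.title.strip():
        problems.append("title must not be empty")
    parsed = urlparse(settings.endpoint_url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        problems.append("endpoint_url must be an http(s) URL")
    if settings.auth_type not in ("automatic", "key_secret"):
        problems.append("auth_type must be 'automatic' or 'key_secret'")
    if settings.auth_type == "key_secret" and not (
        settings.access_key and settings.secret_key
    ):
        problems.append("key and secret authentication needs both values")
    if not settings.bucket_name:
        problems.append("bucket_name must not be empty")
    if not settings.output_base_path:
        problems.append("output_base_path must not be empty")
    return problems
```

Calling `validate(...)` on a filled-in `S3BackendSettings` returns an empty list when nothing obvious is wrong. It only checks the shape of the values and does not contact S3 or Graylog.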
Route Your Logs to Your Data Warehouse
To route log data to the backend storage solution you have configured, you need to enable Data Routing on the stream containing the data you wish to store and select Data Warehouse as one of your log destinations. Additionally, you may wish to create filter rules for your selected stream that determine which logs are sent to the Data Warehouse and which logs may be sent to other preferred destinations, like an index set or an output.
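The effect of filter rules can be pictured as a per-destination decision for each message. The sketch below is illustrative Python only and assumes a simplified model in which every destination has one accept/reject rule. The function `route`, the destination names, and the `level` field are hypothetical, and the rule semantics in Graylog itself may differ.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]
Predicate = Callable[[Message], bool]


def route(message: Message, destinations: Dict[str, Predicate]) -> List[str]:
    """Return the name of every destination whose rule accepts the message."""
    return [name for name, accepts in destinations.items() if accepts(message)]


# Hypothetical stream setup: debug messages are kept only in the Data
# Warehouse, while everything else is also indexed for immediate search.
destinations: Dict[str, Predicate] = {
    "data_warehouse": lambda m: True,
    "index_set": lambda m: m.get("level") != "debug",
}

print(route({"level": "debug", "message": "cache miss"}, destinations))
# prints ['data_warehouse']
print(route({"level": "error", "message": "disk full"}, destinations))
# prints ['data_warehouse', 'index_set']
```

In this model a message can reach more than one destination, which matches the idea of selecting Data Warehouse as one of several log destinations.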
Retrieve Your Logs from a Data Warehouse
To search and analyze logs stored in a Data Warehouse, you first need to retrieve them so that they can be written to your search backend. Logs are restored to the index set you specified when you created the stream.
To retrieve logs from a Data Warehouse:
- Navigate to the stream that contains the data you wish to retrieve by selecting Streams from the top-level menu.
- Locate your desired stream in the list and select Data Routing.
- From the Data Routing menu, proceed to the third step, Destinations, and select the Data Warehouse destination.
- Under the Actions column, select Retrieve from Data Warehouse.
- In this menu, select the time range of the data you wish to retrieve.
- Choose whether to retrieve data solely from the Data Warehouse, solely from your search backend, or from both sources.
- Once you have made your selections, select Retrieve. An estimate of how long retrieval will take is displayed. Depending on the size of the data being retrieved, this process can take some time to complete.
- When the retrieval process is complete, Graylog notifies you. You can now search your retrieved log data.
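The choices made in the retrieval menu amount to a stream, a time range, and a source. As a rough illustration, the Python sketch below models such a request and shows what selecting by time range means. The names `RetrievalRequest`, `Source`, and `in_range` are hypothetical, this is not how Graylog implements retrieval, and the half-open range (start included, end excluded) is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum


class Source(Enum):
    """Where retrieved data comes from, mirroring the three menu choices."""
    DATA_WAREHOUSE = "data_warehouse"
    SEARCH_BACKEND = "search_backend"
    BOTH = "both"


@dataclass(frozen=True)
class RetrievalRequest:
    stream: str
    start: datetime
    end: datetime
    source: Source = Source.DATA_WAREHOUSE

    def __post_init__(self) -> None:
        # Reject naive timestamps so ranges are unambiguous across time zones.
        if self.start.tzinfo is None or self.end.tzinfo is None:
            raise ValueError("use timezone-aware timestamps")
        if self.start >= self.end:
            raise ValueError("start must be before end")


def in_range(request: RetrievalRequest, timestamp: datetime) -> bool:
    """True if a message timestamp falls inside the requested range."""
    return request.start <= timestamp < request.end


request = RetrievalRequest(
    stream="firewall",  # hypothetical stream name
    start=datetime(2024, 1, 1, tzinfo=timezone.utc),
    end=datetime(2024, 1, 2, tzinfo=timezone.utc),
)
print(in_range(request, datetime(2024, 1, 1, 12, tzinfo=timezone.utc)))  # prints True
```

Keeping the requested range narrow limits how much data has to be written back to the search backend, which is the practical benefit of granular retrieval.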
Change Your Storage Backend
To change your storage backend:
- Navigate to Enterprise > Data Warehouse > Configuration.
- Under the Configuration information menu, select the down arrow next to your current Active Backend and select your new desired backend.
- Select Update configuration.
- Graylog prompts you to confirm that you wish to change your storage backend. Note that Graylog recommends against changing your storage backend: all data written to the current storage location must be deleted and cannot be retrieved later.
- Once you confirm that you wish to proceed, you are prompted again to confirm that you wish to delete the data in your current backend. Select Confirm to proceed.
- The storage backend has now been removed from Graylog, and you can create a new storage backend for your Data Warehouses.