Data Warehouses

The following article exclusively pertains to a Graylog Enterprise feature or functionality. To learn more about obtaining an Enterprise license, please contact the Graylog Sales team.

A Data Warehouse is a repository for log data that allows you to store large amounts of data that are not immediately required for search and analysis in Graylog but that you still wish to retain. Logs can be routed to a Data Warehouse as a part of the Data Routing function in Graylog. You can utilize Amazon S3 or another preferred network storage solution as backend storage for your Data Warehouses.

Routing logs to a Data Warehouse is enabled per stream. All logs routed from that stream to the Data Warehouse are written to it immediately after processing. Log data routed to a Data Warehouse can be retrieved at a later date for search and analysis, event and alert monitoring, building dashboards and reports, and much more.

Hint: If your license expires, you can still write data to the Data Warehouse, but these logs cannot be retrieved until the license is once again valid.

Data Warehouse vs. Archive

Currently, you can utilize both Data Warehouses and archives to preserve your log data long term, as both features perform similar functions; however, there are some benefits to utilizing a Data Warehouse for less immediately valuable data. Retrieving logs from a Data Warehouse is faster because retrieval is granular: you retrieve only the time range you need from a given stream. Additionally, the data in a Data Warehouse is compressed, so it is generally a lower-cost option for data storage.

Create a New Storage Backend

Warning: We strongly recommend that you utilize an Amazon S3 bucket as your method of backend storage for logs routed to a Data Warehouse. If you store logs on a local file store and reach your storage capacity, changing your storage backend requires that you delete all of the data housed in your current backend storage solution!

To create a new storage backend for your Data Warehouse:

  1. Navigate to Enterprise > Data Warehouse > Backend. Any existing storage backends are displayed here.

  2. Select Create Data Warehouse Backend.

  3. Select either S3 (preferred) or File system.

    1. For Amazon S3, the following configuration options are available:

      1. Title: A unique and descriptive name for the backend.

      2. Description: A short description of the backend and its purpose.

      3. S3 Endpoint URL: The URL that provides the location of the S3 server.

      4. AWS Authentication Type: You may choose between automatic or key and secret authentication. For more information, see the AWS credential configuration documentation.

      5. AWS Assume Role (ARN) (optional): The Amazon Resource Name (ARN) with required cross-account permission.

      6. S3 Bucket Name: The name of the S3 bucket in which logs will be stored.

      7. AWS Region: The physical location for your cluster data center.

      8. S3 Output Base Path: The base path where the Data Warehouse data should be stored within the S3 bucket.

        Warning: This value can only be set on backend creation and cannot be changed at a later date!

    2. For File system, the following configuration options are available:

      1. Title: A unique and descriptive name for the backend.

      2. Description: A short description of the backend and its purpose.

      3. Output Base Path: The base path where the Data Warehouse data should be stored.

        Warning: This value can only be set on backend creation and cannot be changed at a later date!

  4. Click Create to complete configuration of the storage backend.
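
Before filling in the S3 form, it can help to sanity-check the values you plan to enter. The following is a minimal sketch of such a checklist in Python. It is not a Graylog API: the class name, the function name, and the decision about which fields are treated as mandatory are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional
from urllib.parse import urlparse


@dataclass(frozen=True)
class S3BackendSettings:
    """Hypothetical mirror of the S3 form fields; not a Graylog object."""
    title: str
    bucket_name: str
    region: str
    output_base_path: str  # cannot be changed after the backend is created
    description: str = ""
    endpoint_url: Optional[str] = None
    assume_role_arn: Optional[str] = None  # marked optional in the form


def find_problems(settings: S3BackendSettings) -> List[str]:
    """Return human-readable problems; an empty list means nothing was found."""
    problems = []
    # Assumption: these four fields are treated as required by this checklist.
    if not settings.title.strip():
        problems.append("Title is empty")
    if not settings.bucket_name.strip():
        problems.append("S3 Bucket Name is empty")
    if not settings.region.strip():
        problems.append("AWS Region is empty")
    if not settings.output_base_path.strip():
        problems.append("S3 Output Base Path is empty")
    if settings.endpoint_url is not None:
        parsed = urlparse(settings.endpoint_url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            problems.append("S3 Endpoint URL is not an http(s) URL")
    if settings.assume_role_arn is not None:
        # Amazon Resource Names always begin with the "arn:" prefix.
        if not settings.assume_role_arn.startswith("arn:"):
            problems.append("AWS Assume Role (ARN) does not start with 'arn:'")
    return problems
```

Because the output base path cannot be changed after creation, a check like this is most useful for that field: review the value once more before you click Create.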

Route Your Logs to Your Data Warehouse

To route log data to the backend storage solution you have configured, you need to enable Data Routing on the stream containing the data you wish to store and select Data Warehouse as one of your log destinations. Additionally, you may wish to create filter rules for your selected stream that determine which logs are sent to the Data Warehouse and which logs may be sent to other preferred destinations, like an index set or an output.
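
The idea of filter rules deciding which destinations receive a log can be pictured with a small Python sketch. This is a conceptual model only, under the assumption that each destination has a rule that either accepts or rejects a message; it does not reflect how Graylog evaluates filter rules internally, and the destination names, field names, and function name are hypothetical.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]
Predicate = Callable[[Message], bool]


def destinations_for(message: Message, rules: Dict[str, Predicate]) -> List[str]:
    """Return every destination whose rule accepts the message."""
    return [name for name, accepts in rules.items() if accepts(message)]


# Hypothetical rules: send everything to the Data Warehouse,
# but keep only error-level logs in the searchable index set.
rules: Dict[str, Predicate] = {
    "data_warehouse": lambda m: True,
    "index_set": lambda m: m.get("level") == "error",
}

print(destinations_for({"level": "debug", "source": "web-1"}, rules))
# prints ['data_warehouse']
print(destinations_for({"level": "error", "source": "web-1"}, rules))
# prints ['data_warehouse', 'index_set']
```

In this model a single log can reach more than one destination, which matches the case where the same stream feeds both a Data Warehouse and an index set.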

Retrieve Your Logs from a Data Warehouse

If you wish to search and analyze your logs from a Data Warehouse, you first need to retrieve them so that they can be written to your search backend. Logs are restored to the index set you specified upon initially creating the stream.

Warning: Logs that were routed to a Data Warehouse and not previously sent to your search backend do not count against license usage until they are retrieved. Once retrieved, this log data counts against license usage!

  1. Navigate to the stream that contains the data you wish to retrieve by selecting Streams from the top-level menu.

  2. Locate your desired stream from the list and select Data Routing.

  3. From the Data Routing menu, proceed to the third step, Destinations, and select the Data Warehouse destination.

  4. Under the Actions column, select Retrieve from Data Warehouse.

  5. In this menu, you can select the time range of the data you wish to retrieve.

  6. Additionally, you can choose to retrieve data solely from the Data Warehouse or your search backend, or you can choose to retrieve the data from both sources.

  7. Once you have made your selections, select Retrieve. (Note that an estimate of how long retrieval will take appears. Depending on the size of the data being retrieved, this process can take some time to complete.)

  8. When the retrieval process is complete, Graylog provides you with a notification. Now you may search your retrieved log data!

Change Your Storage Backend

Warning: When you change your storage backend, you are required to delete all the data stored in your current backend. At this time, we recommend that you do NOT change your storage backend unless absolutely necessary because this data will be lost!

  1. Navigate to Enterprise > Data Warehouse > Configuration.

  2. Under the Configuration information menu, select the down arrow next to your current Active Backend and select your new desired backend.

  3. Select Update configuration.

  4. At this point, Graylog prompts you to confirm that you wish to change your storage backend. Please note that Graylog recommends you do not change your storage backend! All the data written to this storage location must be deleted and cannot be retrieved at a later date.

  5. Once you confirm you wish to proceed, you are prompted again to confirm that you wish to delete the data in your current backend. Select Confirm to proceed.

  6. The storage backend has now been removed from Graylog, and you can create a new storage backend for your Data Warehouses.