Retrieve Logs from a Data Lake

The following article exclusively pertains to a Graylog Enterprise feature or functionality. To learn more about obtaining an Enterprise license, please contact the Graylog Sales team.

When you want to search and analyze logs from a Data Lake, you first need to retrieve them so that they can be written to your search backend. You can retrieve logs from specific streams based on time ranges, and you can apply filters to further limit the logs you retrieve. Logs are restored to the index set you specified upon initially creating the stream.

Warning: Logs that are routed to a Data Lake and not sent to your search backend do not count against license usage while they remain in the Data Lake. Once you retrieve them, they count against license usage.

Prerequisites

Before proceeding, ensure that the following prerequisites are met:

  • You must be a Graylog administrator or have the Data Lake User role to retrieve data from a Data Lake.

Retrieved Data Index

When you retrieve logs from the Data Lake, the restored data is sent to your search backend so that it can be indexed for search and other operations. Data retrieval creates an index for retrieved data. That index is named with the prefix restored-archive-data-lake followed by a unique numeric string for each retrieval operation. After the data is restored, it is available for search and other functions just as if it had been routed to the search backend originally.

Hint: Performing data retrieval does not remove logs from the Data Lake. A copy of restored data still remains in the Data Lake.
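If you need to tell retrieval indices apart from regular ones (for example, when reviewing a list of index names), the naming rule above can be sketched as a simple pattern match. This is a minimal illustration, not part of Graylog: the function names are hypothetical, and the optional "-" or "_" separator between the prefix and the numeric string is an assumption, since the text only states that a numeric string follows the prefix.

```python
import re

# Prefix taken from the text above. The optional "-" or "_" separator
# before the numeric string is an assumption for illustration.
RESTORED_INDEX = re.compile(r"restored-archive-data-lake[-_]?\d+")


def is_restored_index(name: str) -> bool:
    """Return True if the name looks like an index created by a retrieval."""
    return RESTORED_INDEX.fullmatch(name) is not None


def restored_indices(names):
    """Keep only the index names that match the retrieval naming rule."""
    return [n for n in names if is_restored_index(n)]
```

Calling restored_indices on a list of index names returns only those that start with the restored-archive-data-lake prefix and end in digits.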

Retrieve Logs

To retrieve log data from a Data Lake:

  1. Navigate to the Overview tab of Data Lake > Setup.

  2. Locate the stream you want in the streams list.

  3. Click Retrieve logs.

  4. In the dialog box, use the date/time pickers to set the Time Range for the log data to retrieve.

  5. Select the Filter Retrieval by Original Destination option:

    • Must exclude: Search Cluster: Retrieves log data only from the Data Lake. This data has not previously been indexed by the search backend.

    • Must include: Search Cluster: Retrieves log data only from your search backend. This data has previously been indexed by the search backend.

    • Include All: Retrieves log data from both sources.

  6. (Optional) Add filters to limit the log data you retrieve. In the Filter by fields section, click Add filter, select the field to filter by from the drop-down list, and enter the value to match. You can add up to three filters. When you use multiple filters, you can combine them with AND and OR logic.

    Hint: You are not able to apply custom filters or queries beyond the filters available. See Data Lake Preview for more information.

  7. Click Retrieve.

As you make selections, an estimate at the bottom of the dialog box shows how much data must be searched to complete the retrieval. Depending on the amount of data being retrieved, this process can take some time to complete.

When the retrieval process begins, Graylog adds the operation to the Data Lake Jobs section of the Overview tab of Data Lake > Setup. When retrieval is complete, you can search your retrieved log data!
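To reason about which logs a given set of dialog selections would cover, the time range, original destination option, and field filters can be modeled in a few lines of Python. This is a conceptual sketch only and not the Graylog API: the class, field, and key names (Retrieval, Destination, "timestamp", "indexed") are hypothetical, and it assumes a single AND or OR operator applied across all filters, which may be simpler than what the dialog allows.

```python
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum


class Destination(Enum):
    """Hypothetical names for the three original-destination options."""
    MUST_EXCLUDE_SEARCH_CLUSTER = "must_exclude"
    MUST_INCLUDE_SEARCH_CLUSTER = "must_include"
    INCLUDE_ALL = "include_all"


MAX_FILTERS = 3  # the dialog allows up to three field filters


@dataclass
class Retrieval:
    start: datetime
    end: datetime
    destination: Destination = Destination.INCLUDE_ALL
    filters: list = field(default_factory=list)  # (field_name, value) pairs
    match_all: bool = True  # True = AND, False = OR (assumed simplification)

    def __post_init__(self):
        if self.start >= self.end:
            raise ValueError("time range start must precede end")
        if len(self.filters) > MAX_FILTERS:
            raise ValueError("at most three field filters are allowed")

    def selects(self, log: dict) -> bool:
        """Return True if this log record falls within the selections."""
        if not (self.start <= log["timestamp"] <= self.end):
            return False
        # "indexed" is a hypothetical flag: was the log also sent to the
        # search backend when it was first routed?
        indexed = log.get("indexed", False)
        if self.destination is Destination.MUST_EXCLUDE_SEARCH_CLUSTER and indexed:
            return False
        if self.destination is Destination.MUST_INCLUDE_SEARCH_CLUSTER and not indexed:
            return False
        if not self.filters:
            return True
        hits = [log.get(name) == value for name, value in self.filters]
        return all(hits) if self.match_all else any(hits)
```

For example, a Retrieval built with Destination.MUST_EXCLUDE_SEARCH_CLUSTER and one filter on a source field selects only logs in the time range that were never indexed and whose source matches the given value.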

Hint: The Data Lake Preview page also includes the ability to retrieve log data. This option lets you preview data to see if it holds information you need before committing to a data retrieval operation.
