Retrieve Logs from a Data Lake

The following article exclusively pertains to a Graylog Enterprise feature or functionality. To learn more about obtaining an Enterprise license, please contact the Graylog Sales team.

When you want to search and analyze logs from a Graylog Data Lake, you first need to retrieve the logs so that they can be written to your search backend. You can retrieve logs from both internal and external data lakes, and the procedure is nearly the same for both.

You retrieve logs from specific streams (internal) or tables (external) based on time ranges, and you can apply filters to further limit the logs you retrieve. Retrieved logs are indexed by the search backend so that they are available for search and other analysis.

Warning: Logs that are routed to an internal data lake and not sent to your search backend do not count against license usage until those logs are retrieved. For both internal and external data lakes, log data counts against license usage upon retrieval.

Prerequisites

Before proceeding, ensure that the following prerequisites are met:

  • You must be a Graylog administrator or have the Data Lake User role to retrieve data from a Data Lake.

Retrieved Data Index

When you retrieve logs from a data lake, the restored data is sent to your search backend so that it can be indexed for search and other operations. Each retrieval operation creates an index for the retrieved data. That index is named with the prefix restored-archive-data-lake followed by a numeric string that is unique to the retrieval operation. After the data is restored, it is available for search and other functions just as if it had been routed to the search backend originally.
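
The naming convention above can be used to pick out retrieval indices from a list of index names. The following is a minimal sketch, not a Graylog API: the function names are hypothetical, and the separator between the prefix and the numeric string is an assumption (the pattern accepts a hyphen, an underscore, or no separator).

```python
import re

# Prefix stated in this article for indices created by data retrieval.
RETRIEVAL_INDEX_PREFIX = "restored-archive-data-lake"

# Assumption: the numeric string follows the prefix, optionally joined
# by '-' or '_'. Adjust the pattern to match the names you actually see.
_RETRIEVAL_INDEX = re.compile(
    rf"^{re.escape(RETRIEVAL_INDEX_PREFIX)}[-_]?(\d+)$"
)


def is_retrieval_index(name: str) -> bool:
    """Return True if the index name looks like a retrieval index."""
    return _RETRIEVAL_INDEX.match(name) is not None


def retrieval_indices(names):
    """Keep only the names that look like retrieval indices, in order."""
    return [name for name in names if is_retrieval_index(name)]
```

For example, passing the index names reported by your search backend to retrieval_indices would leave only the indices that follow this naming scheme.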

Hint: Performing data retrieval does not remove logs from the data lake. Original data remains in the data lake.

Retrieve Logs from an Internal Data Lake

To retrieve log data from an internal data lake:

  1. Navigate to the Overview tab of Data Lake > Internal Lake Setup.

  2. Locate your desired stream from the streams list.

  3. Click Retrieve logs.

  4. In the dialog box, use the date/time pickers to set the Time Range for the log data to retrieve.

  5. Select the Filter Retrieval by Original Destination option:

    • Must exclude: Search Cluster: Retrieves log data only from the data lake. This data has not previously been indexed by the search backend.

    • Must include: Search Cluster: Retrieves log data only from your search backend. This data has previously been indexed by the search backend.

    • Include All: Retrieves log data from both sources. This option is the default.

  6. (Optional) Add filters to limit the log data you retrieve. In the Filter by fields section, click Add filter, select the field to filter by from the dropdown, and enter the value to filter on. You can add up to three filters. When you use multiple filters, choose whether to combine them with AND or OR logic.

    Hint: You cannot apply custom filters or queries beyond the filters available in the dropdown. See Data Lake Preview for more information.

  7. Click Retrieve.

As you make selections, an estimate at the bottom of the dialog box shows the amount of data that must be searched to complete the retrieval. Depending on the amount of data, the retrieval process can take some time to complete.
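
The selections made in this dialog can be thought of as a small set of rules applied to each log message. The sketch below models those rules for illustration only; it is not how Graylog implements retrieval. The class names, the was_indexed flag, the inclusive time range, and exact-match comparison of filter values are all assumptions.

```python
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum


class Destination(Enum):
    """Mirrors the three Filter Retrieval by Original Destination options."""
    MUST_EXCLUDE_SEARCH_CLUSTER = "must_exclude"
    MUST_INCLUDE_SEARCH_CLUSTER = "must_include"
    INCLUDE_ALL = "include_all"  # the default option in the dialog


@dataclass
class RetrievalSelection:
    """Hypothetical model of the choices made in the retrieval dialog."""
    start: datetime
    end: datetime
    destination: Destination = Destination.INCLUDE_ALL
    field_filters: list = field(default_factory=list)  # (field, value) pairs
    combine_with: str = "AND"  # "AND" or "OR"

    def __post_init__(self):
        if self.start >= self.end:
            raise ValueError("time range start must be before end")
        if len(self.field_filters) > 3:
            raise ValueError("at most three field filters are allowed")
        if self.combine_with not in ("AND", "OR"):
            raise ValueError("combine_with must be 'AND' or 'OR'")

    def matches(self, message: dict) -> bool:
        """Return True if the message would be selected for retrieval."""
        # Assumption: the time range is inclusive at both ends.
        if not (self.start <= message["timestamp"] <= self.end):
            return False
        # Hypothetical flag: was the message previously indexed?
        indexed = message.get("was_indexed", False)
        if self.destination is Destination.MUST_EXCLUDE_SEARCH_CLUSTER and indexed:
            return False
        if self.destination is Destination.MUST_INCLUDE_SEARCH_CLUSTER and not indexed:
            return False
        if not self.field_filters:
            return True
        # Assumption: each filter is an exact match on the field value.
        hits = [message.get(name) == value for name, value in self.field_filters]
        return all(hits) if self.combine_with == "AND" else any(hits)
```

With two filters such as ("source", "fw-01") and ("level", 3), AND logic selects only messages that satisfy both, while OR logic selects messages that satisfy either one.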

To view retrieved log data, navigate to Data Lake > Retrievals to see a list of all completed retrievals. From here, you can select Show messages to view the retrieved log data. The Data Lake Jobs section on this page shows any running, queued, or recently completed retrieval operations. Retrieved data can be added to search queries, added to investigations, surfaced in dashboards and reports, and more.

Hint: The Data Lake Preview page also includes the ability to retrieve log data. This option lets you preview data to see if it holds information you need before committing to a data retrieval operation.

Retrieve Logs from an External Data Lake

To retrieve log data from an external data lake:

  1. Navigate to the Data Lake Connectors page at Data Lake > External Lake Connectors.

  2. Locate your desired data lake from the connectors list.

  3. Click Retrieve logs.

  4. In the dialog box, select the table from which you want to retrieve data.

  5. Use the date/time pickers to set the Time Range for the log data to retrieve.

  6. (Optional) Add filters to limit the log data you retrieve. In the Filter by fields section, click Add filter, select the field to filter by from the dropdown, and enter the value to filter on. You can add up to three filters. When you use multiple filters, choose whether to combine them with AND or OR logic.

    Hint: You cannot apply custom filters or queries beyond the filters available in the dropdown. See Data Lake Preview for more information.

  7. Click Retrieve.

For external data lakes, Graylog cannot estimate the amount of data to be retrieved. Review and estimate the size of your data in your third-party source before beginning a retrieval. Depending on the amount of data, the retrieval process can take some time to complete.
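
Because the estimate has to come from your own source, a back-of-the-envelope calculation can help before you click Retrieve. The sketch below is a rough illustration under stated assumptions: the row count and average row size are figures you obtain from your third-party source, the function names are hypothetical, and budget_gib is a limit you choose yourself rather than a value reported by Graylog. The size of data at rest in the external source may differ from its size once indexed, so treat the result as an approximation.

```python
GIB = 1024 ** 3  # bytes per gibibyte


def estimate_retrieval_gib(row_count: int, avg_row_bytes: float) -> float:
    """Rough retrieval size: rows in the time range times average row size."""
    if row_count < 0 or avg_row_bytes < 0:
        raise ValueError("row_count and avg_row_bytes must be non-negative")
    return row_count * avg_row_bytes / GIB


def within_budget(estimated_gib: float, budget_gib: float) -> bool:
    """Return True if the estimate fits the budget you set for retrieval."""
    return estimated_gib <= budget_gib
```

If the estimate exceeds your budget, narrowing the time range or adding field filters reduces the amount of data that is retrieved.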

To view retrieved log data, navigate to Data Lake > Retrievals to see a list of all completed retrievals. From here, you can select Show messages to view the retrieved log data. The Data Lake Jobs section on this page shows any running, queued, or recently completed retrieval operations. Retrieved data can be added to search queries, added to investigations, surfaced in dashboards and reports, and more.

Hint: The Data Lake Preview page also includes the ability to retrieve log data. This option lets you preview data to see if it holds information you need before committing to a data retrieval operation.

Further Reading

Explore the following additional resources and recommended readings to expand your knowledge on related topics: