Retrieve Logs from a Data Lake
When you want to search and analyze logs from a Graylog Data Lake, you first need to retrieve the logs so that they can be written to your search backend. You can retrieve logs from both internal and external data lakes, and the procedure is nearly the same for both.
You retrieve logs from specific streams (internal) or tables (external) based on time ranges, and you can apply filters to further limit the logs you retrieve. Restored logs are indexed by the search backend so that they are available for search and other analysis.
Prerequisites
Before proceeding, ensure that the following prerequisites are met:
- You must be a Graylog administrator or have the Data Lake User role to retrieve data from a Data Lake.
Retrieved Data Index
When you retrieve logs from a data lake, the restored data is sent to your search backend so that it can be indexed for search and other operations. Each retrieval operation creates an index for the retrieved data, named with the prefix restored-archive-data-lake followed by a numeric string that is unique to that operation. After the data is restored, it is available for search and other functions just as if it had been routed to the search backend originally.
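As a minimal sketch of how you might pick retrieval indices out of a list of index names, the following Python helper matches the prefix described above. The separator between the prefix and the numeric string is not specified here, so the pattern assumes an optional `_` or `-`. The function name and the sample index names are hypothetical.

```python
import re
from typing import Optional

# Prefix documented for indices created by Data Lake retrievals.
RESTORED_PREFIX = "restored-archive-data-lake"

# Assumption: the numeric string follows the prefix, optionally after "_" or "-".
# Adjust the pattern to the index names you actually see in your search backend.
_RESTORED_INDEX = re.compile(rf"^{re.escape(RESTORED_PREFIX)}[_-]?(\d+)$")


def retrieval_suffix(index_name: str) -> Optional[str]:
    """Return the numeric suffix if the name looks like a retrieval index, else None."""
    match = _RESTORED_INDEX.match(index_name)
    return match.group(1) if match else None


# Hypothetical index names, for illustration only.
names = [
    "graylog_42",
    "restored-archive-data-lake_1718000000",
    "restored-archive-data-lake-7",
]
restored = [name for name in names if retrieval_suffix(name) is not None]
```

Filtering a list of index names this way can help when you want to review or clean up indices that were created by retrievals rather than by normal routing.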
Retrieve Logs from an Internal Data Lake
To retrieve log data from an internal data lake:
- Navigate to the Overview tab of Data Lake > Internal Lake Setup.
- Locate your desired stream from the streams list.
- Click Retrieve logs.
- In the dialog box, use the date/time pickers to set the Time Range for the log data to retrieve.
- Select the Filter Retrieval by Original Destination option:
  - Must exclude: Search Cluster: Retrieves log data only from the data lake. This data has not previously been indexed by the search backend.
  - Must include: Search Cluster: Retrieves log data only from your search backend. This data has previously been indexed by the search backend.
  - Include All: Retrieves log data from both sources. This option is the default.
- (Optional) Add filters to limit the log data you retrieve. In the Filter by fields section, click Add filter, then select the field name from the dropdown to filter by, and enter a value to filter on. You can add up to three filters. When using multiple filters, you choose between AND or OR logic.
  Hint: You are not able to apply custom filters or queries beyond the filters available. See Data Lake Preview for more information.
- Click Retrieve.
As you make selections, an estimate at the bottom of the dialog box shows the amount of data that must be searched to complete the retrieval. Depending on the amount of data, the retrieval process can take some time to complete.
To view retrieved log data, navigate to Data Lake > Retrievals to see a list of all completed retrievals. From here, you can select Show messages to view the retrieved log data. The Data Lake Jobs section on this page shows any running, queued, or recently completed retrieval operations. Retrieved data can be added to search queries, added to investigations, surfaced in dashboards and reports, and more.
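The way the time range, the original-destination option, and the field filters combine in the retrieval dialog can be modeled with a short Python sketch. This is a conceptual model only, not Graylog's implementation: the class and function names are hypothetical, the time range is assumed to be inclusive at both ends, and a filter is assumed to match when the field equals the value exactly.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List, Tuple

MAX_FILTERS = 3  # the dialog allows up to three field filters
DESTINATIONS = ("include_all", "must_exclude", "must_include")


@dataclass
class Message:
    """Hypothetical stand-in for one log record stored in the data lake."""
    timestamp: datetime
    fields: Dict[str, str]
    was_indexed: bool  # True if the record was also routed to the search cluster


@dataclass
class RetrievalRequest:
    start: datetime
    end: datetime
    destination: str = "include_all"  # Include All is the default option
    filters: List[Tuple[str, str]] = field(default_factory=list)
    operator: str = "AND"  # how multiple filters are combined

    def __post_init__(self) -> None:
        if len(self.filters) > MAX_FILTERS:
            raise ValueError(f"at most {MAX_FILTERS} filters are allowed")
        if self.operator not in ("AND", "OR"):
            raise ValueError("operator must be 'AND' or 'OR'")
        if self.destination not in DESTINATIONS:
            raise ValueError("unknown destination option")


def selects(req: RetrievalRequest, msg: Message) -> bool:
    """Return True if msg would be selected under this conceptual model."""
    # Assumption: both ends of the time range are inclusive.
    if not (req.start <= msg.timestamp <= req.end):
        return False
    if req.destination == "must_exclude" and msg.was_indexed:
        return False
    if req.destination == "must_include" and not msg.was_indexed:
        return False
    if not req.filters:
        return True
    # Assumption: a filter matches when the field equals the value exactly.
    hits = [msg.fields.get(name) == value for name, value in req.filters]
    return all(hits) if req.operator == "AND" else any(hits)


# Example with made-up values: two filters combined with OR.
request = RetrievalRequest(
    start=datetime(2024, 1, 1),
    end=datetime(2024, 1, 2),
    destination="must_exclude",
    filters=[("source", "fw-01"), ("level", "3")],
    operator="OR",
)
message = Message(
    timestamp=datetime(2024, 1, 1, 12),
    fields={"source": "fw-01", "level": "6"},
    was_indexed=False,
)
selected = selects(request, message)
```

In this example the message is selected because it falls inside the time range, was never indexed by the search cluster, and satisfies one of the two filters. With AND logic the same message would be skipped, since its `level` value does not match.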
Retrieve Logs from an External Data Lake
To retrieve log data from an external data lake:
- Navigate to the Data Lake Connectors page at Data Lake > External Lake Connectors.
- Locate your desired data lake from the connectors list.
- Click Retrieve logs.
- In the dialog box, select the table from which you want to retrieve data.
- Use the date/time pickers to set the Time Range for the log data to retrieve.
- (Optional) Add filters to limit the log data you retrieve. In the Filter by fields section, click Add filter, then select the field name from the dropdown to filter by, and enter a value to filter on. You can add up to three filters. When using multiple filters, you choose between AND or OR logic.
  Hint: You are not able to apply custom filters or queries beyond the filters available in the dropdown. See Data Lake Preview for more information.
- Click Retrieve.
For external data lakes, Graylog cannot estimate the amount of data to be retrieved. Review and estimate the size of your data in your third-party source before beginning a retrieval. Depending on the amount of data, the retrieval process can take some time to complete.
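Because no estimate is shown for external data lakes, a rough back-of-the-envelope calculation can help before you start a retrieval. The sketch below assumes you can obtain a row count and an average row size for the chosen time range from your third-party source. The function names and the sample figures are made up for illustration, and the result ignores compression and indexing overhead.

```python
def estimate_retrieval_bytes(row_count: int, avg_row_bytes: float) -> int:
    """Rough size estimate: number of rows times average row size."""
    if row_count < 0 or avg_row_bytes < 0:
        raise ValueError("row_count and avg_row_bytes must be non-negative")
    return round(row_count * avg_row_bytes)


def human_readable(num_bytes: int) -> str:
    """Format a byte count using binary units (1 KiB = 1024 B)."""
    units = ["B", "KiB", "MiB", "GiB", "TiB"]
    size = float(num_bytes)
    for unit in units:
        if size < 1024 or unit == units[-1]:
            return f"{size:.1f} {unit}"
        size /= 1024
    return f"{size:.1f} {units[-1]}"  # not reached; keeps type checkers satisfied


# Made-up figures: 12 million rows at about 850 bytes each.
estimated = estimate_retrieval_bytes(row_count=12_000_000, avg_row_bytes=850)
summary = human_readable(estimated)
```

If the estimate is larger than you expected, narrowing the time range or adding a field filter before retrieving is usually the simplest way to reduce it.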
To view retrieved log data, navigate to Data Lake > Retrievals to see a list of all completed retrievals. From here, you can select Show messages to view the retrieved log data. The Data Lake Jobs section on this page shows any running, queued, or recently completed retrieval operations. Retrieved data can be added to search queries, added to investigations, surfaced in dashboards and reports, and more.
