Create an External Data Lake Connector

Graylog's external Data Lake feature lets you connect to existing third-party data lakes by defining connectors. With a connector in place, you can preview and retrieve log data in much the same way as you would with an internal data lake.

Warning: Retrieving log data from an external data lake counts against your license usage. Previewing logs in place does not.

For complete information about internal and external data lakes, see Data Lakes.

Prerequisites

Before proceeding, ensure that the following prerequisites are met:

  • You must be a Graylog administrator to set up and manage a data lake connector.

  • To use an external data lake, you must have an existing third-party data lake source and appropriate access credentials.

Hint: Currently, only Amazon Security Lake is supported for external data lakes.

The Data Retrieval Stream

Each connector you create must be associated with a system-managed stream in which data retrieved through the connector is stored. This stream is created automatically when you create the connector, and you must assign it a name at that time. You cannot add stream rules, pipeline rules, routing destinations, or filter rules to a stream associated with a connector.

Users without the Admin role must be granted permission to view this stream in order to select this connector on the Data Lake > Preview page. For details about roles and sharing in Graylog, see Permission Management.

Create a Connector

To create a third-party data lake connector:

  1. Navigate to Data Lake > External Lake Connectors.

  2. Select Add Connector.

  3. Enter all required information:

    Data Lake Name

    Enter a unique and descriptive name for this data lake.

    S3 Output Bucket

    Enter the path to the S3 bucket where data lake query results are stored temporarily.

    AWS Region

    Select the AWS region where this service is running. If you want to connect log data from different regions, you must create separate connectors for each region or create a rollup region in your Amazon Security Lake.

    AWS IAM Role

    Enter the Amazon Resource Name (ARN) of the Identity and Access Management (IAM) role that Graylog will assume to access the external data lake. AWS recommends using IAM roles with temporary credentials instead of long-term static access keys. This is the preferred authentication method and supports cross-account access.

    AWS Access Key (Optional)

    Enter the AWS access key ID associated with an IAM user. Use this option only if role-based authentication is not feasible.

    AWS Secret Key (Optional)

    Enter the AWS secret access key associated with the IAM user. Use this option only if role-based authentication is not feasible.

    Stream Title

    Enter a descriptive name for the stream associated with this data lake. Remember, each external data lake has one associated system-managed stream, which is used for retrieved data.

    Stream Description (Optional)

    Use this field to provide a detailed description of the log data routed to this stream, if desired.

  4. Click Save.

The new connector is added to the list on the Data Lake Connectors page. You can use this list to preview data in the data lake or begin a data retrieval operation.
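Before entering the values from step 3, it can help to sanity-check them offline. The following Python sketch is illustrative only: it is not part of Graylog and does not call any Graylog or AWS API. The ConnectorSettings class, the check_settings function, and all sample values are hypothetical names introduced here. The only checks are that the IAM role value has the general shape of an IAM role ARN (arn:<partition>:iam::<12-digit account ID>:role/<name>) and that an access key and secret key are supplied as a pair. How Graylog itself validates these fields is not covered here.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# General shape of an IAM role ARN:
#   arn:<partition>:iam::<12-digit account ID>:role/<optional path/><role name>
ROLE_ARN_PATTERN = re.compile(
    r"^arn:(aws|aws-cn|aws-us-gov):iam::\d{12}:role/[A-Za-z0-9+=,.@_/-]+$"
)


@dataclass
class ConnectorSettings:
    """Hypothetical container mirroring the form fields; not a Graylog type."""
    data_lake_name: str
    s3_output_bucket: str
    aws_region: str
    stream_title: str
    iam_role_arn: Optional[str] = None
    access_key: Optional[str] = None
    secret_key: Optional[str] = None
    stream_description: Optional[str] = None


def check_settings(settings: ConnectorSettings) -> List[str]:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []

    # Fields not marked optional in the form must not be blank.
    required = {
        "Data Lake Name": settings.data_lake_name,
        "S3 Output Bucket": settings.s3_output_bucket,
        "AWS Region": settings.aws_region,
        "Stream Title": settings.stream_title,
    }
    for label, value in required.items():
        if not value.strip():
            problems.append(f"{label} is empty")

    # If a role is given, it should at least look like an IAM role ARN.
    if settings.iam_role_arn and not ROLE_ARN_PATTERN.match(settings.iam_role_arn):
        problems.append("AWS IAM Role does not look like an IAM role ARN")

    # An access key ID is only usable together with its secret access key.
    if bool(settings.access_key) != bool(settings.secret_key):
        problems.append("AWS Access Key and AWS Secret Key must be given together")

    return problems


# Sample values are placeholders, not real resources.
example = ConnectorSettings(
    data_lake_name="security-lake-eu",
    s3_output_bucket="example-query-results-bucket",
    aws_region="eu-central-1",
    stream_title="Security Lake EU retrievals",
    iam_role_arn="arn:aws:iam::123456789012:role/ExampleGraylogRole",
)
print(check_settings(example))
```

A check like this only catches formatting slips. It cannot confirm that the role exists, that Graylog is allowed to assume it, or that the bucket is reachable.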

Further Reading

Explore the following additional resources and recommended readings to expand your knowledge on related topics: