Self-Managed OpenSearch

Review the following information for Graylog installations deployed with self-managed OpenSearch.

Installing OpenSearch

Warning: We caution you not to install or upgrade to OpenSearch 2.16+! It is not supported. Doing so will break your instance!

The installation process for OpenSearch is similar to that of Elasticsearch. Noteworthy differences between Elasticsearch and OpenSearch from an installation perspective include the software packages themselves and minor differences in parameter names within the configuration files.

When installing the OpenSearch software, install it to a location separate from any existing Elasticsearch software. Depending on how the OpenSearch software is deployed, be mindful of where archived contents are extracted (e.g. tarballs). This prevents overwriting Elasticsearch configuration files and the data in your indices.

At the time of writing, OpenSearch is available for download via HTTP and installation via the following package types depending on your operating system and/or method of deployment:

  • Tarball

  • RPM package (available in v1.3.2 & above)

  • YUM repository

  • Docker image

The configuration file for an OpenSearch node also has a similar location to an Elasticsearch node:

  • Linux (RPM/YUM): /etc/opensearch/opensearch.yml

  • Tarball: /opensearch-1.x.x/config/opensearch.yml

  • Docker: /usr/share/opensearch/config/opensearch.yml

Graylog has tested upgrades of Elasticsearch versions 6.8.23 and 7.10.2 to OpenSearch versions 1.1-2.3 on the following platforms:

  • Red Hat Enterprise Linux 8 (RPM+YUM installation)

  • Ubuntu 20.04 LTS (Tarball installation)

  • Docker Engine v20.10.17

For specific installation instructions, see the comprehensive OpenSearch installation documentation.

Upgrading to OpenSearch

Warning: We caution you not to install or upgrade to OpenSearch 2.16+! It is not supported. Doing so will break your instance!

There are three different approaches to upgrading from Elasticsearch to OpenSearch.

  • Full-cluster restart upgrade (in-place)
  • Rolling restart upgrade (in-place)
  • Restore snapshot (new cluster)

The recommended upgrade process for most of the Graylog community is the full-cluster restart upgrade (in-place). Therefore, this method will be the primary focus of our upgrade guide. You can, however, find a high-level overview of each method in the following sections to help you choose the right method for your needs and environment.

In-Place Upgrades

In-place upgrade methods repurpose your existing Elasticsearch nodes and are more like a software upgrade than a software migration. You will not need to create and restore a snapshot of your Elasticsearch data with an in-place upgrade.

The two types of in-place upgrades are a full-cluster restart and a rolling restart.

There are several differences between the two methods, most importantly their levels of complexity, risk of error, and downtime. The full-cluster restart process shuts down the entire Elasticsearch cluster, while the rolling restart method only shuts down and upgrades one Elasticsearch node at a time until all nodes in the cluster are running OpenSearch.

Full-Cluster Restart

The full-cluster restart upgrade is generally considered the simpler of the two in-place upgrade methods. This method consists of shutting down the entire Elasticsearch cluster, installing and configuring the OpenSearch software, copying data from the Elasticsearch data directories (path.data) to the OpenSearch data directories, and then starting up the OpenSearch cluster. This method requires your Graylog nodes to have sufficient available disk space to store all incoming messages in the journal while OpenSearch is installed and configured.

Before you install OpenSearch, find the path assigned to the path.data setting of your Elasticsearch nodes. You can find this parameter and its assigned value within the elasticsearch.yml files. This value defines the file system location of your Elasticsearch indices and other data. It is important to note this location so you do not overwrite it during the installation or configuration of OpenSearch. If you plan to reuse the same file system location as your former Elasticsearch nodes, then there is no concern about overwriting your Elasticsearch indices.
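For example, on an RPM-based installation you could check the setting with a quick grep (the config path below is the standard RPM location and the output value is the packaged default; adjust both to your environment):

$ grep -E '^\s*path\.data' /etc/elasticsearch/elasticsearch.yml
path.data: /var/lib/elasticsearch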

After the OpenSearch software is installed and configured on each node of the Elasticsearch cluster, copy the data within each Elasticsearch node's data directory to the OpenSearch node's data directory. This enables you to reuse your existing data in the new OpenSearch cluster while keeping a potential path to revert. It is technically possible to configure OpenSearch to use the same data directory as the former Elasticsearch nodes; however, doing so prevents you from being able to revert to the previous working state.
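A minimal sketch of that copy step for an RPM-based installation, assuming the default data directories /var/lib/elasticsearch and /var/lib/opensearch and an opensearch service user (adjust paths, ownership, and service names to your environment):

# Stop Elasticsearch before copying so the data does not change mid-copy
$ sudo systemctl stop elasticsearch
$ sudo rsync -a /var/lib/elasticsearch/ /var/lib/opensearch/
# Ensure the copied files are readable by the OpenSearch service user
$ sudo chown -R opensearch:opensearch /var/lib/opensearch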

Once the Elasticsearch data has been copied into the OpenSearch data directory locations, all of the nodes of the OpenSearch cluster can be started. When the OpenSearch cluster reaches a "green" state, you will then need to restart all of your Graylog nodes to complete the upgrade process.

As the entire Elasticsearch cluster is offline in this method, no changes can be made to its data; therefore, no time is spent waiting on replication between shards, unlike with the rolling restart upgrade.

Rolling Restart

A rolling restart upgrade is defined here as keeping your Elasticsearch cluster online throughout the process of upgrading it to OpenSearch. This method is more complex, slower, and more prone to error than a full-cluster restart, and it requires replicas for every index of every Graylog index set; however, it allows Graylog to keep indexing incoming messages and servicing search queries throughout the upgrade process.

Warning: Should you want to do a rolling upgrade, begin by upgrading OpenSearch 1.x to its latest version (1.3.8) and then continue with 2.x; otherwise, there is a risk of communication errors between nodes of the cluster.

Steps involved with the rolling restart method include, but are not limited to, disabling and re-enabling shard allocation each time you upgrade an Elasticsearch node to OpenSearch, and waiting for the cluster to finish replication in the "yellow" state and return to a "green" state before proceeding to upgrade the next Elasticsearch node in the cluster.

These steps can complicate what might otherwise be a more straightforward process of upgrading the entire Elasticsearch cluster while it is offline. And as with a full-cluster restart, you need to make copies of the Elasticsearch data directories for OpenSearch to reuse in a different file system location for its nodes, unless you wish to forgo the possibility of reverting.
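For reference, disabling and re-enabling shard allocation is typically done through the cluster settings API; a minimal sketch, assuming an unauthenticated node reachable on localhost:9200:

# Disable shard allocation before stopping a node for its upgrade
$ curl -X PUT 'http://localhost:9200/_cluster/settings' -H 'Content-Type: application/json' \
  -d '{"persistent": {"cluster.routing.allocation.enable": "primaries"}}'

# Re-enable shard allocation after the upgraded node has rejoined the cluster
$ curl -X PUT 'http://localhost:9200/_cluster/settings' -H 'Content-Type: application/json' \
  -d '{"persistent": {"cluster.routing.allocation.enable": null}}'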

Once all nodes have been successfully upgraded to OpenSearch and the cluster is in a healthy "green" state, the Graylog nodes must then be restarted as the final step in the upgrade process.

OpenSearch provides a detailed step-by-step description of how to upgrade Elasticsearch node(s) to OpenSearch node(s), including notation on specific steps to repeat when doing a rolling-restart upgrade.

Warning: Rolling upgrades must first be done on all data nodes before dedicated leader nodes can be upgraded from Elasticsearch v7.10.2 to OpenSearch 2.x.

New Cluster Upgrade

The new-cluster upgrade method requires a duplicate OpenSearch cluster configured exactly like your existing Elasticsearch cluster. This can be done with virtual machines or other platforms, so it does not necessarily require new hardware or additional financial resources.

After installation and configuration validation, and while both clusters are running simultaneously, a snapshot of the data is created on the Elasticsearch cluster and restored to the OpenSearch cluster. After that, reconfigure Graylog to use the new OpenSearch cluster and restart Graylog for the changes to take effect.
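A highly condensed sketch of the snapshot and restore step, assuming a shared file system repository at /mnt/es_snapshots that both clusters can reach; the hostnames, repository name, and snapshot name are illustrative placeholders:

# On the Elasticsearch cluster: register a repository and create a snapshot
$ curl -X PUT 'http://es-node:9200/_snapshot/migration' -H 'Content-Type: application/json' \
  -d '{"type": "fs", "settings": {"location": "/mnt/es_snapshots"}}'
$ curl -X PUT 'http://es-node:9200/_snapshot/migration/upgrade_snapshot?wait_for_completion=true'

# On the OpenSearch cluster: register the same repository and restore the snapshot
$ curl -X PUT 'http://os-node:9200/_snapshot/migration' -H 'Content-Type: application/json' \
  -d '{"type": "fs", "settings": {"location": "/mnt/es_snapshots"}}'
$ curl -X POST 'http://os-node:9200/_snapshot/migration/upgrade_snapshot/_restore'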

Another way of framing this method is a blue/green deployment. A great example of the new cluster upgrade method can be found with some vendors that offer Elasticsearch as a service and blue/green deployment features.

AWS OpenSearch Service

In some cases, Graylog environments make use of Elasticsearch as a service. The most common example is Amazon OpenSearch Service (formerly Amazon Elasticsearch Service). Upgrading from one major version of Elasticsearch to another (or to OpenSearch) is done via the new cluster upgrade method, i.e. a blue/green deployment. This method only requires a change to the configuration of the AWS OpenSearch Service domain that defines the version of Elasticsearch or OpenSearch to use, and AWS manages the rest of the upgrade.

As a best practice, create a snapshot before you initiate the upgrade. More information on this upgrade process can be found in the AWS documentation.

Graylog Configuration Settings

The most important setting for a successful connection is a comma-separated list of URIs to one or more OpenSearch nodes. Graylog needs to know the address of at least one OpenSearch node, given in the elasticsearch_hosts setting. The specified value should contain at least the scheme (http:// for unencrypted, https:// for encrypted connections), the hostname or IP, and the port of the HTTP listener of the node (9200 unless otherwise configured). Optionally, you can also specify an authentication section containing a username and password if any of your OpenSearch nodes use Shield/X-Pack or Search Guard, or if an intermediate HTTP proxy requiring authentication sits between the Graylog server and the OpenSearch node. Additionally, you can specify an optional path prefix at the end of the URI.

A sample specification of elasticsearch_hosts:

elasticsearch_hosts = http://es-node-1.example.org:9200/foo,https://someuser:somepassword@es-node-2.example.org:19200
Warning: Graylog assumes that all nodes in the cluster are running the same versions of OpenSearch. While it still might work when patch-levels differ, we highly encourage you to keep versions consistent.

Graylog does not currently react to externally triggered index changes (creating/closing/reopening/deleting an index). These actions need to be performed through the Graylog REST API in order to retain index consistency.

Available OpenSearch Configuration Tunables

The following configuration options are used to configure connectivity to OpenSearch:

Config Setting Type Comments Default
elasticsearch_connect_timeout Duration Timeout when connecting to individual OpenSearch hosts 10s (10 seconds)
elasticsearch_hosts List<URI> Comma-separated list of URIs of OpenSearch hosts http://127.0.0.1:9200
elasticsearch_idle_timeout Duration Timeout after which idle connections are terminated -1s (never)
elasticsearch_max_total_connections int Maximum number of total OpenSearch connections 20
elasticsearch_max_total_connections_per_route int Maximum number of OpenSearch connections per route/host 2
elasticsearch_socket_timeout Duration Timeout when sending/receiving from an OpenSearch connection 60s (60 seconds)
elasticsearch_discovery_enabled boolean Enable automatic OpenSearch node discovery false
elasticsearch_discovery_default_user String The default username used for authentication for all newly discovered nodes empty (no authentication used for discovered nodes)
elasticsearch_discovery_default_password String The default password used for authentication for all newly discovered nodes empty (no authentication used for discovered nodes)
elasticsearch_discovery_default_scheme String The default scheme used for all newly discovered nodes http
elasticsearch_discovery_filter String Filter by node attributes for the discovered nodes empty (use all nodes)
elasticsearch_discovery_frequency Duration Frequency of the OpenSearch node discovery 30s (30 seconds)
elasticsearch_compression_enabled boolean Enable GZIP compression of OpenSearch request payloads false
elasticsearch_version String Major version of OpenSearch in use (values: 6 / 7). If not specified, the version is auto-sensed from the configured nodes; specifying it disables auto-sensing. <not set> (auto-sense)
elasticsearch_mute_deprecation_warnings boolean Mute deprecation warnings for deprecated configuration settings in OpenSearch. These warnings are attached as "Warnings" in HTTP response headers and might clutter up the logs. Works only with ES7. false
elasticsearch_version_probe_attempts int Maximum number of retries to connect to OpenSearch on boot for the version probe before finally giving up. Use 0 to try until a connection can be made. 0 (try until a connection can be made)
elasticsearch_version_probe_delay Duration Waiting time between connection attempts for elasticsearch_version_probe_attempts 5s (wait 5 seconds between retries)

Automatic Version Sensing

Graylog (starting with version 4.0) supports multiple major versions of Elasticsearch/OpenSearch that are partially incompatible with each other (e.g. ES6 and ES7). Therefore, Graylog needs to know which version is running in the cluster, so it makes a single request to the first reachable OpenSearch node and parses the version from the response. A few things can go wrong at this point, or you may want to run a version that is not automatically detected. In that case, you can set the elasticsearch_version configuration variable: it disables auto-sensing, forces Graylog to assume that this major version is running in the cluster, and loads the corresponding support module.
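A minimal example in the Graylog server configuration file, using one of the values listed in the table above (shown as an illustration; confirm the value that matches your cluster before pinning it):

# Disable auto-sensing and assume an ES7-compatible cluster
elasticsearch_version = 7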

Automatic Node Discovery

Warning: Automatic node discovery does not work when using Amazon OpenSearch Service because Amazon blocks certain OpenSearch API endpoints.

Graylog uses automatic node discovery to gather a list of all available OpenSearch nodes in the cluster at runtime and distributes requests among them to potentially increase performance and availability. To enable this feature, set elasticsearch_discovery_enabled to true. Optionally, you can define a filter to selectively include or exclude discovered nodes using the elasticsearch_discovery_filter setting, or tune the frequency of node discovery with the elasticsearch_discovery_frequency configuration option. If your OpenSearch cluster uses authentication, you need to specify the elasticsearch_discovery_default_user and elasticsearch_discovery_default_password settings; the username/password specified in these settings will be used for all nodes discovered in the cluster. If your cluster uses HTTPS, you also need to set the elasticsearch_discovery_default_scheme setting. It specifies the scheme used for discovered nodes and must be consistent across all nodes in the cluster.
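Putting these settings together, a sketch of what node discovery might look like in the Graylog server configuration file (the credentials and filter value are placeholders):

elasticsearch_discovery_enabled = true
elasticsearch_discovery_frequency = 30s
elasticsearch_discovery_default_scheme = https
elasticsearch_discovery_default_user = graylog
elasticsearch_discovery_default_password = secret
# Optionally restrict discovery to nodes carrying a given attribute
elasticsearch_discovery_filter = rack:42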

Configuration of OpenSearch Nodes

Control Access to OpenSearch Ports

If you are not using Shield/X-Pack or Search Guard to authenticate access to your OpenSearch nodes, make sure to restrict access to the OpenSearch ports (default: 9200/tcp and 9300/tcp). Otherwise, the data is readable by anyone who has network access to the machine.
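For example, a sketch using firewalld on an RPM-based host, assuming 10.0.0.0/24 is the trusted network shared by your Graylog and OpenSearch nodes (adjust the subnet and overall zone policy to your environment):

# Allow the OpenSearch HTTP and transport ports only from the trusted subnet
$ sudo firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address="10.0.0.0/24" port port="9200" protocol="tcp" accept'
$ sudo firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address="10.0.0.0/24" port port="9300" protocol="tcp" accept'
$ sudo firewall-cmd --reload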

Open File Limits

Because OpenSearch has to keep a lot of files open simultaneously, it requires a higher open file limit than most operating system defaults allow. Set it to at least 64000 open file descriptors.
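How you raise the limit depends on how OpenSearch is started. A sketch for a systemd-managed package installation (the unit name opensearch.service is an assumption based on the package install):

# Create a systemd override for the OpenSearch unit
$ sudo systemctl edit opensearch.service

# Add the following lines to the override file:
[Service]
LimitNOFILE=64000

# Then reload systemd and restart OpenSearch
$ sudo systemctl daemon-reload
$ sudo systemctl restart opensearch.service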

Graylog will show a notification in the web interface when there is a node in the OpenSearch cluster which has an open file limit that is too low.

Heap Size

We strongly recommend that you raise the default amount of heap memory allocated to OpenSearch; for example, set the JVM heap size to 24g to allocate 24 GB. We also recommend using around 50% of the available system memory for OpenSearch (when running on a dedicated host) to leave enough space for the system caches that OpenSearch relies on heavily. But please take care that you don't exceed 32 GB!
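A sketch of the corresponding heap settings, assuming the packaged jvm.options file (commonly /etc/opensearch/jvm.options on RPM installations); the same flags can alternatively be passed via the OPENSEARCH_JAVA_OPTS environment variable:

# Set the initial and maximum heap size to the same value, e.g. 24 GB
-Xms24g
-Xmx24g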

Tuning OpenSearch

Graylog sets specific configurations for every index it manages. This tuning is sufficient for a lot of use cases and setups.

Avoiding Split-Brain and Shard Shuffling

Split-Brain Events

OpenSearch sacrifices consistency in order to ensure availability and partition tolerance. The reasoning behind this is that short periods of misbehavior are less problematic than short periods of unavailability. In other words, when OpenSearch nodes within a cluster are unable to replicate changes to data, they will keep serving applications such as Graylog. When the nodes are able to replicate their data, they will attempt to converge the replicas and achieve eventual consistency.

OpenSearch addresses this by electing leader nodes, which are in charge of database operations such as creating new indices, moving shards around the cluster nodes, and so forth. Leader nodes actively coordinate their actions with the other nodes, ensuring that the data can be converged by non-leaders. Cluster nodes that are not leader nodes are not allowed to make changes that would break the cluster.

This mechanism can, in some circumstances, fail and cause a split-brain event. When an OpenSearch cluster is split into two sections that work on the data independently, data consistency is lost, and nodes will respond differently to the same queries. This is considered a catastrophic event because the data originating from the two leaders cannot be rejoined automatically, and it takes quite a bit of manual work to remedy the situation.

Avoiding Split-Brain Events

OpenSearch nodes take a simple majority vote over who is leader. If the majority agrees on one, then most likely the disconnected minority will give in and everything will be fine. This mechanism requires that at least 3 nodes work together; one or two nodes alone cannot form a majority.

The minimum number of leader-eligible nodes required to elect a leader must be configured manually in elasticsearch.yml. (Note that Elasticsearch 7.x and OpenSearch manage the voting configuration automatically, so this setting is primarily relevant for Elasticsearch 6.x nodes.)

# At least NODES/2+1 on clusters with NODES > 2, where NODES is the number of master nodes in the cluster
discovery.zen.minimum_master_nodes: 2

An example of typical configuration values:

Leader Nodes minimum_master_nodes Comments
1 1
2 1 If this were set to 2, a single node going down would stop the cluster from working!
3 2
4 3
5 3
6 4

Some of the leader nodes may be dedicated leader nodes, meaning that they are only configured to handle lightweight operational (cluster management) responsibilities. They will not handle or store any of the cluster's data. The function of such nodes is similar to so-called witness servers in other database products. Setting them up on dedicated witness sites will greatly reduce the risk of OpenSearch cluster instability.

A dedicated leader node has the following configuration in elasticsearch.yml:

node.data: false
node.master: true
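On OpenSearch 2.x the master role has been renamed to cluster_manager, and the role-based syntax is preferred over node.master. A sketch of the equivalent configuration for a dedicated leader node (an assumption based on the renamed role; verify against your OpenSearch version):

node.roles: [ cluster_manager ]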

Shard Shuffling

When the cluster status changes, for example because of a node restart or availability issues, OpenSearch will start automatically rebalancing the data in the cluster. The cluster works on making sure that the number of shards and replicas conforms to the cluster configuration. This is a problem if the status change is only temporary: moving shards and replicas around the cluster consumes considerable resources and should only be done when necessary.

Avoiding Unnecessary Shuffling

OpenSearch has a couple of configuration options which are designed to allow short times of unavailability before starting the recovery process with shard shuffling. There are 3 settings that may be configured in elasticsearch.yml:

  • gateway.recover_after_nodes: 8

    • Recovers only after the given number of nodes have joined the cluster. Can be seen as "minimum number of nodes to attempt recovery at all."

  • gateway.recover_after_time: 5m

    • Time to wait for additional nodes after recover_after_nodes is met.

  • gateway.expected_nodes: 10

    • Informs OpenSearch about how many nodes form a full cluster. If this number is met, start up immediately.

These configuration options should be set up so that only brief, minimal node unavailability is tolerated; for example, routine server restarts are common and should not trigger recovery. The logic is that if you lose large parts of your cluster, you should not tolerate the situation, and re-shuffling the shards and replicas should begin.

Custom Index Mappings

Sometimes it’s better to define a stricter schema for messages.

Hint: If the index mapping is conflicting with the actual message to be sent to OpenSearch, that message will fail to be indexed.

Graylog itself uses a default mapping which includes settings for the timestamp, message, full_message, and source fields of indexed messages:

$ curl -X GET 'http://localhost:9200/_template/graylog-internal?pretty'
{
"graylog-internal" : {
  "order" : -1,
  "index_patterns" : [
    "graylog_*"
  ],
  "settings" : {
    "index" : {
      "analysis" : {
        "analyzer" : {
          "analyzer_keyword" : {
            "filter" : "lowercase",
            "tokenizer" : "keyword"
          }
        }
      }
    }
  },
  "mappings" : {
    "message" : {
      "_source" : {
        "enabled" : true
      },
      "dynamic_templates" : [
        {
          "internal_fields" : {
            "mapping" : {
              "type" : "keyword"
            },
            "match_mapping_type" : "string",
            "match" : "gl2_*"
          }
        },
        {
          "store_generic" : {
            "mapping" : {
              "type" : "keyword"
            },
            "match_mapping_type" : "string"
          }
        }
      ],
      "properties" : {
        "gl2_processing_timestamp" : {
          "format" : "yyyy-MM-dd HH:mm:ss.SSS",
          "type" : "date"
        },
        "gl2_accounted_message_size" : {
          "type" : "long"
        },
        "gl2_receive_timestamp" : {
          "format" : "yyyy-MM-dd HH:mm:ss.SSS",
          "type" : "date"
        },
        "full_message" : {
          "fielddata" : false,
          "analyzer" : "standard",
          "type" : "text"
        },
        "streams" : {
          "type" : "keyword"
        },
        "source" : {
          "fielddata" : true,
          "analyzer" : "analyzer_keyword",
          "type" : "text"
        },
        "message" : {
          "fielddata" : false,
          "analyzer" : "standard",
          "type" : "text"
        },
        "timestamp" : {
          "format" : "yyyy-MM-dd HH:mm:ss.SSS",
          "type" : "date"
        }
      }
    }
  },
  "aliases" : { }
  }
}

In order to extend the default mapping of OpenSearch and Graylog, you can create one or more custom index mappings and add them as index templates to OpenSearch.

Let’s say we have a schema for our data like the following:

Field Name Field Type Example
http_method keyword GET
http_response_code long 200
ingest_time date 2016-06-13T15:00:51.927Z
took_ms long 56

This would translate to the following additional index mapping in OpenSearch:

"mappings" : {
  "message" : {
    "properties" : {
      "http_method" : {
        "type" : "keyword"
      },
      "http_response_code" : {
        "type" : "long"
      },
      "ingest_time" : {
        "type" : "date",
        "format": "strict_date_time"
      },
      "took_ms" : {
        "type" : "long"
      }
    }
  }
}

When Graylog creates a new index in OpenSearch, an additional index mapping has to be added as an index template in order to be applied. The Graylog default template (graylog-internal) has the lowest priority, so OpenSearch will merge it with the custom index template.

Warning: If the default index mapping and the custom index mapping cannot be merged (e.g. because of conflicting field data types), OpenSearch will throw an exception and won't create the index. So be extremely cautious and conservative with custom index mappings!

Creating a New Index Template

Save the following index template for the custom index mapping into a file named graylog-custom-mapping.json:

{
  "template": "graylog_*",
  "mappings": {
    "properties": {
      "http_method": {
        "type": "keyword"
      },
      "http_response_code": {
        "type": "long"
      },
      "ingest_time": {
        "type": "date",
        "format": "strict_date_time"
      },
      "took_ms": {
        "type": "long"
      }
    }
  }
}

Finally, load the index mapping into OpenSearch with the following command:

$ curl -X PUT -d @'graylog-custom-mapping.json' -H 'Content-Type: application/json' 'http://localhost:9200/_template/graylog-custom-mapping?pretty'
{
  "acknowledged" : true
}

Every OpenSearch index created from then on will have an index mapping consisting of the original graylog-internal index template merged with the new graylog-custom-mapping template:

$ curl -X GET 'http://localhost:9200/graylog_deflector/_mapping?pretty'
{
  "graylog_3" : {
    "mappings" : {
      "message" : {
        "dynamic_templates" : [
          {
            "internal_fields" : {
              "match" : "gl2_*",
              "match_mapping_type" : "string",
              "mapping" : {
                "type" : "keyword"
              }
            }
          },
          {
            "store_generic" : {
              "match_mapping_type" : "string",
              "mapping" : {
                "type" : "keyword"
              }
            }
          }
        ],

        "properties" : {
          "full_message" : {
            "type" : "text",
            "analyzer" : "standard"
          },
          "http_method" : {
            "type" : "keyword"
          },
          "http_response_code" : {
            "type" : "long"
          },
          "ingest_time" : {
            "type" : "date",
            "format" : "strict_date_time"
          },
          "message" : {
            "type" : "text",
            "analyzer" : "standard"
          },
          "source" : {
            "type" : "text",
            "analyzer" : "analyzer_keyword",
            "fielddata" : true
          },
          "streams" : {
            "type" : "keyword"
          },
          "timestamp" : {
            "type" : "date",
            "format" : "yyyy-MM-dd HH:mm:ss.SSS"
          },
          "took_ms" : {
            "type" : "long"
          }
        }
      }
    }
  }
}
Hint: When using different index sets, each can have its own mapping.

Deleting Custom Index Templates

If you want to remove an existing index template from OpenSearch, simply issue a DELETE request to OpenSearch:

$ curl -X DELETE 'http://localhost:9200/_template/graylog-custom-mapping?pretty'
{
  "acknowledged" : true
}

After you’ve removed the index template, new indices will only have the original index mapping:

$ curl -X GET 'http://localhost:9200/graylog_deflector/_mapping?pretty'
{
  "graylog_3" : {
    "mappings" : {
      "message" : {
        "dynamic_templates" : [
          {
            "internal_fields" : {
              "match" : "gl2_*",
              "match_mapping_type" : "string",
              "mapping" : {
                "type" : "keyword"
              }
            }
          },
          {
            "store_generic" : {
              "match_mapping_type" : "string",
              "mapping" : {
                "type" : "keyword"
              }
            }
          }
        ],

        "properties" : {
          "full_message" : {
            "type" : "text",
            "analyzer" : "standard"
          },
          "message" : {
            "type" : "text",
            "analyzer" : "standard"
          },
          "source" : {
            "type" : "text",
            "analyzer" : "analyzer_keyword",
            "fielddata" : true
          },
          "streams" : {
            "type" : "keyword"
          },
          "timestamp" : {
            "type" : "date",
            "format" : "yyyy-MM-dd HH:mm:ss.SSS"
          }
        }
      }
    }
  }
}
Hint: Settings and index mappings in templates are only applied to new indices. After adding, modifying, or deleting an index template, you have to manually rotate the write-active indices of your index sets for the changes to take effect.

Rotate Indices Manually

Select the desired index set on the System > Indices page in the Graylog web interface by clicking on the name of the index set, then select “Rotate active write index” from the “Maintenance” drop-down menu.

Cluster Status Explained

The cluster status applies to different levels:

  • Shard level - see status descriptions below

  • Index level - inherits the status of the worst shard status

  • Cluster level - inherits the status of the worst index status

That means that the OpenSearch cluster status will turn red if a single index or shard has problems even though the rest of the indices/shards are okay.
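You can check the current cluster status at any time with the cluster health API, for example:

# The "status" field in the response will be green, yellow, or red
$ curl -s 'http://localhost:9200/_cluster/health?pretty'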

Hint: Graylog checks the status of the current write index while indexing messages. If it is GREEN or YELLOW, Graylog will continue to write messages into OpenSearch regardless of the overall cluster status.

Explanation of different status levels:

Red

The RED status indicates that some or all of the primary shards are not available.

In this state, no searches can be performed until all primary shards have been restored.

Yellow

The YELLOW status means that all of the primary shards are available but some or all shard replicas are not.

When the index configuration includes a replica count that is equal to or higher than the number of nodes, your cluster cannot become green. In most cases, this can be solved by adding another OpenSearch node to the cluster or by reducing the replication factor of the indices.

Green

The cluster is fully operational. All primary and replica shards are available.