A full-cluster restart (that is, an in-place upgrade) from Elasticsearch to OpenSearch affects how Graylog receives, writes, and reads data, and it increases your Graylog journal usage. These effects occur because neither Elasticsearch nor OpenSearch is online during the upgrade process. Ensure that Graylog has enough storage to buffer the data it receives while you upgrade to OpenSearch. Also ensure that Graylog and Elasticsearch/OpenSearch have enough compute capacity to catch up on message processing after the upgrade.
During the upgrade process, Graylog continues to receive data from configured inputs; however, it cannot write new data to Elasticsearch (or OpenSearch) or read existing data from Elasticsearch to service search requests. Journal usage in Graylog also increases because incoming data is buffered in the journal until it can be written to OpenSearch. Hence, a suitable journal configuration must be in place before upgrading.
Whether you kept the default values or configured the Graylog journal yourself, the configuration should be sufficient to buffer the data until OpenSearch is online and available to service requests for indexing new data. This includes, but may not be limited to, the size of the journal and how long data can rest in the journal before it is purged. The options that the journal supports can be found in our user documentation. To determine whether your Graylog journal(s) are appropriately configured in terms of size and age of the buffered data, estimate how much data Graylog will receive and how long it will take to install and configure OpenSearch.
The Graylog metric org.graylog2.traffic.input counts the bytes received on all inputs of the node where the value is captured, starting from when that node started. You can use it to calculate the average number of bytes that a Graylog node receives over an hour, day, month, etc. You can use open-source software, such as Prometheus or Grafana, to monitor this metric. Alternatively, you can view it in the Graylog interface.
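As a rough illustration of the averaging described above, the following sketch derives an hourly rate from two readings of the counter. The CounterSample type, the function name, and the sample values are hypothetical and not part of Graylog; the sketch only assumes that both readings come from the same node and that the counter restarts from zero when the node restarts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CounterSample:
    """One reading of org.graylog2.traffic.input on a single node (hypothetical type)."""
    unix_seconds: float  # when the reading was taken
    bytes_total: int     # counter value at that time


def average_bytes_per_hour(first: CounterSample, second: CounterSample) -> float:
    """Average ingest rate between two readings of the same node's counter."""
    elapsed = second.unix_seconds - first.unix_seconds
    if elapsed <= 0:
        raise ValueError("second sample must be taken after the first")
    if second.bytes_total < first.bytes_total:
        # The counter counts since startup, so a decrease suggests a restart.
        raise ValueError("counter decreased; the node probably restarted")
    delta = second.bytes_total - first.bytes_total
    return delta * 3600 / elapsed


if __name__ == "__main__":
    # Hypothetical readings taken 24 hours apart.
    start = CounterSample(unix_seconds=0, bytes_total=10_000_000_000)
    end = CounterSample(unix_seconds=86_400, bytes_total=58_000_000_000)
    print(f"{average_bytes_per_hour(start, end):,.0f} bytes/hour")
```

Sampling over a longer window that includes your busiest periods gives a more conservative average than a single quiet hour.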
These averages help you confirm that your journal(s) are appropriately configured for the amount of data you expect to receive throughout the upgrade process and for the time it may take before the data is written to OpenSearch. The default journal size of 5 GB and the default data expiration of 12 hours may not be large enough to accommodate your upgrade, so you may want to increase these values (the message_journal_max_size and message_journal_max_age parameters) in the server.conf file of each Graylog node. You may also need to make changes to the underlying operating system and/or storage hardware.
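The comparison against the defaults can be sketched as simple arithmetic. This is a minimal estimate under stated assumptions, not an official sizing formula: it treats bytes received on inputs as a proxy for bytes stored in the journal (the on-disk size may differ), it interprets 5 GB as 5 GiB, and the function names and the safety factor of 1.5 are illustrative choices.

```python
import math

# Defaults named in this guide; 5 GB is interpreted as 5 GiB here (assumption).
DEFAULT_JOURNAL_BYTES = 5 * 1024**3
DEFAULT_MAX_AGE_HOURS = 12


def required_journal_bytes(bytes_per_hour: float, outage_hours: float,
                           safety_factor: float = 1.5) -> int:
    """Estimated journal size needed to buffer input for the whole outage."""
    if bytes_per_hour < 0 or outage_hours < 0:
        raise ValueError("rate and duration must be non-negative")
    if safety_factor < 1:
        raise ValueError("safety_factor must be at least 1")
    return math.ceil(bytes_per_hour * outage_hours * safety_factor)


def defaults_are_sufficient(bytes_per_hour: float, outage_hours: float,
                            safety_factor: float = 1.5) -> bool:
    """True if both the default size and the default age cover the outage."""
    needed = required_journal_bytes(bytes_per_hour, outage_hours, safety_factor)
    size_ok = needed <= DEFAULT_JOURNAL_BYTES
    age_ok = outage_hours * safety_factor <= DEFAULT_MAX_AGE_HOURS
    return size_ok and age_ok


if __name__ == "__main__":
    # Hypothetical node: 2 GB per hour on average, 4-hour upgrade window.
    print(required_journal_bytes(2_000_000_000, 4))
    print(defaults_are_sufficient(2_000_000_000, 4))
```

If the check fails for either size or age, that is a signal to raise the corresponding journal setting and to confirm that the journal volume has the disk space to match.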
Testing the impact of increased journal use in your Graylog environment while it still runs on Elasticsearch can surface useful information and enable you to:
- Configure appropriate values for your journal(s) to handle the increase in buffered data when you execute the upgrade process.
- Determine how long it takes journal(s) to return to normal usage after being allowed to grow to the anticipated size during the upgrade process.
- Determine additional system resources OpenSearch and/or Graylog may require to catch up on message processing alongside incoming new data.
The System > Nodes menu within Graylog has a More Actions selection that contains the action Pause Message Processing. This action simulates Elasticsearch being unavailable to Graylog: it causes Graylog to stop sending data to Elasticsearch and to buffer incoming data in the journal. Once journal usage reaches the maximum size you estimated for the upgrade process, select the Resume Message Processing action.
Monitor the process from this point until the journal's utilization returns to normal. This should reveal any capacity shortfalls.
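Before running the test, it can help to have a rough expectation of how long the journal should take to drain. The sketch below is a simplified model, assuming constant rates: the backlog shrinks only by the difference between how fast Graylog can process messages and how fast new data keeps arriving. The function name and the example figures are hypothetical; measure your own rates during the test.

```python
import math


def catch_up_hours(backlog_bytes: float, ingest_bytes_per_hour: float,
                   processing_bytes_per_hour: float) -> float:
    """Hours until the journal backlog is drained, assuming constant rates."""
    if min(backlog_bytes, ingest_bytes_per_hour, processing_bytes_per_hour) < 0:
        raise ValueError("all inputs must be non-negative")
    if backlog_bytes == 0:
        return 0.0
    net_drain = processing_bytes_per_hour - ingest_bytes_per_hour
    if net_drain <= 0:
        # Processing cannot outpace new input, so the backlog never shrinks.
        return math.inf
    return backlog_bytes / net_drain


if __name__ == "__main__":
    # Hypothetical figures: 12 GB backlog, 2 GB/h incoming, 5 GB/h processed.
    print(catch_up_hours(12_000_000_000, 2_000_000_000, 5_000_000_000))
```

An infinite or very long result suggests that Graylog, OpenSearch, or both need additional resources to catch up alongside incoming new data.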