Stream Processing

Every message that comes in is matched against the rules of a stream. For messages satisfying all or at least one of the stream rules (as configured in the stream), the internal ID of that stream is stored in the streams array of the processed message.

All analysis methods and searches that are bound to streams can now easily narrow their operation by searching with a streams:[STREAM_ID] limit. This is done automatically by Graylog and does not have to be provided by you

Stream Processing Runtime Limits

An important step during the processing of a message is the stream classification. Every message is matched against the user-configured stream rules. The message is added to the stream if all or any rules of a stream matches, depending on your selections. Applying stream rules is done during the indexing of a message only, so the amount of time spent for the classification of a message is crucial for the overall performance and message throughput the system can handle.

There are certain scenarios when a stream rule takes very long to match. When this happens for a number of messages, message processing can stall, messages waiting for processing accumulate in memory, and the whole system could become non-responsive. Messages are lost, and manual intervention would be necessary.

To prevent this, the runtime of stream rule matching is limited. When it is taking longer than the configured runtime limit, the process of matching this exact message against the rules of this specific stream is aborted. Message processing in general and for this specific message continues though. As the runtime limit needs to be configured relatively high (usually a magnitude higher as a regular stream rule match takes), any excess of it is considered a fault and is recorded for this stream. If the number of recorded faults for a single stream is higher than a configured threshold, the stream rule set of this stream is considered faulty, and the stream is disabled. This is done to protect the overall stability and performance of message processing. Generally, this is a tradeoff, and based on the assumption that the total loss of one or more messages is worse than a loss of stream classification for these.

There are scenarios where this might not be applicable or even detrimental. If there is a high fluctuation of the message load including situations where the message load is much higher than the system can handle, overall stream matching can take longer than the configured timeout. If this happens repeatedly, all streams get disabled. This is a clear indicator that your system is overutilized and not able to handle the peak message load.

How to Configure the Timeout Values

If the default values do not match, consider modifying two configuration variables in the configuration file of the server, which influence the behavior of this functionality:

  • stream_processing_timeout: Defines the maximum amount of time the rules of a stream are able to spend. When this is exceeded, stream rule matching for this stream is aborted and a fault is recorded. This setting is defined in milliseconds, the default is 2000 (2 seconds).

  • stream_processing_max_faults: The maximum number of times a single stream can exceed this runtime limit. When it happens more often, the stream is disabled until it is manually reenabled. The default for this setting is 3.

Possible Causes of Excessive Runtimes

If a single stream has been disabled and all others are doing well, the chances are high that one or more stream rules are performing badly under certain circumstances. In most cases, this is related to stream rules that are utilizing regular expressions. For most other stream rule types, the general runtime is constant, while it varies very much for regular expressions, influenced by the regular expression itself and the input matched against it. In some special cases, the difference between a match and a non-match of a regular expression can be in the order of 100 or even 1000. This is caused by a phenomenon called catastrophic backtracking.

Troubleshooting Runtimes

If you continue to encounter these issues:

  1. Check the rules of the stream that is disabled for rules that could take very long (especially regular expressions).

  2. Modify or delete those stream rules.

  3. Re-enable the stream.