Logging Rules
With Graylog API Security, API calls are always captured in the context of a set of logging rules that govern what kind of data is collected. This section will help you define logging rules specific to your APIs.
What are Logging Rules?
With API Security, logging is always done in the context of a set of rules. These describe when consent has been given to collect user data, and what kinds of data may be collected. All rules are applied within a logger before any usage data is sent to your API Security database.
Rules can perform many different actions:
- Keeping a random percentage of messages to improve privacy and reduce data volume
- Discarding entire messages based on matching one or more details
- Removing details based on type, name, entire value, or portion of value
- Masking credit card numbers and other sensitive fields regardless of where they appear
- Copying user session fields into the outgoing message
Rules are expressed in code, like a regular part of your application, and so can easily be kept in sync and validated with your app as it changes. Rules are portable between logger implementations in different languages, so they can be shared across your organization.
Best of all, you don't have to be a programmer to create or manage rules for your applications. Rules are expressed with a simple syntax described below.
Basic Rule Syntax
A set of logging rules is a block of text where:
- each rule appears on a separate line
- rules are identified by name and take zero or more parameters, separated by spaces or tabs
- comments begin with # and may appear at the start of a line or within a line
- blank or empty lines are ignored
- rules may appear in any order
The example below configures two rules and includes some helpful comments. Here the sample rule takes the parameter 10, while the skip_compression rule takes no parameters.
# example of custom rules
sample 10 # keep 10% at random
skip_compression # reduce CPU time
Because comments and whitespace are ignored and order of rules is not significant, this next set of rules has exactly the same meaning as the previous example.
skip_compression
sample 10
All the simplest rules, including allow_http_url, include, sample, and skip_compression, take zero or one string parameter, depending on how the rule is defined.
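Since the syntax is line-oriented, a rule set is straightforward to tokenize. The following Python sketch is a simplified illustration of that process, not the actual parser; in particular it ignores the complication that a regular-expression parameter may itself contain # or whitespace inside its delimiters.

```python
def parse_rules(text):
    """Split a rule set into (name, params) tuples.

    Simplified sketch: strips # comments and blank lines, then splits
    each remaining rule on whitespace. A real parser must also respect
    regex delimiters, since a regex parameter may contain # or spaces.
    """
    rules = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue                          # skip blank lines
        name, *params = line.split()
        rules.append((name, params))
    return rules

rules = parse_rules("""
# example of custom rules
sample 10        # keep 10% at random
skip_compression # reduce CPU time
""")
# rules == [("sample", ["10"]), ("skip_compression", [])]
```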
Regular Expressions
To create more interesting rules, we rely on regular expressions. These are very flexible and efficient for matching and transforming strings. Regular expressions are also portable between languages, which is ideal for sharing rules across loggers in different languages.
Regular expressions admittedly require some training for the uninitiated, but are far easier to learn than a full-blown programming language.
The following examples are regular expressions delimited with slashes.
/.*/ # match any value
/foo.*/ # starts with foo
/.*foo.*/ # contains foo
/.*foo/ # ends with foo
In our syntax, regular expressions can be written using one of several delimiters: / ~ ! % |
/foo.*/ # starts with foo
~foo.*~ # starts with foo
!foo.*! # starts with foo
%foo.*% # starts with foo
|foo.*| # starts with foo
If a delimiter character appears in a regular expression, it must be escaped with a preceding backslash. This is where having a choice of delimiters helps: you can pick the one that requires the least escaping. This is especially useful for matching structured content like JSON, XML, or HTML, each of which has its own conventions for escaping special characters.
# match 'A/B', with an escaped delimiter (yuck!)
/A\/B/
# match 'A/B', with a different delimiter (better!)
|A/B|
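To make the delimiter handling concrete, here is a small Python sketch of a hypothetical helper (not part of any logger API) that strips the surrounding delimiter and unescapes any escaped delimiter characters before compiling the pattern:

```python
import re

DELIMITERS = "/~!%|"

def parse_pattern(token):
    """Turn a delimited expression like |A/B| into a compiled regex.

    Hypothetical helper for illustration: strips the surrounding
    delimiter and unescapes escaped delimiters inside the body.
    """
    d = token[0]
    if d not in DELIMITERS or not token.endswith(d) or len(token) < 2:
        raise ValueError(f"bad pattern: {token}")
    body = token[1:-1].replace("\\" + d, d)  # unescape the delimiter
    return re.compile(body)

# The same pattern, written with two different delimiters:
assert parse_pattern(r"/A\/B/").pattern == parse_pattern("|A/B|").pattern == "A/B"
```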
Simple rules like copy_session_field take a single regular expression as a parameter, whereas keyed rules take multiple regular expressions as parameters.
Keyed Rules
These rules are the most powerful since they act directly on details of a logged message. A message is internally represented as a list of key/value pairs, which is the same structure used for our JSON format. The following is an example of the key/value pairs for a message.
Key string Value string
------------------------------- --------------------------------------
request_method GET
request_url http://localhost:5000/?action=new
request_header:user-agent Mozilla/5.0...
request_param:action new
response_code 200
response_header:content-type text/html; charset=utf-8
response_header:content-length 8803
response_body { "result": 1 }
session_field:session_id 8687e4ba9
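In Python terms, the same message could be sketched as a plain list of key/value string pairs, mirroring the shape used by the JSON format:

```python
# A logged message as a list of key/value string pairs
# (the same shape used by the JSON format).
message = [
    ("request_method", "GET"),
    ("request_url", "http://localhost:5000/?action=new"),
    ("request_header:user-agent", "Mozilla/5.0..."),
    ("request_param:action", "new"),
    ("response_code", "200"),
    ("response_header:content-type", "text/html; charset=utf-8"),
    ("response_header:content-length", "8803"),
    ("response_body", '{ "result": 1 }'),
    ("session_field:session_id", "8687e4ba9"),
]
```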
Keyed rules are those whose first parameter is a regular expression matched against a key string. This key expression always appears to the left of the rule name. Keyed rules are only evaluated against details whose key string matches the left-hand regular expression.
The following example deletes the response_body detail but keeps the rest.
/response_body/ remove
If the keyed rule takes additional parameters, these appear to the right of the name of the rule, like any regular parameter. The following example is a rule that takes a second regular expression as a parameter.
# remove response bodies containing foo
/response_body/ remove_if /.*foo.*/
Keyed rules are the largest category of rules, featuring: remove, remove_if, remove_if_found, remove_unless, remove_unless_found, replace, stop, stop_if, stop_if_found, stop_unless, stop_unless_found.
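As a sketch of how a keyed rule might be evaluated (assumed semantics based on the description above, not the actual logger implementation), a remove_if rule could be applied to a message like this:

```python
import re

def remove_if(message, key_pattern, value_pattern):
    """Drop any detail whose key matches key_pattern AND whose entire
    value matches value_pattern. Simplified sketch of the documented
    semantics, not the logger's actual implementation."""
    key_re = re.compile(key_pattern)
    val_re = re.compile(value_pattern)
    return [
        (k, v) for (k, v) in message
        if not (key_re.fullmatch(k) and val_re.fullmatch(v))
    ]

message = [
    ("request_method", "GET"),
    ("response_body", "something foo something"),
]

# /response_body/ remove_if /.*foo.*/
filtered = remove_if(message, r"response_body", r".*foo.*")
# only request_method survives
```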
Supported Rules
allow_http_url
By default, loggers will refuse to send messages over HTTP, as this is not secure. Add this rule to allow logger URLs with HTTP to be configured, but be advised this should never be used in real production environments.
allow_http_url
copy_session_field
This copies data from the active user session into the outgoing message. Only session field names that match the specified regular expression will be copied. Session data is copied before any other rules are run, so that stop and replace rules can inspect session fields just like any detail from the request or response. When no user session is active, nothing will be done.
# copy any available fields
copy_session_field /.*/
# copy any fields starting with 'foo'
copy_session_field /foo.*/
remove
This removes any detail from the message where the specified regular expression matches its key. The value associated with the key is not checked. If all details are removed, the entire message will be discarded before doing any further processing.
# block cookie headers
/request_header:cookie/ remove
/response_header:set-cookie/ remove
remove_if
This removes any detail from the message where the first regular expression matches its key, and the second regex matches its entire value. If all details are removed, the message will be discarded.
# block response body if directed by comment
/response_body/ remove_if |<html>.*<!--SKIP_LOGGING-->.*|
remove_if_found
This removes any detail from the message where the first regular expression matches its key, and the second regex is found at least once in its value. This is faster than matching against the entire value. If all details are removed, the message will be discarded.
# block response body if directed by comment
/response_body/ remove_if_found |<!--SKIP_LOGGING-->|
remove_unless
This removes any detail from the message where the first regular expression matches its key, but the second regex does not match its entire value. If all details are removed, the message will be discarded.
# block response body without opt-in comment
/response_body/ remove_unless |<html>.*<!--DO_LOGGING-->.*|
remove_unless_found
This removes any detail from the message where the first regular expression matches its key, but the second regex is not found at least once in its value. This is faster than matching against the entire value. If all details are removed, the message will be discarded.
# block response body without opt-in comment
/response_body/ remove_unless_found |<!--DO_LOGGING-->|
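The difference between the _if/_unless variants and the _if_found/_unless_found variants is whole-value matching versus substring search. In Python terms, that is the difference between re.fullmatch and re.search:

```python
import re

body = "<html>...<!--SKIP_LOGGING-->...</html>"

# remove_if must match the ENTIRE value, so the pattern needs .* padding:
assert re.fullmatch(r"<html>.*<!--SKIP_LOGGING-->.*", body)

# remove_if_found only needs the pattern to occur SOMEWHERE in the
# value, which is cheaper and needs no padding:
assert re.search(r"<!--SKIP_LOGGING-->", body)
```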
replace
This masks sensitive user information that appears in a message. When the first regular expression matches the key of a message detail, all instances of the second regex in its value are found and replaced. The third parameter is the safe mask string, which can be a static value or an expression that includes backreferences. (Note that backreferences are specified in a language-specific manner.)
# chop out long sequence of numbers from all details
/.*/ replace /[0-9\.\-\/]{9,}/, /xyxy/
# chop url after first '?' (Node & Java)
/request_url/ replace /([^\?;]+).*/, |$1|
# chop url after first '?' (Python & Ruby)
/request_url/ replace /([^\?;]+).*/, |\\1|
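The language-specific backreference behavior can be checked directly against the underlying regex engine. In Python, re.sub uses \1-style backreferences; the URL below is an assumed example value:

```python
import re

url = "http://localhost:5000/?action=new"

# Equivalent of: /request_url/ replace /([^\?;]+).*/, with a group-1
# backreference as the mask (Python backreference syntax shown).
masked = re.sub(r"([^?;]+).*", r"\1", url)
assert masked == "http://localhost:5000/"
```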
sample
This discards messages at random while attempting to keep the specified percentage of messages over time. The percentage must be between 1 and 99. Sampling is applied only to messages that were not intentionally discarded by any form of stop rule.
sample 10
Hint: Unlike most rules, sample may appear only once in a set of rules.
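The sampling behavior amounts to keeping each message with the given probability. A minimal Python sketch of that idea (an illustration, not the logger's actual implementation):

```python
import random

def keep(sample_percent, rng=random):
    """Sketch of the sample rule: keep roughly sample_percent of
    messages at random. Simplified illustration only."""
    return rng.random() * 100 < sample_percent

rng = random.Random(42)  # seeded so the run is reproducible
kept = sum(keep(10, rng) for _ in range(10_000))
# kept lands near 1,000, i.e. about 10% of 10,000 messages
```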
skip_compression
This disables deflate compression of messages, which is ordinarily enabled by default. This reduces CPU overhead related to logging, at the expense of higher network utilization to transmit messages.
skip_compression
stop
This discards the entire message if the specified regular expression matches any available key. The value associated with the key is not checked.
# block messages if requested via header
/request_header:nolog/ stop
stop_if
This discards the message if the first regular expression matches an available key, and the second regex matches its entire value.
# block messages if directed by body comment
/response_body/ stop_if |<html>.*<!--STOP_LOGGING-->.*|
stop_if_found
This discards the message if the first regular expression matches an available key, and the second regex is found at least once in its value. This is faster than matching against the entire value string.
# block messages if directed by body comment
/response_body/ stop_if_found |<!--STOP_LOGGING-->|
stop_unless
This discards the message if the first regular expression matches an available key, but the second regex fails to match its entire value. If several of these rules are present, then all must be satisfied for logging to be done.
# block messages without url opt-in
/request_url/ stop_unless |.*/fooapp/.*log=yes.*|
stop_unless_found
This discards the message if the first regular expression matches an available key, but the second regex fails to be found at least once in its value. This is faster than matching against the entire value. If several of these rules are present, then all must be satisfied.
# block messages without url opt-in
/request_url/ stop_unless_found |log=yes|
Predefined Rule Sets
The easiest way to configure rules for a logger is by including a predefined set of rules. This is done with an include statement that gives the name of the set of rules to load. This example includes the current default rules as a starting point.
include default
Predefined rules cannot be modified, but they can be extended by adding more rules. The next example includes default rules and randomly keeps 10% of all logged messages.
include default
sample 10
As in the example above, you'll often start with a set of predefined rules and then add more rules specific to your applications. Next we'll dive into the predefined sets of rules, strict and debug, and when to use each.
Strict Rules
This predefined set of rules logs a minimum amount of detail, similar to a traditional weblog. Interesting details like body content and request parameters and most headers are dropped. You're unlikely to need additional rules to avoid logging sensitive user information, but the trade-off is that not many details are actually retained.
Strict rules are applied by default, either when no rules are specified or when include default is used for most configurations. Advanced configurations can redefine the meaning of include default through the logger API, but unless you've done so, include default and include strict have the same meaning.
include strict
OR
include default # strict unless redefined
Actions taken by strict rules:
- Keep URL but strip off any query params (everything after the first ?)
- Remove request body, request parameters, and response body
- Remove request headers except User-Agent
- Remove response headers except Content-Length and Content-Type
Debug Rules
This predefined set of rules logs every available detail, including user session fields, without any filtering or sensitive data protections at all. Debug rules are helpful for application debugging and testing, but are not appropriate for real environments with real users.
include debug
Actions taken by debug rules:
- Copy all fields from active session
- Keep all request and response details intact
Rule Ordering and Processing
Rules can be declared in any order. There is no special priority given to rules declared earlier versus later, nor to rules loaded by an include statement versus declared inline. Rules are always run in a preset order that gives ideal logging performance.
Why is this so crucial? Because if rules were run in declared order, this would force users to remember many important optimizations. Any rule that relies on a partial match (like remove_if_found) should be done before similar rules matching an entire value (like remove_if). Any sampling should be done only after all stop rules have run. Any replace rules are the slowest and should be run last (and so on). It would be very difficult to create efficient sets of custom rules if ordering was not automatically optimized.
The following algorithm is applied every time an HTTP request/response is logged:
- The logger constructs an outgoing message from original request and response objects.
- The logger runs copy_session_field rules to copy data from the user session to the message.
- The logger attempts to quit early based on stop rules in the following order: stop, stop_if_found, stop_if, stop_unless, stop_unless_found.
- The logger may now randomly discard the entire message based on a sample rule.
- The logger discards message details based on remove rules in the following order: remove, remove_unless_found, remove_if_found, remove_unless, remove_if.
- The logger discards the entire message if all details have been removed at this point.
- The logger runs any replace rules to mask any sensitive fields present.
- The logger removes any details with empty values (i.e. completely masked out).
- The logger finishes the message by adding now, agent, and version details.
- The logger converts the message into a JSON message (with proper encoding and escaping).
- The logger deflates the JSON message unless a skip_compression rule is present.
- The logger transmits the JSON message to the intended destination (a remote URL).
Most rules (with the exception of sample) can appear more than once within a set of rules. This is helpful for some complex expressions that would not be possible otherwise. When multiple rules with the same name are present, they all will be run by the logger, but their relative order is not strictly guaranteed.
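The preset ordering described above can be pictured as a fixed pipeline. This Python outline is a sketch of that ordering (not the actual implementation), sorting declared rules into the execution order the steps list:

```python
# Fixed processing order, independent of declaration order (a sketch).
PIPELINE = [
    "copy_session_field",                              # enrich first
    "stop", "stop_if_found", "stop_if",
    "stop_unless", "stop_unless_found",                # quit early
    "sample",                                          # then sample
    "remove", "remove_unless_found", "remove_if_found",
    "remove_unless", "remove_if",                      # drop details
    "replace",                                         # mask last
]

def order_rules(declared):
    """Re-sort declared rule names into the preset execution order."""
    rank = {name: i for i, name in enumerate(PIPELINE)}
    return sorted(declared, key=lambda name: rank[name])

assert order_rules(["replace", "sample", "stop"]) == ["stop", "sample", "replace"]
```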
Loading Rules from a File
Rules are passed as a single string argument when creating new logger instances. This works in most cases, especially when using a predefined set of rules like include strict or include debug. However, it can be cumbersome to fit a more complex rule set into a single string, and inconvenient to modify your codebase whenever you wish to edit an existing rule set. To address this, you can create a plain text file containing your rule set and save it in a location reachable by your application. Then prefix its path with file:// and pass the result as the rules string argument to the logger, like so:
# example: the rule set can be found at ./app/rules.txt
logger = HttpLogger(rules="file://app/rules.txt") # python
Limitations
- Some details (host, interval, now) are not visible to rules. These are added after rules have run against the message.
- Rules are not able to change existing key strings, or add new keys (except for copy_session_field rules).
- Rules cannot express certain types of matches between different details. For example, response_body can't be removed based on matching a request_header value.