Logging Rules
With Graylog API Security, API calls are always captured in the context of a set of logging rules that govern what kind of data is collected. This section will help you define logging rules specific to your APIs.
What are Logging Rules?
With API Security, logging is always done in the context of a set of rules. These describe when consent has been given to collect user data, and what kinds of data may be collected. All rules are applied within a logger before any usage data is sent to your API Security database.
Rules can perform many different actions:
- Keeping a random percentage of messages to improve privacy and reduce data volume
- Discarding entire messages based on matching one or more details
- Removing details based on type, name, entire value, or portion of value
- Masking credit card numbers and other sensitive fields regardless of where they appear
- Copying user session fields into the outgoing message
Rules are expressed in code, like a regular part of your application, and so can easily be kept in sync and validated with your app as it changes. Rules are portable between logger implementations in different languages, so they can be shared across your organization.
Best of all, you don't have to be a programmer to create or manage rules for your applications. Rules are expressed with a simple syntax described below.
Basic Rule Syntax
A set of logging rules is a block of text where:
- each rule appears on a separate line
- rules are identified by name and take zero or more parameters, separated by spaces or tabs
- comments begin with # and may appear at the start of a line or within a line
- blank or empty lines are ignored
- rules may appear in any order
The example below configures two rules and includes some helpful comments. Here the sample rule takes the parameter 10, while the skip_compression rule takes no parameters.
# example of custom rules
sample 10 # keep 10% at random
skip_compression # reduce CPU time
Because comments and whitespace are ignored and order of rules is not significant, this next set of rules has exactly the same meaning as the previous example.
skip_compression
sample 10
All the simplest rules, including allow_http_url, include, sample, and skip_compression, take zero or one string parameter, depending on how the rule is defined.
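Since the syntax is line-oriented, a rule set is straightforward to tokenize. The following Python sketch is a simplified illustration of that process, not the actual parser; in particular it ignores the complication that a regular-expression parameter may itself contain # or whitespace inside its delimiters.

```python
def parse_rules(text):
    """Split a rule set into (name, params) tuples.

    Simplified sketch: strips # comments and blank lines, then splits
    each remaining rule on whitespace. A real parser must also respect
    regex delimiters, since a regex parameter may contain # or spaces.
    """
    rules = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue                          # skip blank lines
        name, *params = line.split()
        rules.append((name, params))
    return rules

rules = parse_rules("""
# example of custom rules
sample 10        # keep 10% at random
skip_compression # reduce CPU time
""")
# rules == [("sample", ["10"]), ("skip_compression", [])]
```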
Regular Expressions
To create more interesting rules, we rely on regular expressions. These are very flexible and efficient for matching and transforming strings. Regular expressions are also portable between languages, which is ideal for sharing rules across loggers in different languages.
Regular expressions admittedly require some training for the uninitiated, but are far easier to learn than a full-blown programming language.
The following examples are regular expressions delimited with slashes.
/.*/ # match any value
/foo.*/ # starts with foo
/.*foo.*/ # contains foo
/.*foo/ # ends with foo
In our syntax, regular expressions can be written using one of several delimiters: / ~ ! % |
/foo.*/ # starts with foo
~foo.*~ # starts with foo
!foo.*! # starts with foo
%foo.*% # starts with foo
|foo.*| # starts with foo
If a delimiter character appears in a regular expression, it must be escaped with a preceding backslash. This is where having a choice of delimiters helps: you can pick the one that requires the least escaping. This is especially useful for matching structured content like JSON, XML, or HTML, each of which has its own conventions for escaping special characters.
# match 'A/B', with an escaped delimiter (yuck!)
/A\/B/
# match 'A/B', with a different delimiter (better!)
|A/B|
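To make the delimiter handling concrete, here is a small Python sketch of a hypothetical helper (not part of any logger API) that strips the surrounding delimiter and unescapes any escaped delimiter characters before compiling the pattern:

```python
import re

DELIMITERS = "/~!%|"

def parse_pattern(token):
    """Turn a delimited expression like |A/B| into a compiled regex.

    Hypothetical helper for illustration: strips the surrounding
    delimiter and unescapes escaped delimiters inside the body.
    """
    d = token[0]
    if d not in DELIMITERS or not token.endswith(d) or len(token) < 2:
        raise ValueError(f"bad pattern: {token}")
    body = token[1:-1].replace("\\" + d, d)  # unescape the delimiter
    return re.compile(body)

# The same pattern, written with two different delimiters:
assert parse_pattern(r"/A\/B/").pattern == parse_pattern("|A/B|").pattern == "A/B"
```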
Simple rules like copy_session_field take a single regular expression as a parameter, whereas keyed rules take multiple regular expressions as parameters.
Keyed Rules
These rules are the most powerful since they act directly on details of a logged message. A message is internally represented as a list of key/value pairs, which is the same structure used for our JSON format. The following is an example of the key/value pairs for a message.
Key string Value string
------------------------------- --------------------------------------
request_method GET
request_url http://localhost:5000/?action=new
request_header:user-agent Mozilla/5.0...
request_param:action new
response_code 200
response_header:content-type text/html; charset=utf-8
response_header:content-length 8803
response_body { "result": 1 }
session_field:session_id 8687e4ba9
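In Python terms, the same message could be sketched as a plain list of key/value string pairs, mirroring the shape used by the JSON format:

```python
# A logged message as a list of key/value string pairs
# (the same shape used by the JSON format).
message = [
    ("request_method", "GET"),
    ("request_url", "http://localhost:5000/?action=new"),
    ("request_header:user-agent", "Mozilla/5.0..."),
    ("request_param:action", "new"),
    ("response_code", "200"),
    ("response_header:content-type", "text/html; charset=utf-8"),
    ("response_header:content-length", "8803"),
    ("response_body", '{ "result": 1 }'),
    ("session_field:session_id", "8687e4ba9"),
]
```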
Keyed rules are those whose first parameter is a regular expression matched against a key string. This key expression always appears to the left of the rule name. Keyed rules are only evaluated against details whose key string matches the left-hand regular expression.
The following example deletes the response_body detail but keeps the rest.
/response_body/ remove
If the keyed rule takes additional parameters, these appear to the right of the name of the rule, like any regular parameter. The following example is a rule that takes a second regular expression as a parameter.
# remove response bodies containing foo
/response_body/ remove_if /.*foo.*/
Keyed rules are the largest category of rules, featuring: remove, remove_if, remove_if_found, remove_unless, remove_unless_found, replace, stop, stop_if, stop_if_found, stop_unless, stop_unless_found.
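As a sketch of how a keyed rule might be evaluated (assumed semantics based on the description above, not the actual logger implementation), a remove_if rule could be applied to a message like this:

```python
import re

def remove_if(message, key_pattern, value_pattern):
    """Drop any detail whose key matches key_pattern AND whose entire
    value matches value_pattern. Simplified sketch of the documented
    semantics, not the logger's actual implementation."""
    key_re = re.compile(key_pattern)
    val_re = re.compile(value_pattern)
    return [
        (k, v) for (k, v) in message
        if not (key_re.fullmatch(k) and val_re.fullmatch(v))
    ]

message = [
    ("request_method", "GET"),
    ("response_body", "something foo something"),
]

# /response_body/ remove_if /.*foo.*/
filtered = remove_if(message, r"response_body", r".*foo.*")
# only request_method survives
```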
Supported Rules
allow_http_url
By default, loggers will refuse to send messages over HTTP, as this is not secure. Add this rule to allow logger URLs with HTTP to be configured, but be advised this should never be used in real production environments.
allow_http_url
copy_session_field
This copies data from the active user session into the outgoing message. Only session field names that match the specified regular expression will be copied. Session data is copied before any other rules are run, so that stop and replace rules can inspect session fields just like any detail from the request or response. When no user session is active, nothing will be done.
# copy any available fields
copy_session_field /.*/
# copy any fields starting with 'foo'
copy_session_field /foo.*/
remove
This removes any detail from the message where the specified regular expression matches its key. The value associated with the key is not checked. If all details are removed, the entire message will be discarded before doing any further processing.
# block cookie headers
/request_header:cookie/ remove
/response_header:set-cookie/ remove
remove_if
This removes any detail from the message where the first regular expression matches its key, and the second regex matches its entire value. If all details are removed, the message will be discarded.
# block response body if directed by comment
/response_body/ remove_if |<html>.*<!--SKIP_LOGGING-->.*|
remove_if_found
This removes any detail from the message where the first regular expression matches its key, and the second regex is found at least once in its value. This is faster than matching against the entire value. If all details are removed, the message will be discarded.
# block response body if directed by comment
/response_body/ remove_if_found |<!--SKIP_LOGGING-->|
remove_unless
This removes any detail from the message where the first regular expression matches its key, but the second regex does not match its entire value. If all details are removed, the message will be discarded.
# block response body without opt-in comment
/response_body/ remove_unless |<html>.*<!--DO_LOGGING-->.*|
remove_unless_found
This removes any detail from the message where the first regular expression matches its key, but the second regex is not found at least once in its value. This is faster than matching against the entire value. If all details are removed, the message will be discarded.
# block response body without opt-in comment
/response_body/ remove_unless_found |<!--DO_LOGGING-->|
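The difference between the _if/_unless variants and the _if_found/_unless_found variants is whole-value matching versus substring search. In Python terms, that is the difference between re.fullmatch and re.search:

```python
import re

body = "<html>...<!--SKIP_LOGGING-->...</html>"

# remove_if must match the ENTIRE value, so the pattern needs .* padding:
assert re.fullmatch(r"<html>.*<!--SKIP_LOGGING-->.*", body)

# remove_if_found only needs the pattern to occur SOMEWHERE in the
# value, which is cheaper and needs no padding:
assert re.search(r"<!--SKIP_LOGGING-->", body)
```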
replace
This masks sensitive user information that appears in a message. When the first regular expression matches the key of a message detail, all instances of the second regex in its value are found and replaced. The third parameter is the safe mask string, which can be a static value or an expression that includes backreferences. (Note that backreferences are specified in a language-specific manner.)
# chop out long sequence of numbers from all details
/.*/ replace /[0-9\.\-\/]{9,}/, /xyxy/
# chop url after first '?' (Node & Java)
/request_url/ replace /([^\?;]+).*/, |$1|
# chop url after first '?' (Python & Ruby)
/request_url/ replace /([^\?;]+).*/, |\\1|
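The language-specific backreference behavior can be checked directly against the underlying regex engine. In Python, re.sub uses \1-style backreferences; the URL below is an assumed example value:

```python
import re

url = "http://localhost:5000/?action=new"

# Equivalent of: /request_url/ replace /([^\?;]+).*/, with a group-1
# backreference as the mask (Python backreference syntax shown).
masked = re.sub(r"([^?;]+).*", r"\1", url)
assert masked == "http://localhost:5000/"
```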
sample
This discards messages at random while attempting to keep the specified percentage of messages over time. The percentage must be between 1 and 99. Sampling is applied only to messages that were not intentionally discarded by any form of stop rule.
sample 10
Hint: Unlike most rules, sample may appear only once in a set of rules.
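The sampling behavior amounts to keeping each message with the given probability. A minimal Python sketch of that idea (an illustration, not the logger's actual implementation):

```python
import random

def keep(sample_percent, rng=random):
    """Sketch of the sample rule: keep roughly sample_percent of
    messages at random. Simplified illustration only."""
    return rng.random() * 100 < sample_percent

rng = random.Random(42)  # seeded so the run is reproducible
kept = sum(keep(10, rng) for _ in range(10_000))
# kept lands near 1,000, i.e. about 10% of 10,000 messages
```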
skip_compression
This disables deflate compression of messages, which is ordinarily enabled by default. This reduces CPU overhead related to logging, at the expense of higher network utilization to transmit messages.
skip_compression
stop
This discards the entire message if the specified regular expression matches any available key. The value associated with the key is not checked.
# block messages if requested via header
/request_header:nolog/ stop
stop_if
This discards the message if the first regular expression matches an available key, and the second regex matches its entire value.
# block messages if directed by body comment
/response_body/ stop_if |<html>.*<!--STOP_LOGGING-->.*|
stop_if_found
This discards the message if the first regular expression matches an available key, and the second regex is found at least once in its value. This is faster than matching against the entire value string.
# block messages if directed by body comment
/response_body/ stop_if_found |<!--STOP_LOGGING-->|
stop_unless
This discards the message if the first regular expression matches an available key, but the second regex fails to match its entire value. If several of these rules are present, then all must be satisfied for logging to be done.
# block messages without url opt-in
/request_url/ stop_unless |.*/fooapp/.*log=yes.*|
stop_unless_found
This discards the message if the first regular expression matches an available key, but the second regex fails to be found at least once in its value. This is faster than matching against the entire value. If several of these rules are present, then all must be satisfied.
# block messages without url opt-in
/request_url/ stop_unless_found |log=yes|
Predefined Rule Sets
The easiest way to configure rules for a logger is by including a predefined set of rules. This is done with an include statement that gives the name of the set of rules to load. This example includes the current default rules as a starting point.
include default
Predefined rules cannot be modified, but they can be extended by adding more rules. The next example includes default rules and randomly keeps 10% of all logged messages.
include default
sample 10
As in the example above, you'll often start with a set of predefined rules and then add more rules specific to your applications. Next we'll dive into the predefined sets of rules, strict and debug, and when to use each.
Strict Rules
This predefined set of rules logs a minimum amount of detail, similar to a traditional weblog. Interesting details like body content and request parameters and most headers are dropped. You're unlikely to need additional rules to avoid logging sensitive user information, but the trade-off is that not many details are actually retained.
Strict rules are applied by default, either when no rules are specified or when include default is used for most configurations. Advanced configurations can redefine the meaning of include default through the logger API, but unless you've done so, include default and include strict have the same meaning.
include strict
OR
include default # strict unless redefined
Actions taken by strict rules:
- Keep URL but strip off any query params (everything after the first ?)
- Remove request body, request parameters, and response body
- Remove request headers except User-Agent
- Remove response headers except Content-Length and Content-Type
Debug Rules
This predefined set of rules logs every available detail, including user session fields, without any filtering or sensitive data protections at all. Debug rules are helpful for application debugging and testing, but are not appropriate for real environments with real users.
include debug
Actions taken by debug rules:
- Copy all fields from active session
- Keep all request and response details intact
Rule Ordering and Processing
Rules can be declared in any order. There is no special priority given to rules declared earlier versus later, nor to rules loaded by an include statement versus declared inline. Rules are always run in a preset order that gives ideal logging performance.
Why is this so crucial? Because if rules were run in declared order, this would force users to remember many important optimizations. Any rule that relies on a partial match (like remove_if_found) should be done before similar rules matching an entire value (like remove_if). Any sampling should be done only after all stop rules have run. Any replace rules are the slowest and should be run last (and so on). It would be very difficult to create efficient sets of custom rules if ordering was not automatically optimized.
The following algorithm is applied every time an HTTP request/response is logged:
- The logger constructs an outgoing message from original request and response objects.
- The logger runs copy_session_field rules to copy data from the user session to the message.
- The logger attempts to quit early based on stop rules in the following order: stop, stop_if_found, stop_if, stop_unless, stop_unless_found.
- The logger may now randomly discard the entire message based on a sample rule.
- The logger discards message details based on remove rules in the following order: remove, remove_unless_found, remove_if_found, remove_unless, remove_if.
- The logger discards the entire message if all details have been removed at this point.
- The logger runs any replace rules to mask any sensitive fields present.
- The logger removes any details with empty values (i.e. completely masked out).
- The logger finishes the message by adding now, agent, and version details.
- The logger converts the message into a JSON message (with proper encoding and escaping).
- The logger deflates the JSON message unless a skip_compression rule is present.
- The logger transmits the JSON message to the intended destination (a remote URL).
Most rules (with the exception of sample) can appear more than once within a set of rules. This is helpful for some complex expressions that would not be possible otherwise. When multiple rules with the same name are present, they all will be run by the logger, but their relative order is not strictly guaranteed.
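The preset ordering described above can be pictured as a fixed pipeline. This Python outline is a sketch of that ordering (not the actual implementation), sorting declared rules into the execution order the steps list:

```python
# Fixed processing order, independent of declaration order (a sketch).
PIPELINE = [
    "copy_session_field",                              # enrich first
    "stop", "stop_if_found", "stop_if",
    "stop_unless", "stop_unless_found",                # quit early
    "sample",                                          # then sample
    "remove", "remove_unless_found", "remove_if_found",
    "remove_unless", "remove_if",                      # drop details
    "replace",                                         # mask last
]

def order_rules(declared):
    """Re-sort declared rule names into the preset execution order."""
    rank = {name: i for i, name in enumerate(PIPELINE)}
    return sorted(declared, key=lambda name: rank[name])

assert order_rules(["replace", "sample", "stop"]) == ["stop", "sample", "replace"]
```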
Loading Rules from a File
Rules are passed as a single string argument when creating new logger instances. This works in most cases, especially when using a predefined set of rules like include strict or include debug. However, it can be cumbersome to fit a more complex rule set into a single string, and inconvenient to modify your codebase whenever you wish to edit an existing rule set. To address this, you can create a plain text file containing your rule set and save it in a location reachable by your application. Then prefix its path with file:// and pass the result as the rules string argument to the logger, like so:
# example: the rule set can be found at ./app/rules.txt
logger = HttpLogger(rules="file://app/rules.txt") # python
Limitations
- Some details (host, interval, now) are not visible to rules. These are added after rules have run against the message.
- Rules are not able to change existing key strings, or add new keys (except for copy_session_field rules).
- Rules cannot express certain types of matches between different details. For example, response_body can't be removed based on matching a request_header value.