Cluster Support Bundle
The Cluster Support Bundle allows you to collect useful information from your Graylog cluster for debugging and troubleshooting purposes. Graylog Enterprise users can use the Cluster Support Bundle to send information about their Graylog cluster to the Support team by attaching the .zip file directly to support tickets in the customer support portal.
Generate a Cluster Support Bundle
To create a Cluster Support Bundle:
-
Navigate to System > Logging.
-
Click the Create Support Bundle button to generate a .zip file containing system data.
-
Download the created bundle file from the Cluster Support Bundle list.
The support bundle can be shared with Support engineers for analysis.
Bundle Contents
A Cluster Support Bundle contains multiple data categories that provide visibility into different layers of the system. Understanding these categories helps you quickly identify the appropriate files for troubleshooting.
The Cluster Support Bundle includes the following primary diagnostic files:
system-stats.json
System-level performance and resource metrics, including:
File System Information
-
Disk usage statistics for the /graylog/data, journal, plugin, and bin directories.
-
Mount points and available or used space tracking.
-
Node utilization metrics.
JVM Configuration
-
Java version and vendor information.
-
Heap memory allocation settings (e.g., -Xms1g -Xmx1g).
-
Garbage collector configuration (G1GC).
-
System properties and runtime parameters.
Operating System Metrics
-
CPU information (processor type, core count, usage percentages).
-
Memory statistics (total RAM, used/free memory, swap configuration).
-
System load averages.
-
Uptime statistics.
Process Information
-
Process ID (PID).
-
Open file descriptor counts.
-
CPU and memory consumption per process.
cluster.json
Cluster-wide configuration and health status, including:
JVM Statistics (Per Node)
-
Memory usage (heap utilization, max heap size).
-
Node identification (Node ID, hostname).
-
Java runtime version.
Cluster Metrics
-
Cluster health status (Green/Yellow/Red).
-
Index counts and document statistics.
-
Shard distribution (primary and replica counts).
-
Storage utilization.
-
Installed plugins and versions.
-
Node count and configuration.
MongoDB Database Statistics
-
MongoDB version.
-
Database and collection counts.
-
Document counts and storage size.
-
Index statistics and storage overhead.
Graylog Configuration
-
Stream and stream rule counts.
-
User accounts and permissions.
-
Dashboard and input configurations.
-
Output configurations (TCP, UDP, GELF).
-
Installed Graylog plugins.
-
Cluster leadership and processing status.
Process Buffer State
-
Current buffer processor status and utilization.
certificates.json
SSL/TLS certificate trust relationships, including:
-
Certificate Authority Inventory.
-
Complete truststore listing (typically 100+ root certificates).
-
Certificate subjects and issuers.
-
Serial numbers and validity periods.
-
Subject Alternative Names (SANs).
metrics.json
Runtime metrics snapshot from the Graylog node. It is used to troubleshoot performance, throughput, and internal subsystem health. Review this file for data on:
-
Message throughput (ingest rate, processing rate, output rate).
-
Journal metrics (size, utilization, uncommitted messages).
-
Processing pipeline metrics (buffer usage, failures, execution timing).
-
Thread pool and executor metrics (queue depth, rejected tasks).
-
HTTP and API metrics (request rates, response timing).
thread-dump.txt
Real-time thread execution state, including:
Thread Analysis
-
Complete stack traces for all active threads.
-
Thread states (RUNNABLE, WAITING, TIMED_WAITING, BLOCKED).
-
Lock and synchronization information.
Key Thread Pools
-
HTTP worker threads (request handling).
-
System job executors (scheduled tasks).
-
Proxied request pools (API forwarding).
-
Netty transport threads (network I/O).
-
Output/input buffer processors.
-
Database connection pools.
Diagnostic Value
-
Performance bottleneck identification.
-
Deadlock detection.
-
Resource contention analysis.
-
Thread pool saturation monitoring.
server.log
Graylog server application logs, used to diagnose the behavior of a Graylog node. They include:
-
Graylog version, operating system, Java runtime, deployment type, and node ID.
-
Startup and migration timeline, including preflight checks, standard migrations, and periodical initialization.
-
Dependency connectivity, including MongoDB and Data Node versions and connection attempts.
-
Runtime warnings and errors that may explain reported symptoms.
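To narrow server.log down to the entries that matter, you can filter by log level and time window. The following is a minimal sketch, not an official tool. It assumes each log line starts with an ISO-8601 timestamp followed by the level, which depends on your logging configuration. The function name warn_error_lines is hypothetical, and timestamps are compared without time zone conversion.

```python
import re
from datetime import datetime

# Assumed line shape: "<ISO-8601 timestamp> <LEVEL> ..." (adjust to your log pattern).
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})\S*\s+(WARN|ERROR)\b")


def warn_error_lines(lines, start, end):
    """Return WARN/ERROR lines whose timestamp falls within [start, end]."""
    hits = []
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            # Not a WARN/ERROR header line (for example INFO or a stack trace line).
            continue
        stamp = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S")
        if start <= stamp <= end:
            hits.append(line.rstrip("\n"))
    return hits
```

For example, open the extracted server.log and pass the file object as lines, together with the start and end of the timeframe reported in the ticket.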
Troubleshooting Workflow
Knowing where to start can save time and focus your troubleshooting efforts. Use the following approach:
-
Extract the bundle and verify all core files are present.
-
Review cluster.json for overall cluster health status.
-
Check system-stats.json for immediate resource constraints.
-
Scan thread-dump.txt for obvious deadlocks or blocked threads.
-
Review metrics.json for throughput, journal utilization, buffer usage, and subsystem performance indicators to determine whether the node was overloaded.
-
Review certificates.json to validate certificate configuration and status, including expiration dates, issuer details, and trust chain issues that may cause TLS handshake failures between Graylog components.
-
Examine server.log within the reported timeframe for WARN and ERROR entries, stack traces, startup issues, or dependency connectivity failures that explain the observed behavior.
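The first step of this workflow, verifying that all core files are present, can be scripted. The sketch below uses the file names listed under Bundle Contents and matches them by base name, because the directory layout inside the .zip file is an assumption here and may differ between Graylog versions. The function names are hypothetical.

```python
import zipfile
from pathlib import PurePosixPath

# File names taken from the Bundle Contents section of this page.
CORE_FILES = {
    "system-stats.json",
    "cluster.json",
    "certificates.json",
    "metrics.json",
    "thread-dump.txt",
    "server.log",
}


def missing_core_files(archive_names):
    """Return core file names that appear nowhere in the archive listing."""
    # Compare base names only, so files nested in per-node folders still count.
    present = {PurePosixPath(name).name for name in archive_names}
    return sorted(CORE_FILES - present)


def check_bundle(zip_path):
    """Open a bundle .zip file and report which core files are missing."""
    with zipfile.ZipFile(zip_path) as bundle:
        return missing_core_files(bundle.namelist())
```

Calling check_bundle with the path to the downloaded bundle returns an empty list when every core file is found.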
Troubleshooting and Common Issues
This section outlines troubleshooting steps for common issues you may encounter.
Resource Exhaustion
This can be caused by disk space issues. To troubleshoot:
-
Review system-stats.json file system metrics.
-
Check disk usage percentages (>85% warrants attention).
-
Verify inode utilization.
-
Examine journal directory growth patterns.
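The 85% disk usage check can be expressed as a small helper. This is a sketch only: the exact field names in system-stats.json are not documented here, so the function takes used and total byte counts that you read from the file yourself. The names flag_full_filesystems and filesystems are hypothetical.

```python
def flag_full_filesystems(filesystems, threshold=85.0):
    """Return {path: usage percent} for entries above the threshold.

    filesystems maps a path to a (used_bytes, total_bytes) tuple; how you
    build that mapping from system-stats.json is left as an assumption.
    """
    flagged = {}
    for path, (used_bytes, total_bytes) in filesystems.items():
        if total_bytes <= 0:
            # Skip entries without a usable size instead of dividing by zero.
            continue
        percent = 100.0 * used_bytes / total_bytes
        if percent > threshold:
            flagged[path] = round(percent, 1)
    return flagged
```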
Memory Pressure
-
Analyze JVM heap usage in cluster.json and system-stats.json.
-
Compare used vs. maximum heap allocation.
-
Review system memory utilization (>90% indicates pressure).
-
Check for swap usage (should be minimal or zero).
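The memory checks above can be combined into one pass. The following sketch assumes you have already read the heap, system memory, and swap values from cluster.json and system-stats.json. The parameter names are hypothetical and do not reflect the actual JSON keys.

```python
def memory_findings(heap_used, heap_max, mem_used, mem_total, swap_used):
    """Summarize memory pressure indicators as a list of short findings."""
    findings = []
    if heap_max > 0:
        # Used vs. maximum heap allocation.
        heap_percent = 100.0 * heap_used / heap_max
        findings.append(f"heap at {heap_percent:.0f}% of maximum")
    if mem_total > 0 and 100.0 * mem_used / mem_total > 90.0:
        # More than 90% system memory utilization indicates pressure.
        findings.append("system memory above 90%")
    if swap_used > 0:
        # Swap usage should be minimal or zero.
        findings.append("swap in use")
    return findings
```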
CPU Saturation
-
Examine CPU usage percentages in system-stats.json.
-
Review thread states in thread-dump.txt for excessive RUNNABLE threads.
-
Identify hot threads consuming CPU cycles.
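A quick count of thread states gives a first impression of whether RUNNABLE or BLOCKED threads dominate. The sketch below assumes each thread entry in thread-dump.txt reports its state either as "java.lang.Thread.State: X" (the jstack style) or as "state=X". The actual layout of the file is an assumption, so adjust the pattern if it differs.

```python
import re
from collections import Counter

# Matches the state after either assumed prefix; TIMED_WAITING is listed
# before WAITING so the longer name is tried first.
STATE_RE = re.compile(
    r"(?:Thread\.State:\s*|state=)(RUNNABLE|TIMED_WAITING|WAITING|BLOCKED)\b"
)


def count_thread_states(dump_text):
    """Count how many threads are reported in each state."""
    return Counter(STATE_RE.findall(dump_text))
```

A high BLOCKED count is a reason to look at the lock and synchronization information in the dump, while the count alone does not identify which threads consume CPU.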
Data Node Issues
-
Check cluster status in cluster.json (Yellow/Red indicates problems).
-
Review shard allocation and unassigned shards.
-
Analyze index counts and document distribution.
-
Verify storage capacity and growth trends.
MongoDB Performance
-
Review collection and document counts in cluster.json.
-
Check storage size vs. data size ratios.
-
Analyze index count and effectiveness.
-
Verify MongoDB version compatibility.
-
Monitor application performance.
Buffer Saturation
-
Check process buffer dump in cluster.json.
-
Review input/output buffer processor states.
-
Identify message processing bottlenecks.
Network Issues
-
Analyze Netty transport thread states in thread-dump.txt.
-
Review socket-related threads for connection problems.
-
Check certificate validity in certificates.json for SSL/TLS issues.
Certificate Validation Failures
-
Review certificates.json for missing or expired certificates.
-
Verify certificate chain completeness.
-
Check validity periods against current date.
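Checking a validity period against the current date can be sketched as follows. The format in which certificates.json stores validity dates is not specified here, so this example assumes ISO-8601 strings with an explicit UTC offset such as +00:00. The function name certificate_status is hypothetical.

```python
from datetime import datetime, timezone


def certificate_status(not_before, not_after, now=None):
    """Classify a certificate as 'not yet valid', 'expired', or 'valid'.

    not_before and not_after are ISO-8601 strings with a UTC offset
    (an assumed format; convert your values first if they differ).
    """
    current = now or datetime.now(timezone.utc)
    start = datetime.fromisoformat(not_before)
    end = datetime.fromisoformat(not_after)
    if current < start:
        return "not yet valid"
    if current > end:
        return "expired"
    return "valid"
```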
Cluster Configuration
-
Review node count and leadership status.
-
Verify stream and input configurations.
-
Check output configurations for connectivity.
Submitting Bundles to Graylog Support
Graylog Enterprise customers can attach support bundles directly to tickets in the customer support portal.
Before Submission
-
Review bundle contents for sensitive information.
-
Redact or sanitize passwords, API keys, and proprietary data.
-
Document specific issues in the ticket.
-
Note Graylog version and deployment environment details.
