Centralized logging had been on our backlog at Chartbeat for quite some time. After taking care of a few yaks that needed shaving, we got around to implementing Logstash about a year ago. We by no means have the largest cluster, averaging around 25k events/sec and peaking at 50k during backfills. We’ll share some tips from what we learned deploying and scaling the ELK stack.
Architecture
We’re running the following software versions as part of our ELK stack:
- Elasticsearch 1.5.2
- Logstash 1.4.2
- Kibana 3
Diagram of our deployment
Logstash Forwarder (LSF)
There are many different ways you can pump data into Logstash given its number of input plugins. Ultimately we decided to go with logstash-forwarder rather than running the Logstash agent itself on each server, for the reasons highlighted below (a sample forwarder config follows the list):
- Lightweight in resources both CPU and memory
- No need to run JVM everywhere, written in Go
- Will pause forwarding if errors are detected upstream and resume
- Data is encrypted
- No queueing system required
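For reference, here is a minimal sketch of what an LSF config looks like; the indexer hostname, certificate path, log paths, and type names are placeholders rather than our actual values:

{
  "network": {
    "servers": [ "indexer1.example.com:5043" ],
    "ssl ca": "/etc/pki/tls/certs/logstash-forwarder.crt",
    "timeout": 15
  },
  "files": [
    { "paths": [ "/var/log/nginx/*.log" ], "fields": { "type": "nginx_access" } },
    { "paths": [ "/var/log/syslog" ],      "fields": { "type": "syslog" } }
  ]
}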
Unfortunately we ran into a lot of issues during the deployment where LSF would lose its connection to the indexers or lose track of its position in a file. There hasn’t been much major work done on the LSF project in some time. Etsy forked LSF (https://github.com/etsy/logstash-forwarder), has fixed a lot of the outstanding issues, and added features like multi-line support. While we’d love to stay on the mainline version, we’ll most likely be moving to Etsy’s fork in the near future.
Logstash
You should specify the --filterworkers argument when starting Logstash and give it more than the default of one filter worker. Keep in mind that Logstash also uses workers for input and output, so set the number of filter workers accordingly to avoid oversubscribing your CPUs. A good rule of thumb is to set filter workers to two less than the total number of CPUs on the machine.
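For example, on an 8-core indexer that works out to 6 filter workers; the path below assumes the standard package install layout:

# leave a couple of cores free for the input and output threads
/opt/logstash/bin/logstash agent -f /etc/logstash/conf.d --filterworkers 6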
We gained a large performance boost by converting our logging (where we could) to JSON to avoid having to write complex Grok patterns. For our Python code we used a wrapper around python-logstash to output in Logstash’s JSON format. For Nginx, which unfortunately doesn’t natively support JSON encoding of its logs, we took a hackish approach and specified a JSON-shaped format in the access_log format string:
log_format logstash_json '{ "@timestamp": "$time_iso8601", '
                         '"@version": "1", '
                         '"remote_addr": "$remote_addr", '
                         '"body_bytes_sent": "$body_bytes_sent", '
                         '"request_time": "$request_time", '
                         '"upstream_addr": "$upstream_addr", '
                         '"upstream_status": "$upstream_status", '
                         '"upstream_response_time": "$upstream_response_time", '
                         '"status": "$status", '
                         '"uri": "$uri", '
                         '"query_strings": "$query_string", '
                         '"request_method": "$request_method", '
                         '"http_referrer": "$http_referer", '
                         '"http_host": "$http_host", '
                         '"scheme": "$scheme", '
                         '"http_user_agent": "$http_user_agent" }';
Just beware: this method doesn’t guarantee your data is properly JSON encoded, and it will lead to the occasional dropped log line when a value breaks the encoding.
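On the Python side, here is a minimal sketch of what hooking python-logstash into the standard logging module looks like; the host, port, and extra fields are assumptions, and our real wrapper adds a bit more metadata around this:

import logging
import logstash

logger = logging.getLogger('myapp')
logger.setLevel(logging.INFO)
# ships Logstash v1 JSON events over UDP to a logstash input
logger.addHandler(logstash.LogstashHandler('indexer.example.com', 5959, version=1))

logger.info('pageview processed', extra={'latency_ms': 42})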
In order to monitor that Logstash is sending data from all our servers, we set up a passive check in Nagios for all our hosts. We then configured a basic cronjob on all our systems that does the following:
*/30 * * * * ubuntu echo PING | logger -t logstash_ping
In Logstash we set up a simple conditional to look for this data and send an OK to the passive check:
if [program] == "logstash_ping" {
  nagios_nsca {
    host => "nagios.example.com"
    nagios_status => "0"
    send_nsca_config => "/etc/send_nsca.cfg"
  }
}
If we haven’t received any data in an hour for this passive check, it will go critical and let us know of an issue.
Writing any grok filters? Grok debug is your best friend. We also found it useful to take complex filters and define a custom grok pattern to simplify the configuration.
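As a sketch of what that looks like (the pattern name, fields, and path here are made up for illustration), you define the pattern in a file, e.g. /etc/logstash/patterns/custom:

BACKEND_REQUEST %{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{USER:user} %{URIPATH:endpoint} %{NUMBER:response_ms}

and point grok at that directory in the filter config:

grok {
  patterns_dir => "/etc/logstash/patterns"
  match => [ "message", "%{BACKEND_REQUEST}" ]
}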
Using tags can help you with your Kibana dashboards and searches later on. We have a few places where we have conditionals in place and then use add_tag (on the mutate filter, or directly on the matching filter as below) to make it easy to search for those events in Kibana. Example:
grok {
  match => [ "message", "Accepted (publickey|password) for %{USER:user} from %{IP:ssh_client_ip}" ]
  add_tag => [ "auth_login" ]
}
Using the KV filter on data that you don’t control? Consider giving it a whitelist of keys it should accept. We recently had an issue where we were running the KV filter on Nginx error logs and the delimiter we were using turned up in some unexpected places, resulting in an explosion of unique keys (hundreds of thousands). That caused the index’s mapping to grow very large, which in turn caused the master nodes to OOM and crash, bringing the cluster down.
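The kv filter supports this directly via include_keys; a sketch, with key names that are just examples:

kv {
  source => "message"
  # keys not in this list are dropped instead of creating new fields in the mapping
  include_keys => [ "client", "request", "upstream", "host" ]
}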
Elasticsearch
Purchase Marvel. Seriously, it’s worth it and helps so much in giving insight into what’s going on within the cluster. It’s free for development use.
Don’t give the JVM more than 32GB of RAM due to compressed pointers; see “Why 35GB Heap is Less Than 32GB – Java JVM Memory Oddities” for a nice explanation of why this is an issue.
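With the Debian/Ubuntu packages the heap is set via an environment variable; 30g here is just an example for a box with 64GB of RAM:

# /etc/default/elasticsearch
ES_HEAP_SIZE=30g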
G1GC is currently not supported by Elasticsearch, so be aware that you may see unexpected behavior like crashes and index corruption if you attempt to use it.
Ensure minimum_master_nodes is set to N/2+1 of your master-eligible nodes to avoid potential split-brain scenarios.
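For example, with three master-eligible nodes that works out to two:

# elasticsearch.yml: 3 master-eligible nodes / 2 + 1 = 2
discovery.zen.minimum_master_nodes: 2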
Using AWS? You definitely want to be using the AWS Cloud plugin. It allows Elasticsearch to use the EC2 APIs for host discovery instead of relying on a hardcoded list of servers, and you can filter discovery by tags and security groups. The plugin also gives you the ability to set up S3 as a repository to snapshot index backups to.
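A sketch of the relevant elasticsearch.yml settings for the plugin; the region, security group, and tag values are placeholders:

cloud.aws.region: us-east-1
discovery.type: ec2
discovery.ec2.groups: elasticsearch
discovery.ec2.tag.cluster: logstash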
We utilized client nodes in a lot of places to avoid having to maintain a list of the Elasticsearch servers. You’ll notice in our diagram that we have a client node on each indexer and on the Kibana server as well. That way the Logstash indexers can just point to localhost for their Elasticsearch output, Kibana can point to the server it’s running on, and neither needs a config change or service restart when we add or remove nodes.
Separate out your master nodes from your data nodes; you don’t want a large query that sends your data nodes into a garbage-collection frenzy to also bring the whole cluster down with it. The master nodes can be small since they only manage cluster state and index mappings rather than holding data. For more info on the different node types, see the docs on node types.
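In elasticsearch.yml the different node types come down to two booleans; a quick sketch:

# dedicated master node
node.master: true
node.data: false

# data node
node.master: false
node.data: true

# client node (what we run on the indexers and the Kibana host)
node.master: false
node.data: false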
Use Curator to perform maintenance on your indices, such as optimizing and disabling bloom filters on indices that are no longer being actively indexed into. You can also use it to enforce retention policies and delete indices older than a certain number of days. We’re currently using version 2.0.1.
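The maintenance jobs themselves are one-liners along these lines (Curator 2.x syntax; the prefix and retention numbers here are examples rather than our actual policy):

# disable bloom filters on indices no longer being indexed into
curator --master-only bloom --prefix logstash- --older-than 2
# optimize (merge segments on) older indices
curator --master-only optimize --prefix logstash- --older-than 2 --max_num_segments 2
# enforce retention
curator --master-only delete --prefix logstash- --older-than 30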
Elasticsearch can be made rack- or DC-aware using node attributes and the cluster.routing.allocation.awareness.attributes property. This ensures that a shard’s primary and its replica will not be located on nodes with the same attribute value. We set the following:
node.rack_id: $ec2_availability_zone
cluster.routing.allocation.awareness.attributes: rack_id
Node attributes can be custom and set to whatever you want. You’ll notice we actually have two groups of servers in our diagram, our “fast” and “slow” nodes. These nodes are all part of one cluster but store different indices depending on their age. As we started adding more data to the cluster, we decided to have new data indexed onto beefier machines with SSDs and a lot of CPU, and after 48 hours have that data migrated off to servers with a larger amount of spinning-disk storage and a ton of RAM for querying. We can scale these two node types independently and add more “slow” nodes if we want longer retention periods. Elasticsearch makes this easy through the use of routing allocation tags. Using Curator we set up a nightly cronjob for each index that does the following:
curator --master-only allocation --prefix logstash-nginx- --older-than 2 --rule node_type=slow
This will tag the index to only live on nodes with the node attribute node_type set to slow. ES will then begin to migrate these shards to those nodes.
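The node attribute itself is just a line in elasticsearch.yml on each box:

# on the "slow" boxes (the "fast" boxes get node_type: fast)
node.node_type: slow

and under the hood Curator’s allocation command is essentially setting a per-index routing requirement, roughly equivalent to the following (the index name is an example):

curl -XPUT localhost:9200/logstash-nginx-2015.06.01/_settings -d '{"index.routing.allocation.require.node_type": "slow"}'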
You cannot just restart an Elasticsearch node without causing shards to be shuffled around, which can be very IO intensive for larger amounts of data. Elasticsearch has no concept of a grace period: it will think the node is down and immediately start re-routing shards, even if the node comes right back up. To restart a node in a faster, less painful fashion, you can toggle routing allocation off and on around the restart as follows:
curl -XPUT localhost:9200/_cluster/settings -d '{"transient":{"cluster.routing.allocation.enable": "none"}}'
# restart the node and wait for it to rejoin the cluster
curl -XPUT localhost:9200/_cluster/settings -d '{"transient":{"cluster.routing.allocation.enable": "all"}}'
Take a look at indices.recovery.concurrent_streams and indices.recovery.max_bytes_per_sec to speed up shard recovery time. The default values are fairly conservative in order to avoid interfering with searches and indexing, but if you have fast machines there’s no reason not to raise them from their defaults.
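Both are dynamic settings, so you can bump them on a live cluster; the values below are examples, not a recommendation:

curl -XPUT localhost:9200/_cluster/settings -d '{
  "persistent": {
    "indices.recovery.concurrent_streams": 8,
    "indices.recovery.max_bytes_per_sec": "200mb"
  }
}'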
Looking ahead
In the next couple of months we’re going to look at upgrading to the Logstash 1.5 release, which fixes a few bugs we’re currently experiencing. The puppet module has been pretty great for managing Logstash, but it does not yet support the new plugin system in the 1.5 release. Moving to Kibana 4 will also come soon; unfortunately we spent a lot of time building dashboards in Kibana 3 that will need to be completely redone in the new version, and we’ll need to spend some time re-training everyone on the new UI.