Logging with Elasticsearch, Logstash, Kibana, and Filebeat

1 June 2018


In this post, I will go over setting up an ELK stack (Elasticsearch, Logstash, and Kibana) with the setup we've been working on throughout these posts. Right now, if we deploy an update with the scripts from the last post, the Nginx logs are lost because new Docker machines are created. With this new setup, we will let our ELK stack persist the logs in another Docker machine. We will install Filebeat in our Nginx Docker image and let it ship the logs to our ELK stack. Logstash will receive, process, and send our logs to Elasticsearch, and we will be able to visualize them with Kibana. Our new Docker machine will run on a t2.small EC2 instance, which we can create with

docker-machine create -d amazonec2 --amazonec2-instance-type t2.small elk-machine-name
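
With the machine created, we can point the local Docker client at it so the compose commands later in the post target this instance:

# Point the Docker client at the new machine
eval $(docker-machine env elk-machine-name)
# Note the machine's public IP for the DNS records we'll set up later
docker-machine ip elk-machine-name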

 

We'll start with the ELK side of things and then move on to the Nginx configuration.

I found existing ELK images which made things a lot easier: sebp/elk:622 and sebp/elkx:622. I recommend reading their documentation at http://elk-docker.readthedocs.io/ and https://hub.docker.com/r/sebp/elkx/

We will use the elkx:622 image, which is based on the elk one. It comes with X-Pack, which provides some nice features; one of them is Kibana authentication. You can read more about X-Pack here: https://www.elastic.co/products/x-pack


/compose/production/elk/Dockerfile

FROM sebp/elkx:622

COPY ./compose/production/elk/logstash-beats.crt /etc/pki/tls/certs/logstash-beats.crt
COPY ./compose/production/elk/logstash-beats.key /etc/pki/tls/private/logstash-beats.key
COPY ./compose/production/elk/11-nginx.conf /etc/logstash/conf.d/11-nginx.conf
COPY ./compose/production/elk/nginx.pattern /opt/logstash/patterns/nginx

Our main file. We're going to replace the certificates that come with the image with our own, and we'll also replace the Logstash configuration for Nginx.


/compose/production/elk/logstash-beats.crt and /compose/production/elk/logstash-beats.key

The elkx image comes with some default certificates, but we're going to generate our own. This procedure differs from operating system to operating system; on OS X, I was able to do it with

openssl req -x509 -newkey rsa:4096 -keyout logstash-beats.key -out logstash-beats.crt -nodes

For the domain, you can use something like internal-elk.yourdomain.com; this subdomain will point to the private EC2 IP of the ELK instance.
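
If you'd rather skip openssl's interactive prompts, the subject can be passed directly on the command line (same output files, just non-interactive; the domain is the example one from above):

openssl req -x509 -newkey rsa:4096 -keyout logstash-beats.key -out logstash-beats.crt -nodes -days 3650 -subj "/CN=internal-elk.yourdomain.com"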


/compose/production/elk/11-nginx.conf

filter {
  if "nginx-access-log" in [tags] {
    grok {
      match => { "message" => "%{NGINXACCESS}" }
    }
    geoip {
      source => "remote_addr"
    }
  }
}

We're replacing the original file because it would try to match [type] == "nginx-access". The problem with this is that document_type is deprecated, and the latest Filebeat versions (6.2.2 at least) force all document_type values to "doc", so we would never match a log entry. Setting a tag in the Filebeat configuration (we will see how to do this later on) and looking for it in Logstash is a possible solution. We also want a nice feature, mapping IP addresses to geographical information, which is what the geoip part of the configuration does.


/compose/production/elk/nginx.pattern

NGINXACCESS %{IPORHOST:remote_addr} - %{USERNAME:remote_user} \[%{HTTPDATE:time_local}\] \"%{DATA:request}\" %{INT:status} %{NUMBER:bytes_sent} \"%{DATA:http_referer}\" \"%{DATA:http_user_agent}\"

This is the pattern we will match in Logstash Grok. It will try to match each log line and split it into separate fields. We replaced the pattern because the one in the ELK image would not match the Nginx access_log format. To debug and match different formats, you can try out https://grokdebug.herokuapp.com/
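
As an example, a default-format access log entry like the one below (values made up) would be split into remote_addr, remote_user, time_local, request, status, bytes_sent, http_referer and http_user_agent:

203.0.113.24 - john [01/Jun/2018:10:15:32 +0000] "GET /blog/ HTTP/1.1" 200 3541 "https://example.com/" "Mozilla/5.0"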


elk.yml

version: '2'

volumes:
  elk_data_local: {}

services:
  elk:
    build:
      context: .
      dockerfile: ./compose/production/elk/Dockerfile
    env_file: .env
    ports:
      - "5601:5601"
      - "9200:9200"
      - "5044:5044"
    volumes:
      - elk_data_local:/var/lib/elasticsearch

We'll expose all ports on Docker so we can use them on our internal network, but only 5601 (Kibana) will be exposed to the outside world. We will accomplish this in AWS by adding TCP port 5601 to the inbound rules of the instance's security group.
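
If you prefer the command line over the AWS console, the rule can be added with the AWS CLI; the security group ID below is a placeholder, and you may want a narrower CIDR than 0.0.0.0/0:

aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 --protocol tcp --port 5601 --cidr 0.0.0.0/0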

We're also going to use a volume for Elasticsearch data.


.env

...
INTERNAL_ELK_HOST=internal-elk.yourdomain.com
ELASTIC_BOOTSTRAP_PASSWORD=changeme
ELASTICSEARCH_USER=elastic
ELASTICSEARCH_PASSWORD=changeme
LOGSTASH_USER=elastic
LOGSTASH_PASSWORD=changeme
KIBANA_USER=kibana
KIBANA_PASSWORD=changeme
...

The bootstrap password will let us set up the other passwords and wrap up the initial ELK stack configuration. The documentation at https://hub.docker.com/r/sebp/elkx/ is very detailed regarding this procedure. INTERNAL_ELK_HOST will be the domain name pointing to the private IP address of the EC2 instance. We can also set up a domain pointing to the public IP address of the EC2 instance to access Kibana from anywhere. This is all for our elk.yml Docker machine; make sure it is running before we move on to the Filebeat and Nginx side of things, all of which is based on https://github.com/spujadas/elk-docker/tree/622/nginx-filebeat.
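
Building and starting the stack on the new machine looks roughly like this (assuming your shell is still pointed at the ELK machine):

docker-compose -f elk.yml up -d --build
# Follow the logs until Elasticsearch, Logstash and Kibana report they're up
docker-compose -f elk.yml logs -f elk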

 

/compose/production/nginx/Dockerfile

FROM nginx

ENV FILEBEAT_VERSION 6.2.2

RUN apt-get update -qq \
 && apt-get install -qqy curl \
 && apt-get clean

RUN curl -L -O https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-${FILEBEAT_VERSION}-amd64.deb \
 && dpkg -i filebeat-${FILEBEAT_VERSION}-amd64.deb \
 && rm filebeat-${FILEBEAT_VERSION}-amd64.deb

RUN rm /var/log/nginx/access.log /var/log/nginx/error.log

### configure Filebeat
# config file
COPY ./compose/production/nginx/filebeat.yml /etc/filebeat/filebeat.yml
RUN chmod 644 /etc/filebeat/filebeat.yml

# CA cert
RUN mkdir -p /etc/pki/tls/certs

COPY ./compose/production/elk/logstash-beats.crt /etc/pki/tls/certs/logstash-beats.crt

COPY ./compose/production/nginx/sites-enabled/portfolio.conf /etc/nginx/conf.d/default.conf

COPY ./compose/production/nginx/start.sh /usr/local/bin/start.sh
RUN chmod +x /usr/local/bin/start.sh

CMD [ "/usr/local/bin/start.sh" ]

Our first big change here is switching to the official Nginx image; the one we were using wasn't being maintained. We install Filebeat, remove the Nginx log symlinks, and copy over the configuration files and the certificate we created.
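
Rebuilding and redeploying the Nginx service is then a matter of (a sketch, assuming the production.yml and service name from the previous posts):

docker-compose -f production.yml build nginx
docker-compose -f production.yml up -d nginx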

 

/compose/production/nginx/filebeat.yml

output:
  logstash:
    enabled: true
    hosts:
      - ${INTERNAL_ELK_HOST:elk}:5044
    timeout: 15
    ssl:
      certificate_authorities:
      - /etc/pki/tls/certs/logstash-beats.crt

filebeat:
  prospectors:
    - type: log
      paths:
        - /var/log/nginx/*.log
      document_type: nginx-access
      tags: ["nginx-access-log", "web"]

This is where we tag our Nginx logs so Logstash can parse them. We also configure where the logs are sent and which certificate to trust for the TLS connection.
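
Filebeat can sanity-check this file and the connection to Logstash before we rely on it; both subcommands are available in Filebeat 6.x and can be run inside the container:

filebeat test config -c /etc/filebeat/filebeat.yml
filebeat test output -c /etc/filebeat/filebeat.yml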


/compose/production/nginx/sites-enabled/portfolio.conf

...
access_log /var/log/nginx/access.log;
...

We've updated the configuration so that Nginx outputs the logs to a file instead of /dev/stdout.


/compose/production/nginx/start.sh

#!/bin/bash

echo "Starting Filebeat"
/etc/init.d/filebeat start
echo "Starting Nginx"
nginx
# tail keeps the container running and streams both logs to stdout
tail -f /var/log/nginx/access.log /var/log/nginx/error.log

The original start.sh had the Filebeat template loading; we will remove that from the file and do it manually, based on https://www.elastic.co/guide/en/beats/filebeat/6.2/filebeat-template.html. We'll load it via our production.yml Docker machine, not elk.yml.

 

# Grab a shell inside the Nginx container
docker-compose -f production.yml exec nginx bash
# Export the Filebeat template and load it into Elasticsearch
filebeat export template > /tmp/filebeat.template.json
curl -XPUT -H 'Content-Type: application/json' -u elastic:changeme http://internal-elk.yourdomain.com:9200/_template/filebeat-6.2.2 -d@/tmp/filebeat.template.json


By default, the kibana user won't have access to some indices; I recommend logging in with the elastic user, or using it to create a new user with whatever access you need. The first thing to do after logging in is to check out the Monitoring section. You'll probably see the health of some indices marked as yellow because there aren't enough nodes for the number of allocated replica shards. Since we are using only one node, we are going to reduce the number of replicas to 0.
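
To see which indices are affected, you can list them first; the _cat API prints each index's health in the first column:

# Still inside the Nginx container's bash
curl -u elastic:changeme 'http://internal-elk.yourdomain.com:9200/_cat/indices?v'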

 

# Still inside the Nginx container's bash
curl -X PUT -H 'Content-Type: application/json' -u elastic:changeme internal-elk.yourdomain.com:9200/<INDEX_NAME>/_settings -d '{"number_of_replicas": 0}'

We'll also want the number of replicas to be 0 when a new Filebeat index gets created, so we don't run into this issue in the future. Note that a PUT to an existing template name replaces it entirely, mappings included, so instead of overwriting the filebeat-6.2.2 template we just loaded, we'll add a second template with a higher order that only carries the index settings; Elasticsearch merges every template whose pattern matches a new index.

 

# Still inside the Nginx container's bash
curl -X PUT -H 'Content-Type: application/json' -u elastic:changeme http://internal-elk.yourdomain.com:9200/_template/filebeat-replicas -d '{"index_patterns": ["filebeat-*"], "order": 1, "settings": {"number_of_shards": 1, "number_of_replicas": 0}}'

The default number of replicas can also be baked into the initial Filebeat template before loading it; note that recent Elasticsearch versions no longer allow index-level settings such as index.number_of_replicas in elasticsearch.yml.
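
For instance, if jq is available (it isn't in the Nginx image by default), you can add the setting to the exported template and upload it again; this assumes the export nests its settings under settings.index, so double-check the JSON first:

jq '.settings.index.number_of_replicas = 0' /tmp/filebeat.template.json > /tmp/filebeat.template.0r.json
curl -XPUT -H 'Content-Type: application/json' -u elastic:changeme http://internal-elk.yourdomain.com:9200/_template/filebeat-6.2.2 -d@/tmp/filebeat.template.0r.json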

There are more things to do and improve on this image, but it is a good first step. This is all for now!