Public Repository

Last pushed: a year ago
Short Description
Fluentd with Elasticsearch and Kafka plugins
Full Description

FROM ubuntu:latest

# Taken from Jimmi Dyson's Dockerfile

MAINTAINER Bill Jorgensen jorgensen.bill@gmail.com

# Ensure there are enough file descriptors for running Fluentd.
RUN ulimit -n 65536

# Disable prompts from apt.
ENV DEBIAN_FRONTEND noninteractive

# Install prerequisites.
RUN apt-get update && \
    apt-get install -y -q curl make g++ git && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*

# Install Fluentd (td-agent 2).
RUN /usr/bin/curl -L https://td-toolbelt.herokuapp.com/sh/install-ubuntu-trusty-td-agent2.sh | sh

# Change the default user and group to root.
# Needed to allow access to /var/log/docker/... files.
RUN sed -i -e "s/USER=td-agent/USER=root/" -e "s/GROUP=td-agent/GROUP=root/" /etc/init.d/td-agent

# Install the Elasticsearch and Kubernetes metadata Fluentd plug-ins.
RUN td-agent-gem install fluent-plugin-kubernetes_metadata_filter fluent-plugin-elasticsearch

# Clone the Fluentd Kafka plugin source from GitHub (for reference; the gem
# below is what actually gets installed).
RUN git clone https://github.com/htgc/fluent-plugin-kafka.git

# Install the Kafka plugin.
RUN td-agent-gem install fluent-plugin-kafka

# Copy the Fluentd configuration file.
COPY td-agent.conf /etc/td-agent/td-agent.conf

# Run the Fluentd service.
ENTRYPOINT ["td-agent"]
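
For reference, a build might look like this, a sketch assuming td-agent.conf sits next to the Dockerfile (required by the COPY step); the tag fluentd-es-kafka is a placeholder, not the published image name:

$ docker build -t fluentd-es-kafka .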

Owner
billjorgensen

Comments (1)
billjorgensen
a year ago

This configuration file for Fluentd / td-agent is used to watch changes to Docker log files. The kubelet creates symlinks that capture the pod name, namespace, container name & Docker container ID to the docker logs for pods in the /var/log/containers directory on the host. If running this fluentd configuration in a Docker container, the /var/log directory should be mounted in the container.
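
For example, the container might be started like this (a sketch: the image tag fluentd-es-kafka is a placeholder, and the read-only mount of /var/lib/docker/containers is an assumption so that the kubelet's symlinks can be resolved):

$ docker run -d --name fluentd \
    -v /var/log:/var/log \
    -v /var/lib/docker/containers:/var/lib/docker/containers:ro \
    fluentd-es-kafka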

These logs are then submitted to Elasticsearch, which assumes the installation of the fluent-plugin-elasticsearch & the fluent-plugin-kubernetes_metadata_filter plugins. See https://github.com/uken/fluent-plugin-elasticsearch & https://github.com/fabric8io/fluent-plugin-kubernetes_metadata_filter for more information about the plugins.

Maintainer: Jimmi Dyson jimmidyson@gmail.com

Example
=======

A line in the Docker log file might look like this JSON:

{"log":"2014/09/25 21:15:03 Got request with path wombat\n",
 "stream":"stderr",
 "time":"2014-09-25T21:15:03.499185026Z"}

The time_format specification below makes sure we properly parse the time format produced by Docker. This will be submitted to Elasticsearch and should appear like:

$ curl 'http://elasticsearch-logging:9200/_search?pretty'
...
{
  "_index" : "logstash-2014.09.25",
  "_type" : "fluentd",
  "_id" : "VBrbor2QTuGpsQyTCdfzqA",
  "_score" : 1.0,
  "_source":{"log":"2014/09/25 22:45:50 Got request with path wombat\n",
             "stream":"stderr","tag":"docker.container.all",
             "@timestamp":"2014-09-25T22:45:50+00:00"}
},
...

The Kubernetes fluentd plugin is used to write the Kubernetes metadata to the log record & add labels to the log record if properly configured. This enables users to filter & search logs on any metadata. For example, a Docker container's logs might be in the directory:

  /var/lib/docker/containers/997599971ee6366d4a5920d25b79286ad45ff37a74494f262e3bc98d909d0a7b

and in the file:

  997599971ee6366d4a5920d25b79286ad45ff37a74494f262e3bc98d909d0a7b-json.log

where 997599971ee6... is the Docker ID of the running container. The Kubernetes kubelet makes a symbolic link to this file on the host machine in the /var/log/containers directory which includes the pod name and the Kubernetes container name:

  synthetic-logger-0.25lps-pod_default_synth-lgr-997599971ee6366d4a5920d25b79286ad45ff37a74494f262e3bc98d909d0a7b.log
  -> /var/lib/docker/containers/997599971ee6366d4a5920d25b79286ad45ff37a74494f262e3bc98d909d0a7b/997599971ee6366d4a5920d25b79286ad45ff37a74494f262e3bc98d909d0a7b-json.log

The /var/log directory on the host is mapped to the /var/log directory in the container running this instance of Fluentd, so we end up collecting the file:

  /var/log/containers/synthetic-logger-0.25lps-pod_default_synth-lgr-997599971ee6366d4a5920d25b79286ad45ff37a74494f262e3bc98d909d0a7b.log

This results in the tag:

  var.log.containers.synthetic-logger-0.25lps-pod_default_synth-lgr-997599971ee6366d4a5920d25b79286ad45ff37a74494f262e3bc98d909d0a7b.log

The Kubernetes fluentd plugin is used to extract the namespace, pod name & container name, which are added to the log message as a kubernetes field object; the Docker container ID is also added under the docker field object. The final tag is:

  kubernetes.var.log.containers.synthetic-logger-0.25lps-pod_default_synth-lgr-997599971ee6366d4a5920d25b79286ad45ff37a74494f262e3bc98d909d0a7b.log

And the final log record looks like:

{
  "log":"2014/09/25 21:15:03 Got request with path wombat\n",
  "stream":"stderr",
  "time":"2014-09-25T21:15:03.499185026Z",
  "kubernetes": {
    "namespace": "default",
    "pod_name": "synthetic-logger-0.25lps-pod",
    "container_name": "synth-lgr"
  },
  "docker": {
    "container_id": "997599971ee6366d4a5920d25b79286ad45ff37a74494f262e3bc98d909d0a7b"
  }
}

This makes it easier for users to search for logs by pod name or by the name of the Kubernetes container regardless of how many times the Kubernetes pod has been restarted (resulting in several Docker container IDs).
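
As an illustration (not from the original docs; adjust the host and field names to your deployment, and note that newer Elasticsearch versions also want a Content-Type: application/json header), searching by pod name could look like:

$ curl 'http://elasticsearch-logging:9200/_search?pretty' \
    -d '{"query": {"match": {"kubernetes.pod_name": "synthetic-logger-0.25lps-pod"}}}'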

TODO: Propagate the labels associated with a container along with its logs so users can query logs using labels as well as, or instead of, the pod name and container name. This is simply done via configuration of the Kubernetes fluentd plugin but requires secrets to be enabled in the fluentd pod. This is a problem yet to be solved, as secrets are not usable in static pods, which the fluentd pod must be until a per-node controller is available in Kubernetes.

<source>
type tail
path /var/log/containers/*.log
pos_file /var/log/es-containers.log.pos
time_format %Y-%m-%dT%H:%M:%S
tag kubernetes.*
format json
read_from_head true
</source>

<source>
type tail
#format none
format json
path /var/log/salt/minion
pos_file /var/log/gcp-salt.pos
tag salt
</source>

<source>
type tail
#format none
format json
path /var/log/startupscript.log
pos_file /var/log/es-startupscript.log.pos
tag startupscript
</source>

<source>
type tail
#format none
format json
path /var/log/docker.log
pos_file /var/log/es-docker.log.pos
tag docker
</source>

<source>
type tail
#format none
format json
path /var/log/etcd.log
pos_file /var/log/es-etcd.log.pos
tag etcd
</source>

<source>
type tail
#format none
format json
path /var/log/kubelet.log
pos_file /var/log/es-kubelet.log.pos
tag kubelet
</source>

<source>
type tail
#format none
format json
path /var/log/kube-apiserver.log
pos_file /var/log/es-kube-apiserver.log.pos
tag kube-apiserver
</source>

<source>
type tail
#format none
format json
path /var/log/kube-controller-manager.log
pos_file /var/log/es-kube-controller-manager.log.pos
tag kube-controller-manager
</source>

<source>
type tail
#format none
format json
path /var/log/kube-scheduler.log
pos_file /var/log/es-kube-scheduler.log.pos
tag kube-scheduler
</source>

<filter kubernetes.**>
type kubernetes_metadata
</filter>

<match **>
type elasticsearch
log_level info
include_tag_key true
host elasticsearch-logging.kube-system.svc.cluster.local
port 9200
logstash_format true
# Set the chunk limit the same as for fluentd-gcp.
#buffer_chunk_limit 512K
buffer_chunk_limit 256K
# Cap buffer memory usage to 256KB/chunk * 256 chunks = 64 MB
#buffer_queue_limit 128
buffer_queue_limit 256
flush_interval 5s
# Never wait longer than 5 minutes between retries.
max_retry_wait 300
# Disable the limit on the number of retries (retry forever).
disable_retry_limit
</match>

<match *.**>
@type kafka
# NOTE: the <match **> block above already consumes every event, so as written
# this block is never reached; to feed both Elasticsearch and Kafka, the two
# outputs would need to be combined under a single match with @type copy.
# Brokers: you can choose either brokers or zookeeper.
brokers $${KAFKA_SERVICE_HOST}:$${KAFKA_SERVICE_PORT} # Set brokers directly
#zookeeper $${ZOOKEEPER_SERVICE_HOST}:$${ZOOKEEPER_SERVICE_PORT} # Set brokers via Zookeeper

default_topic kubernetes
# The remaining options are shown with their types and defaults as documented
# in the fluent-plugin-kafka README; uncomment and set concrete values to use them.
#default_partition_key (string) :default => nil
#output_data_type (json|ltsv|msgpack|attr:<record name>|<formatter name>) :default => json
#output_include_tag (true|false) :default => false
#output_include_time (true|false) :default => false
#max_send_retries (integer) :default => 5
#required_acks (integer) :default => 0
#ack_timeout_ms (integer) :default => 1500
#compression_codec (none|gzip|snappy) :default => none
</match>