Log Everything with Scribe


Why Logging?

The most important part of running a web service like EZTABLE is keeping it available 24/7. However, things always go wrong in unexpected ways. A monitoring system shortens the response time when the service behaves abnormally. A logging system lets DevOps and software developers spot problems before they happen.

If you are going to run a web service, I suggest logging everything from day 1.

What is a Good Logging System?

When choosing a logging system, I would always consider the following features:

  • Capability of logging from different programming languages and sources.
  • Integration with storage components like S3 and HDFS.
  • Support for server farms and fault recovery.

Currently, the best open source solutions are Apache Flume, fluentd, and Scribe.

These solutions support at least one popular RPC library such as Thrift, or integrate well with language-specific logging frameworks such as Log4j or Monolog. All of them can store logs in a variety of storage systems and run in a distributed architecture.

Scribe, Open Sourced by Facebook

Facebook open sourced Scribe in 2008. As the following figure shows, the architecture is a simple tree model.

          'client'                    'central'
----------------------------     --------------------
| Port 1464                 |    | Port 1463         |
|        ----------------   |    | ----------------  |
|     -> | scribe server |--|--->| | scribe server | |
|        ----------------   |    | ----------------  |
|                |          |    |    |         |    |
|            temp file      |    |    |    temp file |
|---------------------------     |-------------------
                                      |
                                   -------------------
                                   | /tmp/scribetest/ |
                                   -------------------

A Scribe daemon runs on each server. Applications use Thrift to communicate with the Scribe server on localhost. The local Scribe daemon buffers the logs and forwards them to the upstream Scribe server. Finally, the central Scribe server appends the logs to the filesystem.
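To make the buffer-and-forward flow concrete, here is a minimal sketch in plain Python. It is a toy model, not Scribe's implementation: the class names CentralStore and LocalBuffer are hypothetical, the Thrift transport is replaced by a direct method call, and the temp file from the figure is modeled as an in-memory queue.

```python
from collections import deque


class CentralStore:
    """Hypothetical stand-in for the central server: appends messages per category."""

    def __init__(self):
        self.available = True
        self.files = {}  # category -> list of appended messages

    def log(self, entries):
        if not self.available:
            raise ConnectionError("central store unreachable")
        for category, message in entries:
            self.files.setdefault(category, []).append(message)


class LocalBuffer:
    """Hypothetical stand-in for the per-host daemon: queue locally, forward upstream."""

    def __init__(self, upstream):
        self.upstream = upstream
        self.pending = deque()  # plays the role of the temp file in the figure

    def log(self, category, message):
        self.pending.append((category, message))
        self.flush()

    def flush(self):
        """Try to forward everything that is queued; return True on success."""
        if not self.pending:
            return True
        batch = list(self.pending)
        try:
            self.upstream.log(batch)
        except ConnectionError:
            return False  # keep everything queued and retry on the next flush
        self.pending.clear()
        return True


central = CentralStore()
local = LocalBuffer(central)

local.log("web", "request served")
central.available = False           # simulate an outage of the central server
local.log("web", "request failed")  # stays in the local buffer
central.available = True
local.flush()                       # the buffered entry is delivered after recovery
```

The point of the design is that the application only ever talks to the daemon on localhost, so an outage upstream delays delivery instead of losing logs or blocking the application.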

A detailed installation guide and examples are on GitHub. Basically, you can get Scribe into production with the following steps:

1. Install Thrift.
2. Install FB303 from Thrift's contrib folder.
3. Install Scribe.
4. Write the configuration files for the buffers and the central server. Start the Scribe daemon.
5. Generate the source code for your language from the scribe/if folder.

Log Format

I highly recommend storing logs in JSON format. Virtually every language can parse it.
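As a rough illustration, one log entry can be written as a single JSON object per line. The sketch below uses only Python's standard library; the helper names and the field names (timestamp, event, user_id, restaurant) are made up for this example and are not part of Scribe.

```python
import json
from datetime import datetime, timezone


def format_log_line(event, **fields):
    """Serialize one log entry as a single-line JSON string."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "event": event,
    }
    record.update(fields)  # extra fields are hypothetical, chosen by the caller
    return json.dumps(record, sort_keys=True)


def parse_log_line(line):
    """Turn a JSON log line back into a dictionary."""
    return json.loads(line)


line = format_log_line("reservation_created", user_id=42, restaurant="example")
entry = parse_log_line(line)
```

Keeping each entry on one line means the files written by the central server can be split, filtered, and parsed line by line with whatever tool or language is at hand.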

Gotchas

I had a hard time compiling Scribe after I upgraded the libboost package. I found this blog post, which solved my problem. Basically, just add the following flags when running ./configure:

$ ./configure CPPFLAGS="-DHAVE_INTTYPES_H -DHAVE_NETINET_IN_H -DBOOST_FILESYSTEM_VERSION=2"