wcld

wc -l (daemon)

Wcld is a process that will listen on TCP $PORT for incoming log data. Wcld will parse the crnl separated data looking for key=value substrings. When a key=value substring is found, wcld will write the keys and values to an hstore column in a PostgreSQL database.

Usage

Durability

Wcld can be configured for maximum throughput by relaxing it's durability constraints. There is a buffer for the database writing mechanism that can be configured. A buffer size of 1 will force wcld to commit each log line that is consumed. A buffer of size 1000 will allow wcld to write 1000 log lines to the database before commiting the transaction. Of course if the program crashes before the transaction is commited then the data will be lost.

The durability can be configured by using the -d flag. The default is 1.

$ bin/wcld -d=1000

Queries

Once your applications are draining their logs into a wcld process, you can begin reporting our your log data.

On a typical web process, the Heroku router will emit the following log message:

2012-02-16T06:06:16+00:00 heroku[router]: PUT shushu.herokuapp.com/resources/328408/billable_events/41143162 dyno=web.3 queue=0 wait=0ms service=89ms status=201 bytes=235

Notice how the log message contains the service time. This represents the time it took our web process to respond to the request. We can quickly group our app's average response time grouped by hour:

$ heroku pg:psql

Avg

SELECT
  date_trunc('hour', time) AS time_group,
  avg((data -> 'service')::interval)
FROM
  log_data
WHERE
  data ? 'service'
  GROUP BY time_group
  ORDER BY time_group
;

       time_group       |       avg
------------------------+-----------------
 2012-02-13 20:00:00+00 | 00:00:00.074848
 2012-02-13 21:00:00+00 | 00:00:00.076898
 2012-02-13 22:00:00+00 | 00:00:00.073627
 2012-02-13 23:00:00+00 | 00:00:00.075232
 2012-02-14 00:00:00+00 | 00:00:00.075852
 2012-02-14 01:00:00+00 | 00:00:00.073475
 2012-02-14 02:00:00+00 | 00:00:00.072609
 2012-02-14 03:00:00+00 | 00:00:00.073081

Percentile

SELECT
  perctile,
  avg(elapsed_time::interval)
FROM (
  SELECT
    data -> 'elapsed_time' as elapsed_time,
    ntile(100) over (order by (data -> 'elapsed_time')) as perctile
  FROM
    log_data
  WHERE
    data -> 'action' = 'find_prev_rec'
    and
    time > now() - '9 minutes'::interval
    and
    expired = false
) x
WHERE
  perctile = 95
GROUP BY perctile
;

 perctile |       avg
----------+-----------------
       95 | 00:00:00.008944
(1 row)

Indexing

One possible indexing strategy:

ALTER TABLE log_data ADD COLUMN expired boolean default false;
UPDATE log_data SET expired = 't' where time <= now() - '3 days'::interval;
CREATE INDEX recent_events on log_data (time) where expired = false;
-- use crom to REINDEX each day ??

Deploy to Heroku

Create app with Go buildpack
Attach database to app
Attach route to app
Point emitter app's at new wcld app

Create App

$ git clone git://github.com/ryandotsmith/wcld.git
$ cd wcld
$ heroku create -s cedar --buildpack=https://github.com/kr/heroku-buildpack-go
$ echo "wcld/wcld" >.godir
$ echo "wcld: bin/wcld -f=\"kv\"" > Procfile
$ git add . ; git commit -am "init"
$ git push heroku master

Attach Database

$ heroku addons:add heroku-postgresql:ika
$ heroku pg:wait
$ heroku pg:promote HEROKU_POSTGRESQL_<COLOR>
$ heroku pg:psql
psql- create extension hstore;
psql- create table events (time timestamptz, data hstore);
psql- create index index_events_by_time on events (time);

Attach Route

$ heroku routes:create
$ heroku routes:attach tcp://... wcld

Start WCLD Process

$ heroku scale wcld=2 #can use multiple processes

Use it to drain an emitter app:

$ heroku drains:add syslog://... -a other-app

Build

$ cd $GOROOT
$ hg update weekly
$ cd src; ./all.bash
$ cd $GOPATH/src
$ git clone git://github.com/ryandotsmith/wcld.git
$ cd wcld
$ go build .

Test

$ cd $GOPATH/src/wcld
$ go test .

Wcld

Install / Use

README