Dragnet
event stream analysis
Dragnet is a tool for analyzing event stream data stored in files. There are three main commands:
- scan: scan over raw data to execute a query
- build: scan over raw data to produce an index for quickly answering predefined queries
- query: search indexes to execute a query
The prototypical use case is analyzing request logs from a production service. The workflow for Dragnet looks like this:
- Predefine a bunch of metrics you care about (like total request count, request count by server instance, request type, and so on).
- When you accumulate new logs (e.g., hourly or daily), you build the index.
- Whenever you want the values of those metrics, you query the index. This might be part of a constantly-updating dashboard, a daily report, or a threshold-based alarm.
- If you want to gather new metrics, you can define them and rebuild.
- If you want to run a complex query just once, you can scan the raw data rather than adding the query as a metric.
This project is still a prototype. The commands and library interfaces may change incompatibly at any time!
Getting started
Dragnet currently supports only newline-separated JSON data. Try it on the sample data in ./tests/data. Start by defining a new datasource:
$ dn datasource-add my_logs --path=$PWD/tests/data
$ dn datasource-list -v
DATASOURCE LOCATION
my_logs file://home/dap/dragnet/dragnet/tests/data
dataFormat: "json"
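For reference, "newline-separated JSON" means each line of a data file is one complete JSON object describing a single event. Here's a minimal Python sketch; the field names mirror those used in the examples below, but the exact schema of the sample data is an assumption:

```python
import json

# Two example log lines: each line is a standalone JSON object.
lines = [
    '{"req": {"method": "GET"}, "res": {"statusCode": 200}, "latency": 17}',
    '{"req": {"method": "PUT"}, "res": {"statusCode": 204}, "latency": 3}',
]

# Parsing the stream is just parsing each line independently.
records = [json.loads(line) for line in lines]
assert records[0]["req"]["method"] == "GET"
```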
Now you can scan the data to count the total number of requests:
$ dn scan my_logs
VALUE
2252
You can also break out counts, e.g., by request method:
$ dn scan -b req.method my_logs
REQ.METHOD VALUE
DELETE 582
GET 556
HEAD 551
PUT 563
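Conceptually, a breakdown is a group-by count over a dotted field path. Here's a rough Python equivalent of that idea (a sketch, not Dragnet's implementation):

```python
from collections import Counter
from functools import reduce

def field(record, path):
    # Resolve a dotted path like "req.method" against a nested record.
    return reduce(lambda obj, key: obj[key], path.split("."), record)

records = [
    {"req": {"method": "GET"}},
    {"req": {"method": "GET"}},
    {"req": {"method": "PUT"}},
]

# Count records grouped by the value of the breakdown field.
counts = Counter(field(r, "req.method") for r in records)
assert counts["GET"] == 2
```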
You can break out results by more than one field:
$ dn scan -b req.method,res.statusCode my_logs
REQ.METHOD RES.STATUSCODE VALUE
DELETE 200 75
DELETE 204 87
DELETE 400 94
DELETE 404 85
DELETE 499 83
DELETE 500 79
DELETE 503 79
GET 200 77
GET 204 83
GET 400 84
GET 404 74
GET 499 79
GET 500 73
GET 503 86
HEAD 200 71
HEAD 204 85
HEAD 400 66
HEAD 404 77
HEAD 499 88
HEAD 500 88
HEAD 503 76
PUT 200 80
PUT 204 79
PUT 400 83
PUT 404 88
PUT 499 68
PUT 500 83
PUT 503 82
(This is randomly-generated data, which is why you see some combinations that probably don't make sense, like a 200 from a DELETE.)
You can specify multiple fields separated by commas, as above, or by using "-b" more than once. This example produces the same output as the previous one:
$ dn scan -b req.method -b res.statusCode my_logs
REQ.METHOD RES.STATUSCODE VALUE
DELETE 200 75
DELETE 204 87
DELETE 400 94
DELETE 404 85
DELETE 499 83
DELETE 500 79
DELETE 503 79
GET 200 77
GET 204 83
GET 400 84
GET 404 74
GET 499 79
GET 500 73
GET 503 86
HEAD 200 71
HEAD 204 85
HEAD 400 66
HEAD 404 77
HEAD 499 88
HEAD 500 88
HEAD 503 76
PUT 200 80
PUT 204 79
PUT 400 83
PUT 404 88
PUT 499 68
PUT 500 83
PUT 503 82
The order of breakdowns matters. If we reverse them, we get different output:
$ dn scan -b res.statusCode,req.method my_logs
RES.STATUSCODE REQ.METHOD VALUE
200 DELETE 75
200 GET 77
200 HEAD 71
200 PUT 80
204 DELETE 87
204 GET 83
204 HEAD 85
204 PUT 79
400 DELETE 94
400 GET 84
400 HEAD 66
400 PUT 83
404 DELETE 85
404 GET 74
404 HEAD 77
404 PUT 88
499 DELETE 83
499 GET 79
499 HEAD 88
499 PUT 68
500 DELETE 79
500 GET 73
500 HEAD 88
500 PUT 83
503 DELETE 79
503 GET 86
503 HEAD 76
503 PUT 82
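The counts themselves are the same either way; the breakdown order determines how rows are grouped and sorted. A sketch of that behavior (hypothetical helper, not Dragnet code):

```python
from collections import Counter

records = [
    {"method": "GET", "code": 200},
    {"method": "PUT", "code": 200},
    {"method": "GET", "code": 404},
]

def breakdown(records, fields):
    # Key each record by the breakdown fields, in order,
    # then sort the result rows by that composite key.
    counts = Counter(tuple(r[f] for f in fields) for r in records)
    return sorted(counts.items())

# Same counts, different row ordering depending on field order.
assert breakdown(records, ["method", "code"])[0] == (("GET", 200), 1)
assert breakdown(records, ["code", "method"])[0] == ((200, "GET"), 1)
```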
Filters
You can filter records using node-krill filter syntax:
$ dn scan -f '{ "eq": [ "req.method", "GET" ] }' my_logs
VALUE
556
You can combine filters with breakdowns, of course:
$ dn scan -f '{ "eq": [ "req.method", "GET" ] }' -b operation my_logs
OPERATION VALUE
getjoberrors 181
getpublicstorage 176
getstorage 199
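A node-krill filter is a JSON object whose top-level key names a predicate, like "eq". The real library supports more predicates (e.g., "and", "or", "le"); here's a simplified Python evaluator for just the "eq" case, to show the semantics:

```python
import json
from functools import reduce

def field(record, path):
    # Resolve a dotted path like "req.method" against a nested record.
    return reduce(lambda obj, key: obj[key], path.split("."), record)

def matches(predicate, record):
    # Only the "eq" predicate is handled: args are [field_path, expected_value].
    (op, args), = predicate.items()
    if op == "eq":
        path, expected = args
        return field(record, path) == expected
    raise NotImplementedError(op)

pred = json.loads('{ "eq": [ "req.method", "GET" ] }')
assert matches(pred, {"req": {"method": "GET"}})
assert not matches(pred, {"req": {"method": "PUT"}})
```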
Numeric breakdowns
To break down by numeric quantities, it's usually best to aggregate nearby values into buckets. Here's a histogram of the "latency" field from this log:
$ dn scan -b latency[aggr=quantize] my_logs
value ------------- Distribution ------------- count
0 | 0
1 |@@ 113
2 |@@@@@@@@ 449
4 |@@@@@@ 348
8 | 0
16 |@@@@@@@@@@@@ 682
32 | 0
64 |@ 57
128 |@@@ 165
256 | 0
512 | 0
1024 |@@ 136
2048 |@@@@@ 302
4096 | 0
"aggr=quantize" specifies a power-of-two bucketization. You can also do a linear quantization, say with steps of size 200:
$ dn scan -b latency[aggr=lquantize,step=200] my_logs
value ------------- Distribution ------------- count
0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 1814
200 | 0
400 | 0
600 | 0
800 | 0
1000 | 23
1200 |@ 31
1400 |@ 35
1600 | 18
1800 | 24
2000 |@ 34
2200 |@ 35
2400 | 28
2600 |@ 33
2800 | 18
3000 |@ 34
3200 | 27
3400 |@ 34
3600 | 26
3800 | 25
4000 | 13
4200 | 0
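Both bucketing schemes are simple to state: "quantize" assigns a value to the largest power-of-two bucket at or below it, while "lquantize" with step N rounds down to a multiple of N. A sketch of the bucketing rules for non-negative integers (Dragnet's exact edge-case handling, e.g. for negative values, is an assumption here):

```python
def quantize_bucket(value):
    # Power-of-two bucket: 0, 1, 2, 4, 8, ...
    # (the largest power of two that is <= value)
    if value < 1:
        return 0
    return 1 << (value.bit_length() - 1)

def lquantize_bucket(value, step):
    # Linear bucket: round down to the nearest multiple of "step".
    return (value // step) * step

# A latency of 17 lands in the "16" row of the quantize histogram,
# and in the "0" row of an lquantize histogram with step=200.
assert quantize_bucket(17) == 16
assert lquantize_bucket(17, 200) == 0
assert lquantize_bucket(1234, 200) == 1200
```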
These are modeled after DTrace's aggregating actions. You can combine these with filters and other breakdowns:
$ dn scan -f '{ "eq": [ "req.method", "GET" ] }' \
-b req.method,operation,latency[aggr=quantize] my_logs
GET, getjoberrors
value ------------- Distribution ------------- count
0 | 0
1 |@@ 9
2 |@@@@@@@ 32
4 |@@@@@ 24
8 | 0
16 |@@@@@@@@@@@@@@ 63
32 | 0
64 |@ 5
128 |@@@ 13
256 | 0
512 | 0
1024 |@@@
