Vnlog

Process labelled tabular ASCII data using normal UNIX tools

* Talk

I just gave a talk about this at [[https://www.socallinuxexpo.org/scale/17x][SCaLE 17x]]. Here are the [[https://www.youtube.com/watch?v=Qvb_uNkFGNQ&t=12830s][video of the talk]] and the [[https://github.com/dkogan/talk-feedgnuplot-vnlog/blob/master/feedgnuplot-vnlog.org]["slides"]].

* Summary

Vnlog ("vanilla-log") is a toolkit for manipulating tabular ASCII data with labelled fields using normal UNIX tools. If you regularly use =awk=, =sort=, =uniq= and similar tools, vnlog will make you considerably more productive. The vnlog tools /extend/ the standard tooling rather than replace it, so they take minimal effort to learn and use.

Everything assumes a trivially simple log format:

- A whitespace-separated table of ASCII human-readable text
- A =#= character starts a comment that runs to the end of the line (like in many scripting languages)
- The first line that begins with a single =#= (not =##= or =#!=) is a /legend/, naming each column. This is required, and the field names that appear here are referenced by all the tools.
- Empty fields are reported as =-=

This describes 99% of the format, with some extra details [[#format-details][below]]. Example:

#+BEGIN_EXAMPLE
#!/usr/bin/whatever
# a b c
1 2 3
## comment
4 5 6
#+END_EXAMPLE

Such data can be processed directly with almost any existing tool, and /this/ toolkit allows the user to manipulate this data in a nicer way by relying on standard UNIX tools. The core philosophy is to avoid creating new knowledge as much as possible. Consequently, the vnlog toolkit relies /heavily/ on existing (and familiar!) tools and workflows. As such, the toolkit is small, light, and has a /very/ friendly learning curve.
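To make the format rules concrete, here is a minimal sketch that picks one column by name using nothing but =awk=. The =vnl_pick= function is a hypothetical helper written for this illustration; it is not part of the vnlog toolkit and handles only the rules listed above.

```shell
# Hypothetical helper (not part of vnlog): print one named column of a
# vnlog stream, applying the format rules with plain awk.
vnl_pick() {
    awk -v want="$1" '
        /^#[#!]/ { next }                 # "##" comments and "#!" lines
        !legend && /^#/ {                 # first single-# line: the legend
            legend = 1
            sub(/^#[ \t]*/, "")
            for (i = 1; i <= NF; i++) if ($i == want) col = i
            if (!col) { print "no such column: " want > "/dev/stderr"; exit 1 }
            print "# " want
            next
        }
        { sub(/[ \t]*#.*/, "") }          # drop comments running to end of line
        legend && NF { print $col }       # skip blank lines, print the field
    '
}

vnl_pick b <<'EOF'
#!/usr/bin/whatever
# a b c
1 2 3
## comment
4 5 6
EOF
# prints:
# # b
# 2
# 5
```

The output is itself a valid vnlog, so it can be piped into further tools.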

* Synopsis

I have [[https://raw.githubusercontent.com/dkogan/vnlog/master/dji-tsla.tar.gz][two sets of historical stock data]], from the start of 2018 until now (2018/11):

#+BEGIN_SRC sh :results output :exports both
< dji.vnl head -n 4
#+END_SRC

#+RESULTS:
: # Date Open High Low Close AdjClose Volume
: 2018-11-15 25061.48 25354.56 24787.79 25289.27 25289.27 383292840
: 2018-11-14 25388.08 25501.29 24935.82 25080.50 25080.50 384240000
: 2018-11-13 25321.21 25511.03 25193.78 25286.49 25286.49 339690000

And

#+BEGIN_SRC sh :results output :exports both
< tsla.vnl head -n 4
#+END_SRC

#+RESULTS:
: # Date Open High Low Close AdjClose Volume
: 2018-11-15 342.33 348.58 339.04 348.44 348.44 4486339
: 2018-11-14 342.70 347.11 337.15 344.00 344.00 5036300
: 2018-11-13 333.16 344.70 332.20 338.73 338.73 5448600

I can add whitespace to make the headers more legible by humans:

#+BEGIN_SRC sh :results output :exports both
< dji.vnl head -n 4 | vnl-align
#+END_SRC

#+RESULTS:
: # Date     Open     High     Low      Close    AdjClose Volume
: 2018-11-15 25061.48 25354.56 24787.79 25289.27 25289.27 383292840
: 2018-11-14 25388.08 25501.29 24935.82 25080.50 25080.50 384240000
: 2018-11-13 25321.21 25511.03 25193.78 25286.49 25286.49 339690000

I can pull out the closing prices:

#+BEGIN_SRC sh :results output :exports both
< dji.vnl vnl-filter -p Close | head -n4
#+END_SRC

#+RESULTS:
: # Close
: 25289.27
: 25080.50
: 25286.49

=vnl-filter= is primarily a wrapper around =awk= or =perl=, allowing the user to reference columns by name. I can then plot the closing prices:

#+BEGIN_SRC sh :results file link :exports both
< dji.vnl vnl-filter -p Close |
  feedgnuplot --lines --unset grid
#+END_SRC

#+RESULTS:
[[file:guide-1.svg]]

Here I kept /only/ the closing price column, so the x-axis is just the row index. The data was in reverse chronological order, so this plot is also in reverse chronological order. Let's fix that:

#+BEGIN_SRC sh :results file link :exports both
< dji.vnl vnl-sort -k Date |
  vnl-filter -p Close |
  feedgnuplot --lines --unset grid
#+END_SRC

#+RESULTS:
[[file:guide-2.svg]]

The =vnl-sort= tool (like most of the other =vnl-xxx= tools) is a wrapper around a core tool already available on the system (=sort=, in this case). The primary differences are that it reads and writes vnlog, and that it refers to columns by name. Since we just strictly reversed the order of the data, =sort= was overkill, and we could equivalently have done:

#+BEGIN_SRC sh :results file link :exports both
< dji.vnl vnl-tac |
  vnl-filter -p Close |
  feedgnuplot --lines --unset grid
#+END_SRC
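To illustrate the "wrapper" idea, here is a rough sketch of sorting by a named column with plain =sort=: translate the name into a field number, keep the legend on top, and hand the rest to =sort=. The =vnl_sort_sketch= function is hypothetical and is not how =vnl-sort= is actually implemented; it assumes the legend is the very first line and that the data contains no comment lines.

```shell
# Hypothetical sketch (not the real vnl-sort): sort by a named column.
# Assumes the legend is the first line and there are no comments in the data.
# Any extra arguments are passed through to sort(1).
vnl_sort_sketch() {
    key=$1; shift
    IFS= read -r legend || return 1
    # 1-based position of the key among the legend fields
    n=$(printf '%s\n' "$legend" |
        awk -v k="$key" '{ sub(/^#[ \t]*/, ""); for (i = 1; i <= NF; i++) if ($i == k) print i }')
    [ -n "$n" ] || { echo "no such column: $key" >&2; return 1; }
    printf '%s\n' "$legend"     # the legend stays on top
    sort "$@" -k "$n,$n"        # sort the remaining rows on that field
}

printf '%s\n' '# Date Close' \
              '2018-11-15 25289.27' \
              '2018-11-14 25080.50' \
              '2018-11-13 25286.49' |
    vnl_sort_sketch Date
# prints:
# # Date Close
# 2018-11-13 25286.49
# 2018-11-14 25080.50
# 2018-11-15 25289.27
```

Passing =sort= options after the column name (for instance =vnl_sort_sketch Close -n -r=) sorts numerically in descending order.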

We now have the data in the correct order, but it'd be nice to see the actual dates on the x-axis. While we're at it, let's label the axes too:

#+BEGIN_SRC sh :results output :exports both
< dji.vnl vnl-filter -p Date,Close | head -n4
#+END_SRC

#+RESULTS:
: # Date Close
: 2018-11-15 25289.27
: 2018-11-14 25080.50
: 2018-11-13 25286.49

#+BEGIN_SRC sh :results file link :exports both
< dji.vnl vnl-sort -k Date |
  vnl-filter -p Date,Close |
  feedgnuplot --lines --unset grid --timefmt %Y-%m-%d --domain \
              --xlabel 'Date' --ylabel 'Price ($)'
#+END_SRC

#+RESULTS:
[[file:guide-3.svg]]

What was the highest value of the Dow-Jones index, and when did it happen?

#+BEGIN_SRC sh :results output :exports both
< dji.vnl vnl-sort -rgk Close | head -n2 | vnl-align
#+END_SRC

#+RESULTS:
: # Date     Open     High     Low      Close    AdjClose Volume
: 2018-10-03 26833.47 26951.81 26789.08 26828.39 26828.39 280130000

Alrighty. Looks like the high was in October. Let's zoom in on that month:

#+BEGIN_SRC sh :results file link :exports both
< dji.vnl vnl-sort -k Date |
  vnl-filter 'Date ~ /2018-10/' -p Date,Close |
  feedgnuplot --lines --unset grid --timefmt %Y-%m-%d --domain \
              --xlabel 'Date' --ylabel 'Price ($)'
#+END_SRC

#+RESULTS:
[[file:guide-4.svg]]

OK. Is this thing volatile? What was the largest single-day gain, looking at differences in consecutive closing prices?

#+BEGIN_SRC sh :results output :exports both
< dji.vnl vnl-sort -k Date |
  vnl-filter -p '.,d=diff(Close)' |
  head -n4 |
  vnl-align
#+END_SRC

#+RESULTS:
: # Date     Open     High     Low      Close    AdjClose Volume    d
: 2018-01-02 24809.35 24864.19 24741.70 24824.01 24824.01 341130000 -
: 2018-01-03 24850.45 24941.92 24825.55 24922.68 24922.68 456790000 98.67
: 2018-01-04 24964.86 25105.96 24963.27 25075.13 25075.13 403280000 152.45
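For reference, the consecutive-difference column can be approximated in plain =awk=. This is only a sketch of the idea, not the =vnl-filter= implementation: =vnl_diff_sketch= is a hypothetical name, and it assumes the legend is the first line, written as =# name name ...=, with no comment lines in the data.

```shell
# Hypothetical sketch (not vnl-filter itself): append a column "d" holding
# the difference between consecutive values of a named column.
# Assumes a "# name name ..." legend on line 1 and no comments.
vnl_diff_sketch() {
    awk -v want="$1" '
        NR == 1 {                          # legend: locate the column
            for (i = 2; i <= NF; i++) if ($i == want) col = i - 1
            if (!col) exit 1
            print $0, "d"
            next
        }
        {
            # the first data row has no predecessor, so it gets "-"
            print $0, (NR == 2 ? "-" : $col - prev)
            prev = $col
        }
    '
}

printf '%s\n' '# Date Close' \
              '2018-01-02 24824.01' \
              '2018-01-03 24922.68' \
              '2018-01-04 25075.13' |
    vnl_diff_sketch Close
# prints:
# # Date Close d
# 2018-01-02 24824.01 -
# 2018-01-03 24922.68 98.67
# 2018-01-04 25075.13 152.45
```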

#+BEGIN_SRC sh :results output :exports both
< dji.vnl vnl-sort -k Date |
  vnl-filter -p '.,gain_closeclose=diff(Close)',gain_openclose=Close-Open |
  vnl-sort -rgk gain_closeclose |
  head -n2 |
  vnl-filter -p Date,gain_ |
  vnl-align
#+END_SRC

#+RESULTS:
: # Date     gain_closeclose gain_openclose
: 2018-03-26 669.4           376.86

So the best single-day gain was on 2018-03-26: the Dow gained 669.4 points between the close of the previous trading day and the close on 2018-03-26. Of that, 376.86 points were gained during trading on 2018-03-26 itself.

What if I looked at maximum trading-day gains?

#+BEGIN_SRC sh :results output :exports both
< dji.vnl vnl-sort -k Date |
  vnl-filter -p '.,gain_closeclose=diff(Close)',gain_openclose=Close-Open |
  vnl-sort -rgk gain_openclose |
  head -n2 |
  vnl-filter -p Date,gain_ |
  vnl-align
#+END_SRC

#+RESULTS:
: # Date     gain_closeclose gain_openclose
: 2018-02-06 567.02          827.6

By that metric 2018-02-06 was much better. Since vnlog is a trivially simple data format, we can use non-vnlog tools to compute statistics such as this. For instance, we can do the same thing with ministat:

#+begin_src sh :results output :exports both
< dji.vnl vnl-filter -p gain_openclose=Close-Open | ministat -A
#+end_src

#+RESULTS:
: x <stdin>
:     N           Min           Max        Median           Avg        Stddev
: x 222      -1041.84         827.6         20.04    -9.3664414     230.67518

Or [[https://www.gnu.org/software/datamash/][datamash]]:

#+begin_src sh :results output :exports both
< dji.vnl vnl-filter -p gain_openclose=Close-Open | datamash -CW max 1
#+end_src

#+RESULTS:
: 827.6

Datamash 1.9 knows about vnlog specifically, so we can do a bit better:

#+begin_src sh :results output :exports both
< dji.vnl vnl-filter -p gain_openclose=Close-Open | datamash --vnlog max gain_openclose
#+end_src

#+RESULTS:
: # max(gain_openclose)
: 827.6
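If neither tool is installed, the same maximum falls out of a few lines of =awk=, because the only things to skip are comment lines, blank lines and empty (=-=) fields. A minimal sketch, assuming a single-column vnlog on standard input (=vnl_max_sketch= is a hypothetical name):

```shell
# Hypothetical sketch: maximum of a one-column vnlog stream.
# Skips the legend, comments, blank lines and empty ("-") fields.
vnl_max_sketch() {
    awk '
        /^#/ || !NF { next }
        $1 != "-" && (!seen++ || $1 + 0 > max + 0) { max = $1 }
        END { if (seen) print max }
    '
}

printf '%s\n' '# gain_openclose' '20.04' '-1041.84' '827.6' '-' | vnl_max_sketch
# prints:
# 827.6
```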

Let's join the Dow-Jones index data and the TSLA data, and look at them together:

#+BEGIN_SRC sh :results output :exports both
vnl-join --vnl-autosuffix dji.vnl tsla.vnl -j Date |
  head -n4 |
  vnl-align
#+END_SRC

#+RESULTS:
: # Date     Open_dji High_dji Low_dji  Close_dji AdjClose_dji Volume_dji Open_tsla High_tsla Low_tsla Close_tsla AdjClose_tsla Volume_tsla
: 2018-11-15 25061.48 25354.56 24787.79 25289.27  25289.27     383292840  342.33    348.58    339.04   348.44     348.44        4486339
: 2018-11-14 25388.08 25501.29 24935.82 25080.50  25080.50     384240000  342.70    347.11    337.15   344.00     344.00        5036300
: 2018-11-13 25321.21 25511.03 25193.78 25286.49  25286.49     339690000  333.16    344.70    332.20   338.73     338.73        5448600
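Conceptually this is the standard =join= plus some legend bookkeeping. The sketch below shows that idea with plain tools; it is not the =vnl-join= implementation. =vnl_join_sketch= is a hypothetical helper that assumes the key is the first column, the legend is the first line of each file, and there are no comment lines. Unlike =--vnl-autosuffix=, it takes the suffixes as explicit arguments, and it sorts both inputs first, so rows come out in ascending key order.

```shell
# Hypothetical sketch (not the real vnl-join): join two vnlog files on their
# first column, appending a suffix to every other column name.
# Usage: vnl_join_sketch FILE1 SUFFIX1 FILE2 SUFFIX2
vnl_join_sketch() {
    tmp=$(mktemp -d) || return 1
    # join(1) needs both inputs sorted on the key, in the same collation order
    tail -n +2 "$1" | LC_ALL=C sort -k 1,1 > "$tmp/a"
    tail -n +2 "$3" | LC_ALL=C sort -k 1,1 > "$tmp/b"
    # New legend: the key once, then every other column with its suffix
    head -n 1 "$1" | awk -v s="$2" '{ printf "# %s", $2; for (i = 3; i <= NF; i++) printf " %s%s", $i, s }'
    head -n 1 "$3" | awk -v s="$4" '{ for (i = 3; i <= NF; i++) printf " %s%s", $i, s; print "" }'
    LC_ALL=C join "$tmp/a" "$tmp/b"
    rm -rf "$tmp"
}

# Tiny inputs in a scratch directory, using values from the data above
dir=$(mktemp -d)
printf '%s\n' '# Date Close' '2018-11-14 25080.50' '2018-11-13 25286.49' > "$dir/dji.vnl"
printf '%s\n' '# Date Close' '2018-11-14 344.00' '2018-11-13 338.73' > "$dir/tsla.vnl"
vnl_join_sketch "$dir/dji.vnl" _dji "$dir/tsla.vnl" _tsla
rm -rf "$dir"
# prints:
# # Date Close_dji Close_tsla
# 2018-11-13 25286.49 338.73
# 2018-11-14 25080.50 344.00
```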

#+BEGIN_SRC sh :results output :exports both
vnl-join --vnl-autosuffix dji.vnl tsla.vnl -j Date |
  vnl-filter -p '^Close' |
  head -n4 |
  vnl-align
#+END_SRC

#+RESULTS:
: # Close_dji Close_tsla
: 25289.27    348.44
: 25080.50    344.00
: 25286.49    338.73

#+BEGIN_SRC sh :results file link :exports both
vnl-join --vnl-autosuffix dji.vnl tsla.vnl -j Date |
  vnl-filter -p '^Close' |
  feedgnuplot --domain --points --unset grid \
              --xlabel 'DJI price ($)' --ylabel 'TSLA price ($)'
#+END_SRC

#+RESULTS:
[[file:guide-5.svg]]

Huh. Apparently there's no obvious, strong correlation between TSLA and Dow-Jones closing prices. And we saw that with just a few shell commands, without dropping down into a dedicated analysis system.

* Build and installation

vnlog is a part of Debian/buster and Ubuntu/cosmic (18.10) and later. On those boxes you can simply

#+BEGIN_EXAMPLE
$ sudo apt install vnlog libvnlog-dev libvnlog-perl python3-vnlog
#+END_EXAMPLE

to get the binary tools, the C API, the perl and python3 inte
