Vnlog
Process labelled tabular ASCII data using normal UNIX tools
- Talk
I just gave a talk about this at [[https://www.socallinuxexpo.org/scale/17x][SCaLE 17x]]. Here are the [[https://www.youtube.com/watch?v=Qvb_uNkFGNQ&t=12830s][video of the talk]] and the [[https://github.com/dkogan/talk-feedgnuplot-vnlog/blob/master/feedgnuplot-vnlog.org]["slides"]].
- Summary
Vnlog ("vanilla-log") is a toolkit for manipulating tabular ASCII data with labelled fields using normal UNIX tools. If you regularly use =awk=, =sort=, =uniq= and similar tools, vnlog will make you considerably more effective with them. The vnlog tools /extend/ rather than replace the standard tooling, so little effort is required to learn and use them.
Everything assumes a trivially simple log format:
- A whitespace-separated table of ASCII human-readable text
- A =#= character starts a comment that runs to the end of the line (like in many scripting languages)
- The first line that begins with a single =#= (not =##= or =#!=) is a /legend/, naming each column. This is required, and the field names that appear here are referenced by all the tools.
- Empty fields are reported as =-=
This describes 99% of the format, with some extra details [[#format-details][below]]. Example:
#+BEGIN_EXAMPLE
#!/usr/bin/whatever
# a b c
1 2 3
## comment
4 5 6
#+END_EXAMPLE
Such data can be processed directly with almost any existing tool, and /this/ toolkit allows the user to manipulate this data in a nicer way by relying on standard UNIX tools. The core philosophy is to avoid creating new knowledge as much as possible. Consequently, the vnlog toolkit relies /heavily/ on existing (and familiar!) tools and workflows. As such, the toolkit is small, light, and has a /very/ friendly learning curve.
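Because the format is this simple, the legend can be interpreted with stock =awk= alone. The following is a minimal sketch of that idea, not how the vnlog tools are implemented: =vnl_pick= is a hypothetical helper name, and the sketch handles only the rules listed above (=##= and =#!= lines are skipped, the first single-=#= line is the legend, trailing comments are stripped).

```sh
# Hypothetical helper (not part of vnlog): print one named column of a
# vnlog stream read from stdin, using nothing but awk.
vnl_pick() {
    awk -v want="$1" '
        /^#[#!]/ { next }                  # "##" and "#!" lines are never the legend
        /^#/ && !col {                     # first single-"#" line: the legend
            sub(/^#[ \t]*/, "")            # drop the "#"; awk re-splits the fields
            for (i = 1; i <= NF; i++) if ($i == want) col = i
            if (!col) { print "no such field: " want > "/dev/stderr"; exit 1 }
            print "# " want                # emit a legend for the output too
            next
        }
        { sub(/[ \t]*#.*/, "") }           # strip comments that run to end of line
        NF { print $col }                  # skip lines that are now empty
    '
}

# Usage on the example data shown above
printf '%s\n' '#!/usr/bin/whatever' '# a b c' '1 2 3' '## comment' '4 5 6' |
    vnl_pick b
```

The output is itself valid vnlog (a legend followed by data), which is what lets such steps be chained in a pipeline.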
- Synopsis
I have [[https://raw.githubusercontent.com/dkogan/vnlog/master/dji-tsla.tar.gz][two sets of historical stock data]], from the start of 2018 until now (2018/11):
#+BEGIN_SRC sh :results output :exports both
< dji.vnl head -n 4
#+END_SRC

#+RESULTS:
: # Date Open High Low Close AdjClose Volume
: 2018-11-15 25061.48 25354.56 24787.79 25289.27 25289.27 383292840
: 2018-11-14 25388.08 25501.29 24935.82 25080.50 25080.50 384240000
: 2018-11-13 25321.21 25511.03 25193.78 25286.49 25286.49 339690000
And
#+BEGIN_SRC sh :results output :exports both
< tsla.vnl head -n 4
#+END_SRC

#+RESULTS:
: # Date Open High Low Close AdjClose Volume
: 2018-11-15 342.33 348.58 339.04 348.44 348.44 4486339
: 2018-11-14 342.70 347.11 337.15 344.00 344.00 5036300
: 2018-11-13 333.16 344.70 332.20 338.73 338.73 5448600
I can add whitespace to make the headers more legible by humans:
#+BEGIN_SRC sh :results output :exports both
< dji.vnl head -n 4 | vnl-align
#+END_SRC

#+RESULTS:
: #     Date     Open     High      Low    Close AdjClose    Volume
: 2018-11-15 25061.48 25354.56 24787.79 25289.27 25289.27 383292840
: 2018-11-14 25388.08 25501.29 24935.82 25080.50 25080.50 384240000
: 2018-11-13 25321.21 25511.03 25193.78 25286.49 25286.49 339690000
I can pull out the closing prices:
#+BEGIN_SRC sh :results output :exports both
< dji.vnl vnl-filter -p Close | head -n4
#+END_SRC

#+RESULTS:
: # Close
: 25289.27
: 25080.50
: 25286.49
=vnl-filter= is primarily a wrapper around =awk= or =perl=, allowing the user to reference columns by name. I can then plot the closing prices:
#+BEGIN_SRC sh :results file link :exports both
< dji.vnl vnl-filter -p Close |
  feedgnuplot --lines --unset grid
#+END_SRC

#+RESULTS:
[[file:guide-1.svg]]
Here I kept /only/ the closing price column, so the x-axis is just the row index. The data was in reverse chronological order, so this plot is also in reverse chronological order. Let's fix that:
#+BEGIN_SRC sh :results file link :exports both
< dji.vnl vnl-sort -k Date |
  vnl-filter -p Close |
  feedgnuplot --lines --unset grid
#+END_SRC

#+RESULTS:
[[file:guide-2.svg]]
The =vnl-sort= tool (like most of the other =vnl-xxx= tools) is a wrapper around a core tool already available on the system (=sort=, in this case). The primary differences are that it reads and writes vnlog, and refers to columns by name. Since we just strictly reversed the order of the data, =sort= was a bit overkill, and we could have equivalently done:
#+BEGIN_SRC sh :results file link :exports both
< dji.vnl vnl-tac |
  vnl-filter -p Close |
  feedgnuplot --lines --unset grid
#+END_SRC
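To make the "wrapper" idea concrete, here is a rough sketch of sorting vnlog data with plain =sort= while keeping the legend in place. It is an illustration under simplifying assumptions, not the =vnl-sort= implementation: =sort_keep_legend= is a hypothetical name, the column is given by number rather than by name, and the legend is assumed to be the very first line with no other comment lines present.

```sh
# Hypothetical helper (not part of vnlog): sort data lines on field
# number $1 with plain "sort", passing the legend through untouched.
sort_keep_legend() {
    # Assumption: the legend is the very first line of the input.
    IFS= read -r legend || return 1
    printf '%s\n' "$legend"
    # "sort" now sees only the remaining data lines. ISO dates such as
    # 2018-11-15 sort chronologically when compared as plain text.
    LC_ALL=C sort -k "$1,$1"
}

# Usage: sort three rows by the first field (the date)
printf '%s\n' '# Date Close' \
              '2018-11-15 25289.27' \
              '2018-11-13 25286.49' \
              '2018-11-14 25080.50' |
    sort_keep_legend 1
```

Handling the legend and translating field names to field numbers is the kind of bookkeeping the =vnl-xxx= wrappers take care of; the sorting itself is still done by =sort=.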
We now have the data in the correct order, but it'd be nice to see the actual dates on the x-axis. While we're at it, let's label the axes too:
#+BEGIN_SRC sh :results output :exports both
< dji.vnl vnl-filter -p Date,Close | head -n4
#+END_SRC

#+RESULTS:
: # Date Close
: 2018-11-15 25289.27
: 2018-11-14 25080.50
: 2018-11-13 25286.49
#+BEGIN_SRC sh :results file link :exports both
< dji.vnl vnl-sort -k Date |
vnl-filter -p Date,Close |
  feedgnuplot --lines --unset grid --timefmt %Y-%m-%d --domain \
    --xlabel 'Date' --ylabel 'Price ($)'
#+END_SRC
#+RESULTS: [[file:guide-3.svg]]
What was the highest value of the Dow-Jones index, and when did it happen?
#+BEGIN_SRC sh :results output :exports both
< dji.vnl vnl-sort -rgk Close | head -n2 | vnl-align
#+END_SRC

#+RESULTS:
: #     Date     Open     High      Low    Close AdjClose    Volume
: 2018-10-03 26833.47 26951.81 26789.08 26828.39 26828.39 280130000
Alrighty. Looks like the high was in October. Let's zoom in on that month:
#+BEGIN_SRC sh :results file link :exports both
< dji.vnl vnl-sort -k Date |
vnl-filter 'Date ~ /2018-10/' -p Date,Close |
  feedgnuplot --lines --unset grid --timefmt %Y-%m-%d --domain \
    --xlabel 'Date' --ylabel 'Price ($)'
#+END_SRC
#+RESULTS: [[file:guide-4.svg]]
OK. Is this thing volatile? What was the largest single-day gain, looking at differences in consecutive closing prices?
#+BEGIN_SRC sh :results output :exports both
< dji.vnl vnl-sort -k Date |
  vnl-filter -p '.,d=diff(Close)' |
  head -n4 |
  vnl-align
#+END_SRC

#+RESULTS:
: #     Date     Open     High      Low    Close AdjClose    Volume      d
: 2018-01-02 24809.35 24864.19 24741.70 24824.01 24824.01 341130000      -
: 2018-01-03 24850.45 24941.92 24825.55 24922.68 24922.68 456790000  98.67
: 2018-01-04 24964.86 25105.96 24963.27 25075.13 25075.13 403280000 152.45
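Judging from the output above, =diff(Close)= yields the change in =Close= relative to the previous row, with =-= (an empty field) on the first row where no previous value exists. A minimal =awk= sketch of that behavior follows. It is inferred from the output shown here rather than taken from the =vnl-filter= source: =consecutive_diff= is a hypothetical name, the column is selected by number, and the only =#= line is assumed to be the legend.

```sh
# Hypothetical helper (not part of vnlog): append a field "d" holding the
# difference between consecutive values of field number $1.
consecutive_diff() {
    awk -v c="$1" '
        /^#/ { print $0, "d"; next }    # legend line: append the new field name
        {
            d = seen ? $c - prev : "-"  # no previous row yet: report an empty field
            print $0, d
            prev = $c; seen = 1
        }
    '
}

# Usage on the first three closing prices, oldest first
printf '%s\n' '# Date Close' \
              '2018-01-02 24824.01' \
              '2018-01-03 24922.68' \
              '2018-01-04 25075.13' |
    consecutive_diff 2
```

The differences depend on row order, which is why the data is sorted by =Date= before =diff()= is applied.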
#+BEGIN_SRC sh :results output :exports both
< dji.vnl vnl-sort -k Date |
  vnl-filter -p '.,gain_closeclose=diff(Close)',gain_openclose=Close-Open |
  vnl-sort -rgk gain_closeclose |
  head -n2 |
  vnl-filter -p Date,gain_ |
  vnl-align
#+END_SRC

#+RESULTS:
: #     Date gain_closeclose gain_openclose
: 2018-03-26           669.4         376.86
So the best single-day gain was on 2018-03-26: the Dow gained 669.4 points between closing on the previous trading day and closing on 2018-03-26. Of that, 376.86 points were gained during trading on 2018-03-26 itself.
What if I looked at maximum trading-day gains?
#+BEGIN_SRC sh :results output :exports both
< dji.vnl vnl-sort -k Date |
  vnl-filter -p '.,gain_closeclose=diff(Close)',gain_openclose=Close-Open |
  vnl-sort -rgk gain_openclose |
  head -n2 |
  vnl-filter -p Date,gain_ |
  vnl-align
#+END_SRC

#+RESULTS:
: #     Date gain_closeclose gain_openclose
: 2018-02-06          567.02          827.6
By that metric 2018-02-06 was much better. Since vnlog is a trivially simple data format, we can use non-vnlog tools to compute statistics such as this. For instance, we can do the same thing with ministat:
#+begin_src sh :results output :exports both
< dji.vnl vnl-filter -p gain_openclose=Close-Open | ministat -A
#+end_src

#+RESULTS:
: x <stdin>
:     N           Min           Max        Median           Avg        Stddev
: x 222      -1041.84         827.6         20.04    -9.3664414     230.67518
Or [[https://www.gnu.org/software/datamash/][datamash]]:
#+begin_src sh :results output :exports both
< dji.vnl vnl-filter -p gain_openclose=Close-Open | datamash -CW max 1
#+end_src

#+RESULTS:
: 827.6
Datamash 1.9 knows about vnlog specifically, so we can do a bit better:
#+begin_src sh :results output :exports both
< dji.vnl vnl-filter -p gain_openclose=Close-Open | datamash --vnlog max gain_openclose
#+end_src

#+RESULTS:
: # max(gain_openclose)
: 827.6
Let's join the Dow-Jones index data and the TSLA data, and look at them together:
#+BEGIN_SRC sh :results output :exports both
vnl-join --vnl-autosuffix dji.vnl tsla.vnl -j Date | head -n4 | vnl-align
#+END_SRC

#+RESULTS:
: #     Date Open_dji High_dji  Low_dji Close_dji AdjClose_dji Volume_dji Open_tsla High_tsla Low_tsla Close_tsla AdjClose_tsla Volume_tsla
: 2018-11-15 25061.48 25354.56 24787.79  25289.27     25289.27  383292840    342.33    348.58   339.04     348.44        348.44     4486339
: 2018-11-14 25388.08 25501.29 24935.82  25080.50     25080.50  384240000    342.70    347.11   337.15     344.00        344.00     5036300
: 2018-11-13 25321.21 25511.03 25193.78  25286.49     25286.49  339690000    333.16    344.70   332.20     338.73        338.73     5448600
#+BEGIN_SRC sh :results output :exports both
vnl-join --vnl-autosuffix dji.vnl tsla.vnl -j Date |
  vnl-filter -p '^Close' |
  head -n4 |
  vnl-align
#+END_SRC

#+RESULTS:
: # Close_dji Close_tsla
:    25289.27     348.44
:    25080.50     344.00
:    25286.49     338.73
#+BEGIN_SRC sh :results file link :exports both
vnl-join --vnl-autosuffix dji.vnl tsla.vnl -j Date |
vnl-filter -p '^Close' |
  feedgnuplot --domain --points --unset grid \
    --xlabel 'DJI price ($)' --ylabel 'TSLA price ($)'
#+END_SRC
#+RESULTS: [[file:guide-5.svg]]
Huh. Apparently there's no obvious, strong correlation between TSLA and Dow-Jones closing prices. And we saw that with just a few shell commands, without dropping down into a dedicated analysis system.
- Build and installation
vnlog is a part of Debian/buster and Ubuntu/cosmic (18.10) and later. On those boxes you can simply
#+BEGIN_EXAMPLE
$ sudo apt install vnlog libvnlog-dev libvnlog-perl python3-vnlog
#+END_EXAMPLE
to get the binary tools, the C API, the perl and python3 inte
