Arctee
Atomic tee
#+EXPORT_EXCLUDE_TAGS: noexport
#+begin_src python :exports output :results replace raw
import arctee
return arctee.__doc__
#+end_src
#+RESULTS:
Helper script to run your data exports. It works kind of like [[https://en.wikipedia.org/wiki/Tee_(command)][tee command]], but:
- a: writes output atomically
- r: supports retrying command
- c: supports compressing output
You can read more on how it's used [[https://beepb00p.xyz/exports.html#arctee][here]].
* Motivation
Many things are very common to all data exports, regardless of the source. In the vast majority of cases, you want to fetch some data, save it in a file (e.g. JSON) along with a timestamp and potentially compress.
This script aims to minimize the common boilerplate:
- =path= argument allows easy ISO8601 timestamping and guarantees atomic writing, so you'd never end up with corrupted exports.
- =--compression= lets you compress the output simply by passing the format name. No more =tar -zcvf=!
- =--retries= gives you easy exponential backoff in case the service you're querying is flaky.
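To make the retry behaviour concrete, here is a minimal standard-library sketch of retrying a command with exponential backoff. It is not arctee's actual code (per the Optional Dependencies section, arctee relies on the =backoff= library for this); the function name =run_with_retries= and the =base_delay= and =sleep= parameters are hypothetical, and =tries= is read as the total number of attempts.

```python
import subprocess
import sys
import time


def run_with_retries(cmd, tries=3, base_delay=1.0, sleep=time.sleep):
    """Run cmd and return its stdout as bytes, retrying on a non-zero exit.

    `tries` is the total number of attempts (an assumption for this sketch);
    the delay between attempts doubles after every failure.
    """
    if tries < 1:
        raise ValueError("tries must be at least 1")
    delay = base_delay
    for attempt in range(1, tries + 1):
        # stdout is captured; stderr is not redirected, so it passes through.
        result = subprocess.run(cmd, stdout=subprocess.PIPE)
        if result.returncode == 0:
            return result.stdout
        if attempt == tries:
            # Out of attempts: raise CalledProcessError for the caller.
            result.check_returncode()
        sleep(delay)
        delay *= 2


if __name__ == "__main__":
    output = run_with_retries([sys.executable, "-c", "print('exported')"])
    sys.stdout.buffer.write(output)
```

Passing =sleep= as a parameter keeps the sketch easy to test without actually waiting.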
Example:
: arctee '/exports/rtm/{utcnow}.ical.zstd' --compression zstd --retries 3 -- /soft/export/rememberthemilk.py
- runs =/soft/export/rememberthemilk.py=, trying it up to three times in total if it fails

  The script is expected to dump its result in stdout; stderr is simply passed through.
- once the data is fetched it's compressed as =zstd=
- timestamp is computed and compressed data is written to =/exports/rtm/20200102T170015Z.ical.zstd=
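The timestamping step can be sketched roughly as follows. This is an illustration, not arctee's implementation: the helper name =expand_path= is hypothetical, the timestamp format is inferred from the example filename above, and mapping ={hostname}= and ={platform}= to =socket.gethostname()= and =platform.system()= is an assumption.

```python
import platform
import socket
from datetime import datetime, timezone


def expand_path(template, now=None):
    """Fill {utcnow}, {hostname} and {platform} in a path template.

    `now` should be a timezone-aware UTC datetime; it defaults to the current time.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return template.format(
        # Compact ISO 8601 form, e.g. 20200102T170015Z
        utcnow=now.strftime("%Y%m%dT%H%M%SZ"),
        # Assumed sources for the other two placeholders
        hostname=socket.gethostname(),
        platform=platform.system(),
    )


if __name__ == "__main__":
    print(expand_path("/exports/rtm/{utcnow}.ical.zstd"))
```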
* Do you really need a special script for that?
- why not use =date= command for timestamps?

  passing =$(date -Iseconds --utc).json= as =path= works, however I need it for most of my exports; so it ends up polluting my crontabs.
Next, I want to do several things one after another here. That sounds like a perfect candidate for pipes, right? Sadly, there are serious caveats:
- pipe errors don't propagate. If one part of your pipe fails, it doesn't fail everything

  That's a major problem that often leads to unexpected behaviours.

  In bash you can fix this by setting =set -o pipefail=. However:

  - default cron shell is =/bin/sh=. Ok, you can change it to ~SHELL=/bin/bash~, but
  - you can't set it to =/bin/bash -o pipefail=

    You'd have to prepend all of your pipes with =set -o pipefail=, which is quite boilerplaty
- you can't use pipes for retrying; you need some wrapper script anyway

  E.g. similar to how you need a wrapper script when you want to stop your program on timeout.
- it's possible to use pipes for atomically writing output to a file, however I haven't found any existing tools to do that

  E.g. I want something like =curl https://some.api/get-data | tee --atomic /path/to/data.json=.

  If you know any existing tool please let me know!
- it's possible to pipe compression

  However due to the above concerns (timestamping/retrying/atomic writing), it has to be part of the script as well.
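For reference, the usual way to get an atomic write is to write to a temporary file in the target directory and rename it into place. The sketch below uses only the standard library and is an assumption about the general technique, not arctee's code (arctee depends on =atomicwrites= for this, see Installation); =write_atomically= is a hypothetical name.

```python
import contextlib
import os
import tempfile


def write_atomically(path, data):
    """Write bytes to path so readers see either the old file or the complete new one."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file is created next to the target so the final rename
    # stays on the same filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())
        # os.replace swaps the new file into place in a single step.
        os.replace(tmp_path, path)
    except BaseException:
        # Remove the partial temp file; the target is left untouched.
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp_path)
        raise


if __name__ == "__main__":
    target = os.path.join(tempfile.mkdtemp(), "data.json")
    write_atomically(target, b'{"ok": true}')
    print(target)
```

If the process dies halfway through, only the temporary file is affected, so a half-written export never appears under its final name.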
It feels that cron isn't a suitable tool for my needs due to pipe handling and the need for retries, however I haven't found a better alternative. If you think any of these things can be simplified, I'd be happy to know and remove them in favor of more standard solutions!
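As an illustration of the pipe-handling concern, here is a small sketch of chaining two commands from Python while checking the exit status of both sides, which is roughly what =set -o pipefail= gives you in bash. The name =pipe_checked= is hypothetical and this is not part of arctee.

```python
import subprocess
import sys


def pipe_checked(producer, consumer):
    """Run `producer | consumer` and fail if either side exits non-zero."""
    p1 = subprocess.Popen(producer, stdout=subprocess.PIPE)
    p2 = subprocess.Popen(consumer, stdin=p1.stdout, stdout=subprocess.PIPE)
    # Close our copy of the pipe so the producer gets SIGPIPE if the consumer exits early.
    p1.stdout.close()
    out, _ = p2.communicate()
    codes = (p1.wait(), p2.returncode)
    if any(codes):
        raise RuntimeError(f"pipeline failed with exit codes {codes}")
    return out


if __name__ == "__main__":
    producer = [sys.executable, "-c", "print('data')"]
    consumer = [sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read())"]
    sys.stdout.buffer.write(pipe_checked(producer, consumer))
```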
* Installation
This can be installed with pip by running: =pip3 install --user git+https://github.com/karlicoss/arctee=
You can also install it manually: install =atomicwrites= (=pip3 install atomicwrites=), then download =arctee.py= and run it directly.
** Optional Dependencies
- =pip3 install --user backoff=

  [[https://github.com/litl/backoff][backoff]] is a library to simplify backoff and retrying. Only necessary if you want to use =--retries=.
- =apt install atool=

  [[https://www.nongnu.org/atool][atool]] is a tool to create archives in any format. Only necessary if you want to use compression.
# end of autogenerated stuff
* Usage
#+begin_src sh :results output :exports output
arctee --help
#+end_src
TODO ugh. seems that github chokes over #+RESULT: here
#+begin_example
usage: arctee [-h] [-r RETRIES] [-c COMPRESSION] path

Wrapper for automating boilerplate for reliable and regular data exports.

Example: arctee '/exports/rtm/{utcnow}.ical.zstd' --compression zstd --retries 3 -- /soft/export/rememberthemilk.py --user "user@email.com"

Arguments past '--' are the actuall command to run.

positional arguments:
  path                  Path with borg-style placeholders. Supported: {utcnow}, {hostname}, {platform}.

                        Example: '/exports/pocket/pocket_{utcnow}.json'

                        (see https://manpages.debian.org/testing/borgbackup/borg-placeholders.1.en.html)

optional arguments:
  -h, --help            show this help message and exit
  -r RETRIES, --retries RETRIES
                        Total number of tries, 1 (default) means only try once. Uses exponential backoff.
  -c COMPRESSION, --compression COMPRESSION
                        Set compression format.

                        See 'man apack' for list of supported formats. In addition, 'zstd' is also supported.
#+end_example
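The help text says that everything past =--= is the command to run. One way such a split could be handled is sketched below; the option names mirror the help output, but the function =parse_args= and its structure are assumptions rather than arctee's actual parsing code.

```python
import argparse


def parse_args(argv):
    """Split argv at the first '--': arctee-style options before it, wrapped command after."""
    if "--" in argv:
        split = argv.index("--")
        own, command = argv[:split], argv[split + 1:]
    else:
        own, command = argv, []
    parser = argparse.ArgumentParser(prog="arctee")
    parser.add_argument("path")
    parser.add_argument("-r", "--retries", type=int, default=1)
    parser.add_argument("-c", "--compression", default=None)
    return parser.parse_args(own), command


if __name__ == "__main__":
    args, command = parse_args(
        ["/exports/pocket/pocket_{utcnow}.json", "--retries", "3", "--", "echo", "hi"]
    )
    print(args.path, args.retries, command)
```

Splitting before handing the options to =argparse= means flags that belong to the wrapped command, such as =--user=, are never interpreted by the wrapper.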
* TODOs :noexport:
