Pkglink
Space saving Node.js package hard linker. pkglink locates common JavaScript/Node.js packages from your node_modules directories and hard links the package files so they share disk space.
<img src="https://cloud.githubusercontent.com/assets/5689/19868149/ccf7ded8-9f74-11e6-808e-247d24e68d27.gif" width="640" height="360" alt="demo" />
Why?
As an instructor, I create lots of JavaScript and Node.js projects, and many of them use the same packages. However, due to the way packages are installed, each project keeps its own copy on disk. It would be nice to have a way for installations of the same package to share disk space.
Modern operating systems and disk formats support hard links, which let one copy of a file on disk be reached from multiple paths. Since packages are generally read-only once they are installed, hard linking their files can save a lot of disk space.
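To make the hard link idea concrete, here is a minimal sketch that uses only Node.js built-ins. It is an illustration of the concept, not pkglink's code; the temp directory and file names are made up for the demo.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Work in a throwaway temp directory so nothing real is touched.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'hardlink-demo-'));
const original = path.join(dir, 'index.js');
const linked = path.join(dir, 'index-linked.js');

fs.writeFileSync(original, 'module.exports = 42;\n');
fs.linkSync(original, linked); // second path pointing at the same data on disk

const a = fs.statSync(original);
const b = fs.statSync(linked);

// Both paths report the same inode and device, and the link count is 2.
const sharesInode = a.ino === b.ino && a.dev === b.dev;
console.log('same inode:', sharesInode, 'link count:', a.nlink);

// Clean up the demo directory.
fs.rmSync(dir, { recursive: true, force: true });
```

Because both paths refer to the same data, the file content is stored once no matter how many paths point at it.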
pkglink is a command line tool that searches the directory trees you specify for packages in your node_modules directories. When it finds matching packages of the same name and version that could share space, it hard links the files. As a safety precaution it checks many file attributes before considering them for linking (see full details later in this doc).
pkglink keeps track of packages it has seen on previous scans, so when you run it on new directories in the future, it quickly knows where to look for previous package matches. It double checks that the previous packages still have the proper version, inode, and modified time before linking, but this avoids performing full tree scans every time you add a new project. Simply run pkglink once on your project tree and then again on new projects as you create them.
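The re-check of previously seen packages could look roughly like the sketch below. The ref shape (`dir`, `version`, `ino`, `mtimeMs`) and the choice to stat `package.json` are assumptions made for illustration; the actual format of pkglink's refs file is not described here.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical ref record built from a package directory's package.json.
function makeRef(pkgDir) {
  const pkgJson = path.join(pkgDir, 'package.json');
  const stat = fs.statSync(pkgJson);
  const { version } = JSON.parse(fs.readFileSync(pkgJson, 'utf8'));
  return { dir: pkgDir, version, ino: stat.ino, mtimeMs: stat.mtimeMs };
}

// A stored ref is trusted only if version, inode, and mtime are unchanged.
function refStillValid(ref) {
  try {
    const fresh = makeRef(ref.dir);
    return fresh.version === ref.version &&
      fresh.ino === ref.ino &&
      fresh.mtimeMs === ref.mtimeMs;
  } catch (err) {
    return false; // package was removed or is unreadable
  }
}

// Demo in a temp directory: valid at first, invalid after a version change.
const pkgDir = fs.mkdtempSync(path.join(os.tmpdir(), 'ref-demo-'));
const pkgJsonPath = path.join(pkgDir, 'package.json');
fs.writeFileSync(pkgJsonPath, JSON.stringify({ name: 'demo', version: '1.0.0' }));
const ref = makeRef(pkgDir);
const validBefore = refStillValid(ref);
fs.writeFileSync(pkgJsonPath, JSON.stringify({ name: 'demo', version: '1.0.1' }));
const validAfter = refStillValid(ref);
console.log({ validBefore, validAfter });
fs.rmSync(pkgDir, { recursive: true, force: true });
```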
pkglink has been tested on Ubuntu, Mac OS X, and Windows. Hard links are supported on most modern disk formats with the exception of FAT and ReFS.
How much savings?
It all depends on how many matching packages you have on your system, but you will probably be surprised.
After running pkglink on my project directories, it found 128K packages and saved over 20GB of disk space.
Assumptions for use
The main assumption that enables hard linking is that you are not manually modifying your packages after install from the registry. This means that installed packages of the same name and version should generally be the same. Additional checks at the file level are used to verify matches (see filter criteria later in this doc) before selecting them for linking.
Before running any tool that can modify your file system, it is always a good idea to have a current backup and to sync your code with your repositories.
Hard linking will not work on FAT and ReFS file systems. Hard links can only be made between files on the same device (drive). pkglink has been tested on Mac OS X (HFS+), Ubuntu (ext4), and Windows (NTFS).
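The same-device restriction can be checked up front by comparing the device ids that `fs.statSync` reports. The helper name `onSameDevice` is hypothetical, and this sketch does not claim to show how pkglink performs the check.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper: hard links require both paths to be on the same device.
function onSameDevice(pathA, pathB) {
  return fs.statSync(pathA).dev === fs.statSync(pathB).dev;
}

// Demo: a file and the temp directory that contains it.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'device-demo-'));
const file = path.join(dir, 'a.txt');
fs.writeFileSync(file, 'hello\n');

const sameDevice = onSameDevice(file, dir);
console.log('same device:', sameDevice);

fs.rmSync(dir, { recursive: true, force: true });
```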
If you had to recover from an unforeseen defect in pkglink, the recovery process is to simply delete your project's node_modules directory and perform npm install again.
Installation
npm install -g pkglink
Quick start
To find and hard link matching packages
To hard link packages just run pkglink with one or more directory trees that you wish it to scan and link.
pkglink DIR1 DIR2 ...
You will get output similar to this:
jeffbski-laptop:~$ pkglink ~/projects ~/working
pkgs: 128,383 saved: 5.11GB
The run above indicates that pkglink found 128K packages and saved over 5GB of disk space by linking them. (Actual savings were higher since I had previously run pkglink on a portion of the tree.)
Dryrun - just output a list of matching packages
If you wish to see what packages pkglink would link you can use the --dryrun or -d option. pkglink will output matching packages that it would normally link but it will NOT perform any linking.
pkglink -d DIR1 DIR2 ...
The --dryrun output looks like:
jeffbski-laptop:~$ pkglink -d ~/working/expect-test
tmatch-2.0.1
/Users/jeff/projects/pkglink/fixtures/projects/foo1/node_modules/tmatch
/Users/jeff/working/expect-test/node_modules/tmatch
object.entries-1.0.3
/Users/jeff/projects/pkglink/fixtures/projects/foo1/node_modules/object.entries
/Users/jeff/working/expect-test/node_modules/object.entries
object-keys-1.0.11
/Users/jeff/projects/pkglink/fixtures/projects/foo1/node_modules/object-keys
/Users/jeff/working/expect-test/node_modules/object-keys
# pkgs: 21 would save: 3.88MB
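The dry run listing groups package directories under a name-version heading. A minimal sketch of that grouping step follows; the input records and the `groupByNameVersion` helper are hypothetical, and the paths are borrowed from the sample output above.

```javascript
// Hypothetical input: one record per installed package found during a scan.
const found = [
  { name: 'tmatch', version: '2.0.1',
    dir: '/Users/jeff/projects/pkglink/fixtures/projects/foo1/node_modules/tmatch' },
  { name: 'tmatch', version: '2.0.1',
    dir: '/Users/jeff/working/expect-test/node_modules/tmatch' },
  { name: 'object-keys', version: '1.0.11',
    dir: '/Users/jeff/working/expect-test/node_modules/object-keys' },
];

// Group directories under a "name-version" key, as in the dry run listing.
function groupByNameVersion(packages) {
  const groups = new Map();
  for (const { name, version, dir } of packages) {
    const key = `${name}-${version}`;
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(dir);
  }
  return groups;
}

// Only groups with two or more directories are candidates for sharing.
const candidates = [...groupByNameVersion(found)]
  .filter(([, dirs]) => dirs.length > 1);

for (const [key, dirs] of candidates) {
  console.log(key);
  dirs.forEach((dir) => console.log(dir));
}
```

Matching on name and version only nominates candidates; the file-level checks still decide what actually gets linked.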
Generate link commands only
If you want to see exactly what it would link, down to the file level, you can use the --gen-ln-cmds or -g option and it will output the equivalent bash commands for the hard links that it would normally create. It will not perform the linking. You can review the output for correctness, or even save it to a file and execute it with bash instead of running pkglink again without the -g option.
pkglink -g DIR1 DIR2 ...
The --gen-ln-cmds output looks like:
jeffbski-laptop:~$ pkglink -g ~/working/expect-test
ln -f "/Users/jeff/projects/pkglink/fixtures/projects/foo1/node_modules/define-properties/index.js" "/Users/jeff/working/expect-test/node_modules/define-properties/index.js"
ln -f "/Users/jeff/projects/pkglink/fixtures/projects/foo1/node_modules/expect/CHANGES.md" "/Users/jeff/working/expect-test/node_modules/expect/CHANGES.md"
ln -f "/Users/jeff/projects/pkglink/fixtures/projects/foo1/node_modules/expect/LICENSE.md" "/Users/jeff/working/expect-test/node_modules/expect/LICENSE.md"
ln -f "/Users/jeff/projects/pkglink/fixtures/projects/foo1/node_modules/es-abstract/Makefile" "/Users/jeff/working/expect-test/node_modules/es-abstract/Makefile"
# pkgs: 21 would save: 3.88MB
Full Usage
Usage: pkglink {OPTIONS} [dir] [dirN]
Description:
pkglink - Space saving Node.js package hard linker
pkglink recursively searches directories for Node.js packages
installed in node_modules directories. It uses the package name
and version to match up possible packages to share. Once it finds
similar packages, pkglink walks through the package directory tree
checking for files that can be linked. If each file's modified
datetime and size match, it will create a hard link for that file
to save disk space. (On win32, mtimes are inconsistent and ignored)
It keeps track of modules linked in ~/.pkglink_refs to quickly
locate similar modules on future runs. The refs are always
double checked before being considered for linking. This makes
it convenient to perform future pkglink runs on new directories
without having to reprocess the old.
Standard Options:
-c, --config CONFIG_PATH
This option overrides the config file path, default ~/.pkglink
-d, --dryrun
Instead of performing the linking, just display the modules that
would be linked and the amount of disk space that would be saved.
-g, --gen-ln-cmds
Instead of performing the linking, just generate link commands
that the system would perform and output
-h, --help
Show this message
-m, --memory MEMORY_MB
Run with increased or decreased memory specified in MB, overrides
environment variable PKGLINK_NODE_OPTIONS and config.memory
The default memory used is 2560.
-p, --prune
Prune the refs file by checking all of the refs clearing out any
that have changed
-r, --refs-file REFS_FILE_PATH
Specify where to load and store the link refs file which is used to
quickly locate previously linked modules. Default ~/.pkglink_refs
-t, --tree-depth N
Maximum depth to search the directories specified for packages
Default depth: 0 (unlimited)
-v, --verbose
Output additional information helpful for debugging
If your machine has less than 2.5GB of memory, you can use pkglink_low instead of pkglink; it runs with the normal 1.5GB memory default.
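The usage text says a file is linked when its size and modified time match, with modified times ignored on win32. A minimal sketch of such a predicate is shown below. The function name `looksLinkable` and its option object are assumptions for illustration, and pkglink applies further checks that are not modeled here.

```javascript
// Hypothetical predicate, not pkglink's actual implementation.
// Takes two fs.Stats-like objects and decides whether the files look identical
// enough to be hard linked, based on size and modified time.
function looksLinkable(srcStat, dstStat, options = {}) {
  // Assumed default: ignore mtimes on Windows, as the usage text describes.
  const { ignoreModTime = process.platform === 'win32' } = options;

  if (srcStat.size !== dstStat.size) return false;
  if (!ignoreModTime && srcStat.mtimeMs !== dstStat.mtimeMs) return false;
  return true;
}

// Usage with plain objects standing in for fs.statSync() results.
const src = { size: 1024, mtimeMs: 1478000000000 };
const sameFile = { size: 1024, mtimeMs: 1478000000000 };
const touched = { size: 1024, mtimeMs: 1478000999000 };

console.log(looksLinkable(src, sameFile, { ignoreModTime: false }));
console.log(looksLinkable(src, touched, { ignoreModTime: false }));
console.log(looksLinkable(src, touched, { ignoreModTime: true }));
```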
Config
The default config file path is ~/.pkglink unless you override it with the --config command line option. If this file exists it should be a JSON file with an object having any of the following properties.
- refsFile - location of the JSON file used to track the last 5 references to each package it finds, default: ~/.pkglink_refs. This can also be overridden with the --refs-file command line argument.
- concurrentOps - the number of concurrent operations allowed for IO operations, default: 4
- consoleWidth - the number of columns in your console, default: 70
- ignoreModTime - ignore the modification time of the files, default is true on Windows, otherwise false
- memory - adjust the memory used in MB, default: 2560 (2.5GB). Can also be overridden by setting environment variable PKGLINK_NODE_OPTIONS=--max-old-space-size=1234 or by using the command line argument --memory.
- minFileSize - the minimum size file to consider for linking in bytes, default: 0
- refSize - number of package refs to keep in the refsFile which is used to find matching packages on successive runs, default: 5
- tree-depth - the maximum depth to search the directories for packages, default: 0 (unlimited). Can also be overridden with --tree-depth command line option.
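As an illustration, the following sketch builds a config object from the properties listed above and writes it out as JSON. The values simply repeat the documented defaults. Writing to a temp file, rather than directly to ~/.pkglink, is a choice made here for safety, and the output file name is made up.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Values mirror the documented defaults; adjust only what you need.
const config = {
  refsFile: path.join(os.homedir(), '.pkglink_refs'),
  concurrentOps: 4,
  consoleWidth: 70,
  ignoreModTime: process.platform === 'win32',
  memory: 2560,
  minFileSize: 0,
  refSize: 5,
};

const json = JSON.stringify(config, null, 2);

// Written to a temp file (an assumption for this demo). Copy it to ~/.pkglink
// or pass its path with --config once it looks right.
const outPath = path.join(os.tmpdir(), 'pkglink-config-example.json');
fs.writeFileSync(outPath, json + '\n');

console.log(`wrote ${outPath}`);
console.log(json);
```

Since every property is optional, a real config file only needs the properties you want to change.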
How do I know it is working?
If you check your free disk space before and after a run, the difference should be at least as large as the savings pkglink reports. pkglink reports the file sizes saved, but the actual savings can be greater because files occupy whole blocks on disk.
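The block size effect can be illustrated with a little arithmetic. The numbers below (4096-byte blocks, 100 duplicate files of 1000 bytes each) are assumptions chosen for the example, and some file systems store very small files differently, so treat this as a rough model.

```javascript
// Assumed block size for the example; real values depend on the file system.
const BLOCK_SIZE = 4096;

// Bytes actually occupied on disk when a file is rounded up to whole blocks.
function allocatedBytes(fileSize, blockSize = BLOCK_SIZE) {
  return Math.ceil(fileSize / blockSize) * blockSize;
}

// Hypothetical run: 100 duplicate files of 1000 bytes each get linked away.
const duplicateSizes = new Array(100).fill(1000);

const reportedSavings = duplicateSizes.reduce((sum, size) => sum + size, 0);
const onDiskSavings = duplicateSizes.reduce(
  (sum, size) => sum + allocatedBytes(size),
  0
);

console.log('reported savings (bytes):', reportedSavings);
console.log('on-disk savings (bytes): ', onDiskSavings);
```

In this model the space freed on disk is roughly four times the summed file sizes, which is why measured savings can exceed the reported figure.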
On systems with bash, you can also use `ls -ali nod

