Pkglink

Space saving Node.js package hard linker. pkglink locates common JavaScript/Node.js packages from your node_modules directories and hard links the package files so they share disk space.

<img src="https://cloud.githubusercontent.com/assets/5689/19868149/ccf7ded8-9f74-11e6-808e-247d24e68d27.gif" width="640" height="360" alt="demo" />

Why?

As an instructor, I create lots of JavaScript and Node.js projects, and many of them use the same packages. However, due to the way packages are installed, each copy takes up its own disk space. It would be nice to have a way for installations of the same package to share disk space.

Modern operating systems and disk formats support hard links, which let one copy of a file on disk be used from multiple paths. Since packages are generally read-only once they are installed, hard linking their files would save a lot of disk space.
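To make the idea concrete, here is a minimal Node.js sketch of a hard link. It is not taken from pkglink's source; the file names are made up for illustration, and it works in a scratch directory under the system temp folder.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Scratch directory so the example never touches real projects.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'hardlink-demo-'));
const original = path.join(dir, 'original.js'); // hypothetical file name
const linked = path.join(dir, 'linked.js');     // hypothetical file name

fs.writeFileSync(original, 'module.exports = 42;\n');

// fs.linkSync(existingPath, newPath) adds a second directory entry
// that points at the same data on disk.
fs.linkSync(original, linked);

const a = fs.statSync(original);
const b = fs.statSync(linked);

// Both paths report the same device and inode, and the link count is 2.
const sameInode = a.dev === b.dev && a.ino === b.ino;
const linkCount = a.nlink;
console.log({ sameInode, linkCount });

// Clean up the scratch directory.
fs.rmSync(dir, { recursive: true, force: true });
```

Both paths share one copy of the data, so the second path costs almost no extra disk space.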

pkglink is a command line tool that searches the directory trees you specify for packages in your node_modules directories. When it finds matching packages of the same name and version that could share space, it hard links the files. As a safety precaution, it checks many file attributes before considering files for linking (see full details later in this doc).

pkglink keeps track of packages it has seen on previous scans, so when you run it on new directories in the future, it quickly knows where to look for previous package matches. It double checks that the previous packages still have the proper version, inode, and modified time before linking, and this record avoids a full tree scan every time you add a new project. Simply run pkglink once on your project tree and then again on new projects as you create them.
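The double check described above might look roughly like the sketch below. This is an assumption-heavy illustration, not pkglink's actual code: the ref shape `{ dir, version, ino, mtimeMs }`, the helper names `makeRef` and `isRefStillValid`, and the choice to stat package.json are all hypothetical.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical ref record: { dir, version, ino, mtimeMs } taken from package.json.
function makeRef(pkgDir) {
  const pkgJson = path.join(pkgDir, 'package.json');
  const st = fs.statSync(pkgJson);
  const { version } = JSON.parse(fs.readFileSync(pkgJson, 'utf8'));
  return { dir: pkgDir, version, ino: st.ino, mtimeMs: st.mtimeMs };
}

// A stored ref is trusted only if version, inode, and mtime are unchanged.
function isRefStillValid(ref) {
  const pkgJson = path.join(ref.dir, 'package.json');
  try {
    const st = fs.statSync(pkgJson);
    const { version } = JSON.parse(fs.readFileSync(pkgJson, 'utf8'));
    return version === ref.version && st.ino === ref.ino && st.mtimeMs === ref.mtimeMs;
  } catch (err) {
    return false; // package was removed or is unreadable
  }
}

// Demo in a scratch directory with a made-up package.
const pkgDir = fs.mkdtempSync(path.join(os.tmpdir(), 'ref-demo-'));
const pkgJsonPath = path.join(pkgDir, 'package.json');
fs.writeFileSync(pkgJsonPath, JSON.stringify({ name: 'foo', version: '1.0.0' }));

const ref = makeRef(pkgDir);
const validBefore = isRefStillValid(ref);

// Simulate an upgrade: the version no longer matches the stored ref.
fs.writeFileSync(pkgJsonPath, JSON.stringify({ name: 'foo', version: '1.0.1' }));
const validAfter = isRefStillValid(ref);

console.log({ validBefore, validAfter });
fs.rmSync(pkgDir, { recursive: true, force: true });
```

A stale ref is simply ignored, so an outdated refs file costs some lookup time but should not cause a wrong link.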

pkglink has been tested on Ubuntu, Mac OS X, and Windows. Hard links are supported on most modern disk formats with the exception of FAT and ReFS.

How much savings?

It all depends on how many matching packages you have on your system, but you will probably be surprised.

After running pkglink on my project directories, it found 128K packages and saved over 20GB of disk space.

Assumptions for use

The main assumption that enables hard linking is that you are not manually modifying your packages after installing them from the registry. This means that installed packages of the same name and version should generally be the same. Additional checks at the file level are used to verify matches (see filter criteria later in this doc) before selecting them for linking.

Before running any tool that can modify your file system, it is always a good idea to have a current backup and to sync your code with your repositories.

Hard linking will not work on FAT and ReFS file systems. Hard links can only be made between files on the same device (drive). pkglink has been tested on Mac OS X (HFS+), Ubuntu (ext4), and Windows (NTFS).
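As a rough illustration of the file-level checks, the sketch below combines only the criteria this document states: same device, matching size, matching modified time (unless ignored), and a minimum file size. The function name `canHardLink` and its option names are assumptions for this example, and pkglink's real filter may check more attributes.

```javascript
// Decide whether two files look safe to hard link, given fs.Stats-like objects.
// Hypothetical helper: only the criteria named in this README are covered.
function canHardLink(src, dst, opts = {}) {
  const {
    ignoreModTime = process.platform === 'win32', // mtimes are ignored on Windows
    minFileSize = 0,
  } = opts;

  if (src.dev !== dst.dev) return false;   // hard links cannot cross devices
  if (src.ino === dst.ino) return false;   // already the same file, nothing to do
  if (src.size !== dst.size) return false; // sizes must match
  if (src.size < minFileSize) return false; // too small to bother with
  if (!ignoreModTime && src.mtimeMs !== dst.mtimeMs) return false;
  return true;
}

// Plain objects stand in for fs.statSync() results here.
const fileA = { dev: 1, ino: 100, size: 2048, mtimeMs: 1478000000000 };
const fileB = { dev: 1, ino: 200, size: 2048, mtimeMs: 1478000000000 };
const otherDrive = { ...fileB, dev: 2 };
const touched = { ...fileB, mtimeMs: 1478000005000 };

console.log(canHardLink(fileA, fileB, { ignoreModTime: false }));      // true
console.log(canHardLink(fileA, otherDrive, { ignoreModTime: false })); // false
console.log(canHardLink(fileA, touched, { ignoreModTime: false }));    // false
console.log(canHardLink(fileA, touched, { ignoreModTime: true }));     // true
```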

If you had to recover from an unforeseen defect in pkglink, the recovery process is to simply delete your project's node_modules directory and perform npm install again.

Installation

npm install -g pkglink

Quick start

To find and hard link matching packages

To hard link packages, just run pkglink with one or more directory trees that you want it to scan and link.

pkglink DIR1 DIR2 ...

You will get output similar to this:

jeffbski-laptop:~$ pkglink ~/projects ~/working

pkgs: 128,383 saved: 5.11GB

The run above indicates that pkglink found 128K packages and saved over 5GB of disk space by linking them. (Actual savings were higher, since I had previously run pkglink on a portion of the tree.)

Dryrun - just output a list of matching packages

If you wish to see which packages pkglink would link, you can use the --dryrun or -d option. pkglink will output the matching packages that it would normally link, but it will NOT perform any linking.

pkglink -d DIR1 DIR2 ...

The --dryrun output looks like:

jeffbski-laptop:~$ pkglink -d ~/working/expect-test

tmatch-2.0.1
  /Users/jeff/projects/pkglink/fixtures/projects/foo1/node_modules/tmatch
  /Users/jeff/working/expect-test/node_modules/tmatch

object.entries-1.0.3
  /Users/jeff/projects/pkglink/fixtures/projects/foo1/node_modules/object.entries
  /Users/jeff/working/expect-test/node_modules/object.entries

object-keys-1.0.11
  /Users/jeff/projects/pkglink/fixtures/projects/foo1/node_modules/object-keys
  /Users/jeff/working/expect-test/node_modules/object-keys

# pkgs: 21 would save: 3.88MB
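The grouping shown in the dry run output can be sketched as follows. This is an illustration of matching by name and version, not pkglink's implementation; `findLinkCandidates` and the `example-only` package are hypothetical, while the tmatch paths are copied from the output above.

```javascript
// Group installed packages by "name-version" and keep only groups
// with more than one copy, since those are the ones that could share space.
function findLinkCandidates(packages) {
  const groups = new Map();
  for (const pkg of packages) {
    const key = `${pkg.name}-${pkg.version}`;
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(pkg.dir);
  }
  return new Map([...groups].filter(([, dirs]) => dirs.length > 1));
}

const found = [
  { name: 'tmatch', version: '2.0.1', dir: '/Users/jeff/projects/pkglink/fixtures/projects/foo1/node_modules/tmatch' },
  { name: 'tmatch', version: '2.0.1', dir: '/Users/jeff/working/expect-test/node_modules/tmatch' },
  // Hypothetical package with a single copy, so it is not a candidate.
  { name: 'example-only', version: '0.0.1', dir: '/Users/jeff/working/expect-test/node_modules/example-only' },
];

const candidates = findLinkCandidates(found);
for (const [key, dirs] of candidates) {
  console.log(key);
  for (const dir of dirs) console.log(`  ${dir}`);
  console.log('');
}
```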

Generate link commands only

If you want to see exactly what it would link down to the file level, you can use the --gen-ln-cmds or -g option, and it will output the equivalent bash commands for the hard links that it would normally create. It will not perform the linking. You can review the output for correctness, or even save it to a file and execute it with bash instead of running pkglink again without the -g option.

pkglink -g DIR1 DIR2 ...

The --gen-ln-cmds output looks like:

jeffbski-laptop:~$ pkglink -g ~/working/expect-test

ln -f "/Users/jeff/projects/pkglink/fixtures/projects/foo1/node_modules/define-properties/index.js" "/Users/jeff/working/expect-test/node_modules/define-properties/index.js"
ln -f "/Users/jeff/projects/pkglink/fixtures/projects/foo1/node_modules/expect/CHANGES.md" "/Users/jeff/working/expect-test/node_modules/expect/CHANGES.md"
ln -f "/Users/jeff/projects/pkglink/fixtures/projects/foo1/node_modules/expect/LICENSE.md" "/Users/jeff/working/expect-test/node_modules/expect/LICENSE.md"
ln -f "/Users/jeff/projects/pkglink/fixtures/projects/foo1/node_modules/es-abstract/Makefile" "/Users/jeff/working/expect-test/node_modules/es-abstract/Makefile"
# pkgs: 21 would save: 3.88MB
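Commands in this shape could be produced with a small helper like the one below. It is only a sketch: `genLnCmd` and `quoteForBash` are hypothetical names, and the escaping of special characters inside double quotes is an assumption added for safety, not something the output above confirms.

```javascript
// Escape characters that stay special inside bash double quotes: " \ $ `
function quoteForBash(p) {
  return `"${p.replace(/(["\\$`])/g, '\\$1')}"`;
}

// Build an `ln -f "src" "dst"` command that replaces dst with a hard link to src.
function genLnCmd(src, dst) {
  return `ln -f ${quoteForBash(src)} ${quoteForBash(dst)}`;
}

const cmd = genLnCmd(
  '/Users/jeff/projects/pkglink/fixtures/projects/foo1/node_modules/expect/LICENSE.md',
  '/Users/jeff/working/expect-test/node_modules/expect/LICENSE.md'
);
console.log(cmd);
```

The -f flag matters here because the destination file already exists and has to be replaced by the link.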

Full Usage

Usage: pkglink {OPTIONS} [dir] [dirN]

Description:

     pkglink - Space saving Node.js package hard linker

     pkglink recursively searches directories for Node.js packages
     installed in node_modules directories. It uses the package name
     and version to match up possible packages to share. Once it finds
     similar packages, pkglink walks through the package directory tree
     checking for files that can be linked. If each file's modified
     datetime and size match, it will create a hard link for that file
     to save disk space. (On win32, mtimes are inconsistent and ignored)

     It keeps track of modules linked in ~/.pkglink_refs to quickly
     locate similar modules on future runs. The refs are always
     double checked before being considered for linking. This makes
     it convenient to perform future pkglink runs on new directories
     without having to reprocess the old.

Standard Options:

 -c, --config CONFIG_PATH

  This option overrides the config file path, default ~/.pkglink

 -d, --dryrun

  Instead of performing the linking, just display the modules that
  would be linked and the amount of disk space that would be saved.

 -g, --gen-ln-cmds

  Instead of performing the linking, just generate link commands
  that the system would perform and output

 -h, --help

  Show this message

 -m, --memory MEMORY_MB

  Run with increased or decreased memory specified in MB, overrides
  environment variable PKGLINK_NODE_OPTIONS and config.memory
  The default memory used is 2560.

 -p, --prune

  Prune the refs file by checking all of the refs clearing out any
  that have changed

 -r, --refs-file REFS_FILE_PATH

  Specify where to load and store the link refs file which is used to
  quickly locate previously linked modules. Default ~/pkglink_refs.json

 -t, --tree-depth N

  Maximum depth to search the directories specified for packages
  Default depth: 0 (unlimited)

 -v, --verbose

  Output additional information helpful for debugging

If your machine has less than 2.5GB of memory, you can use pkglink_low instead of pkglink, and it will run with the normal 1.5GB memory default.
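The memory setting can come from three places, and the precedence described in this document (command line, then PKGLINK_NODE_OPTIONS, then config.memory, then the 2560 default) can be sketched like this. The function names are hypothetical and this is not pkglink's own code.

```javascript
const DEFAULT_MEMORY_MB = 2560;

// Pull the number out of a string such as "--max-old-space-size=1234".
function memoryFromNodeOptions(nodeOptions) {
  const match = /--max-old-space-size=(\d+)/.exec(nodeOptions || '');
  return match ? Number(match[1]) : undefined;
}

// Hypothetical resolver: the first source that yields a number wins.
function resolveMemoryMb({ cliMemory, env = {}, config = {} } = {}) {
  const candidates = [
    cliMemory,                                       // --memory MEMORY_MB
    memoryFromNodeOptions(env.PKGLINK_NODE_OPTIONS), // environment variable
    config.memory,                                   // config file property
    DEFAULT_MEMORY_MB,                               // documented default
  ];
  return candidates.find((value) => Number.isFinite(value));
}

const env = { PKGLINK_NODE_OPTIONS: '--max-old-space-size=1234' };
console.log(resolveMemoryMb({ cliMemory: 1024, env, config: { memory: 2000 } })); // 1024
console.log(resolveMemoryMb({ env, config: { memory: 2000 } }));                  // 1234
console.log(resolveMemoryMb({ config: { memory: 2000 } }));                       // 2000
console.log(resolveMemoryMb());                                                   // 2560
```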

Config

The default config file path is ~/.pkglink unless you override it with the --config command line option. If this file exists, it should be a JSON file containing an object with any of the following properties.

  • refsFile - location of the JSON file used to track the last 5 references to each package it finds, default: ~/.pkglink_refs. This can also be overridden with the --refs-file command line argument.

  • concurrentOps - the number of concurrent operations allowed for IO operations, default: 4

  • consoleWidth - the number of columns in your console, default: 70

  • ignoreModTime - ignore the modification time of the files, default is true on Windows, otherwise false

  • memory - adjust the memory used in MB, default: 2560 (2.5GB). Can also be overridden by setting environment variable PKGLINK_NODE_OPTIONS=--max-old-space-size=1234 or by using the command line argument --memory.

  • minFileSize - the minimum size file to consider for linking in bytes, default: 0

  • refSize - number of package refs to keep in the refsFile which is used to find matching packages on successive runs, default: 5

  • tree-depth - the maximum depth to search the directories for packages, default: 0 (unlimited). Can also be overridden with --tree-depth command line option.
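As an example, the snippet below builds a config object from the documented defaults and prints the JSON you could save as ~/.pkglink. It is a sketch: it covers only a subset of the properties, and it uses an absolute refsFile path on the assumption that a leading ~ might not be expanded inside the JSON value.

```javascript
const os = require('os');
const path = require('path');

// Values mirror the defaults listed above; change only what you need.
const config = {
  refsFile: path.join(os.homedir(), '.pkglink_refs'), // absolute path (assumption)
  concurrentOps: 4,
  consoleWidth: 70,
  memory: 2560,
  minFileSize: 0,
  refSize: 5,
};

// Pretty-printed JSON, ready to be written to the config file.
const json = JSON.stringify(config, null, 2);
console.log(json);
```

Since every property is optional, a config file that sets a single property, such as minFileSize, should be enough to change just that behavior.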

How do I know it is working?

If you check your disk space before and after a run, the space freed should be at least as much as the savings pkglink reports. pkglink reports the file sizes saved, but the actual savings can be greater due to the block size of the disk.
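The block size effect is simple arithmetic: a file occupies whole blocks on disk, so removing a duplicate frees its size rounded up to the next block boundary. The sketch below assumes a 4096-byte block size and made-up file sizes purely for illustration.

```javascript
const BLOCK_SIZE = 4096; // assumed block size in bytes; yours may differ

// Space a file occupies on disk: its size rounded up to whole blocks.
function allocatedBytes(fileSize, blockSize = BLOCK_SIZE) {
  return Math.ceil(fileSize / blockSize) * blockSize;
}

// Hypothetical sizes of duplicate files that were replaced by hard links.
const duplicateFileSizes = [1000, 5000, 4096];

const reportedSavings = duplicateFileSizes.reduce((sum, size) => sum + size, 0);
const actualSavings = duplicateFileSizes.reduce((sum, size) => sum + allocatedBytes(size), 0);

console.log(reportedSavings); // 10096
console.log(actualSavings);   // 16384
```

With many small files, as is typical in node_modules, the gap between the two numbers can be noticeable.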

On systems with bash, you can also use `ls -ali nod