MWoffliner

MWoffliner is a tool for creating a local offline HTML snapshot of any online MediaWiki instance. It scrapes all articles (or a selection, if specified) and creates the corresponding ZIM file. Although it primarily targets Wikimedia projects such as Wikipedia and Wiktionary, MWoffliner also supports any recent MediaWiki instance (version 1.27+). Instances with custom skins or highly unusual configurations may have limitations.

Read CONTRIBUTING.md to learn more about MWoffliner development.

User help is available in the FAQ.

Features

  • Scrape with or without image thumbnails
  • Scrape with or without audio/video multimedia content
  • S3 cache (optional)
  • Image size optimization and WebP conversion
  • Scrape all articles in the selected namespaces, or only the articles in a given title list
  • Specify additional/non-main namespaces to scrape

Run mwoffliner --help to see all available options.

Prerequisites

  • Docker (or Docker-based engine)
  • amd64 architecture

Installation

The recommended way to install and run mwoffliner is using the pre-built Docker container:

docker pull ghcr.io/openzim/mwoffliner
<details> <summary>Run software locally / Build from source</summary>

Prerequisites for local execution

  • *NIX Operating System (GNU/Linux, macOS, etc.)

  • Redis — in-memory data store

  • Node.js version 24 (only this single Node.js version is supported; other versions may or may not work)

  • Libzim — C++ library for creating ZIM files (automatically downloaded on GNU/Linux & macOS)

  • Various build tools which are probably already installed on your machine:

    • libjpeg-dev — JPEG image processing
    • libglu1 — OpenGL utility library
    • autoconf — automatic configuration system
    • automake — Makefile generator
    • gcc — C compiler

    (These packages are for Debian/Ubuntu systems)

You also need an online MediaWiki instance with its API available.

Installation methods

Build your own container

  1. Clone the repository locally:

    git clone https://github.com/openzim/mwoffliner.git && cd mwoffliner
    
  2. Build the image:

    docker build . -f docker/Dockerfile -t ghcr.io/openzim/mwoffliner
    

Run the software locally using NPM

[!WARNING] Local installation requires several system dependencies (see above). Using the Docker image is strongly recommended to avoid setup issues.

Setting up MWoffliner locally for development can be tricky due to several dependencies and version requirements. Follow these steps carefully to avoid common errors.

1. Node.js Version

MWoffliner requires Node.js 24 (other versions may fail).

Compatible Node 24 ranges: >=24 <24.6 or >=24.7 <25.

Check your version:

node -v

If your version does not match, use nvm to install the correct Node.js version.
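
The compatible ranges above can also be checked from a script. The following is a minimal sketch, not part of MWoffliner: `isSupportedNode` is a hypothetical helper that encodes the two ranges stated above (`>=24 <24.6` or `>=24.7 <25`), and it ignores pre-release tags for simplicity.

```javascript
// isSupportedNode is a hypothetical helper, not part of MWoffliner.
// Ranges copied from the text above: >=24 <24.6 or >=24.7 <25.
function isSupportedNode(version) {
  // Accept both "v24.5.0" (output of `node -v`) and "24.5.0" (process.versions.node).
  const match = /^v?(\d+)\.(\d+)\.(\d+)/.exec(version);
  if (!match) return false;
  const major = Number(match[1]);
  const minor = Number(match[2]);
  if (major !== 24) return false;
  // Within Node 24, only the 24.6.x releases fall outside both ranges.
  return minor !== 6;
}

// Report whether the Node.js version running this script is in range.
console.log(process.versions.node, isSupportedNode(process.versions.node));
```

If the check reports `false`, switching versions with nvm as described above is the simplest fix.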

2. libzim Dependency

MWoffliner depends on @openzim/libzim, which requires the C++ libzim library.

  • On Linux/macOS, MWoffliner can download libzim automatically.
  • On Windows, you must install libzim manually because there are no prebuilt binaries. See the libzim installation guide for details.

3. Compiler Requirements (Windows)

Node 24 on Windows officially supports Visual Studio 2019 (v16) or Visual Studio 2022 (v17).

Ensure C++ build tools are installed and environment variables are set correctly. See Windows Setup for node-gyp for detailed instructions.

4. Node-gyp

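MWoffliner uses node-gyp, which enforces strict checks for Node and compiler versions. Make sure you have a compatible Node.js version, a supported compiler (on Windows, Visual Studio 2019 or 2022), and Python 3.x.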

Additional troubleshooting steps if errors persist:
  1. Clear npm cache — a corrupted cache can cause cryptic install failures:

    npm cache clean --force
    
  2. Delete node_modules and reinstall — stale or partially installed dependencies are a common source of errors:

    rm -rf node_modules package-lock.json
    npm install
    
  3. Check that all environment variables are set — especially on Windows, PATH, INCLUDE, and LIB must point to the correct Visual Studio and libzim directories. Reopen your terminal after installing new tools.

  4. Verify Redis is running before starting MWoffliner — MWoffliner will fail immediately if it cannot connect to Redis:

    redis-cli ping   # expected output: PONG
    
  5. Run npm install with verbose logging to see exactly where it fails:

    npm install --verbose
    
5. Common Errors & Troubleshooting

| Error | Cause | Solution |
|-------|-------|----------|
| Node.js version error | Node.js version incompatible | Install Node 24 with nvm |
| Cannot find module @openzim/libzim | libzim not installed | Follow libzim installation guide; Windows users must install manually |
| node-gyp rebuild failed | Wrong Node or compiler version | Check Node.js version, Visual Studio version, Python 3.x |
| zim/archive.h not found | C++ headers missing | Install libzim system-wide, verify include paths |

[!NOTE] Even with these steps, other setup errors may occur. Using Docker is strongly recommended for a smoother experience.

Installation via NPM
npm i -g mwoffliner

[!WARNING] You might need to run this command with the sudo command, depending on how your npm / OS is configured. npm permission checking can be a bit annoying for newcomers. Please read the npm script documentation if you encounter issues.

</details>

Usage

Using Docker (Recommended)

# Get help (the volume path is quoted so directories containing spaces still work)
docker run -v "$(pwd)/out:/out" -ti ghcr.io/openzim/mwoffliner mwoffliner --help
# Create a ZIM for https://bm.wikipedia.org
docker run -v "$(pwd)/out:/out" -ti ghcr.io/openzim/mwoffliner \
       mwoffliner --mwUrl=https://bm.wikipedia.org --adminEmail=foo@bar.net
<details> <summary>Using NPM / Local Install</summary>
# Get help
mwoffliner --help
# Create a ZIM for https://bm.wikipedia.org
mwoffliner --mwUrl=https://bm.wikipedia.org --adminEmail=foo@bar.net
</details>
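
When the commands above are launched from a script rather than typed by hand, it can help to build the flag list from an options object. The sketch below is illustrative only: `toCliArgs` is a hypothetical helper, not part of MWoffliner, and it assumes every option is written as `--name=value` (or as a bare `--name` for boolean switches), as in the examples above.

```javascript
// toCliArgs is a hypothetical helper, not part of MWoffliner.
// Assumption: options are passed as --name=value, boolean switches as a bare --name.
function toCliArgs(parameters) {
  const args = [];
  for (const [name, value] of Object.entries(parameters)) {
    // Skip options that are switched off or not set.
    if (value === false || value === undefined || value === null) continue;
    args.push(value === true ? `--${name}` : `--${name}=${value}`);
  }
  return args;
}

const args = toCliArgs({
  mwUrl: "https://bm.wikipedia.org",
  adminEmail: "foo@bar.net",
  verbose: true,
});
console.log(["mwoffliner", ...args].join(" "));
```

Keeping the arguments as an array means they can be handed to a process launcher without going through shell quoting.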

To use MWoffliner with an S3 cache, provide an S3 URL:

--optimisationCacheUrl="https://wasabisys.com/?bucketName=my-bucket&keyId=my-key-id&secretAccessKey=my-sac"
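
Real secret keys often contain characters such as `+`, `/` or `&`, which would corrupt a hand-written query string. As a minimal sketch, the URL can be assembled with the standard `URL` API so that each value is percent-encoded. `buildCacheUrl` is a hypothetical helper, not part of MWoffliner; the query parameter names are taken from the example above, and the sketch assumes MWoffliner decodes the value as a regular URL query string.

```javascript
// buildCacheUrl is a hypothetical helper, not part of MWoffliner.
// Parameter names (bucketName, keyId, secretAccessKey) come from the example above.
function buildCacheUrl(endpoint, { bucketName, keyId, secretAccessKey }) {
  const url = new URL(endpoint);
  // searchParams.set percent-encodes characters such as "+", "/" and "&".
  url.searchParams.set("bucketName", bucketName);
  url.searchParams.set("keyId", keyId);
  url.searchParams.set("secretAccessKey", secretAccessKey);
  return url.toString();
}

const cacheUrl = buildCacheUrl("https://wasabisys.com/", {
  bucketName: "my-bucket",
  keyId: "my-key-id",
  secretAccessKey: "my-sac",
});
console.log(`--optimisationCacheUrl="${cacheUrl}"`);
```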

Contribute

If you've retrieved the MWoffliner source code (e.g., via a git clone), you can install and run it locally with your modifications:

npm i
npm run mwoffliner -- --help

Detailed contribution documentation and guidelines are available.

API

MWoffliner provides an API and can be used as a Node.js library. Here's a stub example for your index.mjs file:

import * as mwoffliner from 'mwoffliner';

const parameters = {
  mwUrl: "https://es.wikipedia.org",
  adminEmail: "foo@bar.net",
  verbose: true,
  format: "nopic",
  articleList: "./articleList"
};

mwoffliner.execute(parameters); // returns a Promise
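
Because `execute` returns a Promise, a failed scrape surfaces as a rejection that the caller should handle. The wrapper below is a sketch under stated assumptions, not MWoffliner's own API: `runScrape` is a hypothetical name, and it treats `mwUrl` and `adminEmail` as required only because every example in this README supplies them.

```javascript
// runScrape is a hypothetical wrapper, not part of MWoffliner.
// `execute` is expected to be mwoffliner.execute, which returns a Promise.
async function runScrape(execute, parameters) {
  // Assumption: mwUrl and adminEmail are mandatory, as in the examples above.
  for (const required of ["mwUrl", "adminEmail"]) {
    if (!parameters[required]) {
      throw new Error(`Missing required parameter: ${required}`);
    }
  }
  try {
    return await execute(parameters);
  } catch (error) {
    // Add context while keeping the original error available as `cause`.
    throw new Error(`Scrape of ${parameters.mwUrl} failed`, { cause: error });
  }
}
```

In `index.mjs`, this could be called as `await runScrape(mwoffliner.execute, parameters)` inside a `try`/`catch` block that logs the error and sets a non-zero exit code.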

Background

Complementary information about MWoffliner:

  • The MediaWiki software is used by thousands of wikis, the best known being the Wikimedia projects, including Wikipedia.
  • MediaWiki is a wiki engine written in PHP.
  • Wikitext is the markup language that MediaWiki uses.
  • The MediaWiki parser converts Wikitext to HTML, which is then displayed in your browser.
  • Read the scraper functional architecture for more details.

License

GPLv3 or later, see LICENSE for more details.

Acknowledgements

This project received funding through NGI Zero Core, a fund established by NLnet with financial support from the European Commission's Next Generation Internet.
