Mitsuba
Lightweight 4chan board archive software (like Foolfuuka) written in Rust
Mitsuba is a lightweight 4chan board archiver written in Rust. It continuously monitors a set of 4chan boards, fetches new posts, thumbnails, and optionally full images, and makes them available through an imageboard web UI as well as a read-only JSON API that is compatible with 4chan's official API.
Mitsuba's main goal is to be very lightweight in terms of CPU and memory usage, and Rust helps accomplish this goal. Mitsuba is designed to be easy to deploy and currently has no runtime dependencies other than a PostgreSQL database.
The intended usage is self-hosting an archive on a low budget. However, the Actix-based web UI and API are quite performant and should scale to a large number of readers, with much lower resource consumption than comparable frameworks in other languages and without the latency spikes that garbage collection can cause.
Mitsuba does not support "ghost posting" because it is not an imageboard engine. This could be supported in the future with some work (mostly on the front end), but it would require actual administration tools and an account system, neither of which is present. The few options Mitsuba has are set through the CLI and environment variables (see example.env).
Features
- Very quick and easy to set up
- No runtime dependencies except a running PostgreSQL database
- Single static executable, all assets and dependencies embedded
- Extremely lightweight, can run on a budget VPS
- Fully integrated: Mitsuba archives boards, threads, and images, serves them through a JSON API and Web UI all in one
- Easy administration with a few CLI commands
- Configurable rate limiter
- Optional full image download setting per-board
- Web UI has a field that lets you jump to any post by typing its ID and selecting the board
- SHA-256 image deduplication, doesn't rely on 4chan's MD5 hash
- Support for S3-compatible image storage backend
- Reduced database writes: the hash of every post is kept in memory; if a post hasn't changed, no DB operation is performed
- Can find an image from its original 4chan URL: https://i.4cdn.org/po/1546293948883.png can be found on Mitsuba at /po/1546293948883.png
- Can be configured to load balance requests to 4chan between multiple proxies with different weights, to bypass rate limits
- Optional full-text search through PostgreSQL. You can enable or disable full-text search indexing on a per-board basis to avoid the performance hit.
- Supports basic but granular moderation through hide command, allowing you to entirely hide a post, only hide its comment field, or hide its image.
- Can delete an image associated with a post from disk and blacklist it through purge-image, so it will never be saved again
- Can remove all archive contents belonging to a particular board if you no longer want it (purge command), or just the full images, keeping thumbnails and posts
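As an illustration of the weighted proxy load balancing listed above, here is a minimal sketch of smooth weighted round-robin selection. This is not taken from Mitsuba's source: the `Proxy` type, the `next_proxy` function, and the example proxy URLs are all hypothetical, and Mitsuba may well use a different selection strategy (for example, random weighted choice).

```rust
/// A proxy endpoint with a relative weight.
/// Hypothetical type for illustration; not Mitsuba's actual data model.
#[derive(Debug, Clone)]
pub struct Proxy {
    pub url: String,
    pub weight: i64,
    // Running score used by the selection algorithm.
    current: i64,
}

impl Proxy {
    pub fn new(url: &str, weight: i64) -> Self {
        Proxy { url: url.to_string(), weight, current: 0 }
    }
}

/// Picks the next proxy using smooth weighted round-robin:
/// every call raises each proxy's score by its weight, selects the
/// highest score, then lowers the winner's score by the total weight.
/// Over time, each proxy is chosen in proportion to its weight.
pub fn next_proxy(proxies: &mut [Proxy]) -> Option<&str> {
    let total: i64 = proxies.iter().map(|p| p.weight).sum();
    if proxies.is_empty() || total <= 0 {
        return None;
    }
    let mut best = 0;
    for i in 0..proxies.len() {
        proxies[i].current += proxies[i].weight;
        if proxies[i].current > proxies[best].current {
            best = i;
        }
    }
    proxies[best].current -= total;
    Some(proxies[best].url.as_str())
}
```

With two proxies weighted 3 and 1, repeated calls to `next_proxy` send three out of every four requests through the first proxy, interleaved rather than in bursts.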
There are some important features missing:
- No "ghost posting" or posting of any kind. Read only archive.
- No admin UI, administration CLI only
- No account system
Dependencies
You need a PostgreSQL instance that Mitsuba can reach, with its connection URI provided in the DATABASE_URL environment variable.
If you get an error about the server not accepting any more connections on startup, you might need to increase your database's max_connections configuration.
Quick Setup
export DATABASE_URL="postgres://user:password@127.0.0.1/mitsuba"
export RUST_LOG=mitsuba=info # Optional, to get feedback
mitsuba add po
mitsuba start
After some threads have been archived, you can visit http://127.0.0.1:8080/po/1 to see your new archive for the /po/ board.
This will only get posts and thumbnails but not full images.
Use mitsuba add po --full-images true to change that. To also enable full-text search, run:
mitsuba add po --full-images true --full-text-search true
Mitsuba will not attempt to fetch images for a post it has already archived, unless it visits the post again and detects that it has changed in some way. Moreover, once an image or thumbnail has been fetched for a particular post, Mitsuba will never attempt to fetch that same file for that post again.
Mitsuba does not check whether images are still present on disk or in object storage and trusts the database instead. During the archival process, image files/objects are only written, never read.
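The change detection described here (and the in-memory post hashes mentioned in the feature list) can be pictured roughly as follows. This is only a sketch under stated assumptions: the `Post` fields, the `SeenPosts` type, and the use of the standard library's `DefaultHasher` are illustrative choices, not Mitsuba's actual implementation.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Simplified post record; the fields are illustrative only.
#[derive(Hash)]
pub struct Post {
    pub no: u64,
    pub comment: String,
    pub filename: Option<String>,
}

/// Remembers the last seen hash of every post, keyed by (board, post number).
#[derive(Default)]
pub struct SeenPosts {
    hashes: HashMap<(String, u64), u64>,
}

impl SeenPosts {
    /// Returns true if the post is new or differs from the version seen
    /// last time, i.e. when a database write would be needed.
    /// Always records the latest hash.
    pub fn needs_write(&mut self, board: &str, post: &Post) -> bool {
        let mut hasher = DefaultHasher::new();
        post.hash(&mut hasher);
        let digest = hasher.finish();
        let previous = self.hashes.insert((board.to_string(), post.no), digest);
        previous != Some(digest)
    }
}
```

An archiver loop would call `needs_write` for every post in a fetched thread and skip the database entirely when it returns false, which is where the reduction in writes comes from.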
However, if you enable full images on a board you were already archiving with thumbnails only, this will trigger the creation of an image fetch job for every post on that board which has not yet been deleted from 4chan. Eventually, all available full images should get fetched. There is a race condition where if a specific post was being processed right when you enabled full images on the board, that post would never end up having its full image downloaded. This rare edge case can be prevented by having full images on from the start when you first add a board, or by shutting mitsuba down before enabling full images on an existing board.
mitsuba add BOARD and mitsuba remove BOARD are safe to use while Mitsuba is running. In that case, however, a newly added board is not picked up until the current board archive cycle has completed. Enabling full images on an existing board takes effect immediately.
Setup Guide
Mitsuba is designed to be easy and quick to set up.
Download a binary build for your system, or clone the repository and build your executable. Docker containers are available and there is a docker-compose.yml file.
Currently all static files mitsuba uses are embedded in the executable. You should be able to run Mitsuba's binary in an empty folder.
Some options need to be passed as environment variables. Mitsuba uses dotenv, which means that instead of setting the environment variables yourself, you can specify their values in a file called .env which must be in the directory you are running the mitsuba executable in. Mitsuba will read this file and apply the values specified to the corresponding environment variables.
You will find an example.env file in this repository. Copy it and rename it to just .env, then edit its configuration as needed.
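To make the .env mechanism more concrete, here is a minimal sketch of how a file of KEY=VALUE lines can be turned into settings. It is a simplified stand-in written for this guide, not the dotenv crate's actual parser: `parse_env_file` is a hypothetical helper, and it does not handle inline comments, escapes, or multi-line values.

```rust
use std::collections::HashMap;

/// Parses `KEY=VALUE` lines, skipping blank lines and `#` comment lines.
/// The line is split at the first `=`, so values may contain `=` themselves.
/// Surrounding double quotes around a value are removed.
pub fn parse_env_file(contents: &str) -> HashMap<String, String> {
    let mut vars = HashMap::new();
    for line in contents.lines() {
        let line = line.trim();
        if line.is_empty() || line.starts_with('#') {
            continue;
        }
        if let Some((key, value)) = line.split_once('=') {
            let value = value.trim().trim_matches('"');
            vars.insert(key.trim().to_string(), value.to_string());
        }
    }
    vars
}
```

Feeding it the contents of a file with a `DATABASE_URL=...` line and a `RUST_LOG=mitsuba=info` line yields a map with those two keys, where the second value is kept whole as `mitsuba=info`.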
There are a couple of settings that you need to be aware of:
- DATABASE_URL: you need to specify the connection URI for your PostgreSQL instance here. In the example file it's postgres://user:password@127.0.0.1/mitsuba. Replace with the correct username and password, as well as address, port and database name. If the specified database (in this case called mitsuba) already exists, the user needs to be its owner; otherwise you need to create it yourself.
- DATA_ROOT: the directory in which to save the image files. This is an optional setting. If you leave it out entirely, Mitsuba will just create a "data" folder in the current working directory, and use that.
- RUST_LOG: the recommended setting for this is "mitsuba=info". Controls Mitsuba's log output. Instead of setting this, you can create a log4rs.yml file (or copy the example in this repository) in order to get more fine-grained control over logging.
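The required and optional settings above can be summarized in a small sketch. The `Config` struct and `load_config` function are hypothetical names used only for illustration, and the error message is made up; the only behavior taken from this guide is that DATABASE_URL must be provided and that DATA_ROOT falls back to a "data" folder relative to the current working directory.

```rust
use std::path::PathBuf;

/// Hypothetical settings holder; not Mitsuba's actual configuration type.
#[derive(Debug, PartialEq)]
pub struct Config {
    pub database_url: String,
    pub data_root: PathBuf,
}

/// Builds a `Config` from any lookup function that maps a variable name
/// to an optional value (for example the process environment).
pub fn load_config<F>(lookup: F) -> Result<Config, String>
where
    F: Fn(&str) -> Option<String>,
{
    // DATABASE_URL is mandatory.
    let database_url = lookup("DATABASE_URL")
        .ok_or_else(|| "DATABASE_URL must be set".to_string())?;
    // DATA_ROOT is optional; the relative path "data" resolves against
    // the current working directory.
    let data_root = lookup("DATA_ROOT")
        .map(PathBuf::from)
        .unwrap_or_else(|| PathBuf::from("data"));
    Ok(Config { database_url, data_root })
}
```

To read from the real environment, the lookup could be `|name| std::env::var(name).ok()`. Passing the lookup in as a closure keeps the function easy to exercise without touching process-wide state.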
We will refer to the executable as mitsuba in this guide from now on, but on Windows® it is of course called mitsuba.exe.
Run mitsuba help to get a quick usage guide with a list of possible commands:
$ mitsuba help
mitsuba 1.0
High performance board archiver software. Add boards with `add`, then start with `start` command.
See `help add` and `help start`

USAGE:
    mitsuba.exe <SUBCOMMAND>

FLAGS:
    -h, --help       Prints help information
    -V, --version    Prints version information

SUBCOMMANDS:
    add                Add a board to the archiver, or replace its settings.
    help               Prints this message or the help of the given subcommand(s)
    list               List all boards in the database and their current settings. Includes
                       disabled ('removed') boards
    remove             Stop and disable archiver for a particular board. Does not delete any
                       data. Archiver will only stop after completing the current cycle.
    start              Start the archiver, API, and webserver
    start-read-only    Start in read only mode. Archivers will not run, only API and webserver
You can use mitsuba help COMMAND in order to get more detailed information on each command and the possible options:
$ mitsuba help add
mitsuba-add
Add a board to the archiver, or replace its settings.

USAGE:
    mitsuba add [OPTIONS] <NAME>

ARGS:
    <NAME>
            Board name (eg. 'po')

OPTIONS:
        --full-images <FULL_IMAGES>
            (Optional) If false, will only download thumbnails for this board. If true, thumbnails
            and full images. Default is false.

    -h, --help
            Print help information
This command will add a database entry for the specified board and its settings. An archiver will start for this board and any other boards you added with add as soon as the next archive cycle begins or mitsuba is (re)started.
As you can see, there is only one option in terms of board-specific settings.
full-images=true will make the archiver download full images (and files) for that board. The default is false, meaning only thumbnails will be downloaded.
Note that any time you use add on a board that was already added before, it enables that board if it was disabled with remove, and replaces the configuration for that board with the values you specify, or the defaults. The previous settings are ignored. So if you had full image download enabled on /po/ previously with a wait time of 100, and then do mitsuba add po, the settings will be reset to the default of no full image download, and wait time of 10.
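The replace-not-merge behavior of add can be sketched like this. Everything here is a hypothetical model for illustration: the `BoardSettings` fields, the in-memory map standing in for the database table, and the `add_board`/`remove_board` helpers are assumptions, with only the defaults (no full images, wait time of 10) taken from the text above.

```rust
use std::collections::HashMap;

/// Hypothetical per-board settings row.
#[derive(Debug, Clone, PartialEq)]
pub struct BoardSettings {
    pub enabled: bool,
    pub full_images: bool,
    pub wait_time: u64,
}

impl Default for BoardSettings {
    fn default() -> Self {
        BoardSettings { enabled: true, full_images: false, wait_time: 10 }
    }
}

/// Models `add`: stores exactly the options given, falling back to the
/// defaults for anything left unspecified. Any previous settings for the
/// board are discarded, and the board is (re-)enabled.
pub fn add_board(
    boards: &mut HashMap<String, BoardSettings>,
    name: &str,
    full_images: Option<bool>,
    wait_time: Option<u64>,
) {
    let defaults = BoardSettings::default();
    let settings = BoardSettings {
        enabled: true,
        full_images: full_images.unwrap_or(defaults.full_images),
        wait_time: wait_time.unwrap_or(defaults.wait_time),
    };
    boards.insert(name.to_string(), settings);
}

/// Models `remove`: disables the board but keeps its entry.
pub fn remove_board(boards: &mut HashMap<String, BoardSettings>, name: &str) {
    if let Some(board) = boards.get_mut(name) {
        board.enabled = false;
    }
}
```

In this model, adding po with full images and a wait time of 100, then calling `add_board` again with no options, leaves po enabled with full images off and a wait time of 10, mirroring the reset described above.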
So let's add our first board, /po/ is a good example because it's the slowes
