
wZD

wZD is a powerful storage and database server designed for big data storage systems that mix small and large files. It dramatically reduces the count of small files, extending the capabilities of any normal or clustered POSIX compatible file system.

Install / Use

/learn @eltaline/Wzd

README

<img src="/images/logo.png" alt="wZD Logo"/>

Documentation in Russian: https://github.com/eltaline/wzd/blob/master/README-RUS.md

wZD is a server written in Go that uses a <a href=https://github.com/eltaline/bolt>modified</a> version of the BoltDB database as a backend for storing and serving any number of small and large files, as well as NoSQL keys/values, in a compact form inside micro Bolt databases (archives). Files and values are distributed among BoltDB databases according to the number of directories or subdirectories and the overall directory structure. Using wZD can permanently solve the problem of a large number of files on any POSIX compatible file system, including clustered ones. Outwardly it works like a regular WebDAV server.

...and billions of files will no longer be a problem.

<img align="center" src="/images/wzd-scheme.png" alt="wZD Scheme"/>

Architecture

<img align="center" src="/images/wzd-arch.png" alt="wZD Arch"/>

Current stable version: 1.2.1

  • Update to Go 1.14
  • Update to Iris 12.1.8
  • Transition to Go module support

Features

  • Multi threading
  • Multi servers for fault tolerance and load balancing
  • Complete file and value search
  • Supports HTTPS and IP authorization
  • Supported HTTP methods: GET, HEAD, OPTIONS, PUT, POST and DELETE
  • Manage read and write behavior through client headers
  • Support for customizable virtual hosts
  • Linear scaling of read and write using clustered file systems
  • Effective methods of reading and writing data
  • Supports CRC data integrity when writing or reading
  • Support for Range and Accept-Ranges, If-None-Match and If-Modified-Since headers
  • Store and share 10,000 times more files than there are inodes on any POSIX compatible file system, depending on the directory structure
  • Support for adding, updating, deleting files and values, and delayed compaction/defragmentation of Bolt archives
  • Allows the server to be used as a NoSQL database, with easy sharding based on the directory structure
  • Bolt archives support for selective reading of a certain number of bytes from a value
  • Easy sharding of data over thousands or millions of Bolt archives based on the directory structure
  • Mixed mode support, with ability to save large files separately from Bolt archives
  • Semi-dynamic buffers for minimal memory consumption and optimal network performance tuning
  • Includes multi threaded <a href=https://github.com/eltaline/wza>wZA</a> archiver for migrating files without stopping the service

Incompatibilities

  • Multipart is not supported
  • There is no native protocol and drivers for different programming languages
  • There is no way to transparently mount the structure as a file system via WebDAV or FUSE
  • For security reasons, the server does not support recursive deletion of directories
  • The server does not allow uploading files to the root directory of the virtual host (applies only to Bolt archives)
  • Directories and subdirectories of virtual hosts must not contain third-party files with the .bolt extension
  • Data disks cannot simply be transferred from the Little Endian system to the Big Endian system, or vice versa

Multipart will not be supported: a strict record of the exact amount of data being written is required, otherwise partially written files and other problems would arise.

Use only a binary data transfer protocol to write files or values.
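A binary upload can be sketched as a plain PUT request whose body is the raw file content, mirroring `curl --data-binary`. This is a minimal sketch, not the project's own client: the host and path are hypothetical examples, and the only point is that no multipart encoding is involved.

```go
package main

import (
	"bytes"
	"fmt"
	"net/http"
)

// uploadBinary builds a PUT request carrying the raw bytes of a file,
// as wZD expects. No multipart boundary or form encoding is used;
// the body is exactly the data being stored.
func uploadBinary(url string, data []byte) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodPut, url, bytes.NewReader(data))
	if err != nil {
		return nil, err
	}
	// Content-Length is derived from the reader, giving the server
	// the strict byte count a binary transfer requires.
	req.Header.Set("Content-Type", "application/octet-stream")
	return req, nil
}

func main() {
	// Hypothetical example host and path.
	req, err := uploadBinary("http://localhost/test/test.jpg", []byte("example bytes"))
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.Path, req.ContentLength)
}
```

Sending the request with `http.DefaultClient.Do(req)` then streams the body to the server in one piece.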

Requirements

  • Operating Systems: Linux, BSD, Solaris, OSX
  • Architectures: amd64, arm64, ppc64 and mips64, with only amd64 tested
  • Supported Byte Order: Little or Big Endian
  • Any POSIX compatible file system with full locking support (preferred clustered MooseFS)

Recommendations

  • It is recommended to upload large files directly to the wZD server, bypassing reverse proxy servers

Real application

The cluster we use holds about 250,000,000 small pictures and 15,000,000 directories on separate SATA drives, running the MooseFS cluster file system. MooseFS handles this many files well, but its Master servers consume 75 gigabytes of RAM, and the frequent dumps of a large amount of metadata wear out SSD disks. There is also a limit of about 1 billion files in MooseFS itself with one replica of each file.

With a fragmented directory structure, most directories hold between 10 and 1000 files. After installing wZD and packing the files into Bolt archives, the file count dropped about 25-fold, to roughly 10,000,000 files. With proper planning of the structure an even smaller number could have been achieved, but not without changing the already existing structure. The packing yields very large inode savings, low memory consumption on the cluster FS, significant acceleration of MooseFS itself, and a reduction in the actual space occupied on the MooseFS cluster FS: MooseFS always allocates a 64 KB block for each file, so even a 3 KB file still occupies 64 KB.
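The block-rounding effect above can be made concrete with a little arithmetic. The sketch below uses the 64 KB MooseFS block size from the text; the 3 KB file size and the 1000-file count are illustrative, not measured values.

```go
package main

import "fmt"

// MooseFS allocates space in fixed 64 KB blocks, so every file
// occupies a whole number of blocks on disk. Packing many small
// files into a single Bolt archive saves both inodes and the
// per-file block rounding.
const blockSize int64 = 64 * 1024

// occupied rounds a file size up to whole 64 KB blocks.
func occupied(size int64) int64 {
	return (size + blockSize - 1) / blockSize * blockSize
}

func main() {
	// A 3 KB file still consumes a full 64 KB block on MooseFS.
	fmt.Println(occupied(3 * 1024)) // 65536

	// 1000 such files stored loosely take 1000 full blocks; packed
	// into one ~3 MB Bolt archive they only pay the rounding once.
	loose := int64(1000) * occupied(3*1024)
	packed := occupied(1000 * 3 * 1024)
	fmt.Println(loose, packed)
}
```

With these illustrative sizes, the loose layout occupies over 20 times more space than the packed archive, before even counting the saved inodes.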

The multi threaded <a href=https://github.com/eltaline/wza>wZA</a> archiver has already been tested on real data.

The cluster in use (10 servers) acts as an origin behind a CDN network and is served by only 2 wZD servers.

<p align="center"> <img align="center" src="/images/reduction-full.png"/> </p>

Mixed use

The wZD server was designed for mixed use. It can store not only ordinary files but also generated HTML or JSON documents, and it can even serve as a sharded NoSQL database consisting of a large number of small BoltDB databases, with all sharding carried out through the structure of directories and subdirectories.
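Directory-based sharding can be sketched as a simple mapping from a key to a path. The two-level `/aa/bb/` layout below is a hypothetical convention chosen for illustration, not something wZD mandates: wZD simply packs whatever lands in a directory into that directory's Bolt archive, so any deterministic key-to-directory scheme works.

```go
package main

import (
	"crypto/sha1"
	"encoding/hex"
	"fmt"
)

// shardPath maps a NoSQL key to a URL under a virtual host, spreading
// keys across up to 256*256 directories (and hence Bolt archives).
// Keys are assumed to be URL-safe; real keys may need escaping.
func shardPath(host, key string) string {
	h := sha1.Sum([]byte(key))
	hx := hex.EncodeToString(h[:])
	// First two hex bytes of the hash pick the directory pair.
	return fmt.Sprintf("http://%s/%s/%s/%s", host, hx[:2], hx[2:4], key)
}

func main() {
	// The same key always lands in the same archive directory.
	fmt.Println(shardPath("localhost", "user:1001"))
	fmt.Println(shardPath("localhost", "user:1002"))
}
```

A client would then PUT the value to this URL and GET it back the same way, while wZD keeps each directory's keys inside one compact Bolt archive.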

Performance tests

Testing shows the read and write difference between working with regular files and with Bolt archives. The writeintegrity and readintegrity options are enabled, meaning CRC checks are applied when writing and reading files in Bolt archives.

Important: the times in the tests are for full GET or PUT requests; the client's complete HTTP write or read of each file is included in these milliseconds.

Tests were carried out on SSD disks, since on SATA disks the tests are not very objective, and there is no clear difference between working with Bolt archives and ordinary files.

The test involved 32 KB, 256 KB, 1024 KB, 4096 KB, and 32768 KB files.

  • <b>GET 1000 files and GET 1000 files from 1000 Bolt archives</b>
<img align="center" src="/images/get.png"/>
  • <b>PUT 1000 files and PUT 1000 files in 1000 Bolt archives</b>
<img align="center" src="/images/put.png"/>

As can be seen from the graphs, the difference is practically insignificant.

Below is a more visual test done with files of 32 megabytes in size. In this case, writing to Bolt archives becomes slower compared to writing to regular files, though writing 32 MB in 250 ms is still quite fast. Reading such files is also quick, so if one wants to store large files in Bolt archives and write speed is not critical, such use is allowed but not recommended, and no more than 32 MB per uploaded file.

<b>GET 32M 1000 files and files from Bolt archives and PUT 32M 1000 files and files in Bolt archives</b>

<img align="center" src="/images/get-put-32M.png"/>

Documentation

Installation

Install packages or binaries

  • <a href=https://github.com/eltaline/wzd/releases>Download</a>
systemctl enable wzd && systemctl start wzd

Install docker image

  • The Docker image automatically and recursively changes the UID and GID in the mounted /var/storage
docker run -d --restart=always -e bindaddr=127.0.0.1:9699 -e host=localhost -e root=/var/storage \
-v /var/storage:/var/storage --name wzd -p 9699:9699 eltaline/wzd

More advanced option:

docker run -d --restart=always -e bindaddr=127.0.0.1:9699 -e host=localhost -e root=/var/storage \
-e upload=true -e delete=true -e compaction=true -e search=true -e fmaxsize=1048576 \
-e writeintegrity=true -e readintegrity=true -e args=false \
-e getbolt=false -e getkeys=true -e getinfo=true -e getsearch=true \
-e getrecursive=true -e getjoin=true -e getvalue=true -e getcount=true -e getcache=true \
-e nonunique=false -e cctrl=2592000 -e delbolt=false -e deldir=false \
-v /var/storage:/var/storage --name wzd -p 9699:9699 eltaline/wzd

All ENV default parameters can be viewed here: <a href=/Dockerfile>Dockerfile</a>

  • Enable rotation on the host system for containers:

Put in /etc/logrotate.d/wzd:

/var/lib/docker/containers/*/*.log {
        rotate 7
        daily
        compress
        missingok
        delaycompress
        copytruncate
}

Configuring and using wZD server

For security reasons, if wZD is installed from deb or rpm packages, or from binaries, upload and delete options are disabled by default in the configuration file /etc/wzd/wzd.conf in the localhost virtual host.

In most cases it is enough to use the default configuration file. A full description of all product parameters is available here: <a href="/OPTIONS.md">Options</a>

General methods

Downloading a file (an existing regular file is downloaded in preference to the one in the Bolt archive)

curl -o test.jpg http://localhost/test/test.jpg

Downloading a file from the regular file (forced)

curl -o test.jpg -H "FromFile: 1" http://localhost/test/test.jpg

Downloading file from the Bolt archive (forced)

curl -o test.jpg -H "FromArchive: 1" http://localhost/test/test.jpg

Downloading the whole Bolt archive from the directory (if the server parameter getbolt = true)

curl -o test.bolt http://localhost/test/test.bolt

Uploading a file to the directory (stored in a Bolt archive or as a regular file depending on the fmaxsize parameter)

curl -X PUT --data-binary @test.jpg http://localhost/test/test.jpg

Uploading a file as a regular file (forced)

curl -X PUT -H "File: 1" --data-binary @test.jpg http://localhost/test/test.jpg
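The header-driven reads above can also be expressed as a small Go helper. This is a sketch under the same assumptions as the curl examples: the host and path are hypothetical, and only the FromFile/FromArchive headers documented above are used.

```go
package main

import (
	"fmt"
	"net/http"
)

// newGetRequest builds a GET request for a wZD file, optionally
// forcing the source: "file" reads the regular file, "archive"
// reads from the Bolt archive, and "" lets the server decide
// (the regular file wins when both exist).
func newGetRequest(url, source string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	switch source {
	case "file":
		req.Header.Set("FromFile", "1") // force the regular file
	case "archive":
		req.Header.Set("FromArchive", "1") // force the Bolt archive
	}
	return req, nil
}

func main() {
	// Hypothetical example host and path.
	req, err := newGetRequest("http://localhost/test/test.jpg", "archive")
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.Header.Get("FromArchive"))
}
```

Executing the request with any `http.Client` then behaves exactly like the corresponding curl command.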
