Fsdb
A file system data base. Provides a thread-safe, process-safe Database class. Each entry is a separate file referenced by its relative path. Pure Ruby and very lightweight.
What is FSDB?
FSDB is a file system data base. FSDB provides a thread-safe, process-safe Database class which uses the native file system as its back end and allows multiple file formats and serialization methods. Users access objects in terms of their paths relative to the base directory of the database. It's very lightweight: the per-process state of a Database, excluding cached data, is essentially just a path string, and the code is small (under 1K lines, all Ruby).
FSDB stores data at nodes in the file system. The format can vary depending on type. For example, the default file type can be read into your program as a string, but files with the .obj suffix could be read using marshal, and files with the .yaml suffix as yaml. FSDB can easily be extended to recognize other formats, both binary and text. FSDB treats directories as collections and provides directory iterator methods. Files are the atoms of transactions: each file is saved and restored as a whole. References between objects stored in different files can be persisted as path strings.
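As a rough illustration of suffix-based formats, the sketch below maps file name patterns to load and dump procedures using only the Ruby standard library. The `FORMATS` table and the `format_for`, `store_object`, and `fetch_object` helpers are hypothetical names invented for this example; they are not FSDB's own classes or API.

```ruby
require 'yaml'
require 'tmpdir'

# Hypothetical format table (not FSDB's own classes). Each entry pairs a
# pattern, which only needs to respond to ===, with a loader and a dumper.
# The first matching entry wins, so the catch-all default goes last.
FORMATS = [
  { pattern: /\.obj\z/,
    load: ->(s) { Marshal.load(s) },
    dump: ->(o) { Marshal.dump(o) } },
  { pattern: /\.ya?ml\z/,
    load: ->(s) { YAML.safe_load(s.force_encoding(Encoding::UTF_8)) },
    dump: ->(o) { YAML.dump(o) } },
  # A Proc responds to === by calling itself, so this entry matches any path.
  { pattern: ->(_path) { true },
    load: ->(s) { s.force_encoding(Encoding::UTF_8) },
    dump: ->(o) { o.to_s } }
].freeze

# Find the first format whose pattern matches the path.
def format_for(path)
  FORMATS.find { |format| format[:pattern] === path }
end

# Serialize the object with the format chosen by its file name.
def store_object(dir, path, object)
  File.binwrite(File.join(dir, path), format_for(path)[:dump].call(object))
end

# Read the file back and de-serialize it with the matching format.
def fetch_object(dir, path)
  format_for(path)[:load].call(File.binread(File.join(dir, path)))
end

FORMAT_DEMO = Dir.mktmpdir do |dir|
  store_object(dir, 'movies.obj', ['Harry Potter 7', 'The Muppets'])
  store_object(dir, 'movie.yaml', { 'title' => 'The Muppets', 'seen' => true })
  store_object(dir, 'notes.txt', 'plain text')
  {
    obj:  fetch_object(dir, 'movies.obj'),
    yaml: fetch_object(dir, 'movie.yaml'),
    text: fetch_object(dir, 'notes.txt')
  }
end
```

Because the lookup relies only on `===`, a pattern can be a regex, a Proc, or any other object that defines that method.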
FSDB has been tested on a variety of platforms and Ruby versions, and is not known to have any problems. (On Windows ME/98/95, access to a database by multiple processes is not safe, because flock() is not available on the platform.) See the Testing section for details.
FSDB does not yet have any indexing or querying mechanisms, and is probably missing many other useful database features, so it is not a general replacement for RDBs or OODBs. However, if you are looking for a lightweight, concurrent object store with reasonable performance and better granularity than PStore, in pure Ruby, with a Ruby license, take a look at FSDB. Also, if you are looking for an easy way of making an existing file tree look like a database, especially if it has heterogeneous file formats, FSDB might be useful.
Installation
To install FSDB as a gem:
$ gem install fsdb
Synopsis
Basic usage:
    require 'fsdb'

    db = FSDB::Database.new('/tmp/my-data')

    db['recent-movies/myself'] = ["The King's Speech", "Harry Potter 7"]
    puts db['recent-movies/myself'][1] # ==> "Harry Potter 7"

    db.edit 'recent-movies/myself' do |movies|
      movies << "The Muppets"
    end
See also the examples.
Path names
Keys in the database are path strings, which are simply strings in the usual forward-slash delimited format, relative to the database's directory. There are some points to be aware of when using them to refer to database objects.
- Paths to directories are formed in one of two ways:
  - explicitly, with a trailing slash, as in `db['foo/']`
  - implicitly, as in `db['foo']` if `foo` is already a directory, or as in `db['foo/bar']`, which creates `foo` if it did not already exist.
- The root dir of the database is simply `/`; its child directories are of the form `foo/` and so on. The leading and trailing slashes are both optional.
- Objects can be stored in various formats, indicated by path name. A typical mapping might be:

  file name | de-serialized data type
  --------- | -----------------------
  `foo.obj` | Marshalled data
  `foo.txt` | String
  `foo/` | Directory (the contents are presented to the caller as a list of file and subdirectory paths that can be used in browse, edit, etc.)
  `foo.yml` | YAML data (see examples/yaml.rb)

  New formats, which correlate a filename pattern with serialization behavior, can be defined and plugged in to databases. Each format has its own rules for matching patterns in the file name and recognizing the file. Patterns can be anything with a `#===` method (such as a regex). See lib/fsdb/formats.rb for examples of defining formats. For examples of associating formats with patterns, see examples/formats.rb.
- Different notations for the same path, such as

      /foo/bar
      foo/bar
      foo//bar
      foo/../foo/bar

  work correctly (they access the same objects), as do paths that denote hard or soft links, if supported on the platform.

  Links are subject to the same naming convention as normal files with regard to format identification: format is determined by the path within the database used to access the object. Using a different name for a link can be useful if you need to access the file using two different formats (e.g., plain text via `foo.txt` and tabular CSV or TSV data via `foo.table` or whatever).
- Accessing objects in a database is unaffected by the current dir of your process. The database knows its own absolute path, and path arguments to the Database API are interpreted relative to that. If you want to work with a subdirectory of the database, and paths relative to that, use `Database#subdb`:

      db = Database.new('/tmp')
      db['foo/bar'] = 1
      foo = db.subdb('foo')
      foo['bar'] # ==> 1

- Paths that are outside the database (`../../zap`) are allowed, but may or may not be desirable. Use `#valid?` and `validate` in util.rb to check for them.
- Directories are created when needed. So `db['a/b/c'] = 1` creates two dirs and one file.
- Files beginning with `..` are ignored by FSDB dir iterators, though they can still be accessed in transaction operators. Some such files (`..fsdb.meta.<filename>`) are used internally. All others not beginning with `..fsdb` are reserved for applications to use.

  The `..fsdb.meta.<filename>` file holds a version number for `<filename>`, which is used along with mtime to check for changes (mtime usually has a precision of only 1 second). In the future, the file may also be used to hold other metadata. (The meta file is only created when a file is written to and does not need to be created in advance when using existing files as an FSDB.)
- util.rb has directory iterators, path globbing, and other useful tools.
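To make the points about equivalent notations and outside paths concrete, here is a minimal sketch using only `File.expand_path` from the standard library. The `resolve_db_path` helper and the `DB_ROOT` value are hypothetical and are not part of FSDB, whose own checks live in util.rb. Note that `File.expand_path` does not follow links, so this sketch says nothing about how hard or soft links are treated.

```ruby
# Hypothetical helper, not part of FSDB: resolve a database-relative path
# against the database root and report whether the result stays inside it.
def resolve_db_path(db_root, rel_path)
  root = File.expand_path(db_root)
  # Drop leading slashes so that "/foo/bar" is read relative to the root.
  abs = File.expand_path(rel_path.sub(%r{\A/+}, ''), root)
  inside = abs == root || abs.start_with?(root + File::SEPARATOR)
  [abs, inside]
end

# Assumed database directory; it does not need to exist for this sketch.
DB_ROOT = '/tmp/my-data'

# Different notations for the same path collapse to one absolute path.
SAME = ['/foo/bar', 'foo/bar', 'foo//bar', 'foo/../foo/bar'].map do |path|
  resolve_db_path(DB_ROOT, path).first
end.uniq

# A path that climbs out of the database is flagged as outside.
OUTSIDE = resolve_db_path(DB_ROOT, '../../zap')
```

Comparing against the root plus a separator, rather than the bare root string, avoids treating a sibling such as `/tmp/my-data-old` as being inside the database.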
Transactions
FSDB transactions are thread-safe and process-safe. They can be nested for larger-grained transactions; it is the user's responsibility to avoid deadlock.
FSDB is ACID (atomic/consistent/isolated/durable) to the extent that the underlying file system is. For instance, when an object that has been modified in a transaction is written to the file system, nothing persistent is changed until the final system call to write the data to the OS's buffers. If there is an interruption (e.g., a power failure) while the OS flushes those buffers to disk, data will not be consistent. If this bothers you, you may want to use a journaling file system. FSDB does not need to do its own journaling because of the availability of good journaling file systems.
There are two kinds of transactions:
- A simple transfer of a value, as in `db['x']` and `db['x'] = 1`.

  Note that a sequence of such transactions is not itself a transaction, and can be affected by other processes and threads.

      db['foo/bar'] = [1,2,3]
      db['foo/bar'] += [4] # This line is actually 2 transactions
      db['foo/bar'][-1]

  It is possible for the result of these transactions to be `4`. But, if other threads or processes are scheduled during this code fragment, the result could be a completely different value, or the code could raise a `NoMethodError` because the object at the path has been replaced with one that does not have the `+` method or the `[ ]` method. The four operations are each atomic by themselves, but the sequence is not.

  Note that changes to a database object using this kind of transaction cannot be made using destructive methods (such as `<<`) but only by assignments of the form `db[<path>] = <data>`. Note that `+=` and similar "assignment operators" can be used but are not atomic, because

      db[<path>] += 1

  is really

      db[<path>] = db[<path>] + 1

  So another thread or process could change the value stored at `path` while the addition is happening.
- Transactions that allow more complex interaction:

      path = 'foo/bar'
      db[path] = [1,2,3]
      db.edit path do |bar|
        bar += [4]
        bar[-1]
      end

  This guarantees that, if the object at the path is still `[1, 2, 3]` at the time of the `edit` call, the value returned by the transaction will be 4.

  Simply put, `edit` allows exclusive write access to the object at the path for the duration of the block. Other threads or processes that use FSDB methods to read or write the object will be blocked for the duration of the transaction. There is also `browse`, which allows read access shared by any number of threads and processes, and `replace`, which also allows exclusive write access like `edit`.

  The differences between `replace` and `edit` are:

  - `replace`'s block must return the new value, whereas `edit`'s block must operate (destructively) on the block argument to produce the new value. (The new value in `replace`'s block can be a modification of the old value, or an entirely different object.)
  - `replace` yields `nil` if there is no preexisting object, whereas `edit` calls `default_edit` (which by default calls `object_missing`, which by default raises MissingObjectError).
  - `edit` is useless over a drb connection, since it is operating on a Marshal.dump-ed copy. Use `replace` with drb.
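The difference between exclusive and shared access can be pictured with plain `File#flock` calls. This is only a sketch using the Ruby standard library and is not FSDB's implementation: the `locked_replace` and `locked_browse` helpers are hypothetical names, and Marshal is assumed as the storage format. Holding the exclusive lock across the whole read-modify-write cycle is what keeps the increment safe, in contrast to a `+=` built from two separate transactions.

```ruby
require 'tmpdir'

# Hypothetical helper (not FSDB's API): hold an exclusive lock for the whole
# read-modify-write cycle. The block receives the old value (nil if the file
# is empty or new) and must return the new value.
def locked_replace(file)
  File.open(file, File::RDWR | File::CREAT | File::BINARY, 0o644) do |f|
    f.flock(File::LOCK_EX)          # blocks until no one else holds a lock
    data = f.read
    old_value = data.empty? ? nil : Marshal.load(data)
    new_value = yield(old_value)
    f.rewind
    f.write(Marshal.dump(new_value))
    f.flush
    f.truncate(f.pos)               # drop leftover bytes of a longer old value
    new_value
  end                               # closing the file releases the lock
end

# Hypothetical helper: a shared lock lets any number of readers proceed
# together while keeping writers out.
def locked_browse(file)
  File.open(file, File::RDONLY | File::BINARY) do |f|
    f.flock(File::LOCK_SH)
    yield Marshal.load(f.read)
  end
end

# Four threads each increment the stored counter 25 times.
COUNTER = Dir.mktmpdir do |dir|
  file = File.join(dir, 'counter.obj')
  threads = 4.times.map do
    Thread.new do
      25.times { locked_replace(file) { |n| (n || 0) + 1 } }
    end
  end
  threads.each(&:join)
  locked_browse(file) { |n| n }
end
```

Each `File.open` call creates its own open file description, so the locks exclude other threads in the same process as well as other processes.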
You can delete an object from the database (and the file system) with the `delete` method, which returns the object. Also, `delete` can take a block, which can examine the object and abort the transaction to prevent deletion. (The delete transaction has the same exclusion semantics as edit and replace.)

The `fetch` and `insert` methods are aliased with `[ ]` and `[ ]=`.

When the object at the path specified in a transaction does not exist in the file system, the different transaction methods behave differently:

browse calls `default_brows