What is FSDB?

FSDB is a file system database. FSDB provides a thread-safe, process-safe Database class which uses the native file system as its back end and allows multiple file formats and serialization methods. Users access objects in terms of their paths relative to the base directory of the database. It's very lightweight: the per-process state of a Database, excluding cached data, is essentially just a path string, and the code is very small (under 1K lines, all Ruby).

FSDB stores data at nodes in the file system. The format can vary depending on type. For example, the default file type can be read into your program as a string, but files with the .obj suffix could be read using Marshal, and files with the .yaml suffix as YAML. FSDB can easily be extended to recognize other formats, both binary and text. FSDB treats directories as collections and provides directory iterator methods. Files are the atoms of transactions: each file is saved and restored as a whole. References between objects stored in different files can be persisted as path strings.

FSDB has been tested on a variety of platforms and Ruby versions, and is not known to have any problems. (On Windows ME/98/95, access to a database by multiple processes is not safe, because flock() is not available on those platforms.) See the Testing section for details.

FSDB does not yet have any indexing or querying mechanisms, and is probably missing many other useful database features, so it is not a general replacement for RDBs or OODBs. However, if you are looking for a lightweight, concurrent object store with reasonable performance and better granularity than PStore, in pure Ruby, with a Ruby license, take a look at FSDB. Also, if you are looking for an easy way of making an existing file tree look like a database, especially if it has heterogeneous file formats, FSDB might be useful.

Installation

To install FSDB as a gem:

$ gem install fsdb

Synopsis

Basic usage:

require 'fsdb'

db = FSDB::Database.new('/tmp/my-data')

db['recent-movies/myself'] = ["The King's Speech", "Harry Potter 7"]
puts db['recent-movies/myself'][1]              # ==> "Harry Potter 7"

db.edit 'recent-movies/myself' do |movies|
  movies << "The Muppets"
end

Path names

Keys in the database are path strings, which are simply strings in the usual forward-slash delimited format, relative to the database's directory. There are some points to be aware of when using them to refer to database objects.

Paths to directories are formed in one of two ways:
- explicitly, with a trailing slash, as in db['foo/']
- implicitly, as in db['foo'] if foo is already a directory, or as in db['foo/bar'], which creates foo if it did not already exist.

The root dir of the database is simply /; its child directories are of the form foo/, and so on. The leading and trailing slashes are both optional.

Objects can be stored in various formats, indicated by path name. A typical mapping might be:

file name | de-serialized data type
--------- | -----------------------
foo.obj | Marshalled data
foo.txt | String
foo/ | Directory (the contents are presented to the caller as a list of file and subdirectory paths that can be used in browse, edit, etc.)
foo.yml | YAML data (see examples/yaml.rb)

New formats, which correlate filename pattern with serialization behavior, can be defined and plugged in to databases. Each format has its own rules for matching patterns in the file name and recognizing the file. Patterns can be anything with a #=== method (such as a regex). See lib/fsdb/formats.rb for examples of defining formats. For examples of associating formats with patterns, see examples/formats.rb.
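
The pattern-matching idea can be sketched with the standard library alone. This is not FSDB's format API: `SketchFormat`, `SKETCH_FORMATS`, and `sketch_format_for` are hypothetical names made up for this illustration. It only shows how anything responding to `#===` (a regex or a lambda) can select a serializer from a path.

```ruby
require 'yaml'

# Hypothetical stand-in for a format: a name, a pattern, and reader/writer lambdas.
SketchFormat = Struct.new(:name, :pattern, :reader, :writer)

SKETCH_FORMATS = [
  SketchFormat.new(:marshal, /\.obj\z/,
                   ->(str) { Marshal.load(str) }, ->(obj) { Marshal.dump(obj) }),
  SketchFormat.new(:yaml, /\.ya?ml\z/,
                   ->(str) { YAML.safe_load(str) }, ->(obj) { YAML.dump(obj) }),
  # A lambda also responds to ===, so it can serve as a catch-all pattern.
  SketchFormat.new(:text, ->(_path) { true },
                   ->(str) { str }, ->(obj) { obj.to_s })
].freeze

# Return the first format whose pattern matches the path via case equality.
def sketch_format_for(path)
  SKETCH_FORMATS.find { |fmt| fmt.pattern === path }
end
```

Because the list is searched in order, the catch-all text format has to come last.
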
Different notations for the same path, such as
```
/foo/bar
foo/bar
foo//bar
foo/../foo/bar
```
work correctly (they access the same objects), as do paths that denote hard or soft links, if supported on the platform.
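
To see why these notations are equivalent, here is a purely lexical sketch using Ruby's Pathname. The helper `sketch_resolve` is hypothetical and is not how FSDB itself resolves keys; in particular it does not consult the file system, so it says nothing about links.

```ruby
require 'pathname'

# Hypothetical helper: map a database key to an absolute file system path.
def sketch_resolve(base_dir, key)
  relative = key.sub(%r{\A/+}, '')             # the leading slash is optional
  cleaned  = Pathname.new(relative).cleanpath  # collapses "//" and "dir/.."
  File.join(base_dir, cleaned.to_s)
end
```

With a base directory of /tmp/my-data, all four notations listed above resolve to /tmp/my-data/foo/bar.
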

Links are subject to the same naming convention as normal files with regard to format identification: format is determined by the path within the database used to access the object. Using a different name for a link can be useful if you need to access the file using two different formats (e.g., plain text via foo.txt and tabular CSV or TSV data via foo.table or whatever).
Accessing objects in a database is unaffected by the current dir of your process. The database knows its own absolute path, and path arguments to the Database API are interpreted relative to that. If you want to work with a subdirectory of the database, and paths relative to that, use Database#subdb:
```
db = FSDB::Database.new('/tmp')
db['foo/bar'] = 1
foo = db.subdb('foo')
foo['bar'] # ==> 1
```
Paths that are outside the database (../../zap) are allowed, but may or may not be desirable. Use #valid? and #validate in util.rb to check for them.
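
For illustration only, a lexical check for keys that climb out of the database directory might look like the following. `sketch_inside_db?` is a hypothetical name, not one of the util.rb methods, and because it never touches the file system it cannot detect escapes through links.

```ruby
require 'pathname'

# Hypothetical check: does the key stay inside the database directory?
def sketch_inside_db?(key)
  cleaned = Pathname.new(key.sub(%r{\A/+}, '')).cleanpath
  # After cleaning, any remaining ".." components sit at the front of the path.
  cleaned.each_filename.first != '..'
end
```
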
Directories are created when needed. So db['a/b/c'] = 1 creates two dirs and one file.
Files whose names begin with .. are ignored by fsdb dir iterators, though they can still be accessed in transaction operators. Some such files (..fsdb.meta.<filename>) are used internally. All other names that begin with .. but not with ..fsdb are reserved for applications to use.

The ..fsdb.meta.<filename> file holds a version number for <filename>, which is used along with mtime to check for changes (mtime usually has a precision of only 1 second). In the future, the file may also be used to hold other metadata. (The meta file is only created when a file is written to and does not need to be created in advance when using existing files as an FSDB.)
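
The naming convention can be sketched as follows. Both helpers are hypothetical names for this illustration and are not FSDB's iterators; they only show which names a directory listing would skip and how a meta file name is derived from a file name.

```ruby
# Hypothetical iterator: list a directory, skipping names that begin with "..".
def sketch_visible_entries(dir)
  Dir.children(dir).reject { |name| name.start_with?('..') }.sort
end

# Name of the metadata file that accompanies a file, per the convention above.
def sketch_meta_name(filename)
  "..fsdb.meta.#{filename}"
end
```
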
util.rb has directory iterators, path globbing, and other useful tools.

Transactions

FSDB transactions are thread-safe and process-safe. They can be nested for larger-grained transactions; it is the user's responsibility to avoid deadlock.

FSDB is ACID (atomic/consistent/isolated/durable) to the extent that the underlying file system is. For instance, when an object that has been modified in a transaction is written to the file system, nothing persistent is changed until the final system call to write the data to the OS's buffers. If there is an interruption (e.g., a power failure) while the OS flushes those buffers to disk, data will not be consistent. If this bothers you, you may want to use a journaling file system. FSDB does not need to do its own journaling because of the availability of good journaling file systems.

There are two kinds of transactions:

A simple transfer of a value, as in db['x'] and db['x'] = 1.

Note that a sequence of such transactions is not itself a transaction, and can be affected by other processes and threads.
```
db['foo/bar'] = [1,2,3]
db['foo/bar'] += [4]      # This line is actually 2 transactions
db['foo/bar'][-1]
```
It is possible for the result of these transactions to be 4. But, if other threads or processes are scheduled during this code fragment, the result could be a completely different value, or the code could raise a method_missing exception because the object at the path has been replaced with one that does not have the + method or the [ ] method. The four operations are each atomic by themselves, but the sequence is not.

Note that changes to a database object using this kind of transaction cannot be made using destructive methods (such as <<) but only by assignments of the form db[<path>] = <data>. Note that += and similar "assignment operators" can be used but are not atomic, because
```
db[<path>] += 1
```
is really
```
db[<path>] = db[<path>] + 1
```
So another thread or process could change the value stored at path while the addition is happening.

Transactions that allow more complex interaction:
```
path = 'foo/bar'
db[path] = [1,2,3]

db.edit path do |bar|
  bar << 4      # modify in place; bar += [4] would only rebind the block variable
  bar[-1]
end
```
This guarantees that, if the object at the path is still [1, 2, 3] at the time of the edit call, the value returned by the transaction will be 4.

Simply put, edit allows exclusive write access to the object at the path for the duration of the block. Other threads or processes that use FSDB methods to read or write the object will be blocked for the duration of the transaction. There is also browse, which allows read access shared by any number of threads and processes, and replace, which also allows exclusive write access like edit.
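
The exclusive and shared access described here can be illustrated with plain flock calls. This is a minimal sketch, not FSDB's implementation: `sketch_edit` and `sketch_browse` are hypothetical helpers, the `default` argument is an assumption made for the example, and there is no caching, meta file, or nested-transaction handling.

```ruby
# Hypothetical helpers: flock-based access to one Marshal-serialized file.

# Exclusive access: load the object, let the block mutate it, write it back.
def sketch_edit(path, default = nil)
  File.open(path, File::RDWR | File::CREAT, 0o644) do |f|
    f.binmode
    f.flock(File::LOCK_EX)         # waits until no other lock is held
    data = f.read
    object = data.empty? ? default : Marshal.load(data)
    result = yield object          # the block modifies object in place
    f.rewind
    f.write(Marshal.dump(object))
    f.flush
    f.truncate(f.pos)              # drop leftover bytes from a longer old value
    result
  end                              # closing the file releases the lock
end

# Shared access: any number of readers may hold LOCK_SH at the same time.
def sketch_browse(path)
  File.open(path, 'rb') do |f|
    f.flock(File::LOCK_SH)
    yield Marshal.load(f.read)
  end
end
```

Each call opens the file separately, so the locks also exclude other threads in the same process on platforms where flock is tied to the open file rather than to the process.
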

The differences between replace and edit are:
- replace's block must return the new value, whereas edit's block must operate (destructively) on the block argument to produce the new value. (The new value in replace's block can be a modification of the old value, or an entirely different object.)
- replace yields nil if there is no preexisting object, whereas edit calls default_edit (which by default calls object_missing, which by default raises MissingObjectError).
- edit is useless over a DRb connection, since it is operating on a Marshal.dump-ed copy. Use replace with DRb.
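
The first two differences come down to ordinary Ruby semantics, which a small in-memory stand-in can show. `SketchStore` is a hypothetical class written for this illustration; it has no files, locking, or FSDB error classes, and it raises KeyError where FSDB would go through default_edit.

```ruby
# Hypothetical in-memory stand-in for a database, for illustration only.
class SketchStore
  def initialize
    @objects = {}
  end

  def [](path)
    @objects[path]
  end

  def []=(path, object)
    @objects[path] = object
  end

  # edit: the block must mutate the stored object; the block's value is returned.
  def edit(path)
    raise KeyError, "no object at #{path}" unless @objects.key?(path)
    yield @objects[path]
  end

  # replace: the block's return value becomes the new stored object.
  def replace(path)
    @objects[path] = yield(@objects[path]) # yields nil when nothing is stored
  end
end
```

With this stand-in, an edit block that does `bar += [4]` leaves the stored array unchanged, because += rebinds the block variable to a new array, while `bar << 4` changes the stored object. A replace block such as `{ |n| (n || 0) + 1 }` works even when nothing is stored yet.
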
You can delete an object from the database (and the file system) with the delete method, which returns the object. Also, delete can take a block, which can examine the object and abort the transaction to prevent deletion. (The delete transaction has the same exclusion semantics as edit and replace.)

The fetch and insert methods are aliased with [ ] and [ ]=.

When the object at the path specified in a transaction does not exist in the file system, the different transaction methods behave differently:
- browse calls `default_brows
