Idx
maps, sets and vectors with on-demand secondary indexes.
Install / Use
/learn @wotbrew/IdxREADME
idx
idx lets you treat your collection like a database.
This library provides clojure(script) data structures (maps, sets and vectors) that allow for secondary indexes. The indexes offer alternative fast access paths to their elements.
Supports both Clojure and ClojureScript.
(require '[com.wotbrew.idx :as idx])
(def indexed-vector (idx/auto [{:name "fred"} {:name "ethel"}]))
indexed-vector
;; =>
[{:name "fred"} {:name "ethel"}] ; the indexed structure will behave (and be printed!) like the original collection.
(idx/lookup indexed-vector :name "ethel")
;; =>
({:name "ethel"})
Latest version
lein [com.wotbrew/idx "0.1.3"] or deps com.wotbrew/idx {:mvn/version "0.1.3"}.
Features
- Wrappers for vectors, sets and maps.
- Index elements on demand by any property, such as clojure functions, keywords, paths, or composites thereof. See reference
- Can choose automatic indexing, where indexes are created and cached transparently as you query the collection.
- Indexes are maintained incrementally as you modify your collection with functions -
conj,assocand so on. - Supports composite, nested, unique and sorted indexes.
- Query functions also work on normal collections so you can 'upgrade' them with indexes when you profile and find where you need them.
- Good for re-frame, add indexes without changing the shape of your data.
Caveats
- If you are only querying once or twice, it is almost always faster to just use sequence functions than build indexes.
- For small n indexes are very expensive. Use it to flatten quadratic joins, do not use it to replace all sequence filtering.
- If you index by function, that function must absolutely be pure, otherwise all bets are off. Similar to comparators and (sorted-set-by).
- Each index uses memory, so we need to make sure we consider that. This is particularly important to think about when using automatic-indexing.
- When indexing by function, index identity is function identity - so you must be careful with lambdas and closures.
Why
It is common to have the problem of taking repeated linear lookups (as you might accomplish with filter) to a sub-linear one. Typically what happens
is you use group-by or write some code to transform your collection into some kind of map to allow for fast lookup of its elements.
There are 3 problems that idx tries to solve:
- proliferation of
-by-thisor-by-thattype locals (or worse args, or keys) that only serve as fast paths to your actual data. - allowing for profiler driven optimisation without massively restructuring the code.
- index invalidation as data changes
Lets say we have a large collection of orders, and a large collection of items, I want to join the groups of items for each order under the :items key.
Typical solution would look like this:
(let [item-idx (group-by :order-id items)]
(for [order orders]
(assoc order :items (item-idx (:order-id order)))))
Now this case is not too egregious, however in real code it is tempting to pass your indexes between functions, introducing accidental complexity to their signatures (the args would not be there if it were not for insufficient data structures). If you do not do that - you are often loosing gains by not sharing them as you repeatedly construct expensive indexes.
Furthermore we often have to structure our code around the indexes, they are not easy to add and remove independent of the usages.
If you are manually creating indexes and have to change your data, then you have to do that yourself.
There are many solutions in this space, but I have not seen one that does not convert your data into something else. You can look at this library as a drop-in you can use to supercharge your existing collections.
idx is not trying to compete with databases like datascript. It is intending to compete with a proliferation of manual group-by, index-by style calls in order
to find data in your collections.
The performance of course is not as good as manual indexing, but should be good enough for most of the same use cases.
Usage
All functions are available in the com.wotbrew.idx namespace.
Manually index your collection
; (index coll p kind ...)
;; e.g
(def coll [{:id 1, :name "fred", :created-at #inst "2018-03-14"} ...])
(index coll :id :idx/unique :name :idx/hash :created-at :idx/sort)
;=>
[{:id 1, :name "fred", :created-at #inst "2018-03-14"} ...]
index returns a new indexed collection with the specified indexes (plus any that already existed).
You can also later remove indexes with delete-index.
Kind is either:
:idx/hashfor fastlookupcalls:idx/uniquefor fastidentify/replace-bycalls.:idx/sortfor fastascending/descendingcalls.
The resulting indexing collection meets the interface of its (map/vector/set) input, if you add/associate/remove elements the indexes will be maintained on your behalf.
Automatically index your collection as it is queried.
(auto coll)
This returns an indexed collection that creates indexes (and caches them) on demand as you query.
This can be good in local contexts where you only want to specify indexes once where they are used.
Caveats to this approach:
- You could end up creating a lot of indexes, which slow down
conj,assoc,dissocand so on. - Redundant indexes take up memory, remember you can
delete-index.
You can still call index on auto collections if you want to force certain indexes ahead of time.
Query your collection
Once wrapped with index or auto, a small suite of functions is available to query your collection.
lookup
Uses a one-to-many hash index if available.
Returns sequences of (unsorted) elements. You pass a property to test and value to match.
Say you have a vector of users, each with an age key, you might do:
(lookup users :age 10)
Say you have a vector of numbers, and you want to find the negative ones, functions are properties.
(def numbers [-1,3,4,-5])
(lookup numbers neg? true)
;; =>
(-5, -1)
identify
Uses a one-to-one hash index if available.
identify operates on unique properties and predicates. It will throw
if the property is non unique. Good for by-id type queries.
(def users [{:id 32344, :name "Fred"} ...])
(identify users :id 32344)
;; =>
{:id 32344, :name "Fred"}
ascending, descending
Uses a one-to-many sorted index if available.
(def users [{:name "Alice", :age 42}
{:name "Bob", :age 30}
{:name "Barbara", :age 12}
{:name "Jim", :age 83}])
(ascending users :age > 30) ; ascending sort
;; =>
({:name "Alice", :age 42}, {:name "Jim", :age 83})
(descending users :age <= 30) ; descending sort
;; =>
({:name "Bob", :age 30}, {:name "Barbara", :age 12})
path
path allows nested indexes, use it for nested indexes as if you
are indexing a get-in call.
(lookup orders (path :user :id) 32344)
match
match composes properties to form composite indexes. match returns a Predicate implementation so
you do not have to redundantly respecify the structure in the value position.
(lookup numbers (match neg? true, even? true))
They keys can be any property, and match nests.
This allows for some pretty extensive (and expensive!) indexes, but is useful to compose a couple of properties together for composite keys.
(lookup orders (match (path :user :id) 32444,
:carrier (match :country "uk" :available true)
:delivery-date #"2020-08-17"))
In order to index a match, either use an auto coll or use the :idx/value placeholder.
(index coll (match (path :user :id) :idx/value,
:carrier (match :country :idx/value :available :idx/value)
:delivery-date :idx/value))
pred
pred creates a predicate that uses a truthy/falsey index. It can be used in the value position of matches.
(match :foo (pred :has-bar))
pred is also useful to promote a function so it can be used
in the predicate position of lookup / identify / replace-by.
(lookup :foo (pred even?))
pk
pk looks up the (primary) index or key of the matched element.
Uses a unique one-to-one hash index if one is available.
(pk [1,4,5,3] identity 5) ;; => 2
(pk {:foo 42, :bar 33} identity 33) ;; => bar
(pk [{:foo 42}, {:foo 33}] :foo 42) ;; => 0
pcomp
pcomp allows for function style composition of properties, takes 2 properties and returns a property.
The property returned by (pcomp a b)
Will be looked each property in turn. (a (b element))
Modify your collection
Any indexes (automatic or otherwise) against your collection will be maintained incrementally with new versions of the data structure.
Use normal clojure collection functions such as conj, assoc, dissoc, disj and into.
Some handy functions are enabled due to the presence of indexes.
replace-by
Replaces an element in the collection identified by the property/value or predicate.
Uses a unique index if one is available (always true if you use auto)
e.g
(replace-by customers :email "foo@bar.com" new-customer)
unwrap
As maintaining indexes can become expensive if you want to make a lot of modifications, you can completely remove any indexing
with unwrap.
(unwrap coll)
delete-index
Returns a new collection without the specified index(es), uses same index specification as index
(delete-index coll :foo :idx/hash)
Reference
Properties
idx indexes elements against 'properties', which are objects i
