Rangeless
c++ LINQ -like library of higher-order functions for data manipulation
Install / Use
/learn @ast-al/RangelessREADME
rangeless::fn
range-free LINQ-like library of higher-order functions for manipulation of containers and lazy input-sequences.
Documentation
What it's for
- Reduce the amount of mutable state.
- Flatten control-flow.
- Lessen the need to deal with iterators directly.
- Make the code more expressive and composeable.
This library is intended for moderate to advanced-level c++ programmers that like the idea of c++ ranges, but can't or choose not to use them for various reasons (e.g. high complexity, compilation overhead, debug-build performance, size of the library, etc).
Motivations:
- https://www.fluentcpp.com/2019/09/13/the-surprising-limitations-of-c-ranges-beyond-trivial-use-cases/
- https://www.fluentcpp.com/2019/02/12/the-terrible-problem-of-incrementing-a-smart-iterator/
- https://brevzin.github.io/c++/2020/07/06/split-view/
- http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p2011r1.html
- https://aras-p.info/blog/2018/12/28/Modern-C-Lamentations/
Features
- Portable c++11. (examples are c++14)
- Single-header.
- Minimal standard library dependencies.
- No inheritance, polymorphism, type-erasures, ADL, advanced metaprogramming, enable_ifs, concepts, preprocessor magic, arcane programming techniques (for some definition of arcane), or compiler-specific workarounds.
- Low
#includeand compile-time overhead. - Enables trivial parallelization (see
fn::to_async and fn::transform_in_parallel). - Allows for trivial extension of functionality (see
fn::adapt).
Simple examples
struct employee_t
{
int id;
std::string last_name;
std::string first_name;
int years_onboard;
bool operator<(const employee_t& other) const
{
return id < other.id;
}
};
namespace fn = rangeless::fn;
using fn::operators::operator%; // arg % fn equivalent to fn(std::forward<Arg>(arg))
using fn::operators::operator%=; // arg %= fn; equivalent to arg = fn( std::move(arg));
// Abbreviated lambda macro, see http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0573r2.html
// (It is not defined in this library, because nobody likes macros except those we define ourselves).
#define L(expr) ([&](auto&& _ ){ return expr; })
auto employees = std::vector<employee_t>{/*...*/};
employees %= fn::where L( _.last_name != "Doe" );
employees %= fn::take_top_n_by(10, L( _.years_onboard ));
employees %= fn::sort_by L( std::tie( _.last_name, _.first_name) );
// or:
employees = std::move(employees)
% fn::where L( _.last_name != "Doe" )
% fn::take_top_n_by(10, L( _.years_onboard ))
% fn::sort_by L( std::tie( _.last_name, _.first_name) );
How does this work? E.g. fn::sort_by(projection_fn) is a higher-order function that returns a unary function that takes inputs by value (normally passed as rvalue), sorts them by the user-provided projection, and returns them by value (non-copying). Similarly, fn::where(predicate) filters the input to those satisfying the predicate.
operator %, pronounced "then", invoked as arg % unary_function, is syntax-sugar, similar to F#'s operator |>, that enables structuring your code in top-down manner, consistent with the direction of the data-flow, similar to UNIX pipes. It is implemented as:
template<typename Arg, typename F>
auto operator % (Arg&& arg, F&& fn) -> decltype( std::forward<F>(fn)( std::forward<Arg>(arg)) )
{
return std::forward<F>(fn)( std::forward<Arg>(arg));
}
Without using operator% the above example would look as follows:
employees = fn::where L( _.last_name != "Doe" )( std::move(employees) );
employees = fn::take_top_n_by(10, L( _.years_onboard ))( std::move(employees) );
employees = fn::sort_by L( std::tie( _.last_name, _.first_name) )( std::move(employees) );
// or, as single nested function call:
employees = fn::sort_by L( std::tie( _.last_name, _.first_name))(
fn::take_top_n_by(10, L( _.years_onboard ))(
fn::where L( _.last_name != "Doe" )(
std::move(employees) )));
Example: Top-5 most frequent words chosen among the words of the same length.
auto my_isalnum = L( std::isalnum(_) || _ == '_' );
fn::from( std::istreambuf_iterator<char>(istr.rdbuf()), {})
% fn::transform L( char(std::tolower(_)) )
% fn::group_adjacent_by(my_isalnum) // returns sequence-of-std::string
% fn::where L( my_isalnum( _.front())) // discard strings with punctuation
% fn::counts() // returns map<string,size_t> of word->count
% fn::group_all_by L( _.first.size()) // returns [[(word, count)]], each subvector containing words of same length
% fn::transform( // transform each sub-vector...
fn::take_top_n_by(5UL, L( _.second)) // by filtering it taking top-5 by count.
% fn::concat() // undo group_all_by (flatten)
% fn::for_each L( void(std::cout << _.first << "\t" << _.second << "\n") );
// compilation time:
// >>time g++ -I ../include/ -std=c++14 -o test.o -c test.cpp
// real 0m1.176s
// user 0m1.051s
// sys 0m0.097s
More examples
- A rudimentary lazy TSV parser.
- calendar.cpp vs. Haskell vs. range-v3 implementation.
- aln_filter.cpp for more advanced examples of use.
Description
Unlike range-v3, this library is centered around value-semantics rather than reference-semantics. This library does not know or deal with the multitude of range-related concepts; rather, it deals with data transformations via higher-order functions. It differentiates between two types of inputs: a Container and a lazy seq<NullaryInvokable> satisfying single-pass forward-only InputRange semantics (also known as a data-stream). Most of the function-objects in this library have two overloads of operator() respectively. Rather than composing views over ranges as with range-v3, operator()s take inputs by value, operate on it eagerly or compose a lazy seq, as appropriate (following the Principle of Least Astonishment), and return the result by value (with move-semantics) to the next stage.
E.g.
fn::where- given a container, passed by rvalue, returns the same container filtered to elements satisfying the predicate, using the erase-remove or iterate-erase idioms under the hood, as appropriate.
- given a container, passed by lvalue-reference, returns a copy of the container with elements satisfying the predicate, using
std::copy_ifunder the hood. - given a
seq, passed by value, composes and returns aseqthat will skip the elements not satisfying the predicate (lazy).
fn::sort- given a container, passed by value, returns the sorted container.
- given a
seq, passed by value, moves elements into astd::vector, and delegates to the above.
fn::transform- given a
seq, passed by value, returns aseqwrapping a composition of the transform-function over the underlyingNullaryInvokablethat will lazily yield the results of transform-function. - given a container, passed by value, wraps it as
seqand delegates to the above (i.e. also lazy).
- given a
Some functions in this library internally buffer elements, as appropriate, with single-pass streaming inputs, whereas range-v3, on the other hand, imposes multipass ForwardRange or stronger requirement on the inputs in situations that would otherwise require buffering. This makes this library conceptually more similar to UNIX pipelines with eager sort and lazy sed, than to c++ ranges.
| Operations | Buffering behavior | Laziness |
| ---------- | ------------------ | -------- |
| fn::group_adjacent_by, fn::in_groups_of | buffer elements of the incoming group | lazy |
| fn::unique_all_by | buffer unique keys of elements seen so far | lazy |
| fn::drop_last, fn::sliding_window | buffer a queue of last n elements | lazy |
| fn::transform_in_parallel | buffer a queue of n executing async-tasks | lazy |
| fn::group_all_by, fn::sort_by, fn::lazy_sort_by, fn::reverse, fn::to_vector | buffer all elements | eager |
| fn::take_last | buffer a queue of last n elements | eager |
| fn::where_max_by, fn::where_min_by | buffer maximal/minimal elements as seen so-far | eager |
| fn::take_top_n_by | buffer top n elements as seen so-far | eager |
Signaling end-of-sequence from a generator-function
More often than not a generator-function that yields a sequence of values will not be an infinite Fibonacci sequence, but rather some bounded sequence of objects, either from a file, a socket, a database query, etc, so we need to be able to signal end-of-sequence. One way to do it is to yield elements wrapped in std::unique_ptr or std::optional:
fn::seq([]() -> std::unique_ptr<...> { ... })
% fn::take_while([](const auto& x) { return bool(x); })
% fn::transform(fn::get::dereferenced{})
% ...
If your value-type has an "empty" state interpretable as end-of-inputs, you can use the value-type directly without wrapping.
If you don't care about incurring an exception-handling overhead once per whole seq, there's a simpler way of doing it: just return fn::end_seq() from the generator function (e.g. see my_intersperse example). This throws end-of-sequence exception that is caught under the hood (python-style). If you are in -fno-exceptions land, then this method is not for you.
