SkillAgentSearch skills...

Repor

Rails data aggregation framework

Install / Use

/learn @asross/Repor
About this skill

Quality Score

0/100

Supported Platforms

Universal

README

Note: this gem is no longer maintained, see this awesome fork by chaunce instead!

Repor

repor is a framework for aggregating data about Rails models backed by PostgreSQL, MySQL, or SQLite databases. It's designed to be flexible enough to accommodate many use cases, but opinionated enough to avoid the need for boilerplate.

<!-- START doctoc generated TOC please keep comment here to allow auto update --> <!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE --> <!-- END doctoc generated TOC please keep comment here to allow auto update -->

Basic usage

Here are some examples of how to define, run, and serialize a Repor::Report:

class PostReport < Repor::Report
  report_on :Post

  category_dimension :author, relation: ->(r) { r.joins(:author) },
    expression: 'users.name'
  number_dimension :likes
  time_dimension :created_at

  count_aggregator :number_of_posts
  sum_aggregator :total_likes, expression: 'posts.likes'
  array_aggregator :post_ids, expression: 'posts.id'
end

# show me # published posts from 2014-2015 with at least 4 likes, by author

report = PostReport.new(
  relation: Post.published,
  groupers: [:author],
  aggregator: :number_of_posts,
  dimensions: {
    likes: {
      only: { min: 4 }
    },
    created_at: {
      only: { min: '2014', max: '2015' }
    }
  }
)

puts report.data

# => [
#  { key: 'James Joyce', value: 10 },
#  { key: 'Margaret Atwood', value: 4 }
#  { key: 'Toni Morrison', value: 5 }
# ]

# show me likes on specific authors' posts by author and year, from 1985-1987

report = PostReport.new(
  groupers: [:author, :created_at],
  aggregator: :total_likes,
  dimensions: {
    created_at: {
      only: { min: '1985', max: '1987' },
      bin_width: 'year'
    },
    author: {
      only: ['Edith Wharton', 'James Baldwin']
    }
  }
)

puts report.data

# => [{
#   key: { min: Tue, 01 Jan 1985 00:00:00 UTC +00:00,
#          max: Wed, 01 Jan 1986 00:00:00 UTC +00:00 },
#   values: [
#     { key: 'Edith Wharton', value: 35 },
#     { key: 'James Baldwin', value: 13 }
#   ]
# }, {
#   key: { min: Wed, 01 Jan 1986 00:00:00 UTC +00:00,
#          max: Thu, 01 Jan 1987 00:00:00 UTC +00:00 },
#   values: [
#     { key: 'Edith Wharton', value: 0 },
#     { key: 'James Baldwin', value: 0 }
#   ]
# }, {
#   key: { min: Thu, 01 Jan 1987 00:00:00 UTC +00:00,
#          max: Fri, 01 Jan 1988 00:00:00 UTC +00:00 },
#   values: [
#     { key: 'Edith Wharton', value: 0 },
#     { key: 'James Baldwin', value: 19 }
#   ]
# }]

csv_serializer = Repor::Serializers::CsvSerializer.new(report)
puts csv_serializer.csv_text

# => csv text string

chart_serializer = Repor::Serializers::HighchartsSerializer.new(report)
puts chart_serializer.highcharts_options

# => highcharts options hash

To define a report, you declare dimensions (which represent attributes of your data) and aggregators (which represent quantities you want to measure). To run a report, you instantiate it with one aggregator and at least one dimension, then inspect its data. You can also wrap it in a serializer to get results in useful formats.

Building reports

Just call ReportClass.new(params), where params is a hash with these keys:

  • aggregator (required) is the name of the aggregator to aggregate by
  • groupers (required) is a list of the names of the dimension(s) to group by
  • relation (optional) provides an initial scope for the data
  • dimensions (optional) holds dimension-specific filter or grouping options

See below for more details about dimension-specific parameters.

Defining reports

Base relation

A Repor::Report either needs to know what ActiveRecord class it is reporting on, or it needs to know a table_name and a base_relation.

You can specify an ActiveRecord class by calling the report_on class method with a class or class name, or if you prefer, you can override the other two as instance methods.

By default, it will try to infer an ActiveRecord class from the report class name by dropping /Report$/ and constantizing.

class PostReport < Repor::Report
end

PostReport.new.table_name
# => 'posts'

PostReport.new.base_relation
# => Post.all

class PostStructuralReport < Repor::Report
  report_on :Post

  def base_relation
    super.where(author: 'Foucault')
  end
end

PostStructuralReport.new.table_name
# => 'posts'

PostStructuralReport.new.base_relation
# => Post.where(author: 'Foucault')

Finally, you can also use autoreport_on if you'd like to automatically infer dimensions from your columns and associations. autoreport_on will try to map most columns to dimensions, and if the column in question is for a belongs_to association, will even try to join and report on the association's name:

class PostReport < Repor::Report
  autoreport_on Post
end

PostReport.new.dimensions.keys
# => %i[:created_at, :updated_at, :likes, :title, :author]

PostReport.new.dimensions[:author].expression
# => 'users.name'

Autoreport behavior can be customized by overriding certain methods; see the Repor::Report code for more information.

Dimensions (x-axes)

You define dimensions on your Repor::Report to represent attributes of your data you're interested in. Dimensions objects can filter or group your relation by a SQL expression, and accept/return simple Ruby values of various types.

There are several built-in types of dimensions:

  • CategoryDimension
    • Groups/filters the relation by the discrete values of the expression
  • NumberDimension
    • Groups/filters the relation by binning a continuous numeric expression
  • TimeDimension
    • Like number dimensions, but the bins are increments of time

You define dimensions in your report class like this:

class PostReport < Repor::Report
  category_dimension :status
  number_dimension :author_rating, expression: 'users.rating',
    relation: ->(r) { r.joins(:author) }
  time_dimension :publication_date, expression: 'posts.published_at'
end

The SQL expression a dimension uses defaults to:

"#{report.table_name}.#{dimension.name}"

but this can be overridden by passing an expression option. Additionally, if the filtering or grouping requires joins or other SQL operations, a custom relation proc can be passed, which will be called beforehand.

Filtering by dimensions

All dimensions can be filtered to one or more values by passing in params[:dimensions][<dimension name>][:only].

CategoryDimension#only should be passed the exact values you'd like to filter to (or what will map to them after connection adapter quoting).

NumberDimension and TimeDimension are "bin" dimensions, and their onlys should be passed one or more bin ranges. Bin ranges should be hashes of at least one of min and max, or they should just be nil to explicitly select rows for which expression is null. Bin range filtering is min-inclusive but max-exclusive. For NumberDimension, the bin values should be numbers or strings of digits. For TimeDimension, the bin values should be dates/times or Time.zone.parse-able strings.

Grouping by dimensions

To group by a dimension, pass its name to params[:groupers].

For bin dimensions (NumberDimension and TimeDimension), where the values being grouped by are ranges of numbers or times, you can specify additional options to control the width and distribution of those bins. In particular, you can pass values to:

  • params[:dimensions][<name>][:bins],
  • params[:dimensions][<name>][:bin_count], or
  • params[:dimensions][<name>][:bin_width]

bins is the most general option; you can use it to divide the full domain of the data into non-uniform, overlapping, and even null bin ranges. It should be passed an array of the same min/max hashes or nil used in filtering.

bin_count will divide the domain of the data into a fixed number of bins. It should be passed a positive integer.

bin_width will tile the domain with bins of a fixed width. It should be passed a positive number for NumberDimensions and a "duration" for TimeDimensions. Durations can either be strings of a number followed by a time increment (minutes, hours, days, weeks, months, years), or they can be hashes suitable for use with ActiveSupport::TimeWithZone#advance. E.g.:

params[:dimensions][<time dimension>][:bin_width] = '1 month'
params[:dimensions][<time dimension>][:bin_width] = { months: 2, hours: 2 }

NumberDimensions will default to using 10 bins and TimeDimensions will default to using a sensical increment of time given the domain; you can customize this by overriding methods in those classes.

Note that when you inspect report.data after grouping by a bin dimension, you will see the dimension values are actually Repor::BinDimension::Bin objects, which respond to min, max, and various json/Hash methods. These are meant to provide a common interface for the different types of bins (double-bounded, unbounded on one side, null) and handle mapping between SQL and Ruby representations of their values. You may find bin objects useful in working with report data, and they can also be customized.

If you want to chang

View on GitHub
GitHub Stars58
CategoryDevelopment
Updated2y ago
Forks2

Languages

Ruby

Security Score

80/100

Audited on Feb 1, 2024

No findings