Hashmark
MySQL time-series database and PHP library for data point insertion and analytic queries
Install / Use
/learn @codeactual/HashmarkREADME
hashmark
hashmark is a MySQL time-series database and PHP library for data point insertion and analytic queries.
Features
- Numeric and string data types.
- PHP client library for collecting data points in preexisting apps.
- Custom scripts for analysis and periodic data point collection.
- SQL macros allowing queries to reference intermediate results from prior statements.
- Configurable date-based partitioning.
- Cache and database adapters provided by bundled Zend Framework 1.x components.
- High unit test coverage.
Analytics
Support
- MySQL aggregate functions:
AVG,SUM,COUNT,MAX,MIN,STDDEV_POP,STDDEV_SAMP,VAR_POP,VAR_SAMP - MySQL aggregate functions eligible for DISTINCT selection:
AVG,'SUM,COUNT,MAX,MIN - Time intervals for aggregates: hour, day, week, month, year
- MySQL time functions for aggregates of recurrence groups (e.g. "1st of the month"):
HOUR,DAYOFMONTH,DAYOFYEAR,MONTH
Methods
<code>multiQuery($scalarId, $start, $end, $stmts)</code>
Perform multiple queries using macros to reference prior intermediate result sets. Internally supports many of the functions below.
<code>values($scalarId, $limit, $start, $end)</code>
Return samples within a date range.
<code>valuesAtInterval($scalarId, $limit, $start, $end, $interval)</code>
Return the most recent sample from each interval within a date range.
<code>valuesAgg($scalarId, $start, $end, $aggFunc, $distinct)</code>
E.g. return **"average value between date X and Y" or "volume of distinct values between date X and Y."
<code>valuesAggAtInterval($scalarId, $start, $end, $interval, $aggFunc, $distinct)</code>
Similar to
valuesAggexcept that results are grouped into a given interval, e.g. "average weekly value between date X and Y."
<code>valuesNestedAggAtInterval($scalarId, $start, $end, $interval, $aggFuncOuter, $distinctOuter, $aggFuncInner, $distinctInner)</code>
Aggregate values returned by
valuesAggAtInterval, e.g. "average weekly high between date X and Y."
<code>valuesAggAtRecurrence($scalarId, $start, $end, $recurFunc, $aggFunc, $distinct)</code>
E.g. "peak value in the 8-9am hour between date X and Y."
<code>changes($scalarId, $limit, $start, $end)</code>
Return from a date range each sample's date, value, and change in value from the prior sample.
<code>changesAtInterval($scalarId, $limit, $start, $end, $interval)</code>
Similar to
changesexcept thatvaluesAtIntervalprovides the source data, e.g. "weekly value and its change (week-over-week) between date X and Y."
<code>changesAgg($scalarId, $start, $end, $aggFunc, $distinct)</code>
E.g. "peak value change between date X and Y."
<code>changesAggAtInterval($scalarId, $start, $end, $interval, $aggFunc, $distinct)</code>
Similar to
changesAggexcept thatchangesprovides the source data, e.g. "weekly peak value change (week-over-week) between date X and Y."
<code>changesNestedAggAtInterval($scalarId, $start, $end, $interval, $aggFuncOuter, $distinctOuter, $aggFuncInner, $distinctInner)</code>
Aggregate values returned by
changesAggAtInterval, e.g. "average of weekly peak value changes (week-over-week) between date X and Y."
<code>changesAggAtRecurrence($scalarId, $start, $end, $recurFunc, $aggFunc, $distinct)</code>
E.g. "peak value change on Black Friday between year X and year Y."
<code>frequency($scalarId, $limit, $start, $end, $descOrder)</code>
Return unique values and their frequency between date X and Y.
<code>moving($scalarId, $limit, $start, $end, $aggFunc, $distinct)</code>
Return from a date range each sample's date, value, and the aggregate value at sample-time. E.g. "values and their moving averages between date X and Y."
<code>movingAtInterval($scalarId, $limit, $start, $end, $interval, $aggFunc, $distinct)</code>
Similar to
valuesAtIntervalexcept thatmovingprovides the data source, e.g. "the last value and its moving average from each week between date X and Y."
Example Code
Quick Background
Main database tables:
- <code>scalars</code>: Metadata and current value of a named string or number, e.g. "featureX:optOut".
- <code>samples_decimal</code>: Historical values of a numeric data points in
scalars. - <code>samples_string</code>: Historical values of a string data points in
scalars.
Client
<code>Hashmark_Client</code> supplies methods for updating a current value (in scalars) and adding a historical sample (in samples_decimal or samples_string).
- <code>incr</code>($name, $amount = 1, $newSample = false)
- <code>decr</code>($name, $amount = 1, $newSample = false)
- <code>set</code>($name, $amount, $newSample = false)
- <code>get</code>($name)
<?php
if ($userOptedOutOfFeatureX) {
$client->incr('featureX:optOut', 1, true);
}
To enable drop-in client calls to work without any prior setup, e.g. if "featureX:optOut" above did not yet exist, use $client->createScalarIfNotExists(true).
Agent
Each script is just a class that implements the small <code>Hashmark_Agent</code> interface.
The Agent/StockPrice.php demo fetches AAPL's price from Google Finance and creates a historical data point.
<code>Cron/runAgents.php</code> normally runs each agent on a configured schedule, but a manual run might look like:
<?php
$agent = Hashmark::getModule('Agent', 'StockPrice');
$price = $agent->run($scalarId);
$partition = Hashmark::getModule(Partition, '', $db);
$partition->createSample($scalarId, $price, time());
Create a Scalar
<?php
$core = Hashmark::getModule('Core', '', $db);
$scalarFields = array();
$scalarFields['name'] = 'featureX:optOut';
$scalarFields['type'] = 'decimal';
$scalarFields['value'] = 0; // Initial value.
$scalarFields['description'] = 'Opt-out requests for featureX.';
$scalarId = $core->createScalar($scalarFields);
$savedScalarFields = $core->getScalarById($scalarId);
$savedScalarFields = $core->getScalarByName('featureX:optOut');
Create a Category
<?php
$categoryId = $core->createCategory('Feature Trackers');
if (!$core->scalarHasCategory($scalarId, $categoryId)) {
$core->addScalarCategory($scalarId, $categoryId);
}
Create a Milestone
<?php
$milestoneId = $core->createMilestone('featureX initial release');
$core->setMilestoneCategory($milestoneId, $releaseCategoryId);
Query
<?php
$analyst = Hashmark::getModule('Analyst', 'BasicDecimal', $db);
$sampleDateMin = '2012-01-01 00:00:00';
$sampleDateMax = '2012-02-01 00:00:00';
$limit = 10;
// Returns first 10 samples: their dates, values, and running/cumulative totals
$analyst->moving($scalarId, $limit, $sampleDateMin, $sampleDateMax, 'SUM');
// Now only distinct values affect aggregates
$analyst->moving($scalarId, $limit, $sampleDate
