Opsweekly

On call alert classification and reporting

What is Opsweekly?

Opsweekly is a weekly report tracker, an on call categorisation and reporting tool, a sleep tracker, a meeting organiser and a coffee maker all in one.

The goal of Opsweekly is both to organise your team's information in one central place and to help you understand and improve your on call rotations, through a simple on call "survey" and the reporting built on that tracking.

Alert classification is a complicated task, but with Opsweekly a few simple questions about each alert received can pay dividends in improving the on call experience for your engineers.

Features

  • Weekly Updates: Every member of your team can write a weekly status update using hints (e.g. Github commits, JIRA tickets) to inform the team what they've been working on, and then optionally email it out.
  • On-Call Alert Classification: Track, measure and improve your on call rotations by allowing your engineers to easily classify and document the alerts they received.
    • Make a simple assessment of each alert: whether action was taken, no action was taken, or the alert needs modification for follow-up later
    • Free notes field to allow documentation of actions taken to refer back to later
    • Bulk classification for time saving
  • Sleep Tracking: If your engineers have popular life tracking devices such as Fitbit or Jawbone UP, they integrate with Opsweekly to provide even more insight into the effect on call is having on their lives.
    • Mean time to sleep (MTTS) and sleep time lost to notifications are calculated.
    • Easy to configure but gives valuable data that could lead to questions like "can this alert wait until morning as it keeps waking up our engineers?"
  • In depth reporting: As you start to build up data, Opsweekly starts to generate reports and graphs illustrating your on call rotations.
    • Examples include: action taken vs no action taken on alerts, what alerts wake people up the most, mean time to sleep, top notifying hosts/services, average alert volume per day, and how on call has improved (or not) over the last year
  • Personal reporting: As well as a summary for all rotations, users are able to gain insight into their own behaviours.
    • How have their on calls affected them?
    • How much sleep do they lose on average?
    • How does this compare to others?
    • Optionally, one can view a sleep retrospective that compares the last several on-call rotations' impact on sleep loss.
      • In phplib/config.php, define oncall_sleep_retrospective_count with a numeric value (such as 3). Users viewing their profile will then see how past weeks affected their sleep.
  • Meeting Mode: Make running a weekly meeting simple with all the data you need in one page, and a facility for people to take notes.
    • Meeting mode hides all other UI, displaying only the information required for the meeting.
    • The on call report for the previous week is included, along with key stats and elements from the report
    • All weekly updates are displayed in case items need to be discussed
    • Set up a cron to remind people about the weekly meeting and provide the permalink to the meeting
  • Powerful Search: All data is searchable using a powerful search function. The default search mode is fuzzy, which will return results from all data stored in Opsweekly. However, you can get more specific:
    • Search previous on call alerts for a history of that alert, previous engineers' notes, how the alerts were classified (is this alert constantly "no action taken"?) and a time map showing its frequency over the past year.
    • Search Weekly Updates for full context on changes made previously
    • Search Meeting Notes for agenda items discussed in previous meetings
  • Fully timezone aware: Obviously it's important for users to be editing the alerts they receive in the timezone that they received them in. Each user can set their own timezone for the whole Opsweekly UI.
  • Fill in as you go/drafts: Both the Weekly report and the On-call reports can be updated multiple times during the week, so the user does not have to write a hefty report at the end.

Screenshots

Please visit the screenshot README for a guided tour of how Opsweekly works and the reports it can generate!

Prerequisites

  • A webserver
  • PHP 5.4 (or higher), including the curl extension for PHP, the MySQL extension, and short_open_tag enabled
  • MySQL for data storage

Installation/configuration

  1. Download/clone the repo into an appropriate folder, either in your web server's directory or symlinked to it.
  2. Create a configuration in your web server for Opsweekly if you are serving it from a separate domain (e.g. a VirtualHost)
  3. You must increase the PHP variable max_input_vars for submitting on-call reports. See Increasing max input vars
  4. Create a MySQL database for opsweekly, and optionally grant a new user access to it. E.g.:
    • mysql> create database opsweekly;
    • mysql> grant all on opsweekly.* to opsweekly_user@localhost IDENTIFIED BY 'my_password';
  5. Load the database schema into MySQL, e.g. mysql -u opsweekly_user opsweekly < opsweekly.sql
  6. Teach Opsweekly how to authenticate your users.
  7. Move phplib/config.php.example to phplib/config.php, edit with your favourite editor (more detail below)
  8. Load Opsweekly in your browser
  9. Reward yourself with a refreshing beverage.

Upgrading

We're careful to only allow changes that should be backwards compatible with previous versions of opsweekly, e.g. if a new configuration value is added, a sensible default is included, etc.

Having said that, sometimes database schema changes are required. The script upgrade_db.php will attempt to alter your tables for those schema changes; if it fails, you can copy and paste the SQL and run it manually. Running upgrade_db.php more than once will not break your database.

Committers/Maintainers: If you add a new database column, please add your schema change to upgrade_db.php so existing users can enjoy the features you add!

Providers/Plugins

Opsweekly uses the concept of "providers" for the various pieces of data it needs. These are like plugins and can vary from team to team.

The following providers are used:

  • providers/weekly/: These are known as weekly "hints" which are used to helpfully hint or remind people what they did in the last week when writing their reports.
    • Weekly hint provider examples include Github (showing recent commit activity) and JIRA (showing tickets closed in that time period)
  • providers/oncall/: These are used to pull in notifications from somewhere for the on call engineer to document.
    • For example, a provider can read Nagios logs that you parse with Logstash or Splunk, or pull in alerts sent to Pagerduty.
  • providers/sleep/: These are used to query an external datasource to establish whether the on call engineer was asleep during the notifications they received.
    • Opsweekly has been tested with Jawbone UP and Fitbit sleep trackers with success

The idea behind providers is that if Opsweekly cannot pull data from a service you're currently using, it should be straightforward to write your own provider and plug it in. Providers generally have two sets of configuration: one global for your entire instance, and one per team (or per user, in the case of sleep).

For more information about how to configure the providers or to write your own, please see the documentation in each of the provider directories mentioned above.

Configuration

The config.php.example contains an example configuration to get you on your way. It's fairly well commented to explain the common options, but we'll go into more depth here:

Authenticating with Opsweekly

It's very important that Opsweekly knows who each of its users is, so the first step is to teach it how to identify them.

In config.php, there is an important function, getUsername. This function must return the username, for example, "ldenness". You can write whatever PHP you like here; perhaps your SSO passes an HTTP header, or sets a cookie you can read to get the username.

The config.php.example has a couple of examples, including one that uses the username from HTTP Basic Auth, which can be configured with Apache.
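
As a rough illustration of the idea (not the code shipped in config.php.example), a getUsername implementation might look something like the sketch below. It assumes the web server handles HTTP Basic Auth so that PHP populates $_SERVER['PHP_AUTH_USER'], and it falls back to a hypothetical X-Remote-User header set by an SSO proxy. The header name and the null return for unknown users are assumptions made for this example.

```php
<?php
// Sketch of a getUsername() for phplib/config.php.
// The SSO header name and the null fallback are assumptions for illustration.
function getUsername()
{
    // HTTP Basic Auth: PHP exposes the authenticated user name here.
    if (!empty($_SERVER['PHP_AUTH_USER'])) {
        return $_SERVER['PHP_AUTH_USER'];
    }

    // Hypothetical SSO proxy that passes the user in an "X-Remote-User" header.
    if (!empty($_SERVER['HTTP_X_REMOTE_USER'])) {
        return $_SERVER['HTTP_X_REMOTE_USER'];
    }

    // No identity available; how Opsweekly should react is left to your setup.
    return null;
}
```

If you rely on a header like this, make sure only your trusted proxy can set it, since anything the client can send would otherwise be accepted as a username.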

Increasing max input vars

PHP has a default limit on the number of variables that can be input via form submission. Because compiling and submitting the on-call report is essentially just submitting a giant form, you must increase this value or your reports will be truncated!

Look for the configuration option max_input_vars in your PHP configuration (e.g. php.ini) or, if you have your own VirtualHost (e.g. in Apache), you can do something like: php_value max_input_vars 10000 to increase the limit.

We highly suggest increasing to 10000 for future proofing your on-call reports. There's no real downside to this if you're limiting it to Opsweekly. The limit is to try and protect against exploits by hash collisions (basically, someone DoS-ing forms on your site). But you should not run Opsweekly exposed on the internet anyway.
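
To confirm the new limit is actually in effect for the web server, a small check along these lines can help. This is only a sketch: maxInputVarsIsSufficient is a hypothetical helper, not part of Opsweekly, and the 10000 threshold simply mirrors the suggestion above. Note that max_input_vars cannot be changed with ini_set at runtime, so the script can only report the value, not fix it.

```php
<?php
// Hypothetical helper (not part of Opsweekly): reports whether the running
// PHP configuration allows at least $required input variables.
function maxInputVarsIsSufficient($required = 10000)
{
    $current = ini_get('max_input_vars');

    // ini_get() returns false if the directive does not exist.
    if ($current === false || $current === '') {
        return false;
    }

    return (int) $current >= $required;
}

if (!maxInputVarsIsSufficient()) {
    error_log(
        'max_input_vars is ' . var_export(ini_get('max_input_vars'), true)
        . '; large on-call reports may be truncated'
    );
}
```

Run it through the same web server configuration that serves Opsweekly, because the command line PHP binary may read a different php.ini.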

Teams configuration

Opsweekly can support many different teams using the same codebase, if required. Each team gets its own "copy" of the UI at a unique URL, and their data is stored in a separate database.

Even if you only intend to use one team, the $teams array contains most of the important configuration for Opsweekly.
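
A minimal sketch of the general shape, assuming each entry is keyed by the hostname used to reach that team's UI. The hostnames, the option names (display_name, database) and the teamForHost helper are all placeholders invented for this example; config.php.example is the authoritative list of options.

```php
<?php
// Illustrative only: hostnames and option names below are placeholders,
// not necessarily the keys Opsweekly expects.
$teams = array(
    'opsweekly.example.com' => array(
        'display_name' => 'Ops',        // assumed option name
        'database'     => 'opsweekly',  // assumed option name; one database per team
    ),
    'devweekly.example.com' => array(
        'display_name' => 'Dev',
        'database'     => 'devweekly',
    ),
);

// Hypothetical lookup: pick the team configuration for the requested hostname.
function teamForHost(array $teams, $host)
{
    return isset($teams[$host]) ? $teams[$host] : null;
}

$team = teamForHost($teams, 'opsweekly.example.com');
echo $team['display_name'], "\n";
```

Keying by hostname means a single checkout can serve several teams, with the web server pointing each hostname at the same code.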

The key of the array(s) in the $teams array is the FQDN that you will access Opsweekly via, e.g. `opsweekly.mycompany.c
