What's Jens?

Jens is the Puppet modules/hostgroups librarian used by the CERN IT department. It is a Python toolkit that generates Puppet environments dynamically from input metadata. Jens is useful at sites where several administrators look after siloed services (mapped to what we call top-level "hostgroups", see below), each with very service-specific configuration, while sharing configuration logic via modules.

This tool covers the needs of several roles that might show up in a typical shared Puppet infrastructure:

Developers writing manifests who want an environment to test new code: Jens provides dynamic environments that automatically update with overrides for the modules being developed that point to development branches.
Administrators who don't care: Jens supports simple dynamic environments that default to the production branch of all modules and that only update automatically when there's new production code.
Administrators looking for extra stability who are reluctant to do rolling updates: Jens implements snapshot environments that are completely static and don't update unless redefined, as all modules are pinned by commit identifier.

Right now, the functionality is quite tailored to CERN IT's needs. However, contributions that make it more generic, and therefore more useful for the community, are more than welcome.

This program has been used as the production Puppet librarian at CERN IT since August 2013.

Introduction

In Jens' realm, Puppet environments are basically a collection of modules, hostgroups, hierarchies of Hiera data and a site.pp. These environments are defined in environment definition files which are stored in a separate repository that's known to the program. Also, Jens makes use of a second metadata repository to know what modules and hostgroups are part of the library and are therefore available to generate environments.

With all this information, Jens produces a set of environments that can be used by Puppet masters to compile Puppet catalogs. Two types of environments are supported: dynamic and static. The former update automatically as new commits arrive in the relevant repositories, whereas the latter stay pinned to the specified commits to implement the concept of a "configuration snapshot" (read the environments section for more information).
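The difference between the two environment types can be sketched as a small model of how a module's Git ref might be chosen. This is only an illustration: the `Environment` class, its fields, the branch name `master` and the `commit/...` notation are assumptions made for this sketch, not Jens' actual definition format, which is described in the environments section.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Environment:
    """Hypothetical in-memory model of an environment definition."""
    name: str
    # Branch followed by every module without an override.
    # None models a snapshot, where nothing is followed implicitly.
    default_branch: Optional[str] = None
    # Module name -> ref (a development branch or a pinned commit).
    overrides: Dict[str, str] = field(default_factory=dict)


def resolve_ref(env: Environment, module: str) -> Optional[str]:
    """Return the Git ref a module would track in this environment."""
    # An explicit override always wins.
    if module in env.overrides:
        return env.overrides[module]
    # Otherwise fall back to the environment default, if there is one.
    return env.default_branch


# Dynamic environment following production code only ("master" is assumed).
production = Environment("production", default_branch="master")
# Dynamic environment with one module pointing to a development branch.
devel = Environment("devel", default_branch="master",
                    overrides={"apache": "new_vhosts"})
# Static environment: every module it contains is pinned by commit.
snapshot = Environment("snap1", overrides={"apache": "commit/0123abc"})
```

In this model a dynamic environment changes whenever the branch it follows moves, while the snapshot only changes when its definition is edited.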

Jens is composed of several CLIs that perform different tasks: jens-config, jens-gc, jens-reset, jens-stats and jens-update. Manual pages are shipped for all of them.

Basically, the input data that's necessary for an execution of jens-update (the core tool provided by this toolset) is two Git repositories:

The repository metadata repository (or the library)
The environment definitions repository (or the environments)

More details about these are given in the following sections.

Repository metadata

Jens uses a single YAML file stored in a Git repository to know which modules and hostgroups are available to generate environments. The same file also defines the paths to two special Git repositories containing what we call the common Hiera data and the site manifest.

This is all set up via two configuration keys: repositorymetadatadir (which is the directory containing a clone of the repository) and repositorymetadata (the file itself).

A skeleton of the file looks like this:

---
repositories:
  common:
      hieradata: http://git.example.org/pub/it-puppet-common-hieradata
      site: http://git.example.org/pub/it-puppet-site
  hostgroups:
      ...
      aimon: http://git.example.org/pub/it-puppet-hostgroup-aimon
      cloud: http://git.example.org/pub/it-puppet-hostgroup-cloud
      ...
  modules:
      ...
      apache: http://git.example.org/pub/it-puppet-module-apache
      bcache: http://git.example.org/pub/it-puppet-module-bcache
      ...

The idea is that when a new top-level hostgroup is added or a new module is needed, this file gets populated with the clone URL of the corresponding repository. During the next run of jens-update, Jens adds the new elements to all the environments that are entitled to get them.
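As a rough illustration of what "new elements" means, the sketch below compares two versions of the library file and reports the modules and hostgroups that were added. It assumes the YAML has already been parsed into plain dictionaries (for example with a YAML library); the function name `added_repositories` is made up for this example and is not part of Jens.

```python
from typing import Dict, Set

# Shape of the parsed library file: repositories -> partition -> name -> URL.
Metadata = Dict[str, Dict[str, Dict[str, str]]]


def added_repositories(old: Metadata, new: Metadata) -> Dict[str, Set[str]]:
    """Return the names present in `new` but not in `old`, per partition."""
    added = {}
    for partition in ("modules", "hostgroups"):
        before = set(old["repositories"].get(partition, {}))
        after = set(new["repositories"].get(partition, {}))
        added[partition] = after - before
    return added


old = {"repositories": {
    "hostgroups": {
        "aimon": "http://git.example.org/pub/it-puppet-hostgroup-aimon"},
    "modules": {
        "apache": "http://git.example.org/pub/it-puppet-module-apache"},
}}
new = {"repositories": {
    "hostgroups": {
        "aimon": "http://git.example.org/pub/it-puppet-hostgroup-aimon",
        "cloud": "http://git.example.org/pub/it-puppet-hostgroup-cloud"},
    "modules": {
        "apache": "http://git.example.org/pub/it-puppet-module-apache",
        "bcache": "http://git.example.org/pub/it-puppet-module-bcache"},
}}

print(added_repositories(old, new))
```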

Another example is available in examples/repositories/repositories.yaml.

Common Hiera data and Site

Two items declared via the library file require some extra clarification, especially because they are transversal to the rest of the declarations and are somewhat hardcoded to how our Puppet infrastructure is designed.

The repository pointed to by site must contain a single manifest called site.pp that serves as the catalog compilation entrypoint and therefore where all the hostgroup autoloading (explained later) takes place.

it-puppet-site/
├── code
│   └── site.pp
└── README

The common hieradata, on the other hand, is a special repository that hosts different types of Hiera data to fill the gaps that can't be defined at hostgroup or module level (operating system, hardware vendor, datacentre location and environment-dependent keys). The list of these items is configurable via the configuration key common_hieradata_items. The following is an example of what the hierarchy in there should look like:

it-puppet-common-hieradata/
├── data
│   ├── common.yaml
│   ├── environments
│   │   ├── production.yaml
│   │   └── qa.yaml
│   ├── datacentres
│   │   ├── europe.yaml
│   │   ├── usa.yaml
│   │   └── ...
│   ├── hardware
│   │   └── vendor
│   │       ├── foovendor.yaml
│   │       └── ...
│   └── operatingsystems
│       └── RedHat
│           ├── 5.yaml
│           ├── 6.yaml
│           └── 7.yaml
└── README

common.yaml is the most generic Hiera data file in the whole hierarchy, as it's visible to all nodes regardless of their hostgroup, environment, hardware type, operating system and datacentre. It's useful for defining very top-level keys.
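To make the layout more concrete, here is a minimal sketch that lists the files of the common hieradata repository that would be relevant to one node. The function and its parameters are hypothetical, and the order of the returned list is arbitrary: the real lookup order is whatever the Hiera hierarchy configuration defines.

```python
from pathlib import PurePosixPath
from typing import List


def common_hieradata_files(environment: str, datacentre: str, vendor: str,
                           os_family: str, os_major: str) -> List[PurePosixPath]:
    """Map node attributes to paths inside the common hieradata repository."""
    root = PurePosixPath("data")
    return [
        root / "environments" / f"{environment}.yaml",
        root / "datacentres" / f"{datacentre}.yaml",
        root / "hardware" / "vendor" / f"{vendor}.yaml",
        root / "operatingsystems" / os_family / f"{os_major}.yaml",
        # Visible to every node, whatever its attributes are.
        root / "common.yaml",
    ]


files = common_hieradata_files("qa", "europe", "foovendor", "RedHat", "7")
for path in files:
    print(path)
```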

Working examples of both repositories (used during the installation tutorial later on) can be found in the following locations:

examples/example-repositories/common-hieradata
examples/example-repositories/site

An example of a Hiera hierarchy configuration file that matches this structure is available in examples/hiera.yaml.

Modules: Code and data directories

Each module/hostgroup lives in a separate Git repository, which contains two top-level directories: code and data.

code: this is where the actual Puppet code resides, basically where the manifests, lib, files and templates directories live.
data: all the relevant Hiera data is stored here. For modules, there's only one YAML file named after the module.

Example:

it-puppet-module-lemon/
├── code
│   ├── lib
│   │   └── facter
│   │       ├── configured_kernel.rb
│   │       ├── lemon_exceptions.rb
│   │       └── ...
│   ├── manifests
│   │   ├── config.pp
│   │   ├── init.pp
│   │   ├── install.pp
│   │   ├── klogd.pp
│   │   ├── las.pp
│   │   └── ...
│   ├── Modulefile
│   ├── README
│   └── templates
│       └── metric.conf.rb
└── data
    └── lemon.yaml
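The layout above can be checked mechanically. The following is a minimal sketch, not something shipped with Jens: the function `layout_problems` is hypothetical, and treating the data file as mandatory is an assumption made for the example.

```python
import tempfile
from pathlib import Path
from typing import List


def layout_problems(repo: Path, module_name: str) -> List[str]:
    """Return a list of deviations from the code/ + data/ module layout."""
    problems = []
    for directory in ("code", "data"):
        if not (repo / directory).is_dir():
            problems.append(f"missing directory: {directory}/")
    # Modules carry a single Hiera data file named after the module.
    # Requiring it to exist is an assumption of this sketch.
    if not (repo / "data" / f"{module_name}.yaml").is_file():
        problems.append(f"missing Hiera data file: data/{module_name}.yaml")
    return problems


# Build a throwaway repository skeleton to exercise the check.
with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp) / "it-puppet-module-lemon"
    (repo / "code" / "manifests").mkdir(parents=True)
    (repo / "data").mkdir()
    incomplete = layout_problems(repo, "lemon")
    (repo / "data" / "lemon.yaml").write_text("---\n")
    complete = layout_problems(repo, "lemon")

print(incomplete)
print(complete)
```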

For those already wondering how we manage to keep track of upstream modules with this structure: Git subtree :)

Hostgroups: What they are and why they're useful

Hostgroups are just Puppet modules that are a bit special, allowing us to automatically load Puppet classes based on the hostgroup a given host belongs to (information which is fetched at compilation time from an ENC).

This is a CERNism and unfortunately we're not aware of anybody in the Puppet community doing something similar. However, we found this idea very useful to classify IT services, grouping machines belonging to a given service in the same top-level hostgroup. Modules are normally included in the hostgroup manifests (along the hierarchy) and configured via Hiera.

In short, hostgroups represent the service-specific configuration and modules are reusable "blocks" of code that abstract certain recurrent configuration tasks which are typically used by several hostgroups.

Getting back to the structure itself, the code directory serves the same purpose as the one for modules, however the data one is slightly different, as it contains FQDN-specific Hiera data for hosts belonging to this hostgroup and data that applies at different levels of the hostgroup hierarchy.

Next, a partial example of a real-life hostgroup and its subhostgroups with the corresponding manifests and Hiera data:

it-puppet-hostgroup-punch/code/manifests/
├── aijens
│   ├── app
│   │   └── live.pp
│   ├── app.pp
...
├── init.pp

it-puppet-hostgroup-punch/data/hostgroup
├── punch
│   ├── aijens
│   │   ├── app
│   │   │   ├── live
│   │   │   │   └── primary.yaml
│   │   │   └── live.yaml
│   │   └── app.yaml
│   ├── aijens.yaml
...
└── punch.yaml

it-puppet-hostgroup-punch/data/fqdns/
├── foo1.cern.ch.yaml

For instance, if foo1.cern.ch belonged to punch/aijens/app/live, it'd be entitled to automatically include init.pp, aijens.pp (which does not exist in this case), app.pp and live.pp. Also, Hiera keys will be looked up using the files foo1.cern.ch.yaml, punch.yaml, aijens.yaml, app.yaml and live.yaml.
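The mapping from a hostgroup path to these files can be written down as a short sketch. The function `hostgroup_files` is hypothetical and only mirrors the directory layout shown above; it lists candidate files without checking whether they exist (aijens.pp, for example, does not) and says nothing about Hiera precedence.

```python
from typing import List, Tuple


def hostgroup_files(fqdn: str, hostgroup: str) -> Tuple[List[str], List[str]]:
    """Return (manifests, hiera_files) relevant to a host in `hostgroup`.

    Paths are relative to the root of the top-level hostgroup repository.
    """
    parts = hostgroup.split("/")   # e.g. ["punch", "aijens", "app", "live"]
    subgroups = parts[1:]          # the top-level hostgroup maps to init.pp

    manifests = ["code/manifests/init.pp"]
    for depth in range(1, len(subgroups) + 1):
        manifests.append(
            "code/manifests/" + "/".join(subgroups[:depth]) + ".pp")

    hiera_files = [f"data/fqdns/{fqdn}.yaml"]
    for depth in range(1, len(parts) + 1):
        hiera_files.append(
            "data/hostgroup/" + "/".join(parts[:depth]) + ".yaml")

    return manifests, hiera_files


manifests, hiera_files = hostgroup_files("foo1.cern.ch", "punch/aijens/app/live")
for path in manifests + hiera_files:
    print(path)
```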

To avoid clashes during the autoloading with modules that might have the same name, the top-most class of the hostgroup is prefixed with hg_.

~ $ grep ^class it-puppet-hostgroup-punch/code/manifests/aijens/app.pp
class hg_punch::aijens::app {
~ $ grep ^class it-puppet-hostgroup-punch/code/manifests/init.pp
class hg_punch {
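
The class names follow directly from the hostgroup path and the hg_ prefix. A minimal sketch, with `hostgroup_classes` being a name invented for this example:

```python
from typing import List


def hostgroup_classes(hostgroup: str) -> List[str]:
    """Return the Puppet class names along a hostgroup path, top-most first."""
    parts = hostgroup.split("/")
    # Only the top-most class gets the prefix; subclasses inherit it
    # through the usual "::" namespacing.
    parts[0] = "hg_" + parts[0]
    return ["::".join(parts[:depth]) for depth in range(1, len(parts) + 1)]


classes = hostgroup_classes("punch/aijens/app")
print(classes)
```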

There's more information about how this all works filesystem-wise below. An example of the autoloading mechanism can be found in the example site.pp mentioned above.

Jens

Install / Use