We've chosen GitHub and Jenkins just for concreteness. Most version control and build systems should work.

A great way to manage your service data is to put it in a version control system. There are several major benefits:

  • Version control: It's easy to recover old service configurations.
  • Change auditing: Need to know who made a change? That's easy with version control.
  • Self-service: With auditing in place, it's feasible to adopt a self-service policy around changes, as opposed to making everybody go through a gatekeeper.
  • Data availability: If you lose your Seiso database, you can restore it from the copy in version control. If you lose the version control copy, people will have local copies.
  • Developer-friendly: Developers are comfortable working with version control.

This guide shows how to set up a simple but effective service management system using GitHub, Jenkins and Seiso. We begin by creating a couple of so-called data master repositories to contain our service data. Next we'll use the Seiso master importer to sync data from the repos to Seiso. Finally we'll launch the master importer from a Jenkins job to establish continuous data syncs. The end result is self-service, version-controlled service data management, with the service data available via Seiso to humans and automation alike in an integrated view.

Figure 1. We'll manage our service data in GitHub, and use Jenkins to sync it to Seiso to make it visible and available to automation.

1 Create a data repo for cross-cutting data

A data master repo is just a collection of service data files that we manage in a version control repo. We call the files data masters because we treat the files as authoritative for the data they contain.

Our first repo, which we'll call common, will be for data that spans services, such as data centers and environments. We need to create this first because the service data depends on the more general items that appear in this common repo.

If you're using GitHub, one nice way to manage the repos is to create a GitHub organization specifically for data master repos. That way you can keep all the data in one place, which is nice for management, but still support independent check-ins.

Figure 2. A GitHub organization for data master repos. There's a common repository for cross-cutting data, and lots of service-specific repos.

There are also a couple of patterns that we don't recommend:

  • Don't use a single monolithic data repo. The teams that own the different services become too tightly coupled to one another (e.g., accidental data breakage, untargeted pull requests). Moreover, push-triggered imports don't scale with this model, because you end up importing the entire world whenever somebody pushes a change.
  • Don't put the data in the same repo as the service source code. You don't want source code pushes to trigger data imports. The possible exception here is using a completely separate branch for the data masters, like GitHub Pages does with gh-pages, if you're familiar with it. But using an organization is better because you have all the data masters in one place.

The common repo typically contains the following data masters (and this will evolve as we add more Seiso item types):

  • data-centers.json
  • environments.json
  • health-statuses.json
  • infrastructure-providers.json
  • regions.json
  • rotation-statuses.json
  • service-groups.json
  • service-types.json
  • status-types.json

By way of example, here's our rotation-statuses.json:

{
  "type" : "rotation-statuses",
  "items" : [ {
    "key" : "enabled",
    "name" : "Enabled",
    "statusType" : "success"
  }, {
    "key" : "disabled",
    "name" : "Disabled",
    "statusType" : "warning"
  }, {
    "key" : "excluded",
    "name" : "Excluded",
    "statusType" : "info"
  }, {
    "key" : "no-endpoints",
    "name" : "No Endpoints",
    "statusType" : "info"
  }, {
    "key" : "partial",
    "name" : "Partial",
    "statusType" : "warning"
  }, {
    "key" : "unknown",
    "name" : "Unknown",
    "statusType" : "warning"
  } ]
}

See the item type descriptions and the data master schemas for more information.


2 Create a data repo for a given service

Now it's time to create a repo for one of your services. If you're using the GitHub organization approach we suggested above, you can just give the repo the same name as your service. One simple directory layout is the following:

machines/machines.json
nodes/nodes.json
services/services.json
services/service-instances.json

This is, however, just one possible approach. There's a lot of flexibility here:

  • If you have a group of very closely related services, you might just put them all in a single repo. There's no requirement that the repo correspond 1-1 to a service. Rather you just want to keep the repo sized according to what you want to import when somebody pushes a change.
  • You might have another source for your machine data, such as Chef server, or tools that create cloud VMs. If so then you might not want machines/machines.json. At Expedia, a lot of our services use Chef, so we generally pull machine data from Chef server. But we use machines/machines.json in those cases where the service isn't using Chef.
  • You're not restricted to a single JSON file in each directory. You can separate your data masters by environment, like nodes-test.json, nodes-prod.json, nodes-dr.json and so forth. Or if you have a service with a microservice architecture, you could organize the data masters by microservice, like foo-api.json, foo-ui.json, foo-jobs.json, etc. It's totally up to you.
  • In some cases you might have other item types that are more closely aligned with specific services as opposed to being common. For example, we have teams that have service-specific test environments, so they manage those in an environments/environments.json file in their own service repo.
  • If you like YAML better than JSON, the Seiso master importer supports that format too (see the sketch just below).
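
Here's the rotation-statuses master from earlier, translated field-for-field to YAML. This is just a sketch of what such a file could look like; if the importer expects a slightly different layout, the data master schemas are the authoritative reference.

type: rotation-statuses
items:
- key: enabled
  name: Enabled
  statusType: success
- key: disabled
  name: Disabled
  statusType: warning
- key: excluded
  name: Excluded
  statusType: info
- key: no-endpoints
  name: No Endpoints
  statusType: info
- key: partial
  name: Partial
  statusType: warning
- key: unknown
  name: Unknown
  statusType: warning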

Suffice it to say, you can organize the repo pretty much however you like.


3 Import the data into Seiso

Once you have your common repo and a service repo, you can import them into Seiso using seiso-import-master (instructions here). Import the common repo first.

Import the common repo

First import the files in your common repo. The order matters: you have to import independent item types before importing dependent item types. The following order works:

  1. status-types.json
  2. health-statuses.json
  3. rotation-statuses.json
  4. infrastructure-providers.json
  5. regions.json
  6. data-centers.json
  7. load-balancers.json
  8. service-groups.json
  9. service-types.json
  10. environments.json
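
Assuming you've cloned the common repo and have the importer installed (per the instructions linked above), a manual run in that order might look something like this. The directory name is just a placeholder for wherever you cloned the repo:

$ cd seiso-data-common
$ seiso-import-master \
    status-types.json \
    health-statuses.json \
    rotation-statuses.json \
    infrastructure-providers.json \
    regions.json \
    data-centers.json \
    load-balancers.json \
    service-groups.json \
    service-types.json \
    environments.json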

Import the service repo

The process is the same, but the item types are different. Again the order matters. Use this:

  1. machine files, if you have them
  2. service files
  3. service instance files
  4. node files
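
For example, with the directory layout from section 2 (and remembering that the importer accepts multiple files per run), the imports might look like this:

$ cd my-service-data
$ seiso-import-master machines/machines.json
$ seiso-import-master services/services.json services/service-instances.json
$ seiso-import-master nodes/nodes.json

Here my-service-data is just a placeholder for your clone of the service repo. Skip the machines import if your machine data comes from another source, such as Chef server.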

At this point, you should be able to go to the Seiso UI and see all the service data you just imported. It's also available through the API of course. (Indeed the UI gets the data from the API.)

Though you could have people import their data any time they make a change, that's not very convenient. It's much better to set up jobs to watch for repo changes and kick off the import in response. The final section shows how to do that.


4 Create Jenkins jobs to import the data automatically

Each repo gets its own job. Conceptually we simply have a job watch some repo, just like we would when setting up continuous integration for a source code repo. (These data syncs are themselves an example of continuous integration!) Then the job runs the importer. But let's go into some specifics.

First, you probably want to set up your sync jobs as parameterized builds, since the job will need a password to push data into Seiso.

Figure 3. Set your jobs up as parameterized builds so you don't have to hardcode the password into the scripts.

Setting up a job for the common repo is slightly special, just because you have to import a different set of files. At Expedia we have Jenkins listening for Git pushes and cloning into a local workspace subdirectory called seiso-data:

Figure 4. Jenkins SCM configuration.

To perform the actual import, we run the following script:

#!/bin/bash -l
set -e
source /etc/profile.d/rvm.sh
rvm use 1.9.3
gem install --no-rdoc --no-ri seiso-import_master

echo "Configuring importer"
mkdir -p $HOME/.seiso-importers
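# $api_password comes from the job's build parameter (see Figure 3), so the password isn't hardcoded in the script.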
cat > $HOME/.seiso-importers/seiso.yml <<-EOF
host: seiso.example.com
port: 443
use_ssl: true
ignore_cert: false
username: seiso-batch
password: $api_password
EOF

echo "Importing items"
cd $WORKSPACE/seiso-data
seiso-import-master \
  status-types.json \
  health-statuses.json \
  rotation-statuses.json \
  infrastructure-providers.json \
  regions.json \
  data-centers.json \
  load-balancers.json \
  service-groups.json \
  service-types.json \
  services.json \
  environments.json \
  service-instances.json

set +e

Obviously, you'll want to modify the Seiso importer configuration to match your own settings.

For service repos the process is the same; just change the item types. Keep in mind that seiso-import-master accepts multiple master files, so you can do things like

$ seiso-import-master services/*.json

to handle the cases where a directory contains multiple files.
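
For a service repo job, then, the import step at the end of the script changes to the service item types, in the same order as the manual import in section 3. Here's a sketch assuming the section 2 layout and a workspace subdirectory named my-service-data (both placeholders; adjust to your repo):

echo "Importing items"
cd $WORKSPACE/my-service-data
seiso-import-master machines/*.json   # omit if machine data comes from Chef or another source
seiso-import-master services/services.json services/service-instances.json
seiso-import-master nodes/*.json

Listing services.json before service-instances.json keeps the dependency order intact; a bare services/*.json would sort service-instances.json first. Everything else in the script (the RVM setup, gem install, and importer configuration) stays the same.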


Congratulations!

Assuming everything went OK, you now have a self-service system in place for managing service data in version control and syncing it automatically with Seiso. The data is now available for easy browsing in Seiso, and it's also available for use by automated processes.

In the next guide we'll show how to source machine data from your Chef server(s), integrating with what we just did here.

