Seiso User Manual
0.2.0.SNAPSHOT

Willie Wheeler

This manual explains Seiso from an end user's perspective:

  • how to browse data using the web UI, and
  • how to enter, update and delete data using the command line interface (CLI).

We won't cover the application programming interface (API) here. The Seiso API v1 Reference covers that.

Let's begin with a general overview to set the stage.

In this overview we'll cover what Seiso is and why you should care. To that end, we'll look at how Seiso does what it does, and why that approach works.

1.1 What is Seiso?

Seiso is a RESTful web service and accompanying user interface for collecting, normalizing, integrating and exposing data in the devops domain. It has two major goals:

  • Make the data more visible to technology specialists—such as developers, testers, release managers and operators—who need it.
  • Provide a firm basis for automation, such as deployment automation, continuous delivery and automated incident response.

Let's look at how Seiso accomplishes these things.

1.2 How does Seiso work?

We'll begin by going over Seiso from an architectural perspective. After that we'll take a more philosophical perspective and describe the design goals motivating Seiso's approach.

Note that this section contains high-level technical information about Seiso. It's useful if you're trying to understand how Seiso interacts with other systems. If, however, you simply want to know how to use the user interface, you can safely skip to section 1.3.

1.2.1 Seiso architecture

Seiso is intended to be a foundational piece of a larger automation architecture:

Figure 1.1. Seiso in context. Process automation depends upon integrated tools, and integrated tools need integrated data.

At the very top of the architecture are the various processes we want to automate, such as software delivery (e.g., continuous delivery) and web operations. Just below those are the tools and services that provide the functional capabilities underlying those processes. Finally, at the bottom, are the data stores and services that support the tools.

Here is a list of examples—by no means comprehensive—to make the foregoing more concrete:

Process examples

  • Build automation
  • Test automation
  • Deployment automation
  • Release management
  • Continuous delivery
  • Web operations
  • Capacity management
  • VM sprawl management
  • Financial tracking
  • Security (provisioning, auditing, etc.)

Tool examples

  • Cloud: AWS, Google, Heroku, Azure, Digital Ocean
  • Configuration management: Chef, Puppet, Docker
  • Deployment: Ansible, Deployit, uDeploy
  • Orchestration: System Center Orchestrator, Bamboo, Jenkins, Go
  • Monitoring: SCOM, Nagios, Ganglia, AppDynamics, New Relic
  • Dashboards: Graphene, Grafana, Splunk
  • Log management: Splunk, Loggly

Data source examples

  • IAM: Active Directory, LDAP
  • SCM: GitHub, Subversion, Perforce
  • Build systems: Bamboo, Jenkins, Go
  • Issue tracking: Jira, Mingle, Trello, TFS
  • Asset inventories: Chef server, Cobbler
  • ITSM suites: ServiceNow, IBM, HP, CA
  • Infrastructure: Hosts (collectd, statsd), load balancers
  • Monitoring/logging: Graphite, Splunk
  • Management: SNMP, JMX
Table 1.1. Examples of processes, tools and data sources that we might want to integrate.

The arrow in figure 1.1 from the data sources to Seiso represents the fact that Seiso consumes data from external sources. That data can arrive in batches, in real time, in near real time, or in any combination of these. Seiso provides a schema for normalizing and integrating the data, and then exposes it through a RESTful API so that tools can consume it already integrated instead of each tool performing its own integration.
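To make the normalize-and-integrate step more concrete, here is a minimal Python sketch. The two source record formats (an inventory tool keyed by "hostname" and an ITSM export keyed by "CI_NAME") and the Machine schema are hypothetical, chosen only for illustration. They are not Seiso's actual schema or import format. Each source-specific record is first mapped into a common shape, and the normalized records are then merged by machine name.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


# Hypothetical common schema for a machine; not Seiso's actual schema.
@dataclass(frozen=True)
class Machine:
    name: str
    ip_address: Optional[str] = None
    os: Optional[str] = None


def from_inventory(rec: dict) -> Machine:
    """Normalize a record from a hypothetical inventory tool: {"hostname", "ip"}."""
    return Machine(name=rec["hostname"].lower(), ip_address=rec.get("ip"))


def from_itsm(rec: dict) -> Machine:
    """Normalize a record from a hypothetical ITSM export: {"CI_NAME", "OS_NAME"}."""
    return Machine(name=rec["CI_NAME"].lower(), os=rec.get("OS_NAME"))


def integrate(*sources: Iterable[Machine]) -> Dict[str, Machine]:
    """Merge normalized records by machine name; earlier sources win on conflicts."""
    merged: Dict[str, Machine] = {}
    for source in sources:
        for m in source:
            prev = merged.get(m.name)
            merged[m.name] = m if prev is None else Machine(
                name=m.name,
                ip_address=prev.ip_address or m.ip_address,
                os=prev.os or m.os,
            )
    return merged


inventory = [{"hostname": "WEB01", "ip": "10.0.0.5"}]
itsm = [{"CI_NAME": "web01", "OS_NAME": "CentOS 6"}]
machines = integrate(map(from_inventory, inventory), map(from_itsm, itsm))
print(machines["web01"])
# Machine(name='web01', ip_address='10.0.0.5', os='CentOS 6')
```

Lowercasing the names is one simple way to reconcile naming differences between sources. Real integrations usually need more careful matching rules.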

Tip: In practice, it generally makes sense to use different data sync strategies for different types of data. For example, person data in Active Directory doesn't change often, so a nightly sync usually works just fine. On the other hand, node rotation changes (i.e., a node entering or leaving the load-balanced pool) usually need to sync in real time, since deployment orchestration tools need current information.
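One way to express per-data-type sync strategies is as a small policy table. The Python sketch below assumes a hypothetical policy format in which a timedelta means a periodic batch sync and the string "event" means real-time, push-based updates. It is an illustration of the idea, not a Seiso configuration format.

```python
from datetime import datetime, timedelta
from typing import Dict, Union

# Hypothetical per-data-type sync policies. A timedelta means a periodic batch
# sync; "event" means changes are pushed as they happen (real time).
SyncPolicy = Union[timedelta, str]

SYNC_POLICIES: Dict[str, SyncPolicy] = {
    "person": timedelta(days=1),  # approximates a nightly sync from Active Directory
    "rotation": "event",          # load balancer pool membership changes
}


def batch_sync_due(data_type: str, last_sync: datetime, now: datetime) -> bool:
    """Return True if a periodic batch sync should run for data_type."""
    policy = SYNC_POLICIES[data_type]
    if not isinstance(policy, timedelta):
        # Event-driven data is synced as changes arrive, never on a timer.
        return False
    return now - last_sync >= policy


last = datetime(2015, 1, 1, 2, 0)
print(batch_sync_due("person", last, datetime(2015, 1, 2, 2, 0)))    # True
print(batch_sync_due("person", last, datetime(2015, 1, 1, 14, 0)))   # False
print(batch_sync_due("rotation", last, datetime(2015, 1, 2, 2, 0)))  # False
```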

Note also that both Seiso and the original data sources remain available to the tools layer. Seiso isn't a facade sitting in front of the data; rather, it is an additional source that consolidates what would otherwise be difficult and duplicated data integration efforts.

Now that you know what Seiso is and how it works, it's time to cover the principles that drive Seiso's design.

1.2.2 Seiso design goals

In designing Seiso, we had a number of goals in mind. These goals reflect our views on automation best practices. We lay them out here to make our values and assumptions explicit.

Support self-service. Organizations that embrace agile development and devops culture typically have a "get it done" ethic. Teams in such orgs need to be able to make changes themselves, so we've been careful not to preclude self-service approaches to data management. At the same time, some orgs may want to exercise more control over data changes, and Seiso supports that too. It's up to your org whether data changes are tightly managed via authorization, loosely managed via auditing, or wide open.

Closed-loop automation. Among the major challenges in collecting automation data are data quality and completeness. There are different approaches to collecting the data, ranging from automated discovery to in-person interviews, each with its own strengths and weaknesses. Automated discovery tends to be less work, but the data it collects tends to be better lower in the tech stack than higher up. It can also produce too much data, at least for typical software delivery and operations use cases.

In-person interviews are not particularly effective either. They are a lot of work, especially on large teams, and it's hard to get good data from them. Over time the quality of the data tends to degrade, since people usually have better things to do than update their service information in a spreadsheet or database.

Seiso's approach is to connect data to automation in a mutually reinforcing way. On the one hand, the data enables the automation. On the other, the automation creates a real reason to keep the data current: if the data is wrong, then the automation doesn't work correctly. Moreover, as the automation delivers organizational value, it tends to drive data completeness. We call this closed-loop automation:

Figure 1.2. Closed-loop automation. The data makes the automation possible, and the automation drives data quality.

Closed-loop automation makes the data converge toward correctness and completeness, even if it doesn't start out that way (and it rarely, if ever, does). It is the difference between data-as-enabler and data-as-shelfware.

The closed-loop approach works. Your author has helped implement it at two large development organizations. In both cases we started by onboarding a few dev teams ourselves, which gave them automated deployments (at the first org) and automated operations (at both orgs). It wasn't long before new teams began seeking us out, adding their own data to the system and keeping it up to date so that their automation would keep working. (Self-service helps!)

Automation-first design. From a software architecture perspective, the REST API mediates all access to the database. In particular, the web UI has to go through the REST API to get at the data. This design ensures that the corresponding API exists before something can appear in the UI. This is what we mean by "automation-first". See figure 1.3 below.

Figure 1.3. Automation-first design ensures that we can automate anything we can do in the UI.
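The automation-first rule can be illustrated with a small Python sketch. The UI layer receives only an API client and never a database handle, so anything it displays must already be reachable through the API. The ApiClient protocol, the /services/{key} path and the response fields are hypothetical, used only for illustration. They are not Seiso's actual API.

```python
from typing import Any, Dict, Protocol


class ApiClient(Protocol):
    """Anything that can GET a JSON resource from the REST API by path."""

    def get(self, path: str) -> Dict[str, Any]: ...


def service_page_model(api: ApiClient, service_key: str) -> Dict[str, Any]:
    """Build the view model for a service page using only the REST API.

    The UI layer gets no database handle, so it can show only what the API exposes.
    """
    service = api.get(f"/services/{service_key}")
    return {"title": service["name"], "description": service.get("description", "")}


class StubApi:
    """In-memory stand-in for an HTTP client, keyed by request path."""

    def __init__(self, resources: Dict[str, Dict[str, Any]]) -> None:
        self.resources = resources

    def get(self, path: str) -> Dict[str, Any]:
        return self.resources[path]


api = StubApi({"/services/crm": {"name": "CRM", "description": "Customer data"}})
print(service_page_model(api, "crm"))
# {'title': 'CRM', 'description': 'Customer data'}
```

Because the UI code depends only on the client interface, a stub can stand in for the real HTTP client in tests. Any automation script can use the same calls the UI makes.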

Favor best-of-breed tools over the big vendor tool suite. Broadly speaking, there are two approaches to automating the workplace:

  • The big vendor approach. Buy a "fully-integrated enterprise automation solution" from one of the big vendors.
  • The best-of-breed approach. Adopt/buy/create specialized, best-of-breed tools and services, and integrate them.

Though there are plausible arguments for going the big vendor route, Seiso adopts the best-of-breed philosophy toward devops automation. Various automation-related movements (devops, webops, continuous delivery, etc.) are evolving so quickly that it doesn't make sense to lock into a single vendor's tool suite. Moreover, nowadays most automation tools have REST APIs, so integration isn't the technical hurdle that it once was.

In our view Seiso works best when deployed as part of a set of loosely-coupled services. Seiso offers both a REST API and messaging support for integration.

Support data federation. A moment's reflection makes it obvious that it's neither feasible nor desirable, on purely technical grounds, to have a "one-repo-to-rule-them-all" for automation data. Consider, for example, person data (name, contact information, etc.), source code data (commits, builds, etc.) and load balancer rotation data:

  • If your organization is using Microsoft Exchange, then your Active Directory servers are likely to be the authoritative source for person data.
  • Your source code repositories are going to be authoritative for commits.
  • Your build servers are going to be authoritative for builds.
  • Your load balancers are going to be authoritative for rotation.

It wouldn't make sense to try to make some single system authoritative for all of that data, since the tools that use the data wouldn't generally support that.

Figure 1.4. It's easy to see that "one-repo-to-rule-them-all" won't work.

Besides the technical reason, though, there's also an organizational reason that emerges in organizations of any size. Departments and teams often have their own favorite tools and data sources for their own data. One team might use JIRA for issue tracking where another uses TFS and a third uses Trello. One team uses Git and GitHub where another uses Subversion. This sort of technology fragmentation is the rule, not the exception, and any data integration approach needs to confront it head-on.

In fairness, the "one-repo-to-rule-them-all" discussion above is a bit of a strawman. Nobody actually advocates managing person data and source code from a single authoritative repo. But in presenting the extreme view, we're highlighting the challenges that arise in more modest versions.

For instance, it is common for an organizational big wheel to say something like, "Wouldn't it be a lot better if everybody just used [whatever the big wheel used at their previous company]?" Some people resist, others block new tools because "they're not the standard", and so on.

We consider such behaviors dysfunctional. Why would any org intentionally cut itself off from the many new, high-quality and often low-cost or free offerings that appear on a regular basis? While we should certainly harmonize toolsets where it's feasible, we should also be prepared to integrate diverse tools where it isn't.

A correct data integration approach presupposes data federation. That is, it assumes that there will be a multitude of systems, organized around data type, ownership and potentially other factors, authoritative for subsets of the data we care about. Seiso is designed to work in such a setting.

Figure 1.5. Seiso integration. Seiso provides an integrated view without trying to replace existing authoritative sources.
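The federation idea can be sketched in a few lines of Python. The registry records which system is authoritative for each data type and routes lookups to that system instead of to a single master repository. The data types and source names echo the examples in the list earlier in this section. The FederatedRegistry class and the lambda fetchers are hypothetical stand-ins for real integrations, not part of Seiso.

```python
from typing import Any, Callable, Dict, Tuple

# A fetcher retrieves one record by key from some external system.
Fetcher = Callable[[str], Dict[str, Any]]


class FederatedRegistry:
    """Routes each data type to the single system authoritative for it (hypothetical)."""

    def __init__(self) -> None:
        self._authorities: Dict[str, Tuple[str, Fetcher]] = {}

    def register(self, data_type: str, source_name: str, fetch: Fetcher) -> None:
        # Each data type has exactly one authoritative source.
        if data_type in self._authorities:
            raise ValueError(f"{data_type} already has an authoritative source")
        self._authorities[data_type] = (source_name, fetch)

    def authority_for(self, data_type: str) -> str:
        return self._authorities[data_type][0]

    def lookup(self, data_type: str, key: str) -> Dict[str, Any]:
        _, fetch = self._authorities[data_type]
        return fetch(key)


registry = FederatedRegistry()
registry.register("person", "Active Directory", lambda k: {"username": k})
registry.register("rotation", "load balancer", lambda k: {"node": k, "in_pool": True})
print(registry.authority_for("person"))        # Active Directory
print(registry.lookup("rotation", "web01"))    # {'node': 'web01', 'in_pool': True}
```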

That wraps up our discussion of Seiso's design goals. You should now have a better idea of some of the opinions that led us to create Seiso in the first place, and the rationale behind key elements of the design. Now let's turn our attention to the major concepts within the Seiso app itself.

1.3 Key concepts

Seiso's main job is to manage service data. We use the term item to refer generically to the things being managed. There are several item types, but the main ones are the following:

Item type           Description
Services            Services are the applications, web services, databases and so forth that we want to manage (e.g., develop, deploy, operate, etc.). As such they are the most fundamental item type in Seiso.
Environments        Environments are where you perform specific kinds of software lifecycle activity, such as development, testing, staging and production.
Service instances   Service instances are services as they exist in given environments, such as a hypothetical CRM service in a production environment. A service can have multiple instances in any given environment. Service instances are probably the focal point for most use cases.
Nodes               Nodes are service instances as they exist on given machines. A single machine can host multiple nodes from one or more service instances.
Machines            Machines are the infrastructure hosting the nodes. Machines can be physical or virtual.
Table 1.2. The key Seiso item types.

Figure 1.6 illustrates the relationships between the key item types.

Figure 1.6. An entity/relationship diagram highlighting the key item types and their interrelationships. Service instances tie services to environments, and comprise nodes running on machines.
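The relationships in table 1.2 and figure 1.6 can be modeled in a few Python dataclasses. This is a minimal sketch. The class and field names are our own, chosen for illustration, and are not Seiso's actual entity classes or API field names. The query at the end walks from a service and an environment, through service instances and nodes, to the machines that host them.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class Service:
    key: str


@dataclass(frozen=True)
class Environment:
    key: str


@dataclass(frozen=True)
class Machine:
    name: str  # physical or virtual


@dataclass(frozen=True)
class ServiceInstance:
    # Ties one service to one environment; a service may have several
    # instances in the same environment.
    key: str
    service: Service
    environment: Environment


@dataclass(frozen=True)
class Node:
    # A service instance as it runs on a specific machine.
    name: str
    service_instance: ServiceInstance
    machine: Machine


def machines_for(service: Service, env: Environment, nodes: List[Node]) -> Set[str]:
    """Names of machines hosting nodes of the given service in the given environment."""
    return {
        n.machine.name
        for n in nodes
        if n.service_instance.service == service
        and n.service_instance.environment == env
    }


prod = Environment("prod")
crm, billing = Service("crm"), Service("billing")
crm_prod = ServiceInstance("crm-prod", crm, prod)
billing_prod = ServiceInstance("billing-prod", billing, prod)
host1, host2 = Machine("host1"), Machine("host2")
nodes = [
    Node("crm-prod-1", crm_prod, host1),
    Node("crm-prod-2", crm_prod, host2),
    Node("billing-prod-1", billing_prod, host1),  # host1 hosts nodes from two instances
]
print(sorted(machines_for(crm, prod, nodes)))      # ['host1', 'host2']
print(sorted(machines_for(billing, prod, nodes)))  # ['host1']
```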

In the next section we will take a closer look at the item types above, as well as other supporting types.