Everything is full steam ahead,
until one day you discover that there's
a security vulnerability in one of the applications used.
Now, you need to upgrade
all the servers to the latest version.
If you have 10 servers in the fleet,
it's probably not too much trouble
to log into each one of
them one after the other and install the new version.
But what if you have 100 servers?
This would get super boring and you'd
likely end up making mistakes,
leaving some servers with the wrong version installed.
Now, imagine having to do this on 1000 servers.
There's no way you're going to log into
each of them to upgrade the software.
So what can you do instead?
In this course, we'll look into how we can apply
automation to manage fleets of computers.
We'll learn how to automate deploying new computers,
keep those machines updated,
manage large-scale changes, and a lot more.
We'll discuss managing both physical machines running in
our offices and virtual machines running in the Cloud.
If this sounds overwhelming, don't worry,
I'll go step-by-step with you along the way.
I'm [inaudible], and I'm
a Site Reliability Engineer at
Google working on the team that supports Gmail.
If you've never heard about
Site Reliability Engineering before,
let me tell you a bit about what we do.
SRE is focused on the reliability and
maintainability of large systems.
We apply tons of automation techniques to manage them.
This lets teams with only a handful
of engineers have a big impact,
scaling our support as our service grows.
We're small, but mighty.
My job includes a lot of different tasks.
Sometimes I spend my time collaborating with
partner teams on the reliability aspects
of a cool new feature,
like scheduling emails to send at a later time on Gmail.
Other days, I write software,
creating tools that help
automate how we manage the service.
When I'm not doing that,
I might do research or
architectural design for a new project.
I'm also part of the on-call rotation for the service.
If problems come up when I'm on call,
I'm in charge of fixing them or
finding the right person to fix them if I can't.
So what will we cover in this course?
We'll start by looking into
an automation technique called configuration management,
which lets us manage
the configuration of our computers at scale.
Specifically, we'll learn how to use Puppet,
the current industry standard
for configuration management.
We'll look at some simple examples,
and then see how we can apply
the same concept to more complex cases.
You'll be a Puppet master in no time.
Later on, we'll expand
our automation skills by looking into
how we can make use of the Cloud to
help us scale our infrastructure.
We'll learn about the benefits and
challenges of moving services to the Cloud.
We'll check out some of the best practices for
handling hundreds of virtual
machines running in the Cloud,
how to adapt our services to that,
and how to troubleshoot them
when things don't go according to plan.
Heads up, they rarely do.
Before we move on, I should
probably tell you a little bit about myself.
I discovered I was interested in
IT and technology as a teenager.
So when I decided to enlist
in the Navy right after high school,
I signed up to be an
Information Systems Technician there.
I served in the Navy for four years supporting
IT and network resources around the world.
After leaving the Navy,
I went to college and then joined
Google in the IT support department.
The transition from working in
a very structured environment like
the military to a place like Google
was initially a bit hard to wrap my head around.
I had to become much more comfortable in dealing with
ambiguity in the problem spaces that I was working in,
which meant learning to trust
my own sense of judgment and prioritization.
All along the way, I kept learning
new skills and growing as a person and an engineer.
So I'm excited to be here to help
you take the next step in your IT career,
to help you keep growing
your automation skills by learning
how to manage fleets of
computers using configuration management,
and how to work with the Cloud.
Modern IT is moving more and more
towards Cloud-based solutions and having
a solid background in how to manage them will be
even more critical for IT professionals in the future.
In this course, we'll use Qwiklabs which is
an environment that allows you to test
your code on a virtual machine running in the Cloud.
This lets you experience real-world scenarios,
where you'll need to interact with
one or more remote systems to achieve your goal.
We'll build on top of the many tools
that you've learned about throughout the program,
like using Python for automation scripts,
using Git to store versions of code,
or figuring out what's going on when
a program doesn't behave as expected.
You'll see some complex topics and videos that
may not 100 percent sink in the first time around.
That's totally natural.
Take your time and re-watch the videos a few
times if you need to. You'll get the hang of it.
Also, remember that you can use the discussion forums to
connect with your fellow learners and
ask questions anytime you need.
We're about to begin our journey,
learning how we can apply automation at large scale.
So let's get started.
Welcome to the Course
Welcome to the course!
In this course, you’ll learn how to automate managing large fleets of computers using
tools like Configuration Management. You'll also learn how you can make use of Cloud
technologies
Course prerequisites
This course requires some familiarity with basic IT concepts like operating with file
systems, handling processes, and understanding log files.
The example scripts and programs in this course are written in Python, so you’ll need an
understanding of this programming language, too.
We also touch upon other concepts covered in other courses of the specialization, like
using a Version Control System, or using the Linux command line interface.
Qwiklabs
For some of our exercises, you'll be using an application called Qwiklabs. Qwiklabs lets
you interact with a computer running an operating system that might not be the same one
running on your machine. The Qwiklabs scenarios will allow you to solve some real-world
problems, putting your knowledge to work through active learning exercises.
Getting and giving help
Here are a few ways you can give and get help:
1. Discussion forums: You can share information and ideas with your fellow
learners in the discussion forums. These are also great places to find answers
to questions you may have. If you're stuck on a concept, are struggling to
solve a practice exercise, or you just want more information on a subject, the
discussion forums are there to help you move forward.
2. Coursera learner support: Use the Learner Help Center to find information on
specific technical issues. These include error messages, difficulty submitting
assignments, or problems with video playback. If you can’t find an answer in
the documentation, you can also report your problem to the Coursera support
team by clicking on the Contact Us! link available at the bottom of help center
articles.
3. Qwiklabs support: Please use the Qwiklabs support request form to report
any issues with accessing or using Qwiklabs. A member of the Qwiklabs team
will work with you to help resolve the problem.
4. Course content issues: You can also flag problems in course materials by
rating them. When you rate course materials, the instructor will see your
ratings and feedback; other learners won’t. To rate course materials:
   - Open the course material you want to rate. You can only rate videos, readings,
     and quizzes.
   - If the content was interesting or helped you learn, click the thumbs-up icon.
   - If the content was unhelpful or confusing, click the thumbs-down icon.
WHAT IS SCALE?
INTRODUCTION TO PUPPET
WHAT IS PUPPET?
PUPPET CLASSES
In the examples of Puppet code that we've seen so far,
we've declared classes that contain one resource.
You might have wondered what those classes were for.
We use these classes to collect the resources that are needed to achieve
a goal in a single place.
For example, you could have a class that installs a package, sets the contents of
a configuration file, and starts the service provided by that package.
Let's check out an example like that.
In this case, we have a class with three resources, a package,
a file, and a service.
All of them are related to the Network Time Protocol, or NTP,
the mechanism our computers use to synchronize the clocks.
Our rules are making sure that the NTP package is always upgraded to
the latest version.
We're setting the contents of the configuration file using the source
attribute, which means that the agent will read the required contents from
the specified location.
And we're saying that we want the NTP service to be enabled and running.
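As a rough sketch, the class described here might look like the following. The configuration file path and the puppet:/// source URL are assumptions for illustration, since the transcript does not show them.

```puppet
class ntp {
  # Keep the NTP package upgraded to the latest available version.
  package { 'ntp':
    ensure => latest,
  }

  # Assumed path and source location; the agent reads the file contents from source.
  file { '/etc/ntp.conf':
    ensure => file,
    source => 'puppet:///modules/ntp/ntp.conf',
  }

  # Start the service at boot and keep it running.
  service { 'ntp':
    ensure => running,
    enable => true,
  }
}
```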
By grouping all of the resources related to NTP in the same class,
we only need a quick glance to understand how the service is configured and
how it's supposed to work.
This would make it easier to make changes in the future since we have
all the related resources together.
It makes sense to use this technique whenever we want
to group related resources.
For example,
you could have a class grouping all resources related to managing log files,
or configuring the time zone, or handling temporary files and directories.
You could also have classes that group all the settings related to your web serving
software, your email infrastructure, or even your company's firewall.
We're just getting started with Puppet's basic resources and
seeing how they can be applied.
In further videos, we'll be learning a lot more about common practices when using
configuration management tools.
But before jumping into that, we've put together a reading with more information
about Puppet syntax, resources and links to the official reference.
Then we've got a quick quiz to check that everything is making sense.
Bolt Examples
Bolt lets you automate almost any task you can think of. These examples walk you
through beginning and intermediate level Bolt use cases, demonstrating Bolt
concepts along the way. You can find shorter examples of common Bolt patterns in
the Bolt examples repo, which is more of a reference than a learning tool.
If you'd like to share a real-world use case, reach out to us in the #bolt channel
on Slack.
Resources
Resources are the fundamental unit for modeling system configurations. Each
resource describes the desired state for some aspect of a system, like a specific
service or package. When Puppet applies a catalog to the target system, it manages
every resource in the catalog, ensuring the actual state matches the desired state.
Resources contained in classes and defined types share the relationships of those
classes and defined types. Resources are not subject to scope: a resource in any
area of code can be referenced from any other area of code.
You can delay adding resources to the catalog. For example, classes and defined
types can contain groups of resources. These resources are managed only if you
add that class or defined resource to the catalog. Virtual resources are added to the
catalog only after they are realized.
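A minimal sketch of a virtual resource, assuming a hypothetical user named deploy: the @ prefix declares the resource without adding it to the catalog, and realize adds it later.

```puppet
# Declared virtually: not managed until it is realized.
@user { 'deploy':
  ensure => present,
}

# Realizing the resource adds it to the catalog.
realize User['deploy']
```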
Resource declarations
At minimum, every resource declaration has a resource type, a title, and a set
of attributes:
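For illustration, a declaration with those three parts could be written as below. The path and content are hypothetical values chosen for the sketch.

```puppet
# General shape: <TYPE> { '<TITLE>': <ATTRIBUTE> => <VALUE>, }
file { '/tmp/example.txt':       # type: file, title: /tmp/example.txt
  ensure  => file,               # attribute => value pairs
  content => "managed by Puppet\n",
}
```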
A resource declaration has extremely low precedence; in fact, it's even lower than
the variable assignment operator (=). This means that if you use a resource
declaration for its value, you must surround it with parentheses to associate it with
the expression that uses the value.
If a resource declaration includes more than one resource body, it declares multiple
resources of that resource type. The resource body is a title and a set of attributes;
each body must be separated from the next one with a semicolon. Each resource in
a declaration is almost completely independent of the others, and they can have
completely different values for their attributes. The only connections between
resources that share an expression are:
They all have the same resource type.
They can all draw from the same pool of default values, if a resource body
with the title default is present.
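As a sketch of a declaration with more than one resource body, the following declares two independent file resources in one expression. Both paths are hypothetical.

```puppet
file {
  # First resource body: a title and its attributes, ended by a semicolon.
  '/tmp/example_a':
    ensure => file,
    mode   => '0644',
  ;
  # Second resource body: same type, completely different attribute values.
  '/tmp/example_b':
    ensure => directory,
    mode   => '0755',
  ;
}
```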
Resource uniqueness
Each resource must be unique; Puppet does not allow you to declare the same
resource twice. This is to prevent multiple conflicting values from being declared
for the same attribute. Puppet uses the resource title and the name attribute
or namevar to identify duplicate resources — if either the title or the name is
duplicated within a given resource type, catalog compilation fails. See the page
about resource syntax for details about resource titles and namevars. To provide
the same resource for multiple classes, use a class or a virtual resource to add it to
the catalog in multiple places without duplicating it. See classes and virtual
resources for more information.
Resource types
Every resource is associated with a resource type, which determines the kind of
configuration it manages. Puppet has built-in resource types such as file, service,
and package. See the resource type reference for a complete list and information
about the built-in resource types.
Title
A resource's title is a string that uniquely identifies the resource to Puppet. In a
resource declaration, the title is the identifier after the first curly brace and before
the colon. For example, in this file resource declaration, the title is /etc/passwd:
file { '/etc/passwd':
}
Titles must be unique per resource type. You can have both a package and a
service titled "ntp," but you can only have one service titled "ntp." Duplicate titles
cause compilation to fail.
The title of a resource differs from the namevar of the resource. Whereas the title
identifies the resource to Puppet itself, the namevar identifies the resource to the
target system and is usually specified by the resource's name attribute. The
resource title doesn't have to match the namevar, but you'll often want it to: the
value of the namevar attribute defaults to the title, so using the name in the title can
save you some typing.
If a resource type has multiple namevars, the type specifies whether and how the
title maps to those namevars. For example, the package type uses
the provider attribute to help determine uniqueness, but that attribute has no
special relationship with the title. See each type's documentation for details about
how it maps title to namevars.
Attributes
Attributes describe the desired state of the resource; each attribute handles some
aspect of the resource. For example, the file type has a mode attribute that
specifies the permissions for the file.
Each resource type has its own set of available attributes; see the resource type
reference for a complete list. Most resource types have a handful of crucial
attributes and a larger number of optional ones. Attributes accept certain data
types, such as strings, numbers, hashes, or arrays. Each attribute that you declare
must have a value. Most attributes are optional, which means they have a default
value, so you do not have to assign a value. If an attribute has no default, it is
considered required, and you must assign it a value.
Namevars and name
Every resource on a target system must have a unique identity; you cannot have
two services, for example, with the same name. This identifying attribute
in Puppet is known as the namevar.
Each resource type has an attribute that is designated to serve as the namevar. For
most resource types, this is the name attribute, but some types use other attributes,
such as the file type, which uses path, the file's location on disk, for its namevar. If
a type's namevar is an attribute other than name, this is listed in the type reference
documentation.
Most types have only one namevar. With a single namevar, the value must be
unique per resource type. There are a few rare exceptions to this rule, such as
the exec type, where the namevar is a command. However, some resource types,
such as package, have multiple namevar attributes that create a composite
namevar. For example, both the yum provider and the gem provider
have mysql packages, so both the name and the provider attributes are namevars,
and Puppet uses both to identify the resource.
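A sketch of the composite namevar case, assuming a Puppet version in which package treats name and provider together as its identity. The titles are hypothetical and only need to be unique.

```puppet
# Same package name, different providers: two distinct resources.
package { 'mysql-from-yum':
  ensure   => installed,
  name     => 'mysql',
  provider => 'yum',
}

package { 'mysql-from-gem':
  ensure   => installed,
  name     => 'mysql',
  provider => 'gem',
}
```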
You might want to specify a namevar that is different from the title when
you want a consistently titled resource to manage something that has different
names on different platforms. For example, the NTP service might be ntpd on Red
Hat systems, but ntp on Debian and Ubuntu. You might title the service "ntp," but
set its namevar (the name attribute) according to the operating system. Other
resources can then form relationships to the resource without the title changing.
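One way to sketch the NTP case is with a selector on the operating system family fact. The variable name $ntp_service_name is an assumption made for this example.

```puppet
# Pick the platform-specific service name; the title stays 'ntp' everywhere.
$ntp_service_name = $facts['os']['family'] ? {
  'RedHat' => 'ntpd',
  default  => 'ntp',
}

service { 'ntp':
  ensure => running,
  enable => true,
  name   => $ntp_service_name,
}
```

Other resources can refer to Service['ntp'] on every platform, because references use the title rather than the namevar.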
Metaparameters
Some attributes in Puppet can be used with every resource type. These are
called metaparameters. These don't map directly to system state. Instead,
metaparameters affect Puppet's behavior, usually specifying the way in which
resources relate to each other.
The most commonly used metaparameters are for specifying order relationships
between resources. See the documentation on relationships and ordering for details
about those metaparameters. See the full list of available metaparameters in
the metaparameter reference.
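As an illustration of ordering metaparameters, the sketch below uses require and notify. The NTP resources and file path are assumed for the example.

```puppet
package { 'ntp':
  ensure => installed,
}

file { '/etc/ntp.conf':
  ensure  => file,
  require => Package['ntp'],   # manage the file only after the package
  notify  => Service['ntp'],   # refresh the service when the file changes
}

service { 'ntp':
  ensure => running,
  enable => true,
}
```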
Resource syntax
You can accomplish a lot with just a few resource declaration features, or you can
create more complex declarations that do more.
Basic syntax
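The simplified form of a resource declaration includes the resource type, followed
by curly braces that enclose a quoted title, a colon, and an optional set of
attribute-value pairs:
file { '/etc/passwd':
}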
Complete syntax
By creating more complex resource declarations, you can:
Resource declaration defaults are useful because they let you set many attributes at
once while still overriding some of them.
This example declares several different files, all using the default values set in the default resource
body. However, the mode value for the files in the last array (['ssh_config',
'ssh_host_dsa_key.pub'....) is set explicitly instead of using the default.
file {
default:
ensure => file,
'ssh_host_rsa_key.pub', 'sshd_config']:
# override mode
}
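A complete declaration of this kind might look like the sketch below. The absolute paths, owner, group, and mode values are assumptions, not the values from the original example.

```puppet
file {
  # Defaults shared by every other body in this declaration.
  default:
    ensure => file,
    owner  => 'root',
    group  => 'root',
    mode   => '0600',
  ;
  # Uses every default.
  '/etc/ssh/ssh_host_rsa_key':
  ;
  # Overrides only the mode; the other defaults still apply.
  ['/etc/ssh/ssh_config', '/etc/ssh/sshd_config']:
    mode => '0644',
  ;
}
```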
Each key is the name of a valid attribute for that resource type, as a string.
Each value is a valid value for the attribute it's assigned to.
This sets values for that resource's attributes, using every attribute and value listed in the hash.
For example, the splat attribute in this declaration sets the owner, group, and mode settings for
the file resource.
$file_ownership = {
}
file { "/etc/passwd":
* => $file_ownership,
}
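Filled in with assumed values, the splat pattern could be sketched like this. The owner, group, and mode shown are placeholders for illustration.

```puppet
# Keys are attribute names as strings; values are valid for those attributes.
$file_ownership = {
  'owner' => 'root',
  'group' => 'wheel',
  'mode'  => '0644',
}

file { '/etc/passwd':
  ensure => file,
  *      => $file_ownership,   # expands to owner, group, and mode
}
```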
You cannot set any attribute more than once for a given resource; if you try, Puppet raises a
compilation error. This means that:
If you use a hash to set attributes for a resource, you cannot set a different,
explicit value for any of those attributes. For example, if mode is present in
the hash, you can't also set mode => "0644" in that resource body.
You can't use the * attribute multiple times in one resource body, since the
splat itself is an attribute.
To use some attributes from a hash and override others, either use a hash to set per-expression
defaults, as described in the section on resource declaration defaults, or use the merging
operator, +, to combine attributes from two hashes, with the right-hand hash overriding the
left-hand one.
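A short sketch of the merging approach, using hypothetical variable names and values: the right-hand hash replaces mode, and the merged result is passed through the splat.

```puppet
$base_attributes = {
  'ensure' => 'file',
  'owner'  => 'root',
  'mode'   => '0600',
}

# The right-hand hash wins for any key present in both.
$merged_attributes = $base_attributes + { 'mode' => '0644' }

file { '/etc/myconfig':
  * => $merged_attributes,
}
```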
file { "/tmp/foo": ensure => file, }
File { "/tmp/foo": ensure => file, }
$mytype = File
$mytypename = "file"
This lets you declare resources without knowing in advance what type of resources they'll be, which
can enable transformations of data into resources.
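As a sketch of how a type held in a variable can be used, the Resource[...] abstract type accepts the variable in place of a literal type name. The path is a placeholder.

```puppet
# The resource type is stored as data and resolved when the resource is declared.
$mytype = File

Resource[$mytype] { '/tmp/foo':
  ensure => file,
}
```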
Arrays of titles
If you specify an array of strings as the title of a resource body, Puppet creates
multiple resources with the same set of attributes. This is useful when you have
many resources that are nearly identical.
For example:
$rc_dirs = [
  '/etc/rc.d', '/etc/rc.d/init.d', '/etc/rc.d/rc0.d',
]
file { $rc_dirs:
}
If you do this, you must let the namevar attributes of these resources default to their titles. You can't
specify an explicit value for the namevar, because it applies to all of the resources.
To amend attributes with the splat attribute, see the section about setting attributes
from a hash.
To amend attributes with a resource reference, add a resource reference attribute block to the
resource that's already declared. Normally, you can only use resource reference blocks to add
previously unmanaged attributes to a resource; they cannot override already-specified attributes. The
general form of a resource reference attribute block is:
file {'/etc/passwd':
}

File['/etc/passwd'] {
}
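With attributes filled in, a resource reference attribute block might look like the sketch below. The attribute values are assumptions chosen for illustration.

```puppet
file { '/etc/passwd':
  ensure => file,
}

# Adds attributes that the declaration above left unmanaged.
File['/etc/passwd'] {
  owner => 'root',
  group => 'root',
  mode  => '0640',
}
```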
This example amends the owner, group, and mode attributes of any resources that match the
collector:
class base::linux {
file {'/etc/passwd':
...}
include base::linux
}
CAUTION: Be very careful when amending attributes with a collector. Test with
--noop to see what changes your code would make.
It can override other attributes you've already specified, regardless of class
inheritance.
It can affect large numbers of resources at one time.
It implicitly realizes any virtual resources the collector matches.
Because it ignores class inheritance, it can override the same attribute more
than one time, which results in an evaluation order race where the last
override wins.
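A sketch of amending attributes with a collector, assuming the resources are matched by the tag that Puppet adds automatically for the containing class. The attribute values are placeholders.

```puppet
class base::linux {
  file { '/etc/passwd':
    ensure => file,
  }
}

include base::linux

# Amends every file resource tagged 'base::linux'.
File <| tag == 'base::linux' |> {
  owner => 'root',
  group => 'root',
  mode  => '0640',
}
```

Running the agent with --noop first shows which resources the collector would change.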
To set local resource defaults, define your defaults in a variable and re-use them in
multiple places, by combining resource declaration defaults and setting attributes
from a hash.
This example defines defaults in a $file_defaults variable, and then includes the variable in a
resource declaration default with a hash.
class mymodule::params {
$file_defaults = {
# ...
"/etc/myconfig":
ensure => file,
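A complete version of this pattern could be sketched as follows. The hash contents and the mymodule class body are assumptions, since the excerpt above is truncated.

```puppet
class mymodule::params {
  # Assumed default values, defined once and reused elsewhere.
  $file_defaults = {
    'mode'  => '0644',
    'owner' => 'root',
    'group' => 'root',
  }
}

class mymodule inherits mymodule::params {
  file {
    # The default body takes all of its attributes from the hash.
    default:
      * => $mymodule::params::file_defaults,
    ;
    '/etc/myconfig':
      ensure => file,
    ;
  }
}
```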
Domain-specific language
From Wikipedia, the free encyclopedia
Overview
A domain-specific language is created specifically to solve problems in a particular domain and is
not intended to be able to solve problems outside of it (although that may be technically
possible). In contrast, general-purpose languages are created to solve problems in many
domains. The domain can also be a business area. Some examples of business areas include:
Usage patterns
There are several usage patterns for domain-specific languages: [2][3]
- Processing with standalone tools, invoked via direct user operation, often on the
  command line or from a Makefile (e.g., grep for regular expression matching, sed,
  lex, yacc, the GraphViz toolset, etc.)
- Domain-specific languages which are implemented using programming language
  macro systems, and which are converted or expanded into a host general purpose
  language at compile-time or realtime
- Embedded domain-specific language (eDSL),[4] implemented as libraries which
  exploit the syntax of their host general purpose language or a subset thereof while
  adding domain-specific language elements (data types, routines, methods, macros
  etc.). (e.g. jQuery, React, Embedded SQL, LINQ)
- Domain-specific languages which are called (at runtime) from programs written in
  general purpose languages like C or Perl, to perform a specific function, often
  returning the results of operation to the "host" programming language for further
  processing; generally, an interpreter or virtual machine for the domain-specific
  language is embedded into the host application (e.g. format strings, a regular
  expression engine)
- Domain-specific languages which are embedded into user applications (e.g., macro
  languages within spreadsheets) and which are (1) used to execute code that is
  written by users of the application, (2) dynamically generated by the application, or
  (3) both.
Many domain-specific languages can be used in more than one way. DSL code
embedded in a host language may have special syntax support, such as regexes in sed, AWK,
Perl or JavaScript, or may be passed as strings.
Design goals
Adopting a domain-specific language approach to software engineering involves both risks and
opportunities. The well-designed domain-specific language manages to find the proper balance
between these.
Domain-specific languages have important design goals that contrast with those of general-
purpose languages:
Examples
Examples of domain-specific languages include HTML, Logo for pencil-like
drawing, Verilog and VHDL hardware description languages, MATLAB and GNU Octave for
matrix programming, Mathematica, Maple and Maxima for symbolic mathematics, Specification
and Description Language for reactive and distributed systems, spreadsheet formulas and
macros, SQL for relational database queries, YACC grammars for creating parsers, regular
expressions for specifying lexers, the Generic Eclipse Modeling System for creating diagramming
languages, Csound for sound and music synthesis, the input languages
of GraphViz and GrGen (software packages used for graph layout and graph rewriting), and the
HashiCorp Configuration Language used for Terraform and other HashiCorp tools. Puppet also has its
own configuration language.
GameMaker Language
The GML scripting language used by GameMaker Studio is a domain-specific language targeted
at novice programmers, with the aim of making programming easy to learn. While the language serves as a
blend of multiple languages including Delphi, C++, and BASIC, there is a lack of structures, data
types, and other features of a full-fledged programming language. Many of the built-in functions
are sandboxed for the purpose of easy portability. The language primarily serves to make it easy
for anyone to pick up the language and develop a game.
FilterMeister
FilterMeister is a programming environment, with a programming language that is based on C,
for the specific purpose of creating Photoshop-compatible image processing filter plug-ins;
FilterMeister runs as a Photoshop plug-in itself and it can load and execute scripts or compile
and export them as independent plug-ins. Although the FilterMeister language reproduces a
significant portion of the C language and function library, it contains only those features which
can be used within the context of Photoshop plug-ins and adds a number of specific features
only useful in this specific domain.
MediaWiki templates
The Template feature of MediaWiki is an embedded domain-specific language whose
fundamental purpose is to support the creation of page templates and the transclusion (inclusion
by reference) of MediaWiki pages into other MediaWiki pages.
Gherkin
Gherkin is a language designed to define test cases to check the behavior of software, without
specifying how that behavior is implemented. It is meant to be read and used by non-technical
users using a natural language syntax and a line-oriented design. The tests defined with Gherkin
must then be implemented in a general programming language. Then, the steps in a Gherkin
program act as a syntax for method invocation accessible to non-developers.
Other examples
Other prominent examples of domain-specific languages include:
Emacs Lisp
Game Description Language
OpenGL Shading Language
Gradle
ActionScript
Declarative code
The Puppet Domain Specific Language (DSL) is a declarative language, as opposed to the
imperative or procedural languages that system administrators tend to be most comfortable
and familiar with.
In theory, a declarative language is ideal for configuration baselining tasks. With the Puppet
DSL, we describe the desired state of our systems, and Puppet handles all responsibility for
making sure the system conforms to this desired state. Unfortunately, most of us are used to a
procedural approach to system administration. The vast majority of the bad Puppet code I’ve
seen has been the result of trying to write procedural code in Puppet, rather than adapting
existing procedures to Puppet’s declarative model.
Procedural code with Puppet
In some cases, writing procedural code in Puppet is unavoidable. However, such code is
rarely elegant, often creates unexpected bugs, and can be difficult to maintain. We will see
practical examples and best practices for writing procedural code when we look at the exec
resource type in Chapter 5.
Of course, it’s easy to simply say “be declarative.” In the real world, we are often tasked to
deploy software that isn’t designed for a declarative installation process. A large part of this
book will attempt to address how to handle many uncommon tasks in a declarative way. As a
general rule, if your infrastructure is based around packaged open source software, writing
declarative Puppet code will be relatively straightforward. Puppet’s built-in types and
providers will provide a declarative way to handle most of your operational tasks. If your
infrastructure includes Windows clients and a lot of enterprise software, writing declarative
Puppet code may be significantly more challenging.
Another major challenge system administrators face when working within the constraints of a
declarative model is that we tend to operate using an imperative workflow. How often have
you manipulated files using regular expression substitution? How often do we massage data
using a series of temp files and piped commands? While Puppet offers many ways to
accomplish the same tasks, most of our procedural approaches do not map well into Puppet’s
declarative language. We will explore some examples of this common problem and discuss
alternative approaches to solving it.
What is declarative code anyway?
As mentioned earlier, declarative code tends not to have verbs. We don’t create users and we
don’t remove them; we ensure that the users are present or absent. We don’t install or remove
software; we ensure that software is present or absent. Where create and install are verbs,
present and absent are adjectives. The difference seems trivial at first, but proves to be very
important in practice.
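To make the adjective idea concrete, here is a small sketch. The user and package names are hypothetical; the point is that each resource states an end state rather than an action.

```puppet
# "Ensure alice is present", not "create alice".
user { 'alice':
  ensure => present,
}

# "Ensure telnet is absent", not "remove telnet".
package { 'telnet':
  ensure => absent,
}
```

Applying this repeatedly leaves the system unchanged once it already matches the declared state.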
Imagine that I’m giving you directions to the Palace of Fine Arts in San Francisco.
Procedural instructions:
Head North on 19th Avenue
Get on US-101S
There are no road closures or other traffic disruptions that would force you to a different
route.
The directions are valid even if you’re currently at the Palace of Fine Arts
These instructions empower you to route around road closures and traffic
The declarative approach allows you to choose the best way to reach the destination based on
your current situation, and it relies on your ability to find your way to the destination given.
Declarative languages aren’t magic. Much like an address relies on your understanding how
to read a map or use a GPS device, Puppet’s declarative model relies on its own procedural
code to turn your declarative request into a set of instructions that can achieve the declared
state. Puppet’s model uses a Resource type to model an object, and a provider implementing
procedural code to produce the state the model describes.
The major limitation imposed by Puppet’s declarative model might be somewhat obvious. If
a native resource type doesn’t exist for the resource you’re trying to model, you can’t manage
that resource in a declarative way. Declaring that I want a red two-story house with 4
bedrooms might empower you to build the house out of straw or wood or brick, but it
probably won’t actually accomplish anything if you don’t happen to be a general contractor.
There is some good news on this front, however. Puppet already includes native types and
providers for most common objects, the Puppet community has supplied additional native
models, and if you absolutely have to accomplish something procedurally you can almost
always fall back to the exec resource type.
A practical example
Let’s examine a practical example of procedural code for user management. We will discuss
how the code can be made robust, its declarative equivalent in Puppet, and the
benefits of using Puppet rather than a shell script for this task.
groupadd examplegroup
useradd -g examplegroup alice
mkdir ~alice/.ssh/
echo "ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAIEAm3TAgMF/2RY+r7KIeUoNbQb1TP6ApOtgJPNV\
0TY6teCjbxm7fjzBxDrHXBS1vr+fe6xa67G5ef4sRLl0kkTZisnIguXqXOaeQTJ4Idy4LZEVVbngkd\
2R9rA0vQ7Qx/XrZ0hgGpBA99AkxEnMSuFrD/E5TunvRHIczaI9Hy0IMXc= \
alice@localhost" >> ~alice/.ssh/authorized_keys
What if we decide this user should also be a member of the wheel group?
Example 1-2. Imperative user modification with BASH
useradd -g examplegroup alice
usermod -a -G wheel alice
And if we later need to remove the user and the group:
userdel alice
groupdel examplegroup
Notice a few things about these examples:
The correct process to use depends on the current state of the user.
Each of these processes will produce errors if invoked more than one time.
Imagine for a second that we have several systems. On some systems, alice is absent.
On other systems, alice is present, but not a member of the wheel group. On some
systems, alice is present and a member of the wheel group. Imagine that we need to write a
script to ensure that alice exists, is a member of the wheel group on every system, and
has the correct authorized key. What would such a script look like?
Example 1-4. Robust user management with BASH
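#!/bin/bash
# Create the group only if it does not exist yet.
if ! getent group examplegroup > /dev/null; then
  groupadd examplegroup
fi
# Create the user only if it does not exist yet.
if ! getent passwd alice > /dev/null; then
  useradd -g examplegroup -G wheel alice
fi
# Add alice to the wheel group if she is not already a member.
if ! id -nG alice | grep -qw wheel; then
  usermod -a -G wheel alice
fi
# Create the .ssh directory if it is missing.
if ! test -d ~alice/.ssh; then
  mkdir -p ~alice/.ssh
fi
# Append the key only if it is not already authorized.
if ! grep -q 'alice@localhost' ~alice/.ssh/authorized_keys 2> /dev/null; then
  echo "ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAIEAm3TAgMF/2RY+r7KIeUoNbQb1TP6ApOtg\
JPNV0TY6teCjbxm7fjzBxDrHXBS1vr+fe6xa67G5ef4sRLl0kkTZisnIguXqXOaeQTJ4Idy4LZEVVb\
ngkd2R9rA0vQ7Qx/XrZ0hgGpBA99AkxEnMSuFrD/E5TunvRHIczaI9Hy0IMXc= \
alice@localhost" >> ~alice/.ssh/authorized_keys
fi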
Of course, this example only covers the use case of creating and managing a few basic
properties about a user. If our policy changed, we would need to write a completely different
script to manage this user. Even fairly simple changes, such as revoking this user’s wheel
access, could require somewhat significant changes to this script.
This approach has one other major disadvantage: it will only work on platforms that
implement the same commands and arguments as our reference platform. This example will
fail on FreeBSD (which implements adduser, not useradd), Mac OS X, and Windows.
Declarative code
Let’s look at our user management example using Puppet’s declarative DSL.
Creating a user and group:
Example 1-5. Declarative user creation with Puppet
$ssh_key = "AAAAB3NzaC1yc2EAAAABIwAAAIEAm3TAgMF/2RY+r7KIeUoNbQb1TP6ApOtgJPNV0T\
Y6teCjbxm7fjzBxDrHXBS1vr+fe6xa67G5ef4sRLl0kkTZisnIguXqXOaeQTJ4Idy4LZEVVbngkd2R\
9rA0vQ7Qx/XrZ0hgGpBA99AkxEnMSuFrD/E5TunvRHIczaI9Hy0IMXc="
group { 'examplegroup':
  ensure => 'present',
}
user { 'alice':
  ensure => 'present',
  gid    => 'examplegroup',
}
ssh_authorized_key { 'alice@localhost':
  ensure => 'present',
  user   => 'alice',
  type   => 'ssh-rsa',
  key    => $ssh_key,
}
Add alice to the wheel group:
Example 1-6. Declarative user modification with Puppet
$ssh_key = "AAAAB3NzaC1yc2EAAAABIwAAAIEAm3TAgMF/2RY+r7KIeUoNbQb1TP6ApOtgJPNV0T\
Y6teCjbxm7fjzBxDrHXBS1vr+fe6xa67G5ef4sRLl0kkTZisnIguXqXOaeQTJ4Idy4LZEVVbngkd2R\
9rA0vQ7Qx/XrZ0hgGpBA99AkxEnMSuFrD/E5TunvRHIczaI9Hy0IMXc="
group { 'examplegroup':
  ensure => 'present',
}
user { 'alice':
  ensure => 'present',
  gid    => 'examplegroup',
  groups => 'wheel', # (1)
}
ssh_authorized_key { 'alice@localhost':
  ensure => 'present',
  user   => 'alice',
  type   => 'ssh-rsa',
  key    => $ssh_key,
}
(1) Note that the only change between this example and the previous example is the addition
of the groups parameter for the alice resource.
Remove alice:
Example 1-7. Ensure that a user is absent using Puppet
$ssh_key = "AAAAB3NzaC1yc2EAAAABIwAAAIEAm3TAgMF/2RY+r7KIeUoNbQb1TP6ApOtgJPNV0T\
Y6teCjbxm7fjzBxDrHXBS1vr+fe6xa67G5ef4sRLl0kkTZisnIguXqXOaeQTJ4Idy4LZEVVbngkd2R\
9rA0vQ7Qx/XrZ0hgGpBA99AkxEnMSuFrD/E5TunvRHIczaI9Hy0IMXc="
group { 'examplegroup':
  ensure => 'absent', # (1)
}
user { 'alice':
  ensure => 'absent', # (2)
  gid    => 'examplegroup',
  groups => 'wheel',
}
ssh_authorized_key { 'alice@localhost':
  ensure => 'absent', # (3)
  user   => 'alice',
  type   => 'ssh-rsa',
  key    => $ssh_key,
}
Ssh_authorized_key['alice@localhost'] -> # (4)
User['alice'] -> # (5)
Group['examplegroup']
(1), (2), (3) Ensure values are changed from present to absent.
(4), (5) Resource ordering is added to ensure groups are removed after users. Normally, the
correct order is implied due to the Autorequire feature discussed in Chapter 5.
You may notice the addition of resource ordering in this example when it wasn’t required in previous
examples. This is a byproduct of Puppet’s Autorequire feature. PUP-2451 explains the issue in greater depth.
In practice, rather than managing alice as 3 individual resources, we would abstract this into a defined type
that has its own ensure parameter, and conditional logic to enforce the correct resource dependency ordering.
In this example, we are able to remove the user by changing the ensure state from present to
absent on the user’s resources. Although we could remove other parameters such as gid,
groups, and the user’s key, in most cases it’s better to simply leave the values in place, just in
case we ever decide to restore this user.
It’s usually best to disable accounts rather than remove them. This helps preserve file ownership information
and helps avoid UID reuse.
In our procedural examples, we saw a script that would bring several divergent systems into
conformity. For each step of that example script, we had to analyze the current state of the
system, and perform an action based on state. With a declarative model, all of that work is
abstracted away. If we wanted to have a user who was a member of 2 groups, we would
simply declare that user as such, as in Example 1-6.
$app_source = 'http://www.example.com/application.tar.gz'
$app_target = '/tmp/application.tar.gz'
2. This example does not validate the checksum of the downloaded file, which could
produce some odd results upon extraction. An additional exec resource might be used to
test and correct for this case automatically.
3. In some cases, a partial or corrupted download may wedge this process. We attempt to
work around this problem by overwriting the archive each time it’s downloaded.
4. This example makes several assumptions about the contents of application.tar.gz. If any
of those assumptions are wrong, these commands will repeat every time Puppet is
invoked.
5. This example is not particularly portable, and would require a platform-specific
implementation for each supported OS.
6. This example would not be particularly useful for upgrading the application.
This is a relatively clean example of non-declarative Puppet code, and tends to be seen when
working with software that is not available in a native packaging format. Had this application
been distributed as an RPM, dpkg, or MSI, we could have simply used a package resource for
improved portability, flexibility, and reporting. While this example is not best practice, there
are situations where it is unavoidable, often for business or support reasons.
Another common pattern is the use of conditional logic and custom facts to test for the
presence of software. Please don’t do this unless it’s absolutely unavoidable:
Facter.add(:example_app_version) do
  setcode do
    Facter::Core::Execution.exec('/usr/local/app/example_app --version')
  end
end
$app_source = 'http://www.example.com/app-1.2.3.tar.gz'
$app_target = '/tmp/app-1.2.3.tar.gz'
if $example_app_version != '1.2.3' {
  # The download and extraction resources are declared here.
}
This particular example has many of the same problems as the previous example, and
introduces one new problem: it breaks Puppet’s reporting and auditing model. The
conditional logic in this example causes the download and extraction resources not to appear
in the catalog sent to the client following initial installation. We won’t be able to audit our
run reports to see whether or not the download and extraction commands are in a consistent
state. Of course, we could check the example_app_version fact if it happens to be
available, but this approach becomes increasingly useless as more resources are embedded in
conditional logic.
This approach is also sensitive to Facter and pluginsync related issues, and would definitely
produce some unwanted results with cached catalogs.
Using facts to exclude parts of the catalog does have one benefit: it can be used to obfuscate
parts of the catalog so that sensitive resources do not exist in future Puppet runs. This can be
handy if, for example, your wget command embeds a passphrase, and you wish to limit how
often it appears in your catalogs and reports. Obviously, there are better solutions to that
particular problem, but in some cases there is also benefit to security in depth.
Idempotency
In computer science, an idempotent function is a function that will return the same value each
time it’s called, whether it’s only called once, or called 100 times. For example: X = 1 is an
idempotent operation. X = X + 1 is a non-idempotent, recursive operation.
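As a minimal sketch of the distinction, assuming nothing more than a shell variable named X and two hypothetical helper functions, the assignment can be repeated freely while the increment cannot:

```shell
#!/bin/sh
# Hypothetical helpers for illustration; X is an ordinary shell variable.

# Idempotent: the result is the same however many times it runs.
set_x() { X=1; }

# Not idempotent: every call changes the result.
increment_x() { X=$((X + 1)); }

X=0
set_x; set_x; set_x
AFTER_SET=$X
echo "after three assignments: X=$AFTER_SET"

X=0
increment_x; increment_x; increment_x
AFTER_INC=$X
echo "after three increments: X=$AFTER_INC"
```

Repeating set_x leaves X at 1, whereas the value left by increment_x depends on how many times it has already run.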
Puppet as a language is designed to be inherently idempotent. As a system, Puppet is designed
to be used in an idempotent way. A large part of this idempotency is owed to its declarative
resource management model; however, Puppet also enforces a number of rules on its variable
handling, iterators, and conditional logic to maintain its idempotency.
For example, if for some reason Puppet fails part way through a configuration run,
re-invoking Puppet will complete the run and repair any configurations that were left in an
inconsistent state by the previous run.
Convergence vs Idempotence
With an eventually convergent system, the configuration management tool is invoked over
and over; each time the tool is invoked, the system approaches a converged state, where all
changes defined in the configuration language have been applied, and no more changes can
take place. During the process of convergence, the system is said to be in a partially
converged, or inconsistent state.
Convergence of course also implies the existence of a diverged state. Divergence is the act of
moving the system away from the desired converged state. This typically happens when
someone attempts to manually alter a resource that is under configuration management
control.
There are many practices that can break Puppet’s idempotence model. In most cases,
breaking Puppet’s idempotence model would be considered a bug, and would be against best
practices. There are however some cases where a level of eventual convergence is
unavoidable. One such example is handling the numerous post-installation software reboots
that are common when managing Windows nodes.
Side effects
In computer science, a side effect is a change of system or program state that is outside the
defined scope of the original operation. Declarative and idempotent languages usually
attempt to manage, reduce, and eliminate side effects. With that said, it is entirely possible for
an idempotent operation to have side effects.
Puppet attempts to limit side effects, but does not eliminate them by any means; doing so
would be nearly impossible given Puppet’s role as a system management tool.
Some side effects are designed into the system. For example, every resource will generate a
notification upon changing a resource state that may be consumed by other resources. The
notification is used to restart services in order to ensure that the running state of the system
reflects the configured state. File bucketing is another obvious intended side effect designed
into Puppet.
Some side effects are unavoidable. Every access to a file on disk will cause that file’s atime
to be updated unless the entire filesystem is mounted with the noatime option. This is
of course true whether or not Puppet is being invoked in noop mode.
useradd alice
The following code is not idempotent, because it will add undesirable duplicate host entries
each time it’s invoked:
Example 1-9. A non-idempotent operation that will create duplicate records
The following code is idempotent, but will probably have undesirable results:
Example 1-10. An idempotent operation that will destroy data
To make our example idempotent without clobbering /etc/hosts, we can add a simple check
before modifying the file:
Example 1-11. An imperative idempotent operation
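The three shell operations described above (the duplicating append, the destructive overwrite, and the guarded append) can be sketched as follows. This is an illustration only: a scratch file created with mktemp stands in for /etc/hosts so the commands are safe to run, and the function names are hypothetical.

```shell
#!/bin/sh
# Assumption: a scratch file stands in for /etc/hosts.
HOSTS_FILE=$(mktemp)
ENTRY='127.0.0.1 example.localdomain'
printf '127.0.0.1 localhost\n' > "$HOSTS_FILE"

# Not idempotent: every invocation appends another copy of the entry.
append_entry() { echo "$ENTRY" >> "$HOSTS_FILE"; }

# Idempotent but destructive: the file always ends up with the same
# content, and every other entry is lost.
overwrite_file() { echo "$ENTRY" > "$HOSTS_FILE"; }

# Idempotent and non-destructive: append only when the exact line is missing.
ensure_entry() {
  grep -qxF "$ENTRY" "$HOSTS_FILE" || echo "$ENTRY" >> "$HOSTS_FILE"
}

count_entries() { grep -cxF "$ENTRY" "$HOSTS_FILE"; }
count_lines() { grep -c . "$HOSTS_FILE"; }

append_entry; append_entry
DUPLICATES=$(count_entries)

printf '127.0.0.1 localhost\n' > "$HOSTS_FILE"   # reset the scratch file
ensure_entry; ensure_entry
GUARDED=$(count_entries)
GUARDED_LINES=$(count_lines)

overwrite_file; overwrite_file
OVERWRITTEN_LINES=$(count_lines)

echo "append twice: $DUPLICATES copies of the entry"
echo "guarded append twice: $GUARDED copy, $GUARDED_LINES lines in the file"
echo "overwrite twice: $OVERWRITTEN_LINES line left in the file"
rm -f "$HOSTS_FILE"
```

Only the guarded version is both repeatable and safe: it leaves the existing localhost line alone and never adds a second copy of the entry.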
The same entry can be declared with Puppet’s native host resource type:
host { 'example.localdomain':
  ip => '127.0.0.1',
}
Alternatively, we could implement this example using the file_line resource type from the
optional stdlib Puppet module:
Example 1-13. Idempotent host entry using the File_line resource type
In both cases, the resource is modeled in a declarative way and is idempotent by its very
nature. Under the hood, Puppet handles the complexity of determining whether the line
already exists, and how it should be inserted into the underlying file. Using the native host
resource type, Puppet also determines what file should be modified and where that file is
located.
The idempotent examples are safe to run as many times as you like. This is a huge benefit
across large environments; when trying to apply a change to thousands of hosts, it’s relatively
common for failures to occur on a small subset of the hosts being managed. Perhaps the host
is down during deployment? Perhaps you experienced some sort of transmission loss or
timeout when deploying a change? If you are using an idempotent language or process to
manage your systems, it’s possible to handle these exceptional cases simply by performing a
second configuration run against the affected hosts (or even against the entire infrastructure).
When working with native resource types, you typically don’t have to worry about
idempotence; most resources handle idempotence natively. A couple of notable exceptions to
this statement are the exec and augeas resource types. We’ll explore those in depth in
Chapter 5.
Puppet does, however, attempt to track whether or not a resource has changed state. This is
used as part of Puppet’s reporting mechanism and to determine whether or not a signal
should be sent to resources with a notify relationship. Because Puppet tracks whether or not a
resource has made a change, it’s entirely possible to write code that is functionally
idempotent, without meeting the criteria of idempotence from Puppet’s resource model.
For example, the following code is functionally idempotent, but will report as having
changed state with every Puppet run.
Example 1-14. Puppet code that will report as non-idempotent
Puppet’s idempotence model relies on a special aspect of its resource model. For every
resource, Puppet first determines that resource’s current state. If the current state does not
match the defined state of that resource, Puppet invokes the appropriate methods on the
resource’s native provider to bring the resource into conformity with the desired state. In most
cases, this is handled transparently; however, there are a few exceptions that we will discuss
in their respective chapters. Understanding these cases will be critical in order to avoid
breaking Puppet’s simulation and reporting models.
path =>'/bin',
In this case, unless provides a condition Puppet can use to determine whether or not a change
actually needs to take place.
Using conditions such as unless and onlyif properly will help produce safe and robust exec resources. We will
explore this in depth in Chapter 5.
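The idea behind such a guard can be imitated in plain shell. The sketch below is a loose analogue, not Puppet’s implementation: run_unless is a hypothetical helper that runs a command only when a check command fails, and reports whether it changed anything.

```shell
#!/bin/sh
# run_unless CHECK COMMAND (hypothetical helper):
# run COMMAND only when CHECK exits non-zero, and report the outcome.
run_unless() {
  check=$1
  cmd=$2
  if sh -c "$check" > /dev/null 2>&1; then
    echo "unchanged"
  else
    sh -c "$cmd" && echo "changed"
  fi
}

# Assumption: a scratch directory keeps the sketch safe to run.
WORKDIR=$(mktemp -d)
TARGET="$WORKDIR/example"

FIRST=$(run_unless "test -d '$TARGET'" "mkdir '$TARGET'")
SECOND=$(run_unless "test -d '$TARGET'" "mkdir '$TARGET'")

echo "first run: $FIRST, second run: $SECOND"
rm -rf "$WORKDIR"
```

The first run creates the directory and reports a change; the second run finds the check already satisfied and does nothing, which is the behavior a well-chosen guard gives an exec resource.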
A final surprising example is the notify resource, which is often used to produce debugging
information and log entries.
Example 1-16. The Notify resource type
notify { 'example':
  message => 'Danger, Will Robinson!'
}
The notify resource generates an alert every time it’s invoked, and will always report as a
change in system state.
Run level idempotence is a place where Puppet’s model of change becomes just as important
as whether or not the resources are functionally idempotent. Remember that before
performing any configuration change, Puppet will first determine whether or not the resource
currently conforms to policy. Puppet will only make a change if resources are in an
inconsistent state. The practical implication is that if Puppet does not report having made any
changes, you can trust this is actually the case.
In practice, determining whether or not your Puppet runs are truly idempotent is fairly
simple: If Puppet reports no changes upon its second invocation on a fresh system, your
Puppet codebase is idempotent.
Because Puppet’s resources tend to have side effects, it’s easy to break
Puppet’s idempotence model if we don’t carefully handle resource dependencies.
Example 1-17. Ordering is critical for run-level idempotence
package { 'httpd':
  ensure => 'installed',
}
file { '/etc/httpd/conf/httpd.conf':
  ensure => 'file',
}
Package['httpd'] ->
File['/etc/httpd/conf/httpd.conf']
The file resource will not create paths recursively. In Example 1-17, the httpd package must
be installed before the httpd.conf file resource is enforced, because the file resource depends
on the existence of the /etc/httpd/conf/ directory, which is only present after the httpd package
has been installed. If this dependency is not managed, the file resource becomes non-idempotent; upon
first invocation of Puppet it may throw an error, and only enforce the state of httpd.conf upon
subsequent invocations of Puppet.
Such issues will render Puppet convergent rather than idempotent. Because Puppet typically runs on a
30-minute interval, convergent infrastructures can take a very long time to reach a converged state.
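The effect of an unmanaged dependency can be imitated in shell. In this sketch, which is an illustration under stated assumptions rather than a model of Puppet itself, install_package and write_config are hypothetical stand-ins: the first creates the directory a real package would ship, and the second fails when that directory is missing.

```shell
#!/bin/sh
# Hypothetical stand-ins for the package and file resources.
install_package() { mkdir -p "$1/conf"; }
write_config() {
  [ -d "$1/conf" ] || return 1          # no recursive path creation
  echo 'ServerName example' > "$1/conf/httpd.conf"
}

# One "run": apply the steps in the given order and count the failures.
run_steps() {
  root=$1; shift
  failures=0
  for step in "$@"; do
    "$step" "$root" || failures=$((failures + 1))
  done
  echo "$failures"
}

# Wrong order: the file step runs before the package step.
ROOT_A=$(mktemp -d)
UNORDERED_RUN1=$(run_steps "$ROOT_A" write_config install_package)
UNORDERED_RUN2=$(run_steps "$ROOT_A" write_config install_package)

# Correct order: a single run is enough.
ROOT_B=$(mktemp -d)
ORDERED_RUN1=$(run_steps "$ROOT_B" install_package write_config)

echo "unordered: $UNORDERED_RUN1 failure(s) on the first run, $UNORDERED_RUN2 on the second"
echo "ordered: $ORDERED_RUN1 failure(s) on the first run"
rm -rf "$ROOT_A" "$ROOT_B"
```

With the wrong order the system only reaches its desired state on the second run, which is what convergence looks like in practice; with the dependency respected, one run suffices.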
There are a few other issues that can render Puppet non-idempotent.
Non-deterministic code
As a general rule, the Puppet DSL is deterministic, meaning that a given set of inputs
(manifests, facts, exported resources, etc) will always produce the same output with no
variance.
$example = {
Another common cause of non-deterministic code pops up when our code is dependent on a
transient state.
file { '/tmp/example.txt': content => "${::servername}\n" }
Example 1-18 will not be idempotent if you have a load balanced cluster of Puppet Masters.
The value of $::servername changes depending on which master compiles the catalog for a
particular run.
With non-deterministic code, Puppet loses run-level idempotence. For each invocation of
Puppet, some resources will change shape. Puppet will converge, but it will always report
your systems as having been brought into conformity with its policy, rather than being
conformant. As a result, it’s virtually impossible to determine whether or not changes are
actually pending for a host. It’s also more difficult to track what changes were made to the
configuration, and when they were made.
Non-deterministic code also has the side effect that it can cause services to restart due to
Puppet’s notify behavior. This can cause unintended service disruption.
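A small shell sketch can make the reporting problem concrete. It assumes a hypothetical apply_content helper that rewrites a file only when its content differs, and hypothetical master host names that alternate between runs, as they might behind a load balancer:

```shell
#!/bin/sh
# apply_content FILE CONTENT (hypothetical helper): rewrite FILE only when
# its content differs, and report whether a change was made.
apply_content() {
  if [ -f "$1" ] && [ "$(cat "$1")" = "$2" ]; then
    echo "unchanged"
  else
    printf '%s\n' "$2" > "$1"
    echo "changed"
  fi
}

WORKDIR=$(mktemp -d)

# Deterministic input: the same content on every run.
STABLE_1=$(apply_content "$WORKDIR/stable.txt" "master.example.com")
STABLE_2=$(apply_content "$WORKDIR/stable.txt" "master.example.com")

# Transient input: the content depends on which master answered.
FLAP_1=$(apply_content "$WORKDIR/flap.txt" "master1.example.com")
FLAP_2=$(apply_content "$WORKDIR/flap.txt" "master2.example.com")
FLAP_3=$(apply_content "$WORKDIR/flap.txt" "master1.example.com")

echo "stable input:   $STABLE_1 $STABLE_2"
echo "flapping input: $FLAP_1 $FLAP_2 $FLAP_3"
rm -rf "$WORKDIR"
```

With stable input the second run reports no change, so a quiet report can be trusted. With flapping input every run reports a change, and a genuine pending change becomes indistinguishable from noise.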
Stateless
Puppet’s client / server API is stateless, and with a few major (but optional) exceptions,
catalog compilation is a completely stateless process.
A stateless system is a system that does not preserve state between requests; each request is
completely independent from previous requests, and the compiler does not need to consult
data from previous requests in order to produce a new catalog for a node.
Puppet uses a RESTful API over HTTPS for client server communications.
With master/agent Puppet, the Puppetmaster need only have a copy of the facts supplied by
the agent in order to compile a catalog. Natively, Puppet doesn’t care whether or not this is
the first time it has generated a catalog for this particular node, nor whether or not the last run
was successful, or if any change occurred on the client node during the last run. The node’s
catalog is compiled in its entirety every time the node issues a catalog request. The
responsibility for modeling the current state of the system then rests entirely on the client, as
implemented by the native resource providers.
If you don’t use a puppetmaster or have a small site with a single master, statelessness may
not be a huge benefit to you. For medium to large sites, however, keeping Puppet stateless is
tremendously useful. In a stateless system, all Puppetmasters are equal. There is no need to
synchronize data or resolve conflicts between masters. There is no locking to worry about.
There is no need to design a partition-tolerant system in case you lose a datacenter or data
link, and no need to worry about clustering strategies. Load can easily be distributed across a
pool of masters using a load balancer or DNS SRV record, and fault tolerance is as simple as
ensuring nodes avoid failed masters.
It is entirely possible to submit state to the master using custom facts or other techniques. It’s
also entirely possible to compile a catalog conditionally based on that state. There are cases
where security requirements or particularly idiosyncratic software will necessitate such an
approach. Of course, this approach is most often used when attempting to write
non-declarative code in Puppet’s DSL. Fortunately, even in these situations, the server doesn’t
have to actually store the node’s state between runs; the client simply re-submits its state as
part of its catalog request.
If you keep your code declarative, it’s very easy to work with Puppet’s stateless client/server
configuration model. If a manifest declares that a resource such as a user should exist, the
compiler doesn’t have to be concerned with the current state of that resource when compiling
a catalog. The catalog simply has to declare a desired state, and the Puppet agent simply has
to enforce that state.
Puppet’s stateless model has several major advantages over a stateful model:
Puppet scales horizontally
It is worth noting that there are a few stateful features of Puppet. It’s important to weigh the
value of these features against the cost of making your Puppet infrastructure stateful, and to
design your infrastructure to provide an acceptable level of availability and fault tolerance.
We will discuss how to approach each of these technologies in upcoming chapters, but a
quick overview is provided here.
Sources of state
In the beginning of this section, I mentioned that there are a few features and design patterns
that can impose state on Puppet catalog compilation. Let’s look at some of these features in a
bit more depth.
Filebucketing
Filebucketing is an interesting and perhaps underappreciated feature of the File resource type.
If a filebucket is configured, the file provider will create a backup copy of any file before
overwriting the original file on disk. The backup may be bucketed locally, or it can be
submitted to the Puppetmaster.
Bucketing your files is useful for keeping backups, auditing, reporting, and disaster recovery.
It’s immensely useful if you happen to blast away a configuration you needed to keep, or if
you discover a bug and would like to see how the file was changed. The Puppet Enterprise
console can use filebucketing to display the contents of managed files.
Filebuckets can also be used for content distribution; however, using a filebucket this way
creates state. Files are only present in a bucket when placed there, either as a backup from a
previous run, or by the static_compiler terminus. Placing a file in the bucket only happens
during a Puppet run, and Puppet has no internal facility to synchronize buckets between
masters. Reliance upon file buckets for content distribution can create problems if not applied
cautiously, for example when migrating hosts between datacenters or when
rebuilding masters. Use of filebucketing in your modules can also create problems during
local testing with puppet apply.
Exported resources
Exported resources provide a simple service discovery mechanism for Puppet. When a
puppetmaster or agent compiles a catalog, resources can be marked as exported by the
compiler. Once the resources are marked as exported, they are recorded in a SQL database.
Other nodes may then collect the exported resources, and apply those resources locally.
Exported resources persist until they are overwritten or purged.
As you might imagine, exported resources are, by definition, stateful and will affect your
catalog if used.
We will take an in-depth look at PuppetDB and exported resources in Chapter 2. For the time
being, just be aware that exported resources introduce a source of state into your
infrastructure.
In this example, a pool of webservers export their pool membership information to a haproxy
load balancer, using the puppetlabs/haproxy module and exported resources.
Example 1-19. Declaring state with an exported resource
include haproxy
haproxy::listen { 'web':
  ipaddress => $::ipaddress,
  ports     => '80',
}
Haproxy::Balancermember <<| listening_service == 'web' |>>
This particular example is a relatively safe use of exported resources; if PuppetDB for some reason became
unavailable the pool would continue to work; new nodes would not be added to the pool until PuppetDB was
restored. TODO: Validate what I just said is true given the internal use of concat on this module…
Exported resources rely on PuppetDB, and are typically stored in a PostgreSQL database.
While the PuppetDB service is fault tolerant and can scale horizontally, PostgreSQL itself
scales vertically and introduces a potential single point of failure into the infrastructure.
Hiera
Hiera is by design a pluggable system. By default it provides JSON and YAML backends,
both of which are completely stateless. However, it is possible to attach Hiera to a database or
inventory service, including PuppetDB. If you use this approach, it can introduce a source of
state into your Puppet infrastructure. We will explore Hiera in depth in Chapter 6.
Inventory and reporting
There are plugins to Puppet that allow inventory information to be used during catalog
compilation, however these are not core to Puppet.
Custom facts
Facts themselves do not inherently add state to your Puppet manifests; however, they can be
used to communicate state to the Puppetmaster, which can then be used to compile
conditional catalogs. Using facts in this way does not create the scaling and availability
problems inherent in server-side state, but it does create problems if you intend to use cached
catalogs, and it does reduce the effectiveness of your reporting infrastructure.
Summary
In this chapter, we reviewed the major design features of Puppet’s language, both in terms of
the benefits provided by Puppet’s language, and the restrictions its design places on us.
Future chapters will provide more concrete recommendations for the usage of Puppet’s
language, overall architecture of Puppet, and usage of Puppet’s native types and providers.
Building code that leverages Puppet’s design will be a major driving force behind many of the
considerations in future chapters.
MODULE REVIEW
Important details: