
Let SMF Deal With That:

An introduction to the
Service Management
Framework
Ellard Roush
Sun Microsystems

Agenda

• Introduction to Service Management Framework (SMF)
• Commands Demo
• Service Manifest and Development
• Q&A

Service Management pre-SMF
• Daemons started by scripts delivered into /etc/rc*.d,
or by inetd (through /etc/inetd.conf)
> Dependencies expressed through script numbering
(fragile, imprecise)
• Common operations like stopping a service now &
forever required two different steps
> Easy to forget one
> Often undone by patching or upgrade anyway
• Daemon death ignored after start
• OS didn't know consequences of memory errors in
daemons – had to panic

What is SMF?
• It is a solution to all the problems on the previous slide
• It is half of Predictive Self Healing
> It works with the Solaris Fault Manager to gracefully
recover from uncorrectable hardware errors
• It provides public, documented interfaces that ISVs
and customers can use
• It is used automatically
> No need to turn it on
> No way to turn it off

SMF basics: svc.startd
• A new system daemon, svc.startd, has taken over most
of init's responsibilities in starting system services
• init still uses inittab, and /etc/rc*.d scripts are still run
• svc.startd can automatically restart services
> If sshd is “enabled”, then it is
– started at boot
– restarted if it dies (even if killed)
> If sshd is disabled (a single command), then it is
– stopped
– not started at boot
– not started after patch or upgrade

Service states
• SMF lets the admin set whether each service is
enabled or disabled
• SMF keeps a state for each service
> uninitialized   has not been evaluated yet
> disabled        service is disabled, not running
> offline         enabled, waiting for dependencies
> online          enabled and running
> degraded        running below full performance
> maintenance     service problem occurred
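The states above lend themselves to a quick health check. A minimal sketch, assuming "STATE FMRI" pairs arrive on standard input (for example from svcs with the -H and -o options); the function name problem_services is hypothetical and not part of SMF:

```shell
# Hypothetical helper: read "STATE FMRI" pairs on stdin and print the
# services whose state usually needs attention (offline, degraded,
# maintenance). online, disabled and uninitialized are passed over.
problem_services() {
    while read -r state fmri; do
        case $state in
            offline|degraded|maintenance)
                printf '%s %s\n' "$state" "$fmri"
                ;;
        esac
    done
}

# On a system with SMF this could be fed by:
#   svcs -a -H -o state,fmri | problem_services
```

The filter is kept separate from the svcs call so it can be tried on saved output from another machine.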
Service dependencies
• Services may declare dependencies on each other
• svc.startd starts services in dependency order
> Independent services started in parallel → faster boot
• Uncorrectable hardware errors handled better
> Daemon is restarted
> Services which depend on it can be restarted
• Enabled services hang out in the offline state until their
dependencies are met
> A new command answers “What services is service X
waiting for?”

SMF configuration
• Service meta-configuration (enabled status, state,
dependencies, methods, etc.) is kept in the Service
Configuration Facility (SCF), also known as the SMF
repository
• The repository is controlled by svc.configd, another new
daemon
• The repository is (currently) stored in
/etc/svc/repository.db
[Diagram labels: svc.startd, SMF tools, svc.configd, repository.db]

Service names: FMRIs
• Services are named by Fault Management Resource
Identifiers, or FMRIs
> URI syntax: svc:/system/cron:default
> Here system/cron is the service name and default is the instance name
• Note that while the service name usually contains slashes,
there are no service directories! The namespace is flat.
• Commands accept abbreviations (system/cron, cron)
and glob patterns
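The FMRI layout above can be taken apart with plain shell parameter expansion. A minimal sketch, assuming a full instance FMRI of the form svc:/service:instance; the function names fmri_service and fmri_instance are hypothetical, and abbreviations and glob patterns are not handled:

```shell
# Split a full instance FMRI of the form svc:/<service>:<instance>.
# Sketch only: abbreviated names and glob patterns are not handled.
fmri_service() {
    rest=${1#svc:/}                 # drop the "svc:/" scheme prefix
    printf '%s\n' "${rest%:*}"      # drop the trailing ":<instance>"
}

fmri_instance() {
    printf '%s\n' "${1##*:}"        # keep the text after the last colon
}

fmri_service  svc:/system/cron:default    # prints system/cron
fmri_instance svc:/system/cron:default    # prints default
```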

Service instances
• To allow configuration sharing, services are represented as
instance nodes which are children of service nodes
• Both service nodes and instance nodes can have properties
• If an instance doesn't have property X, the service's
property X is used
• Dependencies on a service are satisfied if any of its instances
are online
> Frees dependents from knowing implementation
[Diagram labels: repository, service, service properties, instance properties]
Commands: svcs(1)
• Without arguments, lists state, state-time, and FMRI of
services that are enabled; with -a, lists all services
• Show dependencies (-d) and dependents (-D)
• Show member processes (-p), additional details (-v/-l)

$ svcs
STATE          STIME    FMRI
....
online         18:18:30 svc:/network/http:apache2
online         18:18:29 svc:/network/smtp:sendmail
....
$ svcs -p sendmail
STATE          STIME    FMRI
online         18:18:29 svc:/network/smtp:sendmail
               18:18:29   100180 sendmail
               18:18:29   100181 sendmail
$ svcs -d sendmail
STATE          STIME    FMRI
online         18:17:44 svc:/system/identity:domain
online         18:17:52 svc:/network/service:default
....

Commands: svcs -x
• Answers the question: What's wrong with my system?
• Explains why services are offline, impact of non-running
services
• Gives pointers to knowledge documents, log files to help
you determine the cause and find a remedy

$ svcs telnet
STATE          STIME    FMRI
offline         7:38:17 svc:/network/telnet:default
$ svcs -x
svc:/network/inetd:default (inetd)
 State: disabled since Wed Jan 25 07:38:17 2006
Reason: Disabled by an administrator.
   See: http://sun.com/msg/SMF-8000-05
   See: inetd(1M)
   See: /var/svc/log/network-inetd:default.log
Impact: 17 dependent services are not running. (Use -v for list.)
Commands: svcadm(1M)
• svcadm manipulates services
> svcadm enable enables services, services start when
dependencies are ready
> svcadm disable disables services
> svcadm restart stops and starts services
> svcadm refresh commits the current properties (to the running
snapshot) and instructs the service to re-read its configuration
> svcadm clear signals that a service in maintenance has been
fixed
• These commands are asynchronous: they issue commands to
svc.startd and return immediately
• With -s, enable & disable wait until completion (synchronous)
• With -t, enable & disable are temporary (until next boot)
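Because svcadm is asynchronous by default, scripts often want the -s form. A minimal sketch, assuming a Solaris system and sufficient privileges; the helper name enable_and_report is hypothetical, and it relies on the svcs -H (no header) and -o state (state column only) options:

```shell
# Hypothetical helper: enable a service, wait for the operation to
# complete (-s), then print the state the service ended up in.
enable_and_report() {
    fmri=$1
    svcadm enable -s "$fmri" || return 1
    svcs -H -o state "$fmri"
}

# Usage on a Solaris system:
#   enable_and_report network/smtp:sendmail
# A disable that lasts only until the next boot would be:
#   svcadm disable -t network/smtp:sendmail
```

Checking the state afterwards matters because a synchronous enable can still leave the service in maintenance if its start method fails.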

Commands: svccfg(1M)

• Interactive access to properties and snapshots

# svccfg
svc:> select network/http:apache2
svc:/network/http:apache2> listprop
...
general framework
general/enabled boolean false
...
start method
start/exec astring "/lib/svc/method/http-apache2 start"
start/timeout_seconds count 60
start/type astring method
svc:/network/http:apache2> editprop
[ $EDITOR launches on a temporary file containing property settings ]
svc:/network/http:apache2> exit
# svcadm refresh apache2 # read latest configuration
# svcadm restart apache2 # restart with latest configuration

Commands Demo

Troubleshooting
• Service failures printed to console, syslog
• Start with svcs -x output
> Often gives concise reason
> Provides link to knowledge document at sun.com
> Gives path to log file
• Use svcadm clear to clear maintenance state from repaired
services
• Use svccfg to tweak debugging variables:
> svccfg -s system/foo setenv LD_PRELOAD libumem.so
> svccfg -s system/foo setenv UMEM_DEBUG default

Recovery
• If a single service is broken, make sure you've got the latest
service config: svcadm refresh <fmri>
• Follow instructions from svcs -x pointer
• Revert to a previous snapshot.
$ svccfg -s system/cron:default
svc:/system/cron:default> listsnap
initial
last-import
previous
running
start
svc:/system/cron:default> revert start
svc:/system/cron:default> exit
$ svcadm refresh cron
$ svcadm restart cron

Delegated Restarters
• svc.startd's model isn't right for all services (inetd, clustering)
• SMF allows a service to be a delegated restarter for other
services
> Start, stop, and refresh services however they want
> Responsible for managing instance states
> svc.startd still handles enabledness & dependencies, though
• inetd was reimplemented as a delegated restarter
> Methods are called inetd_start, inetd_stop, etc.
> Services come online when inetd starts listening for them
> The repository is used for configuration instead of inetd.conf
• A public delegated restarter API is planned

/etc/inetd.conf & inetadm(1M)
• inetd.conf is no longer the primary configuration
• Most Solaris inet services have been converted
• Entries in inetd.conf are automatically converted
during install & upgrade by inetconv(1M)
• If something adds an entry to /etc/inetd.conf,
inetd(1M) will detect and issue a warning message
> Run inetconv again to convert the new entry
• inetadm(1M) can be used to modify inetd-specific
properties

Service Development: Benefits
• Services appear with SMF FMRIs
> Visible using standard Solaris tools; your service appears
in administrative heads-up displays
> Manageable using standard Solaris tools; admin can
leverage existing knowledge to use your service
> New generic tools developed will automatically see your
service
• Built-in restart after administrative error, software
fault, or hardware fault
• Participation in future software diagnosis
capabilities
Service Development: Tasks
• An existing Solaris service may be converted
incrementally, and to different levels
> Get it working: write a manifest using existing init script
as start/stop method
> Handle error cases: refine methods
> Full restartability: if service has multiple components,
split them into individual services
> Customized error/restart handling: avoid service restart if
fault can be handled internally

Service manifests
• A service is delivered by an XML file called a manifest
> Describes dependencies, methods, and properties
• Manifests are delivered into /var/svc/manifest
• During startup, new manifests in /var/svc/manifest
and old manifests which have changed are loaded into
the repository with the svccfg(1M) command
• Do not edit manifests in /var/svc/manifest; make
customizations with svccfg(1M), etc.
> Repository customizations will be preserved across
patch & upgrade
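During development it is common to load a manifest by hand instead of waiting for the next boot. A minimal sketch, assuming the svccfg validate and import subcommands and a manifest path of your own; the helper name load_manifest and the example file name are hypothetical:

```shell
# Hypothetical helper: check a manifest against the DTD, and only if
# that succeeds load it into the repository.
load_manifest() {
    manifest=$1
    if ! svccfg validate "$manifest"; then
        echo "invalid manifest: $manifest" >&2
        return 1
    fi
    svccfg import "$manifest"
}

# Usage on a Solaris system (file name is illustrative only):
#   load_manifest /var/svc/manifest/site/food.xml
```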
Manifest Creation
• Name your service
• Identify whether your service may have multiple instances
• Identify how your service is started/stopped
• Determine faults to be ignored, if any
• Identify dependencies
• Identify dependents
• Create at least one instance
• Create template information to describe your service

Example Manifest: utmpd(1M)
<!-- Excerpt: the XML prolog and the enclosing <service_bundle> element are omitted -->
<service name='system/utmp' type='service' version='1'>
    <create_default_instance enabled='true' />
    <single_instance />
    <dependency name='milestone' grouping='require_all'
        restart_on='none' type='service'>
        <service_fmri value='svc:/milestone/sysconfig' />
    </dependency>
    <dependent name='utmpd_multi-user' grouping='optional_all'
        restart_on='none'>
        <service_fmri value='svc:/milestone/multi-user' />
    </dependent>
    <exec_method type='method' name='start'
        exec='/lib/svc/method/svc-utmpd' timeout_seconds='60' />
    <exec_method type='method' name='stop'
        exec=':kill' timeout_seconds='60' />
    <stability value='Unstable' />
    <template>
        <common_name>
            <loctext xml:lang='C'>utmpx monitoring</loctext>
        </common_name>
        <documentation>
            <manpage title='utmpd' section='1M'
                manpath='/usr/share/man' />
        </documentation>
    </template>
</service>

Method refinement
• On failure, explain the problem to stdout or
stderr (goes to a log) and exit with a non-zero code
> If the failure is not transient, return
$SMF_EXIT_ERR_FATAL or $SMF_EXIT_ERR_CONFIG
(defined in /lib/svc/share/smf_include.sh)
• On success, don't return until service is ready to
serve clients
> Dependent services may be started immediately
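The exit-code guidance above might look like this in a start method. A minimal sketch: the daemon, its -c option, the paths, and the function name start_food are all hypothetical, and the fallback numbers are the values smf_include.sh is understood to define, used only when that file is not present:

```shell
# Pick up the SMF_EXIT_* constants where available; the fallbacks are
# assumed to match the values in smf_include.sh.
if [ -r /lib/svc/share/smf_include.sh ]; then
    . /lib/svc/share/smf_include.sh
fi
: "${SMF_EXIT_OK:=0}" "${SMF_EXIT_ERR_FATAL:=95}" "${SMF_EXIT_ERR_CONFIG:=96}"

# Hypothetical start routine: $1 is the config file, $2 the daemon.
start_food() {
    conf=$1
    daemon=$2
    if [ ! -r "$conf" ]; then
        # Not transient: a restart will not create the file.
        echo "food: cannot read configuration file $conf" >&2
        return "$SMF_EXIT_ERR_CONFIG"
    fi
    if [ ! -x "$daemon" ]; then
        echo "food: $daemon is missing or not executable" >&2
        return "$SMF_EXIT_ERR_FATAL"
    fi
    # Any other non-zero code is reported as an ordinary failure.
    "$daemon" -c "$conf" || return 1
    return "$SMF_EXIT_OK"
}

# A real method script would end with something like:
#   start_food /etc/food.conf /opt/food/sbin/food; exit $?
```

Messages go to stderr so they land in the service's log file, where svcs -x points the administrator.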

Commands: svcprop(1)
• List properties of services and instances
• Fetch individual properties for use in scripts
$ svcprop network/http:apache2
...
physical/entities fmri svc:/network/physical:default
physical/grouping astring optional_all
physical/restart_on astring error
physical/type astring service
start/exec astring /lib/svc/method/http-apache2\ start
start/timeout_seconds count 60
start/type astring method
stop/exec astring /lib/svc/method/http-apache2\ stop
stop/timeout_seconds count 60
stop/type astring method
restarter/auxiliary_state astring none
restarter/next_state astring none
restarter/state astring disabled
restarter/state_timestamp time 1102030556.737590000
$ svcprop -p general/enabled network/http:apache2
false
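Fetching a single property makes simple checks easy to script. A minimal sketch, assuming svcprop prints the bare value when one property is selected with -p; the helper name is_enabled is hypothetical:

```shell
# Hypothetical helper: succeed if the general/enabled property of the
# given service instance is "true". Errors from svcprop (for example
# an unknown FMRI) are treated as "not enabled".
is_enabled() {
    [ "$(svcprop -p general/enabled "$1" 2>/dev/null)" = "true" ]
}

# Usage on a Solaris system:
#   if is_enabled network/http:apache2; then
#       echo "apache2 is enabled"
#   fi
```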

Development: Other Examples
• Manifest DTD is documented; read it at
/usr/share/lib/xml/dtd/service_bundle.dtd.1
• Explore /var/svc/manifest for similar services
> system/utmp is a simple standalone daemon
> system/coreadm is a simple configuration service
> network/telnet is an inetd-managed daemon
• Explore /lib/svc/method for similar methods

Service Packaging
• Use i.manifest and r.manifest from
/usr/sadm/install/scripts
> (from S10U2 or OpenSolaris)
• Manifests delivered into /var/svc/manifest with
type “f” and class “manifest”
> Use /var/svc/manifest/site if the service is
specific to your site
> Use another directory if you're an ISV, but remember a
uniquifier (e.g. stock ticker)
• Methods delivered with your application binaries
(/opt strongly recommended)
Developer References
• Manifest development
> /usr/share/lib/xml/dtd/service_bundle.dtd.1
> Look in /var/svc/manifest for examples
> inetconv -i file to create an empty inetd manifest
> smf_method(5) – information for writing methods
> inetd(1M) – inetd-specific method information

Additional Resources
• Discussion and further information at
http://opensolaris.org/os/community/smf
• Additional quickstart and developer documentation
available at
http://www.sun.com/bigadmin/content/selfheal/
• Solaris System Administration Guide has SMF
information:
http://docs.sun.com/app/docs/doc/817-1985
• smf(5) manpage introduces the facility
• Blogs:
> http://blogs.sun.com/sch
> http://blogs.sun.com/lianep