
Solaris 10 Containers

Kimberly Chang
OS Ambassador
Solaris 10 Adoption, US Client Solutions
http://webhome.sfbay/kchangs
http://blogs.sun.com/kchangs
Server Virtualization
Solaris Containers and Solaris Dynamic System Domains

[Figure: a Sun Server partitioned into Domain 1 and Domain 2, running Containers 1 through 5]
Server Virtualization
• Consolidates multiple applications
• Provides security perimeter between applications
and underlying system
• Makes more effective use of hardware
• Simplifies administration
• Adds flexibility to resource management
• Can be hardware- or software-based
Container Components
• Full Resource Containment - SRM (Solaris 9)
> Provides predictable service levels
• Isolation - Zones (Solaris 10)
> Prevent unauthorized access (security boundary)
> Minimize fault propagation (fault boundary)
• Service Management Application
> Ease of management – GUI Container Manager
Zones

• Provides virtualized OS environments, each looking
like a Solaris instance
> Implemented via a lightweight layer in the OS
> Details of physical resources are hidden
> Separate nodename, IP address, IP port space
> Processes cannot see or affect processes in other
containers
> Each zone can be administered independently
> No porting as the ABI/API is the same
Zones Block Diagram
[Figure: block diagram of two non-global zones and the global zone on one system]
red zone (red.com), zone root: /zone/redzone
> web services (Apache 1.3.22, J2SE)
> enterprise services (Oracle 8i, IAS 6)
sun zone (sun.com), zone root: /zone/sunzone
> login services (OpenSSH sshd 3.4)
> web services (Apache 2.0, Tomcat)
global zone (serviceprovider.com)
> login services (OpenSSH sshd 3.4)
> zone management: zonecfg(1M), zoneadm(1M), zlogin(1), ...
Every zone also runs core services; the diagram labels them (ypbind, automountd) twice and (ldap_cachemgr, inetd) once.
Other elements shown: zoneadmd for each non-global zone; logical interfaces ce0:1, ge0:1, and ge0:2 on physical interfaces ce0 and ge0; /usr and /opt; /red and /aux0/redspace; storage.
By default, "zone" refers to a non-global zone.
Granularity

• 8,000+ zones per OS instance


• Doesn't require dedicated CPUs, memory,
physical devices, etc.
> Just space for unique root filesystem
• Existing hardware resources can be:
> multiplexed across zones, or
> allocated per zone using resource pools
Security

• Security boundary around each zone


• Restricted subset of privileges
> A compromised zone is unable to escalate its own
privileges
• Important name spaces are isolated
• Processes running in a zone are unable to affect
activity in other zones or the global zone
Zone Security Properties
• Services can be isolated from each other
> Quarantining potentially risky software
> Isolating multiple mutually distrusting parties
> Containing potential damage by a breach
• Global Zone can:
> observe all activities inside each zone
> not be seen by software in each zone
> change the contents or processes in each zone
• Non-global zones run with fewer privileges
Zones are Less Privileged
"contract_event"      Request reliable delivery of events
"contract_observer"   Observe contract events for other users
"cpc_cpu"             Access to per-CPU perf counters
"dtrace_kernel"       DTrace kernel tracing
"dtrace_proc"         DTrace process-level tracing
"dtrace_user"         DTrace user-level tracing
"file_chown"          Change file's owner/group IDs
"file_chown_self"     Give away (chown) files
"file_dac_execute"    Override file's execute perms
"file_dac_read"       Override file's read perms
"file_dac_search"     Override dir's search perms
"file_dac_write"      Override (non-root) file's write perms
"file_link_any"       Create hard links to diff uid files
"file_owner"          Non-owner can do misc owner ops
"file_setid"          Set uid/gid (non-root) to diff id
"ipc_dac_read"        Override read on IPC, Shared Mem perms
"ipc_dac_write"       Override write on IPC, Shared Mem perms
"ipc_owner"           Override set perms/owner on IPC
"net_icmpaccess"      Send/Receive ICMP packets
"net_privaddr"        Bind to privileged port (<1023+extras)
"net_rawaccess"       Raw access to IP
"proc_audit"          Generate audit records
"proc_chroot"         Change root (chroot)
"proc_clock_highres"  Allow use of hi-res timers
"proc_exec"           Allow use of execve()
"proc_fork"           Allow use of fork*() calls
"proc_info"           Examine /proc of other processes
"proc_lock_memory"    Lock pages in physical memory
"proc_owner"          See/modify other process states
"proc_priocntl"       Increase priority/sched class
"proc_session"        Signal/trace other session process
"proc_setid"          Set process UID
"proc_taskid"         Assign new task ID
"proc_zone"           Signal/trace processes in other zones
"sys_acct"            Manage accounting system (acct)
"sys_admin"           System admin tasks (e.g. domain name)
"sys_audit"           Control audit system
"sys_config"          Manage swap
"sys_devices"         Override device restricts (exclusive)
"sys_ipc_config"      Increase IPC queue
"sys_linkdir"         Link/unlink directories
"sys_mount"           Filesystem admin (mount, quota)
"sys_net_config"      Config net interfaces, routes, stack
"sys_nfs"             Bind NFS ports and use syscalls
"sys_res_config"      Admin processor sets, res pools
"sys_resource"        Modify res limits (rlimit)
"sys_suser_compat"    3rd party modules use of suser
"sys_time"            Change system time

Legend:
Interesting  Some interesting privileges
Basic        Non-root privileges
Removed      Not available in Zones
Processes

• Certain system calls are not permitted or have
restricted scope inside a zone
> http://developers.sun.com/solaris/articles/application_in_zone.html
• All processes can be seen inside the global zone
> But control of those processes is privileged
• Inside a zone, only processes in the same zone can
be seen or affected
• proc(4) only shows processes in the same zone
Zone Filesystem
[Figure: global view vs. zone view of the file system]
Global view: the global root / contains /zone (holding zone1, 2, 3), /usr, /dev, and others.
Zone view: Zone 1 sees its own zone root / containing /etc, /bin, /usr, /dev, /export, /proc.


File Systems & Devices

• Each zone is allocated its own root file system


> No access to other zones' root file system
> Private /dev directory mounted in zone
• Sparse-root vs. whole-root
> Sparse: installs a subset of packages and shares executables,
libraries, and data; /usr, /sbin, /lib, and /platform are
inherited read-only via lofs by default
> Whole: private copies are made (needs more storage)
• Raw devices can be given to a zone with caution
Network & Identity

• Each zone controls its identity


> Node name, RPC domain name, time zone, locale
> Each container can use a different naming service (DNS,
LDAP and NIS, etc.)
> Private IP addresses, ports
• Separate /etc/passwd files mean that each zone can
have its own root user
• Only one TCP/IP stack per kernel
> Zones shielded from stack specifics – routing, devices, etc.
> Cannot view other zones' traffic
Zones and Resource Pools

[Figure: non-global Zone1, Zone2, and Zone3 and the global zone mapped onto cpu1 through cpu8, which are grouped into Resource Pool A, Resource Pool B, and the Default Resource Pool]
Resource pool components:
● Processor set (now)
● Scheduling class (now)
● Memory set (TBD)
● Swap set (TBD)


Solaris Container

• Solaris Containers =
Zones + Resource Management
• Oracle licensing honors Containers (Zones + RM)
> http://oracle.com/corporate/pricing/specialtopics.html
> Running Oracle Database in Solaris 10 Containers Best Practices
- Metalink# 317257.1
FSS Scheduling Class
• CPU allocation is based on “shares” assigned to
projects or zones
> Share defines a guaranteed floor, rather than a cap
> Only impose a limit when there is a shortage of CPU
> Default share value is 1 share
• FSS works within a processor set
• Avoid mixing scheduling classes within a pset
• FSS class can be used for workloads having
different CPU utilization patterns
> e.g. OLTP, DSS, Java
Solaris Container
Resource Management – Fair Share Scheduler

[Figure: two pie charts of CPU allocation]
> App A (3 shares), App B (5 shares), App C (2 shares): A 30%, B 50%, C 20%
> After adding App D (5 shares): A 20%, B 33%, C 14%, D 33%

Shares describe relative ratio...
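The share arithmetic behind the pie charts can be reproduced in a few lines of shell. This is a minimal sketch: the helper name share_pct is hypothetical, and it assumes every application is busy enough to consume its full entitlement.

```shell
#!/bin/sh
# share_pct SHARES TOTAL
# Hypothetical helper: prints the CPU percentage that SHARES out of TOTAL
# active shares works out to when all workloads compete for the CPU.
share_pct() {
    awk -v s="$1" -v t="$2" 'BEGIN { printf "%.1f\n", 100 * s / t }'
}

# Three applications: A=3, B=5, C=2 shares (10 in total).
share_pct 3 10
share_pct 5 10
share_pct 2 10

# Adding App D with 5 shares raises the total to 15 and shrinks
# every other percentage, although no share value changed.
share_pct 3 15
share_pct 5 15
share_pct 2 15
```

The last call prints 13.3; the chart shows 14% for App C so that the rounded slices total 100%.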


Fair Share Scheduler (FSS(7D))
• Assigns resources based on number of shares
assigned / number of shares on the system
• Two-level model
> Top Level: Global zone administrator assigns shares to
zones
> Second Level: Zone administrator assigns shares to
projects
• A project's CPU allocation depends on project shares
as well as zone shares
• Most likely to use one approach, not both at the
same time.
Two Level FSS
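[Figure: two-level share allocation]
> Shares allocated to zones (twilight, drop, fracture, global): 3+1+2+1 = 7 in total
> Shares allocated by the zone administrator to projects: 4+5+4+3+6 = 22 in total
> Database project: 6 project shares, in a zone holding 2 zone shares

2/(3+1+2+1) x 6/(4+5+4+3+6) = 2/7 x 6/22 = 6/77 = ~7.8%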
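The two-level calculation multiplies the zone's fraction of all zone shares by the project's fraction of the shares inside that zone. A minimal sketch, with a hypothetical helper name and the figures from this slide:

```shell
#!/bin/sh
# two_level_pct ZONE_SHARES ZONE_TOTAL PROJECT_SHARES PROJECT_TOTAL
# Hypothetical helper: a project's CPU percentage under two-level FSS,
# assuming every zone and project is busy.
two_level_pct() {
    awk -v zs="$1" -v zt="$2" -v ps="$3" -v pt="$4" \
        'BEGIN { printf "%.1f\n", 100 * (zs / zt) * (ps / pt) }'
}

zone_total=$((3 + 1 + 2 + 1))            # shares assigned to all zones
project_total=$((4 + 5 + 4 + 3 + 6))     # shares assigned inside the zone

# Zone holds 2 of 7 zone shares; project holds 6 of 22 project shares.
two_level_pct 2 "$zone_total" 6 "$project_total"
```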
Enabling FSS Scheduler
• Set FSS to be the default scheduler class upon next reboot
> # dispadmin -d FSS
> 'dispadmin -d' creates /etc/dispadmin.conf
• Dynamically switch to FSS scheduler
> Using the sysetup init script
> # dispadmin -d FSS
> # /etc/init.d/sysetup start
> Using the 'priocntl' command
> # priocntl -s -c FSS -i all
> # priocntl -s -c FSS -i pid 1
• Verify
> # ps -cafe
> # ps -ef -o user,pid,class,comm
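To check at a glance whether any process was left behind in another class, the class column can be tallied. A minimal sketch: class_counts is a hypothetical helper, and it assumes the input is what `ps -e -o class` prints (one header line, then one class name per line). The sample input below stands in for a live system.

```shell
#!/bin/sh
# class_counts: reads a scheduling-class column on stdin (header first)
# and prints "CLASS COUNT" for each class, sorted by class name.
class_counts() {
    awk 'NR > 1 { n[$1]++ } END { for (c in n) print c, n[c] }' | sort
}

# On a live system (assumption): ps -e -o class | class_counts
# Sample input used here so the sketch runs anywhere:
printf 'CLS\nFSS\nFSS\nTS\nSYS\n' | class_counts
```

Any remaining TS or IA entries point to processes that were started outside the scope of the priocntl command.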
Examples
Single Application Containers

[Figure: four single-application zones on one server]
global zone (v880-room2-rack5-1; 129.76.1.12)

Application environment:
Zone (identity)     Zone root    Login services     Network services  Core services
dns1 (dnsserver1)   /zone/dns1   SSH sshd           named             inetd
web1 (foo.org)      /zone/web1   SSH sshd           Apache, Tomcat    inetd
web2 (bar.net)      /zone/web2   SSH sshd, telnetd  IWS               inetd
mail (mailserver)   /zone/mail1  SSH sshd           sendmail, IMAP    inetd

Virtual platform: each zone has its own zcons, /usr, and zoneadmd; logical interfaces hme0:1, hme0:2, hme0:3, ce0:1, ce0:2, ce0:3, ce1:1; share values 10, 30, 60
Pools: pool1 (4 CPU; 6GB), FSS; pool2 (4 CPU; 10GB)
Global zone:
> zone management (zonecfg(1M), zoneadm(1M), zlogin(1), ...)
> core services (inetd, rpcbind, sshd, ...)
> remote admin/monitoring (SNMP, SunMC, WBEM)
> platform administration (syseventd, devfsadm, ifconfig, metadb, ...)
Hardware: storage complex; network devices (hme0), (ce0), (ce1)
Examples
Multiple Application Containers
[Figure: multiple-application zones with per-project shares]
global zone (v1280-room3-rack12-2; 129.76.4.24)

Zone (identity)       Zone root
oracle1 (oracle_ops)  /zone/oracle1
oracle2 (ora_ta)      /zone/oracle2
mail (mailserver)     /zone/mail1

Application environment, column by column as drawn (shares, project, processes):
> 15 web service project (Apache 1.3.22); 10 app service project (IAS, J2SE); 5 dba users project (sh, bash, prstat)
> 60 ora_ops project (oracle); 0 backup project (sqlplus); 10 system project (inetd, sshd)
> 70 ora_ta project (oracle); 20 dba users project (sh, bash, prstat); 10 system project (inetd, sshd)
> login services (SSH sshd); network services (sendmail, IMAP); core services (inetd)

Virtual platform: each zone has its own zcons, /usr, and zoneadmd; logical interfaces hme0:1, hme0:2, ce0:1, ce0:2, ce0:3, ce1:1; share values 70, 10
Pools: pool1 (8 CPU), FSS; pool2 (4 CPU)
Global zone:
> zone management (zonecfg(1M), zoneadm(1M), zlogin(1), ...)
> core services (inetd, rpcbind, sshd, ...)
> remote admin/monitoring (SNMP, SunMC, WBEM)
> platform administration (syseventd, devfsadm, ifconfig, metadb, ...)
Hardware: storage complex; network devices (hme0), (ce0), (ce1)
Sun™ MC – Solaris Container Manager
Manage systems that run the Solaris 8, 9, and 10 OS

Manage Solaris Containers across many systems

Uses Sun Management Center 3.5 Update 1b
Container Management
View all the Containers
in your environment

Create/Delete/Modify
Projects

Automatic discovery
of new objects

Recreate a Container on
another system
Manage Solaris Zones

Create new Zones through a single wizard

Create/Delete/Modify Pools, Zones

Support for IPQoS for Solaris Zones
Zone Commands

• Zone Configuration – zonecfg


> Define what a zone looks like
• Console Access – zlogin -C
• Zone Administration – zoneadm
> Install, Boot, Restart, Stop, List, Verify, Uninstall
Zone Administration
• zoneadm(1M) is used by the global zone
administrator to
> install a new root file system for a configured zone
> list zones and optionally their state
> verify whether the configuration of an installed zone is
semantically complete and ready to be booted
> boot or ready an installed zone
> halt or reboot a running zone
> uninstall the root file system of an installed zone
Zone Console
• Zone pseudo-console available for each zone
> Mimics a hardware console
> Accessible via zlogin -C
> Available prior to zone boot
global# zlogin -C zone1
[Connected to zone 'zone1' console]
zone1#
~.
[Connection to zone 'zone1' console closed]
• Publishes zone state change messages
[Notice: zone halted]
Demo
Creating a zone

global# zonecfg -z zone1
zone1: No such zone configured
Use 'create' to begin configuring a new zone.
zonecfg:zone1> create
Settings for the zone
zonecfg:zone1> set zonepath=/zoneroots/zone1
zonecfg:zone1> set autoboot=true
zonecfg:zone1> add net
zonecfg:zone1:net> set address=192.9.200.100/24
zonecfg:zone1:net> set physical=e1000g
zonecfg:zone1:net> end
zonecfg:zone1> add inherit-pkg-dir
zonecfg:zone1:inherit-pkg-dir> set dir=/opt
zonecfg:zone1:inherit-pkg-dir> end
zonecfg:zone1> verify
zonecfg:zone1> commit
zonecfg:zone1> ^D

global# zoneadm list -vc


global# ls -l /etc/zones/zone1.xml
global# zonecfg -z zone1 info
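The same configuration can be kept in a command file and applied non-interactively with zonecfg's -f option. The sketch below only writes the file and prints the command it would run; the file location and the helper name write_zone_cfg are assumptions, and the subcommands are copied from the interactive session above.

```shell
#!/bin/sh
# write_zone_cfg FILE
# Hypothetical helper: writes the demo's zonecfg subcommands to FILE.
write_zone_cfg() {
    cat > "$1" <<'EOF'
create
set zonepath=/zoneroots/zone1
set autoboot=true
add net
set address=192.9.200.100/24
set physical=e1000g
end
add inherit-pkg-dir
set dir=/opt
end
verify
commit
EOF
}

cfg="${TMPDIR:-/tmp}/zone1.cfg"      # assumed scratch location
write_zone_cfg "$cfg"

# Dry run: print the command instead of executing it.
# In the global zone this would be run as root.
echo "zonecfg -z zone1 -f $cfg"
```

Keeping the file under version control makes it easy to recreate the zone on another system.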
Installing the zone
global# zoneadm -z zone1 install
Preparing to install zone <zone1>.
Creating list of files to copy from the global zone.
Copying <2394> files to the zone.
Initializing zone product registry.
Determining zone package initialization order.
Preparing to initialize <1048> packages on the zone.
Initialized <1048> packages on zone.
Zone <zone1> is initialized.
Installation of <1> packages was skipped.
Installation of these packages generated warnings: <SFWmuttS>
The file </zoneroots/zone1/root/var/sadm/system/logs/install_log> contains a log of the zone installation.

- It took about 9 minutes on my laptop

global# zoneadm list -cv


ID NAME STATUS PATH
0 global running /
1 zone1 installed /zoneroots/zone1
Boot the zone
global# zoneadm -z zone1 boot
- It took about 4 seconds for 1st boot on my laptop.

global# zoneadm list -cv


ID NAME STATUS PATH
0 global running /
1 zone1 running /zoneroots/zone1
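Scripts that install or boot zones often need to read a zone's state out of this listing. A minimal sketch: zone_state is a hypothetical helper that assumes the four-column `zoneadm list -cv` layout shown above (ID, NAME, STATUS, PATH). The sample text stands in for live output.

```shell
#!/bin/sh
# zone_state NAME
# Reads `zoneadm list -cv` style output on stdin and prints the STATUS
# column of the zone called NAME (nothing if the zone is not listed).
zone_state() {
    awk -v z="$1" 'NR > 1 && $2 == z { print $3 }'
}

# On a live system (assumption): zoneadm list -cv | zone_state zone1
sample='  ID NAME             STATUS         PATH
   0 global           running        /
   1 zone1            running        /zoneroots/zone1'

printf '%s\n' "$sample" | zone_state zone1
```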

global# zlogin -C zone1


[Connected to zone 'zone1' console]
<Run through sysid tools as usual to do initial customization>
Example: Interactive Initial Boot
• sysidtool(1M) runs by default
[NOTICE: zone booting up]
SunOS Release 5.10 Version s10_52 32-bit
Copyright 1983-2004 Sun Microsystems, Inc. All rights reserved.
Use is subject to license terms.
Hostname: twilight
The system is coming up. Please wait.
Select a Language
0. English
1. French
2. Japanese
3. Simplified Chinese
4. Traditional Chinese
Please make a choice (0 - 4), or press h or ? for help:
Example: Hands-off Initial Boot
• Using a sysidcfg(4) file as an alternative
# cat > ~/zone-cfg/zone1.sysidcfg
system_locale=en_US
timezone=US/Pacific
timeserver=localhost
terminal=xterms
security_policy=NONE
network_interface=PRIMARY {hostname=zone1 \
protocol_ipv6=no}
name_service=NONE
system_locale=C
root_password=MNYm8SfoJlvIY
^D
# cp ~/zone-cfg/zone1.sysidcfg \
/zoneroots/zone1/root/etc/sysidcfg
Example: Process Monitoring
• prstat -Z
• prstat -z <zonename>
• ps -aef -z <zonename>
• ps -aef -Z
• df -hZ
• ptree -z <zonename>
> Prints the process trees with child processes
indented from their respective parent processes
Zones and File Systems

• 3 different ways of provisioning file systems:


> LOFS – Mount directory from global in a non-global zone
> UFS – Mount real UFS directly into non-global zone
> Raw – Attach raw devices to non-global zone
• zonecfg requires a separate "add fs" or "add device"
stanza for each device or mount point added.
Example: Zones + LOFS
global# zonecfg -z zone1
zonecfg:zone1> add fs
zonecfg:zone1:fs> set dir=/opt/local
zonecfg:zone1:fs> set special=/local
zonecfg:zone1:fs> set type=lofs
zonecfg:zone1:fs> add options [rw,nodevices]
zonecfg:zone1:fs> end
zonecfg:zone1> verify
zonecfg:zone1> commit
zonecfg:zone1> ^D

➔ This mounts the /local directory from the global zone at /opt/local
in the non-global zone
➔ Useful for sharing data between zones, using the global zone as a go-between
Example: Zones + UFS
global# zonecfg -z dbzone
zonecfg:dbzone> add fs
zonecfg:dbzone:fs> set dir=/opt/local
zonecfg:dbzone:fs> set special=/dev/dsk/c0d0s7
zonecfg:dbzone:fs> set raw=/dev/rdsk/c0d0s7
zonecfg:dbzone:fs> set type=ufs
zonecfg:dbzone:fs> end
zonecfg:dbzone> verify
zonecfg:dbzone> commit
zonecfg:dbzone> ^D
> Mounts the UFS disk slice /dev/dsk/c0d0s7 as /opt/local in the
non-global zone.
> No exposed mount point for this file system in the global zone.
Example: Zones + Raw Devices
global# zonecfg -z zone1
zonecfg:zone1> add device
zonecfg:zone1:device> set match=/dev/rdsk/c0d0s6
zonecfg:zone1:device> end
zonecfg:zone1> add device
zonecfg:zone1:device> set match=/dev/dsk/c0d0s6
zonecfg:zone1:device> end
zonecfg:zone1> verify
zonecfg:zone1> commit
zonecfg:zone1> ^D

> Adds a raw device directly into the non-global zone


> Creates device node for the new device
> Match can include wildcards and is evaluated each time the zone boots

zone1# newfs /dev/rdsk/c0d0s6


zone1# mount /dev/dsk/c0d0s6 /opt/local
Example: Zones + FSS
# zonecfg -z zone1
zonecfg:zone1> set pool=newpool
zonecfg:zone1> add rctl
zonecfg:zone1:rctl> set name=zone.cpu-shares
zonecfg:zone1:rctl> add value (priv=privileged,limit=10,action=none)
zonecfg:zone1:rctl> end
zonecfg:zone1> verify
zonecfg:zone1> commit
zonecfg:zone1> ^D

Note: the default pool is used if "set pool" is not specified

Change the shares of a running zone:
# prctl -n zone.cpu-shares -r -v 25 -i zone zonename


Resource Pools Management
poolcfg(1M) and pooladm(1M)
• Enabling pools
> # pooladm -e
• Disabling pools
> # pooladm -d
• Saving the current configuration to /etc/pooladm.conf (XML file)
> # pooladm -s
• View current config info
> # poolcfg -c info
Pools Configuration
• Create a processor set with a min and max number of CPUs
> # poolcfg -c 'create pset dbset (uint pset.min=1; uint pset.max=2)'
• Create a pool
> # poolcfg -c 'create pool dbpool'
• Associate set to the pool
> # poolcfg -c 'associate pool dbpool (pset dbset)'
• View current config info
> # poolcfg -c info
> # poolstat -r all
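The individual poolcfg -c calls can also be collected in a command file for poolcfg -f. The sketch below only writes the file and prints the sequence it would run; the file location and the helper name write_pool_cfg are assumptions. The final `pooladm -c` step, which instantiates the configuration stored in /etc/pooladm.conf, is not shown on the slide and is included here as the usual way to activate an edited configuration.

```shell
#!/bin/sh
# write_pool_cfg FILE
# Hypothetical helper: writes the slide's three poolcfg commands to FILE.
write_pool_cfg() {
    cat > "$1" <<'EOF'
create pset dbset (uint pset.min = 1; uint pset.max = 2)
create pool dbpool
associate pool dbpool (pset dbset)
EOF
}

cfg="${TMPDIR:-/tmp}/dbpool.cfg"     # assumed scratch location
write_pool_cfg "$cfg"

# Dry run: print the commands instead of executing them (root required).
echo "pooladm -e"          # enable the pools facility
echo "pooladm -s"          # save the running config to /etc/pooladm.conf
echo "poolcfg -f $cfg"     # apply the command file to /etc/pooladm.conf
echo "pooladm -c"          # activate the edited configuration
```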
Pools Example
• tm163-118# poolcfg -c info
system tm163-118
string system.comment
int system.version 1
boolean system.bind-default true
int system.poold.pid 20514

pool dbpool
int pool.sys_id 3
boolean pool.active true
boolean pool.default false
int pool.importance 1
string pool.comment
pset dbset

pool pool_default
int pool.sys_id 0
boolean pool.active true
boolean pool.default true
int pool.importance 1
string pool.comment
pset pset_default
Pools Example (Cont.)
pset dbset
int pset.sys_id 1
boolean pset.default false
uint pset.min 1
uint pset.max 1
string pset.units population
uint pset.load 0
uint pset.size 1
string pset.comment

cpu
int cpu.sys_id 0
string cpu.comment
string cpu.status on-line

pset pset_default
int pset.sys_id -1
boolean pset.default true
uint pset.min 1
uint pset.max 1
string pset.units population
uint pset.load 0
uint pset.size 1
string pset.comment

cpu
int cpu.sys_id 1
string cpu.comment
string cpu.status on-line
Pools and Zone
• Bind a zone to a pool
> # poolbind -p dbpool -i zoneid dbzone
• Which pool is the zone bound to?
> dbzone# poolbind -q $$
25177 dbpool
System Parameter Changes in S10
• Many removed and obsoleted parameters
> http://docs.sun.com/app/docs/doc/817-0404/6mg74vs90?a=view
• Removed System V IPC parameters
Message Queues          Semaphores              Shared Memory
msgsys:msginfo_msgmap   semsys:seminfo_semaem   shmsys:shminfo_shmmin
msgsys:msginfo_msgmax   semsys:seminfo_semmap   shmsys:shminfo_shmseg
msgsys:msginfo_msgseg   semsys:seminfo_semmns
msgsys:msginfo_msgssz   semsys:seminfo_semmnu
                        semsys:seminfo_semvmx
                        semsys:seminfo_semume
                        semsys:seminfo_semusz

• Obsoleted parameters are replaced by resource controls
(with bigger default values)
> http://docs.sun.com/app/docs/doc/817-1592/6mhahuoim?a=view
Oracle Related Parameters
• System V IPC parameters and the corresponding Solaris resource controls

Parameter                        Oracle          Required  Resource Control        Default Value
                                 Recommendation  in S10
SEMMNI (semsys:seminfo_semmni)   100             Yes       project.max-sem-ids     128
SEMMNS (semsys:seminfo_semmns)   1024            No        N/A                     N/A
SEMMSL (semsys:seminfo_semmsl)   256             Yes       process.max-sem-nsems   512
SHMMAX (shmsys:shminfo_shmmax)                   Yes       project.max-shm-memory  ¼ of physical RAM
SHMMIN (shmsys:shminfo_shmmin)   1               No
SHMMNI (shmsys:shminfo_shmmni)   100             Yes       project.max-shm-ids     128
SHMSEG (shmsys:shminfo_shmseg)   10              No        N/A                     N/A
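When migrating an old /etc/system file, the mapping in this table can be encoded as a small lookup. A minimal sketch: rctl_for is a hypothetical helper, and it uses the Solaris 10 name process.max-sem-nsems for the SEMMSL replacement, which is a per-process control rather than a per-project one.

```shell
#!/bin/sh
# rctl_for PARAM
# Hypothetical helper: prints the Solaris 10 resource control that replaces
# an Oracle-related System V IPC tunable, or N/A if none is needed.
rctl_for() {
    case "$1" in
        semmni) echo "project.max-sem-ids" ;;
        semmsl) echo "process.max-sem-nsems" ;;
        shmmax) echo "project.max-shm-memory" ;;
        shmmni) echo "project.max-shm-ids" ;;
        *)      echo "N/A" ;;
    esac
}

# Print the mapping for every parameter in the table.
for p in semmni semmns semmsl shmmax shmmin shmmni shmseg; do
    printf '%-8s %s\n' "$p" "$(rctl_for "$p")"
done
```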
Resource Control Commands
• System V IPC parameters no longer need to be set in /etc/system
• Set on a per-process or per-project basis
• prctl(1)
> # prctl -n process.max-file-descriptor <pid>
> # prctl -n project.cpu-shares -v 10 -r -i project db_project
> # prctl -n project.max-shm-memory -v 10g -r -i project user.oracle
> # prctl -n project.max-shm-memory -i project user.oracle
> # prctl -i project user.oracle
• rctladm(1M)
> # rctladm -l
Zones FAQ/Blogs/Info
• http://www.opensolaris.org/os/community/zones/faq
• http://blogs.sun.com/<whomever>
> David Comay (comay)
> Dan Price (dp)
> John Beck (jbeck)
> Andy Tucker (tucker)
• http://www.sun.com/bigadmin/content/zones
• http://www.sun.com/blueprints/
SOLARIS 10 CONTAINERS
Kimberly Chang
kimberly.chang@sun.com
http://webhome.sfbay/kchangs
http://blogs.sun.com/kchangs