CloudStack Overview
Written by Chiradeep Vittal and Alex Huang (Citrix); revised by Gavin Lee and Zhennan Sun (TCloud Computing)
Outline
- Overview of CloudStack
- Problem definition
- Feature set overview
- Network
- Storage
- MS internals
- System VMs
- System interactions
- Roadmap
- Comparisons
What is CloudStack?
A multi-tenant cloud orchestration platform:
- Turnkey
- Hypervisor agnostic
- Scalable
- Secure
- Open source, open standards
- Deploys on premise or as a hosted solution
- BSS and self-service portal (not ASL)
- Extensive networking services
Build your cloud the way the world's most successful clouds are built.
Cloud Deployment Models
- Private cloud: dedicated resources; security and total control; internal network; managed by the enterprise or a 3rd party
- Hosted private cloud: dedicated resources; security; SLA-bound; 3rd-party owned and operated
- Multi-tenant public cloud: mix of shared and dedicated resources; elastic scaling; pay as you go; public internet or VPN access
[Diagram: admins and end users access CloudStack, which orchestrates compute, network, and storage resources — primary storage over Fibre Channel or NFS, and secondary storage on NFS or Swift.]
Problem Definition
Offer a scalable, flexible, manageable IaaS platform that follows established cloud computing paradigms.
- IaaS: orchestrate physical and virtual resources to offer self-service infrastructure provisioning and monitoring
- Scalable: 1 → N hypervisors / VMs / virtual resources; 1 → N end users
- Flexible: handle new physical resource types (hypervisors, storage, networking); add new APIs, new services, new network models
- Established paradigms: EC2-inspired, with semantic variations based on cloud provider needs and hypervisor capabilities
[Screenshot: end-user self-service UI — a dashboard of running, stopped, and total VMs, public IPs, private networks, and latest events; users get VM operations, VM access, VM status, volume and template management, and snapshot scheduling (now, hourly, daily, weekly, monthly) over the Internet.]
Deployment Architecture
- The hypervisor host is the basic unit of scale
- A cluster consists of one or more hosts of the same hypervisor type; all hosts in a cluster have access to shared (primary) storage
- A pod is one or more clusters, usually behind the same L2 switches
- An availability zone has one or more pods and access to secondary storage
- One or more zones make up the cloud
[Diagram: Zone 1 → access layer → pods 1..N → clusters 1..N, each with hosts 1..2 and primary storage; secondary storage is zone-wide.]
Management Server (MS)
- A single management server can manage multiple zones
- Zones can be geographically distributed, but low-latency links are expected for better performance
- A single MS node can manage up to 10K hosts; multiple MS nodes can be deployed as a cluster behind a load balancer for scale or redundancy
- The MS is stateless and can be deployed on a physical server or a VM; state lives in the MySQL DB, backed up via DB replication
- Supported platforms — commercial: RHEL 5.4+; FOSS: Ubuntu 10.04, Fedora 16
[Diagram: user and admin API traffic passes through a load balancer to the MS nodes, which share a MySQL DB (with a backup replication DB) and manage the infrastructure resources in zones.]
CloudStack Storage
Primary Storage
- Configured at the cluster level; close to hosts for better performance
- Stores all disk volumes for VMs in the cluster
- A cluster can have one or more primary storage pools
- Local disk, iSCSI, FC, or NFS
Secondary Storage
- Configured at the zone level
- Stores all templates, ISOs, and snapshots
- A zone can have one or more secondary storage pools
- NFS or OpenStack Swift
[Diagram: pods of hosts behind L2 switches, primary storage per cluster, zone-wide secondary storage behind an L3 switch.]
Terminology
- Cluster: a grouping of hosts and their associated storage, within the same L2 switch
- Pod: a collection of clusters
- Zone: a collection of pods, network offerings, and secondary storage
- Secondary storage: zone-level storage for templates, ISOs, and snapshots; NFS or OpenStack Swift, accessed via a CloudStack system VM
Provisioning Process
1. User requests an instance
2. Provision optional network services
3. Copy the instance template from secondary storage to primary storage on an appropriate cluster
4. Create any requested data volumes on that cluster's primary storage
5. Create the instance
6. Start the instance
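From the user's side, step 1 above is a single API call. As a minimal sketch (IDs are placeholders, and the HMAC request signing that a real CloudStack client must add is omitted), the `deployVirtualMachine` command takes a zone, a template, and a service offering:

```java
// Sketch of a CloudStack deployVirtualMachine request. The IDs below are
// illustrative placeholders; a real client must also sign the request
// with its API key and secret (signing omitted for brevity).
public class DeployVmRequest {
    public static String build(String zoneId, String templateId, String offeringId) {
        return "command=deployVirtualMachine"
                + "&zoneid=" + zoneId                 // which availability zone
                + "&templateid=" + templateId         // template copied from secondary storage
                + "&serviceofferingid=" + offeringId  // CPU/RAM service offering
                + "&response=json";
    }

    public static void main(String[] args) {
        System.out.println(build("zone-1", "tmpl-centos", "offering-small"));
    }
}
```

CloudStack then performs steps 2-6 asynchronously and returns a job id the client can poll.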
Citrix XenServer
- Integrates directly with the XenServer pool master
- Snapshots at the host level
- System VM control channel at the host level
- Network management at the host level
[Diagram: CloudStack Manager → XenServer pool master → XenServer hosts in a resource pool.]
Oracle VM
- Integrates with ovs-agent
- Snapshots at the host level
- System VM control channel at the host level
- Network management at the host level
- Does not use OVM Manager
- All templates must come from Oracle
- CloudStack configures the ocfs2 nodes
- Requires a helper cluster (XenServer, KVM, or vSphere)
[Diagram: CloudStack Manager → OVS agents on OVM hosts; a parallel diagram shows CloudStack Manager → KVM hosts.]
VMware vSphere
- Integration through vCenter
- System VM control channel via the CloudStack private network
- Snapshot and volume management via the secondary storage VM
- Networking via the vSphere vSwitch
[Diagram: CloudStack Manager → vCenter → vSphere clusters in a data center; CloudStack speaks XAPI to XenServer, HTTPS to vCenter, and talks to agents on KVM and OVM hosts.]

KVM
- RHEL 6.0, 6.1, 6.2 (coming)
- Full snapshots (not live)
- QCOW2 disk format
- NFS, iSCSI & FC storage
- Storage over-provisioning on NFS
Domains and Accounts
- A domain is a unit of isolation that represents a customer organization, business unit, or reseller
- A domain can have arbitrary levels of subdomains
- A domain can have one or more accounts
- An account represents one or more users and is the basic unit of isolation
- The admin can limit resources (VMs, IPs, snapshots) at the account or domain level
[Diagram: domain "Reseller A" with its admin → sub-domain "Org C" → account groups A and B, each with its own admin, users, and resources.]
CloudStack Network
Network Terminology
Traffic types:
- Guest: the tenant network to which instances are attached
- Storage: the physical network which connects the hypervisors to primary storage
- Management: control-plane traffic between the CloudStack management server and hypervisor clusters
- Public: traffic to and from outside the cloud (usually the Internet); shared public VLANs trunked down to all hypervisors
Network types:
- Shared: the same subnet for different users; either direct (one subnet) or direct tagged (VLAN, multiple subnets)
All traffic can be multiplexed onto the same underlying physical network using VLANs:
- Usually the management network is untagged
- The storage network is usually on a separate NIC (or bond)
The admin tells CloudStack how to map these traffic types to the underlying physical network by configuring traffic labels on the hypervisor and in the admin UI.
Network Concepts
- VM instance: the user chooses the instantiated guest network; the IP is arbitrary
- Guest network: an instance of a network offering. Shared networks are created by the admin; isolated networks are created and owned by the user. One virtual router per network; spans pods within a zone; VLAN id picked from a pool
- Physical network: defined at the zone level by NIC; assigned traffic types (public, guest, management, storage); associated by traffic label / vswitch name; devices attached as service providers
- Network offering: applies only to guest traffic; defines the guest network type (shared or isolated), a set of network services (such as DHCP, firewall, VPN, NAT), bandwidth, and tags mapping it to a physical network
Operations Admin and Cloud API
[Diagram: users and the operations admin reach the CloudStack MS cluster (backed by MySQL) through the cloud API; behind a router, the MS cluster manages an availability zone of pods of servers (pods 1..N) with zone-wide secondary storage.]
Network Isolation
[Diagram: security-group isolation in a flat network — web VMs and DB VMs are grouped into security groups (e.g. a DB security group), with guest IPs such as 10.1.0.3/10.1.0.4 and 10.1.16.12-10.1.16.85 spread across pods (each behind its own L2 switch with a gateway such as 10.1.8.1 or 10.1.16.1), an L3 core switch, a load balancer, clusters of hypervisors, and the public Internet.]
Guest Virtual Network
[Diagram: an isolated guest network 10.1.1.0/24 (VLAN 100) with gateway 10.1.1.1 on the CloudStack virtual router; the router provides NAT, DHCP, DNS, load balancing, and VPN, mapping public IPs (e.g. 65.37.141.24, 65.37.141.80) from the public network/Internet to guest VMs.]

External Network Devices
[Diagram: the same guest network fronted by a Juniper SRX firewall (public IP 65.37.141.11) and a NetScaler load balancer, with the CS virtual router (private IP 10.1.1.111) providing DHCP and DNS for guest VMs 10.1.1.3-10.1.1.5.]
Security Groups with EIP & ELB
[Diagram: guest VMs in a flat L3 network (guest IPs such as 10.1.2.3, 10.1.2.18, 10.5.2.99) grouped into security groups, with elastic IPs (65.11.1.3-65.11.1.5) and elastic load balancing provided upstream of the L3 switch; the CS virtual router provides DHCP and DNS.]
Multi-tier Network
[Diagram: separate virtual networks per tier — e.g. 10.1.2.0/24 (VLAN 1001) for web VMs and 10.1.3.0/24 (VLAN 141) for app and DB VMs — joined by the CS virtual router (public IP 65.37.141.115), with a monitoring VLAN and connectivity back to the customer premises.]
Virtual router services: IPAM, DNS, LB (intra), site-to-site VPN, static routes, ACLs, NAT, port forwarding, firewall (ingress & egress), BGP.
Network Offerings
- The cloud provider defines the feature set for guest networks, toggling features and service levels: security groups on/off, load balancer on/off, software or hardware load balancer, VPN, firewall, port forwarding
- The user chooses a network offering when creating a network; upgrades between network offerings are supported
- Default offerings for classic CloudStack networking are built in
CloudStack Storage
- Primary storage: a block device to the VM; IOPS-intensive; accessible host- or cluster-wide; supports storage tiering
- WORM storage: secondary storage or an object store for templates, ISOs, and snapshot archiving; high capacity
- CloudStack manages the storage between the two to achieve maximum benefit and resiliency
[Diagram: a zone-level L3 switch and private network connect pods; each cluster of computing servers has its own primary storage.]
Storage Tagging
- Supported via storage tags for primary storage
- Specify a tag when adding a storage pool
- Specify a tag when adding a disk offering
- Only storage pools with the tag will be allocated for the volume
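The matching rule above can be sketched in a few lines. This is an illustrative simplification, not CloudStack's actual allocator classes: a pool is eligible for a volume only if it carries every tag requested by the disk offering.

```java
import java.util.List;
import java.util.Map;

// Minimal sketch of tag-based storage pool selection (names are
// illustrative, not CloudStack's real allocator API).
public class TagAllocator {
    public static List<String> eligiblePools(Map<String, List<String>> poolTags,
                                             List<String> offeringTags) {
        return poolTags.entrySet().stream()
                .filter(e -> e.getValue().containsAll(offeringTags)) // pool must have all tags
                .map(Map.Entry::getKey)
                .toList();
    }

    public static void main(String[] args) {
        Map<String, List<String>> pools = Map.of(
                "nfs-pool", List.of("bronze"),
                "ssd-pool", List.of("gold", "ssd"));
        // A "gold" disk offering only matches pools tagged "gold".
        System.out.println(eligiblePools(pools, List.of("gold"))); // [ssd-pool]
    }
}
```

An untagged disk offering (empty tag list) matches every pool, which is why tags act as an opt-in constraint.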
WORM Storage
Write Once Read Many (WORM) storage is supported by two different storage types:
- Secondary storage: an NFS server within an availability zone
- Object store: a Swift implementation for cross-zone storage
Snapshots
- Used as backups for disaster recovery
- Taken on primary storage and moved to secondary storage
- Supports individual and recurring snapshots
- Full snapshots on VMware and KVM; incremental snapshots on XenServer
- A backup traffic type can be specified in the zone to segregate backup traffic from other network traffic types
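Recurring snapshots are configured through CloudStack's `createSnapshotPolicy` API. As a minimal sketch (parameter values are illustrative, the exact `schedule` format depends on the interval type, and request signing is omitted):

```java
// Sketch of a CloudStack createSnapshotPolicy request for a recurring
// snapshot. Values are placeholders; a real client must sign the request.
public class SnapshotPolicyRequest {
    public static String build(String volumeId, String interval,
                               String schedule, int maxSnaps, String tz) {
        return "command=createSnapshotPolicy"
                + "&volumeid=" + volumeId
                + "&intervaltype=" + interval  // HOURLY, DAILY, WEEKLY or MONTHLY
                + "&schedule=" + schedule      // format depends on the interval type
                + "&maxsnaps=" + maxSnaps      // how many snapshots to retain
                + "&timezone=" + tz;
    }

    public static void main(String[] args) {
        // Keep the 7 most recent daily snapshots of volume vol-42.
        System.out.println(build("vol-42", "DAILY", "30:02", 7, "UTC"));
    }
}
```

Once `maxsnaps` is reached, the oldest snapshot in the series is expired as new ones are taken.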
MS Internals
Architecture, workflow, high availability, and scalability.
[Diagram: API requests enter through the API servlet (CS API) and the services API; commands (cmd.execute()) flow into the kernel, and the agent manager issues agent API commands to local or remote resources; responses flow back, with state persisted in MySQL.]
Old Architecture
[Diagram: EC2 and CloudStack APIs → API layer → access control → monolithic managers (virtual machine, console proxy, agent, snapshot, template, network, storage) → resources (XenServer, KVM, SRX, others).]
- Pros: agile development for existing developers; scales well horizontally
- Cons: monolithic; difficult to educate new and third-party developers; easy to introduce bugs
New Architecture
- Management services: resource management, configuration, additional operations added by third parties
- ACL & authentication: accounts, domains, and projects; ACL and limits checking
- API server: isolates integration code from the execution server; scales horizontally to handle traffic; easily adds other API compatibility; easily exposes APIs needed by third-party vendors
- Kernel: drives long-running VM operations; syncs between managed resources and the DB; generates events
- Plugins: storage handling, network handling, deployment planning, hypervisor handling
- Framework: cluster management, job management, alert & event management, database access layer, messaging layer, component framework (OSGi), transaction management
- Execution server: protected by a job queue; the kernel is kept small for stability and only drives processes; plugins map virtual entities to physical resources; third-party plugins provide vendor differentiation in CloudStack; communicates with resources in the data center over a message bus
- Resources: carried in system VMs to be in close network proximity to the physical resources they manage; easily scale to utilize the most abundant resources in the data center (CPU & RAM); communicate with the execution server over a message bus (JSON); can be replicated for fault tolerance
[Diagram: clients (UI, cloud portal, CLI, other clients) call the management server's REST APIs — the end-user API, EC2 API, admin/OAM&P API, and other pluggable service APIs. Inside the server, the API layer (with security adapters, account management, and connectors) sits above the kernel and services API; the framework provides cluster management, resource management, job management, and database access; an event/message bus connects hypervisor, network, storage, image, and snapshot resources, plus console proxy management, template access, HA, and usage calculations.]
Kernel Module
- Understands how to orchestrate long-running processes (e.g. VM starts, snapshot copies, template propagation)
- Well-defined process steps
- Calls the Plugin API to execute the functionality it needs
Plugins
- Various ways to add more capability to CloudStack
- Implement clearly defined interfaces
- All operations must be idempotent
- All calls are at transaction boundaries
- Compile only against the Plugin API module
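The idempotency rule matters because the kernel may retry a plugin call after a partial failure. As an illustrative sketch (class and method names are hypothetical, not CloudStack's plugin interfaces), an operation checks current state before acting, so calling it twice is harmless:

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of an idempotent plugin operation: safe to retry,
// because "implementing" an already-implemented network is a no-op.
public class IdempotentNetworkOp {
    private final Set<String> implementedNetworks = new HashSet<>();

    // Returns true if the network is implemented after the call,
    // whether or not this particular call did the work.
    public boolean implementNetwork(String networkId) {
        if (implementedNetworks.contains(networkId)) {
            return true; // already done: a retry changes nothing
        }
        // ... allocate a VLAN, program switches, etc. ...
        implementedNetworks.add(networkId);
        return true;
    }

    public static void main(String[] args) {
        IdempotentNetworkOp op = new IdempotentNetworkOp();
        op.implementNetwork("net-1");
        op.implementNetwork("net-1"); // retry after a failed job is harmless
        System.out.println("implemented once, retried safely");
    }
}
```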
Anatomy of a Plugin
- Plugin API implementation: the core of the plugin
- REST API: optional; required only if the plugin needs to expose a configuration API to the admin
- ServerResource: optional; required if the plugin needs to be colocated with the resource; implements the translation layer that talks to the resource and communicates with the server component via JSON
Anatomy of a Plugin (contd.)
- Can be two jars: a server component deployed on the management server, and an optional ServerResource component deployed colocated with the resource
- The server component can implement multiple Plugin APIs to effect its feature
- Can expose its own API through a pluggable service so administrators can configure the plugin
- For example, the OVS plugin implements both NetworkGuru and NetworkElement
Components.xml Example
<components.xml>
  <system-integrity-checker class="com.cloud.upgrade.DatabaseUpgradeChecker">
    <checker name="ManagementServerNode" class="com.cloud.cluster.ManagementServerNode"/>
    <checker name="EncryptionSecretKeyChecker" class="com.cloud.utils.crypt.EncryptionSecretKeyChecker"/>
    <checker name="DatabaseIntegrityChecker" class="com.cloud.upgrade.DatabaseIntegrityChecker"/>
    <checker name="DatabaseUpgradeChecker" class="com.cloud.upgrade.PremiumDatabaseUpgradeChecker"/>
  </system-integrity-checker>
  <interceptor library="com.cloud.configuration.DefaultInterceptorLibrary"/>
  <management-server class="com.cloud.server.ManagementServerExtImpl" library="com.cloud.configuration.PremiumComponentLibrary">
    <adapters key="com.cloud.storage.allocator.StoragePoolAllocator">
      <adapter name="LocalStorage" class="com.cloud.storage.allocator.LocalStoragePoolAllocator"/>
      <adapter name="Storage" class="com.cloud.storage.allocator.FirstFitStoragePoolAllocator"/>
    </adapters>
    <pluggableservice name="VirtualRouterElementService" key="com.cloud.network.element.VirtualRouterElementService" class="com.cloud.network.element.VirtualRouterElement"/>
  </management-server>
</components.xml>
ServerResource
- Translation layer between CloudStack commands and the resource API
- May be colocated with the resource
- Has no access to the DB
- API defined in JSON messages
DAO
- SQL generation done mostly in GenericDaoBase, using JPA annotations
- Very little code to write for each individual DAO
- Database access layer for the kernel
- No support for more complicated features such as fetch strategies
- Other modules are welcome to use other kinds of ORM, but we'd like to hear about the preferred library (Hibernate is out due to licensing issues)
Example DAO
// ExampleVO.java
@Entity
@Table(name = "example")
public class ExampleVO {
    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    @Column(name = "id")
    long id;

    @Column(name = "name")
    String name;

    @Column(name = "value")
    String value;
}

// ExampleDao.java
public interface ExampleDao extends GenericDao<ExampleVO, Long> {
}

// ExampleDaoImpl.java
@Local(value = ExampleDao.class)
public class ExampleDaoImpl extends GenericDaoBase<ExampleVO, Long> implements ExampleDao {
    protected ExampleDaoImpl() {
    }
}
Start VM Flow
1. API request: start user VM
2. Get a deployment plan (host and storage pool)
3. Prepare NICs: reserve resources for each NIC and notify the network that the NIC is about to be started (agent calls)
4. Prepare volumes: prepare the template on primary storage (agent calls)
5. Issue the Start VM agent call
6. Store the job result
High Availability
- The service offering contains a flag for whether HA should be supported for the VM
- Does not use the native HA capability of the hypervisor for XenServer and KVM
- Uses adapters to fine-tune the HA process
HA Workflow
1. Has the VM changed since the work was scheduled? If yes, cancel the work.
2. Investigation (if needed): use investigators to find out whether the VM is alive or down. Each investigator returns one of three states: Up, Down, or Unknown. If the VM is up, the work completes successfully.
3. Fencing: if the VM is down or its state is unknown, use fencers to cut the VM off from its storage so its disks are not corrupted. Each fencer returns one of three states: fenced, unable to fence, or doesn't know how to fence; if one fencer cannot fence, try the next.
4. Restart: once fenced, restart the VM; on failure, reschedule the work.
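The investigation step above can be sketched as a loop over pluggable checks. Names here are illustrative, not CloudStack's actual investigator classes: each investigator is tried in turn until one gives a definite Up or Down answer, and an inconclusive result means the VM must be fenced before any restart.

```java
import java.util.List;
import java.util.function.Function;

// Hedged sketch of the HA investigation loop (illustrative names only).
public class HaSketch {
    public enum State { UP, DOWN, UNKNOWN }

    public static State investigate(List<Function<String, State>> investigators, String vm) {
        for (Function<String, State> inv : investigators) {
            State s = inv.apply(vm);
            if (s != State.UNKNOWN) {
                return s; // first definite answer wins
            }
        }
        return State.UNKNOWN; // nobody knows: must fence before restarting
    }

    public static void main(String[] args) {
        List<Function<String, State>> investigators = List.of(
                vm -> State.UNKNOWN,  // e.g. a ping-based check times out
                vm -> State.DOWN);    // e.g. the hypervisor reports the VM gone
        State s = investigate(investigators, "vm-7");
        System.out.println(s == State.UP ? "work done" : "fence, then restart");
    }
}
```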
Scalability
Current Status
- 10k resources managed per management server node
- Scales out horizontally (must disable the stats collector)
- Real production deployments of tens of thousands of resources
- Internal testing with software simulators: up to 30k physical resources with 300k VMs managed by 4 management server nodes
- We believe we can at least double that scale per management server node

Request handling:
- Incoming requests that mostly require DB operations are short in duration and are executed directly by executor threads, because incoming requests are already load balanced by the load balancer.
- Incoming requests that need resources often run for a long time; they are checked against ACLs by the executor threads, then queued and picked up by job threads. The number of job threads is scaled to the number of DB connections available to the management server.
- Requests may take a long time depending on resource constraints, but they don't fail.
VM Sync
- Fires every minute
- Peer-to-peer model: the resource does a full sync on connection and delta syncs thereafter; the management server trusts the resource for correct information
- Only runs against resources connected to this management server node
Numbers
Assume 10k hosts and 500k VMs (50 VMs per host).

Stats collector:
- Fires off 10k requests every 5 minutes, or about 33 requests a second. Bad, but not too bad: it occupies 33 threads every second. But just wait:
  - 2 management servers: 66 requests
  - 3 management servers: 99 requests
- It gets worse as the number of management servers increases, because the collector did not auto-balance across management servers.
- And worse still: with the 10k hosts now spread across 3 management servers, the 99 generated requests involve three times as many threads, because each request must be routed to the right management server. It keeps the management servers 20% busy even with no load from incoming requests.

VM sync:
- Fires off 1 request at resource connection to sync about 50 VMs.
- After that, changes are pushed from the resource: it knows what it has already pushed and only pushes out-of-band changes. So essentially no threads are occupied for a much larger data set.
- Listeners let business logic watch connection status and adjust work based on who is connected. By working only on resources connected to the local management server, work is auto-balanced between management servers, and message routing between them is reduced.
Console Proxy VM
- Provides an AJAX-style, HTTP-only console viewer
- Grabs VNC output from the hypervisor
- Scales out (more are spawned) as load increases
- Java-based server that communicates with the MS
Secondary Storage VM
- Provides image (template) management services: download from an HTTP file share or Swift, copy between zones
- Scales out to handle multiple NFS mounts
- Java-based server that communicates with the MS
System VM Spec
- Debian 6.0 ("Squeeze"), 2.6.32 kernel, with the latest security patches from the Debian security APT repository
- No extraneous accounts
- 32-bit for enhanced performance on Xen/VMware
- Only essential software packages are installed; services such as printing, ftp, telnet, X, kudzu, dns, and sendmail are not installed
- sshd only listens on the private/link-local interface; SSH moved to a nonstandard port (3922); SSH logins only via keys, generated at install time and unique to every customer
- pvops kernel with Xen paravirt drivers + KVM virtio drivers + VMware tools for optimum performance on all hypervisors; Xen tools inclusion allows performance monitoring
- The template is built from scratch and is not polluted with any old logs or history
- Latest versions of haproxy, iptables, ipsec, and apache from the Debian repository for improved security and speed
- Latest version of the JRE from Sun/Oracle for improved security and speed
System VM (contd.)
- SSH keys and passwords are unique to the cloud installation
- Code can be patched by restarting the system VM: it mounts a special ISO with the latest code at boot, and if the ISO contents differ, it patches itself and reboots
Interactions
[Diagram: cloud users (UI, API clients such as Fog, EC2 API clients) call the CS admin & end-user API and the EC2 API on the management server, which is backed by MySQL. The MS speaks XAPI to XenServer clusters, talks to vCenter for vSphere, and exchanges JSON with agents on KVM and OVM clusters; it drives a NetScaler via the Nitro API and a Juniper SRX. System VMs (console proxy, router, secondary storage) are controlled over SSH/JSON; the console proxy VM serves proxied AJAX consoles over HTTPS, pulling VNC from the hypervisors. Each cluster has primary storage; secondary storage is reached over NFS; monitoring spans the deployment.]
CloudStack Roadmap
Release timeline (Feb 2012 → Feb 2013 and beyond): Acton, Bonita, Burbank, Campo, …
- Acton: Swift integration, XenServer 6 support, vSphere 5 support
- Later 2012 releases (Bonita, Burbank, Campo): AWS-style regions, IPv6, resource scaling, dedicated resource module, scalability (50K hosts), plugin architecture, hypervisor enhancements
- Future: Hyper-V (Windows 8)
- Also planned: Open vSwitch support, inter-VLAN routing, VMware distributed vSwitch support, multi-tier apps, site-to-site VPNs, AWS-style tags, VM tiers, NetScaler integration, Cisco Nexus 1000v support, refined resource upload, volume management, UI refinement, LDAP/AD authentication, clustered LVM support