Amazon Web Services – MongoDB on AWS

January 2012

MongoDB on AWS
Miles Ward January 2012

Page 1 of 20

Amazon Web Services – MongoDB on AWS

January 2012

Table of Contents
Table of Contents ........................................................................................................................................ 2 Introduction ................................................................................................................................................ 3 What is NoSQL? ........................................................................................................................................... 3 NoSQL on AWS ............................................................................................................................................ 3 Running NoSQL Systems on Amazon Elastic Compute Cloud ......................................................................... 4
Performance ..........................................................................................................................................................4 Durability and Availability ......................................................................................................................................5 Elasticity and Scalability .........................................................................................................................................5 Configuration .........................................................................................................................................................5 Pricing....................................................................................................................................................................6

MongoDB on Amazon EC2 ........................................................................................................................... 7
Overview ...............................................................................................................................................................7 Basic tips ................................................................................................................................................................7 Basic MongoDB Installation ....................................................................................................................................8 Architecture ...........................................................................................................................................................9 Building Blocks ................................................................................................................................................................ 9 Production Designs ....................................................................................................................................................... 10 Additional Scaling Considerations................................................................................................................................. 12 Operations ........................................................................................................................................................... 17 Backup ........................................................................................................................................................................... 17 Restore .......................................................................................................................................................................... 18 Monitoring .................................................................................................................................................................... 18 Security................................................................................................................................................................ 18 Network ........................................................................................................................................................................ 18 Authentication .............................................................................................................................................................. 19

Summary ................................................................................................................................................... 20 Further Reading/Next Steps....................................................................................................................... 20

Page 2 of 20

All services aim to maximize benefits of scale and to pass those benefits on to developers. and they implement different optimizations. from anywhere on the web. scalable. NoSQL software packages are widely deployed in the AWS cloud. and InfiniteGraph Document stores: like MongoDB. they expose different interfaces. NoSQL software is typically easy to use by developers. It gives any developer access to the same highly scalable. NoSQL on AWS Amazon Web Services (AWS) provides an excellent platform for running many advanced data systems in the cloud. reliable. at any time. Amazon Web Services has native NoSQL services that do not require direct administration and offer usage-based pricing. Developers simply store and query data items via web services requests. and Amazon SimpleDB does the rest. and flexible non-relational data store that offloads the work of database administration. CouchDB. and fault-tolerance. fast. Consider these options as possible alternatives to building your own system with an OSS or a commercial NoSQL application. Running your own NoSQL data store on Amazon Elastic Cloud Compute (Amazon EC2) is a great scenario for users whose application requires the unique benefits of structured storage software optimized for high-performance operations on large datasets. highavailability. Page 3 of 20 . some minor differences apply to all major NoSQL systems. and we examine important MongoDB implementation characteristics such as performance. and reflects optimization around narrower workload definitions. The applications are written in different languages. easy-to-use cloud computing platform. inexpensive infrastructure that Amazon uses to run its own global network of websites. Amazon SimpleDB is a highly available. at the expense of strict ACID (atomicity. physical infrastructure. durability. secure. In many ways AWS infrastructure is similar to local.Amazon Web Services – MongoDB on AWS January 2012 Introduction Amazon Web Services (AWS) is a flexible. isolation. as the name implies. There are three major categories of NoSQL applications today:    Key-value stores: like Cassandra. and durability) compliance and. We pay particular attention to identifying features that support scalability. Riak. What is NoSQL? NoSQL is a popular name for a subset of structured storage software that is designed with the intention of delivering increased optimization for high-performance operations on large datasets. This whitepaper will help you understand one of the most popular NoSQL options available with the AWS cloud computing platform—the open source application MongoDB. DEX. and Project Voldemort Graph databases: like Neo4j. We provide an overview of general best practices that apply to all major NoSQL options. and BaseX These NoSQL applications are either separate open source software (OSS) projects or commercial closed-source projects. tolerant of scale by way of horizontal distribution. cost-effective. and security. consistency. Amazon Simple Storage Service (Amazon S3) provides a simple web services interface that can be used to store and retrieve any amount of data. native querying in the SQL syntax. A general understanding of those differences can assist greatly in making good architecture decisions for your system. Some of the unique characteristics of the cloud provide strong benefits for NoSQL workloads.

Amazon Web Services – MongoDB on AWS January 2012 Running NoSQL Systems on Amazon Elastic Compute Cloud Amazon Elastic Compute Cloud (Amazon EC2) is a web service that provides virtualized servers (instances) on-demand. You can start new instances and add more capacity to your cluster. If your performance is disk I/O limited. Remember that Amazon EBS volumes. and synchronous and asynchronous replication strategies. Let’s dive a little deeper and learn about different aspects of running a NoSQL system on Amazon EC2. ephemeral disks are somewhat higher performance and don’t impact your network connectivity. and you can interact with them as you would any machine. We encourage users to benchmark their actual application on several Amazon EC2 instance types and storage configurations in order to select the most appropriate configuration. are connected via the network. and if aggregation is required. If CPU or memory limits your system performance you can scale up the memory. You have complete control of your instances. To scale up random I/O performance. the constraint is more abstract. In many cases. so if you anticipate needing the higher performance instance types. evaluate the network load to ensure that your instance size is sufficient to provide the network bandwidth required. the persistent block storage available to Amazon EC2 instances. You have root access to each one. users can use the same system performance tuning options in the Amazon EC2 environment that they use in a physical server environment. so be sure to choose the appropriate instance size. and the application workload. Performance The performance of a NoSQL system on Amazon Elastic Compute Cloud (Amazon EC2) depends on many factors. and network resources available to the software by choosing a larger Amazon EC2 instance type. A single EBS volume can provide approximately 100 IOPS. To increase the performance of your system you need to know which of the server’s resources is the performance constraint. Remember. Page 4 of 20 . remember that changing an existing instance to a different size requires a stop/start cycle. compute. you can increase the number of EBS volumes as a ratio of EBS storage (6 x 100 GB EBS volumes versus 1 x 600 GB EBS volume). In general. An increase in network performance can have a significant impact on aggregate performance. Remember. To take advantage of additional attached EBS disks. Users can also scale the total performance of a NoSQL system by scaling horizontally across multiple servers with sharding. and single instances with arrays of 10 or more attached EBS disks can often reach 1. Also. start with 64-bit instances. For sequential disk access. Some customers use ephemeral disks in conjunction with Amazon Elastic Block Store to conserve network bandwidth and EBS I/O for more write-heavy operations.000 IOPS sustained. utilizing RAID striping reduces operational durability of the logical volume by a degree inversely proportional to the number of EBS volumes in the stripe set. you might want to make changes to the configuration of your disk resources. caching. use software RAID (redundant array of independent disks) 0 (disk striping) across multiple EBS volumes to increase single logical volume total IOPS (input output operations per second). You can stop your instance while retaining the data on your boot partition and then subsequently restart the same instance using web service APIs. 32-bit Amazon Machine Images (or AMIs) can’t run on 64-bit instances. the configuration of the software. This makes it the perfect platform for your NoSQL systems. including the Amazon EC2 instance type. You can run a variety of NoSQL systems on the top of Amazon EC2. the number and configuration of Amazon Elastic Block Store (Amazon EBS) volumes.

or otherwise spreading the database query load across multiple Amazon EC2 instances. increasing the performance of a single system instance is the easiest way to increase the performance of their application overall. Page 5 of 20 .99% availability of objects over a given year Amazon EBS volumes that operate with 20 GB or less of modified data since their most recent Amazon EBS snapshot can expect an annual failure rate (AFR) of between 0. as the following table illustrates: AWS Service Amazon S3 backups Durability Benefit Designed to provide 99. you can bundle the instance into a custom AMI (using the bundle commands for instance store AMIs. you can use the AWS Management Console to stop an instance increase the size of an instance. Durability is provided by AWS at many different levels.999999999% durability and 99. Remember. Increasing performance across multiple instances is often referred to as scaling out. can be used to provide even higher performance. then create multiple new instances of your database configuration within a few moments. and restart the instance. you can launch an instance from an AMI that has the base operating system you want. there’s no DVD drive in the cloud. For many customers.5%. More details here. This is particularly useful if you have a set maintenance window and can tolerate system downtime.1% – 0. You can also use the ec2-modifyinstance-attribute command to make these changes. More advanced scaling. For example. Configuration To create a NoSQL structured data store on Amazon EC2. rapid access to replacement instances Can achieve even higher availability by using multiple Availability Zones in different regions around the world Amazon EBS volumes Amazon EC2 instances Amazon EC2 Regions and Availability Zones No approach to structured storage is complete without using the available options in each system for application-level redundancy. Many customers use the AWS Management Console to upload setup files to Amazon S3 from their workstations. once you have configured an Amazon EC2 instance with your structured storage application. Increasing performance of a single system instance is often referred to as scaling up. users of NoSQL solutions on Amazon EC2 can take advantage of the elasticity and scalability of the underlying AWS platform. then install the application software using the standard operating system software install processes. so users must download the required software. Can detach volumes and reattach to new instances of Amazon EC2 servers upon instance failure. sharding. Elasticity and Scalability In many cases. or the ec2-create-image command for EBS AMIs).Amazon Web Services – MongoDB on AWS January 2012 Durability and Availability The Amazon EC2 environment is designed to provide the tools you need to deliver highly durable services. In the Amazon EC2 environment. where failure refers to a complete loss of the volume. and Amazon’s NoSQL services are also designed for durability.

You’ll need to configure the instance’s security group to allow traffic on the port used by the NoSQL application. VNC. Storage: Amazon EBS volumes for your storage (both actual storage capacity and I/O to disk). you might have complex operating system and application configurations that should be preserved by rebundling your AMI. Pricing Eliminating the need to manage hardware and having the ability to quickly provision capacity to scale your NoSQL structured storage makes the AWS cloud the optimum solution for running NoSQL systems. go to Creating Amazon EBS-backed AMIs in the Amazon Elastic Cloud Compute User Guide. Feel free to leverage tools like AWS Simple Monthly Calculator to estimate the cost of your projected workloads. The AWS cloud not only gives you flexibility when designing a solution. The Amazon EBS volumes that serve as the root drives of Amazon EC2 instances are often durable. subsequent (or additional) launches of Amazon EC2 instances will include all of your configuration changes. However. 4. All ports are disabled by default. so make sure you open up the ports you need. either by exposing that interface over the web (or a VPN) to your location. Backups: Amazon S3 storage for snapshots of configured root (EBS) volumes. For directions. When you re-bundle your AMI. Medium or Heavy Utilization) or Spot models for the computational load of the system.Amazon Web Services – MongoDB on AWS January 2012 After your application is installed and configured on Amazon EC2. Depending upon the architecture of your NoSQL solution. Compute: Amazon EC2 instance hours purchased either in the on-demand. and they are designed to survive the physical failure of the Amazon EC2 instance host or an individual EBS host hardware failure. which we’ll cover in greater detail in coming sections of this paper. or remotely accessing the server via SSH. you interact with your service via its exposed interface. or RDP. Page 6 of 20 . You can work with an application on Amazon EC2 just as you would with an on-premises software application. but also makes it cost-efficient to operate it. 3. NX. 2. Data Transfer: Network transit (particularly intra-Availability Zone networking charges for the configurations that span across multiple availability zones (Replica-set). Reserved (Light. there are a few factors you should account for when estimating the cost of delivery: 1.

which isn’t useful to MongoDB. MongoDB focuses on flexibility.5 GB of data in a 32-bit system.html.mongodb. and ease of use.org/post/137788967/32-bit-limitations. auto-sharding.0 or greater. For more information see. go to http://www. Use EBS in RAID. Use 64-bit instances only. Raise file descriptor limits. For more information go to http://blog.org/display/DOCS/GridFS. sophisticated replication. Snapshot regularly for backups. MongoDB can only store about 2.Amazon Web Services – MongoDB on AWS January 2012 MongoDB on Amazon EC2 Overview MongoDB is a scalable. open source. Significant feature enhancements in the version 2.org/display/DOCS/Security+and+Authentication#SecurityandAuthenticationReplicaSetandShardingAuthentication. Be careful with large objects. and represent commands to the mongod process.org is a critical resource for the latest code enhancements. MongoDB provides JSON-style document-oriented storage with full index support. For more information. Don’t use large virtual memory (VM) pages. which may require syntax adjustment on Windows systems. Items that begin with > are run from the mongo shell. This reduces I/O overhead.   Page 7 of 20 .mongodb.mongodb. power.org/display/DOCS/Too+Many+Open+Files. Use XFS or Ext4. Turn off atime and diratime when you mount the data volume. documentation. use GridFS within Mongo. speed. document-oriented structured storage system.0 release include security in sharded environments and other critical elements. For more information see http://www.net/155/krishnakumar. Use RAID 0 or 10 for the data volume. www. and news on this open source software (OSS) project.mongodb. For more information go to http://www. RAID 1 for configdb. high-performance. MongoDB currently restricts documents to 16 MB. The default of 1024 on most systems won’t work for production-scale workloads. If you need larger. Basic tips          MongoDB is updated frequently.mongodb. These file systems support I/O suspend and write-cache flushing which is critical for multi-disk consistent snapshots. Turn on authentication. Use MongoDB 2. and compatibility with the Map/Reduce paradigm. Items that begin with $ are run at a standard Linux shell prompt. This whitepaper uses two different prompts. http://linuxgazette.

conf file to increase oplogSize. Start mongod.) 2. $ .org/linux/mongodb-linux-x86_64-2.nodiratime 0 0’ >> /etc/fstab 7.Amazon Web Services – MongoDB on AWS January 2012 Basic MongoDB Installation 1.repo 10.mongodb. After you’ve created the file. Optionally.) 14. (Our example uses the Amazon Linux 64-bit AMI. enable journaling. Make a directory to which the file system can be mounted. $ sudo echo ‘/dev/sdf /data/db auto noatime. Create an Amazon EBS volume to use for your MongoDB storage.mongodb.tgz > m. in order to do so create a file at /etc/yum.tgz $ tar xzf m. Launch an Amazon EC2 instance using the AMI of your choice. Mount the volume. Edit the /etc/mongodb. #$ chkconfig mongod on Page 8 of 20 . Use the AWS Management Console to allow ingress from your application servers via appropriate ports by creating rules in your Amazon EC2 security group. $ sudo mount -a /dev/sdf /data/db 8.tgz 9.d/10gen. Connect to the instance via SSH. Edit the file to contain: [10gen] name=10gen Repository baseurl=http://downloads-distro. Make a file system on your EBS volume.2. 4. $ sudo mkfs -t ext4 /dev/*the_connection_you_attached_the_volume_to_for_example_sdf* 5. oplogSize = 8000 journal = true dbpath = /data/db logpath = /data/db/logs 13. run the following: $ sudo yum install mongo-10gen 12. you can install MongoDB via a package manager. Download and install MongoDB. $ sudo mkdir -p /data/db/ $ sudo chown `id -u` /data/db 6. Alternatively. 3.org/repo/redhat/os/x86_64 gpgcheck=0 11. (The default port is 27017.repos./mongodb-xxxxxxx/bin/mongod 15.noexec.0. Edit your fstab to enumerate the volume on startup. and attach it to the instance. you can have mongod start automatically every time the server boots. and target our data and log directories. $ curl http://fastdl.

and have all the mongod file structure colocated with the rest of your application on the root volume of the instance. Mongos processes have no persistent state. which allows seamless access to your distributed dataset. Routing to the appropriate shard is provided by the mongos process. Mongos processes may be run on the shard servers themselves. Mongod will be running on each node of the cluster you deploy. which are often combined in the highest-performance systems: replica sets and sharded replica sets. MongoDB has two separate constructs for multi-node topologies. rather. Are you experimenting with the framework on your own for a private project? It’s likely that you’ll install mongod (the running MongoDB application) on the same machine as the rest of your software. and sharding is an automatic data distribution system. Building Blocks mongod Mongod is the primary mongo application. mongos will route read-traffic to the secondary nodes in each shard's replica set. Replica sets are an asynchronous cluster replication technology. If you set slaveOkay in your client layer (either in your driver or in your direct queries). Increasing the number of instances in a replica set provides horizontal scalability for read performance and fault-tolerance. This scale is fine for developer experimentation. they pull their state from the config server on startup. Page 9 of 20 . Any changes that occur on the config servers are propagated to each mongos process.scale applications more complex topologies provide higher performance and fault tolerance.Amazon Web Services – MongoDB on AWS January 2012 Architecture The design of your MongoDB installation on Amazon EC2 depends on the scale at which you want to operate. mongos Mongos is a routing and coordination process that makes the mongod nodes in the cluster look like a single system. Increasing the number of shards (each one being a replica set) allows you to distribute distinct data to separate mongod processes to provide horizontal scalability for write performance. responsible for the structured storage system. but they are lightweight enough to exist on each application server. Many mongos processes can be run simultaneously since these processes do not coordinate between one another. however for high.

or you can run a BIND or other DNS server on our network. given these building blocks. or mongos since there is no sharding structure for your applications to navigate. To change the IP address of config servers you may need to take the cluster down or change the location to which the host name points. Many customers scale out their MongoDB server infrastructure following these steps along the growth curve . config servers store the metadata of the cluster. This introduces a dependency on your DNS solution. there is no need for config servers. An easier approach is to have DNS records for each config server instance so that you can change the locations easily. The config server mongod process is fairly lightweight and can be run on t1.For your config server we recommend that you use DNS for high availability. however. Amazon Route 53 provides a highly available.Amazon Web Services – MongoDB on AWS January 2012 config server Config server is a mongod process used to synchronously replicate the state information of a sharded environment. Production Designs So. The config server is only required in sharded deployments. Although config server can run standalone.micro instances. If you are just running a replica set. Step 1: Separate the mongod server from the application tier. In this setup there’s no need for a config server. it does carry critical data that should be aggressively protected. any production deployment should have three config servers that have copies of the same metadata (for data safety and high availability (HA)). In a sharded environment. how would a system scale to accommodate growing load over time? The following steps and diagrams guide you through a sample production design process that scales as the load grows. high-performance DNS service inside of Amazon EC2 that’s very easy to set up. Page 10 of 20 . Each config server is a mongod process that stores metadata.

Note: Mongo’s replication is asynchronous. but you can use more if you need higher read performance. use mongos for distributing read load to slaves. Having an arbiter instance in place allows failover to occur when a majority of replicas (2/3 in this example) are lost. For high availability these replicas should be placed in different Availability Zones. Page 11 of 20 . Step 2a: If your desired regions only support two Availability Zones.Amazon Web Services – MongoDB on AWS January 2012 Step 2: For replica sets. consider replication to the next nearest region. Use an odd number of MongoDB instances. shifting to five replicas (two in one Availability Zone. in this example we use three. you could place an arbiter instance in the Availability Zone with the fewest instance members of the replica set. Step 2b: Alternatively. it’s critical to use getLastError in your queries with w : 2 or w : “majority”. See extra detail here. and three in the other).

In this example three config servers are running on separate t1. and network resources. Avoid incremental steps. Vertically scaled higher performance instances can provide many of the performance benefits of a more complex replicated and sharded topology. Additional Scaling Considerations Vertical scaling doesn’t offer the same benefits as horizontal scaling. sometimes significant resources.Amazon Web Services – MongoDB on AWS January 2012 Step 3: Create a sharded cluster of replica sets. Scaling step by step when you know you need a big system isn’t efficient in MongoDB. but remember that vertically scaled instances come with none of the significant fault-tolerance benefits. We recommend that you make changes earlier and scale out sooner. remember that you have a virtually unlimited pool of those resources if you can scale horizontally. while you still have capacity to spend on the transition. While it’s critical for most use cases to provide the mongod process with sufficient memory and disk IOPS.micro EC2 instances. Similarly. storage. Step 4: Scale to a multi-region system that uses a time-delayed replica for warm backup. There are significant time-tovalue benefits to launching the full set of shards you’re expecting to run. in order to expand its footprint. scaling under maximum load isn’t a good idea. it simply adds more work to the system at the least effective time. MongoDB needs compute. One potential work-around you can try if you’re caught in an overload situation is to build your Page 12 of 20 . or increasing the count of instances in the replica set all at once. if you can. Waiting until you’re at 99 percent to create another replica for read load or to start the sharding process won’t help.

be sure to mirror those volumes (using RAID 1) to enhance operational durability. a separate file (potentially a RAID 1 mirror) for your journal. and another separate file for your log. Amazon EBS has significant write cache. and provides enhanced durability in comparison to the ephemeral disks on the instance. Minimum scale: Use Amazon Elastic Block Store (Amazon EBS). Medium scale: Optimize for high durability by using RAID 10 on EBS volumes. Remember. on instances which expose more than a single volume. and then return to service. Page 13 of 20 . it is critical to design the storage subsystem that supports your MongoDB in a way that correctly leverages your available resources. We suggest using four volumes for your /data/db file. So. take a small maintenance window to sync the new more powerful fleet.Amazon Web Services – MongoDB on AWS January 2012 new fleet of instances from backups of production instances. If you’re going to use ephemeral disks. Designing Storage to Support MongoDB The pattern of scaling the instance footprint for a MongoDB cluster is designed to deliver additional process memory to your workload. You’ll want to design your infrastructure carefully to avoid storage bottlenecks. excellent random I/O performance. if the instance is lost or terminated. you’ll lose all of your data. even though they’re mirrored.

xlarge.Amazon Web Services – MongoDB on AWS January 2012 Large scale: Move up to higher bandwidth instance types (m1. c1. with time-delayed replication for easier durable backups to Amazon Simple Storage Service (Amazon S3) via snapshot. Extra-Large scale: Here’s where the pattern changes. we recommend adding another replica using a single EBS volume. In this situation. m2. This is especially true if your MongoDB workload is largely very small average writes (4-8 KB). At large scale you should have many replicas and as a result. We recommend that you increase storage subsystem performance at the expense of durability by removing the network/performance overhead of the RAID 10 and switch to using only a RAID 0 stripe set. Page 14 of 20 .4xlarge). the burden of data durability moves to the application tier.xlarge. and increase the size of the RAID 10 to be an 8-volume set. due to the complexity of taking snapshots of large multi-volume stripe sets.

look at increasing the number of replicas within your shards. installing and starting the mongod processes. If the limit is read performance. Page 15 of 20 .Amazon Web Services – MongoDB on AWS January 2012 Web Scale: Shard! If the limit is aggregate write performance. which have more bandwidth for communications with the EBS system. so make sure you have sufficient shard breadth to keep the working set in memory. If you’re located in our US East region. CC1 and CC2 nodes would make excellent primary nodes. increase the number of shards in your cluster. we offer Cluster Compute nodes. and then formally initiating the set. particularly when paired with a large number of EBS volumes (more than 16). Read performance at these scales can be limited if the working dataset isn’t in memory. Creating a Replica Set Setting up asynchronous replication in MongoDB is a three-step process: provisioning the requisite instances and storage.

mongodb. each with enough disks for the medium scale recommendation discussed earlier.mysite. host: 'replicaset3. you could host the MongoDB instances inside the Amazon Virtual Private Cloud (Amazon VPC). For more information go to Creating. Alternatively.mongohosts.2xlarge -z us-east-1b $ ec2-run-instances ami-1b814f72 -b /dev/sdf=:80 -b /dev/sdg=:80 -b /dev/sdh=:80 -b /dev/sdi=:80 -g smongo -m -t m2. When you run the above set. Page 16 of 20 . This way.mongohosts. if one of the nodes failed in the primary region with two nodes.mysite. Start three identical EC2 instances. $ ec2-run-instances ami-1b814f72 -b /dev/sdf=:80 -b /dev/sdg=:80 -b /dev/sdh=:80 -b /dev/sdi=:80 -g smongo -m -t m2. {_id: 1. Start the mongo shell (on any one of the replica hosts). and Deleting Resource Record Sets in the Amazon Route 53 Developer Guide. This is useful for electing a time-delayed node for running backups.mongohosts.2xlarge -z us-east-1a $ ec2-run-instances ami-1b814f72 -b /dev/sdf=:80 -b /dev/sdg=:80 -b /dev/sdh=:80 -b /dev/sdi=:80 -g smongo -m -t m2. to support high availability of the architecture and read performance scalability.org/display/DOCS/Data+Center+Awareness Creating a Sharded and Replicated MongoDB Cluster Sharded systems are typically built on top of replicated systems.org/display/DOCS/Replica+Set+Tutorial and http://www.mongodb. {_id: 2. and initiate the replication.2xlarge -z us-east-1c 2. host: 'replicaset2. members: [ {_id: 0.com:27017'}.Amazon Web Services – MongoDB on AWS To create a replica set January 2012 1.initiate(config).com:27017'}]} $ > rs. You can use DNS to swap out individual replica set hosts. You can also pass a priority:value flag in for each node. the node should output an error message saying that the replica set isn’t up and running yet. Changing. Follow the basic installation instructions discussed earlier. This flag controls the priority by which nodes will be elected to the master in a node failure condition. where you have control over static instance IP addresses. host: 'replicaset1. 4.com:27017'}. create the configuration for the replica set. or for spanning regions.mysite. start mongod: $ mongod --replSet thenameofyourreplicaset --port 27017 --dbpath /db/data 3. More details are available from MongoDB at: http://www. that the other node in that region (not the distant disaster recovery (DR) node) would be elected the primary node. so create three DNS entries in Amazon Route53 (or your DNS system of choice). $ Mongo $ > config = {_id: 'foo'. with the following alteration to the step 14.

5. The key is the object in the collection that will be indexed and used to distribute chunks of items across your cluster. using the --shardsvr flag to enable sharding.$cmd.findOne(). $ ec2-create-snapshot -d thedescriptionofyourbackup vol-11111111 3. > db.micro). Create a snapshot of each EBS volume you want to use to hold data. Shard the relevant collections. journals.runCommand( { enablesharding : "<dbname>" } ). 2. Unlock the database and resume processing. using the --configsvr flag to enable the config server process. To perform a backup 1. and an _id name to distinguish each replica set. After the ec2-create-snapshot command returns the “Pending” state for the snapshot. > db. Enable sharding on the database.sys.unlock. For example. for a total of nine EC2 instances. This changes the default port to 27018. The actual mechanics of backup are greatly simplified by the Amazon S3 snapshot capability native to Amazon EBS. You can find these in the AWS Management Console or using the ec2-describe-instances command. or they run on standalone instances (t1. { "info" : "now locked". Once a collection is sharded it cannot be un-sharded without a backup/restore cycle. This enables the shardcollection command below. "ok" : 1 } 2. passing the DNS names and ports for all three of the config server instances.0.runCommand( { shardcollection : "<namespace>".lock:1}). 3. Operations Backup AWS strongly recommends use of Mongo version 2.Amazon Web Services – MongoDB on AWS To create a sharded MongoDB cluster January 2012 1. you could have three shards with three replicas (a primary and two slaves) each. Confirm that you’ve started config servers.  db.1 or later and the --journal option for MongoDB on Amazon EC2. This will default the port to 27019.  db. it is safe to resume operations on mongod. 4. Install mongos routers on each application host.runCommand({fsync:1. We also recommend placing the journal on Amazon EBS volumes distinct from the default dbpath. These config servers can be distributed among the mongod servers at random. starting them with the --configdb flag. { "ok" : 1.key : <shardkeypatternobject> }). or oplogs for mongod. Flush and lock the mongod by using the fsync and the lock commands. "info" : "unlock requested" } Page 17 of 20 . Start additional hosts for mongod. using the volume numbers on your specific instance.

$ sudo mount /dev/sdp /data/db 5. Security Network To access MongoDB resources from other Amazon EC2 instances or from remote servers. 10gen. if you’re restoring a RAID set to replace the volumes in the same order for easiest re-creation of the RAID volume in the OS. Attach the volumes to the instance.validate({full:true}) Monitoring AWS provides robust monitoring of Amazon EC2 instances. More details on this service are available at http://www. these ports should be Page 18 of 20 . Restore To restore from a backup 1. and other services via the Amazon CloudWatch service. For example. for example current free memory on your instances. you can set an alarm to warn of excessive storage throughput. If you are operating near maximum capacity. $ ec2-create-volume --availability-zone us-east-1a --snapshot snap-12345678 2. via SMS or email. Like any network service. the company behind MongoDB. Mount the volume.10gen. when user-defined thresholds are reached on individual Amazon Web Services.Amazon Web Services – MongoDB on AWS January 2012 Note: For the duration of a snapshot (until it’s listed as “Completed”) there will be a variable negative impact to Amazon Elastic Block Store disk performance. Remember. Another approach would be to write a custom metric to Amazon CloudWatch. You can find the snapshot IDs and create the volumes by using the AWS Management Console or by using the ec2-describe-snapshots command. $ ec2-attach-volume --instance i-aaaaaaa --device /dev/sdp vol-01010101 3./mongodb-xxxxxxx/bin/mongod > db. Amazon EBS volumes. Start mongo and verify the restoration of your mongo collections. This could be more than one volume. a free.com/mongodb-monitoring-service. $ sudo mkdir -p /data/db/ $ sudo chown `id -u` /data/db 4. cloud-based product for monitoring and managing MongoDB installations. Create an Amazon EBS volume from each snapshot used to back up your data. you need to configure your security group to permit ingress over the appropriate TCP ports. you should take your system snapshots on a time-delayed replica. Make a directory into which you can mount the file system. has also released MongoDB Monitoring Service (MMS). Amazon CloudWatch can send an alarm. go to Send Email Based on Storage Throughput Alarm in the Amazon CloudWatch Developer Guide.thenameofyourcollection. $ . To set this alarm. and to alarm or trigger automatic responses off of those measures.

3. However. To disable the "home" stats page use --nohttpinterface. open them only to your corporate office or to other authenticated machines that require access. Default TCP port numbers for MongoDB processes are as follows: A standalone mongod process: 27017 mongos : 27017 shard server (mongod --shardsvr) : 27018 config server (mongod --configsvr) : 27019 web statistics page for mongod : add 1000 to port number (28017. Security group parameters can be changed at any time. > use admin > db. for example. 6+ characters.addUser("theadmin". and all mongos processes) with the --keyFile /path/to/file option. and control access only from known administrative IP addresses. no greater than 1 KB) that can be copied to each server in the set. "anadminpassword") To authenticate a replicated or sharded mongod installation 1. another one. 2. Most statistics pages in the HTTP UI are unavailable unless the --rest option is specified. to a specific IP address.. Page 19 of 20 . The best practice in a multi-tier architecture is to permit operational access to the mongod tier only from servers in the security groups that require access. or a CIDR IP address range. the information on the stats home page is read-only regardless. Modify this file's permissions to be readable only by the appropriate user.auth option.) Ports can be opened to all computers in the same Amazon EC2 security group. or add the first user from the localhost interface. by default). For extra security. You need to add a user to the “admin” collection before starting the server. Create a key file (Base64. you should secure it. run the mongod process with the . (If you use this port. Start each server in the cluster (including all replica set members. Authentication To enable authentication. some users use automation scripts to open these ports when needed and close them when MongoDB resources are not in use.Amazon Web Services – MongoDB on AWS January 2012 opened conservatively. all config servers.

com/blog/2011/mongodb-best-practices/ Getting Started with MongoDB and AWS http://www.mongodb.10gen.com/ MongoDB Best Practices by EngineYard http://www. For more information on Amazon AWS.engineyard. architecture designs. high availability and disaster recovery approaches. Further Reading/Next Steps     MongoDB on AWS Presentation http://www. This paper provides specific information on MongoDB.amazon. see http://aws.com. With capacities that can meet dynamic needs. In addition.com/presentations/mongosf-2011/running-mongodb-cloud MongoHQ – Hosted MongoDB Platform running on AWS http://www. cost that is based on use.org/display/DOCS/Amazon+EC2 Page 20 of 20 .Amazon Web Services – MongoDB on AWS January 2012 Summary The AWS cloud provides a unique platform for any NoSQL application. the AWS cloud enables you to run a variety of NoSQL applications without having to manage the hardware yourself.mongohq. and easy integration with other AWS products like Amazon CloudWatch. management features inherent in NoSQL systems can be leveraged in the cloud. including MongoDB. and security details. including hardware configurations.

Sign up to vote on this title
UsefulNot useful