Professional Documents
Culture Documents
Learning Outcomes:
After completion of this experiment, student should be able to
1. Understand private cloud deployment
2. Understand Hadoop distribution and need for it
3. Install and configure Cloudera private cloud
Theory:
Apache Hadoop is a collection of open-source software utilities that facilitates using a network
of many computers to solve problems involving massive amounts of data and computation. It
provides a software framework for distributed storage and processing of big data using the
MapReduce programming model. Cloudera, Inc. is a US-based software company that provides
a software platform for data engineering, data warehousing, machine learning and analytics that
runs in the cloud or on premises. Cloudera started as a hybrid open-source Apache Hadoop
distribution, CDH (Cloudera Distribution Including Apache Hadoop),that targeted enterprise-
class deployments of that technology.
Hadoop is a software framework for distributed processing of large datasets across large
clusters of computers
➢ Large datasets Terabytes or petabytes of data
➢ Large clusters hundreds or thousands of nodes
Hadoop is open-source implementation for Google MapReduce. Hadoop is based on a
simple programming model called MapReduce. Hadoop is based on a simple data model,
any data will fit.
1
Cloud Computing 2021-22
Map Reduce
2
Cloud Computing 2021-22
Mappers:
Reducers:
➢ Consume <key, <list of values>>
➢ Produce <key, value>
Shuffling and Sorting:
➢ Hidden phase between mappers and reducers
➢ Groups all similar keys from all mappers, sorts and passes them to a certain
reducer in the form of <key, <list of values>>
3
Cloud Computing 2021-22
Highlights of Cloudera
1. Control costs and manage resources
– Auto-scale
– Auto-suspend
2. Easy provisioning and support for multiple types of workloads
3. Consistent security and data governance across applications and datasets
4. Enables enterprise IT staff to quickly respond to business demands
For Cloudera
3. Start VirtualBox.
4
Cloud Computing 2021-22
5. Click the Folder icon.
8. Click Import.
5
Cloud Computing 2021-22
9. The virtual machine image will be imported. This can take several minutes.
10. Launch Cloudera VM. When the importing is finished, the quickstart-vm-5.4.2-0 VM will
appear on the left in the VirtualBox window. Select it and click the Start button to launch the
VM.
11. Cloudera VM booting. It will take several minutes for the Virtual Machine to start. The
booting process takes a long time since many Hadoop tools are started.
6
Cloud Computing 2021-22
12. The Cloudera VM desktop. Once the booting process is complete, the desktop will
appear with a browser.
Workaround Procedure:
7
Cloud Computing 2021-22
Download Location: https://downloads.cloudera.com/demo_vm/docker/cloudera-
quickstart-vm-5.12.0-0-beta-docker.tar.gz
Screenshots: -
8
Cloud Computing 2021-22
9
Cloud Computing 2021-22
10
Cloud Computing 2021-22
11
Cloud Computing 2021-22
12
Cloud Computing 2021-22
13
Cloud Computing 2021-22
Questions:
1. How is private cloud different from public cloud in Cloudera?
Answer)
14
Cloud Computing 2021-22
15
Cloud Computing 2021-22
Answer)
16
Cloud Computing 2021-22
17
Cloud Computing 2021-22
3. Why managing resources at platform level is better than at application level? How is
it done in Cloudera?
Answer)
18
Cloud Computing 2021-22
19
Cloud Computing 2021-22
20
Cloud Computing 2021-22
21
Cloud Computing 2021-22
Answer)
Conclusion: - From this experiment, I am able to understand a learn private cloud deployment,
understand the concept of Hadoop Distribution and its significance, and install and configure
Cloudera private cloud.
22