
Big Data Ecosystem

Project Prepared By:

GROUP 8

Monika Kashyap (19PGDM085)
Sangita Adak (19PGDM109)
Saurabh Barman (19PGDM110)
Harsimran Singh Barmi (19PGDM069)
Chandradev Manda (19PGDM066)
Neha Kumari (19PGDM088)

Subject: Big Data Ecosystem


Term: 4
Submitted to: Mr. Abhay Sharma
Acknowledgement

We take this opportunity to thank Prof. Abhay Sharma for continuously enlightening us
with his experience and knowledge. He has created in us the curiosity to explore
and learn at every stage.
His way of teaching and of supporting us to learn and grow is wonderful.
We feel obliged to him for his assistance at every step and for smoothing the process of
this assignment.
This project was prepared with the help of our team members, and special thanks go to
each one of them for helping to complete it successfully.
Executive Summary

The title of the project


Big Data and Cloud Computing: Combination

The objective of the project was to identify and analyze the power of Big Data combined
with Cloud Computing.
This was done through thorough research and analysis of:
• The basics of Big Data, such as its characteristics and types;
• The definition of Cloud Computing;
• Applications of Cloud Computing;
• Models of Cloud Computing;
• The difference between Big Data and Cloud Computing;
• The relationship between Big Data and Cloud Computing;
• The rules associated with service level agreements between cloud
service providers and their customers;
• The dimensions on which these service level agreements are made;
• How the relationship between Big Data and Cloud Computing can be categorized
based on service types;
• The sectors in which Big Data is used;
• The opportunities of Big Data;
• The opportunities of Cloud Computing;
• Detailed applications of Cloud Computing;
• The challenges of harnessing the combined power of Big Data and
Cloud Computing.
Through the above-mentioned research and analysis, we arrived at the
following key takeaways:
1) Cloud Computing and Big Data have huge opportunities in the upcoming years;
2) Cloud Computing refers to the use of remote servers to store, process and
manage data;
3) Vendavo, Analytics Pros, MongoDB, Collibra and MapR Technologies are some of
the organizations that are effectively harnessing the combined power of Big Data
and Cloud Computing;
4) Enhanced information management, greater responsiveness, enhanced product
and market strategy, improved demand management and production planning,
positive financial implications and improved risk management are some of the
opportunities of Big Data that is sustainable, flexible, cost-effective and secure;
5) Big Data and Cloud Computing are closely related;
6) Big Data combined with the power of Cloud Computing has vast applications in
sectors such as Banking, Insurance, Healthcare and Manufacturing;
7) Before Big Data combined with Cloud Computing is adopted, data workshops and
seminars could be held to ensure its acceptance at all levels;
8) Data security should be prioritized over data storage and analysis.

➢ Big data (govt., market, medical)


➢ Characteristics of big data
➢ Cloud computing:
Use of remote servers to:
• Store data
• Manage data
• Process data

1) Services
• On Premises
• Infrastructure as a Service (IaaS)
• Platform as a Service (PaaS)
• Software as a Service (SaaS)
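
The service models listed above differ mainly in how much of the stack the provider runs for the customer. The sketch below encodes the conventional, simplified split of responsibilities; the layer names and the `responsibilities` helper are illustrative assumptions, not taken from any particular vendor's documentation.

```python
# Layers of a typical stack, ordered from the hardware up (a common simplification).
LAYERS = [
    "networking", "storage", "servers", "virtualization",
    "operating system", "middleware", "runtime", "data", "applications",
]

# How many layers, counted from the bottom, the provider manages in each model.
PROVIDER_MANAGED = {
    "On Premises": 0,  # the customer runs everything
    "IaaS": 4,         # provider runs up to and including virtualization
    "PaaS": 7,         # provider runs up to and including the runtime
    "SaaS": 9,         # provider runs the whole stack
}

def responsibilities(model):
    """Split the stack into provider-managed and customer-managed layers."""
    n = PROVIDER_MANAGED[model]
    return {"provider": LAYERS[:n], "customer": LAYERS[n:]}

for model in PROVIDER_MANAGED:
    split = responsibilities(model)
    print(f"{model}: customer manages {split['customer'] or 'nothing'}")
```

Real offerings blur these boundaries, so the split is best read as a starting point for comparing models rather than a contract.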

2) Deployment
• Private Cloud
• Public Cloud
• Hybrid Cloud
• Multi-Cloud
• SaaS Model: cloud computing providers often use SaaS to
allow customers to process data;
• Simplified Infrastructure: provides flexible infrastructure to
generate and integrate Big Data;
• Improved Analysis: helps to improve big data analysis;
• Private Cloud Solution: ensures data security and privacy.


➢ Rules associated with service level agreements protect:
• Data
• Capacity
• Scalability
• Security
• Privacy
• Availability Of Data Storage And Data Growth
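
Availability is one of the dimensions above that is usually stated as a percentage. As a rough illustration, the sketch below converts an availability target into the downtime it permits over a billing period; the targets and the 30-day period are assumed example values, not figures from any specific agreement.

```python
def allowed_downtime_minutes(availability_percent, period_days=30):
    """Minutes of downtime permitted per period at a given availability target."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - availability_percent / 100)

# Example targets (assumed for illustration only).
for target in (99.0, 99.9, 99.99):
    minutes = allowed_downtime_minutes(target)
    print(f"{target}% availability allows {minutes:.1f} minutes of downtime per 30 days")
```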

➢ Big data and Cloud computing relationship can be
categorized based on service types as below:
• IaaS in Public Cloud
• PaaS in Private Cloud
• SaaS in Hybrid Cloud

Banking and Securities

Communications, Media and Entertainment

Healthcare Providers

Education

Manufacturing and Natural Resources

Government

Insurance

Platform as a Service (PaaS) and Infrastructure as a Service (IaaS)

Hybrid Cloud Approach

Testing and development

Big data analysis

Storage

Recovery

Backup
Analytics Pros

MongoDB Cloud

Vendavo

Collibra

MapR Technologies

➢ Enhanced information management


➢ Greater responsiveness
➢ Enhanced product and market strategy
➢ Improved demand management and
production planning
➢ Positive financial implications
➢ Improved risk management
➢ Cost savings
➢ Security
➢ Flexibility
➢ Mobility
➢ Sustainability

➢ Insufficient understanding and acceptance of big data:
Companies fail to know even the basics: what big data actually is, what its benefits
are, what infrastructure is needed, etc. Without a clear understanding, a big data
adoption project risks being doomed to failure.
SOL: Big Data workshops and seminars must be held at companies for everyone. Big data,
being a huge change for a company, should be accepted by top management first and then
down the ladder. To ensure big data understanding and acceptance at all levels, IT
departments need to organize numerous trainings and workshops.

➢ Confusing variety of big data technologies:
It can be easy to get lost in the variety of big data technologies now available on the
market. Do you need Spark or would the speeds of Hadoop MapReduce be enough? Is it
better to store data in Cassandra or HBase? Finding the answers can be tricky.
SOL: If you are new to the world of big data, trying to seek professional help would be
the right way to go. You could hire an expert or turn to a vendor for big data
consulting. In both cases, with joint efforts, you'll be able to work out a strategy
and, based on that, choose the needed technology stack.

➢ Paying loads of money:
Big data adoption projects entail lots of expenses. If you opt for an on-premises
solution, you'll have to mind the costs of new hardware, new hires (administrators and
developers), electricity and so on. Plus: although the needed frameworks are
open-source, you'll still need to pay for the development, setup, configuration and
maintenance of new software.
If you decide on a cloud-based big data solution, you'll still need to hire staff (as
above) and pay for cloud services, big data solution development as well as setup and
maintenance of needed frameworks.
SOL: There are also hybrid solutions when parts of data are stored and processed in
cloud and parts on-premises, which can also be cost-effective. And resorting to data
lakes or algorithm optimizations (if done properly) can also save money:
1. Data lakes can provide cheap storage opportunities for the data you don't need to
analyze at the moment.
2. Optimized algorithms, in their turn, can reduce computing power consumption by 5 to
100 times. Or even more.

➢ Complexity of managing data quality:
Data from diverse sources:
Sooner or later, you'll run into the problem of data integration, since the data you
need to analyze comes from diverse sources in a variety of different formats.
Unreliable data:
Nobody is hiding the fact that big data isn't 100% accurate. And all in all, it's not
that critical. But it doesn't mean that you shouldn't at all control how reliable your
data is.
SOL: There is a whole bunch of techniques dedicated to cleansing data. But first things
first. Your big data needs to have a proper model. Only after creating that, you can go
ahead and do other things, like:
• Compare data to the single point of truth (for instance, compare variants of
addresses to their spellings in the postal system database).
• Match records and merge them, if they relate to the same entity.
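
The two cleansing steps named for data quality (comparing data to a single point of truth, then matching and merging records for the same entity) can be sketched with the Python standard library. This is only a minimal illustration: `REFERENCE_ADDRESSES` is a hypothetical stand-in for a postal system database, the sample records are invented, and the similarity cutoff of 0.8 is an assumed value that would need tuning on real data.

```python
import difflib

# Hypothetical reference list standing in for a postal system database.
REFERENCE_ADDRESSES = ["12 Park Street, Kolkata", "45 MG Road, Bengaluru"]

def normalize(text):
    """Lowercase, replace punctuation with spaces and collapse whitespace."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in text.lower())
    return " ".join(cleaned.split())

def canonical_address(raw, cutoff=0.8):
    """Return the closest reference spelling, or None if nothing is similar enough."""
    lookup = {normalize(ref): ref for ref in REFERENCE_ADDRESSES}
    matches = difflib.get_close_matches(normalize(raw), list(lookup), n=1, cutoff=cutoff)
    return lookup[matches[0]] if matches else None

def merge_records(records):
    """Merge records whose addresses resolve to the same reference entry."""
    merged = {}
    for rec in records:
        key = canonical_address(rec["address"]) or rec["address"]
        entry = merged.setdefault(key, {"address": key, "names": set(), "orders": 0})
        entry["names"].add(rec["name"])
        entry["orders"] += rec["orders"]
    return list(merged.values())

# Invented sample data: the first two rows are variants of the same address.
records = [
    {"name": "A. Sen", "address": "12 Park St., Kolkata", "orders": 2},
    {"name": "Anita Sen", "address": "12, park street kolkata", "orders": 1},
    {"name": "R. Rao", "address": "45 M.G. Road, Bengaluru", "orders": 4},
]
for row in merge_records(records):
    print(row)
```

Fuzzy string similarity is just one possible matching rule; at big data scale, dedicated data quality or record-linkage tooling would normally take its place.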

➢ Dangerous big data security holes:
Securing these huge sets of data is one of the daunting challenges of Big Data. Often
companies are so busy in understanding, storing and analyzing their data sets that they
push data security for later stages. But, this is not a smart move as unprotected data
repositories can become breeding grounds for malicious hackers.
SOL: The precaution against your possible big data security challenges is putting
security first. It is particularly important at the stage of designing your solution's
architecture. Because if you don't get along with big data security from the very
start, it'll bite you when you least expect it.

➢ Tricky process of converting big data into valuable insights:
Super-cool big data analytics looks at what item pairs people buy (say, a needle and
thread) solely based on your historical data about customer behavior. Meanwhile, on
Instagram, a certain soccer player posts his new look, and the two characteristic
things he's wearing are white Nike sneakers and a beige cap. He looks good in them, and
people who see that want to look this way too. Thus, they rush to buy a similar pair of
sneakers and a similar cap. But in your store, you have only the sneakers. As a result,
you lose revenue and maybe some loyal customers.
SOL: Companies have to solve their data integration problems by purchasing the right
tools. Some of the best data integration tools are mentioned below:
• Talend Data Integration
• Centerprise Data Integrator
• ArcESB
• IBM InfoSphere
• Xplenty
• Microsoft SQL
• QlikView
• Oracle Data Service Integrator
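
The item-pair analysis mentioned in the insights challenge (which items people buy together, such as a needle and thread) can be sketched as a simple co-occurrence count. The `baskets` data below is invented for illustration, and counting pairs is only the most basic form of this analysis.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase history: each inner list is one customer's basket.
baskets = [
    ["needle", "thread", "buttons"],
    ["needle", "thread"],
    ["sneakers", "cap"],
    ["thread", "needle", "scissors"],
]

def pair_counts(baskets):
    """Count how often each unordered pair of items appears in the same basket."""
    counts = Counter()
    for basket in baskets:
        # Sort and de-duplicate so (needle, thread) and (thread, needle) are one pair.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return counts

counts = pair_counts(baskets)
print(counts.most_common(3))
```

Because the counts come only from past purchases, a pairing that emerges outside the store's own history, like the sneakers and cap in the example, only shows up once such external data is integrated, which is the limitation the challenge describes.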

➢ Big Data and Cloud Computing are closely related;
➢ Big Data combined with the power of Cloud Computing has vast
applications in sectors such as Banking, Insurance, Healthcare and
Manufacturing;
➢ Before Big Data combined with Cloud Computing is adopted, data
workshops and seminars could be held to ensure its acceptance at all
levels;
➢ Data security should be prioritized over data storage and analysis.
