Cloud Computing & Big Data
Abstract
The PURPOSE of this paper is to give more insight into cloud computing and big data. It
discusses cloud computing services, types, and providers, and explains big data and its
usage.
The METHODOLOGY used in this research is library and internet research. It is conducted by
looking for references in textbooks, journals, articles, and various sources on the internet.
First, we determine keywords related to our research topic. These keywords help us find the
textbooks or scientific journals we need more easily. Second, we select the information
based on our research objectives. The information is also analysed, since it comes
from various sources.
THE EXPECTED OUTCOME is to improve readers' understanding of Big Data and
Cloud Computing, their implementation in enterprises, and how they can be used to improve a
company's operations.
The CONCLUSION of this paper is that in cloud computing, the word cloud (also phrased as
"the cloud") is used as a metaphor for "the Internet," so the phrase cloud computing means "a
type of Internet-based computing," where different services such as servers, storage and
applications are delivered to an organization's computers and devices through the Internet.
Cloud computing offers several services, such as SaaS, PaaS, and IaaS. Indonesia also
has providers of cloud services, such as Biznet, Lintas Media Danawa, and Telkomsigma.
Cloud computing is still categorized as new, and not many companies use it
because they are still concerned about internet speed, vendor dependency, bandwidth, and
security and privacy. Big data has three characteristics (volume, velocity, and variety)
that must be considered in managing it.
Keywords: Cloud Computing, Big Data, Structured Data, Unstructured Data, Repository,
Provider
Chapter 1
INTRODUCTION
1. Background
In the last three years, cloud computing has become a phenomenon in the world of
IT business. Survey results from industry analysts such as IDC, Gartner, and
Forrester Research consistently rank cloud computing among the most important topics
discussed by IT managers in companies around the world. Although it may still be a
controversial subject today, it is likely to be seen as a revolutionary
technology in retrospect. There are several reasons why cloud computing appeals
to people even now. The main reason cloud computing is becoming popular with
businesses is that it helps them cut costs. Operational expenses are reduced
significantly with cloud computing, because you pay only for what you use. This
keeps your expenses in check and converts capital expenditure (capex) into operating
expenditure (opex). In business, time translates to money. Because cloud computing
becomes operational faster than other systems, businesses save time during set-up. It
also supports fast recovery, so businesses don't lose time unnecessarily. In fact, the
cost of setting up a cloud system is not high. You don't need additional hardware or
software for the installation, and implementation can be done remotely.
Cloud computing also offers a high level of automation, making life easier for
organizations. You don't need to set up a team to handle system updates and back-ups,
which releases internal resources for other high-priority work. The
cloud also allows you to work from anywhere in the world: your employees can
access work-related information from anywhere. Cloud computing holds promise for
the future. It may take a little more time to make it more secure and sturdy, but we
believe the technology can already deliver many benefits for businesses today, and you
will begin to see how beneficial it is to use the cloud.
Besides cloud computing, there is another technology that we will discuss,
named Big Data. Big Data applies to information that can't be processed or
analyzed using traditional processes or tools. Increasingly, organizations today are
facing more and more Big Data challenges. They have access to a wealth of
information, but they don't know how to get value out of it because it is sitting in its
most raw form or in a semistructured or unstructured format.
2. Scope
In this paper we limit the scope of our topic so that it won't be too general. The
scope of the analysis and discussion is:
3. Purpose and Benefit
Purpose:
The purpose of this paper is to give more insight into cloud computing and big
data. It discusses cloud computing services, types, and providers, and
explains big data and its usage.
Benefit:
The benefit to be attained is that writers and readers gain an overview and
knowledge of cloud computing and big data, and a better understanding of
their usage and importance today.
4. Methodology
Data is collected mainly by the literature study method. A literature study is done
by collecting data and information available from many sources, such as books,
the internet, television, and other media that provide information relevant to the
object of research. The materials found are used as a theoretical basis for the
discussion that follows.
5. Writing Systematic
Chapter 1:
Introduction
This chapter explains the background of the research, scope, the
purpose and benefits, research methodology, and the writing
systematic.
Chapter 2:
Literature Review
This chapter explains the theories used in the research, which serve as
the framework for writing and arranging this research.
Chapter 3:
Discussion
This chapter discusses cloud storage services, PaaS and IaaS providers,
cloud providers in Indonesia and their pricing, and the characteristics
and usage of Big Data.
Chapter 4:
Conclusion and Suggestion
Chapter 2
LITERATURE REVIEW
2.1 Cloud Computing
In cloud computing, the word cloud (also phrased as "the cloud") is used as a
metaphor for "the Internet," so the phrase cloud computing means "a type of
Internet-based computing," where different services such as servers, storage and
applications are delivered to an organization's computers and devices through the
Internet. (Webopedia)
2.2 Big Data
Big Data is a bit of a misnomer since it implies that pre-existing data is somehow
small (it isn't) or that the only challenge is its sheer size (size is one of them, but there
are often more). In short, the term Big Data applies to information that can't be
processed or analyzed using traditional processes or tools. (Zikopoulos, 2012)
2.3
2.4
2.5
2.6 Repository
A repository is a database for software and components, with an emphasis on revision
control and configuration management (where they keep the good stuff, in other
words). (Hurwitz, 2013)
2.7 Structured Data
Structured data is data that has a defined length and format. (Hurwitz, 2013)
2.8 Unstructured Data
Unstructured data is data that does not follow a specified format. (Hurwitz, 2013)
2.9
2.6 System
According to Satzinger, Jackson, & Burd (2005, p. 4), a system is a collection of
interrelated components that function together to achieve some outcome.
According to Wikipedia (System - Wikipedia, the free encyclopedia, 2014), a
system is a set of interacting or interdependent components forming an integrated
whole, or a set of elements and relationships which are different from relationships of
the set or its elements to other elements or sets.
accepting inputs and producing outputs in organized transformation process. (O'Brien, 2004)
2.7 Internet
The Internet is a global system of interconnected computer networks that use the
standard Internet protocol suite (TCP/IP) to link several billion devices worldwide. It
is a network of networks that consists of millions of private, public, academic,
business, and government networks, of local to global scope, that are linked by a broad
array of electronic, wireless, and optical networking technologies. (Wikipedia)
2.8 Provider
According to the dictionary, a provider is a group or company that provides a specified
service. (Merriam Webster)
Chapter 3
DISCUSSION
3.1
Google Drive
This software is made by Google. One strength of this application is that it
can open about 30 kinds of files in a browser without installing the
application normally needed to read each file extension. For example, Google Drive is
able to open a Photoshop file even when Photoshop is not installed on the computer.
Another strength is its OCR capability for picture files uploaded to Google Drive, which
makes a picture searchable by the words or sentences in it. Since
Google Drive is Google's application, it is integrated with other Google
applications such as Google Docs.
can also read other file extensions without installing the application normally needed to
read them.
IV.
Apple iCloud
This application works like the others. The difference is that Apple iCloud can be
used only by Apple users; it is not an open application. The functions are the same: it
can store music, pictures, videos, Word documents, and so on.
V. SugarSync
SugarSync is a cloud service that enables active synchronization of files across
computers and other devices for file backup, access, syncing, and sharing from a
variety of operating systems, such as Android, BlackBerry OS, iOS, Mac OS X,
Samsung SmartTV, Symbian, Windows, and Windows Mobile devices. For Linux,
only a discontinued unofficial third-party client is available. The program
automatically refreshes its sync by constantly monitoring changes to files (additions,
deletions, edits) and syncs these changes with any other linked devices as well as the
SugarSync servers. Originally offering a free 5GB plan and several paid plans, the
company transitioned to a paid-only model on February 8th, 2014.
to customers over the Internet. PaaS has several advantages for developers. With PaaS,
operating system features can be changed and upgraded frequently. Geographically
distributed development teams can work together on software development projects.
Services can be obtained from diverse sources that cross international boundaries.
Initial and ongoing costs can be reduced by the use of infrastructure services from a
single vendor rather than maintaining multiple hardware facilities that often perform
duplicate functions or suffer from incompatibility problems. Overall expenses can also
be minimized by unification of programming development efforts.
On the downside, PaaS involves some risk of "lock-in" if offerings require
proprietary service interfaces or development languages. Another potential pitfall is
that the flexibility of offerings may not meet the needs of some users whose
requirements rapidly evolve.
Apprenda
Apprenda's Enterprise Platform as a Service (PaaS) delivers significant cost savings and
large improvements in productivity by freeing app development from internal
infrastructure and IT. In addition, its .NET Framework and Java support allows
businesses to take their applications with them, with no fear of their web applications
becoming locked in to the platform. Uniquely, Apprenda combines the development
freedom of traditional Software-as-a-Service models with the individual level of
customization expected of a PaaS environment along with a very competitive cost.
Apprenda has been identified as a private cloud leader according to a recent Gartner
report.
II.
IBM
much support or confirmation as to the direction IBM will take their platform in, it's
anyone's guess how IBM will handle the service needs of clients in the next few years.
III.
VCE'S VBLOCK
Another proprietary service provider, VCE is proud of the flexibility and usefulness of
its Vblock platform to the modern business. VCE uses industry-standard pricing
models to help businesses plan with predictable costs for deployment and
development. The user-friendliness of the platform has sometimes been questioned,
however, and VCE's business model incorporates revenue streams from advisory and
implementation services, while other companies provide these services as part of their
flat-cost PaaS package.
IV.
OpenShift
A new offering by Red Hat Inc., OpenShift is a PaaS marketed towards clients who
wish to use open source technologies. The platform itself is compatible with Ruby,
Python, Java and Perl and offers a variety of open source frameworks for customers.
As with all Linux-centered technologies, however, OpenShift suffers from an
underwhelming support base and far-reaching inaccessibility problems. Those
developers who aren't intimately familiar with the extant Linux environment might
find the idea of cloud computing through a command line intimidating, if not
completely alien.
V.
Google App Engine (often referred to as GAE or simply App Engine) is a platform as
a service (PaaS) cloud computing platform for developing and hosting web
applications in Google-managed data centers. Applications are sandboxed and run
across multiple servers. App Engine offers automatic scaling for web applications:
as the number of requests increases for an application, App Engine automatically
allocates more resources for the web application to handle the additional demand.
Google App Engine is free up to a certain level of consumed resources. Fees are
charged for additional storage, bandwidth, or instance hours required by the
application. It was first released as a preview version in April 2008, and came out of
preview in September 2011.
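The idea of automatic scaling described above can be illustrated with a minimal sketch. This is not App Engine's actual algorithm; the function name, the capacity of 25 requests per second per instance, and the instance limits are all hypothetical values chosen for illustration.

```python
import math

def instances_needed(requests_per_second, capacity_per_instance,
                     min_instances=1, max_instances=20):
    """Return how many instances are needed to serve the given load.

    All parameters are hypothetical: capacity_per_instance is the number
    of requests one instance is assumed to handle per second.
    """
    if capacity_per_instance <= 0:
        raise ValueError("capacity_per_instance must be positive")
    wanted = math.ceil(requests_per_second / capacity_per_instance)
    # Keep the result within the configured lower and upper bounds.
    return max(min_instances, min(max_instances, wanted))

# Simulated traffic over time: the instance count follows the demand.
traffic = [5, 40, 120, 300, 90, 10]
plan = [instances_needed(rps, capacity_per_instance=25) for rps in traffic]
print(plan)
```

As the simulated request rate rises and falls, the number of allocated instances rises and falls with it, which is why the customer pays for additional instance hours only while the demand is there.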
II.
Rackspace
India,
operating in Texas, Illinois, Virginia, the United Kingdom, Australia, and Hong Kong.
The company's email and apps division operates from Blacksburg, VA; other offices
are located in Austin, Texas and San Francisco, California. Rackspace has two main
service-level segments: Managed and Intensive. Both service levels receive support
via e-mail, telephone, live chat, and ticket systems, but they are designed to fit the
needs of different businesses.
The Managed support level consists of "on-demand" support where proactive services
are provided, but the customer can contact Rackspace when they need additional
assistance. The Intensive support level consists of "proactive" support where many
proactive services are provided, and customers receive additional consultations about
their server configuration. Highly customized implementations generally fall under
this level of support. Some services and products are only available for certain support
levels.
III.
3.2
Telkomsigma
Established in 1987, PT Sigma Cipta Caraka (Telkomsigma) has been a leading integrated
end-to-end ICT solutions company in Indonesia for more than 26 years.
Telkomsigma offers comprehensive information technology services comprising
consulting services, managed IT services, software development services, and
integrated data center operations in the banking (conventional and sharia-based),
financial, telecommunications, manufacturing, distribution and other sectors. Its
solutions portfolio comprises Managed Services (internationally certified
Data Center, Cloud Computing, E-Transaction, Telco Managed Services, and
Edutainment Media and Communication Services), Financial & Banking Software
Development Services, Consulting, and System Integration.
Biznet
Biznet Networks was established in 2000 as an Internet Service Provider serving the
Internet needs of business customers. In 2000, Biznet used Wireless and In-Building
Ethernet technology. Supported by a strong technical team and a full
commitment, Biznet Networks is on its way to becoming one of the leading
Network Service Providers in Indonesia.
3.3
Usage             Price                    Note
Computing (RAM)   IDR 1,000 per hour       per 1 GB
Storage           IDR 2,000 per month      per 1 GB
Public IP         IDR 100,000 per month    per unit
Biznet
Can be checked at: http://www.biznetnetworks.com/id/enterprise/cloud-computing-enterprise/
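To show how pay-per-use pricing translates into a monthly bill, the rates in the price list above can be combined in a short calculation. This is only a sketch: the workload (4 GB of RAM, 100 GB of storage, one public IP) and the assumption that storage and the IP address are billed for the whole month even when the server is switched off are hypothetical.

```python
# Pay-per-use rates taken from the price list above (in IDR).
RAM_PER_GB_HOUR = 1_000
STORAGE_PER_GB_MONTH = 2_000
PUBLIC_IP_PER_MONTH = 100_000

def monthly_cost(ram_gb, hours_running, storage_gb, public_ips):
    """Estimate one month's bill in IDR for a hypothetical server."""
    compute = ram_gb * hours_running * RAM_PER_GB_HOUR
    storage = storage_gb * STORAGE_PER_GB_MONTH
    addresses = public_ips * PUBLIC_IP_PER_MONTH
    return compute + storage + addresses

# Hypothetical workload: 4 GB RAM, 100 GB storage, 1 public IP.
always_on = monthly_cost(4, 24 * 30, 100, 1)     # running 24 hours x 30 days
office_hours = monthly_cost(4, 8 * 22, 100, 1)   # running 8 hours x 22 working days
print(f"Always on:    IDR {always_on:,}")
print(f"Office hours: IDR {office_hours:,}")
```

Under these assumptions, a server that runs only during office hours costs roughly a third of one that runs around the clock, which is the practical meaning of paying only for what is used.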
Feature
Application
Virtualization Technology
Operating System
Flexibility
Support
Contract duration
Monthly Fee
3.4
Monthly Fee (IDR)   Setup Fee (IDR)
2,250,000           2,000,000
3,000,000           2,000,000
4,000,000           2,000,000
5,750,000           2,000,000
9,000,000           2,000,000
14,500,000          2,000,000

Monthly Fee (IDR)   Setup Fee (IDR)
3,000,000           2,000,000
12,500,000          2,000,000
22,500,000          2,000,000
50,000,000          2,000,000
75,000,000          2,000,000
125,000,000         2,000,000
Big Data
Big data is defined by three characteristics: volume, velocity, and variety.
VOLUME
The sheer volume of data being stored today is exploding. In the year 2000, 800,000 petabytes
(PB) of data were stored in the world. IBM expects this number to reach 35 zettabytes (ZB) by
2020. Of course, a lot of the data that's being created today isn't analyzed at all, and
that's another problem IBM is trying to address with BigInsights. Twitter alone generates more
than 7 terabytes (TB) of data every day, Facebook 10 TB, and some enterprises generate
terabytes of data every hour of every day of the year. It's no longer unheard of for individual
enterprises to have storage clusters holding petabytes of data.
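The two global figures quoted above use different units, so a short calculation helps to compare them. The sketch below assumes decimal units (1 ZB = 1,000,000 PB) and a constant compound growth rate between 2000 and 2020; both are simplifying assumptions, not claims from the source.

```python
PB_PER_ZB = 1_000_000  # decimal units: 1 ZB = 10**21 bytes, 1 PB = 10**15 bytes

stored_2000_pb = 800_000             # figure quoted for the year 2000
forecast_2020_pb = 35 * PB_PER_ZB    # the 35 ZB forecast, expressed in PB

stored_2000_zb = stored_2000_pb / PB_PER_ZB
growth_factor = forecast_2020_pb / stored_2000_pb
annual_growth = growth_factor ** (1 / 20) - 1   # compound rate over 20 years

print(f"Stored in 2000: {stored_2000_zb} ZB")
print(f"Growth factor 2000-2020: {growth_factor}x")
print(f"Implied compound annual growth: {annual_growth:.1%}")
```

In other words, 800,000 PB is 0.8 ZB, so the forecast corresponds to roughly a 44-fold increase in twenty years, or about 21 percent growth per year if the growth were steady.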
Figure: Big data is characterized by its volume, velocity and variety, or simply V3.
VARIETY
The volume associated with the Big Data phenomenon brings along new challenges for data
centers trying to deal with it: its variety. With the explosion of sensors and smart devices, as
well as social collaboration technologies, data in an enterprise has become complex, because
it includes not only traditional relational data, but also raw, semistructured, and unstructured
data from web pages, web log files (including click-stream data), search indexes, social media
forums, e-mail, documents, sensor data from active and passive systems, and so on. What's
more, traditional systems can struggle to store and perform the required analytics to gain
understanding from the contents of these logs because much of the information being
generated doesn't lend itself to traditional database technologies. In our experience, although
some companies are moving down the path, by and large, most are just beginning to
understand the opportunities of Big Data (and what's at stake if it's not considered).
Variety represents all types of data: a fundamental shift in analysis requirements from
traditional structured data to include raw, semistructured, and unstructured data as part of the
decision-making and insight process. Traditional analytic platforms can't handle variety.
However, an organization's success will rely on its ability to draw insights from the various
kinds of data available to it, which includes both traditional and nontraditional.
To capitalize on the Big Data opportunity, enterprises must be able to analyze all types of
data, both relational and nonrelational: text, sensor data, audio, video, transactional, and more.
Structured Data
The term structured data generally refers to data that has a defined length and format.
Examples of structured data include numbers, dates, and groups of words and numbers called
strings (for example, a customer's name, address, and so on).
Most experts agree that this kind of data accounts for about 20 percent of the data that is out
there. Structured data is the data that you're probably used to dealing with. It's usually stored
in a database. You can query it using a language like structured query language (SQL).
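As a small illustration of querying structured data with SQL, the sketch below uses Python's built-in sqlite3 module with an in-memory database. The table name, column names, and customer records are invented for the example.

```python
import sqlite3

# An in-memory database, so the example leaves no files behind.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT, joined TEXT)"
)
conn.executemany(
    "INSERT INTO customers (name, city, joined) VALUES (?, ?, ?)",
    [
        ("Ani", "Jakarta", "2013-05-01"),
        ("Budi", "Bandung", "2014-01-15"),
        ("Citra", "Jakarta", "2014-03-20"),
    ],
)

# Because every row has the same defined fields, a declarative query works.
rows = conn.execute(
    "SELECT name FROM customers WHERE city = ? ORDER BY name", ("Jakarta",)
).fetchall()
names = [name for (name,) in rows]
print(names)
conn.close()
```

The query works only because each record has the same fields with a known format, which is exactly what distinguishes structured data from the other kinds discussed below.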
Structured data is taking on a new role in the world of big data. The evolution of technology
provides newer sources of structured data being produced often in real time and in large
volumes. The sources of data are divided into two categories:
Computer- or machine-generated: Machine-generated data generally refers to data that
is created by a machine without human intervention.
Human-generated: This is data that humans, in interaction with computers, supply.
Unstructured Data
Unstructured data is data that does not follow a specified format. Unstructured data is
everywhere. In fact, most individuals and organizations conduct their lives around
unstructured data. Just as with structured data, unstructured data is either machine generated
or human generated.
Semi-structured data
Semi-structured data is a kind of data that falls between structured and unstructured data.
Semi-structured data does not necessarily conform to a fixed schema (that is, structure) but
may be self-describing and may have simple label/value pairs.
Examples of semistructured data include EDI, SWIFT, and XML.
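The self-describing nature of semi-structured data can be seen in a short XML example. The sketch below uses Python's standard xml.etree.ElementTree module; the order records and their tag names are hypothetical.

```python
import xml.etree.ElementTree as ET

# A hypothetical XML document: the tags label each value, so the data
# describes itself even though no fixed schema is enforced.
document = """
<orders>
  <order id="1"><customer>Ani</customer><total>250000</total></order>
  <order id="2"><customer>Budi</customer><note>gift wrap</note></order>
</orders>
"""

root = ET.fromstring(document)
records = []
for order in root.findall("order"):
    # Collect whatever label/value pairs each record happens to carry.
    fields = {child.tag: child.text for child in order}
    fields["id"] = order.get("id")
    records.append(fields)

for record in records:
    print(record)
```

Note that the two orders do not carry the same fields: the second has a note but no total. A relational table would need a fixed set of columns, whereas here each record simply brings its own labels.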
It can be explained more from this figure below:
VELOCITY
A conventional understanding of velocity typically considers how quickly the data is
arriving and stored, and its associated rates of retrieval. While managing all of that quickly is
good (and the volumes of data that we are looking at are a consequence of how quickly the
data arrives), we believe the idea of velocity is actually something far more compelling than
these conventional definitions.
To accommodate velocity, a new way of thinking about a problem must start at the
inception point of the data. Rather than confining the idea of velocity to the growth rates
associated with your data repositories, we suggest you apply this definition to data in motion:
the speed at which the data is flowing. After all, we're in agreement that today's enterprises
are dealing with petabytes of data instead of terabytes, and the increase in RFID sensors and
other information streams has led to a constant flow of data at a pace that has made it
impossible for traditional systems to handle.
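The difference between analyzing data at rest and data in motion can be sketched with a rolling window that is updated as each event arrives, instead of storing everything first and querying later. The function name, the 10-second window, and the arrival times below are hypothetical, and this is only an illustration of the idea, not a description of any particular streaming product.

```python
from collections import deque

def rolling_rates(events, window_seconds=10):
    """Yield (timestamp, events seen in the last window) as each event arrives.

    events is an iterable of timestamps in seconds, assumed to be in order.
    """
    window = deque()
    for timestamp in events:
        window.append(timestamp)
        # Drop events that have fallen out of the time window.
        while window[0] <= timestamp - window_seconds:
            window.popleft()
        yield timestamp, len(window)

# Hypothetical sensor readings arriving at these times (in seconds).
arrivals = [0, 1, 2, 3, 12, 13, 14, 30]
results = list(rolling_rates(arrivals))
print(results)
```

Only the events inside the current window are kept in memory, so the analysis keeps up with the flow however long the stream runs.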
3.5
can be used for logging, detecting fraud patterns, analyzing social media patterns, and many
more. In this part of the discussion, we break down some of these uses in order to
understand why big data is important and how it helps a company grow and advance.
This is one of the common uses for an inaugural Big Data project. All of the logs
and trace data generated by the operation of common IT solutions implemented in a
company are considered data exhaust.
Enterprises have a lot of data exhaust, and it becomes a pollutant if it is left around
for a few hours or days in case it is needed, so such data is usually
purged. The problem is that this data may have
concentrated value, and IT shops need to figure out a way to store it and extract value from
it. IT departments today must be able to store logs efficiently, so that logs
are kept for emergencies and discarded as soon as possible. Logs can also be
used to investigate rare problems.
Nowadays log histories are retained, but usually only for several days or weeks,
because there is too much data for conventional systems to store, which makes it
impossible to determine trends and issues beyond a limited time period.
These logs are semistructured and raw, so they are not always suited to
traditional database processing. Log formats change constantly due to software
and hardware upgrades, so they can't be tied to strict, inflexible analysis paradigms.
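One way to cope with changing log formats is to try several known layouts in turn and to count the lines that match none of them, rather than forcing every line into one fixed schema. The sketch below assumes two hypothetical log layouts (one before and one after an upgrade); the sample lines are invented.

```python
import re
from collections import Counter

# Two hypothetical log layouts, as might appear before and after an upgrade.
PATTERNS = [
    re.compile(r'^(?P<time>\S+ \S+) \[(?P<level>\w+)\] (?P<message>.*)$'),
    re.compile(r'^level=(?P<level>\w+) time=(?P<time>\S+) msg="(?P<message>.*)"$'),
]

def parse_line(line):
    """Return the fields of a log line, or None if no known layout matches."""
    for pattern in PATTERNS:
        match = pattern.match(line)
        if match:
            return match.groupdict()
    return None

lines = [
    '2014-05-21 10:00:01 [ERROR] disk full on node7',
    'level=WARN time=2014-05-21T10:00:05 msg="slow response"',
    '2014-05-21 10:00:09 [ERROR] disk full on node7',
    'free-form text that matches nothing',
]

parsed = [parse_line(line) for line in lines]
levels = Counter(entry["level"] for entry in parsed if entry)
unparsed = sum(1 for entry in parsed if entry is None)
print(levels, unparsed)
```

When a new format appears, only another pattern has to be added to the list; the rest of the analysis stays the same, and the count of unparsed lines signals that a format has changed.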
Enterprises are trying to get better insight into how their systems are running and
when and how things break down. IBM helped them leverage a Big Data platform that is
able to analyze approximately 1 TB of log data each day. They are now able to
decipher what is happening across the entire stack with each and every transaction.
They can start to build a base of knowledge from it to anticipate and
understand the interaction between failures, generate best-practice remediation
steps for specific problems, or even retune the infrastructure to eliminate
them.
Fraud Detection Pattern
Pretty much anywhere some sort of financial transaction is involved presents a
potential for misuse and the ubiquitous specter of fraud. By leveraging a Big Data
platform, an enterprise has the opportunity to identify fraud, or even stop it from happening.
Figure 3-1
traditional processes.
A modern-day fraud detection ecosystem provides a low-cost Big Data platform for
exploratory modelling and discovery. This data can be leveraged by traditional
systems either directly or through integration into existing data quality and governance
protocols. The addition of InfoSphere Streams also provides the ecosystem with analytics
for data in motion as well as data at rest.
In one enterprise implementation, it was discovered that the Big Data platform not only
sped up the building and refreshing of the fraud detection models,
but also provided broader and more accurate insight. A process that
once took about three weeks, from the moment a transaction hit the transaction switch until
potential fraud was identified, was reduced to a latency of just a couple of hours. The fraud
detection models were also built on a set of data roughly 50 percent broader than the
previous one.
campaigns, accuracy of marketing mix, and many more. This data can be processed
to give a basic insight into people's opinions and sentiment.
But in the end, the more important question is why people say what they say
and why they behave the way they do. Answering it requires enriching the social
media feeds with additional and differently shaped information that's likely residing in
other enterprise systems. To do that, an enterprise has to look beyond social media,
at the interaction of what people are doing with their behaviour,
financial trends, actual transactions, and so on.
3.6
3.7
value of the investment is by seeing the impact of the implementation. Big Data will improve
efficiency in analyzing various data at a time. But it is capable of far more than that. If an
enterprise is able to analyze every detail of the data it has, it could win the market because
it is able to determine and direct the sentiment of the market for its benefit.
Chapter 4
CONCLUSION AND SUGGESTION
4.1 CONCLUSION
In cloud computing, the word cloud (also phrased as "the cloud") is used as a
metaphor for "the Internet," so the phrase cloud computing means "a type of
Internet-based computing," where different services such as servers, storage and applications
are delivered to an organization's computers and devices through the Internet. Cloud
computing offers several services, such as SaaS, PaaS, and IaaS. Indonesia also has
providers of cloud services, such as Biznet, Lintas Media Danawa, and Telkomsigma.
Cloud computing is still categorized as new, and not many companies use it
because they are still concerned about internet speed, vendor dependency,
bandwidth, and security and privacy. Big data has three characteristics (volume,
velocity, and variety) that must be considered in managing it.
4.2 SUGGESTION
In using the cloud, companies are still concerned about internet speed, vendor dependency,
bandwidth, and security and privacy. We therefore recommend that
companies acquire their own private cloud network to prevent security and privacy issues.
Bandwidth should be sized according to the estimated bandwidth that will be used. The
bigger the company, the more bandwidth, storage, and privacy it needs.
As for the next issue: as we know, organizations today are facing more and
more Big Data challenges. They have access to a wealth of information, but they don't
know how to get value out of it because it is sitting in its most raw form or in a
semistructured or unstructured format; as a result, they don't even know whether
it's worth keeping (or whether they are even able to keep it, for that matter). By using
big data, a company can, for example, perform IT log analytics. We think that using big
data would be a good investment for the company itself.
REFERENCES
Books:
Hurwitz, J. S. (2013). Big Data For Dummies. New Jersey: John Wiley & Sons, Inc.
O'Brien, J. A. (2004). Management Information Systems: Managing Information
Technology in the Business Enterprise (6th ed.). New York: McGraw-Hill.
Satzinger, J. W., Jackson, R. B., & Burd, S. D. (2005). Object-Oriented Analysis and Design
with the Unified Process. Boston: Course Technology, Cengage Learning.
Zikopoulos, P. C. (2012). Understanding Big Data: Analytics for Enterprise Class Hadoop
and Streaming Data. United States: McGraw-Hill.
Websites:
Investopedia. (n.d.). Bank Definition | Investopedia. Retrieved from Investopedia:
http://www.investopedia.com/terms/b/bank.asp
Merriam Webster. (n.d.). Retrieved from http://www.merriam-webster.com/dictionary/provider
Webopedia. (n.d.). Retrieved from
http://www.webopedia.com/TERM/C/cloud_computing.html
Wikipedia. (n.d.). Retrieved from http://en.wikipedia.org/wiki/Internet
Wikipedia. (2014, April 18). Business - Wikipedia, the free encyclopedia. Retrieved from
Wikipedia: http://en.wikipedia.org/wiki/Business
Wikipedia. (2014, April 3). Company - Wikipedia, the free encyclopedia. Retrieved from
Wikipedia: http://en.wikipedia.org/wiki/Company
Wikipedia. (2014, May 21). Information and communication technology - Wikipedia, the free
encyclopedia. Retrieved from Wikipedia, the free encyclopedia:
en.wikipedia.org/wiki/Information_and_communications_technology
Wikipedia. (2014, April 19). System - Wikipedia, the free encyclopedia. Retrieved from
Wikipedia: http://en.wikipedia.org/wiki/System