
M0214

ADVANCED TOPICS OF INFORMATION SYSTEMS

Cloud Computing and Big Data

Bina Nusantara University


Jakarta
2014

Abstract

PURPOSE of this paper is to give more insight into cloud computing and big data. We discuss
cloud computing services, their types, and their providers, and give further explanation of
big data and its usage.
METHODOLOGY used in this research is library and internet research. It is conducted by
looking for references from textbooks, journals, articles, and various sources on the internet.
First, we determine keywords related to our research topic. These keywords help us
find the textbooks or scientific journals we need more easily. Second, we select the information
based on our research objectives. The information should also be analysed since it comes
from various sources.
THE EXPECTED OUTCOME is to improve the reader's understanding of Big Data and
Cloud Computing, their implementation in enterprises, and how they can be used to improve a
company's operations.
CONCLUSION of this paper is that in cloud computing, the word cloud (also phrased as
"the cloud") is used as a metaphor for "the Internet," so the phrase cloud computing means "a
type of Internet-based computing," where different services such as servers, storage and
applications are delivered to an organization's computers and devices through the Internet.
Cloud computing services include SaaS, PaaS, and IaaS. Indonesia also
has providers that offer cloud services, such as Biznet, Lintas Media Danawa, and Telkomsigma.
Cloud computing is still considered new, and not many companies use it
because they are still concerned about internet speed, vendor dependency, bandwidth, and security and
privacy matters. Big data has three characteristics (volume, velocity, and variety)
that must be considered in managing it.

Keywords: Cloud Computing, Big Data, Structured Data, Unstructured Data, Repository,
Provider

Chapter 1
INTRODUCTION
1. Background
In the last three years, cloud computing has become a phenomenon in the world of
IT business. Surveys from industry analysts such as IDC, Gartner, and Forrester
Research consistently place cloud computing among the most important topics discussed
by IT managers in companies around the world. Even though it may still be a
controversial subject today, it is likely to be seen in retrospect as a revolutionary
technology. There are several reasons why cloud computing appeals to people even
now. The main reason it is becoming popular with businesses is that it helps them cut
costs. Operational expenses are reduced significantly because you pay only for what
you use. This keeps your expenses in check and converts capital expenditure (capex)
into operating expenditure (opex). In business, time translates to money. Because
cloud computing becomes functional faster than other systems, businesses save time
during set-up. It also supports fast recovery, so businesses don't lose time
unnecessarily. The cost of setting up a cloud system is also modest: you don't need
additional hardware or software for the installation, and implementation can be done
remotely.
Cloud computing also offers a high level of automation, making life easier for
organizations. You don't need to set up a team to handle system updates and back-ups,
which frees internal resources for other high-priority work. The cloud also allows you
to work from anywhere in the world: your employees can access work-related
information wherever they are. Cloud computing holds promise for the future. It may
take a little more time to make it more secure and sturdy, but we believe the technology
can already bring substantial benefits to businesses today, and you will begin to feel
how beneficial using the cloud is.
Besides cloud computing, we will discuss another technology, Big Data. Big Data
applies to information that can't be processed or analyzed using traditional processes
or tools. Increasingly, organizations today are facing more and more Big Data
challenges. They have access to a wealth of information, but they don't know how to
get value out of it because it is sitting in its most raw form or in a semistructured or
unstructured format; as a result, they don't even know whether it's worth keeping (or
even whether they are able to keep it, for that matter). For that reason, we chose Cloud
Computing & Big Data as the topic of this paper.

2. Scope
In this paper we limit the scope of our topic so that it won't be too general. The
scope of the analysis and discussion is:

Samples of cloud computing services

Cloud computing providers in Indonesia

Fee structures the providers offer for using cloud computing

Big Data types and sources (volume, velocity, variety)

Structured data, unstructured data, and semi-structured data

3. Objectives & Benefits

Purpose: The purpose of this paper is to give more insight into cloud computing and
big data. We discuss cloud computing services, their types, and their providers,
and give further explanation of big data and its usage.

Benefit:
The benefit to be attained is that both writers and readers gain an overview
and knowledge of cloud computing and big data, and a better understanding of
the usage and importance of cloud computing and big data today.

4. Methodology
Data is collected mainly by the literature study method. A literature study is done
by collecting data and information available from many sources, such as books, the
internet, television, and other media that provide information relevant to the
object of research. The materials found are used as the theoretical
basis for the chapters that follow.

5. Writing Systematic
Chapter 1:

Introduction
This chapter explains the background of the research, scope, the
purpose and benefits, research methodology, and the writing
systematic.

Chapter 2:

Literature Review
This chapter explains the theories used in the research, which serve as
the framework for writing and arranging this paper.

Chapter 3:

Discussion
This chapter discusses samples of cloud computing services, cloud
computing providers in Indonesia, the fee structures those providers
offer, and Big Data together with its data types.

Chapter 4:

Conclusion and Suggestion
This chapter presents the conclusions drawn from the research and
suggestions regarding the technology.

Chapter 2
LITERATURE REVIEW

2.1

Cloud Computing
In cloud computing, the word cloud (also phrased as "the cloud") is used as a
metaphor for "the Internet," so the phrase cloud computing means "a type of
Internet-based computing," where different services such as servers, storage and
applications are delivered to an organization's computers and devices through the
Internet. (Webopedia)

2.2

Big Data
Big Data is a bit of a misnomer since it implies that pre-existing data is somehow
small (it isn't) or that the only challenge is its sheer size (size is one of them, but there
are often more). In short, the term Big Data applies to information that can't be
processed or analyzed using traditional processes or tools. (Zikopoulos, 2012)

2.3

Software as a Service (SaaS)


Software as a Service (SaaS) is the delivery of computer applications over the Internet.
(Hurwitz, 2013)

2.4

Infrastructure as a Service (IaaS)


Infrastructure as a Service (IaaS) means Infrastructure, including a management
interface and associated software, provided to companies from the cloud as a service.
(Hurwitz, 2013)

2.5

Platform as a Service (PaaS)


Platform as a Service (PaaS) is a cloud service that abstracts the computing services,
including the operating software and the development and deployment and
management life cycle. It sits on top of Infrastructure as a Service. (Hurwitz, 2013)

2.6

Repository
Repository is a database for software and components, with an emphasis on revision
control and configuration management (where they keep the good stuff, in other
words). (Hurwitz, 2013)

2.7

Structured Data
Structured data is data that has a defined length and format. (Hurwitz, 2013)

2.8

Unstructured Data
Unstructured data is data that does not follow a specified data format. (Hurwitz,
2013)

2.9

Information and Communication Technology


According to (Wikipedia, Information and communication technology - Wikipedia, the
free encyclopedia, 2014), Information and communications technology (ICT) is often
used as an extended synonym for information technology (IT), but is a more specific
term that stresses the role of unified communications and the integration of
telecommunications (telephone lines and wireless signals), computers as well as
necessary enterprise software, middleware, storage, and audio-visual systems, which
enable users to access, store, transmit, and manipulate information.

2.10

System
According to (Satzinger, Jackson, & Burd, 2005, p. 4), a system is a collection of
interrelated components that function together to achieve some outcome.
According to (Wikipedia, System - Wikipedia, the free encyclopedia, 2014), a
system is a set of interacting or interdependent components forming an integrated
whole or a set of elements and relationships which are different from relationships of
the set or elements to other elements or sets.
A system is a group of interrelated components working together toward a common goal
by accepting inputs and producing outputs in an organized transformation process.
(O'Brien, 2004)

2.11

Internet
The Internet is a global system of interconnected computer networks that use the
standard Internet protocol suite (TCP/IP) to link several billion devices worldwide. It
is a network of networks that consists of millions of private, public, academic,
business, and government networks, of local to global scope, that are linked by a broad
array of electronic, wireless, and optical networking technologies. (Wikipedia)

2.12

Provider
According to the dictionary, a provider is a group or company that provides a specified
service. (Merriam Webster)

Chapter 3
DISCUSSION

3.1

Sample of Cloud Computing Services


a. SaaS (Software as a Service)
Software as a Service (SaaS) is a software delivery method that provides
access to software and its functions remotely as a Web-based service. Software as a
Service allows organizations to access business functionality at a cost typically less
than paying for licensed applications since SaaS pricing is based on a monthly fee.
Also, because the software is hosted remotely, users don't need to invest in additional
hardware. Software as a Service removes the need for organizations to handle the
installation, set-up and often daily upkeep and maintenance.
Even a new user can use the applications or software anywhere and anytime, subject to
the service provider's policy, for example its fees. The stored data is available
anywhere and anytime, as long as there is an internet connection. SaaS is not
only storage: it can also open files of some types without the corresponding
application being installed on the computer. Samples of SaaS services:
I. Google Drive

This software is made by Google. Its strength is that it can open about 30 kinds of
files in a browser without the application for that file type being installed. For
example, Google Drive is able to open a Photoshop file even though no Photoshop
application is installed on the computer. Another strength is its OCR capability for
image files uploaded to Google Drive. OCR makes the images searchable by the words
or sentences they contain. Since Google Drive is Google's application, it is integrated
with other Google applications such as Google Docs.

II. Dropbox

Dropbox is also a SaaS service for storing files online. Users get 2 GB of free
storage. Like Google Drive, Dropbox can also open files of various types without
the corresponding application being installed.

III. Social Media

Social media is also a SaaS service. On social media, users can also store files of
many types. Facebook and Twitter are examples of social media, where users can
store pictures, text, video, etc. SoundCloud is also SaaS; it can store audio files
(MP3, WAV, etc.).

IV. Apple iCloud

This application is similar to the others. The difference is that Apple iCloud can
only be used by Apple users; it is not an open application. The functions are the
same: it can store music, pictures, videos, Word documents, etc.

V. SugarSync

SugarSync is a cloud service that enables active synchronization of files across
computers and other devices for file backup, access, syncing, and sharing from a
variety of operating systems, such as Android, BlackBerry OS, iOS, Mac OS X,
Samsung SmartTV, Symbian, Windows, and Windows Mobile devices. For Linux,
only a discontinued unofficial third-party client is available. The program
automatically refreshes its sync by constantly monitoring changes to files (additions,
deletions, and edits) and syncs these changes with any other linked devices as well as
the SugarSync servers. Originally offering a free 5 GB plan and several paid plans, the
company transitioned to a paid-only model on February 8th, 2014.
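The change monitoring described for SugarSync can be illustrated with a small sketch. This is not SugarSync's actual implementation, which the sources used here do not describe; the function names `snapshot` and `diff_snapshots` are hypothetical. The idea is to fingerprint every file in a folder, then compare two snapshots to classify additions, deletions, and edits.

```python
import hashlib
from pathlib import Path


def snapshot(root):
    """Map each file's path (relative to root) to a SHA-256 digest of its contents."""
    root = Path(root)
    return {
        str(p.relative_to(root)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in root.rglob("*")
        if p.is_file()
    }


def diff_snapshots(old, new):
    """Classify the changes between two snapshots (hypothetical helper)."""
    added = sorted(set(new) - set(old))
    deleted = sorted(set(old) - set(new))
    edited = sorted(k for k in set(old) & set(new) if old[k] != new[k])
    return {"added": added, "deleted": deleted, "edited": edited}


# Plain dictionaries stand in for two snapshots taken at different times.
before = {"notes.txt": "aaa", "photo.jpg": "bbb"}
after = {"notes.txt": "ccc", "report.doc": "ddd"}
print(diff_snapshots(before, after))
```

In a real client, `snapshot` would be called on the synced folder at intervals (or triggered by file system events), and each entry in the result would be turned into an upload, delete, or update request to the linked devices.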

b. PaaS (Platform as a Service)


Platform as a Service (PaaS) is a way to rent hardware, operating systems, storage
and network capacity over the Internet. The service delivery model allows the
customer to rent virtualized servers and associated services for running existing
applications or developing and testing new ones.

Platform as a Service (PaaS) is an outgrowth of Software as a Service (SaaS), a
software distribution model in which hosted software applications are made available
to customers over the Internet. PaaS has several advantages for developers. With PaaS,
operating system features can be changed and upgraded frequently. Geographically
distributed development teams can work together on software development projects.
Services can be obtained from diverse sources that cross international boundaries.
Initial and ongoing costs can be reduced by the use of infrastructure services from a
single vendor rather than maintaining multiple hardware facilities that often perform
duplicate functions or suffer from incompatibility problems. Overall expenses can also
be minimized by unification of programming development efforts.
On the downside, PaaS involves some risk of "lock-in" if offerings require
proprietary service interfaces or development languages. Another potential pitfall is
that the flexibility of offerings may not meet the needs of some users whose
requirements rapidly evolve.

Examples of PaaS:

I. Apprenda

Apprenda's Enterprise Platform as a Service (PaaS) delivers significant cost savings &
massive improvements in productivity by freeing app development from internal
infrastructure & IT. In addition to this, the .NET framework and Java support allows
businesses to take their applications with them, with no fear of their web applications
becoming locked-in on the platform. Uniquely, Apprenda combines the development
freedom of traditional Software-as-a-Service models with the individual level of
customization expected of a PaaS environment along with a very competitive cost.
Apprenda has been identified as a private cloud leader according to a recent Gartner
report.

II. IBM

Traditionally a powerhouse of computing development, IBM was slow to catch on to
the PaaS service and cloud computing in general. That said, IBM is offering a pilot
PaaS service aimed largely at the IT market. While you don't need to be an
independent software vendor in order to use IBM's new PaaS service, the company
has structured its existing cloud-services footprint to serve the software vendor
sector best. Partnering with IBM could prove risky for some clients, however, as
discovered by Chase Bank. IBM's architecture is proprietary, meaning that leaving
IBM can render your web applications useless, and the service is still new. Without
much support or confirmation as to the direction IBM will take its platform in, it's
anyone's guess how IBM will handle the service needs of clients in the next few years.

III. VCE's Vblock

Another proprietary service provider, VCE is proud of the flexibility and usefulness of
its Vblock platform to the modern business. VCE uses industry-standard pricing
models to help businesses plan for predictable deployment and development
costs. The user-friendliness of the platform has sometimes been questioned,
however, and VCE's business model incorporates revenue streams from advisory and
implementation services, while other companies provide these services as part of their
flat-cost PaaS package.

IV. OpenShift

A new offering by Red Hat Inc., OpenShift is a PaaS marketed towards clients who
wish to use open source technologies. The platform itself is compatible with Ruby,
Python, Java and Perl and offers a variety of open source frameworks for customers.
As with all Linux-centered technologies, however, OpenShift suffers from an
underwhelming support base and far-reaching inaccessibility problems. Those
developers who aren't intimately familiar with the extant Linux environment might
find the idea of cloud computing through a command line intimidating, if not
completely alien.

V. Google App Engine

Google App Engine (often referred to as GAE or simply App Engine) is a platform as
a service (PaaS) cloud computing platform for developing and hosting web
applications in Google-managed data centers. Applications are sandboxed and run
across multiple servers. App Engine offers automatic scaling for web applications:
as the number of requests increases for an application, App Engine automatically
allocates more resources for the web application to handle the additional demand.
Google App Engine is free up to a certain level of consumed resources. Fees are
charged for additional storage, bandwidth, or instance hours required by the
application. It was first released as a preview version in April 2008, and came out of
preview in September 2011.
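The automatic scaling idea can be sketched with a simple rule. The rule is an assumption made for illustration and is not taken from App Engine's documentation: each instance is assumed to handle a fixed number of requests per second, and the platform runs just enough instances to cover the current load. The names `instances_needed`, `capacity_per_instance`, and `min_instances` are hypothetical.

```python
import math


def instances_needed(requests_per_sec, capacity_per_instance, min_instances=1):
    """Smallest instance count whose combined capacity covers the load."""
    if capacity_per_instance <= 0:
        raise ValueError("capacity_per_instance must be positive")
    required = math.ceil(requests_per_sec / capacity_per_instance)
    return max(min_instances, required)


# Hypothetical traffic levels, assuming one instance serves 100 requests/second.
for load in (5, 120, 950):
    count = instances_needed(load, capacity_per_instance=100)
    print(f"{load} req/s -> {count} instance(s)")
```

Because fees depend on consumed instance hours, a rule like this also explains why cost follows traffic: a quiet application stays at the minimum, while a busy one is billed for the extra instances only while they run.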

c. IaaS (Infrastructure as a Service)


IaaS (Infrastructure as a Service) is the virtual delivery of computing resources in
the form of hardware, networking, and storage services. It may also include the
delivery of operating systems and virtualization technology to manage the resources.
Rather than buying and installing the required resources in their own data center,
companies rent these resources as needed.
Many companies with a hybrid environment are likely to include IaaS in some
form because IaaS is a highly practical solution for companies with various IT
resource challenges. Whether a company needs additional resources to support a
temporary development project, an on-going dedicated development testing
environment, or disaster recovery, paying for infrastructure services on a per-use basis
can be highly cost-effective.
Compared to SaaS and PaaS, IaaS users are responsible for managing more:
applications, data, runtime, middleware, and O/S. Vendors still manage virtualization,
servers, hard drives, storage, and networking. What users gain with IaaS is
infrastructure on top of which they can install any required platforms. Users are
responsible for updating these if new versions are released.
Samples of IaaS services:
I. Amazon Elastic Compute Cloud (EC2)

EC2 is a central part of Amazon.com's cloud computing platform, Amazon Web
Services (AWS). EC2 allows users to rent virtual computers on which to run their own
computer applications. EC2 allows scalable deployment of applications by providing a
Web service through which a user can boot an Amazon Machine Image to create a
virtual machine, which Amazon calls an "instance", containing any software desired.
A user can create, launch, and terminate server instances as needed, paying by the
hour for active servers, hence the term "elastic". EC2 provides users with control over
the geographical location of instances that allows for latency optimization and high
levels of redundancy.

II. Rackspace

Rackspace Inc. is an IT hosting company based in Windcrest, Texas, USA, a suburb of
San Antonio, Texas. The company also has offices in Australia, the United Kingdom,
Switzerland, Israel, the Netherlands, India, and Hong Kong, and data centers
operating in Texas, Illinois, Virginia, the United Kingdom, Australia, and Hong Kong.
The company's email and apps division operates from Blacksburg, VA; other offices
are located in Austin, Texas and San Francisco, California. Rackspace has two main
service-level segments: Managed and Intensive. Both service levels receive support
via e-mail, telephone, live chat, and ticket systems, but they are designed to fit the
needs of different businesses.
The Managed support level consists of "on-demand" support where proactive services
are provided, but the customer can contact Rackspace when they need additional
assistance. The Intensive support level consists of "proactive" support where many
proactive services are provided, and customers receive additional consultations about
their server configuration. Highly customized implementations generally fall under
this level of support. Some services and products are only available for certain support
levels.

III. Green House Data

Green House Data is a data center services provider headquartered in Cheyenne,
Wyoming. Cheyenne is home to a data center, administrative offices, and technical
support. The company also has locations in Oregon and New Jersey, and sales and
marketing offices in Laramie and in Denver, Colorado.
As a whole, the data center industry has been highly criticized for heavy electrical use,
and in recent years has actively tried to reduce power consumption by improving
facility design and increasing server virtualization. As a key element of their business
model, Green House Data purchases renewable energy credits, or RECs, for wind
power and documents purchases with the EPA's Green Power Partnership. In 2013,
Green House Data was part of EPA's "Leadership Club" for sustainable power
purchases. A common measure for data center power consumption is Power usage
effectiveness, often abbreviated PUE.
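PUE is conventionally defined as the total energy a facility draws divided by the energy delivered to its IT equipment, so the ideal value is 1.0 and lower is better. The sketch below only illustrates that ratio; the energy figures are invented for the example and do not describe Green House Data.

```python
def power_usage_effectiveness(total_facility_kwh, it_equipment_kwh):
    """PUE = total facility energy / IT equipment energy (same units for both)."""
    if it_equipment_kwh <= 0:
        raise ValueError("it_equipment_kwh must be positive")
    return total_facility_kwh / it_equipment_kwh


# Hypothetical monthly figures: 1,500 kWh drawn in total, 1,000 kWh of it by servers.
pue = power_usage_effectiveness(total_facility_kwh=1500.0, it_equipment_kwh=1000.0)
print(f"PUE = {pue:.2f}")  # prints "PUE = 1.50"
```

A PUE of 1.50 in this example means that for every kWh used by IT equipment, another 0.5 kWh goes to cooling, lighting, and other overhead.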

3.2

Cloud Computing Provider in Indonesia


There are several cloud computing providers in Indonesia. Some of them are:
Lintas Media Danawa
PT Aplikanusa Lintasarta (Lintasarta) and its subsidiary, PT Lintas Media Danawa
(LMD), collaborate to offer a complete cloud computing solution. Lintasarta
currently markets its Infrastructure as a Service (IaaS) solutions successfully to
various industrial sectors, and LMD likewise benefits from the Software as a
Service (SaaS) solutions it offers.

Telkomsigma
Established in 1987, PT Sigma Cipta Caraka (Telkomsigma) has been a leading integrated
end-to-end ICT solutions company in Indonesia for more than 26 years.
Telkomsigma offers comprehensive information technology services comprising
consulting services, managing IT services, software development services, and
integrated data center operations in the banking (conventional and sharia-based),
financial, telecommunications, manufacturing, distribution and other sectors. Its
solutions portfolio comprises: Managed Services (internationally certified
Data Center, Cloud Computing, E-Transaction, Telco Managed Services, and
Edutainment Media and Communication Services), Financial & Banking Software
Development Services, Consulting, and System Integration.

Biznet
Biznet Networks was established in 2000 as an Internet Service Provider serving the
Internet needs of business customers. In 2000, Biznet used Wireless and In-Building
Ethernet technology. Owing to the support of the best technical team and a full
commitment, Biznet Networks is on its way to becoming one of the leading
Network Service Providers in Indonesia.

3.3

Fee Structures the Providers Offer for Cloud Computing

Lintas Media Danawa
Can be checked at: http://www.lintasmediadanawa.com/cloud-infrastructure-service/cozy-on-demand-cloud-pricing

Usage             Price                    Note
Computing (RAM)   IDR 1,000 per hour       per 1 GB
Storage           IDR 2,000 per month      per 1 GB
Public IP         IDR 100,000 per month    per unit
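To show how the pay-per-use prices add up, here is a minimal worked example. It takes the Lintas Media Danawa prices as IDR 1,000 per GB of RAM per hour, IDR 2,000 per GB of storage per month, and IDR 100,000 per public IP per month. The server sizes and running hours are hypothetical, and the function name `monthly_cost` is ours, not the provider's.

```python
RAM_PER_GB_HOUR = 1_000        # IDR, computing (RAM), billed per GB per hour
STORAGE_PER_GB_MONTH = 2_000   # IDR, storage, billed per GB per month
PUBLIC_IP_PER_MONTH = 100_000  # IDR, public IP, billed per unit per month


def monthly_cost(ram_gb, hours_running, storage_gb, public_ips):
    """Estimated monthly bill in IDR for one virtual server."""
    compute = ram_gb * hours_running * RAM_PER_GB_HOUR
    storage = storage_gb * STORAGE_PER_GB_MONTH
    network = public_ips * PUBLIC_IP_PER_MONTH
    return compute + storage + network


# Hypothetical server: 4 GB RAM, 100 GB storage, 1 public IP.
always_on = monthly_cost(4, hours_running=24 * 30, storage_gb=100, public_ips=1)
office_hours = monthly_cost(4, hours_running=8 * 22, storage_gb=100, public_ips=1)
print(f"Always on (720 h):    IDR {always_on:,}")     # IDR 3,180,000
print(f"Office hours (176 h): IDR {office_hours:,}")  # IDR 1,004,000
```

The gap between the two totals illustrates the pay-only-for-what-you-use argument from Chapter 1: the computing charge stops when the server is switched off, while storage and the public IP are billed for the whole month.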

Biznet
Can be checked at: http://www.biznetnetworks.com/id/enterprise/cloud-computing-enterprise/

Feature                     Cloud Computing Enterprise
Application                 All applications based on Windows & Linux, large-scale and critical
Virtualization Technology   VMware ESXi
Operating System            Windows & Linux
Flexibility                 Each system gets a Virtual Data Center (VDC), so that the system can be
                            partitioned into multiple servers according to need
Support                     24x7x365
Contract duration           Minimum 6 months
Monthly Fee                 Starts from IDR 2,250,000 per month

Biznet Cloud Server Enterprise

Service                                                         Monthly Fee (IDR)   Setup Fee (IDR)
Cloud Server Enterprise 1 Core, 1 GB RAM, 100 GB SAN Storage    2,250,000           2,000,000
Cloud Server Enterprise 2 Core, 2 GB RAM, 100 GB SAN Storage    3,000,000           2,000,000
Cloud Server Enterprise 4 Core, 4 GB RAM, 100 GB SAN Storage    4,000,000           2,000,000
Cloud Server Enterprise 8 Core, 8 GB RAM, 100 GB SAN Storage    5,750,000           2,000,000
Cloud Server Enterprise 8 Core, 16 GB RAM, 100 GB SAN Storage   9,000,000           2,000,000
Cloud Server Enterprise 8 Core, 32 GB RAM, 100 GB SAN Storage   14,500,000          2,000,000

Biznet Cloud Storage Enterprise

Service                           Monthly Fee (IDR)   Setup Fee (IDR)
Cloud Storage Enterprise 1 TB     3,000,000           2,000,000
Cloud Storage Enterprise 5 TB     12,500,000          2,000,000
Cloud Storage Enterprise 10 TB    22,500,000          2,000,000
Cloud Storage Enterprise 25 TB    50,000,000          2,000,000
Cloud Storage Enterprise 50 TB    75,000,000          2,000,000
Cloud Storage Enterprise 100 TB   125,000,000         2,000,000

3.4

Big Data

Three characteristics define big data: volume, velocity, and variety.

VOLUME
The sheer volume of data being stored today is exploding. In the year 2000, 800,000 petabytes
(PB) of data were stored in the world. IBM expects this number to reach 35 zettabytes (ZB) by
2020. Of course, a lot of the data that's being created today isn't analyzed at all, and that's
another problem IBM is trying to address with BigInsights. Twitter alone generates more
than 7 terabytes (TB) of data every day, Facebook 10 TB, and some enterprises generate
terabytes of data every hour of every day of the year. It's no longer unheard of for individual
enterprises to have storage clusters holding petabytes of data.
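The units in this paragraph are easier to compare once they are converted to a common scale. The short calculation below is our own arithmetic on the figures quoted above, assuming decimal (SI) prefixes, where 1 ZB = 1,000,000 PB and 1 PB = 1,000 TB.

```python
PB_PER_ZB = 10**6  # decimal prefixes: 1 ZB = 1,000 EB = 1,000,000 PB
TB_PER_PB = 10**3  # 1 PB = 1,000 TB

stored_2000_pb = 800_000   # data stored worldwide in 2000, in PB
forecast_2020_zb = 35      # forecast for 2020, in ZB
twitter_tb_per_day = 7     # Twitter's daily output, in TB

stored_2000_zb = stored_2000_pb / PB_PER_ZB
growth_factor = forecast_2020_zb * PB_PER_ZB / stored_2000_pb
twitter_pb_per_year = twitter_tb_per_day * 365 / TB_PER_PB

print(f"Stored in 2000: {stored_2000_zb} ZB")                 # 0.8 ZB
print(f"Growth 2000 to 2020: {growth_factor}x")               # 43.75x
print(f"Twitter per year: about {twitter_pb_per_year} PB")    # about 2.555 PB
```

In other words, the forecast implies roughly a 44-fold increase in stored data over twenty years, and a single service producing 7 TB per day accumulates about 2.5 PB in a year.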

Figure:
Big data is characterized by its volume,
velocity and variety or simply V3.

VARIETY
The volume associated with the Big Data phenomenon brings along a new challenge for data
centers trying to deal with it: its variety. With the explosion of sensors and smart devices, as
well as social collaboration technologies, data in an enterprise has become complex, because
it includes not only traditional relational data, but also raw, semistructured, and unstructured
data from web pages, web log files (including click-stream data), search indexes, social media
forums, e-mail, documents, sensor data from active and passive systems, and so on. What's
more, traditional systems can struggle to store and perform the required analytics to gain
understanding from the contents of these logs because much of the information being
generated doesn't lend itself to traditional database technologies. In our experience, although
some companies are moving down the path, by and large, most are just beginning to
understand the opportunities of Big Data (and what's at stake if it's not considered).
Variety represents all types of data: a fundamental shift in analysis requirements from
traditional structured data to include raw, semistructured, and unstructured data as part of the
decision-making and insight process. Traditional analytic platforms can't handle variety.
However, an organization's success will rely on its ability to draw insights from the various
kinds of data available to it, which includes both traditional and nontraditional.
To capitalize on the Big Data opportunity, enterprises must be able to analyze all types of
data, both relational and nonrelational: text, sensor data, audio, video, transactional, and more.

Structured Data
The term structured data generally refers to data that has a defined length and format.
Examples of structured data include numbers, dates, and groups of words and numbers called
strings (for example, a customer's name, address, and so on).
Most experts agree that this kind of data accounts for about 20 percent of the data that is out
there. Structured data is the data that you're probably used to dealing with. It's usually stored
in a database. You can query it using a language like Structured Query Language (SQL).
Structured data is taking on a new role in the world of big data. The evolution of technology
provides newer sources of structured data being produced, often in real time and in large
volumes. The sources of data are divided into two categories:
Computer- or machine-generated: Machine-generated data generally refers to data that
is created by a machine without human intervention.
Human-generated: This is data that humans, in interaction with computers, supply.
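As a small illustration of structured data being queried with SQL, the sketch below stores a few customer records in an in-memory SQLite database using Python's standard `sqlite3` module. The table name, columns, and records are invented for the example.

```python
import sqlite3

# An in-memory database: nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Every column has a defined name and type, which is what makes the data structured.
conn.execute(
    "CREATE TABLE customers ("
    "id INTEGER PRIMARY KEY, name TEXT NOT NULL, city TEXT NOT NULL, joined TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO customers (name, city, joined) VALUES (?, ?, ?)",
    [
        ("Ani", "Jakarta", "2014-01-15"),
        ("Budi", "Bandung", "2014-02-03"),
        ("Citra", "Jakarta", "2014-03-21"),
    ],
)

# Because the format is fixed, a declarative query can summarize the data directly.
rows = conn.execute(
    "SELECT city, COUNT(*) FROM customers GROUP BY city ORDER BY city"
).fetchall()
print(rows)  # [('Bandung', 1), ('Jakarta', 2)]
conn.close()
```

The same question asked of unstructured data, such as free-form e-mail text, would first require extracting the city from each message, which is why the two kinds of data call for different tools.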

MACHINE-GENERATED STRUCTURED DATA can include the following:

Sensor data:
Examples include radio frequency ID (RFID) tags, smart meters, medical devices, and Global
Positioning System (GPS) data. For example, RFID is rapidly becoming a popular technology.
It uses tiny computer chips to track items at a distance. An example of this is tracking
containers of produce from one location to another. When information is transmitted from the
receiver, it can go into a server and then be analyzed. Companies are interested in this for
supply chain management and inventory control. Another example of sensor data is
smartphones that contain sensors like GPS that can be used to understand customer behavior
in new ways.
Web log data:
When servers, applications, networks, and so on operate, they capture all kinds of data about
their activity. This can amount to huge volumes of data that can be useful, for example, to
deal with service-level agreements or to predict security breaches.
Point-of-sale data:
When the cashier swipes the bar code of any product that you are purchasing, all that data
associated with the product is generated. Just think of all the products across all the people
who purchase them, and you can understand how big this data set can be.
Financial data:
Lots of financial systems are now programmatic; they are operated based on predefined rules
that automate processes. Stock-trading data is a good example of this. It contains structured
data such as the company symbol and dollar value. Some of this data is machine generated,
and some is human generated.

STRUCTURED HUMAN-GENERATED DATA might include the following:


Input data:
This is any piece of data that a human might input into a computer, such as name, age,
income, non-free-form survey responses, and so on. This data can be useful to understand
basic customer behavior.
Click-stream data:
Data is generated every time you click a link on a website. This data can be analyzed to
determine customer behavior and buying patterns.
Gaming-related data:
Every move you make in a game can be recorded. This can be useful in understanding how
end users move through a gaming portfolio.
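To make the click-stream case concrete, the sketch below counts page views and a simple checkout rate from a handful of click records. The records, field names, and page paths are hypothetical; a real click stream would come from web server logs or an analytics service.

```python
from collections import Counter

# Each record is one click: who clicked and which page was requested.
clicks = [
    {"user": "u1", "page": "/home"},
    {"user": "u1", "page": "/product/42"},
    {"user": "u2", "page": "/home"},
    {"user": "u2", "page": "/product/42"},
    {"user": "u2", "page": "/checkout"},
    {"user": "u3", "page": "/home"},
]

# Most visited pages.
page_views = Counter(click["page"] for click in clicks)
print(page_views.most_common(2))  # [('/home', 3), ('/product/42', 2)]

# Share of visitors who reached the checkout page.
visitors = {click["user"] for click in clicks}
buyers = {click["user"] for click in clicks if click["page"] == "/checkout"}
print(f"{len(buyers)} of {len(visitors)} visitors reached checkout")
```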

Unstructured Data
Unstructured data is data that does not follow a specified format. Unstructured data is
everywhere. In fact, most individuals and organizations conduct their lives around
unstructured data. Just as with structured data, unstructured data is either machine generated
or human generated.

MACHINE-GENERATED UNSTRUCTURED DATA examples:


Satellite images: This includes weather data or the data that the government captures in its
satellite surveillance imagery. Just think about Google Earth, and you get the picture.
Scientific data: This includes seismic imagery, atmospheric data, and high energy physics.
Photographs and video: This includes security, surveillance, and traffic video.
Radar or sonar data: This includes vehicular, meteorological, and oceanographic seismic
profiles.

HUMAN-GENERATED UNSTRUCTURED DATA examples:


Text internal to your company:
Think of all the text within documents, logs, survey results, and e-mails. Enterprise
information actually represents a large percent of the text information in the world today.
Social media data:
This data is generated from the social media platforms such as YouTube, Facebook, Twitter,
LinkedIn, and Flickr.
Mobile data: This includes data such as text messages and location information.
Website content:
This comes from any site delivering unstructured content, like YouTube, Flickr, or Instagram.

Semi-structured data
Semi-structured data is a kind of data that falls between structured and unstructured data.
Semi-structured data does not necessarily conform to a fixed schema (that is, structure) but
may be self-describing and may have simple label/value pairs.
Examples of semi-structured data include EDI, SWIFT, and XML.
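
To make the idea of self-describing label/value pairs concrete, the following is a minimal
sketch in Python using the standard xml.etree.ElementTree module. The XML document, the tag
names, and the helper parse_records are hypothetical and only serve as an illustration. Note
that the two records do not share the same fields, yet each value still carries its own label.

```python
import xml.etree.ElementTree as ET

# Hypothetical records: tag names and values are illustrative only.
XML_DOC = """
<customers>
  <customer id="1">
    <name>Ani</name>
    <city>Jakarta</city>
  </customer>
  <customer id="2">
    <name>Budi</name>
    <phone>555-0100</phone>
    <email>budi@example.com</email>
  </customer>
</customers>
"""


def parse_records(xml_text):
    """Turn each <customer> element into a dict of label/value pairs."""
    root = ET.fromstring(xml_text.strip())
    records = []
    for element in root.findall("customer"):
        record = dict(element.attrib)        # attributes such as id
        for child in element:
            record[child.tag] = child.text   # each tag labels its own value
        records.append(record)
    return records


records = parse_records(XML_DOC)
for record in records:
    # The set of labels differs per record: there is no fixed schema.
    print(sorted(record))
```

Because the labels travel with the data, a program can read both records without a schema
being declared in advance, which is what separates this kind of data from a fixed table.
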
Semi-structured data can be explained further with the figure below:

VELOCITY
A conventional understanding of velocity typically considers how quickly the data is
arriving and stored, and its associated rates of retrieval. While managing all of that quickly is
good (and the volumes of data that we are looking at are a consequence of how quickly the
data arrives), we believe the idea of velocity is actually something far more compelling than
these conventional definitions.
To accommodate velocity, a new way of thinking about a problem must start at the
inception point of the data. Rather than confining the idea of velocity to the growth rates
associated with your data repositories, we suggest you apply this definition to data in motion:
the speed at which the data is flowing. After all, we're in agreement that today's enterprises
are dealing with petabytes of data instead of terabytes, and the increase in RFID sensors and
other information streams has led to a constant flow of data at a pace that has made it
impossible for traditional systems to handle.
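
One way to picture velocity as the speed of data in motion is to measure how many events
arrive within a recent time window. The sketch below is a minimal Python illustration under
stated assumptions: the class FlowRateMeter, the 10-second window, and the arrival times are
all hypothetical, and timestamps are assumed to arrive in non-decreasing order.

```python
from collections import deque


class FlowRateMeter:
    """Track how many events arrived during the last `window_seconds`."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.timestamps = deque()

    def record(self, timestamp):
        """Register one event and drop events that fell out of the window."""
        self.timestamps.append(timestamp)
        cutoff = timestamp - self.window_seconds
        while self.timestamps and self.timestamps[0] <= cutoff:
            self.timestamps.popleft()

    def rate(self):
        """Return events per second over the current window."""
        return len(self.timestamps) / self.window_seconds


meter = FlowRateMeter(window_seconds=10)
for t in [0, 1, 2, 3, 12]:   # hypothetical arrival times in seconds
    meter.record(t)
current_rate = meter.rate()
print(current_rate)
```

The measurement is taken while the data flows past, without first storing everything in a
repository, which is the shift in thinking that the definition above asks for.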

3.5 How to Use Big Data to Benefit the Company

Big data can be put to many uses in a company. For instance, it can be used for log
analysis, fraud pattern detection, social media pattern analysis, and many more. In this part
of the discussion, we break down some of these uses in order to fully understand why big data
is important and how it helps a company grow and advance.

IT for IT Log Analytics

This is one of the most common uses for an inaugural Big Data project. All of the logs
and trace data generated by the operation of the IT solutions implemented in a
company are considered data exhaust.
Enterprises have a lot of data exhaust, and it is pretty much a pollutant if it is left around
for a couple of hours or days in case it is needed and then simply purged. The problem is
that this data may have concentrated value, and IT shops need to figure out a way to store
it and extract value from it. IT departments have to be able to store logs efficiently,
since today logs are kept for emergencies and discarded as soon as possible. Stored logs
can also be used to look into rare problems.
Log histories are retained today, but usually only for several days or weeks,
because there is too much data for conventional systems to store. This makes it
impossible to determine trends and issues that extend beyond that limited period.
Logs are semi-structured and raw by nature, so they are not always suited to
traditional database processing. Log formats also change constantly because of software
and hardware upgrades, so they cannot be tied to a strict, inflexible analysis paradigm.
Enterprises are trying to get better insight into how their systems are running and
when and how things break down. IBM helped them leverage a Big Data platform that is
able to analyze approximately 1 TB of log data each day. They are now able to
decipher what is happening across the entire stack with each and every transaction.
They can start to build a base of knowledge from it to anticipate and
understand the interaction between failures, to generate best-practice remediation
steps for specific problems, or even to retune the infrastructure to eliminate
them.
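
The point about changing log formats can be sketched in Python. This is a minimal
illustration, not the platform described above: the log lines, the key=value layout, and
the helpers parse_line and count_errors are all hypothetical. The parser extracts whatever
fields a line contains, in any order, so a new field added by an upgrade does not break the
analysis.

```python
import re
from collections import Counter

# Hypothetical log lines. The second line uses a different field order and
# an extra field, as might happen after a software upgrade.
LOG_LINES = [
    "2014-05-01T10:00:00 level=ERROR component=db msg=timeout",
    "2014-05-01T10:00:05 component=web level=INFO msg=ok latency_ms=12",
    "2014-05-01T10:00:09 level=ERROR component=db msg=timeout",
    "malformed line without fields",
]

PAIR = re.compile(r"(\w+)=(\S+)")


def parse_line(line):
    """Extract whatever key=value pairs a line contains, in any order."""
    return dict(PAIR.findall(line))


def count_errors(lines):
    """Count ERROR entries per component; lines without a level are ignored."""
    errors = Counter()
    for line in lines:
        fields = parse_line(line)
        if fields.get("level") == "ERROR":
            errors[fields.get("component", "unknown")] += 1
    return errors


error_counts = count_errors(LOG_LINES)
print(error_counts)
```

Counting errors per component over a long retention period is one simple way to spot the
trends that short log histories hide.
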
Fraud Detection Pattern
Wherever some sort of financial transaction is involved, there is a
potential for misuse and the ubiquitous specter of fraud. By leveraging a Big Data
platform, an enterprise has the opportunity to identify fraud, or even stop it from happening.

Figure 3-1: Modern-day fraud detection ecosystem synergizes a Big Data platform with
traditional processes.

A modern-day fraud detection ecosystem provides a low-cost Big Data platform for
exploratory modelling and discovery. This data can be leveraged by traditional
systems either directly or through integration into existing data quality and governance
protocols. The addition of InfoSphere Streams also gives the ecosystem analytics
for both data-in-motion and data-at-rest.

In one enterprise implementation, it was discovered that the enterprise could not only
build and refresh its fraud detection models more quickly, but also gain broader and
more accurate insight. A process that once took about three weeks, from the moment a
transaction hit the transaction switch until potential fraud was identified, now takes
just a couple of hours. The fraud detection models were also built on a set of data
roughly 50 percent broader than the previous one.
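
The source does not describe the fraud detection model itself. As a stand-in for what
exploratory modelling might look like at its simplest, the Python sketch below flags
transaction amounts that lie far from a customer's historical average. The z-score rule, the
threshold of 3.0, the function flag_outliers, and all amounts are assumptions made for
illustration only.

```python
from statistics import mean, stdev


def flag_outliers(history, new_amounts, threshold=3.0):
    """Return new amounts more than `threshold` standard deviations
    away from the mean of `history` (history needs at least two values)."""
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # All past amounts are identical: anything different is unusual.
        return [a for a in new_amounts if a != mu]
    return [a for a in new_amounts if abs(a - mu) / sigma > threshold]


history = [100, 120, 90, 110, 105, 95]   # hypothetical past amounts
flagged = flag_outliers(history, [102, 900, 130])
print(flagged)
```

A broader history gives the rule a better baseline, which mirrors the benefit reported above
of building models on a wider set of data and refreshing them more often.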

Social Media Pattern


We can use Big Data to find out how customers rate an enterprise, and this
can be used to gauge the impact of the decisions made by the enterprise's executives
and of the way they engage their customers. Specifically, we can determine
how sentiment is impacting sales, the effectiveness or receptiveness of marketing
campaigns, the accuracy of the marketing mix, and much more. This data can be processed
to give basic insight into people's opinions and sentiment.
In the end, however, the more important question is why people say what they say
and why they behave the way they do. Answering it requires enriching the social
media feeds with additional and differently shaped information that likely resides in
other enterprise systems. To do that, the enterprise has to look beyond the feeds
themselves, at the interaction of what people are doing with their behaviour,
financial trends, actual transactions, and so on.

3.6 Why Don't Many Indonesian Companies Use Cloud Computing?


Cloud computing is still new in Indonesia, and some people may not know about it
yet. Because it is still new, there are some issues in introducing cloud computing to
Indonesian people. For example, cloud computing stores data over the internet, so the
internet connection is essential. If there is a problem with the connection, work on the
computer becomes slower because every process takes longer.
Another issue is that a company that uses cloud computing for data storage
depends on the vendor (the provider of the cloud computing service), because
the company has no server of its own under its direct control. If the
vendor has a poor backup service or a broken server, the company suffers a loss.
If a company plans to use cloud computing, it must also provide large bandwidth,
which is another essential requirement. Large bandwidth supports the transfer of the
data being stored. Security and privacy are further issues: when a company
uses the internet, its data can be seen by other people, including people outside the
company, and if it is managed badly, the worst outcome is a fatal error. A lack of
support from other departments for using cloud computing services is another reason why
Indonesian companies do not use cloud computing.
Besides that, crackers or hackers can access the data without
permission and take important information from it. Vendors of cloud computing are
therefore still working on managing the resources used in cloud computing services.

3.7 Measuring the Value of Investment of Big Data

For an enterprise that has implemented Big Data, the most likely way to determine the
value of the investment is by looking at the impact of the implementation. Big Data improves
efficiency in analyzing various data at the same time, but it is capable of far more than
that. If an enterprise is able to analyze every detail of the data it has, it could win the
market, because it is able to determine and direct the sentiment of the market for its benefit.

Chapter 4
CONCLUSION AND SUGGESTION

4.1 CONCLUSION
In cloud computing, the word cloud (also phrased as "the cloud") is used as a
metaphor for "the Internet," so the phrase cloud computing means "a type of
Internet-based computing," where different services such as servers, storage and
applications are delivered to an organization's computers and devices through the
Internet. There are several cloud computing services, such as SaaS, PaaS, and IaaS.
Indonesia also has providers that offer cloud services, such as Biznet, Lintas Media
Danawa, and Telkomsigma. Cloud computing is still categorized as new, and not many
companies use it because they still weigh internet speed, vendor dependency,
bandwidth, and security and privacy matters. Big data has three characteristics
(volume, velocity, and variety) that must be considered in managing it.

4.2 SUGGESTION
In using the cloud, companies still weigh internet speed, vendor dependency,
bandwidth, and security and privacy matters. We therefore recommend that
companies buy their own private cloud network to prevent security and privacy issues.
The bandwidth should be sized according to the estimated bandwidth that will be used. The
bigger the company, the more bandwidth, storage, and privacy it needs.
As for the next issue: as we know, organizations today are facing more and
more Big Data challenges. They have access to a wealth of information, but they don't
know how to get value out of it because it is sitting in its most raw form or in a
semi-structured or unstructured format; as a result, they don't even know whether
it is worth keeping (or whether they are even able to keep it, for that matter). By using
big data, a company can, for example, perform IT log analytics. We think that using big
data would be a good investment for the company.

