Introduction to Data Analytics

Data analytics (DA) is the science of examining raw data with the purpose of drawing conclusions about that information. Data analytics is used in many industries to allow companies and organizations to make better business decisions, and in the sciences to verify or disprove existing models or theories. Data analytics is distinguished from data mining by the scope, purpose, and focus of the analysis. Data miners sort through huge data sets using sophisticated software to identify undiscovered patterns and establish hidden relationships. Data analytics focuses on inference, the process of deriving a conclusion based solely on what is already known by the researcher.

The science is generally divided into exploratory data analysis (EDA), where new features in the data are discovered, and confirmatory data analysis (CDA), where existing hypotheses are tested and either supported or rejected. Qualitative data analysis (QDA) is used in the social sciences to draw
conclusions from non-numerical data like words, photographs or video. In information
technology, the term has a special meaning in the context of IT audits, when the controls for an
organization's information systems, operations and processes are examined. Data analysis is used
to determine whether the systems in place effectively protect data, operate efficiently and
succeed in accomplishing an organization's overall goals.
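
To make the EDA/CDA distinction concrete, here is a minimal Python sketch (the data and column names are hypothetical): exploration summarizes the data to surface patterns, and a confirmatory step then tests one pre-stated hypothesis.

```python
import pandas as pd
from scipy import stats

# Hypothetical sales data; the column names are illustrative only.
df = pd.DataFrame({
    "region":  ["north", "south", "north", "south", "north", "south"],
    "revenue": [120.0, 95.0, 130.0, 88.0, 125.0, 101.0],
})

# EDA: look around for features and patterns first.
print(df.describe())
print(df.groupby("region")["revenue"].mean())

# CDA: test one specific, pre-stated hypothesis,
# e.g. "mean revenue differs between the two regions."
north = df.loc[df["region"] == "north", "revenue"]
south = df.loc[df["region"] == "south", "revenue"]
t_stat, p_value = stats.ttest_ind(north, south)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```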

The term "analytics" has been used by many business intelligence (BI) software vendors as a buzzword to describe quite different functions. Data analytics is used to describe everything from online analytical processing (OLAP) to CRM analytics in call centers. Banks and credit card companies, for instance, analyze withdrawal and spending patterns to prevent fraud or identity theft. E-commerce companies examine Web site traffic or navigation patterns to determine which customers are more or less likely to buy a product or service based upon prior purchases or viewing trends. Modern data analytics often use information dashboards supported by real-time data streams. So-called real-time analytics involves dynamic analysis and reporting based on data entered into a system less than one minute before the actual time of use.
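
As a sketch of what "less than one minute before the actual time of use" can mean in code, the following hypothetical Python snippet keeps a rolling one-minute window of events and reports an up-to-the-minute average; production systems use stream processors, but the windowing idea is the same.

```python
import time
from collections import deque

WINDOW_SECONDS = 60          # "real time" here: data less than a minute old
events = deque()             # (timestamp, value) pairs, oldest on the left

def record(value: float) -> None:
    """Register a new data point with the current timestamp."""
    events.append((time.time(), value))

def rolling_average() -> float:
    """Average of all values recorded within the last WINDOW_SECONDS."""
    cutoff = time.time() - WINDOW_SECONDS
    while events and events[0][0] < cutoff:
        events.popleft()     # drop anything older than the window
    return sum(v for _, v in events) / len(events) if events else 0.0

record(10.0)
record(14.0)
print(rolling_average())     # 12.0, computed only from sub-minute-old data
```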

Big data analytics is a trending practice that many companies are adopting. Before jumping in and buying big data tools, though, organizations should first get to know the landscape. Data analytics examines data to uncover hidden patterns, correlations and other insights. With today's technology, it's possible to analyze your data and get answers from it almost immediately, an effort that is slower and less efficient with more traditional business intelligence solutions.

Types of Data Analytics

There are four basic types of data analytics commonly used by corporations. The four types are described below:

1. PRESCRIPTIVE ANALYTICS: Prescriptive analytics is really valuable, but largely underused. According to Gartner, 13 percent of organizations are using predictive analytics, but only 3 percent are using prescriptive analytics. Where big data analytics in general sheds light on a subject, prescriptive analytics gives you a laser-like focus to answer specific questions. For example, in the health care industry, you can better manage the patient population by using prescriptive analytics to measure the number of patients who are clinically obese, then add filters for factors like diabetes and LDL cholesterol levels to determine where to focus treatment; a small sketch of this filtering step follows. The same prescriptive model can be applied to almost any industry, target group, or problem.
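
The patient-population example boils down to successive filters over a data set. A minimal Python sketch, with entirely hypothetical records and thresholds:

```python
import pandas as pd

# Hypothetical patient records; columns and cutoffs are illustrative only.
patients = pd.DataFrame({
    "patient_id": [1, 2, 3, 4, 5],
    "bmi":        [32.5, 27.0, 41.2, 30.4, 24.8],
    "diabetic":   [True, False, True, True, False],
    "ldl_mg_dl":  [165, 110, 190, 142, 98],
})

# Start broad: patients who are clinically obese (BMI >= 30).
obese = patients[patients["bmi"] >= 30]

# Add filters, as the text describes: diabetic patients with elevated LDL.
focus_group = obese[obese["diabetic"] & (obese["ldl_mg_dl"] > 130)]
print(focus_group[["patient_id", "bmi", "ldl_mg_dl"]])
```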

2. PREDICTIVE ANALYTICS: Predictive analytics uses big data to identify past patterns and predict the future. For example, some companies are using predictive analytics for sales lead scoring. Some companies have gone a step further and use predictive analytics for the entire sales process, analyzing lead source, number of communications, types of communications, social media, documents, CRM data, etc. Properly tuned predictive analytics can be used to support sales, marketing, or other types of complex forecasts; a minimal lead-scoring sketch follows.
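
A minimal lead-scoring sketch, assuming scikit-learn is available; the features and data are hypothetical, but the pattern (fit on historical outcomes, then score new leads) is the one described above.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical historical leads: [num_communications, documents_viewed]
X = np.array([[1, 0], [2, 1], [8, 4], [10, 5], [3, 1], [12, 6]])
y = np.array([0, 0, 1, 1, 0, 1])   # 1 = the lead eventually converted

model = LogisticRegression().fit(X, y)

# Score a new lead: estimated probability of conversion from past patterns.
new_lead = np.array([[7, 3]])
print(f"conversion probability: {model.predict_proba(new_lead)[0, 1]:.2f}")
```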

3. DIAGNOSTIC ANALYTICS: Diagnostic analytics is used for discovery, or to determine why something happened. For example, for a social media marketing campaign, you can use diagnostic analytics to assess the number of posts, mentions, followers, fans, page views, reviews, pins, etc. Thousands of online mentions can be distilled into a single view to see what worked in your past campaigns and what didn't.

4. DESCRIPTIVE ANALYTICS: Descriptive analytics, or data mining, sits at the bottom of the big data value chain, but it can be valuable for uncovering patterns that offer insight. A simple example of descriptive analytics would be assessing credit risk: using past financial performance to predict a customer's likely future financial performance. Descriptive analytics can also be useful in the sales cycle, for example, to categorize customers by their likely product preferences and stage in the sales cycle; a short summary-statistics sketch follows.
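
Descriptive analytics, at its simplest, is summary statistics over historical data. A short pandas sketch with hypothetical sales records:

```python
import pandas as pd

# Hypothetical customer purchase history; names are illustrative only.
sales = pd.DataFrame({
    "customer": ["a", "a", "b", "b", "c", "c"],
    "product":  ["basic", "basic", "pro", "pro", "basic", "pro"],
    "amount":   [100, 120, 400, 380, 90, 410],
})

# Summarize what has already happened: observed preferences and spend.
summary = sales.groupby(["customer", "product"])["amount"].agg(["count", "sum"])
print(summary)
```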

Top 10 Tools of Data Analytics For Business

Although the challenge of collecting and analyzing "Big Data" requires some complex and technical solutions, the fact is that most businesses do not realize what they are currently capable of. Specifically, there are a number of exceptionally powerful analytical tools, free and open source, that anyone can leverage today to enhance their business and develop skills. These 10 tools of data analytics have been chosen because of their free availability (for personal use), ease of use (no coding and intuitively designed), powerful capabilities (beyond basic Excel), and well-documented resources.

1. TABLEAU PUBLIC: Tableau democratizes visualization in an elegantly simple and intuitive tool. It is exceptionally powerful in business because it communicates insights
through data visualization. Although great alternatives exist, Tableau Public's million row
limit provides a great playground for personal use and the free trial is more than long
enough to get you hooked. In the analytics process, Tableau's visuals allow you to quickly
investigate a hypothesis, sanity check your gut, and just go explore the data before
embarking on a treacherous statistical journey.
2. OPEN REFINE: Formerly Google Refine, OpenRefine is data cleaning software that allows a person to get everything ready for analysis. For example, I was recently cleaning up a database that included chemical names and noticed that rows had different spellings, capitalization, spacing, etc. that made it very difficult for a computer to process. Fortunately, OpenRefine contains a number of clustering algorithms (which group together similar entries) and makes quick work of an otherwise messy problem; the sketch below shows the idea behind this kind of clustering.
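
The key-collision idea behind that kind of clustering is simple to sketch in Python; this illustrates the approach, not OpenRefine's exact implementation.

```python
import re
from collections import defaultdict

def fingerprint(name: str) -> str:
    """Normalize a messy string roughly the way key-collision clustering
    does: lowercase, strip punctuation, sort the unique tokens."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    return " ".join(sorted(set(tokens)))

# Hypothetical chemical names with inconsistent spelling and spacing.
names = ["Sodium Chloride", "sodium  chloride", "chloride, sodium",
         "Ethanol", " ethanol"]

clusters = defaultdict(list)
for n in names:
    clusters[fingerprint(n)].append(n)

for key, group in clusters.items():
    print(key, "->", group)
```
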
3. KNIME: KNIME allows one to manipulate, analyze, and model data in an incredibly intuitive way through visual programming. Essentially, rather than writing blocks of code, you drop nodes onto a canvas and drag connection points between activities. More importantly, KNIME can be extended to run R, Python, text mining, chemistry data, etc., which gives a person the option to dabble in more advanced, code-driven analysis. Tip: use the "File Reader" node instead of the CSV Reader for CSV files; it is a strange quirk of the software.
4. RAPID MINER: Much like KNIME, RapidMiner operates through visual programming and is capable of manipulating, analyzing and modeling data. Most recently, RapidMiner won the KDnuggets software poll, demonstrating that data science does not need to be a counter-intuitive coding endeavor.
5. GOOGLE FUSION TABLES: Google Fusion tables are an incredible tool for data
analysis, large data-set visualization, and mapping. Not surprisingly, Google's incredible
mapping software plays a big role in pushing this tool onto the list.
6. NODEXL: NodeXL is visualization and analysis software for networks and relationships. Think of the giant friendship maps you see that represent LinkedIn or Facebook connections. NodeXL takes that a step further by providing exact calculations (a Python sketch of similar metrics follows this entry). If you're
looking for something a little less advanced, check out the node graph on Google Fusion
Tables, or for a little more visualization try out Gephi.
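
For readers who do want to script it, the same kinds of exact network calculations can be sketched in Python with the networkx library; the friendship graph here is hypothetical.

```python
import networkx as nx

# A tiny hypothetical friendship network.
G = nx.Graph()
G.add_edges_from([("ann", "bob"), ("ann", "cat"), ("bob", "cat"),
                  ("cat", "dan"), ("dan", "eve")])

print(nx.degree_centrality(G))       # who has the most connections
print(nx.betweenness_centrality(G))  # who bridges otherwise separate groups
```
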
7. IMPORT.IO: Web scraping, pulling information off of websites, used to be something reserved for the nerds. Now, with Import.io, everyone can harvest data from websites and forums. A person simply has to highlight what he wants, and in a matter of minutes Import.io walks him through and "learns" what he was looking for. From there, Import.io will dig, scrape, and pull the data for analysis or export.
8. GOOGLE SEARCH OPERATORS: Google is an undeniably powerful resource, and search operators take it a step up. Operators essentially allow anyone to quickly filter Google results to get to the most useful and relevant information. For instance, say Sanghamitra is looking for a data science report published this year by ABC Consulting; combining operators such as exact-phrase quotes, site:, and filetype:pdf would narrow the results to exactly that document.
9. SOLVER: Solver is an optimization and linear programming tool in Excel that allows a person to set constraints. Although advanced optimization may be better suited to another program (such as R's optim package), Solver will make quick work of a wide range of problems; a Python sketch of the same kind of problem follows.
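
A Solver-style problem, sketched in Python with scipy's linear programming routine; the product-mix numbers are made up for illustration.

```python
from scipy.optimize import linprog

# Maximize profit 3x + 5y subject to resource constraints, x, y >= 0.
# linprog minimizes, so the objective is negated to maximize.
c = [-3, -5]
A_ub = [[1, 2],    # labor hours:    x + 2y <= 14
        [3, 1]]    # material units: 3x + y <= 18
b_ub = [14, 18]

result = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
print("optimal mix:", result.x, "profit:", -result.fun)
```
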
10. WOLFRAM ALPHA: Wolfram Alpha's search engine is one of the web's hidden gems and helps to power Apple's Siri. Beyond snarky remarks, Wolfram Alpha is the nerdy Google: it provides detailed responses to technical searches and makes quick work of calculus homework. For business users, it presents information in charts and graphs, and is excellent for high-level pricing history, commodity information, and topic overviews.

Powering Analytics: Inside Big Data And Advanced Analytics Tools

A Google search for big data analytics yields a long list of vendors. However, many of these vendors provide big data platforms and tools that support the analytics process -- for example, data integration, data preparation and other types of data management software. We focus on tools that meet the following criteria:

They provide the analyst with advanced analytics algorithms and models.

They're engineered to run on big data platforms such as Hadoop or specialty high-performance analytics systems.

They're easily adaptable to use structured and unstructured data from multiple sources.

Their performance is capable of scaling as more data is incorporated into analytical models.

Their analytical models can be or already are integrated with data visualization and presentation tools.

They can easily be integrated with other technologies.


The Characteristics That The Tools Must Include
CLUSTERING AND SEGMENTATION: which divides a large collection of entities into smaller groups that exhibit some (potentially unanticipated) similarities. An example is analyzing a collection of customers to differentiate smaller segments for targeted marketing (see the sketch after this list).

CLASSIFICATION: which is a process of organizing data into predefined classes based on attributes that are either pre-selected by an analyst or identified as a result of a clustering model. An example is using the segmentation model to determine into which segment a new customer would be categorized.

REGRESSION: which is used to discover relationships among a dependent variable and one or more independent variables, and helps determine how the dependent variable's values change in relation to the independent variable values. An example is using geographic location, mean income, average summer temperature and square footage to predict the future value of a property.

ASSOCIATION AND ITEM SET MINING: which looks for statistically relevant
relationships among variables in a large data set. For example, this could help direct call-
center representatives to offer specific incentives based on the caller's customer segment,
duration of relationship and type of complaint.

SIMILARITY AND CORRELATION: which is used to inform undirected clustering algorithms. Similarity-scoring algorithms can be used to determine the similarity of entities placed in a candidate cluster.

NEURAL NETWORKS: which are used in undirected analysis for machine learning
based on adaptive weighting and approximation.

This is just a subset of the types of analyses used for predictive and prescriptive analytics. In
addition, different vendors are likely to provide a variety of algorithms supporting each of the
different methods.
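
As promised after the clustering and classification items above, here is a minimal sketch of both steps with scikit-learn; the customer features are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical customers: [annual_spend, visits_per_month]
X = np.array([[200, 1], [220, 2], [210, 1],     # a low-spend group
              [900, 8], [950, 9], [880, 7]])    # a high-spend group

# Clustering/segmentation: discover groups without predefined labels.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print("segments:", kmeans.labels_)

# Classification: assign a *new* customer to one of the discovered segments.
new_customer = np.array([[870, 6]])
print("new customer belongs to segment:", kmeans.predict(new_customer)[0])
```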

How Data Analytics Tools Can Help The Organization



The analytics process, including the deployment and use of big data analytics tools, can help
companies improve operational efficiency, drive new revenue and gain competitive advantages
over business rivals. But there are different types of analytics applications to consider. For
example, descriptive analytics focuses on describing something that has already happened, as
well as suggesting its root causes. Descriptive analytics, which remains the lion's share of the
analysis performed, typically hinges on basic querying, reporting and visualization of historical
data.

Alternatively, more complex predictive and prescriptive modeling can help companies anticipate
business opportunities and make decisions that affect profits in areas such as targeting marketing
campaigns, reducing customer churn and avoiding equipment failures. With predictive analytics,
historical data sets are mined for patterns indicative of future situations and behaviors, while
prescriptive analytics subsumes the results of predictive analytics to suggest actions that will best
take advantage of the predicted scenarios.

In many environments, the processing and data storage demands of advanced analytics
applications have limited their adoption -- but those barriers are beginning to fall. The growing
availability of big data platforms and big data analytics tools has enabled environments in which
predictive and prescriptive analytics applications can scale to handle massive data volumes
originating from a wide variety of sources.

The Advanced Analytics Market


The market for advanced analytics tools has evolved over time, and the types of tools that are
available vary in degree of maturity and, consequently, in capability and ease of use. For
example, there are tools with relatively long histories from some mega-vendors like IBM, Oracle
and SAS. Other large vendors have acquired companies whose tools have a more recent history,
such as those provided by Microsoft, Dell, Teradata and SAP.

A number of smaller companies provide big data analytics products, including Angoss,
Predixion, Alteryx, Alpine Data Labs, Pentaho, KNIME and Rapid Miner. In some cases,
companies have developed their own suite of algorithms. Others have adapted the open source
statistical R language and provide predictive and prescriptive modeling capabilities using R's
features, or use software from the open source Weka project.

A third category of products is those available as open source technologies. Examples include the previously mentioned R language, the Mahout software distribution that's part of the Hadoop stack, and Weka.

In some of these cases (particularly with the mega-vendors), the big data analytics tools are
incorporated into larger big data enterprise suites. In others, the tools are sold as standalone
products. In the latter case, it's the customer's job to integrate with the big data platform being
deployed. Most of the tools provide a visual interface to guide the analytics processes (data
mining/discovery analysis, evaluation and scoring of models, integration with operational
environments), and in most cases, the vendors provide guidance and services to get the customer
up and running.

Who Uses Big Data And Advanced Analytics Tools


While some individuals in the organization are looking to explore and devise new predictive
models, others look to embed these models within their business processes, and still others will
want to understand the overall impact that these tools will have on the business. In other words,
organizations that are adopting big data analytics need to accommodate a variety of user types,
such as:

The data scientist: who likely performs more complex analyses involving more complex
data types and is familiar with how underlying models are designed and implemented to
assess inherent dependencies or biases.

The business analyst: who is likely a more casual user looking to use the tools for
proactive data discovery or visualization of existing information, as well as some
predictive analytics.

The business manager: who is looking to understand the models and conclusions.

IT developers: who support all the prior categories of users.

All of these roles would typically work together in the model development lifecycle. The data
scientist subjects a swath of big data sets to the undirected analyses provided, and looks for any
patterns that would be of business interest. After engaging the business analyst to review how the
models work and evaluate how each of those discovered models or patterns could potentially
positively affect the business, the business manager and IT teams are brought in to embed or
integrate the models into business processes or devise new processes around the models.

From a market perspective, though, it's interesting to consider the types of businesses that are
embracing big data analytics. Many of the early users of big data technologies were Internet
companies (e.g., Google, Yahoo, Facebook, LinkedIn and Netflix) or analytics services
providers. Each of these companies relied on operational and analytical applications requiring
fast-flowing streams of data to ingest, process, analyze, and then feed the results back to
continuously improve performance.

As appetites for data expand among companies in more mainstream industries, big data analytics
has found a place in a more general corporate population. In the past, the cost factors for a large-
scale analytics platform would have limited the adoption to only the very largest businesses.
However, the availability of utility-style hosted big data platforms (such as those available via
Amazon Web Services) and the ability to instantiate big data platforms such as Hadoop on-
premises without a large investment have reduced the barrier to entry. In addition, open data sets
and accessibility to fire hose data feeds from social media channels provide the raw material for
larger-scale data analyses when blended with internal data sets.

Larger businesses may still opt for high-end big data analytics tools, but lower-cost alternatives
deployed on cost-effective platforms enable small and medium-size businesses to evaluate and
launch big data analytics programs and achieve the desired business improvement results.

Now that we've examined the different types of tools and their uses, the next step is to determine
how these tools could benefit your company. By taking a look at the various use cases for big
data analytics, you will begin to see where a general big data analytics capability can be
leveraged for creating and enhancing value.

10 Data Analytics Privacy Problems


Big data analytics is being used more widely every day, for an ever wider variety of reasons.
These new methods of applying analytics certainly can bring innovative improvements for
business. For example, retail businesses are successfully using big data analytics to predict the
hot items each season, and to predict geographic areas where demand will be greatest, just to
name a couple of uses. The power of big data analytics is so great that in addition to all the
positive business possibilities, there are just as many new privacy concerns being created. Here
are ten of the most significant privacy risks.
1. PRIVACY BREACHES AND EMBARRASSMENTS

The actions taken by businesses and other organizations as a result of big data analytics
may breach the privacy of those involved, and lead to embarrassment and even lost jobs.
Consider that some retailers have used big data analysis to predict intimate personal details, such as the due dates of pregnant shoppers. In some such cases, subsequent marketing activities led members of the household to discover that a family member was pregnant before she had told anyone, resulting in an uncomfortable and damaging family situation. Retailers, and other types of businesses, should not take actions that result in such situations.

2. ANONYMIZATION COULD BECOME IMPOSSIBLE

With so much data, and with powerful analytics, it could become impossible to completely remove the ability to identify an individual if there are no rules established for the use of anonymized data files. For example, if one anonymized data set is combined with another, completely separate database without first determining whether any other data items should be removed prior to combining to protect anonymity, it is possible individuals could be re-identified, as the sketch below illustrates. The important and necessary key that is usually missing is establishing the rules and policies for how anonymized data files can be combined and used together.
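
A toy illustration of the risk in Python: two data sets, each apparently anonymous, re-identify individuals once joined on shared quasi-identifiers (all records here are invented).

```python
import pandas as pd

# "Anonymized" medical records: names removed, quasi-identifiers kept.
medical = pd.DataFrame({
    "zip": ["02138", "02139"], "birth_year": [1965, 1978],
    "sex": ["F", "M"], "diagnosis": ["diabetes", "asthma"],
})

# A separate public data set (say, a voter roll) that includes names.
public = pd.DataFrame({
    "name": ["Alice Smith", "Bob Jones"],
    "zip": ["02138", "02139"], "birth_year": [1965, 1978],
    "sex": ["F", "M"],
})

# Joining on the shared quasi-identifiers re-identifies both individuals.
reidentified = public.merge(medical, on=["zip", "birth_year", "sex"])
print(reidentified[["name", "diagnosis"]])
```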

3. DATA MASKING COULD BE DEFEATED TO REVEAL PERSONAL INFORMATION

If data masking is not used appropriately, big data analysis could easily reveal the actual individuals whose data has been masked. Organizations must establish effective policies, procedures and processes for using data masking to ensure privacy is preserved; a minimal masking sketch follows. Since big data analytics is so new, most organizations don't realize there are risks, so they use data masking in ways that could breach privacy. Many resources are available, such as those from IBM, to provide guidance in data masking for big data analytics.
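
One common masking technique replaces identifiers with salted one-way hashes. A minimal sketch (not a complete masking policy; deterministic tokens can still be linked across releases, which is exactly why the policies discussed above matter):

```python
import hashlib
import secrets

# One secret salt per data set; without it, an attacker cannot unmask
# values simply by hashing guesses.
SALT = secrets.token_hex(16)

def mask(value: str) -> str:
    """Replace an identifier with a salted, one-way hash token."""
    return hashlib.sha256((SALT + value).encode()).hexdigest()[:12]

# The same input always yields the same token, so analysts can still
# join and count records without ever seeing the raw identifier.
print(mask("alice@example.com"))
print(mask("alice@example.com"))
```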

4. UNETHICAL ACTIONS BASED ON INTERPRETATIONS

Data analytics can be used to try to influence behaviors, and there are many ethical issues with driving behavior. Just because you CAN do something doesn't mean you should. For example, in the movie Fight Club, Edward Norton's character's job was to determine whether an automobile manufacturer should do a recall based strictly on financial considerations, without taking into account the associated health risks; in other words, whether it is cheaper to let people be killed or injured than to fix the faulty equipment in the vehicles. Big data analytics can be used by organizations to make a much wider variety of business decisions that do not take into account the human lives that are involved. The potential to reveal personal information simply because doing so is not illegal, even though it can damage the lives of individuals, must be considered.

5. DATA ANALYTICS ARE NOT 100% ACCURATE

While big data analytics are powerful, the predictions and conclusions that result are not always accurate. The data files used for big data analysis can often contain inaccurate data about individuals, use data models that are incorrect as they relate to particular individuals, or simply rely on flawed algorithms (the results of big data analytics are only as good, or bad, as the computations used to get those results). These risks increase as more data is added to data sets, and as more complex data analysis models are used without including rigorous validation within the analysis process. As a result, organizations could make bad decisions and take inappropriate and damaging actions. When decisions involving individuals are made based upon inaccurate data or flawed models, individuals can suffer harm by being denied services, being falsely accused or misdiagnosed, or otherwise being treated inappropriately.

6. DISCRIMINATION

Using big data analytics to try to choose job candidates, give promotions, etc. may backfire if the analytics are not truly objective. Discrimination has been a problem for years, of course, but the danger is that big data analytics makes it more prevalent, a kind of automated discrimination if you will. For example, a bank or other type of financial organization may not be able to tell from a credit application the applicant's race or sexual orientation (and it is generally illegal to base a credit decision upon race), but it could deduce race or sexual orientation from a wide variety of data collected online and through the Internet of Things (IoT), using big data analytics, and then turn down a loan after obtaining and learning such information.

7. FEW (IF ANY) LEGAL PROTECTIONS EXIST FOR THE INVOLVED INDIVIDUALS

Most organizations still address privacy risks only as explicitly required by existing data protection laws, regulations and contractual requirements. While the U.S. White House, the Federal Trade Commission, and others have recently expressed concern about the privacy risks created by using big data analytics, there are no legal requirements for how to protect privacy while using big data analytics.

8. BIG DATA WILL PROBABLY EXIST FOREVER

There are many studies and articles regarding the use of big data and data analytics in organizations, but none of the organizations studied indicate that they will ever delete their big data repositories. In fact, all have indicated that they instead typically view them as infinitely growing repositories; the bigger the better! As more data is collected and retained, the more easily analytics will be able to determine more insights into individuals' lives.

9. CONCERNS FOR E-DISCOVERY

There was a flurry of articles written in the past year about the e-discovery problems created by big data analytics. The e-discovery process generally requires organizations to identify and produce documents relevant to litigation. When dealing with millions of documents, as most organizations now have in their repositories, this becomes an expensive, time-consuming activity. A big data analytics approach called predictive coding is now starting to be used on these huge repositories to more quickly narrow down the documents most likely to be relevant to litigation, which individuals can then review more closely; a minimal sketch of the idea follows. There are concerns that by using such analytics to produce documents, an organization may be accused of not including all the necessary documents.
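
Under the hood, predictive coding is essentially supervised text classification: train on a sample of reviewed documents, then rank the rest by predicted relevance. A minimal sketch with scikit-learn and invented documents:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# A small hypothetical sample labeled by human reviewers (1 = relevant).
docs = ["contract breach by supplier", "holiday party schedule",
        "supplier penalty clause dispute", "cafeteria menu update"]
labels = [1, 0, 1, 0]

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(docs, labels)

# Rank unreviewed documents by predicted relevance to the litigation.
unreviewed = ["dispute over contract terms", "new parking rules"]
for doc, score in zip(unreviewed, model.predict_proba(unreviewed)[:, 1]):
    print(f"{score:.2f}  {doc}")
```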

10. MAKING PATENTS AND COPYRIGHTS IRRELEVANT

There is concern that big data could make patents harder to obtain, because patent offices will not be able to verify whether a submitted patent is unique: there will be too much data to check across the growing number of big data repositories. Big data could make copyrights a thing of the past, because it will be too hard to control information that can be hidden or propagated infinitely within big data repositories. As an associated effect, the royalties associated with copyrighted information are expected to decrease or possibly disappear altogether.

The Future Of Analytics: Top 5 Predictions For 2016

MACHINE LEARNING ESTABLISHED IN THE ENTERPRISE


Machine learning dates back to at least 1950 but until recently has been the domain of
elites and subject to winters of inattention. I predict that it is here to stay, because large
enterprises are embracing it. In addition to researchers and digital natives, these days
established companies are asking how to move machine learning into production. Even in
regulated industries, where low interpretability of models has historically choked their
usage, practitioners are finding creative ways to use machine learning techniques to select
variables for models, which can then be formulated using more commonly accepted
techniques.

Expect greater interest across academic disciplines, because machine learning benefits from many different approaches. Consider the popular keynote from the INFORMS Annual Meeting last year, where Dimitris Bertsimas talked about "Statistics and Machine Learning via a Modern Optimization Lens."

INTERNET OF THINGS HYPE HITS REALITY


The Internet of Things (IoT) is at the peak of the Gartner Hype Cycle, but in 2016 I expect this hype to hit reality. One real barrier is plumbing; there's a lot of it! One of my colleagues is analyzing the HVAC system on our newest building as an IoT test project. The building is replete with sensors, but getting to the data was not easy. Facilities told him the data are the domain of IT, who then sent him to the manufacturer, because while the HVAC system collects the data, it sends them to the manufacturer.

Data ownership is an emerging issue: you produce the data but may not have access to it. An even larger challenge for IoT will be to prove its value. There are limited implementations of IoT in full production at the enterprise level. The promise of IoT is fantastic, so in 2016 look to early adopters to work out the kinks and deliver results.

BIG DATA MOVES BEYOND HYPE TO ENRICH MODELING


Big data has moved beyond hype to provide real value. Modelers today can access a wider-than-ever range of data types (e.g., unstructured data, geospatial data, images, voice), which offer great opportunities to enrich models. Another new gain from big data is due to competitions, which have moved beyond gamification to provide real value via crowdsourcing and data sharing. Consider the Prostate Cancer DREAM Challenge, where teams were challenged to address open clinical research questions using anonymized data provided by four different clinical trials run by multiple providers, much of it publicly available for the first time. An unprecedented number of teams competed, and the winners beat existing models developed by the top researchers in the field.

CYBER SECURITY IMPROVED VIA ANALYTICS


As IoT grows, the expanding use of sensors must thrill cybercriminals, who use these devices to hack in using a slow but insidious Trojan Horse approach. Many traditional fraud detection techniques do not apply, because detection is no longer a matter of seeking one rare event but requires understanding an accumulation of events in context. Similar to IoT, one challenge of cyber security involves data, because streaming data is managed and analyzed differently. I expect advanced analytics to shed new light on detection and prevention as our methods catch up with the data.

Unfortunately, growing methods for big data collaboration are off limits here, because we don't want the bad guys to know how we'll find them, and much of the best work is done behind high security clearance. But that won't stop SAS and others from focusing heavily on cyber security in 2016.

ANALYTICS DRIVES INCREASED INDUSTRY-ACADEMIC INTERACTION

The Institute for Advanced Analytics (IAA) at NC State University tracks the growth in analytics master's programs, and new programs seem to pop up daily. Industry demand for recruits fuels this growth, but I see increased interest in research. More companies are setting up academic outreach with an explicit interest in research collaborations. Sometimes this interest goes beyond partnership and into direct hiring of academic superstars, who either take sabbaticals, work on the side, or even go back and forth. For example, top machine learning researcher Yann LeCun worked at Bell Labs, became a professor at NYU, was the founding director of the NYU Center for Data Science, and now leads Artificial Intelligence Research at Facebook.

INFORMS supports this academic-industry collaboration by providing academics a resource of teaching materials related to analytics. In 2016 INFORMS will offer industry a searchable database of analytics programs to facilitate connections and the new Associate Certified Analytics Professional credential to help vet recent graduates.

How Companies are using Data Analytics


Senior leaders provide insight into the challenges and opportunities of using data analytics. There is little dispute that organizations have more data than ever at their disposal. But actually deriving meaningful insights from that data, and converting knowledge into action, is easier said than done. Some business leaders have shared their views on the challenges and opportunities involved in adopting data analytics. The challenges are stated below:

Murli Buluswar, chief science officer, AIG: The biggest challenge of making the evolution from a knowing culture to a learning culture, from a culture that largely depends on heuristics in decision making to a culture that is much more objective and data driven and embraces the power of data and technology, is really not the cost. Initially, it largely ends up being imagination and inertia.
Ruben Sigala, chief analytics officer, Caesars Entertainment: What he has found challenging is finding the set of tools that enable organizations to efficiently generate value through the process. He said: "I hear about individual wins in certain applications, but having a more cohesive ecosystem in which this is fully integrated is something that I think we are all struggling with, in part because it's still very early days. Although we've been talking about it seemingly quite a bit over the past few years, the technology is still changing; the sources are still evolving."

Zoher Karu, vice president, global customer optimization and data, eBay: One of the biggest challenges is around data privacy and what is shared versus what is not shared. And his perspective on that is that consumers are willing to share if there's value returned. One-way sharing is not going to fly anymore. So how do we protect and how do we harness that information and become a partner with our consumers rather than just a vendor for them?

Capturing Impact From Analytics


Ruben Sigala: You have to start with the charter of the organization. You have to be very specific about the aim of the function within the organization and how it's intended to interact with the broader business. There are some organizations that start with a fairly focused view around support on traditional functions like marketing, pricing, and other specific areas. And then there are other organizations that take a much broader view of the business. I think you have to define that element first.

That helps best inform the appropriate structure, the forums, and then ultimately it sets the more granular levels of operation such as training, recruitment, and so forth. But alignment around how you're going to drive the business and the way you're going to interact with the broader organization is absolutely critical. From there, everything else should fall in line. That's how we started with our path.

Vince Campisi, chief information officer, GE Software: One of the things we've learned is that when we start and focus on an outcome, it's a great way to deliver value quickly and get people excited about the opportunity. And it's taken us to places we hadn't expected to go before. So we may go after a particular outcome and try to organize a data set to accomplish that outcome. Once you do that, people start to bring other sources of data and other things that they want to connect. And it really takes you to a place where you go after a next outcome that you didn't anticipate going after before. You have to be willing to be a little agile and fluid in how you think about things. But if you start with one outcome and deliver it, you'll be surprised at where it takes you next.

Ash Gupta, chief risk officer, American Express: The first change we had to make was just to make our data of higher quality. We have a lot of data, and sometimes we just weren't using that data and we weren't paying as much attention to its quality as we now need to. The first step was to make sure that the data has the right lineage, that the data has the right permissible purpose to serve the customers. This, in my mind, is a journey. We made good progress and we expect to continue to make this progress across our system.

The second area is working with our people and making certain that we are centralizing some aspects of our business. We are centralizing our capabilities and democratizing their use. I think the other aspect is that we recognize as a team and as a company that we ourselves do not have sufficient skills, and we require collaboration across all sorts of entities outside of American Express. This collaboration comes from technology innovators, it comes from data providers, it comes from analytical companies. We need to put a full package together for our business colleagues and partners so that it's a convincing argument that we are developing things together, that we are co-learning, and that we are building on top of each other.

Examples Of Impact
Victor Nilson, senior vice president, big data, AT&T: We always start with the customer experience. That's what matters most. In our customer care centers now, we have a large number of very complex products. Even the simple products sometimes have very complex potential problems or solutions, so the workflow is very complex. So how do we simplify the process for both the customer-care agent and the customer at the same time, whenever there's an interaction?

We've used data analytics techniques to analyze all the different permutations to augment that experience and more quickly resolve or enhance a particular situation. We take the complexity out and turn it into something simple and actionable. Simultaneously, we can then analyze that data and go back and ask, "Are we optimizing the network proactively in this particular case?" So we apply the optimization not only to customer care but also to the network, and then tie the two together as well.

Vince Campisi: I'll give you one internal perspective and one external perspective. One is what we call enabling a "digital thread": how you can connect innovation through engineering, manufacturing, and all the way out to servicing a product. [For more on the company's digital-thread approach, see "GE's Jeff Immelt on digitizing in the industrial space."] And, within that, we've got a focus around the "brilliant factory." Take driving supply-chain optimization as an example. We've been able to take over 60 different silos of information related to direct-material purchasing, leverage analytics to look at new relationships, and use machine learning to identify tremendous amounts of efficiency in how we procure the direct materials that go into our products.

An external example is how we leverage analytics to make assets perform better. We call it asset performance management. And we're starting to enable digital industries, like a digital wind farm, where you can leverage analytics to help the machines optimize themselves. You can help a power-generating provider that uses the same wind that's coming through: by having the turbines pitch themselves properly and understand how to optimize for that level of wind, we've demonstrated the ability to produce up to 10 percent more energy from the same amount of wind. It's an example of using analytics to help a customer generate more yield and more productivity out of their existing capital investment.

Winning The Talent War


Ruben Sigala: Competition for analytical talent is extreme. And preserving and maintaining a base of talent within an organization is difficult, particularly if you view this as a core competency. What we've focused on mostly is developing a platform that speaks to what we think is a value proposition that is important to the individuals who are looking to begin or sustain a career within this field.

When we talk about the value proposition, we use terms like having an opportunity to truly affect the outcomes of the business, and having a wide range of analytical exercises that you'll be challenged with on a regular basis. But, by and large, it means being part of an organization that views this as a critical part of how it competes in the marketplace, and then executing against that regularly. To do that well, you have to have good training programs, you have to have very specific forms of interaction with the senior team, and you also have to be part of the organization that actually drives the strategy for the company.

Murli Buluswar: I have found that focusing on the fundamentals of why science was created,
what our aspirations are, and how being part of this team will shape the professional evolution of
the team members has been pretty profound in attracting the caliber of talent that we care about.
And then, of course, comes the even harder part of living that promise on a day-in, day-out basis.

Yes, money is important. My philosophy on money is that I want to be in the 75th percentile range; I don't want to be in the 99th percentile. Because no matter where you are, most people, especially people in the data-science function, have the ability to get a 20 to 30 percent increase in their compensation should they choose to make a move. My intent is not to try to reduce that gap. My intent is to create an environment and a culture where they see that they're learning; they see that they're working on problems that have a broader impact on the company, on the industry, and, through that, on society; and they're part of a vibrant team that is inspired by why it exists and how it defines success. Focusing on that, to me, is an absolutely critical enabler to attracting the caliber of talent that I need and, for that matter, anyone else would need.

Case Study: Data Analytics In Cloud Manufacturing
In this section, the application of Data Analytics in manufacturing systems is used as a case
study. A manufacturing system needs information systems to make decisions for the system
operations at different levels and domains. The complexity of an information system depends on
the numbers of inputs and outputs as well as their relations. In manufacturing applications,
numerous researchers have investigated the impact of IT, such as mainstream computers, wireless
networks, and the Internet, on the development of information systems (Bi, Xu, & Wang, 2014;
Dumitrache & Caramihai, 2010; Koenig, 2013; Lee & Lapira, 2014; Wang, Wang, Gao, &
Vancza, 2014). In the following sections, we look into
(1) The change trends of scale and complexity of information systems;
(2) Available hardware and software IT; and
(3) The requirements of IoT-based information systems.

We pay special attention to the roles of Big Data in cloud manufacturing, since cloud manufacturing has been identified as the most promising paradigm for next-generation manufacturing systems.

EVOLUTION OF MANUFACTURING INFORMATION SYSTEMS


The advancement of a manufacturing system can be measured by scale, complexity, and
responsiveness of automation (Bi, 2011; Bi et al., 2008). The evolution of manufacturing
technologies is classified into the phases of using Numerical Control (NC)/Computer Numerical
Control (CNC) workstations, flexible manufacturing systems (FMSs), computer integrated
manufacturing (CIM), distributed manufacturing (DM), and predictive manufacturing (PM).
Typical software tools to support these manufacturing technologies are Quality Control (QC),
Total Quality Management (TQM), Enterprise Requirements Planning (ERP-I), Enterprise
Resources Planning (ERP-II), Product Lifecycle Management (PLM), and Software as a Service
(SaaS)/Platform as a Service (PaaS)/Infrastructure as a Service (IaaS), respectively.
Correspondingly, the volume, variety, and velocity of the data involved in different information systems have increased gradually, from stream data early in the digital era to big data now. IT hardware systems must be capable of processing data in a timely manner. The computing environments have evolved from microchips, mainframes, servers, and the Internet to today's cloud.

DATA FOR INFORMATION SYSTEMS


Technology makes it possible to connect people around the world; it brings new opportunities to
share knowledge and expertise via collaboration over the cloud (Wu, Greer, Rosen, & Schaefer,
2013). Cloud technology allows enterprises to transform their business models by catching new
business windows, improving productivity, reducing cost, accelerating deliveries, and improving
customer satisfaction and market share (Xu, 2012). The cloud concept was adopted by manufacturers as Cloud Manufacturing (CM). CM corresponds to a cyber-physical system; it offers on-demand manufacturing services with an optimized utilization of manufacturing resources (Wang, Bi, & Xu, 2014). CM is a customer-oriented manufacturing model. Enterprises benefit from the share-to-gain philosophy: manufacturing resources and expertise from different sources are shared to provide participants with enhanced capabilities for a high level of customer satisfaction.

With the rapid development of wireless sensor networks and the IoT, data have become ubiquitous and very accessible; this contributes to the Big Data manufacturing environment (Lee et al., 2013). Today's information processing is becoming more and more powerful and flexible (Davis, 2014). Information systems can benefit greatly from Big Data in fulfilling the 5C functions, i.e., connection with sensors and networks; cloud to store and provide data anytime and anywhere; content to mine correlations and meanings; community to share data and promote social interactions; and customization to personalize products or services (Lee et al., 2013).

Applying Big Data at a wider scope involves some technical challenges. Big Data is at
the heart of many cloud-based services, including CM. It is critical for users to have a clear
understanding of the requirements of Big Data applications, the capabilities of Big Data, and the
best practices for implementation (Cloud Standards Customer Council, 2014). Helo, Suorsa,
Hao, and Anussornnitisarn (2014) discussed the challenges of enterprise systems in the cloud. It
was found that extended enterprise support or cloud-based sharing approaches were lacking in
current IT solutions. It was essential to develop advanced platforms to support the needs of real-
time, cloud based, and lightweight operations. Lee et al. (2013) indicated that Big Data and
cyber-physical systems must take into account the productivity and efficiency of information
systems. Kumar, Niu, and Re (2013) indicated that the breakthroughs in Big Data were
anticipated in its capabilities of rapidly combining, deploying, and maintaining existing
algorithms. Considering the application in supply chain management, Waller and Fawcett (2013)
discussed the research challenges on data science and predictive analytics. Lee, Kao, and Yang
(2014) note that two technical trends in manufacturing applications are cyber-physical system-
based manufacturing and service innovations. With a special interest in manufacturing
applications, the research on Big Data needs to address the following challenges.

CHALLENGES OF BIG DATA IN MANUFACTURING


Big Data is changing business models for both information and communication technology (ICT)
providers and their customers, including manufacturing enterprises. Big Data is helping
manufacturing enterprises to process massive data and gain global competitiveness. In Big Data,
all of the application tools, platforms, and infrastructure are accessed as services over the cloud
(Wang, Wang, Gao, & Vancza, 2014). Big Data has a great impact on manufacturing enterprises. Enterprise systems rely on data to fine-tune supply chains, plan and schedule operations at the shop floor, gauge consumer sentiment and, some go so far as to say, make strategic plans based on high-level analytics of large-scale datasets (Oracle, 2013).

FRAMEWORKS FOR INTEGRATION

CM is a form of networked manufacturing; it offers manufacturing services over the cloud (Hao &
Helo, 2014). CM calls for a new integration model that is distributed, and more interoperable,
smart, and adaptable in dealing with changes and uncertainties in the environment. When Big
Data is deployed in CM, it serves numerous functions, such as modeling system behaviors,
supporting interoperations, and ensuring tractability, agility, and robustness of enterprise
information systems (European Commission, 2013). Manufacturers should develop an
innovative infrastructure capable of utilizing ever-increasing data from structured or unstructured
sources in the heterogeneous environment (Papanagnou, 2014). Wang, Wang, Gao, and Vancza
(2014) suggested a service-oriented platform to support recycling, reuse, and remanufacturing. Hao and Helo (2014) proposed a concept of a cloud future factory to manage all distributed factories in a case company; it was a matrix-type organizational structure. Esposito, Ficco, Palmieri, and Castiglione (2014) proposed a knowledge-based platform based on publish/subscribe services for Big Data. The purpose of such a platform was to address the issues of data heterogeneity and advanced processing capabilities. Fiore et al. (2013) introduced the Ophidia platform, where a hierarchical storage model and a distributed parallel framework were used to increase the scalability of Big Data Analytics; the Message Passing Interface developed was capable of running anything from single tasks to very complex dataflows for scientific numerical studies.

ADVANCED BIG DATA TOOLS


Today's enterprises need to acquire large amounts of data from various sources and to leverage that information through Big Data Analytics (Doulkeridis & Norvag, 2014). New analytics platforms are in demand to deal with massively scalable data, support low-latency data, and accelerate advanced analytics modeling and processing. In manufacturing applications, the number of software tools is growing exponentially. Therefore, Big Data Analytics tools have to be designed to take into account increases in the volume of requests, size of data, computing load, and locality and type of users (Pandey & Nepal, 2013). Big Data Analytics must also be able to deal with mixed public and private data appropriately. Talia (2013) further discussed the expectations of Big Data tools with regard to the following:

Programming Abstracts: Programming tools require sophisticated abstracting structures; MapReduce is commonly used on clusters and clouds, but more scalable, higher-level models and tools are needed (a minimal sketch of the MapReduce idea follows this list).

Interoperability: Openness of data and tools. Interoperability is a main challenge in large-scale applications. The models and interfaces have to be standardized to support interoperability.

System Integration: Service-oriented applications make it possible to run complex and distributed workflows over heterogeneous platforms. Big Data Analytics tools should be capable of managing system integration over the cloud.

Annotation Mechanisms: New techniques are needed to visualize and mine provenance data. These solutions, as well as the tools for assuring data privacy and security, are essential for Big Data in large-scale companies, and will eventually benefit small and medium-sized enterprises (SMEs).
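
As referenced in the Programming Abstracts item, the MapReduce abstraction itself is small: map a function over partitions of the data, then reduce the partial results. A single-process Python sketch of a word count (real deployments distribute the same two steps across a cluster):

```python
from collections import Counter
from functools import reduce

lines = ["big data needs big tools", "tools for big data"]

# Map: emit per-line partial counts (each could run on a separate node).
mapped = [Counter(line.split()) for line in lines]

# Reduce: merge the partial counts into one global result.
totals = reduce(lambda a, b: a + b, mapped, Counter())
print(totals.most_common(3))
```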

PRIVACY PROTECTION
Privacy is particularly important when data are shared among industry sectors. Conventionally, privacy relied largely on technological limitations on the ability to extract, analyze, and correlate sensitive datasets. However, the advances in Big Data make it much easier to extract and correlate data. Therefore, Big Data methods must take privacy principles and recommendations into consideration to ensure safe application over the cloud. Data provenance is another challenge: it is hard to validate that every data source meets the required trustworthiness to produce acceptable results.

OTHER CHALLENGES
Different from the application of Big Data in other areas, CM operates on manufacturing resources and associated services; domain-related services should be developed on top of SaaS (software as a service), PaaS (platform as a service), and IaaS (infrastructure as a service), such as Testing as a Service, Simulation as a Service, Management as a Service, Production as a Service, and Design as a Service (Jaleel, Rajendran, & George, 2014; Tao, Zhang, Venkatesh, Luo, & Chen, 2011). Besides the technical challenges, the applications of Big Data in manufacturing also face the difficulties of high financial costs, scarce solutions for vertical integration, vendor lock-in risk, a lack of workers with IT skills, and a lack of SME focus by IT vendors (KPMG, 2011). From the perspective of hardware development, Zou, Yu, Tang, Chen, and Chen (2014) indicated there is a gap between the required capacity of computation and the available capacities of high-end computing machines.

SUMMARY AND BIG DATA ANALYTICS CHALLENGES


The cloud is becoming the infrastructure for analytics services on pervasive and scalable data;
over 50% of large companies will adopt the cloud for data management by 2016 (Talia, 2013).
Extracting knowledge and wisdom from large data sets needs scalable and intelligent algorithms,
tools, and applications. Big Data and its tools have been developed and described to serve this
purpose; Big Data has been used in predictive analytics for decades (Earley, 2014). However,
Big Data Analytics is facing new challenges due to the rapid growth of data in terms of volume,
velocity, and variety from the IoT. In this paper, we summarize our findings from the literature
review on critical technologies, applications, research opportunities and challenges of Big Data
and Big Data Analytics.

Special interest should be paid to the applications of Big Data Analytics in cloud manufacturing.
Nowadays, the success of a manufacturing enterprise relies greatly on the advancement of IT to
support and enhance the value stream. Big Data Analytics tools help an information system to
capture, process, and use ubiquitous data from the IoT effectively. This allows a manufacturing enterprise to capture business opportunities, readily adapt to change, and deal with uncertainty promptly. However, the development of Big Data Analytics for cloud manufacturing is still at a preliminary stage; intensive research efforts are needed to address the concerns about integration frameworks, advanced Big Data Analytics tools, privacy protection, customized applications for SMEs, and other challenges. This reported work will be used as a guide for us in developing integrated Big Data Analytics tools for cloud manufacturing.
