Data analytics (DA) is the science of examining raw data with the purpose of drawing
conclusions about that information. Data analytics is used in many industries to allow companies
and organizations to make better business decisions, and in the sciences to verify or disprove
existing models or theories. Data analytics is distinguished from data mining by the scope,
purpose and focus of the analysis. Data miners sort through huge data sets using sophisticated
software to identify undiscovered patterns and establish hidden relationships. Data analytics
focuses on inference, the process of deriving a conclusion based solely on what is already known
by the researcher.
The science is generally divided into exploratory data analysis (EDA), where new features in the
data are discovered, and confirmatory data analysis (CDA), where existing hypotheses are
proven true or false. Qualitative data analysis (QDA) is used in the social sciences to draw
conclusions from non-numerical data like words, photographs or video. In information
technology, the term has a special meaning in the context of IT audits, when the controls for an
organization's information systems, operations and processes are examined. Data analysis is used
to determine whether the systems in place effectively protect data, operate efficiently and
succeed in accomplishing an organization's overall goals.
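The EDA/CDA split can be illustrated with a small sketch (pure Python, invented sample data): EDA summarizes a sample to surface its features, while CDA computes a test statistic for a stated hypothesis.

```python
import statistics

# Toy daily-sales samples for two store layouts (hypothetical data).
layout_a = [12, 15, 14, 16, 13, 15, 14]
layout_b = [18, 17, 19, 16, 18, 20, 17]

# EDA: summarize each sample to discover features of the data.
def summarize(xs):
    return {"mean": statistics.mean(xs),
            "stdev": statistics.stdev(xs),
            "min": min(xs), "max": max(xs)}

# CDA: a two-sample t statistic for the hypothesis that the two
# layouts sell equally well (equal-variance form, for illustration).
def t_statistic(xs, ys):
    nx, ny = len(xs), len(ys)
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sp2 = ((nx - 1) * statistics.variance(xs) +
           (ny - 1) * statistics.variance(ys)) / (nx + ny - 2)
    return (mx - my) / (sp2 * (1 / nx + 1 / ny)) ** 0.5

print(summarize(layout_a))
print(t_statistic(layout_a, layout_b))
```

A strongly negative t statistic here would lead a confirmatory analysis to reject the "layouts are equal" hypothesis; the exploratory summary is what suggested comparing them in the first place.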
The term "analytics" has been used by many business intelligence (BI) software vendors as a
buzzword to describe quite different functions. Data analytics is used to describe everything from
online analytical processing (OLAP) to CRM analytics in call centers. Banks and credit card
companies, for instance, analyze withdrawal and spending patterns to prevent fraud or identity
theft. Ecommerce companies examine Web site traffic or navigation patterns to determine which
customers are more or less likely to buy a product or service based upon prior purchases or
viewing trends. Modern data analytics often use information dashboards supported by real-time
data streams. So-called real-time analytics involves dynamic analysis and reporting, based on
data entered into a system less than one minute before the actual time of use.
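As a small illustration of this kind of real-time analysis, the sketch below (pure Python, invented transaction amounts) flags transactions that deviate sharply from a sliding window of recent activity, a toy version of the spending-pattern fraud checks described above:

```python
from collections import deque
import statistics

def stream_anomalies(amounts, window=5, threshold=3.0):
    """Flag transactions whose amount deviates strongly from the
    recent sliding window (a toy real-time fraud check)."""
    recent = deque(maxlen=window)
    flags = []
    for amt in amounts:
        if len(recent) >= 2:
            mu = statistics.mean(recent)
            sd = statistics.stdev(recent) or 1.0  # guard zero spread
            if abs(amt - mu) / sd > threshold:
                flags.append(amt)
        recent.append(amt)
    return flags

txns = [20, 22, 19, 21, 20, 23, 950, 21, 22]
print(stream_anomalies(txns))  # the 950 stands out
```

Note one deliberate weakness kept for brevity: the anomalous amount enters the window and inflates the spread for the next few readings; production systems would exclude or down-weight flagged values.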
Big data analytics is a trending practice that many companies are adopting. Before jumping in
and buying big data tools, though, organizations should first get to know the landscape. Data
analytics examines data to uncover hidden patterns, correlations and other insights. With today's
technology, it's possible to analyze your data and get answers from it almost immediately, an
effort that's slower and less efficient with more traditional business intelligence solutions.
Types of Data Analytics
There are four basic types of data analytics used by corporations. The four
types of data analytics are described below:
2. PREDICTIVE ANALYTICS: uses big data to identify past patterns and predict the future. For
example, some companies are using predictive analytics for sales lead scoring. Some
companies have gone one step further and use predictive analytics for the entire sales process,
analyzing lead source, number of communications, types of communications, social
media, documents, CRM data, etc. Properly tuned predictive analytics can be used to
support sales, marketing, or other types of complex forecasts.
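Lead scoring of the kind described above is often implemented as a classifier over historical lead data. The sketch below (pure Python, invented features and labels) fits a tiny logistic-regression scorer by stochastic gradient descent; real systems use far richer features and mature libraries:

```python
import math

# Toy lead features: [num_contacts, downloaded_whitepaper, from_referral]
# paired with whether the lead converted. All values are invented.
leads = [([1, 0, 0], 0), ([2, 0, 1], 0), ([5, 1, 0], 1),
         ([3, 1, 1], 1), ([1, 0, 1], 0), ([6, 1, 1], 1),
         ([4, 0, 0], 0), ([5, 1, 1], 1)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(data, epochs=2000, lr=0.1):
    """Fit a tiny logistic-regression lead scorer with stochastic
    gradient descent (pure Python, for illustration only)."""
    w = [0.0] * len(data[0][0])
    b = 0.0
    for _ in range(epochs):
        for x, y in data:
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y  # gradient of log loss w.r.t. the logit
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

w, b = train(leads)

def score(x):
    """Estimated probability that a lead converts."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
```

A sales team would then rank incoming leads by `score` and route high-probability leads to representatives first.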
4. DESCRIPTIVE ANALYTICS: or data mining, sits at the bottom of the big data value chain,
but it can be valuable for uncovering patterns that offer insight. A simple example of
descriptive analytics would be assessing credit risk: using past financial performance to
estimate a customer's likely financial performance. Descriptive analytics can be useful in
the sales cycle, for example, to categorize customers by their likely product preferences
and sales cycle.
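Categorizing customers from past behaviour, as described above, can be as simple as rolling history up into per-customer summaries. A minimal sketch, with invented purchase records:

```python
from collections import defaultdict
from statistics import mean

# Toy purchase history: (customer_id, product_category, amount).
purchases = [("c1", "tools", 40), ("c1", "tools", 55), ("c2", "toys", 20),
             ("c2", "toys", 15), ("c2", "books", 10), ("c3", "tools", 70)]

# Descriptive analytics: summarize what has already happened per
# customer and derive a preferred-category label from it.
by_customer = defaultdict(list)
for cust, category, amount in purchases:
    by_customer[cust].append((category, amount))

profiles = {}
for cust, rows in by_customer.items():
    cats = [c for c, _ in rows]
    profiles[cust] = {
        "orders": len(rows),
        "avg_spend": mean(a for _, a in rows),
        "preferred": max(set(cats), key=cats.count),
    }
print(profiles["c2"])
```

The resulting profiles describe the past rather than predict the future, which is exactly the boundary between descriptive and predictive analytics drawn in this section.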
They provide the analyst with advanced analytics algorithms and models.
They're engineered to run on big data platforms such as Hadoop or specialty high-
performance analytics systems.
They're easily adaptable to use structured and unstructured data from multiple sources.
Their analytical models can be or already are integrated with data visualization and
presentation tools.
ASSOCIATION AND ITEM SET MINING: which looks for statistically relevant
relationships among variables in a large data set. For example, this could help direct call-
center representatives to offer specific incentives based on the caller's customer segment,
duration of relationship and type of complaint.
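The support and confidence measures underlying association and item-set mining can be sketched in a few lines (invented baskets; real miners such as Apriori additionally prune the search over candidate itemsets):

```python
# Toy transaction baskets (hypothetical purchase/complaint data).
baskets = [{"router", "cable"}, {"router", "cable", "modem"},
           {"modem", "cable"}, {"router", "modem"},
           {"router", "cable", "phone"}]

def support(itemset):
    """Fraction of baskets containing every item in the set."""
    return sum(itemset <= b for b in baskets) / len(baskets)

def confidence(lhs, rhs):
    """Estimated P(rhs in basket | lhs in basket) for a rule lhs -> rhs."""
    return support(lhs | rhs) / support(lhs)

rule = (frozenset({"router"}), frozenset({"cable"}))
print(support(rule[0] | rule[1]), confidence(*rule))
```

A rule with high support and confidence, such as "router implies cable" here, is the statistically relevant relationship the text describes; a call center could key incentives off such rules.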
NEURAL NETWORKS: which are used in undirected analysis for machine learning
based on adaptive weighting and approximation.
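The adaptive-weighting idea can be shown with the smallest possible network: a single sigmoid neuron whose weights are repeatedly nudged against the error until it approximates the OR function (illustrative only; real networks stack many such units):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Training pairs for logical OR.
samples = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]

# "Adaptive weighting": gradient descent on squared error repeatedly
# adjusts w and b toward whatever reduces the current mistake.
w, b = [0.0, 0.0], 0.0
for _ in range(5000):
    for (x1, x2), target in samples:
        out = sigmoid(w[0] * x1 + w[1] * x2 + b)
        grad = (out - target) * out * (1 - out)  # d(error)/d(logit)
        w[0] -= 0.5 * grad * x1
        w[1] -= 0.5 * grad * x2
        b -= 0.5 * grad

def predict(x1, x2):
    return round(sigmoid(w[0] * x1 + w[1] * x2 + b))

print([predict(x1, x2) for (x1, x2), _ in samples])
```

The "approximation" half of the description is visible here too: the neuron never outputs exactly 0 or 1, only values driven arbitrarily close to the targets.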
This is just a subset of the types of analyses used for predictive and prescriptive analytics. In
addition, different vendors are likely to provide a variety of algorithms supporting each of the
different methods.
The analytics process, including the deployment and use of big data analytics tools, can help
companies improve operational efficiency, drive new revenue and gain competitive advantages
over business rivals. But there are different types of analytics applications to consider. For
example, descriptive analytics focuses on describing something that has already happened, as
well as suggesting its root causes. Descriptive analytics, which remains the lion's share of the
analysis performed, typically hinges on basic querying, reporting and visualization of historical
data.
Alternatively, more complex predictive and prescriptive modeling can help companies anticipate
business opportunities and make decisions that affect profits in areas such as targeting marketing
campaigns, reducing customer churn and avoiding equipment failures. With predictive analytics,
historical data sets are mined for patterns indicative of future situations and behaviors, while
prescriptive analytics subsumes the results of predictive analytics to suggest actions that will best
take advantage of the predicted scenarios.
In many environments, the processing and data storage demands of advanced analytics
applications have limited their adoption -- but those barriers are beginning to fall. The growing
availability of big data platforms and big data analytics tools has enabled environments in which
predictive and prescriptive analytics applications can scale to handle massive data volumes
originating from a wide variety of sources.
A number of smaller companies provide big data analytics products, including Angoss,
Predixion, Alteryx, Alpine Data Labs, Pentaho, KNIME and Rapid Miner. In some cases,
companies have developed their own suites of algorithms. Others have adapted the open source
statistical R language and provide predictive and prescriptive modeling capabilities using R's
features, or use software from the open source Weka project.
A third category of products comprises open source technologies. Examples include
the previously mentioned R language, the Mahout software distribution that's part of the Hadoop
stack, and Weka.
In some of these cases (particularly with the mega-vendors), the big data analytics tools are
incorporated into larger big data enterprise suites. In others, the tools are sold as standalone
products. In the latter case, it's the customer's job to integrate with the big data platform being
deployed. Most of the tools provide a visual interface to guide the analytics processes (data
mining/discovery analysis, evaluation and scoring of models, integration with operational
environments), and in most cases, the vendors provide guidance and services to get the customer
up and running.
The data scientist: who likely performs more complex analyses involving more complex
data types and is familiar with how underlying models are designed and implemented to
assess inherent dependencies or biases.
The business analyst: who is likely a more casual user looking to use the tools for
proactive data discovery or visualization of existing information, as well as some
predictive analytics.
The business manager: who is looking to understand the models and conclusions.
All of these roles would typically work together in the model development lifecycle. The data
scientist subjects a swath of big data sets to the undirected analyses provided, and looks for any
patterns that would be of business interest. After engaging the business analyst to review how the
models work and evaluate how each of those discovered models or patterns could potentially
positively affect the business, the business manager and IT teams are brought in to embed or
integrate the models into business processes or devise new processes around the models.
From a market perspective, though, it's interesting to consider the types of businesses that are
embracing big data analytics. Many of the early users of big data technologies were Internet
companies (e.g., Google, Yahoo, Facebook, LinkedIn and Netflix) or analytics services
providers. Each of these companies relied on operational and analytical applications requiring
fast-flowing streams of data to ingest, process and analyze, and then feed the results back to
continuously improve performance.
As appetites for data expand among companies in more mainstream industries, big data analytics
has found a place in a more general corporate population. In the past, the cost factors for a large-
scale analytics platform would have limited the adoption to only the very largest businesses.
However, the availability of utility-style hosted big data platforms (such as those available via
Amazon Web Services) and the ability to instantiate big data platforms such as Hadoop on-
premises without a large investment have reduced the barrier to entry. In addition, open data sets
and accessibility to fire hose data feeds from social media channels provide the raw material for
larger-scale data analyses when blended with internal data sets.
Larger businesses may still opt for high-end big data analytics tools, but lower-cost alternatives
deployed on cost-effective platforms enable small and medium-size businesses to evaluate and
launch big data analytics programs and achieve the desired business improvement results.
Now that we've examined the different types of tools and their uses, the next step is to determine
how these tools could benefit your company. By taking a look at the various use cases for big
data analytics, you will begin to see where a general big data analytics capability can be
leveraged for creating and enhancing value.
The actions taken by businesses and other organizations as a result of big data analytics
may breach the privacy of those involved, and lead to embarrassment and even lost jobs.
Consider that some retailers have used big data analysis to predict such intimate personal
details as the due dates of pregnant shoppers. In such cases, subsequent marketing
activities resulted in members of the household discovering that a family member was
pregnant before she had told anyone, creating an uncomfortable and damaging family
situation. Retailers, and other types of businesses, should not take actions that result in
such situations.
With so much data, and with powerful analytics, it could become impossible to
completely remove the ability to identify an individual if there are no rules established for
the use of anonymized data files. For example, if one anonymized data set is combined
with another, completely separate database without first determining whether any other data
items should be removed to protect anonymity, individuals could be re-identified. The
important, and usually missing, key is establishing rules and policies for how anonymized
data files can be combined and used together.
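The re-identification risk described above can be demonstrated concretely. In the sketch below (all records invented), neither file contains names alongside diagnoses, yet joining on the shared quasi-identifiers re-identifies every record:

```python
# Two "anonymized" releases, each apparently harmless on its own:
# a health file keyed by (zip, birth_year, sex), and a public roll
# that carries names with the same quasi-identifiers.
health = [{"zip": "02139", "birth_year": 1965, "sex": "F", "diagnosis": "X"},
          {"zip": "02139", "birth_year": 1982, "sex": "M", "diagnosis": "Y"}]
voters = [{"name": "Alice Smith", "zip": "02139", "birth_year": 1965, "sex": "F"},
          {"name": "Bob Jones", "zip": "02139", "birth_year": 1982, "sex": "M"}]

# Joining on the shared quasi-identifiers links name to diagnosis.
reidentified = []
for h in health:
    for v in voters:
        if all(h[k] == v[k] for k in ("zip", "birth_year", "sex")):
            reidentified.append((v["name"], h["diagnosis"]))
print(reidentified)
```

Rules for combining anonymized files would require generalizing or dropping such quasi-identifiers (for example, truncating ZIP codes or bucketing birth years) before any join is permitted.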
If data masking is not used appropriately, big data analysis could easily reveal the actual
individuals whose data has been masked. Organizations must establish effective policies,
procedures and processes for using data masking to ensure privacy is preserved. Since big
data analytics is so new, most organizations don't realize there are risks, so they use data
masking in ways that could breach privacy. Many resources are available, such as those
from IBM, to provide guidance in data masking for big data analytics.
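One common masking technique is deterministic salted hashing, which keeps records joinable without exposing raw identifiers. The sketch below is illustrative only (the salt and identifiers are invented); production masking additionally needs secret-salt management, since a guessable salt over a small ID space can be reversed by brute force:

```python
import hashlib

def mask(value, salt):
    """Mask an identifier with a salted hash so analyses can still
    join records without seeing raw values (illustrative only)."""
    return hashlib.sha256((salt + value).encode()).hexdigest()[:12]

record = {"customer_id": "C-1001", "spend": 250}
masked = dict(record, customer_id=mask(record["customer_id"], "s3cr3t"))

# Deterministic masking preserves joinability across data sets:
# the same input and salt always yield the same token. That very
# property is what leaks if the salt is weak or widely shared.
print(masked)
```

This is why the policies the text calls for matter: the mechanics of masking are simple, but deciding who holds the salt and which masked files may be joined is where privacy is actually won or lost.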
Data analytics can be used to try to influence behaviors, and there are many ethical issues
with driving behavior. Just because you CAN do something doesn't mean you should.
For example, in the movie Fight Club, Ed Norton's character's job was to determine
whether an automobile manufacturer should issue a recall based strictly on financial
considerations, without taking into account the associated health risks; in other words,
whether it is cheaper to let people be killed or injured than to fix the faulty equipment in the
vehicles.
Big data analytics can be used by organizations to make a much wider variety of business
decisions that do not take into account the human lives involved. The potential to
reveal personal information, which may not be illegal but can damage the lives of
individuals, must be considered.
While big data analytics are powerful, the predictions and conclusions that result are not
always accurate. The data files used for big data analysis can often contain inaccurate
data about individuals, use data models that are incorrect as they relate to particular
individuals, or simply be flawed algorithms (the results of big data analytics are only as
good, or bad, as the computations used to get those results). These risks increase as more
data is added to data sets, and as more complex data analysis models are used without
including rigorous validation within the analysis process. As a result, organizations could
make bad decisions and take inappropriate and damaging actions. When decisions
involving individuals are made based upon inaccurate data or flawed models,
individuals can suffer harm by being denied services, being falsely accused or
misdiagnosed, or otherwise being treated inappropriately.
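The rigorous validation called for above starts with something as simple as a hold-out split: fit on one portion of the data, measure error on the rest, and never trust training error alone. A minimal sketch with synthetic data:

```python
import random

# Synthetic data: y is roughly 2x plus noise.
random.seed(7)
data = [(x, 2 * x + random.gauss(0, 1)) for x in range(100)]
random.shuffle(data)
train_set, test_set = data[:70], data[70:]  # hold out 30% for validation

def fit_slope(pairs):
    """Least-squares line through the origin."""
    return sum(x * y for x, y in pairs) / sum(x * x for x, y in pairs)

def mse(slope, pairs):
    """Mean squared error of the fitted slope on held-out pairs."""
    return sum((y - slope * x) ** 2 for x, y in pairs) / len(pairs)

slope = fit_slope(train_set)
print(slope, mse(slope, test_set))
```

A model that looked accurate on its training data but produced a large hold-out error would be exactly the kind of flawed model the paragraph warns about; validating on unseen records is the minimum safeguard before decisions about individuals are made.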
6. DISCRIMINATION
Using big data analytics to choose job candidates, give promotions, etc. may
backfire if the analytics are not truly objective. Discrimination has been a problem for
years, of course, but the danger is that big data analytics makes it more prevalent, a kind
of automated discrimination if you will. For example, a bank or other type of financial
organization may not be able to tell from a credit application the applicant's race or sexual
orientation (and it is generally illegal to base such a credit decision upon race), but it
could deduce race or sexual orientation from a wide variety of data collected
online and through the Internet of Things (IoT), and then use big data analytics on
that information to turn down a loan.
Most organizations still only address privacy risks as explicitly required by existing data
protection laws, regulations and contractual requirements. While the U.S. White House,
the Federal Trade Commission, and others have recently expressed concern about the
privacy risks created by the use of big data analytics, there are no legal
requirements for how to protect privacy while using big data analytics.
There was a flurry of articles written about the e-discovery problems created by big data
analytics in the past year. The e-discovery process generally requires organizations to
identify and produce documents relevant to litigation. When dealing with millions of
documents, as most organizations now have in their repositories, this becomes an
expensive, time-consuming activity. A big data analytics approach called predictive
coding is now starting to be used on these huge repositories to more quickly
narrow down the documents most likely to be necessary for litigation, which
individuals can then review more closely. There are concerns that by using such
analytics to produce documents, an organization may be accused of not including all the
necessary documents.
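Predictive coding, in essence, scores each document against what is learned from documents lawyers have already marked relevant. The sketch below reduces this to keyword overlap with invented documents; real systems train statistical classifiers on reviewed samples rather than raw word matches:

```python
# Documents already marked relevant by reviewers (invented text).
relevant_seed = ["contract breach penalty", "breach of warranty terms"]

# The repository to be narrowed down (invented text).
corpus = {
    "doc1": "quarterly sales figures and forecasts",
    "doc2": "notice of contract breach and penalty clause",
    "doc3": "warranty terms for the new product line",
    "doc4": "holiday party planning notes",
}

# "Learn" a term profile from the seed set, then score each document
# by its overlap with that profile.
seed_terms = set(" ".join(relevant_seed).split())

def relevance(text):
    return len(set(text.split()) & seed_terms) / len(seed_terms)

ranked = sorted(corpus, key=lambda d: relevance(corpus[d]), reverse=True)
print(ranked)
```

Reviewers would then work down the ranked list, which is how predictive coding narrows millions of documents; the completeness concern in the text arises because anything the scorer ranks low may never be reviewed at all.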
There is concern that big data could make patents harder to obtain because patent offices
will not be able to verify if a submitted patent is unique since there will be too much data
to check through within all the growing numbers of big data repositories. Big data could
make copyrights a thing of the past because it will be too hard to control information that
can be hidden or propagated infinitely within big data repositories. As an associated
effect, the royalties associated with copyrighted information are expected to decrease or
possibly disappear altogether.
Expect greater interest across academic disciplines, because machine learning benefits
from many different approaches. Consider the popular keynote from the INFORMS
Annual Meeting last year, where Dimitris Bertsimas talked about Statistics and Machine
Learning via a Modern Optimization Lens.
Data ownership is an emerging issue: you produce the data but may not have access to
it. An even larger challenge for IoT will be to prove its value. There are limited
implementations of IoT in full production at the enterprise level. The promise of IoT is
fantastic, so in 2016 look to early adopters to work out the kinks and deliver results.
Unfortunately, growing methods for big data collaboration are off limits, because we
don't want the bad guys to know how we'll find them, and much of the best work is done
behind high security clearance. But that won't stop SAS and others from focusing heavily
on cyber security in 2016.
The Institute for Advanced Analytics (IAA) at NC State University tracks the growth in
analytics master's programs, and new programs seem to pop up daily. Industry demand for
recruits fuels this growth, but I see increased interest in research. More companies are
setting up academic outreach with an explicit interest in research collaborations.
Sometimes this interest goes beyond partnership and into direct hiring of academic
superstars, who either take sabbaticals, work on the side, or even go back and forth. For
example, top machine learning researcher Yann LeCun worked at Bell Labs, became a
professor at NYU, was the founding director of the NYU Center for Data Science, and
now leads Artificial Intelligence Research at Facebook.
Murli Buluswar, chief science officer, AIG: The biggest challenge of making the evolution
from a knowing culture to a learning culture, from a culture that largely depends on heuristics in
decision making to a culture that is much more objective and data driven and embraces the
power of data and technology, is really not the cost. Initially, it largely ends up being
imagination and inertia.
Ruben Sigala, chief analytics officer, Caesars Entertainment: What he has found challenging
is finding the set of tools that enable organizations to efficiently generate value through the
process. He said, "I hear about individual wins in certain applications, but having a more
cohesive ecosystem in which this is fully integrated is something that I think we are all
struggling with, in part because it's still very early days. Although we've been talking about it
seemingly quite a bit over the past few years, the technology is still changing; the sources are
still evolving."
Zoher Karu, vice president, global customer optimization and data, eBay: One of the biggest
challenges is around data privacy and what is shared versus what is not shared. And his
perspective on that is that consumers are willing to share if there's value returned. One-way sharing is
not going to fly anymore. So how do we protect and how do we harness that information and
become a partner with our consumers, rather than just a vendor for them?
That helps best inform the appropriate structure, the forums, and then ultimately it sets the more
granular levels of operation such as training, recruitment, and so forth. But alignment around
how you're going to drive the business and the way you're going to interact with the broader
organization is absolutely critical. From there, everything else should fall in line. That's how we
started with our path.
Vince Campisi, chief information officer, GE Software: One of the things we've learned is that
when we start and focus on an outcome, it's a great way to deliver value quickly and get people
excited about the opportunity. And it's taken us to places we hadn't expected to go before. So
we may go after a particular outcome and try to organize a data set to accomplish that outcome.
Once you do that, people start to bring other sources of data and other things that they want to
connect. And it really takes you to a place where you go after a next outcome that you didn't
anticipate going after before. You have to be willing to be a little agile and fluid in how you think
about things. But if you start with one outcome and deliver it, you'll be surprised as to where it
takes you next.
Ash Gupta, chief risk officer, American Express: The first change we had to make was just to
make our data of higher quality. We have a lot of data, and sometimes we just weren't using that
data and we weren't paying as much attention to its quality as we now need to. That was, one, to
make sure that the data has the right lineage, that the data has the right permissible purpose to
serve the customers. This, in my mind, is a journey. We made good progress and we expect to
continue to make this progress across our system.
The second area is working with our people and making certain that we are centralizing some
aspects of our business. We are centralizing our capabilities and we are democratizing its use. I
think the other aspect is that we recognize as a team and as a company that we ourselves do not
have sufficient skills, and we require collaboration across all sorts of entities outside of American
Express. This collaboration comes from technology innovators, it comes from data providers, it
comes from analytical companies. We need to put a full package together for our business
colleagues and partners so that its a convincing argument that we are developing things together,
that we are colearning, and that we are building on top of each other.
Examples Of Impact
Victor Nilson, senior vice president, big data, AT&T: We always start with the customer
experience. Thats what matters most. In our customer care centers now, we have a large number
of very complex products. Even the simple products sometimes have very complex potential
problems or solutions, so the workflow is very complex. So how do we simplify the process for
both the customer-care agent and the customer at the same time, whenever theres an interaction?
Weve used data analytics techniques to analyze all the different permutations to augment that
experience to more quickly resolve or enhance a particular situation. We take the complexity out
and turn it into something simple and actionable. Simultaneously, we can then analyze that data
and then go back and say, "Are we optimizing the network proactively in this particular case?"
So, we take the optimization not only for customer care but also for the network, and then tie
that together as well.
Vince Campisi: I'll give you one internal perspective and one external perspective. One is we
are doing a lot in what we call enabling a digital thread: how you can connect innovation
through engineering, manufacturing, and all the way out to servicing a product. [For more on the
company's digital-thread approach, see "GE's Jeff Immelt on digitizing in the industrial
space."] And, within that, we've got a focus around the brilliant factory. So, take driving supply-
chain optimization as an example. We've been able to take over 60 different silos of information
related to direct-material purchasing, leverage analytics to look at new relationships, and use
machine learning to identify tremendous amounts of efficiency in how we procure direct
materials that go into our product.
An external example is how we leverage analytics to really make assets perform better. We call it
asset performance management. And were starting to enable digital industries, like a digital
wind farm, where you can leverage analytics to help the machines optimize themselves. So you
can help a power-generating provider that uses the same wind that's coming through and, by
having the turbines pitch themselves properly and understand how they can optimize that level of
wind, we've demonstrated the ability to produce up to 10 percent more energy from
the same amount of wind. It's an example of using analytics to help a customer generate more
yield and more productivity out of their existing capital investment.
When we talk about the value proposition, we use terms like having an opportunity to truly affect
the outcomes of the business, and having a wide range of analytical exercises that you'll be
challenged with on a regular basis. But, by and large, it is about being part of an organization that
views this as a critical part of how it competes in the marketplace, and then executing against that
regularly. To do that well, you have to have good training programs, you have to have
very specific forms of interaction with the senior team, and you also have to be part of the
organization that actually drives the strategy for the company.
Murli Buluswar: I have found that focusing on the fundamentals of why science was created,
what our aspirations are, and how being part of this team will shape the professional evolution of
the team members has been pretty profound in attracting the caliber of talent that we care about.
And then, of course, comes the even harder part of living that promise on a day-in, day-out basis.
Yes, money is important. My philosophy on money is I want to be in the 75th percentile range; I
don't want to be in the 99th percentile. Because no matter where you are, most people,
especially people in the data-science function, have the ability to get a 20 to 30 percent increase
in their compensation, should they choose to make a move. My intent is not to try to reduce
that gap. My intent is to create an environment and a culture where they see that they're learning;
they see that they're working on problems that have a broader impact on the company, on the
industry, and, through that, on society; and they're part of a vibrant team that is inspired by why
it exists and how it defines success. Focusing on that, to me, is an absolutely critical enabler to
attracting the caliber of talent that I need and, for that matter, anyone else would need.
Case Study: Data Analytics in Cloud Manufacturing
In this section, the application of Data Analytics in manufacturing systems is used as a case
study. A manufacturing system needs information systems to make decisions for the system
operations at different levels and domains. The complexity of an information system depends on
the numbers of inputs and outputs as well as their relations. In manufacturing applications,
numerous workers have investigated the impact of IT, such as mainstream computers, wireless
networks, and the Internet, on the development of information systems (Bi, Xu, & Wang, 2014;
Dumitrache & Caramihai, 2010; Koenig, 2013; Lee & Lapira, 2014; Wang, Wang, Gao, &
Vancza, 2014). In the following sections, we look into
(1) The change trends of scale and complexity of information systems;
(2) Available hardware and software IT; and
(3) The requirements of IoT-based information systems.
We pay special attention to the roles of Big Data in cloud manufacturing, since cloud
manufacturing has been identified as the most promising paradigm for next-generation
manufacturing systems.
With the rapid development of wireless sensor networks and IoT, data have become ubiquitous
and highly accessible; this contributes to the Big Data manufacturing environment (Lee et al., 2013).
Today's information processing is becoming more and more powerful and flexible (Davis, 2014).
Information systems can benefit from Big Data greatly in fulfilling 5C functions, i.e.
connection with sensors and networks; cloud to store and provide data anytime and anywhere;
content to mine correlations and meanings; community to share data and promote social
interactions; and customization to personalize products or services (Lee et al., 2013).
The Big Data applications for a wider scope involve some technical challenges. Big Data is at
the heart of many cloud-based services, including CM. It is critical for users to have a clear
understanding of the requirements of Big Data applications, the capabilities of Big Data, and the
best practices for implementation (Cloud Standards Customer Council, 2014). Helo, Suorsa,
Hao, and Anussornnitisarn (2014) discussed the challenges of enterprise systems in the cloud. It
was found that extended enterprise support or cloud-based sharing approaches were lacking in
current IT solutions. It was essential to develop advanced platforms to support the needs of real-
time, cloud based, and lightweight operations. Lee et al. (2013) indicated that Big Data and
cyber-physical systems must take into account the productivity and efficiency of information
systems. Kumar, Niu, and Re (2013) indicated that the breakthroughs in Big Data were
anticipated in its capabilities of rapidly combining, deploying, and maintaining existing
algorithms. Considering the application in supply chain management, Waller and Fawcett (2013)
discussed the research challenges on data science and predictive analytics. Lee, Kao, and Yang
(2014) note that two technical trends in manufacturing applications are cyber-physical system-
based manufacturing and service innovations. With a special interest on manufacturing
applications, the research on Big Data needs to address the following challenges.
PRIVACY PROTECTION
Privacy is particularly important when data are shared among industry sectors. Conventionally,
privacy relied largely on the technological difficulty of extracting, analyzing, and correlating
sensitive datasets. However, advances in Big Data make it much easier to extract and correlate
data. Therefore, Big Data methods must take privacy principles and recommendations into
consideration to ensure safe application over the cloud. Data provenance is another
challenge. It is hard to validate that every data source meets the required trustworthiness to
produce acceptable results.
OTHER CHALLENGES
Different from applications of Big Data in other areas, CM operates on manufacturing
resources and associated services; domain-related services should be developed on top of
SaaS (software as a service), PaaS (platform as a service), and IaaS (infrastructure as a service),
such as Testing as a Service, Simulation as a Service, Management as a Service, Production as a
Service, and Design as a Service (Jaleel, Rajendran, & George, 2014; Tao, Zhang, Venkatesh, Luo,
& Chen, 2011). Besides the technical challenges, the applications of Big Data in manufacturing
also face the difficulties of high financial costs, scarce solutions for vertical integration, vendor
lock-in risk, a lack of workers with IT skills, and a lack of SME focus by IT vendors (KPMG, 2011).
From the perspective of hardware development, Zou, Yu, Tang, Chen, and Chen (2014)
indicated that there is a gap between the required computational capacity and the available
capacities of high-end computing machines.
Special interest should be paid to the applications of Big Data Analytics in cloud manufacturing.
Nowadays, the success of a manufacturing enterprise relies greatly on the advancement of IT to
support and enhance the value stream. Big Data Analytics tools help an information system to
capture, process, and use ubiquitous data from IoT effectively, allowing a manufacturing
enterprise to capture business opportunities, readily adapt to change, and deal with
uncertainty promptly. However, the development of Big Data Analytics for cloud manufacturing
is still preliminary; intensive research efforts are needed to address the concerns about integration
frameworks, advanced Big Data Analytics tools, privacy protection, customized applications for
SMEs, and other challenges. This reported work will be used as a guide for us in developing
integrated Big Data Analytics tools for cloud manufacturing.