
Big Data, Big Analytics: Emerging Business Intelligence and Analytic Trends for Today's Businesses. Michael Minelli, Michele Chambers and Ambiga Dhiraj.

© 2013 Michael Minelli, Michele Chambers and Ambiga Dhiraj. Published 2013 by John Wiley & Sons, Inc.

INDEX

Acts and legislation. See Legislation and acts
Advertising
  ad targeting, 16, 102, 105
  geospatial intelligence and, 105–106, 119–120
  industry examples, 51–57, 102
Affiliate marketing, 30–31
Algorithm, definition, 83, 100
Algorithmic trading, 40–44, 71, 91
Alvarez, Chuck, 117–118
Analysts. See Data analysts
Analytics. See also Business intelligence (BI)
  analytic applications, 125
  being analytical (see Institutionalizing analytics)
  business intelligence versus, 14, 49, 70
  communication cycle, 108–109, 142
  crowdsourcing analytics, 76–77
  data discovery, 65–66
  definition, 83, 103
  delivering results, 101–102, 108–110 (see also Data scientists; Visualizing data)
  description of, 5–6, 8
  firewalls and, 73, 77–80
  geospatial intelligence, 104–106, 119–120
  health care analytics, 16, 46–47
  holistic, 47–49, 139, 140–141
  in-database analytics, 93, 95
  institutionalizing (see Institutionalizing analytics)
  marketing analytics, 44 (see also Marketing)
  on mobile devices, 73–75
  MPP appliance analytics, 93
  new versus old, 7, 49, 64, 128–129
  organizational structure and, 146–149
  platforms (see Platforms)
  predictive (see Predictive analytics)
  quants for Big Data (see Data scientists)
  risk analytics, 37–40, 42–43
  self-service approach, 65–66, 73
  software as a service (SaaS), 56, 72–73
  spectrum of, 14, 99–100
  tools for, 85, 124–125
  value of, 12–15, 17–18, 102
  volume and, 9
  Web Analytics 2.0 (Kaushik), 20, 136
Anonymized data, 77, 164
Apache Hadoop. See Hadoop
Appliances
  data warehouse appliances, 90, 91
  definition, 83
  GPU appliances, 98
  MPP appliances, 85, 92–93
Argyros, Tasso, 69
ASCII codes in binary, 86
Avro, definition, 83
Awadallah, Amr, 61–62

Bandwidth, 97
Batch, definition, 83
Beard, Randal, 52–53, 54, 55, 56–57
Behavioral data
  advertising impact, 54–55
  behavioral targeting, 101–102
  credit scoring, 39
  cross-channel marketing, 28–29, 31, 33
  customer insights, 71, 77, 79, 117
  emergence of, 25, 27
  fraud prevention, 35, 37–38, 44
  recommendation engines, 62, 71
  technology and behavioral sciences, 129
BI. See Business intelligence (BI)


Big Data. See also Data
  analytics (see Analytics)
  business intelligence versus, 5, 19–20
  definition, 5, 9, 83
  emergence of, xii–xiv, 1–2, 7
  interdependence of, 3–4 (see also Industry examples)
  processing of, 90–91, 96–97
  recommended resources, 175
  storage, 89–90, 92, 96 (see also HDFS)
Big Insights, definition, 83
Binary code, 86
Bit, 86
Blue Ocean Strategy (Kim & Mauborgne), 71
Botkin, David, 117
Brocklebank, John, 72
Burns, Nate, 50
Business analytics. See Analytics; Predictive analytics
Business intelligence (BI)
  advanced analytics versus, 14, 49, 70
  Big Data versus, 5, 19–20
  land and expand, 65
Bust out fraud case, 37
Byte, 86

C programming language, 98
Cafarella, Mike, 61
Cassandra, definition, 83
Centralized analytics, 146, 148
Champagne, David, 67, 128
Children's Online Protection Privacy Act (2000), 155
CI (contextual integrity), 160. See also Privacy
Clojure, definition, 83
Cloud computing
  Big Data and, 69–70
  definition, 83
  private cloud, 73
  public cloud, 92
  Whirr, 86
Cloudera, 61–62, 67, 83
Clusters. See Grids
Collaboration
  consumption of analytics, 108
  convergence and, 131
  fear of, 77, 80, 147
  innovation management, 81
  interdependence, 3
  organizational structure, 146
  trans-firewall analytics, 77–80
Collaborative planning, forecasting, and replenishment (CPFR), 77
Columnar database, 83, 84, 89, 96
Communication cycle, 108–109, 142
Comparative effectiveness research (CER), 47
Complex event processing (CEP), definition, 84
Compute unified device architecture (CUDA), 98
Conferences and tradeshows, 176
Consumer Privacy Bill of Rights, 154, 156
Consumer Privacy Protection Act of 2011, 155
Consumer products, 57–58
Contextual integrity (CI), 160. See also Privacy
Conway, Drew, 3, 134–135
Cookies, and privacy, 157, 158, 159
Cox, Donna, 112–113
CPU bound, 97
Credit card fraud, 34–37
Credit risk analytics, 38–39, 43
Credit risk management, 38–40
Critical thinking, 137
CRM (customer relationship management), 6, 10, 153–154
Cross-channel lifecycle marketing, 28–29, 31, 33
Crowdsourcing analytics, 76–77
Culture
  for data scientist development, 144–146
  privacy and, 159–162
Cunningham, Joe, 80–82
Customer churn, 16, 44

Customer insight engines, 71
Customer intent, 33–34
Customer lifetime value, 44, 71, 103
Customer relationship management (CRM), 6, 10, 153–154
Customer relationships
  consumer products, 58
  credit risk management, 38
  cross-channel consumers, 28–29
  customer lifetime value, 44, 71
  database marketing, 24–27
  digital marketing, 22–23
  health care, 47–48
  online privacy, 151, 154, 155, 162, 167
Cutting, Doug, 61, 62

Dashboard example, 118–119
Data. See also Big Data
  anonymized data, 77, 164
  classification, 159
  cloud computing (see Cloud computing)
  control of, 23–24
  data discovery, 65–66
  dirty data, 107
  filter via influence scores, 12
  Hadoop for management (see Hadoop)
  latency, 98
  samples versus all data, 121–122
  semi-structured, 10, 11, 85
  sensitive data, 158–159, 161, 162
  signal versus noise, 106–108
  size, 86–87
  structured, 10, 11, 35, 86
  terrorism and, 165
  unstructured, 10, 11–12, 35, 86
  variety (see Variety of data)
  velocity (see Velocity of data)
  visualizing (see Visualizing data)
  volume (see Volume of data)
Data analysts
  data scientists versus, 134
  last mile people, 101–102
  recommended resources, 176
  value of, 102
Data artisans, 111, 113
Data discovery, 65–66
Data mining
  in analytics spectrum, 14, 100
  definition, 84
Data privacy. See Privacy
Data scientists
  data analysts versus, 134
  development of, 139–140, 142–146
  emergence of, 128–132
  investing in, 127–128, 136
  last mile people, 101–102
  recommended resources, 176
  shortage of, 128
  skillsets, 49, 100, 122, 133–136, 138, 143
Data warehouse
  for Big Data storage, 89–91, 96
  Big Data versus, 19–20, 49
  database marketing and, 153
  MapReduce for, 94
Database marketing, 24–27, 152–153
Datakind, 3–4
de Montcheuil, Yves, 67
Decentralized analytics, 146, 148
Decision management, 137–139
Definitions of technology terms, 83–86
Deighton, Anthony, 66
Descriptive analytics
  in analytics spectrum, 99–100
  in holistic view, 140–141
Digital marketing, 19–24, 25, 54–55, 101–102
Disk bound, 96
Disruptive innovations, 77, 79, 132
Distributed processing, definition, 84. See also Parallel computing
Do Not Track, 155
Domo, 72, 73
Dowlaty, Zubin, 135
Doyle, Shaun, 24, 25–27
Dremel, definition, 84
Drilling down, 112
Driscoll, Michael, 136
Dumbill, Edd, 5
Dweck, Carol S., 145

EB (exabyte), 87
Edsall, Robert M., 114
Elastic Search, 35
Electronic Communication Privacy Act (ECPA; 1986), 155
Enterprise resource planning (ERP), 10
Ethics, 162–164. See also Privacy
Evidence-based medicine (EBM), 47
Examples from industry. See Industry examples

Fault tolerance
  Hadoop, 62–63
  MapReduce, 90, 91
  Storm, 86
Fayyad, Usama, 15–18, 101–102
Federal Trade Commission (FTC), 155, 156, 157
Federated model of analytics, 146, 148
Financial Services Modernization Act (1999), 155
Firewalls
  analytics and, 77–80
  software as a service and, 73
Flume, definition, 84
Fraud
  Big Data and, 34–37, 44
  database marketing tools for, 27
FTC. See Federal Trade Commission (FTC)
Fuzzy Logix, 40, 98

GB (gigabyte), 87
Geospatial intelligence, 104–106, 119–120
Ghosh, Misha, 2, 11
Glossary of technology terms, 83–86
GNU General Public License (GPL), 67
Goldbloom, Andrew, 76
Golden, James, 46–49
Google
  data processed per day, 87
  Hadoop and, 61
  Percolator, 35
  privacy and, 157, 160, 161
Google Analytics, 21, 33, 172–173
GPGPU (general-purpose GPU), 98
GPU (graphical processing units)
  as emerging technologies, 98
  for parallel computing, 40, 43, 44
Grids
  definition, 84
  GPU grids, 98
  as parallel computing platform, 92, 96

Hadapt, definition, 84
Hadoop. See also Cloudera; MapReduce
  associated software, 83–86, 124–125
  data accessibility, 7, 48, 69
  for financial industry, 64
  integrating into enterprise, 81–82
  Mahout, 84, 95, 124
  overview of, 61–63, 84
  SQL on data, 69
Hadoop Distributed File System (HDFS), 63, 84, 89–90
Hammerbacher, Jeff, 62
HANA, 8, 84
Harahan, Pat, 65–66
Harrower, Mark, 114
Hbase, definition, 84
HDFS (Hadoop Distributed File System), 63, 84, 89–90
Health care
  analytics, 16, 46–47
  data in health care, 45–46
  holistic value proposition, 47–49
  insurance companies, 44, 47
  quantitative pharmacology, 50
Health Insurance Portability and Accountability Act (HIPAA; 1996), 155
Heer, Jeffrey, 113–114
Hive, 84, 94
Holistic analytics
  holistic education, 140
  holistic framework, 140–141
  holistic value proposition, 47–49
  Mu Sigma vision, 139
Horton Works, definition, 84
Hougland, Curtis, 33

HPC (high-performance computing), 84, 93, 96
HStreaming, definition, 84
Hughes, David, 20–21

Incremental innovations, 132
In-database analytics, 93, 95, 96
Industry examples
  advertising, 51–57, 102
  algorithmic trading, 40–44
  consumer products, 57–58
  consumer relationships, 22–23
  convergence, 131
  data ethics, 162–164
  data visualization, 111–112
  database marketing, 24–27, 153
  decision management, 137
  definition of Big Data, 9
  developing data scientists, 139
  digital marketing, 19–24, 25, 54–55
  emergence of Big Data, 28
  ethics, 162–164
  fraud detection, 34–37
  geospatial intelligence, 103–106
  health care (see Health care)
  last mile people, 101–102
  new school of marketing, 27–34
  privacy, 157
  risk management, 37–40
  value of analytics, 12–13
  visualizing data, 116, 117, 118–121
Influencers, in marketing, 31–32
Information management. See also Technology
  as analytic spectrum, 100
  Big Data storage, 89–90, 92, 96 (see also HDFS)
  bottlenecks, 64, 91, 95, 96–97, 124
  emerging technologies, 97–98
  parallel programming, 91, 93–96, 98 (see also Parallel computing)
  platforms (see Platforms)
  processing of data, 90–91, 93–97
Innovation
  in analytic technology, 80–81, 130, 132
  Big Data for, 103
  decision sciences for, 132
  innovation ecosystem, 82
  innovation engines, 71
Inquisitive analytics, 140–141
Institutionalizing analytics
  being analytical, 15, 16, 108–110
  culture creation, 144–146
  data democracy, 169–170, 172, 174
  data scientists (see Data scientists)
  decision management, 137–138
  executive sponsorship, 127, 137–138
  holistic analytics, 47–49, 139, 140–141
  investing in analytics, 127–128, 136–137
  keys to success, 127, 129–132
  OODA Loop, xv
  organizational structure, 127, 146–149
  value of analytics, 12–15, 17–18, 102
Insurance companies, 44, 47
Investing in analytics, 127–128, 136–137
I/O bound, 97

James, Josh, 72–73
Jewett, Dan, 65
Jonas, Jeff, 103–106, 151

Kaggle, 76–77
Kaushik, Avinash, 19–20, 21, 22, 23–24, 69, 136–137, 169–174
KB (kilobyte), 87
Kent, Paul, 2
Kerzner, Dan, 74, 75
Kim, W. Chan, 71

Land and expand, 65
Lander, Jared, 3
Last mile people, 101–102
Legislation and acts
  Children's Online Protection Privacy Act (2000), 155
  Consumer Privacy Bill of Rights, 154, 156
  Consumer Privacy Protection Act of 2011, 155
  Do Not Track, 155

Legislation and acts (continued)
  Electronic Communication Privacy Act (ECPA; 1986), 155
  Financial Services Modernization Act (1999), 155
  Health Insurance Portability and Accountability Act (HIPAA; 1996), 155
  U.S. versus Europe, 159
LinkedIn groups, 176
Lucas, Steve, 78
Lucene, 95, 124

Machine learning
  in analytics spectrum, 100
  definition, 84
  Mahout, 84, 95, 124
  model reevaluation, 122
  as new analytics, 48, 49
Mahout
  definition, 84
  libraries of analytics, 95
  as open source analytic tool, 124
Maples, Creve, 114–116
MapR, definition, 84
MapReduce
  definition, 85
  as Hadoop component, 61, 63, 84
  as HDFS processing software, 90–91
  for parallel programming, 93, 94–95, 96
  SQL-MapReduce, 69
Market risk, 38, 41–44
Marketing
  advertising (see Advertising)
  analytics, 44
  Blue Ocean Strategy (Kim & Mauborgne), 71
  campaign optimization, 16
  customer insight engines, 71
  database marketing, 24–27, 152–153
  digital marketing, 19–24, 25, 54–55, 101–102
  marketing automation, 26–27
  measurement of impact, 54–57
  new school of, 27–34
  prospect universes, 27
  word-of-mouth, 30–31
Marketing mixed modeling (MMM), 56
Mark-to-market valuation, 42
Mauborgne, Rene, 71
MB (megabyte), 87
Measurement of impact, 54–57
Mehta, Abhishek, 64, 69–70
Meister, John, 2
Memory bound, 97
Mennis, Jeremy L., 114
Message passing interface (MPI), 93, 95–96, 125
MicroStrategy, 74
Mind-sets, fixed versus growth, 145
Mobile devices, analytics on, 73–75
Monte Carlo simulations, 14, 44, 97, 98
Montessori, Maria, 145
Moore's Law, definition, 1
Motivation, 144, 145
MPI (message passing interface), 93, 95–96, 125
MPP (massively parallel processing), 85, 92–93, 96
MPP appliances, 85, 92–93
MPP database, definition, 85
Mu Sigma, 56, 135, 139, 143, 144–146

Non-line world, 20
NoSQL databases, 84, 85

Olson, Mike, 62
Omniture, 21, 33, 56, 72–73
OODA Loop, xv
Oozie, definition, 85
Open source technology
  commercial versus, 8, 67–69
  definition, 67
  Elastic Search, 35
  GNU General Public License (GPL), 67
  Hadoop (see Hadoop)
  Mahout and Lucene, 95
  R (analytics tool), 85, 124
  Talend, 67
  tools for analytics, 85, 124–125
  visualization tools, 118

Optimization engines, 71, 133
Organizational structure, 127, 144–146, 146–149
Ownership, of data
  organizational structure and, 148
  privacy and, 165

Parallel computing
  Clojure, 83
  description of, 90
  GPUs for, 40, 43, 44
  Hadoop, 63
  MapReduce, 85, 90–91, 94–95
  MPP (massively parallel processing), 85, 92–93, 96
  as new analytics, 64
  Pig, 85
  platforms for, 92–93 (see also Platforms)
Parallel programming
  description of, 91
  for GPUs, 98
  on parallel computing platforms, 93–96
PB (petabyte), 87
Peled, Ori, 38
Percolator, 35–36
Personal information (PI), 158
Personally identifiable information (PII), 159, 164
Pig, 85, 94
Pink, Daniel H., 144
Platforms
  analytics platform definition, 83
  Google Analytics, 21, 33, 172–173
  Hadoop (see Hadoop)
  in holistic value proposition, 48
  MPP appliances, 85, 92–93
  Mu Sigma SaaS, 56
  as new analytics, 64
  Omniture, 21, 33, 56, 72–73
  for parallel computing, 92–93
  QlikTech International, 65–66
  Revolution Analytics, 67
  SAP HANA, 8, 84
  SAS (see SAS)
  Tableau Software, 65–66, 116, 117, 118
  Tracx, 31, 33
  Unica, 56
Porway, Jake, 3–4
Predictive analytics. See also Analytics
  in analytics spectrum, 14, 99–100
  applying, 71
  business intelligence versus, 49, 70
  data volume and, 9
  in database marketing, 25–26
  in holistic view, 140–141
  machine learning, 14
  origin of term, 99
  risk management, 43
Prescriptive analytics
  in analytics spectrum, 99–100
  in holistic view, 141
Privacy
  anonymized data, 77, 164
  bill of rights, 154, 156
  cloud computing, 70
  contexts of, 159–162, 163–164
  definition, 157
  legislation, 155
  middleware layer, 165–166
  online agreements, 151, 154
  personal information (PI), 158
  personally identifiable information (PII), 159, 164
  privacy landscape, 152
  privacy policies, 157
  Safe Harbor, 155–156, 161
  sensitive data, 158–159, 161, 162
  terrorism and, 165
  transparency of data, 11, 34, 151, 156
Prospect universes, 27

QlikTech International, 65–66
Quants. See Data scientists

R (analytics tool), 85, 124
Rajaram, Dhiraj, 139
Ramanathan, Murali, 50

Real-time processing
  algorithmic trading, 91
  data into action, 123
  definition, 85
  geospatial intelligence, 104
  streaming analytics, 106
Recommendation engines, 62, 71
Reiskind, Andrew, 157–158, 159, 161, 163–164
Relational databases, 85, 96
Revolution Analytics, 67, 124, 128
Risk management, 16, 37–40, 41–44, 71
Rockefeller, John D. (Jay), 154–155

SaaS (software as a service), 56, 72–73
Safe Harbor, 155–156, 161
Sample, of data, 121
SAP HANA, 8, 84
SAS
  as advanced analytics platform, 8, 56
  analytics on-demand center, 72
  descriptive analytics and, 99
  Social Network Analysis (SNA), 37
Scalability
  cloud computing, 70
  data democracy, 169–170
  Dremel, 84
  Hadoop, 62–63
  limitations, 96–97
  Mahout, 84
  as new analytics, 26–27, 48, 64
  NoSQL databases, 85
  parallel computing platforms, 92–93
  Percolator, 35
Schema, 89
Scoring, definition, 85
Security. See also Privacy
  cloud computing, 70
  mobile analytics, 75
  as privacy principle, 156
Self-service approach, 65–66, 73
Semi-structured data, 10, 11, 85
Sen, Partha, 40, 41–42, 43, 44
Sensitive data, 158–159, 161, 162
Sensor data, 104
Shneiderman, Ben, 113–114
Silos, of data
  in database marketing, 25
  social data as, 32
Singer, Niv, 31–32, 33
Smith, David, 5, 67–68
SNA (social network analysis), 37
Social media
  consumer relationships, 22–23
  intelligence software, 31–34
  as system monitors, 12
Social network analysis (SNA), 37
Software as a service (SaaS), 56, 72–73
Solid-state drives (SSD), 97–98
Spark, 85
Springer, Dan, 27–28
SQL (structured query language)
  in analytics spectrum, 14
  definition, 85
  Hbase, 84
  Hive, 84, 94
  NoSQL database, 84, 85
  Pig, 85, 94
  SQL-H, 69
  SQL-MapReduce, 69
  Sqoop, 86
  user-defined extensions, 95
Sqoop, 86
Stolte, Chris, 65
Stompff, Guido, 111
Storm, 86
Structured data, 10, 11, 35, 86
Supply chain optimization, 10, 16, 71

Tableau Software, 65–66, 116, 117, 118
Tags
  as data structure, 11
  in digital advertising, 54–55
Tal, Marcia, 137–139
TB (terabyte), 87
Technology. See also Information management
  cloud computing (see Cloud computing)
  emerging technologies, 97–98
  evolution of, xii–xiv, 13, 57, 64
  Hadoop (see Hadoop)

  open architecture, 32–33
  open source (see Open source technology)
  outsourcing, 23
  parallel computing (see Parallel computing)
  people/process first, 6
  Percolator, 35–36
  term definitions, 83–86
10/90 rule for investing, 136
Teradata Aster, 69
Thampi, Arun, 162–163
Thorp, Jer, 114
Total cost of ownership (TCO)
  grids, 92
  Hadoop, 62
  institutionalizing analytics, 132
  software as a service, 72
Tracx, 31, 33
Tradeshows and conferences, 176
Translational medicine (TM), 47

Unica, 56
U.S. Library of Congress, data collection of, 5
Unstructured data, 10, 11–12, 35, 86
User-defined aggregate (UDA), 95
User-defined extensions, 95
User-defined functions (UDF), 95
User-defined table functions (UDTF), 95

Value of analytics, 12–15, 17–18, 102
Variables, in analytics, 17–18, 41, 121
Variety of data
  advertising and, 57
  as Big Data, 1, 9–10, 83
  for fraud detection, 35
  online data, 20
Vector processing, 98
Velocity of data
  advertising and, 57
  as Big Data, 1, 9, 10, 83
  for fraud detection, 35
Visiphors, 112
Visualizing data
  data artisans, 111, 113
  definition, 110
  examples of, 118–121
  intangibles visualized, 111–112
  interactivity, 113–114, 116–117
  recommended resources, 175
  talented producers of, 134
  tools for, 118
  visiphors, 112
  visualization designer, 114
  word clouds, 114–116
VMI (vendor managed inventory), 77
Volume of data
  advertising and, 56–57
  analytics and, 9
  annual growth projected, 5
  as Big Data, 1, 9, 83
  for fraud detection, 35
Vs of Big Data, 1, 9, 35, 56–57, 83

Web Analytics (Kaushik), 136
Web Analytics 2.0 (Kaushik), 20, 136
Websites
  ads on, 54–55
  control over data, 23–24
  digital marketing, 19–23, 54–55
Whirr, 86
Word clouds, 114–116
Word-of-mouth marketing, 30–31

YARN, 86
Yau, Nathan, 134
YB (yottabyte), 87

ZB (zettabyte), 87
Zeitlin, Michael, 111–112