You are on page 1of 53

Big Data Computing A Guide for

Business and Technology Managers


1st Edition Vivek Kale
Visit to download the full and correct content document:
https://textbookfull.com/product/big-data-computing-a-guide-for-business-and-technol
ogy-managers-1st-edition-vivek-kale/
More products digital (pdf, epub, mobi) instant
download maybe you interests ...

Digital Transformation of Enterprise Architecture 1st


Edition Vivek Kale

https://textbookfull.com/product/digital-transformation-of-
enterprise-architecture-1st-edition-vivek-kale/

Agile network businesses : collaboration, coordination,


and competitive advantage 1st Edition Vivek Kale

https://textbookfull.com/product/agile-network-businesses-
collaboration-coordination-and-competitive-advantage-1st-edition-
vivek-kale/

Big Data Analytics for Cloud IoT and Cognitive


Computing 1st Edition Kai Hwang

https://textbookfull.com/product/big-data-analytics-for-cloud-
iot-and-cognitive-computing-1st-edition-kai-hwang/

Emerging Technology and Architecture for Big data


Analytics 1st Edition Anupam Chattopadhyay

https://textbookfull.com/product/emerging-technology-and-
architecture-for-big-data-analytics-1st-edition-anupam-
chattopadhyay/
Big Data Analytics and Computing for Digital Forensic
Investigations 1st Edition Suneeta Satpathy (Editor)

https://textbookfull.com/product/big-data-analytics-and-
computing-for-digital-forensic-investigations-1st-edition-
suneeta-satpathy-editor/

Big Data Analytics Tools and Technology for Effective


Planning 1st Edition Arun K. Somani

https://textbookfull.com/product/big-data-analytics-tools-and-
technology-for-effective-planning-1st-edition-arun-k-somani/

Big Data Analytics Tools and Technology for Effective


Planning 1st Edition Arun K. Somani

https://textbookfull.com/product/big-data-analytics-tools-and-
technology-for-effective-planning-1st-edition-arun-k-somani-2/

Bio-inspired Algorithms for Data Streaming and


Visualization, Big Data Management, and Fog Computing
Simon James Fong

https://textbookfull.com/product/bio-inspired-algorithms-for-
data-streaming-and-visualization-big-data-management-and-fog-
computing-simon-james-fong/

Technology Development Multidimensional Review for


Engineering and Technology Managers 1st Edition Tugrul
U. Daim

https://textbookfull.com/product/technology-development-
multidimensional-review-for-engineering-and-technology-
managers-1st-edition-tugrul-u-daim/
Big Data
Computing
A Guide for Business
and Technology Managers
Chapman & Hall/CRC
Big Data Series

SERIES EDITOR
Sanjay Ranka

AIMS AND SCOPE


This series aims to present new research and applications in Big Data, along with the computa-
tional tools and techniques currently in development. The inclusion of concrete examples and
applications is highly encouraged. The scope of the series includes, but is not limited to, titles in the
areas of social networks, sensor networks, data-centric computing, astronomy, genomics, medical
data analytics, large-scale e-commerce, and other relevant topics that may be proposed by poten-
tial contributors.

PUBLISHED TITLES
BIG DATA COMPUTING: A GUIDE FOR BUSINESS AND TECHNOLOGY
MANAGERS
Vivek Kale
BIG DATA OF COMPLEX NETWORKS
Matthias Dehmer, Frank Emmert-Streib, Stefan Pickl, and Andreas Holzinger
BIG DATA : ALGORITHMS, ANALYTICS, AND APPLICATIONS
Kuan-Ching Li, Hai Jiang, Laurence T. Yang, and Alfredo Cuzzocrea
NETWORKING FOR BIG DATA
Shui Yu, Xiaodong Lin, Jelena Mišić, and Xuemin (Sherman) Shen
Big Data
Computing
A Guide for Business
and Technology Managers

Vivek Kale
CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742
© 2017 by Vivek Kale
CRC Press is an imprint of Taylor & Francis Group, an Informa business

No claim to original U.S. Government works

Printed on acid-free paper


Version Date: 20160426

International Standard Book Number-13: 978-1-4987-1533-1 (Hardback)

This book contains information obtained from authentic and highly regarded sources. Reasonable efforts have been
made to publish reliable data and information, but the author and publisher cannot assume responsibility for the valid-
ity of all materials or the consequences of their use. The authors and publishers have attempted to trace the copyright
holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this
form has not been obtained. If any copyright material has not been acknowledged please write and let us know so we may
rectify in any future reprint.

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or uti-
lized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopy-
ing, microfilming, and recording, or in any information storage or retrieval system, without written permission from the
publishers.

For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://
www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923,
978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For
organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for
identification and explanation without intent to infringe.

Library of Congress Cataloging-in-Publication Data

Names: Kale, Vivek, author.


Title: Big data computing : a guide for business and technology managers /
author, Vivek Kale.
Description: Boca Raton : Taylor & Francis, CRC Press, 2016. | Series:
Chapman & Hall/CRC big data series | Includes bibliographical references
and index.
Identifiers: LCCN 2016005989 | ISBN 9781498715331
Subjects: LCSH: Big data.
Classification: LCC QA76.9.B45 K35 2016 | DDC 005.7--dc23
LC record available at https://lccn.loc.gov/2016005989

Visit the Taylor & Francis Web site at


http://www.taylorandfrancis.com
and the CRC Press Web site at
http://www.crcpress.com
To

Nilesh Acharya and family

for unstinted support on

references and research

for my numerous book projects.


This page intentionally left blank
Contents

List of Figures .............................................................................................................................. xxi


List of Tables .............................................................................................................................. xxiii
Preface .......................................................................................................................................... xxv
Acknowledgments .................................................................................................................... xxxi
Author .......................................................................................................................................xxxiii

1. Computing Beyond the Moore’s Law Barrier While Being More Tolerant of
Faults and Failures..................................................................................................................1
1.1 Moore’s Law Barrier .....................................................................................................2
1.2 Types of Computer Systems ........................................................................................4
1.2.1 Microcomputers ...............................................................................................4
1.2.2 Midrange Computers ......................................................................................4
1.2.3 Mainframe Computers ....................................................................................5
1.2.4 Supercomputers ...............................................................................................5
1.3 Parallel Computing .......................................................................................................6
1.3.1 Von Neumann Architectures ................................................................. 8
1.3.2 Non-Neumann Architectures ........................................................................9
1.4 Parallel Processing ........................................................................................................9
1.4.1 Multiprogramming........................................................................................ 10
1.4.2 Vector Processing ........................................................................................... 10
1.4.3 Symmetric Multiprocessing Systems .......................................................... 11
1.4.4 Massively Parallel Processing ...................................................................... 11
1.5 Fault Tolerance ............................................................................................................. 12
1.6 Reliability Conundrum .............................................................................................. 14
1.7 Brewer’s CAP Theorem .............................................................................................. 15
1.8 Summary ...................................................................................................................... 18

Section I Genesis of Big Data Computing

2. Database Basics ..................................................................................................................... 21


2.1 Database Management System .................................................................................21
2.1.1 DBMS Benefits ................................................................................................22
2.1.2 Defining a Database Management System.................................................23
2.1.2.1 Data Models alias Database Models ............................................ 26
2.2 Database Models .........................................................................................................27
2.2.1 Relational Database Model ...........................................................................28
2.2.2 Hierarchical Database Model .......................................................................30
2.2.3 Network Database Model .............................................................................32
2.2.4 Object-Oriented Database Models...............................................................32
2.2.5 Comparison of Models ..................................................................................33
2.2.5.1 Similarities ...................................................................................... 33
2.2.5.2 Dissimilarities ................................................................................. 35

vii
viii Contents

2.3 Database Components................................................................................................36


2.3.1 External Level ................................................................................................. 37
2.3.2 Conceptual Level ........................................................................................... 37
2.3.3 Physical Level ................................................................................................. 38
2.3.4 The Three-Schema Architecture ................................................................. 38
2.3.4.1 Data Independence ........................................................................ 39
2.4 Database Languages and Interfaces ......................................................................... 40
2.5 Categories of Database Management Systems .......................................................42
2.6 Other Databases ..........................................................................................................44
2.6.1 Text Databases ................................................................................................44
2.6.2 Multimedia Databases ..................................................................................44
2.6.3 Temporal Databases.......................................................................................44
2.6.4 Spatial Databases ........................................................................................... 45
2.6.5 Multiple or Heterogeneous Databases ........................................................ 45
2.6.6 Stream Databases ........................................................................................... 45
2.6.7 Web Databases ............................................................................................... 46
2.7 Evolution of Database Technology .......................................................................... 46
2.7.1 Distribution .....................................................................................................47
2.7.2 Performance ....................................................................................................47
2.7.2.1 Database Design for Multicore Processors .................................48
2.7.3 Functionality .................................................................................................. 49
2.8 Summary ...................................................................................................................... 50

Section II Road to Big Data Computing

3. Analytics Basics .................................................................................................................... 53


3.1 Intelligent Analysis ..................................................................................................... 53
3.1.1 Intelligence Maturity Model.........................................................................55
3.1.1.1 Data ..................................................................................................55
3.1.1.2 Communication ..............................................................................55
3.1.1.3 Information .....................................................................................56
3.1.1.4 Concept ............................................................................................56
3.1.1.5 Knowledge ......................................................................................57
3.1.1.6 Intelligence ......................................................................................58
3.1.1.7 Wisdom............................................................................................58
3.2 Decisions ...................................................................................................................... 59
3.2.1 Types of Decisions .........................................................................................59
3.2.2 Scope of Decisions .........................................................................................61
3.3 Decision-Making Process........................................................................................... 61
3.4 Decision-Making Techniques ....................................................................................63
3.4.1 Mathematical Programming ........................................................................63
3.4.2 Multicriteria Decision Making .....................................................................64
3.4.3 Case-Based Reasoning...................................................................................64
3.4.4 Data Warehouse and Data Mining ..............................................................64
3.4.5 Decision Tree...................................................................................................64
3.4.6 Fuzzy Sets and Systems ................................................................................65
Contents ix

3.5 Analytics.......................................................................................................................65
3.5.1 Descriptive Analytics .................................................................................... 66
3.5.2 Predictive Analytics ...................................................................................... 66
3.5.3 Prescriptive Analytics ................................................................................... 67
3.6 Data Science Techniques ............................................................................................ 68
3.6.1 Database Systems...........................................................................................68
3.6.2 Statistical Inference ........................................................................................68
3.6.3 Regression and Classification.......................................................................69
3.6.4 Data Mining and Machine Learning ...........................................................70
3.6.5 Data Visualization ..........................................................................................70
3.6.6 Text Analytics .................................................................................................71
3.6.7 Time Series and Market Research Models..................................................72
3.7 Snapshot of Data Analysis Techniques and Tasks ................................................. 74
3.8 Summary ......................................................................................................................77

4. Data Warehousing Basics .................................................................................................... 79


4.1 Relevant Database Concepts...................................................................................... 79
4.1.1 Physical Database Design .............................................................................80
4.2 Data Warehouse .......................................................................................................... 81
4.2.1 Multidimensional Model ..............................................................................83
4.2.1.1 Data Cube ........................................................................................84
4.2.1.2 Online Analytical Processing ........................................................84
4.2.1.3 Relational Schemas ........................................................................87
4.2.1.4 Multidimensional Cube.................................................................88
4.3 Data Warehouse Architecture ................................................................................... 91
4.3.1 Architecture Tiers ...........................................................................................91
4.3.1.1 Back-End Tier .................................................................................. 91
4.3.1.2 Data Warehouse Tier ..................................................................... 91
4.3.1.3 OLAP Tier ........................................................................................ 93
4.3.1.4 Front-End Tier ................................................................................. 93
4.4 Data Warehouse 1.0..................................................................................................... 93
4.4.1 Inmon’s Information Factory .......................................................................93
4.4.2 Kimbal’s Bus Architecture ............................................................................94
4.5 Data Warehouse 2.0 .................................................................................................... 95
4.5.1 Inmon’s DW 2.0 ..............................................................................................95
4.5.2 Claudia Imhoff and Colin White’s DSS 2.0 ................................................96
4.6 Data Warehouse Architecture Challenges .............................................................. 96
4.6.1 Performance ....................................................................................................98
4.6.2 Scalability ........................................................................................................98
4.7 Summary .................................................................................................................... 100

5. Data Mining Basics ............................................................................................................ 101


5.1 Data Mining ............................................................................................................... 101
5.1.1 Benefits .......................................................................................................... 103
5.2 Data Mining Applications ....................................................................................... 104
5.3 Data Mining Analysis .............................................................................................. 106
5.3.1 Supervised Analysis .................................................................................... 106
5.3.1.1 Exploratory Analysis ................................................................... 106
5.3.1.2 Classification ................................................................................. 107
x Contents

5.3.1.3 Regression ..................................................................................... 107


5.3.1.4 Time Series .................................................................................... 108
5.3.2 Un-Supervised Analysis ............................................................................. 108
5.3.2.1 Association Rules ......................................................................... 108
5.3.2.2 Clustering ...................................................................................... 108
5.3.2.3 Description and Visualization.................................................... 109
5.4 CRISP-DM Methodology ......................................................................................... 109
5.4.1 Business Understanding ............................................................................. 110
5.4.2 Data Understanding .................................................................................... 111
5.4.3 Data Preparation .......................................................................................... 111
5.4.4 Modeling ....................................................................................................... 112
5.4.5 Model Evaluation ......................................................................................... 113
5.4.6 Model Deployment ...................................................................................... 113
5.5 Machine Learning ..................................................................................................... 114
5.5.1 Cybersecurity Systems ................................................................................ 116
5.5.1.1 Data Mining for Cybersecurity .................................................. 117
5.6 Soft Computing ......................................................................................................... 118
5.6.1 Artificial Neural Networks ........................................................................ 119
5.6.2 Fuzzy Systems .............................................................................................. 120
5.6.3 Evolutionary Algorithms ............................................................................ 120
5.6.4 Rough Sets .................................................................................................... 121
5.7 Summary .................................................................................................................... 122

6. Distributed Systems Basics .............................................................................................. 123


6.1 Distributed Systems .................................................................................................. 123
6.1.1 Parallel Computing...................................................................................... 125
6.1.2 Distributed Computing............................................................................... 128
6.1.2.1 System Architectural Styles ........................................................ 129
6.1.2.2 Software Architectural Styles..................................................... 130
6.1.2.3 Technologies for Distributed Computing ................................. 135
6.2 Distributed Databases .............................................................................................. 138
6.2.1 Characteristics of Distributed Databases ................................................. 140
6.2.1.1 Transparency ................................................................................ 140
6.2.1.2 Availability and Reliability ......................................................... 140
6.2.1.3 Scalability and Partition Tolerance ............................................ 141
6.2.1.4 Autonomy ...................................................................................... 141
6.2.2 Advantages and Disadvantages of Distributed Databases.................... 142
6.2.3 Data Replication and Allocation ................................................................ 146
6.2.4 Concurrency Control and Recovery ......................................................... 146
6.2.4.1 Distributed Recovery ................................................................... 147
6.2.5 Query Processing and Optimization ........................................................ 148
6.2.6 Transaction Management ........................................................................... 149
6.2.6.1 Two-Phase Commit Protocol ...................................................... 149
6.2.6.2 Three-Phase Commit Protocol ................................................... 150
6.2.7 Rules for Distributed Databases ................................................................ 151
6.3 Summary .................................................................................................................... 152
Contents xi

7. Service-Oriented Architecture Basics ............................................................................153


7.1 Service-Oriented Architecture ................................................................................ 153
7.1.1 Defining SOA................................................................................................ 155
7.1.1.1 Services .......................................................................................... 155
7.2 SOA Benefits .............................................................................................................. 156
7.3 Characteristics of SOA.............................................................................................. 157
7.3.1 Dynamic, Discoverable, Metadata Driven ............................................... 157
7.3.2 Designed for Multiple Invocation Styles .................................................. 158
7.3.3 Loosely Coupled .......................................................................................... 158
7.3.4 Well-Defined Service Contracts ................................................................. 158
7.3.5 Standard Based ............................................................................................ 158
7.3.6 Granularity of Services and Service Contracts........................................ 158
7.3.7 Stateless ......................................................................................................... 159
7.3.8 Predictable Service-Level Agreements (SLAs) ........................................ 159
7.3.9 Design Services with Performance in Mind ............................................ 159
7.4 SOA Applications ...................................................................................................... 159
7.4.1 Rapid Application Integration ................................................................... 160
7.4.2 Multichannel Access.................................................................................... 160
7.4.3 Business Process Management .................................................................. 160
7.5 SOA Ingredients ........................................................................................................ 161
7.5.1 Objects, Services, and Resources ............................................................... 161
7.5.1.1 Objects............................................................................................ 161
7.5.1.2 Services .......................................................................................... 161
7.5.1.3 Resources ....................................................................................... 162
7.5.2 SOA and Web Services ................................................................................ 163
7.5.2.1 Describing Web Services: Web Services Description
Language ...................................................................................... 165
7.5.2.2 Accessing Web Services: Simple Object Access Protocol ...... 165
7.5.2.3 Finding Web Services: Universal Description, Discovery,
and Integration ............................................................................. 165
7.5.3 SOA and RESTful Services ......................................................................... 166
7.6 Enterprise Service Bus .............................................................................................. 167
7.6.1 Characteristics of an ESB Solution ............................................................ 170
7.6.1.1 Key Capabilities of an ESB .......................................................... 171
7.6.1.2 ESB Scalability .............................................................................. 174
7.6.1.3 Event-Driven Nature of ESB ....................................................... 174
7.7 Summary .................................................................................................................... 175

8. Cloud Computing Basics ...................................................................................................177


8.1 Cloud Definition........................................................................................................ 177
8.2 Cloud Characteristics ............................................................................................... 179
8.2.1 Cloud Storage Infrastructure Requirements ........................................... 180
8.3 Cloud Delivery Models ............................................................................................ 181
8.3.1 Infrastructure as a Service (IaaS)............................................................... 182
8.3.2 Platform as a Service (PaaS) ....................................................................... 182
8.3.3 Software as a Service (SaaS) ....................................................................... 183
xii Contents

8.4 Cloud Deployment Models .....................................................................................185


8.4.1 Private Clouds ..............................................................................................185
8.4.2 Public Clouds ...............................................................................................185
8.4.3 Hybrid Clouds..............................................................................................186
8.4.4 Community Clouds .....................................................................................186
8.5 Cloud Benefits ...........................................................................................................186
8.6 Cloud Challenges ......................................................................................................190
8.6.1 Scalability ......................................................................................................191
8.6.2 Multitenancy.................................................................................................192
8.6.3 Availability ....................................................................................................193
8.6.3.1 Failure Detection .......................................................................... 194
8.6.3.2 Application Recovery................................................................... 195
8.7 Cloud Technologies...................................................................................................195
8.7.1 Virtualization ................................................................................................196
8.7.1.1 Characteristics of Virtualized Environment ............................ 197
8.7.2 Service-Oriented Computing .....................................................................200
8.7.2.1 Advantages of SOA ...................................................................... 201
8.7.2.2 Layers in SOA ............................................................................... 202
8.8 Summary ....................................................................................................................203

Section III Big Data Computing

9. Introducing Big Data Computing ...................................................................................207


9.1 Big Data ......................................................................................................................207
9.1.1 What Is Big Data?.........................................................................................208
9.1.1.1 Data Volume ..................................................................................208
9.1.1.2 Data Velocity .................................................................................210
9.1.1.3 Data Variety................................................................................... 211
9.1.1.4 Data Veracity .................................................................................212
9.1.2 Common Characteristics of Big Data Computing Systems ...................213
9.1.3 Big Data Appliances ....................................................................................214
9.2 Tools and Techniques of Big Data ...........................................................................215
9.2.1 Processing Approach ...................................................................................215
9.2.2 Big Data System Architecture ....................................................................216
9.2.2.1 BASE (Basically Available, Soft State, Eventual
Consistency) .............................................................................. 217
9.2.2.2 Functional Decomposition ..........................................................218
9.2.2.3 Master–Slave Replication ............................................................218
9.2.3 Row Partitioning or Sharding ....................................................................218
9.2.4 Row versus Column-Oriented Data Layouts ..........................................219
9.2.5 NoSQL Data Management..........................................................................220
9.2.6 In-Memory Computing ...............................................................................221
9.2.7 Developing Big Data Applications ............................................................222
9.3 Aadhaar Project .........................................................................................................223
9.4 Summary ....................................................................................................................226
Contents xiii

10. Big Data Technologies .......................................................................................................227


10.1 Functional Programming Paradigm .....................................................................227
10.1.1 Parallel Architectures and Computing Models .................................. 228
10.1.2 Data Parallelism versus Task Parallelism............................................. 228
10.2 Google MapReduce.................................................................................................229
10.2.1 Google File System ..................................................................................231
10.2.2 Google Bigtable ........................................................................................ 232
10.3 Yahoo!’s Vision of Big Data Computing ..............................................................233
10.3.1 Apache Hadoop........................................................................................234
10.3.1.1 Components of Hadoop Ecosystem.....................................235
10.3.1.2 Principles and Patterns Underlying the Hadoop
Ecosystem ................................................................................ 236
10.3.1.3 Storage and Processing Strategies ........................................237
10.3.2 Hadoop 2 alias YARN ............................................................................. 238
10.3.2.1 HDFS Storage .......................................................................... 239
10.3.2.2 MapReduce Processing .......................................................... 239
10.4 Hadoop Distribution ..............................................................................................240
10.4.1 Cloudera Distribution of Hadoop (CDH) ............................................243
10.4.2 MapR..........................................................................................................243
10.4.3 Hortonworks Data Platform (HDP) ......................................................243
10.4.4 Pivotal HD................................................................................................. 243
10.5 Storage and Processing Strategies ........................................................................244
10.5.1 Characteristics of Big Data Storage Methods.......................................244
10.5.2 Characteristics of Big Data Processing Methods................................. 244
10.6 NoSQL Databases ....................................................................................................245
10.6.1 Column-Oriented Stores or Databases .................................................246
10.6.2 Key-Value Stores (K-V Stores) or Databases .........................................246
10.6.3 Document-Oriented Databases ..............................................................247
10.6.4 Graph Stores or Databases ...................................................................... 248
10.6.5 Comparison of NoSQL Databases ......................................................... 248
10.7 Summary .................................................................................................................. 249

11. Big Data NoSQL Databases ..............................................................................................251


11.1 Characteristics of NoSQL Systems .......................................................................254
11.1.1 NoSQL Characteristics Related to Distributed Systems and
Distributed Databases .............................................................................254
11.1.2 NoSQL Characteristics Related to Data Models and Query
Languages ................................................................................................. 256
11.2 Column Databases ..................................................................................................256
11.2.1 Cassandra.................................................................................................. 258
11.2.1.1 Cassandra Features.................................................................258
11.2.2 Google BigTable .......................................................................................260
11.2.3 HBase ......................................................................................................... 260
11.2.3.1 HBase Data Model and Versioning ......................................260
11.2.3.2 HBase CRUD Operations ...................................................... 262
11.2.3.3 HBase Storage and Distributed System Concepts ............. 263
xiv Contents

11.3 Key-Value Databases............................................................................................. 263


11.3.1 Riak .......................................................................................................... 264
11.3.1.1 Riak Features ........................................................................ 264
11.3.2 Amazon Dynamo .................................................................................. 265
11.3.2.1 DynamoDB Data Model ...................................................... 266
11.4 Document Databases ............................................................................................ 266
11.4.1 CouchDB ................................................................................................. 268
11.4.2 MongoDB ................................................................................................ 268
11.4.2.1 MongoDB Features ..............................................................269
11.4.2.2 MongoDB Data Model ........................................................270
11.4.2.3 MongoDB CRUD Operations .............................................272
11.4.2.4 MongoDB Distributed Systems Characteristics .............. 272
11.5 Graph Databases ................................................................................................... 274
11.5.1 OrientDB ................................................................................................. 274
11.5.2 Neo4j ........................................................................................................ 275
11.5.2.1 Neo4j Features ......................................................................275
11.5.2.2 Neo4j Data Model ................................................................ 276
11.6 Summary ................................................................................................................ 277

12. Big Data Development with Hadoop..............................................................................279


12.1 Hadoop MapReduce .............................................................................................284
12.1.1 MapReduce Processing .........................................................................284
12.1.1.1 JobTracker..............................................................................284
12.1.1.2 TaskTracker ........................................................................... 286
12.1.2 MapReduce Enhancements and Extensions ...................................... 286
12.1.2.1 Supporting Iterative Processing .........................................286
12.1.2.2 Join Operations .....................................................................288
12.1.2.3 Data Indices ..........................................................................289
12.1.2.4 Column Storage .................................................................... 290
12.2 YARN ......................................................................................................................291
12.3 Hadoop Distributed File System (HDFS)........................................................... 293
12.3.1 Characteristics of HDFS ........................................................................ 293
12.4 HBase ...................................................................................................................... 295
12.4.1 HBase Architecture ............................................................................... 296
12.5 ZooKeeper .............................................................................................................. 297
12.6 Hive .........................................................................................................................297
12.7 Pig ............................................................................................................................ 298
12.8 Kafka .......................................................................................................................299
12.9 Flume ......................................................................................................................300
12.10 Sqoop ......................................................................................................................300
12.11 Impala ..................................................................................................................... 301
12.12 Drill .........................................................................................................................302
12.13 Whirr .......................................................................................................................302
12.14 Summary ................................................................................................................ 302

13. Big Data Analysis Languages, Tools, and Environments ..........................................303


13.1 Spark ....................................................................................................................... 303
13.1.1 Spark Components ................................................................................305
Contents xv

13.1.2 Spark Concepts ......................................................................................306


13.1.2.1 Shared Variables ..................................................................306
13.1.2.2 SparkContext .......................................................................306
13.1.2.3 Resilient Distributed Datasets ...........................................306
13.1.2.4 Transformations...................................................................306
13.1.2.5 Action .................................................................................... 307
13.1.3 Benefits of Spark .................................................................................... 307
13.2 Functional Programming ......................................................................................308
13.3 Clojure....................................................................................................................... 312
13.4 Python ....................................................................................................................... 313
13.4.1 NumPy.................................................................................................... 313
13.4.2 SciPy ........................................................................................................ 313
13.4.3 Pandas ..................................................................................................... 313
13.4.4 Scikit-Learn ............................................................................................ 313
13.4.5 IPython ................................................................................................... 314
13.4.6 Matplotlib ............................................................................................... 314
13.4.7 Stats Models ...........................................................................................314
13.4.8 Beautiful Soup ....................................................................................... 314
13.4.9 NetworkX ............................................................................................... 314
13.4.10 NLTK....................................................................................................... 314
13.4.11 Gensim .................................................................................................... 314
13.4.12 PyPy ........................................................................................................ 315
13.5 Scala .......................................................................................................................... 315
13.5.1 Scala Advantages .................................................................................. 316
13.5.1.1 Interoperability with Java .................................................. 316
13.5.1.2 Parallelism............................................................................ 316
13.5.1.3 Static Typing and Type Inference ......................................316
13.5.1.4 Immutability ........................................................................316
13.5.1.5 Scala and Functional Programs .........................................317
13.5.1.6 Null Pointer Uncertainty.................................................... 317
13.5.2 Scala Benefits ......................................................................................... 318
13.5.2.1 Increased Productivity .......................................................318
13.5.2.2 Natural Evolution from Java .............................................318
13.5.2.3 Better Fit for Asynchronous and Concurrent Code ....... 318
13.6 R ................................................................................................................................. 319
13.6.1 Analytical Features of R ....................................................................... 319
13.6.1.1 General..................................................................................319
13.6.1.2 Business Dashboard and Reporting .................................320
13.6.1.3 Data Mining .........................................................................320
13.6.1.4 Business Analytics .............................................................. 320
13.7 SAS ............................................................................................................................ 321
13.7.1 SAS DATA Step...................................................................................... 321
13.7.2 Base SAS Procedures ............................................................................ 322
13.8 Summary .................................................................................................................. 323

14. Big Data DevOps Management .......................................................................................325


14.1 Big Data Systems Development Management .................................................... 326
14.1.1 Big Data Systems Architecture............................................................ 326
xvi Contents

14.1.2 Big Data Systems Lifecycle ..................................................................... 326


14.1.2.1 Data Sourcing ..........................................................................326
14.1.2.2 Data Collection and Registration in a Standard Format .......326
14.1.2.3 Data Filter, Enrich, and Classification..................................327
14.1.2.4 Data Analytics, Modeling, and Prediction ..........................327
14.1.2.5 Data Delivery and Visualization ..........................................328
14.1.2.6 Data Supply to Consumer Analytics Applications ............ 328
14.2 Big Data Systems Operations Management ........................................................ 328
14.2.1 Core Portfolio of Functionalities ............................................................ 328
14.2.1.1 Metrics for Interfacing to Cloud Service Providers ........... 330
14.2.2 Characteristics of Big Data and Cloud Operations ............................. 332
14.2.3 Core Services ............................................................................................ 332
14.2.3.1 Discovery and Replication ....................................................332
14.2.3.2 Load Balancing........................................................................333
14.2.3.3 Resource Management ...........................................................333
14.2.3.4 Data Governance .................................................................... 333
14.2.4 Management Services .............................................................................334
14.2.4.1 Deployment and Configuration ...........................................334
14.2.4.2 Monitoring and Reporting ....................................................334
14.2.4.3 Service-Level Agreements (SLAs) Management ................334
14.2.4.4 Metering and Billing ..............................................................335
14.2.4.5 Authorization and Authentication .......................................335
14.2.4.6 Fault Tolerance ........................................................................ 335
14.2.5 Governance Services ............................................................................... 336
14.2.5.1 Governance ..............................................................................336
14.2.5.2 Security.....................................................................................337
14.2.5.3 Privacy ......................................................................................338
14.2.5.4 Trust ..........................................................................................339
14.2.5.5 Security Risks ..........................................................................340
14.2.6 Cloud Governance, Risk, and Compliance .......................................... 341
14.2.6.1 Cloud Security Solutions .......................................................344
14.3 Migrating to Big Data Technologies .....................................................................346
14.3.1 Lambda Architecture ..............................................................................348
14.3.1.1 Batch Processing .....................................................................348
14.3.1.2 Real Time Analytics ............................................................... 349
14.4 Summary .................................................................................................................. 349

Section IV Big Data Computing Applications

15. Web Applications................................................................................................................353


15.1 Web-Based Applications ........................................................................................ 353
15.2 Reference Architecture ...........................................................................................354
15.2.1 User Interaction Architecture ................................................................ 355
15.2.2 Service-Based Architecture .................................................................... 355
15.2.3 Business Object Architecture ................................................................. 356
15.3 Realization of the Reference Architecture in J2EE ............................................. 356
15.3.1 JavaServer Pages and Java Servlets as the User Interaction
Components .............................................................................................. 356
Contents xvii

15.3.2 Session Bean EJBs as Service-Based Components ...............................356


15.3.3 Entity Bean EJBs as the Business Object Components .......................357
15.3.4 Distributed Java Components ................................................................357
15.3.5 J2EE Access to the EIS (Enterprise Information Systems) Tier .......... 357
15.4 Model–View–Controller Architecture ................................................................. 357
15.5 Evolution of the Web ............................................................................................... 359
15.5.1 Web 1.0.......................................................................................................359
15.5.2 Web 2.0....................................................................................................... 359
15.5.2.1 Weblogs or Blogs.....................................................................359
15.5.2.2 Wikis .........................................................................................360
15.5.2.3 RSS Technologies ....................................................................360
15.5.2.4 Social Tagging .........................................................................361
15.5.2.5 Mashups: Integrating Information .......................................361
15.5.2.6 User Contributed Content ..................................................... 361
15.5.3 Web 3.0....................................................................................................... 362
15.5.4 Mobile Web ...............................................................................................363
15.5.5 The Semantic Web .................................................................................... 363
15.5.6 Rich Internet Applications ......................................................................364
15.6 Web Applications ....................................................................................................364
15.6.1 Web Applications Dimensions............................................................... 365
15.6.1.1 Presentation .............................................................................365
15.6.1.2 Dialogue ...................................................................................366
15.6.1.3 Navigation ...............................................................................366
15.6.1.4 Process ......................................................................................366
15.6.1.5 Data ........................................................................................... 367
15.7 Search Analysis ....................................................................................................... 367
15.7.1 SLA Process .............................................................................................. 368
15.8 Web Analysis ........................................................................................................... 371
15.8.1 Veracity of Log Files Data ....................................................................... 374
15.8.1.1 Unique Visitors .......................................................................374
15.8.1.2 Visitor Count ...........................................................................374
15.8.1.3 Visit Duration .......................................................................... 375
15.8.2 Web Analysis Tools.................................................................................. 375
15.9 Summary .................................................................................................................. 376

16. Social Network Applications ...........................................................................................377


16.1 Networks .................................................................................................................. 378
16.1.1 Concept of Networks...............................................................................378
16.1.2 Principles of Networks ............................................................................ 379
16.1.2.1 Metcalfe’s Law ........................................................................379
16.1.2.2 Power Law ...............................................................................379
16.1.2.3 Small Worlds Networks ......................................................... 379
16.2 Computer Networks ............................................................................................... 380
16.2.1 Internet ......................................................................................................381
16.2.2 World Wide Web (WWW) ...................................................................... 381
16.3 Social Networks....................................................................................................... 382
16.3.1 Popular Social Networks ........................................................................ 386
16.3.1.1 LinkedIn................................................................................... 386
16.3.1.2 Facebook................................................................................... 386
xviii Contents

16.3.1.3 Twitter ................................................................................... 387


16.3.1.4 Google+ ................................................................................ 388
16.3.1.5 Other Social Networks........................................................... 389
16.4 Social Networks Analysis (SNA) .......................................................................... 389
16.5 Text Analysis ............................................................................................................ 391
16.5.1 Defining Text Analysis ............................................................................ 392
16.5.1.1 Document Collection .......................................................... 392
16.5.1.2 Document ............................................................................. 393
16.5.1.3 Document Features ............................................................. 393
16.5.1.4 Domain Knowledge ............................................................ 395
16.5.1.5 Search for Patterns and Trends .......................................... 396
16.5.1.6 Results Presentation ............................................................... 396
16.6 Sentiment Analysis ................................................................................................. 397
16.6.1 Sentiment Analysis and Natural Language Processing (NLP) ......... 398
16.6.2 Applications ..............................................................................................400
16.7 Summary ..................................................................................................................400

17. Mobile Applications...........................................................................................................401


17.1 Mobile Computing Applications .......................................................................... 401
17.1.1 Generations of Communication Systems ............................................. 402
17.1.1.1 1st Generation: Analog ....................................................... 402
17.1.1.2 2nd Generation: CDMA, TDMA, and GSM ..................... 402
17.1.1.3 2.5 Generation: GPRS, EDGE, and CDMA 2000 .............. 405
17.1.1.4 3rd Generation: wCDMA, UMTS, and iMode ................. 406
17.1.1.5 4th Generation ......................................................................... 406
17.1.2 Mobile Operating Systems ..................................................................... 406
17.1.2.1 Symbian ................................................................................ 406
17.1.2.2 BlackBerry OS ...................................................................... 407
17.1.2.3 Google Android ................................................................... 407
17.1.2.4 Apple iOS ............................................................................. 408
17.1.2.5 Windows Phone ......................................................................408
17.2 Mobile Web Services ...............................................................................................408
17.2.1 Mobile Field Cloud Services ................................................................... 412
17.3 Context-Aware Mobile Applications .................................................................... 414
17.3.1 Ontology-Based Context Model ............................................................. 415
17.3.2 Context Support for User Interaction .................................................... 415
17.4 Mobile Web 2.0 ........................................................................................................ 416
17.5 Mobile Analytics ..................................................................................................... 418
17.5.1 Mobile Site Analytics ...............................................................................418
17.5.2 Mobile Clustering Analysis ....................................................................418
17.5.3 Mobile Text Analysis ...............................................................................419
17.5.4 Mobile Classification Analysis ...............................................................420
17.5.5 Mobile Streaming Analysis .................................................................... 421
17.6 Summary .................................................................................................................. 421

18. Location-Based Systems Applications ............................................................................423


18.1 Location-Based Systems .........................................................................................423
18.1.1 Sources of Location Data ........................................................................ 424
18.1.1.1 Cellular Systems ..................................................................... 424
Contents xix

18.1.1.2 Multireference Point Systems ............................................ 426


18.1.1.3 Tagging..................................................................................... 427
18.1.2 Mobility Data ............................................................................................ 429
18.1.2.1 Mobility Data Mining ............................................................430
18.2 Location-Based Services ......................................................................................... 432
18.2.1 LBS Characteristics ..................................................................................435
18.2.2 LBS Positioning Technologies ................................................................436
18.2.3 LBS System Architecture .........................................................................437
18.2.4 LBS System Components ........................................................................439
18.2.5 LBS System Challenges ........................................................................... 439
18.3 Location-Based Social Networks .......................................................................... 441
18.4 Summary ..................................................................................................................443

19. Context-Aware Applications.............................................................................................445


19.1 Context-Aware Applications..................................................................................446
19.1.1 Types of Context-Awareness ..................................................................448
19.1.2 Types of Contexts .....................................................................................449
19.1.3 Context Acquisition .................................................................................450
19.1.4 Context Models ........................................................................................450
19.1.5 Generic Context-Aware Application Architecture ..............................452
19.1.6 Illustrative Context-Aware Applications .............................................. 452
19.2 Decision Pattern as Context ................................................................................... 453
19.2.1 Concept of Patterns ..................................................................................454
19.2.1.1 Patterns in Information Technology (IT) Solutions ........... 455
19.2.2 Domain-Specific Decision Patterns ....................................................... 455
19.2.2.1 Financial Decision Patterns ................................................ 455
19.2.2.2 CRM Decision Patterns .......................................................... 457
19.3 Context-Aware Mobile Services ............................................................................ 460
19.3.1 Limitations of Existing Infrastructure.................................................. 460
19.3.1.1 Limited Capability of Mobile Devices .............................. 460
19.3.1.2 Limited Sensor Capability .................................................. 461
19.3.1.3 Restrictive Network Bandwidth ........................................ 461
19.3.1.4 Trust and Security Requirements ...................................... 461
19.3.1.5 Rapidly Changing Context .................................................... 461
19.3.2 Types of Sensors .......................................................................................462
19.3.3 Context-Aware Mobile Applications ..................................................... 462
19.3.3.1 Context-Awareness Management Framework ...................464
19.4 Summary .................................................................................................................. 467

Epilogue: Internet of Things ................................................................................................... 469


References ................................................................................................................................... 473
Index ............................................................................................................................................. 475
This page intentionally left blank
List of Figures

Figure 1.1 Increase in the number of transistors on an Intel chip.........................................2


Figure 1.2 Hardware trends in the 1990s and the first decade .............................................3
Figure 1.3 Von Neumann computer architecture ...................................................................9
Figure 2.1 A hierarchical organization ...................................................................................30
Figure 2.2 The three-schema architecture ............................................................................ 37
Figure 2.3 Evolution of database technology......................................................................... 47
Figure 3.1 Characteristics of enterprise intelligence in terms of the scope of the
decisions ................................................................................................................... 61
Figure 4.1 Cube for sales data having dimensions store, time, and product and a
measure amount ......................................................................................................84
Figure 4.2 OLAP Operations. (a) Original Cube (b) Roll-up to the Country
level (c) Drill down to the month level (d) Pivot (e) Slice on
Store.City = ‘Mumbai’ (f) Dice on Store.Country = ‘US’ and Time.
Quarter = ‘Q1’ or ‘Q2’ .............................................................................................85
Figure 4.3 Example of a star schema....................................................................................... 87
Figure 4.4 Example of a snowflake schema ........................................................................... 88
Figure 4.5 Example of a constellation schema ....................................................................... 89
Figure 4.6 Lattice of cuboids derived from a four-dimensional cube................................90
Figure 4.7 Reference data warehouse architecture ............................................................... 92
Figure 5.1 Schematic of CRISP-DM methodology .............................................................. 110
Figure 5.2 Architecture of a machine-learning system...................................................... 115
Figure 5.3 Architecture of a fuzzy inference system.......................................................... 120
Figure 5.4 Architecture of a rough sets system ................................................................... 122
Figure 6.1 Parallel computing architectures: (a) Flynn’s taxonomy (b) shared
memory system, and (c) distributed system...................................................... 127
Figure 7.1 Web Services usage model ................................................................................... 164
Figure 7.2 ESB reducing connection complexity (a) Direct point-to-point
connections (n*n) and (b) Connecting through the bus (n) ............................ 168
Figure 7.3 Enterprise service bus (ESB) linking disparate systems and computing
environments ......................................................................................................... 169
Figure 8.1 The cloud reference model ................................................................................... 183
Figure 8.2 Portfolio of services for the three cloud delivery models ............................... 184

xxi
xxii List of Figures

Figure 9.1 4V characteristics of big data ...................................................................................... 208


Figure 9.2 Use cases for big data computing ........................................................................ 209
Figure 9.3 Parallel architectures .......................................................................................... 215
Figure 9.4 The solution architecture for the Aadhaar project .........................................225
Figure 10.1 Execution phases in a generic MapReduce application ................................. 230
Figure 10.2 Comparing the architecture of Hadoop 1 and Hadoop 2 ............................. 240
Figure 12.1 Hadoop ecosystem .............................................................................................. 282
Figure 12.2 Hadoop MapReduce architecture..................................................................... 285
Figure 12.3 YARN architecture ............................................................................................. 292
Figure 12.4 HDFS architecture .............................................................................................. 295
Figure 14.1 Big data systems architecture ............................................................................ 327
Figure 14.2 Big data systems lifecycle (BDSL) ..................................................................... 328
Figure 14.3 Lambda architecture ..........................................................................................348
Figure 15.1 Enterprise application in J2EE ........................................................................... 355
Figure 15.2 MVC and enterprise application architecture ................................................ 358
Figure 18.1 Principle of lateration.......................................................................................... 427
Figure 18.2 Trajectory mapping ............................................................................................. 432
Figure 19.1 Context definition ................................................................................................465
Figure 19.2 Conceptual metamodel of a context ................................................................. 466
List of Tables

Table 2.1 Characteristics of the Four Database Models ..................................................... 36


Table 2.2 Levels of Data Abstraction .................................................................................... 38
Table 3.1 Intelligence Maturity Model (IMM) ..................................................................... 55
Table 3.2 Analysis Techniques versus Tasks ....................................................................... 74
Table 4.1 Comparison between OLTP and OLAP Systems ............................................... 82
Table 4.2 Comparison between Operational Databases and Data Warehouses ............83
Table 4.3 The DSS 2.0 Spectrum ............................................................................................ 96
Table 5.1 Data Mining Application Areas.......................................................................... 105
Table 5.2 Characteristics of Soft Computing Compared with Traditional Hard
Computing ............................................................................................................. 118
Table 8.1 Key Attributes of Cloud Computing .................................................................. 178
Table 8.2 Key Attributes of Cloud Services ....................................................................... 179
Table 8.3 Comparison of Cloud Delivery Models............................................................. 185
Table 8.4 Comparison of Cloud Benefits for Small and Medium Enterprises
(SMEs) and Large Enterprises ............................................................................. 189
Table 9.1 Scale of Data ........................................................................................................... 210
Table 9.2 Value of Big Data across Industries .................................................................... 211
Table 9.3 Industry Use Cases for Big Data ......................................................................... 212
Table 10.1 MapReduce Cloud Implementations .................................................................. 233
Table 10.2 Comparison of MapReduce Implementations .................................................. 233
Table 12.1 Hadoop Ecosystem Classification by Timescale and General
Purpose of Usage .................................................................................................. 283
Table 15.1 Comparison between Web 1.0 and Web 2.0 ...................................................... 362
Table 17.1 Evolution of Wireless Networks ..........................................................................404
Table 17.2 Comparison of Mobile Operating Systems ....................................................... 407
Table 18.1 Location-Based Services (LBS) Classification ................................................... 424
Table 18.2 LBS Quality of Service (QOS) Requirements ....................................................434
Table 18.3 Location Enablement Technologies .................................................................... 437
Table 18.4 Accuracy and TIFF for Several Location Techniques ...................................... 438

xxiii
This page intentionally left blank
Preface

The rapid growth of the Internet and World Wide Web has led to vast amounts of infor-
mation available online. In addition, business and government organizations create large
amounts of both structured and unstructured information that need to be processed, ana-
lyzed, and linked. It is estimated that the amount of information stored in a digital form in
2007 was 281 exabytes, and the overall compound growth rate has been 57% with informa-
tion in organizations growing at an even faster rate. It is also estimated that 95% of all cur-
rent information exists in unstructured form with increased data processing requirements
compared to structured information. The storing, managing, accessing, and processing
of this vast amount of data represent a fundamental need and an immense challenge in
order to satisfy the need to search, analyze, mine, and visualize these data as information.
This deluge of data, along with emerging techniques and technologies used to handle it, is
commonly referred to today as big data computing.
Big data can be defined as volumes of data available in varying degrees of complexity,
generated at different velocities and varying degrees of ambiguity, that cannot be pro-
cessed using traditional technologies, processing methods, algorithms, or any commercial
off-the-shelf solutions. Such data include weather, geo-spatial and GIS data, consumer-
driven data from social media, enterprise-generated data from legal, sales, marketing,
procurement, finance and human-resources departments, and device-generated data from
sensor networks, nuclear plants, X-ray and scanning devices, and airplane engines. This
book describes the characteristics, challenges, and solutions for enabling such big data
computing.
The fundamental challenges of big data computing are managing and processing expo-
nentially growing data volumes, significantly reducing associated data analysis cycles to
support practical, timely applications, and developing new algorithms that can scale to
search and process massive amounts of data. The answer to these challenges is a scalable,
integrated computer systems hardware and software architecture designed for parallel
processing of big data computing applications. Cloud computing is a prerequisite to big
data computing; cloud computing provides the opportunity for organizations with lim-
ited internal resources to implement large-scale big data computing applications in a cost-
effective manner.
Relational databases are based on the relational model and provide online transac-
tion processing (OLTP), schema-on-write, and SQL. Data warehouses are based on the
relational model and support online analytical processing (OLAP). Data warehouses are
designed to optimize data analysis, reporting, and data mining; data are extracted, trans-
formed, and loaded (ETL) from other data sources to load data into the data warehouse.
However, today’s data environment demands innovations that are faster, extremely scal-
able, scale cost effectively, and work easily with structured, semistructured, and unstruc-
tured data. Hadoop and NoSQL databases are designed to work easily with structured,
unstructured, and semistructured data. Hadoop, along with relational databases and data
warehouses, increases the strategies and capabilities of leveraging data to increase the
accuracy and speed of business decisions. Hadoop plays an important role in the modern
data architecture. Organizations that can leverage the capabilities of relational databases,
data warehouses and Hadoop, and all the available data sources will have a competitive
advantage.

xxv
xxvi Preface

What Makes This Book Different?


This book interprets the 2010s big data computing phenomenon from the point of view
of business as well as technology. This book unravels the mystery of big data computing
environments and applications and their power and potential to transform the operating
contexts of business enterprises. It addresses the key differentiator of big data comput-
ing environments, namely, that big data computing systems combine the power of elas-
tic infrastructure (via cloud computing) and information management with the ability to
analyze and discern recurring patterns in the colossal pools of operational and transac-
tions data (via big data computing) to leverage and transform them into success patterns
for an enterprise’s business. These extremes of requirements for storage and processing
arose primarily from developments of the past decade in the areas of social networks,
mobile computing, location-based systems, and so on.
This book highlights the fact that handling gargantuan amounts of data became possi-
ble only because big data computing makes feasible computing beyond the practical limits
imposed by Moore’s law. Big data achieves this by employing non–Neumann architectures
enabling parallel processing on a network of nodes equipped with extremely cost-effective
commoditized hardware while simultaneously being more tolerant of fault and failures
that are unavoidable in any realistic computing environments.

On April 19, 1965, Gordon Moore, the cofounder of Intel Corporation, pub-
lished an article in Electronics Magazine titled “Cramming More Components
onto Integrated Circuits” in which he identified and conjectured a trend that
computing power would double every 2 years (this was termed as Moore’s
law in 1970 by CalTech professor and VLSI pioneer Calvin Mead). This law has been
able to predict reliably both the reduction in costs and the improvements in comput-
ing capability of microchips, and those predictions have held true since then. This
law effectively is a measure and professes limits on the increase in computing power
that can be realistically achieved every couple of years. The requirements of modern
applications such as social networks and mobile applications far outstrip what can be
delivered by conventional Von Neumann architectures employed since the 1950s.

The phenomenon of big data computing has attained prominence in the context of the
heightened interest in business analytics. An inevitable consequence of organizations
using the pyramid-shaped hierarchy is that there is a decision-making bottleneck at the
top of the organization. The people at the top are overwhelmed by the sheer volume of
decisions they have to make; they are too far away from the scene of the action to really
understand what’s happening; and by the time decisions are made, the actions are usually
too little and too late. The need to be responsive to evolving customer needs and desires
creates operational structures and systems where business analysis and decision mak-
ing are pushed out to operating units that are closest to the scene of the action—which,
however, lack the expertise and resources to access, process, evaluate, and decide on the
course of action. This engenders the significance of analysis systems that are essential for
enabling decisive action as close to the customer as possible.
Preface xxvii

The characteristic features of this book are as follows:

1. It enables IT managers and business decision-makers to get a clear understanding


of what big data computing really means, what it might do for them, and when it
is practical to use it.
2. It gives an introduction to the database solutions that were a first step toward
enabling data-based enterprises. It also gives a detailed description of data
warehousing- and data mining-related solutions that paved the road to big data
computing solutions.
3. It describes the basics of distributed systems, service-oriented architecture (SOA),
web services, and cloud computing that were essential prerequisites to the emer-
gence of big data computing solutions.
4. It provides a very wide treatment of big data computing that covers the
functionalities (and features) of NoSQL, MapReduce programming models,
Hadoop development ecosystem, and analysis tools environments.
5. It covers most major application areas of interest in big data computing: web, social
networks, mobile, and location-based systems (LBS) applications.
6. It is not focused on any particular vendor or service offering. Although there is a
good description of the open source Hadoop development ecosystem and related
tools and technologies, the text also introduces NoSQL and analytics solutions
from commercial vendors.

In the final analysis, big data computing is a realization of the vision of intelligent
infrastructure that reforms and reconfigures automatically to store and/or process
incoming data based on the predefined requirements and predetermined context of the
requirements. The intelligent infrastructure itself will automatically capture, store, man-
age and analyze incoming data, take decisions, and undertake prescribed actions for
standard scenarios, whereas nonstandard scenarios would be routed to DevOperators.
An extension of this vision also underlies the future potential of Internet of Things (IoT)
discussed in the Epilogue. IoT is the next step in the journey that commenced with com-
puting beyond the limits of Moore’s law made possible by cloud computing followed by
the advent of big data computing technologies (like MapReduce and NoSQL). I wanted
to write a book presenting big data computing from this novel perspective of computing
beyond the practical limits imposed by Moore’s law; the outcome is the book that you are
reading now. Thank you!

How This Book Is Organized


This book traces the road to big data computing, the detailed features and characteristics
of big data computing solutions and environments, and, in the last section, high-potential
application areas of big data.
Section I provides a glimpse of the genesis of big data computing. Chapter 2 provides an
overview of traditional database environments.
xxviii Preface

Section II describes significant milestones on the road to big data computing. Chapters 3
and 4 review the basics of analytics and data warehousing. Chapter 5 presents the stan-
dard characteristics of data mining. Chapters 7 and 8 wrap up this part with a detailed
discussion on the nature and characteristics of service oriented architecture (SOA) and
web services and cloud computing.
Section III presents a detailed discussion on various aspects of a big data computing
solution. The approach adopted in this book will be useful to any professional who must
present a case for realizing big data computing solutions or to those who could be involved
in a big data computing project. It provides a framework that will enable business and
technical managers to make the optimal decisions necessary for the successful migra-
tion to big data computing environments and applications within their organizations.
Chapter 9 introduces the basics of big data computing and gives an introduction to the
tools and technologies, including those that are essential for big data computing.
Chapter 10 describes the various technologies employed for realization of big data com-
puting solutions: Hadoop development, NoSQL databases, and YARN. Chapter 11 details
the offerings of various big data NoSQL database vendors. Chapter 12 details the Hadoop
ecosystem essential for the development of big data applications. Chapter 13 presents
details on analysis languages, tools and development environments. Chapter 14 describes
big data-related management and operation issues that become critical as the big data
computing environments become more complex.
Section IV presents detailed descriptions of major areas of big data computing applica-
tions. Chapter 15 discusses now familiar web-based application environments. Chapter 16
addresses popular social network applications such as Facebook and Twitter. Chapter 17
deals with the burgeoning mobile applications such as WhatsApp. Finally, Chapter 18
describes lesser known but more promising areas of location-based applications.
Context-aware applications can significantly enhance the efficiency and effectiveness
of even routinely occurring transactions. Chapter 19 introduces the concept of context as
constituted by an ensemble of function-specific decision patterns. This chapter highlights
the fact that any end-user application’s effectiveness and performance can be enhanced by
transforming it from a bare transaction to a transaction clothed by a surrounding context
formed as an aggregate of all relevant decision patterns in the past. This generation of
an important component of the generalized concept of context is critically dependent on
employing big data computing techniques and technologies deployed via cloud computing.

Who Should Read This Book?


All stakeholders of a big data project can read this book.
This book presents a detailed discussion on various aspects of a big data computing
solution. The approach adopted in this book will be useful to any professional who must
present a case for realizing big data computing solutions or to those who could be involved
in a big data computing project. It provides a framework to enable business and technical
managers to make the optimal decisions necessary for the successful migration to big data
computing environments and applications within their organizations.
Another random document with
no related content on Scribd:
The Stretcher Bed may be Stuffed with Browse or
Leaves, or Suspended from Poles and Stakes to
Make a Camp Cot
Food Bags with
Friction-Top Tins to
Fit Them, in Which
Lard, Butter, Pork,
Ham, and Other
Greasy Necessities
are Carried
A Pack Basket with a Waterproof Canvas Lid and Cover,
Having Straps to Go over the Shoulders, Is a General
Favorite with Woodsmen and Guides

The Compass Is by Far the Most


Useful Instrument for the Woods,
but Any Reliable and Inexpensive
Watch may be Carried

A Good, Tempered Knife Should be


Worn at the Belt

When Going Light the Belt Ax is Used


The Cooking Kit may be of Aluminum or Steel, All
Nesting within the Largest Pot, and may Include a
Folding Baker, or Reflector, with Bread Board in
Canvas Bag, a Wood Salt Box, and a Water-Tight
Can for Matches
Folding Candle
Lanterns are the
Most Convenient for
the Woods and They
Give Sufficient Light
for Camp Life

The Camper’s Outfit

The personal outfit should include only the most useful articles,
and each member of the party should be provided with a dunnage
bag of canvas to hold bedding and clothing, and a smaller, or “ditty,”
bag for keeping together the toilet and other personal belongings
which most everyone finds necessary for everyday comfort. A
mending kit, containing a few yards of silk, linen, and twist; a length
of mending cotton; buttons; a few needles and pins, both safety and
the common kinds, should not be overlooked. The veteran usually
stows away a bit of wire; a length of strong twine; a few nails and
tacks; rivets, etc., for emergency use, and it is surprising to the
novice how handy these several odds and ends are found while in
camp. A compact tin box will form a convenient place to keep them
and will take up little room in the dunnage bag. A medicine case and
a first-aid outfit are well worth packing; the smallest cases containing
a few of the common remedies will fully meet the camper’s needs.
When carrying food by canoe or pack basket, the canoe duffel and
provision bags are a great convenience, enabling the outer to carry
different foodstuffs in a compact and sanitary manner. Food bags
may be had in different sizes, and friction-top tins may be purchased
to fit them; and one or more of these liquid-proof containers are
desirable for transporting lard, butter, pork, ham, and other greasy
necessities. The food bags slip into the larger duffel bags, making a
very compact bundle for stowing away in a canoe or pack harness.

Carrying List for the Camp Outfit

For permanent camps, take the wall tent with fly, although the
Baker or camp-fire styles are also good. When traveling light by
canoe, the canoe or protean tents are recommended. When going
very light by pack, use the forester’s or ranger’s tent. Sod and floor
cloths and mosquito netting are optional.
The cooking kit may be of aluminum or steel, all nesting within the
largest pot. Include a folding baker, or reflector, with bread board in a
canvas bag, a wood salt box, and a water-tight can for matches.
Furniture for the permanent camp consists of a full-sized ax,
double-blade or tomahawk style with straight handle, in a protecting
case, whetstone and file for keeping the ax in shape. A shovel and
saw will be needed when a cabin is built. A canteen may be
included, but is not required on most trips. A folding candle lantern is
the best for the average trip, but an oil, or acetylene, lantern may be
used in a fixed camp. Cots, folding chairs, tables, hangers, etc., are
only useful in fixed camps.
A pack basket with a waterproof-canvas lid and cover, having
straps to go over the shoulders, is a general favorite with woodsmen
and guides. Canvas packs or dunnage bags may be used if
preferred. There are two sizes of food bags, one holding 5 lb. and
another of 10-lb. capacity, with draw-strings at the top, and these are
the best for carrying provisions.
Pack harness, with a tumpline to go across the forehead, is
needed when the outfit must be carried on portages, etc. This may
be omitted when pack baskets are used. Packing cases of fiber may
be used for shipping the outfit to the camping ground, but ordinary
trunks, or wood boxes, will answer as well.

The Personal Outfit

An old ordinary suit that is not worn too thin is sufficient. Corduroy
is too heavy for the summer and too cold for winter, and canvas is
too stiff and noisy for the woods. Cotton khaki is excellent for the
summer, and all-wool khaki, or mackinaw, coat and trousers are
comfortable for winter. Wool is the best material for undergarments
in all seasons. Two sets of garments will be sufficient, as the
washing is done at night. Be sure to have the garments large enough
to allow for shrinkage. Light-weight cashmere is the best material for
socks during summer, and heavier weight for the winter. Three pairs
of ordinary-weight and one pair of heavy-weight will be sufficient. A
medium-weight gray-flannel overshirt, with breast pockets having
button flaps, is the woodsman’s choice. On short and light trips one
shirt will do. A light-weight, all-wool gray or brown, sweater is a good
thing to carry along, it is easily wetted through and a famous brier
catcher, yet most woodsmen carry one.
The regulation army poncho is more suited to the woods than a
rubber coat or oilskins. The larger-size poncho is more bulky to pack,
but may be used as a shelter by rigging it up with poles, lean-to
fashion. A poncho makes a good ground blanket also.
A medium wide-brimmed hat, in gray or brown, is better than a
cap. A gray, or brown, silk handkerchief should be included to wear
around the neck to protect it from the sun and cold. Only few novices
will carry one, but not so with the regular woodsman, The moccasin
is the only suitable footwear for the woods. The “puckaway,” with
extra sole, is known to most woodsmen. A pair of larrigans—ankle-
high moccasins with single sole—are suitable to wear about the
camp.
Each member of the party carries his own knapsack, or ditty bag,
in which such things as brush and comb, toothbrush, razor, towel,
medicines, stationery, etc., are kept. The extra clothing is carried in
its own canvas bag.
Each member of the party carries a pair of woolen blankets. Army
blankets in tan color are serviceable and inexpensive.
A good, tempered knife should be worn at the belt, preferably one
without a hilt and having a blade 5 or 6 in. long.
A small leather pouch containing a few common remedies, such
as quinine, laxative, etc.; and a small first-aid outfit should be
included in each camper’s personal pack. Also a small leather pouch
containing an assortment of needles, darning cotton, buttons, and a
length of heavy silk twist is a handy companion.
A few sheets of paper and as many envelopes, a notebook, pencil,
and a few postal cards, are usually carried, together with an almanac
page of the months covering the intended trip.
The compass is by far the most useful instrument in the woods,
but any reliable and inexpensive watch may be carried.
Many woodsmen carry a small hatchet at the belt, and on trips
when but the few necessities are carried the belt ax takes the place
of the heavier-weight tool. The tomahawk style gives two cutting
edges and is therefore the best tool to carry. A leather or other
covering case is needed to protect the blades.
A small tin box containing an assortment of rivets; tacks; a bit of
string; brass wire; a few nails; a couple of small files; a tool holder
with tools; a sheet of sandpaper; a bit of emery cloth, and any other
small articles which the sportsman fancies will come in handy, may
be carried. It is surprising how often this “what not” is resorted to
while in the woods.
The odds and ends of personal belongings, as a jackknife; pipe
and tobacco; map of the region visited; length of fishing line and
hook; a few loose matches; match box; purse; notebook and pencil;
handkerchief, etc., are, of course, carried in the pocket of the coat.
A Camper’s Salt-and-Pepper Holder

A camper will find a very clever way to carry salt and pepper by
using a piece cut from a joint of bamboo. A piece is selected with the
joint in the center, and the ends are stoppered with corks.
A Simple Self-Contained Motor
To say that the subject of this article is the simplest motor in the
world is not to overestimate it, for the apparatus is not only a motor
reduced to its essential elements, but combines within itself its own
source of electric power, all without the use of a single piece of wire.
The experiment is very interesting and instructive and will well repay
a careful construction along the lines indicated, even though not in
strict accordance with the dimensions given.
The first step is to procure a permanent magnet, about ³⁄₈ in. in
diameter and 6 in. long. If such a magnet cannot be conveniently
secured, a piece of tool steel with flat ends should be hardened by
heating it to a dull red and plunging it in water, and then strongly
magnetized. This may be readily accomplished by slipping a coil of
insulated wire over it through which the current from a storage
battery or set of primary cells is passed. If these are not at hand,
almost any electrical supply store will magnetize the steel.
A square base block with neatly beveled corners is now in order,
which is trimmed up squarely and a hole bored centrally through it to
receive the lower end of the magnet. Procure a neat spool and make
a hole in it large enough to pass over the magnet. Glue the spool to
the base after locating it in the exact center.
The outer and larger cylinder is of copper, or of brass,
copperplated on the inside. It is cup-shaped, with a hole in the
bottom just large enough to permit the magnet to be pushed through
with a close fit, to make a good electrical contact. The magnet may
be held in place by having it closely fit the spool and the copper
cylinder, and by soldering the heads of a couple of small tacks, or
nails, to its under side and driving them into the spool. Coat the
magnet with pitch, or paraffin, from the top down, and around its
connection with the bottom of the cylinder. The small thimble shown
at the top should be of brass or copper, and while one can be easily
formed of sheet metal and soldered, it is not improbable that one
could be made in seamless form from small article of commerce. In
the exact center of the under side of the top of this thimble, make a
good mark with a prickpunch, after which a small steel thumb tack
should be filed to a fine needle point and placed, point up, exactly
central on the upper end of the magnet, to which it is held with a little
wax. The smaller cylinder is simply a piece of sheet zinc bent into a
true cylinder of such a size that it may be sprung over the lower end
of the thimble. This done, it is only necessary to slip the zinc over the
end of the magnet until the thimble rests on the thumb tack, and then
pour some dilute muriatic or sulphuric acid into the outer cylinder
after which the thimble and attached zinc will begin to rotate. The
required strength of the acid and the resulting speed will depend
upon the nicety of suspension and the trueness of the rotating zinc
cylinder. The zinc will have to be changed, but the copper undergoes
no deterioration.
The Tricks of Camping Out
By STILLMAN TAYLOR

PART II—Cooking in the Woods

Cooking in the woods requires more of a knack than equipment, and


while a camp stove is well enough in a permanent camp, its
weight and bulk makes this article of camp furniture unsuited for
transportation by canoe. Patent cooking grates are less bulky, but
the woodsman can learn to do without them very nicely. However,
the important item which few woodsmen care to do without is the
folding baker, or reflector. The baker is folded flat and carried in a
canvas case, including baking pan and a kneading board. The
largest size, with an 18-in. square pan, weighs about 5 lb., and the
smallest, with an 8 by 12-in. pan in aluminum, only 2 lb. In use, the
reflector is placed with the open side close to the fire, and cooking is
accomplished evenly and well in any kind of weather. Bread, fish,
game, or meat are easily and perfectly cooked, and the smaller size
is amply large for a party of two or three.
A Cooking Range Fashioned from Two Green Logs Laid in a V-Shape with a
Few Stones Built Up at the Wide End over Which a Fire is Made of Hard-
Wood Sticks

The camp fire is one of the charms of the open, and if it is built
right and of the best kind of wood, cooking may be done over it as
well as over a forest range. Many woodsmen prefer to build a
second and smaller fire for cooking, and although I have never found
this necessary, excepting in large camps where a considerable
quantity of food must be prepared, the camper can suit himself, for
experimenting is, after all, a large part of the fun of living in and off
the woods.
A satisfactory outdoor cooking range may be fashioned by roughly
smoothing the top and bottom sides of two green logs, and placing
them about 6 in. apart at one end and about 2 ft. apart at the
opposite end. At the wide end a few stones are built up, and across
these, hickory, ash, and other sticks of hard wood are placed. The
reflector is placed close to the coals at this end, and the fire is built
between the logs, the broiling and frying being done at the narrow-
end opening. Woods that burn slowly when green should be used for
backlogs and end logs; chestnut, red oak, butternut, red maple, and
persimmon being best adapted for this purpose.
The hard woods are best for cooking and heating, since they burn
more slowly, and give out considerable heat and burn down to a
body of glowing coals. Soft woods are quick to catch fire, burn
rapidly, and make a hot fire, but burn down to dead ashes. Hickory is
by far the best firewood of the North, in that it makes a hot fire, is
long-burning, and forms a large body of coals that gives an even and
intense heat for a considerable length of time. Next to hickory comes
chestnut; the basket oaks, ironwood, dogwood, and ash are the
woodsman’s favorites. Among the woods that are easy to split are
the red oak, basket oak, white oak, ash, and white birch. Some few
woods split more easily when green than after seasoning, and
among them are hickory, dogwood, beech, sugar maple, birch, and
elm. The most stubborn woods to split are the elder, blue ash,
cherry, sour gum, hemlock, sweet gum, and sycamore. Of the softer
woods, the birches make the best fuel; black birch in particular
makes a fine camp fire, and it is one of the few woods that burns well
when green. The dry bark of the hemlock makes a quick and hot fire,
and white birch takes fire quickly even though moist. Driftwood is
good to start a fire with, and dry pine knots—the limb stubs of a dead
pine tree—are famous kindlers. Green wood will, of course, burn
better in winter when the sap is dormant, and trees found on high
ground make better fuel than those growing in moist bottom lands.
Hard woods are more plentiful on high ground, while the softer
woods are found in abundance along the margins of streams.
A Green Pole Placed in a Forked Stick Provides a Pot Hanger for a Noonday
Meal

For cooking the noonday meal a small fire will suffice to boil the
pot and furnish the heat sufficient to make a fry. Simply drive a
forked stick in the ground and lay a green stick in the fork with the
opposite end on the ground with a rock laid on it to keep it down, and
hang the pot on the projecting stub left for this purpose. A long stick
with projecting stubs, planted in the ground to slant over the fire at
an angle, will serve as well. Let the pot hang about 2 ft. from the
ground, collect an armful of dry twigs and plenty of larger kindling
sticks. Now shave three or four of the larger sticks and leave the
shavings on the ends, stand them up beneath the pot, tripod fashion,
and place the smaller sticks around them to build a miniature
wigwam. While the pot is boiling get a couple of bed chunks, or
andirons, 4 or 5 in. in diameter, set and level these on each side of
the fire, and put the frying pan on them. When the pot has boiled
there will be a nice bed of coals for frying that will not smoke the
meal.
When the woodsman makes “one-night stands,” he will invariably
build the fire and start the kettle boiling while he or a companion
stakes the tent, and as soon as the meal is prepared, a pot of water
is started boiling for dish washing.
For roasting and baking with the reflector, a rather high fire is
needed and a few sticks, a yard or more long, resting upright against
a backlog or rock, will throw the heat forward. When glowing coals
are wanted one can take them from the camp fire, or split uniform
billets of green, or dead, wood about 2 in. thick and pile them in the
form of a hollow square, or crib. The fire is built in the center of the
crib and more parallel sticks are laid on top until it is a foot or more
higher. The crib will act as a chimney, and a roaring fire will result,
which upon burning down will give a glowing mass of coals.
Camp cookery implies the preparation of the more simple and
nutritious foods, and in making up a list it is well to include only the
more staple foodstuffs, which are known to have these qualities.
Personal ideas are certain to differ greatly, but the following list may
be depended upon and will serve as a guide.

Provision List

This list of material will be sufficient for two persons on an outing


of two weeks. Carry in a stout canvas food bag 12 lb. of common
wheat flour. The self-raising kind is good, but the common flour is
better. It is well to bring a little yellow, or white, corn meal, about 6
lb., to be served as a johnny cake, hot, cold, or fried mush. It is fine
for rolling a fish in for frying. Rice is very nutritious, easily digested,
and easy to cook. It is good when boiled with raisins. When cold, it
can be fried in slices. About 3 lb. will be sufficient. Oatmeal is less
sustaining than rice, but it is good for porridge, or sliced when cold
and fried. Take along about 3 lb. About 2 lb. of the self-raising
buckwheat flour should be taken along, as it is the favorite for
flapjacks or griddle cakes. Beans are very nutritious, and about 2 lb.
of the common baking kind will be required, to boil or bake with the
salt pork. For soups, take 2 lb. of split peas. They can also be served
as a vegetable. Salt pork is a stand-by, and 5 lb. of it is provided and
carried in friction-top tins or a grease-proof bag. It should be
parboiled before adding to the beans or when fried like bacon. The
regulation meat of the wilderness is bacon, and 5 lb. of it is carried in
a tin or bag. Carry along 3 lb. of lard in a tin or bag, for bread-making
and frying. About 3 lb. of butter is carried in a friction-top tin. For
making rice puddings, take along 1 lb. of raisins. About 1 lb. of
shredded codfish is good for making fish balls. Other small articles,
such as ¹⁄₂ lb. of tea; 1 lb. of coffee; 3 lb. of granulated sugar; 1 pt. of
molasses; 1 pt. of vinegar; 4 cans of condensed milk; 1 can of milk
powder, a good substitute for fresh milk; 1 can egg powder, good for
making omelets or can be scrambled; 1 lb. salt; 2 oz. pepper; 1
package each of evaporated potatoes, onions, and fruits, and 3
packages of assorted soup tablets.
This list is by no means complete, but it will suffice for the average
person on the average trip, since the occasional addition of a fish or
game will help to replenish the stores. When going very light by
pack, only the most compact and nutritious foods should be
selected, while on short, easy trips the addition of canned goods will
supply a greater variety.

Woodcraft
A Limb Supported at an Angle over the Fire Is Another Means of Hanging the
Pot

While shooting and fishing and camping out are chapters in the
book of woodcraft, the word is generally defined to mean the knack
of using the compass, the map, and in making use of the natural
signs of the woods when traveling in the wilderness. If the camper
keeps to the beaten paths and does not stray far from the frequently
used waterways, he needs no compass, and sufficient knowledge of
the ways of the woods may be acquired from the previous articles,
but if the outer ventures into an unknown region the value of more
intimate knowledge increases as the distance to civilization
lengthens, because it will enable him to keep traveling in the desired
direction and prevent the “insane desire to circle,” should one
discover he has lost the trail.

The Emergency “Snack” and Kit


The woodsman well knows that it is an easy matter to stray farther
from camp than he intended to when starting out, and that it is a
common enough occurrence to lose one’s bearings and become
temporarily lost. To prepare for this possible emergency and spend a
comfortable night away from the camp, he carries in his pocket a
little packet of useful articles and stows away a tiny package
containing a small amount of nutritious food. When leaving camp for
a day’s hunting and fishing, the usual lunch is, of course, included,
but in addition to this, the woodsman should carry a couple of soup
tablets, a piece of summer sausage, and some tea. Wrap this in
oiled silk, and pack it in a flat tin box. It will take up very little room in
the pocket.
The emergency kit is merely a small leather pouch containing a
short fishing line; a few fishing hooks; 1 ft. of surgeon’s adhesive
plaster; needle and thread; a few safety pins, and a small coil of
copper or brass wire. These articles, with the gun and a few spare
cartridges, or rod; a belt knife; match safe; compass; map; a little
money, pipe, and tobacco, make up the personal outfit without which
few woodsmen care to venture far from camp. In addition to the
above, I carry a double-edge, light-weight ax, or tomahawk, in a
leather sheath at the belt and a tin cup strung to the back of the belt,
where it is out of the way and unnoticed until wanted.

The Compass

A small pocket compass affixed to a leather thong should be


carried in the breast pocket and fastened to a button of the shirt. An
instrument costing $1 will be accurate enough for all purposes. Many
of the woodsmen as well as the Indians do not use a compass, but
even the expert woodsman gets lost sometimes, and it may happen
that the sun is obscured by clouds, thus making it more difficult to
read the natural signs of the wilderness. The compass is of little
value if a person does not know how to use it. It will not tell in what
direction to go, but when the needle is allowed to swing freely on its
pivot the blue end always points to the magnetic north. The true
north lies a degree or more to either side. In the West, for instance,
the needle will be attracted a trifle to the east, while on the Atlantic
coast it will swing a trifle to the west of the true north. This magnetic
variation need not be taken into account by the woodsman, who may
consider it to point to the true north, for absolute accuracy is not
required for this purpose. However, I would advise the sportsman to
take the precaution of scratching on the back of the case these
letters, B = N, meaning blue equals north. If this is done, the novice
will be certain to remember and read the compass right no matter
how confused he may become on finding that he has lost his way.
The watch may be used as a compass on a clear day by pointing the
hour hand to the sun, when the point halfway between the hour hand
and 12 will be due south.

You might also like