Data Analytics
No part of this publication may be reproduced or distributed in any form or by any means, electronic, mechanical, photocopying,
recording, or otherwise or stored in a database or retrieval system without the prior written permission of the publishers.
The program listings (if any) may be entered, stored and executed in a computer system, but they may not be reproduced for
publication.
Print Edition
ISBN-13: 978-93-5260-418-0
ISBN-10: 93-5260-418-0
EBook Edition
ISBN-13: 978-93-5260-419-7
ISBN-10: 93-5260-419-9
Asst. General Manager—Product Management (Higher Education and Professional): Shalini Jha
Product Manager: Kartik Arora
Information contained in this work has been obtained by McGraw Hill Education (India) from sources believed to
be reliable. However, neither the author nor McGraw Hill Education (India) guarantees the accuracy or completeness of any
information published herein, and neither McGraw Hill Education (India) nor its authors shall be responsible for any errors,
omissions, or damages arising out of use of this information. This work is published with the understanding that McGraw Hill
Education (India) and its authors are supplying information but are not attempting to render engineering or other professional
services. If such services are required, the assistance of an appropriate professional should be sought.
Typeset at The Composers, 260, C.A. Apartment, Paschim Vihar, New Delhi 110 063 and printed at
The book has been developed from the author's own class notes. It reflects his two
decades of global IT industry experience, as well as more than a decade of academic
experience. The chapters are organized for a typical one-semester graduate
course. The book contains caselets from real-world stories at the beginning of
each chapter. There is a running case study across the chapter exercises. There
are also review questions at the end of each chapter.
The book can be read in a short period by anyone who wants to
understand data-based decision-making for their business or other organization
but has no expertise with software tools. The text is almost
entirely devoid of complex jargon or programming code.
WEB SUPPLEMENTS
For Instructors
For Students
I am grateful to the following reviewers, who took the time to give their valuable
suggestions on various chapters of the book:
Pratosh Bansal
Professor, DAVV, Indore, Madhya Pradesh
Publisher’s Note
Constructive suggestions and criticism always go a long way in enhancing any
endeavor. We request all our readers to email us their valuable comments,
views, and feedback for the betterment of the book at info.india@mheducation.com,
mentioning the title and author name in the subject line. Also, please
feel free to report any piracy of the book if spotted by you.
Contents
Preface
Acknowledgements
SECTION 1
2. Business Intelligence Concepts and Application
Introduction
BI for Better Decisions
Decision Types
BI Tools
BI Skills
BI Applications
Conclusion
Review Questions
True/False
3. Data Warehousing
Introduction
Design Considerations for DW
DW Development Approaches
DW Architecture
Conclusion
Review Questions
True/False
4. Data Mining
Introduction
Gathering and Selecting Data
Data Cleansing and Preparation
Outputs of Data Mining
Evaluating Data Mining Results
Data Mining Techniques
Data Mining Best Practices
Myths about Data Mining
Data Mining Mistakes
Conclusion
Review Questions
True/False
5. Data Visualization
Introduction
Excellence in Visualization
Types of Charts
Visualization Example
Visualization Example Phase-2
Tips for Data Visualization
Conclusion
Review Questions
True/False
SECTION 2
6. Decision Trees
Introduction
Decision Tree Problem
Decision Tree Construction
Lessons from Constructing Trees
Decision Tree Algorithms
Conclusion
Review Questions
True/False
7. Regression
Introduction
Correlations and Relationships
Visual Look at Relationships
Non-linear Regression Exercise
Logistic Regression
Advantages and Disadvantages of Regression Models
Conclusion
Review Questions
True/False
SECTION 3
11. Text Mining
Introduction
Text Mining Applications
Text Mining Process
Term Document Matrix
Mining the TDM
Comparing Text Mining and Data Mining
Text Mining Best Practices
Conclusion
Review Questions
True/False
SECTION 4
16. Big Data
Introduction
Defining Big Data
Big Data Landscape
Business Implications of Big Data
Technology Implications of Big Data
Big Data Technologies
Index
1. Wholeness of Data Analytics
Learning Objectives
■ Understand the Business Intelligence and Data Mining cycle
■ Learn about the tools and purpose of Business Intelligence
■ Discover what patterns are, their types, and the process of discovering them
■ Understand the data processing chain
■ Learn in brief about the components of the data processing chain
■ Process a simple example dataset through the complete data processing chain
■ Learn about datafication and various types of data
■ Learn in brief about key terms and Data Science careers
INTRODUCTION
Business is the act of doing something productive to serve someone’s needs and
thus earn a living and make the world a better place. Business activities are
recorded on paper or using electronic media, and then these records become data.
Additional data comes from customers' responses and from the industry as a whole.
All this data can be analyzed and mined using special tools and techniques to
generate patterns and intelligence, which reflect how the business is functioning.
These ideas can then be fed back into the business so that it can evolve to
become more effective and efficient in serving customer needs, and the cycle goes
on (Figure 1.1).
FIGURE 1.1 Business Intelligence and Data Mining (BIDM) Cycle (diagram labels: Business Intelligence, Data Mining)
is less likely to be localized, and, on the whole, it is not so severe as
the terrible torture of the neoplasm. Irregular but very decided febrile
phenomena are more likely to be present in meningitis than in tumor.
Like brain tumor, tubercular meningitis of the convexity may give
psychical disturbances, palsies, local spasms, general convulsions,
sensory disturbances, peculiar disorders of the special senses, etc.;
but these symptoms in the former usually come on more irregularly
and are accompanied less frequently with paroxysmal exacerbations
of headache, vomiting, vertigo, etc. Tubercular meningitis of the base
can be more readily distinguished from cases of tumor by the fact
that one cranial nerve after another is likely to become involved in
the diffusing inflammatory process. Tubercular meningitis is of
shorter duration than the majority of cases of brain tumor, and in it
delirium and mental confusion come on more frequently and earlier.
A history and physical evidences of more or less generalized
tuberculosis favor the diagnosis of tubercular meningitis. In both
affections the ophthalmoscope may reveal choked disc or
descending neuritis. It will be seen that the differentiation between
the affections is not always very clear, although in some cases the
decision may be quickly reached from a study of the points here
suggested.
Tumors of the motor ganglia of the brain are seldom strictly localized
to one or the other of these bodies. Growths occurring in this region
usually involve one or more of the ganglia and adjacent tracts, and
can only be localized by a process of careful exclusion, assisted
perhaps by a few special symptoms. Paralysis or paresis on the side
opposite to the lesion usually occurs in cases of tumor of either the
caudate nucleus or lenticular nucleus; but whether this symptom is
due to the destruction of the ganglia themselves, or to destruction of
or pressure upon the adjoining capsule, has not yet been clearly
determined. In a case of long-standing osteoma of the left corpus
striatum (Case 49) the patient exhibited the appearance of an
atrophic hemiplegia: his arm and leg, which had been contractured
since childhood, were atrophied and shortened, marked bone-
changes having occurred. Another case showed only paresis of the
face of the opposite side. Clonic spasms were present in two cases,
in one being chiefly confined to the upper extremities of the face. In
this case paralysis was absent. Disturbances of intellect and speech
have been observed in tumors of this region. According to
Rosenthal, aphasic disturbances of speech must be due to lesions of
those fibres which enter the lenticular nucleus from the cortex of the
island of Reil.
This deviation, both of head and eyes, occurs, however, not only
from lesions of the pons and cerebellar peduncles, but also from
disease or injury of various parts of the cerebrum—of the cortex,
centrum ovale, ganglia, capsules, and cerebral peduncles. It is
always a matter of interest, and sometimes of importance, with
reference especially to prognosis, to determine what is the probable
seat of lesion as indicated by the deviation and rotation.
During the life of the patient it was a question whether the case was
not one of oculo-motor monoplegia or monospasm from lesion of
cortical centres. It is probable, as Hughlings-Jackson believes, that
ocular and indeed all other movements are in some way represented
in the cerebral convolutions. In the British Medical Journal for June
2, 1877, Jackson discusses the subject of disorders of ocular
movements from disease of nerve-centres. When the right corpus striatum
is damaged, left hemiplegia results, and the eyes and head often
turn to the right for some hours or days. The healthy nervous
arrangement for this lateral movement has been likened by Foville to
the arrangement of reins for driving two horses. What occurs in
lateral deviation is analogous to dropping one rein; the other pulls
the heads of both horses to one side. The lateral deviation shows,
according to Jackson, that after the nerve-fibres of the ocular nerve-
trunks have entered the central nervous system they are probably
redistributed into several centres. The nerve-fibres of the ocular
muscles are rearranged in each cerebral hemisphere in complete
ways for particular movements of both eyeballs. There is no such
thing as paralysis of the muscles supplied by the third nerve or sixth
nerve from disease above the crus cerebri, but the movement for
turning the two eyes is represented still higher than the corpus
striatum.
Tumors anywhere in the middle portion of the base of the brain and
floor of the skull, the region of the origin of the various cranial
nerves, can of course be diagnosticated with comparative ease by a
study of the various forms of paralysis and spasms in the distribution
of these nerves, in connection with other special and general
symptoms. Varieties of alternate hemiplegia are to be looked for, and
also isolated or associated palsies of the oculo-motor, pathetic,
facial, trigeminal, and other cranial nerves. In studying these palsies
it must be borne in mind that although the lesions producing them
are intracranial, the paralyses themselves are peripheral.
In most cases apparent exceptions to the ordinary rules as to
localization are capable of easy explanation; thus, for instance, in a
case of tumor of the occipital lobe (Case 44) numbness and pain
were present in the right arm, although the tumor was situated in the
right hemisphere. The tumor was of considerable size, and may
have affected by pressure the adjoining sensory tracts.