
Index

A
accuracy 234
AdaBoost 326
anonymous functions
  creating, with lambda 34
arange function
  about 60
  URL 60
arrays
  processing, from tabular data 39-42
axioms
  about 186
  URL 186

B
Bagging
  about 316, 317
  leveraging 317-325
BaseEstimator
  URL 82
Boosting
  about 316, 325
  two-class classification problem 325-340
Bootstrapping 317
box-and-whisker plot
  about 114
  using 114-116
built-in morphy-function
  reference link 139

C
classification
  data, preparing 228-233
  stochastic gradient descent, using 405-409
columns
  preprocessing 43, 44
comprehension 19
confusion matrix 234
cost function 269
counter
  about 5
  reference link 5
CountVectorizer class
  about 146
  reference link 146
cross-validation iterators
  reference link 313
  used, with L1 and L2 shrinkage 301-313
csv library
  URL 42
curse of dimensionality 152

D
data
  clustering, k-means method used 196-202
  dimension, reducing with random projection 171-174
  grouping 95-99
  imputing 117-119
  preparing, for classification model 228-233
  scaling 122-124
  standardizing 125, 126
data model
  URL 3
data preprocessing 86
dataset
  URL 77

Copyright © 2015. Packt Publishing, Limited. All rights reserved.


Gopi, Subramanian. Python Data Science Cookbook, Packt Publishing, Limited, 2015. ProQuest Ebook Central, http://ebookcentral.proquest.com/lib/manchester/detail.action?docID=4191189.
Created from manchester on 2020-07-07 02:46:03.
decision trees
  advantages 256
  building, for multiclass problems 255-265
  disadvantages 256
  reference link 255
decorators
  used, for altering function behavior 31-33
deque
  about 19
  URL 19
derivational patterns
  reference link 138
dictionaries
  about 5
  dictionary objects, using 2-5
  dictionary of dictionaries, using 6, 7
  URL 4
dimensionality reduction 152
distance measures
  calculating 187-189
  URL 192
  working with 186-191
documents
  classifying, Naïve Bayes used 242-254
dot plots
  using 95-99

E
ensemble methods
  Bagging 317-325
  Boosting 325-340
  Gradient Boosting 341
error rate 234
Exploratory Data Analysis (EDA) 86
ExtraTreesClassifier class
  reference link 374
Extremely Randomized Trees
  about 369
  implementing 369-375

F
feature matrices
  decomposing, Non-negative Matrix Factorization (NMF) used 175-183
feature test condition 257
filters
  using 36
function
  behavior altering, with decorators 31-33
  embedding, in another function 28
  passing, as parameter 29
  passing, as variable 27
  returning 30, 31
functools
  about 31
  URL 31

G
generator
  generating 24, 25
Gradient Boosting
  about 341
  demonstrating 343-354
  reference link 354
  simple regression problem 341, 342
Graphviz package
  URL 265

H
heat maps
  URL 104
  using 104-108

I
information gain
  about 259
  reference link 259
instance-based learning 235
inverse document frequencies
  calculating 147-150
iterables
  using 26, 27
iterator
  generating 24, 25
  reference link 24
  using 22, 23
itertools
  about 51
  URL 53
  using 51-53

Itertools.dropwhile
  reference link 22
izip
  using 37-39

K
kernel-based perceptron
  URL 395
kernel methods
  learning 192-196
  linear kernel 196
  polynomial kernel 196
  using 192-196
kernel PCA
  using 160-166
key
  used, for sorting 46-51
k-fold cross-validation 301
k-means method
  cluster evaluation, measures 197
  used, for data clustering 196-202
K-Nearest Neighbor (KNN) 235

L
L1 shrinkage (LASSO)
  used, with regression 293-300
L2 shrinkage (ridge)
  used, with regression 283-292
lambda
  used, for creating anonymous functions 34
Last In, First Out (LIFO) 18
Latent Semantic Analysis (LSA)
  about 170
  reference link 170
lazy learner 235
Least absolute shrinkage and selection operator (LASSO) 293
least squares
  reference link 270
lemmatization 138
linear kernel
  about 196
  URL 196
list
  list comprehension, creating 19-21
  sorting 45, 46
  writing 15-18
loadtxt method
  about 42
  reference link 42
Local Outlier Factor (LOF)
  used, for discovering outliers 216-225

M
machine learning
  with scikit-learn 75-84
map function
  using 35
matplotlib
  about 55
  plotting with 65-74
  URL 74
matrix decomposition 152
multiclass problems
  solving, with decision trees 255-265
multivariate data
  scatter plots, using 100-103

N
Naïve Bayes
  used, for classifying documents 242-254
namedtuple
  URL 12
nearest neighbors
  obtaining 234-241
Non-negative Matrix Factorization (NMF)
  reference link 175
  used, for decomposing feature matrices 175-183
NumPy libraries
  object, properties 59
  using 55-64

O
online learning algorithm
  perceptron, using 388-395

OrderedDict container
  about 4
  URL 4
outliers
  discovering, local outlier factor method used 216-225
  finding, in univariate data 208-216
out-of-bag estimation (OOB)
  about 368
  reference link 368

P
pairwise_distance method
  about 192
  URL 192
partial_fit method
  about 410
  URL 410
percentiles, NumPy
  reference link 92
perceptron
  reference link 395
  used, as online learning algorithm 388-395
polynomial kernel
  about 196
  URL 196
polysemy 170
Principal Component Analysis (PCA) 153
principal components
  extracting 153-159
priority queues
  URL 220
progressive sampling 122
pyplot
  about 87
  reference link 87
  URL 74
Python NLTK library
  URL 7

R
Radial Basis Function (RBF) kernel
  about 164
  reference link 164
Random Forest
  about 358
  implementing 359-368
RandomForestClassifier class
  about 364
  reference link 364
randomization 317
random projection
  data dimension, reducing with 171-174
  reference link 174
random sampling
  performing 120, 121
real-valued numbers
  predicting, regression used 268-282
recursive feature selection 279
regression
  about 268
  stochastic gradient descent, using 396-405
  used, for predicting real-valued numbers 268-282
  with L1 shrinkage (LASSO) 293-300
  with L2 shrinkage (ridge) 283-292
ridge regression 284
Rotational Forest
  about 376
  building 376-384
rote classifier algorithm 235

S
scatter plots
  used, for multivariate data 100-103
scikit-learn
  machine learning with 75-84
  URL 84
SciPy
  URL, for documentation 45
sent_tokenize function
  URL 128
  using 128
set builder notation
  reference link 22
sets
  using 12-14
shrinkage 283

Singular Value Decomposition (SVD)
  used, for extracting features 166-170
snowball stemmers
  reference link 137
sparse matrix representation
  reference link 143
standardization 124
star convex-shaped data points
  URL 202
stemming, words
  performing 135-137
stochastic gradient descent
  used, for classification 405-409
  used, for regression 396-405
stop words
  removing 131-134
stratified sampling 121
summary statistics
  performing 109-113
  plotting 109-113

T
tabular data
  arrays, processing 39-42
term frequencies
  calculating 147-150
Term Frequency Inverse Document Frequency (TFIDF) 169
text
  representing, as bag of words 140-146
text mining
  reference link 133
TFIDF transformer
  calculating 150
tokenization
  performing 127-130
tuples
  about 7
  creating 8-12
  manipulating 8-12

U
univariate data
  analyzing, graphically 87-93
  outliers, finding 208-216

V
vector quantization 202-208

W
word lemmatization
  performing 138, 139
words
  stemming 135-137
word_tokenize function
  URL 129
  using 129

Z
zip
  using 37-39

Thank you for buying
Python Data Science Cookbook

About Packt Publishing


Packt, pronounced 'packed', published its first book, Mastering phpMyAdmin for Effective MySQL
Management, in April 2004, and subsequently continued to specialize in publishing highly focused
books on specific technologies and solutions.
Our books and publications share the experiences of your fellow IT professionals in adapting and
customizing today's systems, applications, and frameworks. Our solution-based books give you the
knowledge and power to customize the software and technologies you're using to get the job done.
Packt books are more specific and less general than the IT books you have seen in the past. Our
unique business model allows us to bring you more focused information, giving you more of what
you need to know, and less of what you don't.
Packt is a modern yet unique publishing company that focuses on producing quality, cutting-edge
books for communities of developers, administrators, and newbies alike. For more information,
please visit our website at www.packtpub.com.

About Packt Open Source


In 2010, Packt launched two new brands, Packt Open Source and Packt Enterprise, in order to
continue its focus on specialization. This book is part of the Packt open source brand, home
to books published on software built around open source licenses, and offering information to
anybody from advanced developers to budding web designers. The Open Source brand also runs
Packt's open source Royalty Scheme, by which Packt gives a royalty to each open source project
about whose software a book is sold.

Writing for Packt


We welcome all inquiries from people who are interested in authoring. Book proposals should
be sent to author@packtpub.com. If your book idea is still at an early stage and you would
like to discuss it first before writing a formal book proposal, then please contact us; one of our
commissioning editors will get in touch with you.
We're not just looking for published authors; if you have strong technical skills but no writing
experience, our experienced editors can help you develop a writing career, or simply get some
additional reward for your expertise.

Python Data Science Essentials
ISBN: 978-1-78528-042-9 Paperback: 258 pages

Become an efficient data science practitioner by thoroughly understanding the key concepts of Python

1. Quickly get familiar with data science using Python.
2. Save tons of time through this reference book with all the essential tools illustrated and explained.
3. Create effective data science projects and avoid common pitfalls with the help of examples and hints dictated by experience.

Python Data Visualization Cookbook
ISBN: 978-1-78216-336-7 Paperback: 280 pages

Over 60 recipes that will enable you to learn how to create attractive visualizations using Python's most popular libraries

1. Learn how to set up an optimal Python environment for data visualization.
2. Understand the topics such as importing data for visualization and formatting data for visualization.
3. Understand the underlying data and how to use the right visualizations.

Please check www.PacktPub.com for information on our titles

Practical Data Science Cookbook
ISBN: 978-1-78398-024-6 Paperback: 396 pages

89 hands-on recipes to help you complete real-world data science projects in R and Python

1. Learn about the data science pipeline and use it to acquire, clean, analyze, and visualize data.
2. Understand critical concepts in data science in the context of multiple projects.
3. Expand your numerical programming skills through step-by-step code examples and learn more about the robust features of R and Python.

Learning Python Data Visualization
ISBN: 978-1-78355-333-4 Paperback: 212 pages

Master how to build dynamic HTML5-ready SVG charts using Python and the pygal library

1. A practical guide that helps you break into the world of data visualization with Python.
2. Understand the fundamentals of building charts in Python.
3. Packed with easy-to-understand tutorials for developers who are new to Python or charting in Python.
