
Chapter 1

Data Preprocessing

Several preprocessing techniques are commonly applied to the data before learning in order to improve the accuracy of the model.

There are several common forms of preprocessing applied to a data matrix X, where we will assume
that X is of size [N × D] (N is the number of data points, D is their dimensionality).

Figure 1.1: Preprocessing example 1

Figure 1.2: Preprocessing example 2

Mean subtraction The most common form of preprocessing. It involves subtracting the mean
across every individual feature in the data, and has the geometric interpretation of centring
the cloud of data around the origin along every dimension. In NumPy, this operation would
be implemented as: X -= np.mean(X, axis = 0). With images specifically, for convenience it
can be common to subtract a single value from all pixels (e.g. X -= np.mean(X)), or to do so
separately across the three color channels.
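
For example, a minimal NumPy sketch of these variants (the toy data and the array imgs, assumed
to be stored as [N × H × W × 3], are made up for illustration):

    import numpy as np

    X = np.random.randn(5, 3) * 10.0 + 4.0       # toy data matrix of size [N x D]
    X -= np.mean(X, axis=0)                      # every column (feature) now has zero mean

    imgs = np.random.rand(5, 8, 8, 3) * 255.0    # toy batch of 5 RGB images, [N x H x W x 3]
    imgs_scalar = imgs - np.mean(imgs)                    # subtract one scalar mean over all pixels
    imgs_channel = imgs - np.mean(imgs, axis=(0, 1, 2))   # or one mean per color channel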

Normalization Refers to normalizing the data dimensions so that they are of approximately
the same scale. There are two common ways of achieving this normalization. One is to divide
each dimension by its standard deviation, once it has been zero-centered: (X /= np.std(X,
axis = 0)). Another form of this preprocessing normalizes each dimension so that the min and
max along the dimension are -1 and 1, respectively. It only makes sense to apply this preprocessing
if you have a reason to believe that different input features have different scales (or units), but
that they should be of approximately equal importance to the learning algorithm. In the case of images,
the relative scales of pixels are already approximately equal (and in the range from 0 to 255), so it
is not strictly necessary to perform this additional preprocessing step.
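
A rough sketch of both variants (the toy data below is made up; the first variant assumes the data
has been zero-centered before dividing by the standard deviation):

    import numpy as np

    X = np.random.randn(100, 3) * np.array([1.0, 10.0, 100.0])   # features on very different scales

    # variant 1: zero-center, then divide each dimension by its standard deviation
    Xc = X - np.mean(X, axis=0)
    X_std = Xc / np.std(Xc, axis=0)

    # variant 2: rescale each dimension so that its min is -1 and its max is +1
    mins, maxs = X.min(axis=0), X.max(axis=0)
    X_minmax = 2.0 * (X - mins) / (maxs - mins) - 1.0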

PCA One would want to use PCA if one expects that many of the features are in fact dependent.
This is particularly handy for Naive Bayes, where independence between features is assumed.
Note, however, that most datasets are far too large to use PCA: its complexity is O(n³),
so more sophisticated methods are required at that scale. But if your dataset is small, and you don't have
the time to investigate more sophisticated methods, then by all means go ahead and apply an
out-of-the-box PCA for feature selection.
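
If the dataset is small enough, a plain covariance-based PCA can be sketched as follows (the toy
data and the choice of k, the number of retained components, are made up for illustration):

    import numpy as np

    X = np.random.randn(100, 10)        # toy data: N = 100 samples, D = 10 features
    X -= np.mean(X, axis=0)             # PCA assumes zero-centered data

    cov = np.dot(X.T, X) / X.shape[0]   # [D x D] covariance matrix
    U, S, _ = np.linalg.svd(cov)        # columns of U are the eigenvectors, S the eigenvalues

    k = 3                               # keep the top k principal components
    X_reduced = np.dot(X, U[:, :k])     # project onto the top-k eigenbasis, result is [N x k]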

Whitening Takes the data in the eigenbasis and divides every dimension by the square root of the
corresponding eigenvalue to normalize the scale. The geometric interpretation of this transformation
is that if the input data is a multivariate Gaussian, then the whitened data will be a Gaussian with
zero mean and identity covariance matrix.
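
A sketch of whitening, reusing the same toy covariance/eigendecomposition setup as in the PCA
example; the small constant added under the square root is a common safeguard against division by
zero, not part of the definition:

    import numpy as np

    X = np.random.randn(100, 10)        # toy data, as in the PCA sketch above
    X -= np.mean(X, axis=0)
    cov = np.dot(X.T, X) / X.shape[0]
    U, S, _ = np.linalg.svd(cov)

    X_rot = np.dot(X, U)                     # rotate the data into the eigenbasis (decorrelate)
    X_white = X_rot / np.sqrt(S + 1e-5)      # divide by sqrt(eigenvalue); covariance is close to identity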

Deep Learning with images

For deep learning with images, we will typically only zero-center the data. To do so, for each
pixel, compute its mean across the whole dataset and subtract the resulting mean image from all the
training samples. If you have more than one channel (e.g. RGB), do it for each of the channels
separately.
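
A minimal sketch of this recipe, assuming the training set is stored as a NumPy array of shape
[N × H × W × 3] (the array names and toy values are made up for the example):

    import numpy as np

    train_imgs = np.random.rand(50, 32, 32, 3) * 255.0   # toy training set of 50 RGB images

    # per-pixel mean image over the whole training set, shape [H x W x 3]
    mean_image = np.mean(train_imgs, axis=0)
    train_imgs -= mean_image

    # alternative: a single scalar mean per color channel, broadcast over all pixels
    # channel_mean = np.mean(train_imgs, axis=(0, 1, 2))
    # train_imgs -= channel_mean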
