Welcome to Scribd!

Programming Presentation

Uploaded by

0% found this document useful (0 votes)

5 views8 pages

Data transformation is the process of changing the format, structure, or values of data into another format. It typically involves converting raw data into a cleaned, validated, and ready-to-use format. Data transformation is crucial for data management processes like integration, migration, warehousing, and preparation. Examples include converting speech to text, word files to PDFs, and raw data into an accessible form.

Original Description:

Original Title

Programming presentation

Copyright

Available Formats

PPTX, PDF, TXT or read online from Scribd

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Report this Document

Copyright:

Available Formats

Download as PPTX, PDF, TXT or read online from Scribd

Flag for inappropriate content

0% found this document useful (0 votes)

5 views8 pages

Programming Presentation

Uploaded by

SARMAD ALI

Copyright:

Available Formats

Download as PPTX, PDF, TXT or read online from Scribd

Flag for inappropriate content

Jump to Page

You are on page 1of 8

Search inside document

“Data transformation

Data is the process of changing

the format, structure, or
Transformation values of data. Such as a
database file, XML
document or excel spread-
sheet, into another.”
Transformations typically involve converting a raw data source into
a cleaned, validated and ready-to-use format. Data transformation is
crucial to data management processes that include data integration,
data migration, data warehousing and data preparation.
Example:
 Conversion of speech or an audio into text by using an appropriate
software.
 Conversion of a word file into PDF file.
 Conversion of raw data into usable and accessible form.
Data Modeling
“Data modeling is the process of creating a visual
representation of either a whole information system or parts
of it to communicate connections between data points and
structures.”
Levels of Data Modeling:
 A physical model is a schema or framework for how data is
physically stored in a database.
 A conceptual model identifies the high-level, user view of
data.
 A logical data model sits between the physical and conceptual levels
and allows for the logical representation of data to be separate from
its physical storage.
Examples:
 Employee management system of accompany or organization.
 Online Shopping App
 Simple Library System
 Hotel Reservation System
Data Cleaning
Data cleaning, data cleansing, or data scrubbing is the act
of first identifying any issues or bad data, then systematically
correcting these issues. If the data is unfixable, we will need to
remove the bad elements to properly clean our data.
Steps those taken for data cleaning are known as data
cleaning techniques
Data Cleaning Techniques
1. Remove Duplicates:
When we collect your data from a range of different places, or scrape
our data, it’s likely that we will have duplicated entries. These duplicates
could originate from human error where the person inputting the data or
filling out a form made a mistake. Duplicates will inevitably skew our data
and/or confuse our results. They can also just make the data hard to read
when we want to visualize it, so it’s best to remove them right away.
2. Remove Irrelevant Data:
Irrelevant data will slow down and confuse any analysis that we want
to do. So, deciphering what is relevant and what is not is necessary before
we begin your data cleaning. For Example, if we are analyzing the age
range of our customers, we don’t need to include their email addresses.

3. Standardize Capitalization:
Within our data, we need to make sure that the text is consistent. If we
have a mixture of capitalization, this could lead to different erroneous
categories being created. It could also cause problems when we need to
translate before processing as capitalization can change the meaning. For
Example, Bill is a person's name whereas a bill or to bill is something else
entirely.
4. Convert Data Types:
Numbers are the most common data type that we will need to convert
when cleaning our data. Often numbers are imputed as text, however, in order
to be processed, they need to appear as numerals. For example, if we have
an entry that reads September 24th 2021, we’ll need to change that to read
09/24/2021.
5. Clear Formatting
Machine learning models can’t process our information if it is heavily
formatted. If we are taking data from a range of sources, it’s likely that there
are a number of different document formats. This can make our data
confusing and incorrect. We should remove any kind of formatting that has
been applied to your documents, so we can start from zero. This is normally
not a difficult process, both excel and google sheets, for example, have a
simple standardization function to do this.

(Book) The Art of Software Architecture Design Methods and Techniques PDF
Document286 pages
(Book) The Art of Software Architecture Design Methods and Techniques PDF
Jack Liu
No ratings yet
BPM Bok
Document17 pages
BPM Bok
Tigre Amazonico
No ratings yet
What Is Data Preprocessing?
Document4 pages
What Is Data Preprocessing?
Gagandeep Singh
No ratings yet
5 - Explore Concepts of Data Analytics
Document16 pages
5 - Explore Concepts of Data Analytics
GustavoLadino
No ratings yet
Lecture 5 - Data Transformation
Document7 pages
Lecture 5 - Data Transformation
Avneet Singh
No ratings yet
Math211101020
Document12 pages
Math211101020
Saba Shaheen
No ratings yet
Data and Its Types
Document40 pages
Data and Its Types
Abhishek Upadhyay
No ratings yet
Chapter Two Overview For Data Science: 2.2 What Is Data and Information What Is Data?
Document7 pages
Chapter Two Overview For Data Science: 2.2 What Is Data and Information What Is Data?
Hawi Berhanu
No ratings yet
LESSON 1 Information Management
Document5 pages
LESSON 1 Information Management
venusdorado
No ratings yet
What Is Data Modelling
Document12 pages
What Is Data Modelling
ravikumar lanka
No ratings yet
Unit - 1 Bca Iii Sem
Document20 pages
Unit - 1 Bca Iii Sem
22. Saurabh Pal
No ratings yet
CHAPTER 2 Emerging
Document8 pages
CHAPTER 2 Emerging
Jiru Alemayehu
No ratings yet
CMP 312 - 5
Document4 pages
CMP 312 - 5
vyktoria
No ratings yet
In-Memory Processing
Document7 pages
In-Memory Processing
ayodele.matthew.oluremi
No ratings yet
Data Structures: Prepared and Compiled by S. M. KHALID
Document13 pages
Data Structures: Prepared and Compiled by S. M. KHALID
Usama Hashmi
No ratings yet
Chapter 11 Database
Document28 pages
Chapter 11 Database
tayyabashahhx
No ratings yet
Data Modeling
Document8 pages
Data Modeling
Shibly
No ratings yet
DBMS 2
Document7 pages
DBMS 2
Sha Mat
No ratings yet
Data Processing, Data Transformation and Data Analysis
Document31 pages
Data Processing, Data Transformation and Data Analysis
Babar Khan
No ratings yet
DMDW 03
Document25 pages
DMDW 03
Harsh Nag
No ratings yet
U1
Document30 pages
U1
laket64875
No ratings yet
Reading Material Mod 4 Data Integration - Data Warehouse
Document33 pages
Reading Material Mod 4 Data Integration - Data Warehouse
Prasad Tikam
No ratings yet
Database Design and Data Minging
Document33 pages
Database Design and Data Minging
Surendra Pokhrel
No ratings yet
Ict Its4 04 0811 Monitor and Support Data Conversion
Document5 pages
Ict Its4 04 0811 Monitor and Support Data Conversion
api-303095570
57% (7)
Report On Best Practices of Data Handling, Presentation and Formatting
Document5 pages
Report On Best Practices of Data Handling, Presentation and Formatting
Atimango jackline
No ratings yet
Da - 100 - 2MT01 - Act#1 - Bernas, Marc Carlson F.
Document3 pages
Da - 100 - 2MT01 - Act#1 - Bernas, Marc Carlson F.
MARC CARLSON BERNAS
No ratings yet
CMP 214
Document66 pages
CMP 214
Ezekiel James
No ratings yet
Organizing Data and Information
Document11 pages
Organizing Data and Information
Aditya Singh
No ratings yet
Data Cleaning Research Paper
Document5 pages
Data Cleaning Research Paper
zeiqxsbnd
100% (1)
Database Systems 1
Document35 pages
Database Systems 1
Katelyn Rellita
No ratings yet
Chapter - 2 - Data Science
Document32 pages
Chapter - 2 - Data Science
aschalew woldeyesus
No ratings yet
Data Quality and Data Preproccessing
Document4 pages
Data Quality and Data Preproccessing
Aishwarya Jagtap
No ratings yet
Chapter 2 - Intro To Data Sciences
Document41 pages
Chapter 2 - Intro To Data Sciences
tewobestaalemayehu
No ratings yet
Chapter 2-Data Science 2.1.1. What Are Data and Information?
Document12 pages
Chapter 2-Data Science 2.1.1. What Are Data and Information?
CherianXavier
No ratings yet
Chapter 4
Document20 pages
Chapter 4
You
No ratings yet
Knowledge Discovery in Databases
Document17 pages
Knowledge Discovery in Databases
Sarvesh Dharme
No ratings yet
DWM Notes
Document27 pages
DWM Notes
pankajji.090969
No ratings yet
Monitor and Support Data Conversion
Document5 pages
Monitor and Support Data Conversion
Anwar Seid
No ratings yet
DOC-20231118-WA0008new Unit 3
Document15 pages
DOC-20231118-WA0008new Unit 3
facoj84692
No ratings yet
Data Migration Strategies
Document6 pages
Data Migration Strategies
ypravi
No ratings yet
Data MGMT - Some Points
Document9 pages
Data MGMT - Some Points
phaltu_
No ratings yet
Chapter-2 DATA WAREHOUSE PDF
Document28 pages
Chapter-2 DATA WAREHOUSE PDF
Ramesh K
100% (1)
Manual Approach: Chapter One
Document10 pages
Manual Approach: Chapter One
Black Panda
No ratings yet
Technology Terms
Document14 pages
Technology Terms
Wickedadonis
No ratings yet
Semantic Networks Standardisation
Document67 pages
Semantic Networks Standardisation
eumine
No ratings yet
Database 1
Document15 pages
Database 1
Redemta Tanui
No ratings yet
Lesson2 Database Models
Document15 pages
Lesson2 Database Models
Samuel Ndirangu
No ratings yet
Dbms Imp Questions
Document5 pages
Dbms Imp Questions
HANWANT SINGH
No ratings yet
DWH Notes
Document30 pages
DWH Notes
subhabirajdar
No ratings yet
Research Methodology (Data Analysis)
Document7 pages
Research Methodology (Data Analysis)
Masood Shaikh
No ratings yet
Dbms
Document5 pages
Dbms
bestasever
No ratings yet
Lecture 1.2.3
Document8 pages
Lecture 1.2.3
Ashutosh mishra
No ratings yet
Paper 2 Database-Chapter A.1
Document9 pages
Paper 2 Database-Chapter A.1
suparn saluja
No ratings yet
Data Governance Book
Document11 pages
Data Governance Book
Abhishek Prasad
No ratings yet
Data Modelling
Document6 pages
Data Modelling
almightyfavourite
No ratings yet
Big Data and Data Science
Document6 pages
Big Data and Data Science
Aishwarya Jagtap
No ratings yet
File Based Apporach and Database Apporach
Document12 pages
File Based Apporach and Database Apporach
Hafsa Khan
No ratings yet
Data Modeling and Database Design: Turn Your Data into Actionable Insights
From Everand
Data Modeling and Database Design: Turn Your Data into Actionable Insights
Brian Murray
No ratings yet
Semantic Translation: Fundamentals and Applications
From Everand
Semantic Translation: Fundamentals and Applications
Fouad Sabry
No ratings yet
Big Data for Beginners: Book 1 - An Introduction to the Data Collection, Storage, Data Cleaning and Preprocessing
From Everand
Big Data for Beginners: Book 1 - An Introduction to the Data Collection, Storage, Data Cleaning and Preprocessing
Brian Murray
No ratings yet
Data Preprocessing: Optimizing Data Quality and Structure for Effective Analysis and Machine Learning
From Everand
Data Preprocessing: Optimizing Data Quality and Structure for Effective Analysis and Machine Learning
Brian Murray
No ratings yet
Data Warehousing: Unlocking the Power of Data for Strategic Insights and Informed Decisions
From Everand
Data Warehousing: Unlocking the Power of Data for Strategic Insights and Informed Decisions
Brian Murray
No ratings yet
Making The Move From Oracle Warehouse Builder To Oracle Data Integrator 12
Document19 pages
Making The Move From Oracle Warehouse Builder To Oracle Data Integrator 12
sam
No ratings yet
Chapter Two: Database System Concepts and Architecture
Document12 pages
Chapter Two: Database System Concepts and Architecture
Black Panda
No ratings yet
(CSUR '17) Optimization of Complex Dataflows With User-Defined Functions
Document39 pages
(CSUR '17) Optimization of Complex Dataflows With User-Defined Functions
Yuyao Wang
No ratings yet
Illness As A Burden in Anglo-Saxon England
Document18 pages
Illness As A Burden in Anglo-Saxon England
Jack Dong
No ratings yet
AIAA - S - 102 - 2 - 2 - 2009 Performance-Based System Reliability Modeling Requirements
Document38 pages
AIAA - S - 102 - 2 - 2 - 2009 Performance-Based System Reliability Modeling Requirements
Dani
No ratings yet
Engineering Hydrology: (WREN3201)
Document57 pages
Engineering Hydrology: (WREN3201)
Bekan
No ratings yet
What Is Data Warehouse?
Document9 pages
What Is Data Warehouse?
bittuankit
No ratings yet
Poor Reading Comprehension: Its Effect To The Academic Performance of Grade 7 Students in Agdangan National High School
Document34 pages
Poor Reading Comprehension: Its Effect To The Academic Performance of Grade 7 Students in Agdangan National High School
JOY CHAVEZ
No ratings yet
Research Process On Software Development Model
Document9 pages
Research Process On Software Development Model
Ruth amelia G
No ratings yet
Chapter 5 - Requirement Validation
Document31 pages
Chapter 5 - Requirement Validation
Nardos Kumlachew
No ratings yet
Elmasri and Navathe, Fundamentals of Database Systems, Fourth Edition
Document101 pages
Elmasri and Navathe, Fundamentals of Database Systems, Fourth Edition
sambashivarao
100% (3)
Assignment 3 NPTEL DBMS January 2024
Document10 pages
Assignment 3 NPTEL DBMS January 2024
no.reply15203
No ratings yet
Data Management
Document7 pages
Data Management
Deepak Chachra
No ratings yet
Review 5 Short Term Electric Load Forecasting Using Neural
Document8 pages
Review 5 Short Term Electric Load Forecasting Using Neural
habte gebreial shrashr
No ratings yet
Smart Building Ems Uml PDF
Document400 pages
Smart Building Ems Uml PDF
James Cool
100% (1)
Chapter 1 5 Development of Wiper System Sensor For Light Vehicles
Document61 pages
Chapter 1 5 Development of Wiper System Sensor For Light Vehicles
Ruby Jane Orio
No ratings yet
Determinants of Capital Structure A Case Study of Automobile Sector of Pakistan
Document547 pages
Determinants of Capital Structure A Case Study of Automobile Sector of Pakistan
tome44
0% (1)
C++ Srs For Advertising System
Document29 pages
C++ Srs For Advertising System
Saiteja puli544
No ratings yet
Management Information System: UNIT-2
Document66 pages
Management Information System: UNIT-2
Akash Shukla
No ratings yet
Postgraduate (Government of Ireland)
Document17 pages
Postgraduate (Government of Ireland)
sar - e -chowk
No ratings yet
123 4 Wonde
Document67 pages
123 4 Wonde
Abinet Zewdu
No ratings yet
Tangombe Dissertation Rev2
Document58 pages
Tangombe Dissertation Rev2
TUMUHIMBISE LUCKY
No ratings yet
Jack Data
Document45 pages
Jack Data
Akshay C A
No ratings yet
Introsw
Document134 pages
Introsw
chirag_d8460
No ratings yet
Assignment 1 - EER Modeling
Document2 pages
Assignment 1 - EER Modeling
Haniya Elahi
0% (1)
Principles of Database Exam
Document5 pages
Principles of Database Exam
Gregory Odhiambo
100% (1)
NoSQL and SQL Data Modeling Bringing Together Data, Semantics, and Software (Hills, Ted)
Document217 pages
NoSQL and SQL Data Modeling Bringing Together Data, Semantics, and Software (Hills, Ted)
OswaldoRobles
No ratings yet
Bharati Vidyapeeth (Deemed To Be University), Pune (India) : Faculty of Management Studies
Document104 pages
Bharati Vidyapeeth (Deemed To Be University), Pune (India) : Faculty of Management Studies
Sam Jebadurai
No ratings yet