
Understanding Data

Data Types
Structured Data
Data in an organized form that conforms to a data model

Semi-structured Data
Does not conform to a data model but has some structure

Unstructured Data
Data not in an organized form that does not conform to a data model

Digital Data

Structured data: 10%
Semi-structured data: 10%
Unstructured data: 80%

Structured Data
Conforms to a data model

Similar entities are grouped

Attributes in a group are the same

Data is stored in rows and columns

Data resides in fixed fields in a record or file

Definition, format, and meaning of data are explicitly known

Structured Data Sources

Databases
Spreadsheets
SQL
OLTP Systems

Ease with Structured Data


Storage
Scalability
Security
Update and delete

Retrieving Information
Indexing and searching
Mining data
BI operations
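The operations listed above can be sketched with Python's built-in sqlite3 module. This is only an illustration: the employee table, its columns, and the sample rows are hypothetical and are not taken from the slides.

```python
import sqlite3

# In-memory database; table, column names, and rows are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee ("
    "emp_id INTEGER PRIMARY KEY, name TEXT NOT NULL, dept TEXT NOT NULL)"
)

# Storage: every row has the same fixed fields.
conn.executemany(
    "INSERT INTO employee (emp_id, name, dept) VALUES (?, ?, ?)",
    [(1, "Asha", "Finance"), (2, "Ravi", "Sales"), (3, "Meena", "Sales")],
)

# Indexing and searching: an index on dept supports lookups by department.
conn.execute("CREATE INDEX idx_employee_dept ON employee (dept)")
sales = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM employee WHERE dept = ? ORDER BY emp_id", ("Sales",)
    )
]

# Update and delete: rows are addressed through their key.
conn.execute("UPDATE employee SET dept = ? WHERE emp_id = ?", ("Marketing", 2))
conn.execute("DELETE FROM employee WHERE emp_id = ?", (3,))
remaining = conn.execute(
    "SELECT emp_id, name, dept FROM employee ORDER BY emp_id"
).fetchall()
conn.close()

print(sales)
print(remaining)
```

Because the schema is known in advance, each operation is a single declarative statement.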

Semi-structured Data
Does not conform to any data model but contains tags and elements
Similar entities are grouped
Cannot be stored in rows and columns

Semi-structured Data
Attributes in a group may not be the same

Metadata is not sufficient

The tags and elements describe how the data is stored
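A small XML fragment illustrates these points. The sketch below uses Python's standard xml.etree.ElementTree; the contacts document and its field names are made up for illustration. Both entities are grouped under the same tag, yet their attributes differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: two <contact> entities with different child elements.
xml_text = """
<contacts>
  <contact>
    <name>Asha</name>
    <email>asha@example.com</email>
  </contact>
  <contact>
    <name>Ravi</name>
    <phone>555-0100</phone>
    <city>Pune</city>
  </contact>
</contacts>
""".strip()

root = ET.fromstring(xml_text)

# The tags themselves describe each value, so every record carries its own structure.
records = [
    {child.tag: child.text for child in contact}
    for contact in root.findall("contact")
]

# Field sets differ between records, so there is no single fixed set of columns.
fields = [sorted(record) for record in records]

print(records)
print(fields)
```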

Semi-Structured Data Sources


Email
XML
Zipped files
TCP/IP packets
Mark-up languages
Integration of data from heterogeneous sources

Semi-Structured Data
Challenges
Storage cost
RDBMS
Irregular and partial structure
Evolving schemas
Schemas and data

Possible Solutions
XML
RDBMS
Special-purpose RDBMS
OEM

Unstructured Data
Does not conform to any data model
Has no easily identifiable structure
Does not follow any rules or semantics
Not in any particular format or sequence
Cannot be stored in rows and columns

Not easily usable by a program
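To see why free text is hard for a program to use, consider pulling a date and an email address out of a memo. The sketch below uses Python's re module; the memo text and both patterns are illustrative assumptions, and the patterns only work for this particular phrasing.

```python
import re

# Hypothetical memo: nothing marks which words are the date or the address.
memo = (
    "Hi team, the review meeting moved to 2024-03-15. "
    "Please send slides to priya@example.com before then."
)

# Heuristic patterns: they work only if the text happens to use these formats.
dates = re.findall(r"\b\d{4}-\d{2}-\d{2}\b", memo)
emails = re.findall(r"[\w.+-]+@[\w-]+\.[\w.-]+", memo)

print(dates)
print(emails)
```

If the author had written "15th March" instead, the date pattern would find nothing. In structured data, the same value would simply sit in a known field.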

Unstructured Data Sources


Web pages
Memos
Videos
Images
Content of mail
Surveys
Word documents
PPTs
Chats
Reports
White papers, etc.

Unstructured Data Challenges


Challenges
Storage space
Scalability
Retrieve information
Security
Update and delete
Indexing and searching

Solutions
Change formats
New hardware
RDBMS/BLOBs
XML
CAS

Measurement Properties
Distinctness
Different objects receive different scores (= or ≠)

Order
Ordering of the numbers reflects ordering of the variable (< or >)

Addition & Subtraction
Equal differences between numbers reflect equal differences in the variable (+ or -)

Multiplication & Division
A value of zero indicates absence of the variable being measured, so ratios of numbers are meaningful (× and ÷)

Nominal Scale
Numbers are assigned as labels or tags for identifying and classifying objects
Numbers do not reflect the amount of the characteristic possessed by the objects
The only permissible operation on a nominal scale is counting, as the numbers are arbitrary
Only limited statistical measures can be used to summarize the data

Ordinal Scale
Numbers are assigned to objects to indicate their relative position
These numbers also indicate whether an object has more or less of a characteristic than some other object
Orders categories logically
Few statistical measures can be used to summarize these numbers

Interval Scale
Has equal intervals along the scale

Arbitrary zero point

Almost all descriptive statistical measures can be used to summarize and analyze the data

Ratio Scale
Possesses all the properties of the nominal, ordinal, and interval scales
Has an absolute zero point indicating the absence of the variable
Categories have equal intervals
Almost all descriptive measures can be used to summarize and analyze the data
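The difference between an arbitrary and an absolute zero can be checked with a short calculation. The sketch below compares two temperatures in Celsius (interval) and Kelvin (ratio). The helper name celsius_to_kelvin and the sample values are assumptions for illustration.

```python
def celsius_to_kelvin(celsius: float) -> float:
    """Convert Celsius (arbitrary zero) to Kelvin (absolute zero)."""
    return celsius + 273.15


a_c, b_c = 20.0, 10.0  # illustrative temperatures in Celsius

# Differences are meaningful on both scales and agree with each other.
diff_c = a_c - b_c
diff_k = celsius_to_kelvin(a_c) - celsius_to_kelvin(b_c)

# Ratios are meaningful only on the scale with an absolute zero.
ratio_c = a_c / b_c  # 2.0, but 20 C is not "twice as hot" as 10 C
ratio_k = celsius_to_kelvin(a_c) / celsius_to_kelvin(b_c)  # about 1.035

print(diff_c, round(diff_k, 2))
print(ratio_c, round(ratio_k, 3))
```

The 10-degree difference survives the change of scale, but the ratio does not. This is why multiplication and division are reserved for ratio-scale data.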

Scales of Measurement
Nominal
Examples: Ethnicity, Religion, ZIP, Gender, ID
Properties: Distinctness
Mathematical operations: None
Statistical measures: Mode, Contingency table, Chi square

Ordinal
Examples: Class rank, Letter grade, Mineral grading
Properties: Distinctness and Order
Mathematical operations: Rank order (<, >)
Statistical measures: Weighted Mean, Median, Rank correlation, Percentile

Interval
Examples: Temperature in Celsius or Fahrenheit, Marks
Properties: Distinctness, Order, and Difference
Mathematical operations: Add, Subtract
Statistical measures: All descriptive measures (Mean, SD, Pearson's correlation & t)

Ratio
Examples: Weight, Age, Height, Time, Temperature in Kelvin
Properties: Distinctness, Order, Difference, and Multiplication
Mathematical operations: All arithmetic operations
Statistical measures: All descriptive measures (GM, Percent variation, etc.)
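As a rough illustration of how the scale limits the summary statistic, the sketch below applies one permissible measure to each scale using Python's statistics module. The sample values and the grade ordering are invented for the example.

```python
import statistics

# Illustrative samples, one per scale of measurement.
gender = ["F", "M", "F", "F", "M"]       # nominal: labels only
grades = ["B", "A", "C", "B", "A"]       # ordinal: ordered categories
temps_c = [20.0, 22.0, 19.0, 23.0]       # interval: arbitrary zero
weights_kg = [50.0, 60.0, 72.0]          # ratio: absolute zero

# Nominal: only counting is allowed, so the mode is the usual summary.
mode_gender = statistics.mode(gender)

# Ordinal: rank the categories (assumed order C < B < A), then take the median.
order = ["C", "B", "A"]
ranks = [order.index(g) for g in grades]
median_grade = order[statistics.median_low(ranks)]

# Interval: differences are meaningful, so mean and standard deviation apply.
mean_temp = statistics.mean(temps_c)
sd_temp = statistics.stdev(temps_c)

# Ratio: ratios are meaningful, so the geometric mean (GM) also applies.
gm_weight = statistics.geometric_mean(weights_kg)

print(mode_gender, median_grade, mean_temp, round(sd_temp, 3), round(gm_weight, 3))
```

statistics.geometric_mean requires Python 3.8 or later.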

Data Availability & Preparation


Data availability, seeking permission, and third-party data

Data Understanding Phases


Collect Initial Data
Locating, assessing, and obtaining data

Describe Data
Properties of data: amount of data, data types, etc.

Explore Data
Cross tabulations, Associations & Descriptive Statistics

Verify Data Quality


Completeness of data: data on all variables and missing values
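The describe, explore, and verify phases can be sketched with plain Python on a small list of records. The records and variable names below are hypothetical; in practice a data-frame library would usually be used, but the standard library is enough to show the idea.

```python
from collections import Counter

# Hypothetical initial data; None marks a missing value.
records = [
    {"id": 1, "gender": "F", "dept": "Sales", "age": 34},
    {"id": 2, "gender": "M", "dept": "Sales", "age": None},
    {"id": 3, "gender": "F", "dept": "Finance", "age": 41},
    {"id": 4, "gender": "M", "dept": None, "age": 29},
]

# Describe data: amount of data and the variables present.
n_rows = len(records)
variables = list(records[0])

# Explore data: cross tabulation of gender by department (known values only).
crosstab = Counter(
    (r["gender"], r["dept"]) for r in records if r["dept"] is not None
)

# Verify data quality: missing values per variable and fully complete rows.
missing = {v: sum(r[v] is None for r in records) for v in variables}
complete_ids = [
    r["id"] for r in records if all(value is not None for value in r.values())
]

print(n_rows, variables)
print(dict(crosstab))
print(missing, complete_ids)
```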

Data Preparation
Select Data
Selection of variables and data

Clean Data
Handle Missing values

Construct Data
Transformation of Variables

Integrate Data
Merging Data from different sources

Format Data
Structure of Variables
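The five preparation steps above can be sketched as a small pipeline. Everything here is an assumption made for illustration: the customer and order data, the median imputation rule, and the age bands. It is one possible treatment, not a prescribed method.

```python
import statistics

# Hypothetical sources: customer records and total spend keyed by cust_id.
customers = [
    {"cust_id": 1, "name": "Asha", "age": 30, "notes": "vip"},
    {"cust_id": 2, "name": "Ravi", "age": None, "notes": ""},
    {"cust_id": 3, "name": "Meena", "age": 50, "notes": ""},
]
orders = {1: 1200.0, 2: 300.0, 3: 4500.0}

# Select data: keep only the variables needed for the analysis.
selected = [{k: c[k] for k in ("cust_id", "age")} for c in customers]

# Clean data: replace a missing age with the median of the observed ages.
observed = [c["age"] for c in selected if c["age"] is not None]
median_age = statistics.median(observed)
cleaned = [
    {**c, "age": c["age"] if c["age"] is not None else median_age}
    for c in selected
]

# Construct data: derive a new variable from an existing one.
constructed = [
    {**c, "age_band": "40+" if c["age"] >= 40 else "under 40"} for c in cleaned
]

# Integrate data: merge in total spend from the second source.
integrated = [{**c, "total_spend": orders[c["cust_id"]]} for c in constructed]

# Format data: fix the column order expected by the next tool.
columns = ("cust_id", "age", "age_band", "total_spend")
formatted = [tuple(c[col] for col in columns) for c in integrated]

for row in formatted:
    print(row)
```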

Explore
The Data Warehouse - Digital Chapter, HBSP
SPSS Introduction: spss.co.in
Review of BRM/MR course
