
Data Cleansing Using MS Excel

1. Removed duplicates (1 row).
2. Removed rows with junk values for DOB (3 rows).
3. Removed rows with no values for Location Details (1 row).
4. Changed Relationship status from NA to Self for 3 rows.
5. Replaced blank spaces and NA in numeric fields with 0.
6. Split the CSV into 3 CSVs:
   1. For the Location dimension (removed duplicates after creating the separate CSV).
   2. For the Other Details dimension (removed duplicates after creating the separate CSV).
   3. For the fact and factless fact tables.
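
The cleansing steps above were done by hand in Excel. As a rough illustration only, the same rules can be sketched in Python. The column names (`DOB`, `Location`, `Relationship`), the choice of numeric fields, and the `DD-MM-YYYY` date format are assumptions, since the source does not list the CSV headers.

```python
import csv
import io
from datetime import datetime

# Hypothetical numeric columns; the real CSV headers are not given in the source.
NUMERIC_FIELDS = ["Claim_Amount", "Approved_Amount"]

def is_valid_dob(value, fmt="%d-%m-%Y"):
    """Return True if value parses as a date in the assumed format."""
    try:
        datetime.strptime(value.strip(), fmt)
        return True
    except ValueError:
        return False

def cleanse(rows):
    """Apply steps 1 to 5 to a list of dict rows and return the kept rows."""
    seen = set()
    cleaned = []
    for row in rows:
        key = tuple(row.items())
        if key in seen:                       # step 1: drop exact duplicates
            continue
        seen.add(key)
        row = dict(row)                       # work on a copy
        if not is_valid_dob(row["DOB"]):      # step 2: junk DOB
            continue
        if not row["Location"].strip():       # step 3: missing location
            continue
        if row["Relationship"].strip().upper() == "NA":   # step 4
            row["Relationship"] = "Self"
        for field in NUMERIC_FIELDS:          # step 5: blank or NA becomes 0
            if row[field].strip().upper() in ("", "NA"):
                row[field] = "0"
        cleaned.append(row)
    return cleaned

def distinct_values(rows, column):
    """Step 6: distinct values of one column, in first-seen order."""
    return list(dict.fromkeys(row[column] for row in rows))

# Made-up sample data, for illustration only.
SAMPLE = """DOB,Location,Relationship,Claim_Amount,Approved_Amount
01-02-1980,Pune,NA,1000,
01-02-1980,Pune,NA,1000,
xx/yy,Pune,Spouse,500,400
03-04-1975,,Self,700,700
05-06-1990,Nashik,Spouse,NA,0
"""
rows = list(csv.DictReader(io.StringIO(SAMPLE)))
cleaned = cleanse(rows)
print(len(cleaned), distinct_values(cleaned, "Location"))
```

`distinct_values` stands in for the "remove duplicates after creating a separate CSV" step: it yields the de-duplicated values that would feed a dimension file.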

Data Loading According to the Data Model

Used Microsoft SQL Server Management Studio.

1. Created a new DB.


2. Created a database diagram and established the primary key and foreign key associations.
Also created surrogate key fields for the dimension tables.
CLAIMS_TABLE (dbo)
Claim_Amount
Claim_Type
Approved_Amount
Deducted_Amount
Claim_Intimation_Date_Key
Claim_Intimation_Time_Key
ClaimIncident_Accident_Date
ClaimIncident_Discharge_Date
Charges_Accommodation
Charges_Nursing
Charges_Package
Charges_Consultant
Charges_Investigation
Charges_Surgery
Charges_Procedure
Charges_Misc
Charges_Pharmacy
Charges_Ambulance
Charges_Maternity
Charges_Others
Other_Details_ID
Location_ID
Cover_Start_Date
Cover_End_Date

OTHERDETAILS_TABLE (dbo)
Other_Details_ID
[Gender(BK)]

LOCATION_TABLE (dbo)
Location_ID
[District(BK)]

DATE_TABLE (dbo)
DateKey
DateFullName
DateFull
Year
Quarter
QuarterName
QuarterKey
Month
MonthKey
MonthName
DayOfMonth
NumberOfDaysInTheMonth
DayOfYear
WeekOfYear
WeekOfYearKey
ISOWeek
ISOWeekKey
WeekDay
WeekDayName
FiscalYear
FiscalQuarter
FiscalQuarterKey
FiscalMonth
FiscalMonthKey
IsWorkDayKey

TIME_TABLE (dbo)
TimeKey
Hour24
Hour12
AmPmString
Minute
Second
FullTimeString24
FullTimeString12

CLAIMSSTATUS_TABLE (dbo)
Claim_Intimation_Date_Key
Claim_Intimation_Time_Key
Other_Details_ID
Location_ID
Claim_Status

Once the database diagram is saved successfully, SSMS creates the data tables automatically.

In the table design, for the dimension tables (Location and Other Details), set the surrogate keys as
indexed, seed-based incrementing (identity) columns.
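
To illustrate what a seed-based incrementing key does, here is a minimal Python sketch that hands out the next counter value to each new business key. The seed and increment of 1 are assumptions (the source does not state the values used), and `assign_surrogate_keys` is a hypothetical helper, not something SSMS provides.

```python
from itertools import count

def assign_surrogate_keys(business_keys, seed=1, increment=1):
    """Map each distinct business key to the next value of a seeded counter."""
    counter = count(seed, increment)
    mapping = {}
    for bk in business_keys:
        if bk not in mapping:      # a repeated business key reuses its existing ID
            mapping[bk] = next(counter)
    return mapping

# Made-up district names, for illustration only.
location_ids = assign_surrogate_keys(["Pune", "Nashik", "Pune", "Nagpur"])
print(location_ids)
```

In SQL Server the database generates these values itself on insert, so the import only needs to supply the business key columns.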
3. Loaded the date dimension table using a SQL script.

4. Loaded the time dimension table.


5. Imported data into Location_Table from the CSV file separated earlier.

Surrogate keys were generated.


6. Imported data into the Other_Details table from the CSV file split earlier.

Surrogate keys were generated.

7. Exported both the Location and Other Details tables back to CSV files.
8. Used the VLOOKUP function in Excel to map the fact table foreign keys to these surrogate keys.
9. Loaded the factless fact table (Claim Status table).

10. Loaded the fact table (Claims table).
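
The VLOOKUP mapping in steps 7 and 8 amounts to an exact-match lookup from business key to surrogate key. A minimal Python sketch of that idea follows. The `District` and `Location_ID` names come from the diagram above, while the sample rows and helper functions are hypothetical.

```python
import csv
import io

def build_lookup(dimension_csv, business_key, surrogate_key):
    """Read an exported dimension CSV into {business key: surrogate key}."""
    reader = csv.DictReader(io.StringIO(dimension_csv))
    return {row[business_key]: int(row[surrogate_key]) for row in reader}

def map_foreign_key(fact_rows, lookup, source_column, target_column):
    """Add the surrogate key to each fact row, like an exact-match VLOOKUP."""
    mapped = []
    for row in fact_rows:
        new_row = dict(row)
        # None marks a business key with no match (Excel would show #N/A).
        new_row[target_column] = lookup.get(row[source_column])
        mapped.append(new_row)
    return mapped

# Made-up exported dimension and fact rows, for illustration only.
LOCATION_CSV = "Location_ID,District\n1,Pune\n2,Nashik\n"
facts = [
    {"District": "Nashik", "Claim_Amount": "1000"},
    {"District": "Pune", "Claim_Amount": "500"},
    {"District": "Unknown", "Claim_Amount": "50"},
]
lookup = build_lookup(LOCATION_CSV, "District", "Location_ID")
mapped = map_foreign_key(facts, lookup, "District", "Location_ID")
print([row["Location_ID"] for row in mapped])
```

Rows that come back with `None` are worth checking before the fact load, since they would violate the foreign key association.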

This successfully completes the Data Loading Process.
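
For reference, the SQL script used for the date dimension in step 3 is not included here. The following Python sketch only shows the kind of rows such a script produces, for a subset of the DATE_TABLE columns. The `YYYYMMDD` format for `DateKey` and the ISO weekday numbering (Monday = 1) are assumptions.

```python
import calendar
from datetime import date, timedelta

def date_dimension_rows(start, end):
    """Yield one dict per calendar day from start to end, inclusive."""
    current = start
    while current <= end:
        yield {
            "DateKey": int(current.strftime("%Y%m%d")),   # assumed key format
            "DateFull": current.isoformat(),
            "Year": current.year,
            "Quarter": (current.month - 1) // 3 + 1,
            "Month": current.month,
            "MonthName": calendar.month_name[current.month],
            "DayOfMonth": current.day,
            "NumberOfDaysInTheMonth": calendar.monthrange(current.year, current.month)[1],
            "DayOfYear": current.timetuple().tm_yday,
            "ISOWeek": current.isocalendar()[1],
            "WeekDay": current.isoweekday(),              # assumed: Monday = 1
            "WeekDayName": calendar.day_name[current.weekday()],
        }
        current += timedelta(days=1)

dim_rows = list(date_dimension_rows(date(2020, 2, 28), date(2020, 3, 1)))
print(dim_rows[0]["DateKey"], dim_rows[-1]["DateKey"])
```

The time dimension could be generated the same way, with one row per second or minute of the day.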
