SQL Server SSIS interview questions and answers: How to check the quality of data using SSIS?

Taken from my book SQL Server interview questions by Shivprasad Koirala http://www.flipkart.com/sqlserver-interview-questions-8183331033/p/itmdyuqz2a6tzhjw

Many times you get raw data (like the data shown below) and you would like to understand the quality of that data. For example, for the data below you would probably like to know:
• How many null values exist in the name field?
• What types of contact information are present: email, phone, address, etc.?
• What kind of salary range exists?
• And so on.

Name: Shiv, Raju, Ajay, Kumar, Neeraj, Vishal, sharma, Yadav, Dinesh
Contact: shiv_koirala@yahoo.com, 91-022-2130928933, shaam@yahoo.com, ajay@yahoo.in, kumar@gmail.in, neeraj@yahoo.com, suraj@yahoo.com, vishal@gmail.in, sharma@yahoo.com, 91-022-2130928933, dinesh@yahoo.com
DOB: 3/12/1980, 11/2/1975, 3/16/1988, 5/22/1986, 9/24/1977, 4/16/1971, 2/19/1973, 6/24/1978, 3/26/1976, 8/13/1983, 1/17/1966
Salary: 1000, 1500, 1000, 1000, 6000, 4000, 8000, 3000, 2000, 1000, 5000
Country: IND, IND, NEP, IND, USA, USA, IND, USA, IND, IND, IND
EMP Code: E001, E002, E003, E004, E005, E006, E006, AMDK, E005, AQPR, E007
Pan card: D001, D002, D003, D004, D005, D006, D007, D008, D009, D010, D011
CountryTaxcode: IND, IND, NEP, IND, USA, USA, IND, USA, IND, IND, IND
Tax%: 5, 5, 2, 5, 6, 6, 5, 6, 5, 5, 5
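To get a feel for what these questions mean in practice, here is a minimal Python sketch that answers two of them by hand for the Contact and Salary columns of the sample data. This is only an illustration and not how SSIS does it internally. The two regular expressions and the names `contact_type`, `contact_types` and `salary_range` are assumptions made up for this example; they are just good enough to tell apart the two shapes of contact value that appear in the sample.

```python
import re
from collections import Counter
from statistics import mean

# Contact and Salary columns copied from the sample data.
contacts = [
    "shiv_koirala@yahoo.com", "91-022-2130928933", "shaam@yahoo.com",
    "ajay@yahoo.in", "kumar@gmail.in", "neeraj@yahoo.com",
    "suraj@yahoo.com", "vishal@gmail.in", "sharma@yahoo.com",
    "91-022-2130928933", "dinesh@yahoo.com",
]
salaries = [1000, 1500, 1000, 1000, 6000, 4000, 8000, 3000, 2000, 1000, 5000]

# Hypothetical patterns: deliberately simple, not a full email or phone validator.
EMAIL = re.compile(r"^[\w.]+@[\w.]+\.\w+$")
PHONE = re.compile(r"^\d+(-\d+)+$")


def contact_type(value):
    """Classify one contact value as 'email', 'phone' or 'other'."""
    if EMAIL.match(value):
        return "email"
    if PHONE.match(value):
        return "phone"
    return "other"


# What types of contact information exist, and how many of each?
contact_types = Counter(contact_type(c) for c in contacts)

# What kind of salary range exists? (minimum, maximum, average)
salary_range = (min(salaries), max(salaries), mean(salaries))

print(contact_types)
print(salary_range)
```

Doing this by hand for every column quickly becomes tedious, which is why a ready-made profiling tool is useful.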

This can be achieved by using the Data Profiling task, which is available in the control flow toolbox. The following steps need to be followed:
• Create a profile request in the Data Profiling task.
• Run the Data Profiling task; it creates an XML output.
• View the XML output using the Data Profile Viewer. The Data Profile Viewer exists in the “C:\Program Files (x86)\Microsoft SQL Server\100\DTS\Binn” directory.

What kinds of profile requests exist in SSIS?
There are 8 kinds of profile request that you can create.

Below are more details of the kind of data analysis performed by each of these 8 profile requests.

• Column Null Ratio profile request: How many NULL values exist?
• Column Pattern profile request: Detects what kind of pattern the data has, for example email address, website URL, etc.
• Column Statistics profile request: What are the minimum, maximum and average values in a column?
• Column Length Distribution profile request: What are the distinct lengths of string values? For instance, if you have a country code column, you would like to ensure that the length is always equal to 3 (IND, USA). If other lengths turn up, you can take the necessary action.
• Column Value Distribution profile request: Finds out how many distinct values exist for a column.
• Functional Dependency profile request: How much does one column depend on another column? It helps you find out in how many places the dependency has been violated.
• Candidate Key profile request: Which columns are good candidates for primary keys?
• Value Inclusion profile request: Checks whether the values of two columns overlap. It helps to detect a likely foreign key column.
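To make the analyses above more concrete, here is a rough Python sketch of the idea behind several of them, run against columns from the sample data. It is a simplified illustration only: SSIS computes these profiles inside the Data Profiling task and reports them in its own XML format, and all the function names below are assumptions invented for this example. In particular, `dependency_violations` just counts rows that disagree with the most common value in their group, which is a simplification of what a functional dependency profile reports.

```python
from collections import Counter, defaultdict

# Columns copied from the sample data (11 rows each).
country = ["IND", "IND", "NEP", "IND", "USA", "USA", "IND", "USA", "IND", "IND", "IND"]
emp_code = ["E001", "E002", "E003", "E004", "E005", "E006", "E006", "AMDK", "E005", "AQPR", "E007"]
pan_card = ["D001", "D002", "D003", "D004", "D005", "D006", "D007", "D008", "D009", "D010", "D011"]
tax_code = ["IND", "IND", "NEP", "IND", "USA", "USA", "IND", "USA", "IND", "IND", "IND"]
tax_pct = [5, 5, 2, 5, 6, 6, 5, 6, 5, 5, 5]


def null_ratio(values):
    """Null ratio: fraction of values that are missing (None)."""
    return sum(v is None for v in values) / len(values)


def length_distribution(values):
    """Length distribution: how many values have each string length."""
    return Counter(len(v) for v in values if v is not None)


def value_distribution(values):
    """Value distribution: how often each distinct value occurs."""
    return Counter(values)


def is_candidate_key(values):
    """Candidate key: True when every value is unique."""
    return len(set(values)) == len(values)


def dependency_violations(determinant, dependent):
    """Functional dependency (simplified): count rows whose dependent value
    differs from the most common dependent value for their determinant."""
    groups = defaultdict(Counter)
    for key, value in zip(determinant, dependent):
        groups[key][value] += 1
    return sum(sum(c.values()) - max(c.values()) for c in groups.values())


def inclusion_ratio(child, parent):
    """Value inclusion: fraction of child values that also appear in parent."""
    allowed = set(parent)
    return sum(v in allowed for v in child) / len(child)


print(null_ratio(country))
print(length_distribution(country))
print(value_distribution(country))
print(is_candidate_key(emp_code), is_candidate_key(pan_card))
print(dependency_violations(tax_code, tax_pct))
print(inclusion_ratio(tax_code, country))
```

On the sample data, EMP Code fails the candidate key check because E005 and E006 each appear twice, while Pan card passes because all of its values are unique.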
