New – Added links to each topic; click on a topic to get to the appropriate links. 1, 2 denote more links.

learndatamodeling.com
This is the best site where you can learn everything about Data Warehouse technology. Highly recommended.

Explain in terms of data warehousing terminology (please refer to Data Warehouse Concepts)

• Data Warehouse vs. Data Mart
• Kimball vs. Inmon 1
• 3NF vs. Denormalized
• Star vs. Snowflake 1
• Dimensional Modeling 1
• Definition of the following: Fact, Dimension, Conformed Dimension, Staging Area, Time Dimension 1 and 2, Junk Dimension, Slowly Changing Dimension, Factless fact, Conformed fact, Surrogate key, Hierarchy, Time variant, Subject oriented, OLTP, ODS, DSS, OLAP, ROLAP, MOLAP
• Development Methodology (top down, bottom up) and their relative benefits

Environment/Process you are working/worked on:

• Size of your Data Warehouse/Data Mart
• Version of DB, type (Oracle, DB2, SQL Server, Teradata)
• Transactions per day (Extraction, Loading)
• How many facts and dimensions?
• Error handling and support
• Data Profiling, Data Cleansing tools
• External scripts, PL/SQL, Shell Scripts
• Scheduling software (Informatica, Cron or any other s/w)
• Number of mappings/mapplets, transformations
• Complexity like looping, external procedures (C++, Java)
• Workflows, Worklets (if any), Sessions
• Performance tuning methods
• Scripting in terms of calling pmcmd, pmrep (for backup etc.)
• FTP methods (ftp, sftp, ssh, rsync etc.)
• In case of SAP PowerConnect: FTP or Stream, overriding SAP ABAP code (if you have done so)
• Team based development: Global Repository, Shortcuts
• Configuration management tools (CCM, Clearcase, PVCS etc.)
• Unit, Volume and Integration testing
• Migration Compliance, Test/Production Rollout
• Metadata Management

Project Management

• Business Case Assessment
o Cash flow analysis
o Potential Benefits Analysis
o Costs and Benefits Adjustments
o Return on Investment (ROI)
o Net Present Value (NPV)
o Payback period
o Internal Rate of Return (IRR)
• Tool Selection
o ETL
o Reporting
• Enterprise Deployment
• Service Level Agreements (SLA)
• Work Breakdown Structure
o Develop task list and description
o Identify task owner
o Identify any key internal and external project dependencies
o Risk Analysis
o Estimate timelines
o Identify critical project milestones
• Resources Identification
o Identify the generic resources required to complete each task
o Identify the quantity required
• Construct a Project Schedule
o Create a timeline
o Map the delivery dates and the project tasks in the WBS
• Critical Success Factors
o Common data definitions or Metadata
o User Training
o User Expectations
o Schedule
o Budget
o Scope
o Performance
o Availability
o Simplicity (ease of use)
o Tool functionality
o Data cleanliness
o Users' roles and responsibilities
o User Involvement
o Realistic Schedule
o QA Process
o Change Control Procedures
• Roles and Responsibilities

Roles and Responsibilities

• Business User
• Database Administrator
• Data Warehouse Project Manager
• DW Architect
• Support Team (Production, Migration)

Metrics to evaluate the success of the project/Scorecard

• Pre and Post BI Implementation
• Operational Metrics
o Latency tolerance
o Granularity
o Availability
• BI Related Metrics
o Functional quality
o Data quality
o System performance
o Network performance
o User satisfaction
o ETL Process Time (Extraction and Loading)
o Post processing time
o Number of Reports/queries
o What data is accessed
o Satisfies scope agreement
o Benefits achieved
o Scalability
• Enterprise Metrics (e.g. Computer Manufacturing Unit)
o Net revenue generated
o Net costs related to the production and distribution
o Total units sold
o Number of service calls
o Customer Loyalty

Key Performance Indicators (KPI)

• SMART (Specific, Measurable, Achievable, Realistic, Timely)
• Quantitative indicators which can be presented as a number.
• Practical indicators that interface with existing company processes.
• Directional indicators specifying whether an organization is getting better or not.
• Actionable indicators are sufficiently in an organization's control to effect change.

Questions for Data Warehouse Architect

• What is dimensional modeling and how is it different from ERD?
• What are the benefits and pitfalls of snowflaking?
• What methodology did you adopt, Kimball or Inmon, and why?
• What are the advantages of the Corporate Information Factory (CIF) 1 2 3 architecture vs. the bus architecture with conformed dimensions? What suits your environment best and why?
• What is a subject area?
• What is the difference between normalization and de-normalization?
• Can we have a normalized data warehouse schema?
• Why do we need an ODS?
• How will you plan to load your data into your data warehouse from an ODS vs. an OLTP environment?
• Assume that you are designing a data warehouse for an insurance company. How will you go about designing a star schema for your business?
• In your organization, business requirements/practices have changed dramatically and warrant a major schema refactoring of your existing data warehouse. How will you handle this situation? For e.g. you need to add a couple of dimension keys to existing fact tables.
• What type of SCD will you design for an insurance agent table, where the agent keeps moving across many regions/states?
• How do you load type 1 dimensions and type 2 dimensions?
• How would you model unbalanced hierarchies?
• How would you model cyclic relations?
• What major elements would you include in an audit model?
• How would you implement traceability?
• What steps would you take to improve reporting performance?
• What are factless fact tables? Give a scenario where you would use one.
• What type of optimization techniques did you apply at the data model level and why did you choose them?
• How will you handle data cleansing and validation?
• What is Master Data Management?
• How will you handle data rejects in your data warehouse?
• Describe common techniques for loading from the staging area/OLTP to the warehouse when you only have a small window.
• You have a data warehouse setup and your company did a major acquisition/merger. How will you go about handling the new data flow into your setup?

Unix and Shell script

• Write one shell script to kill all processes spawned by user 'kim'. The script should ignore processes created by the user kimberly. Hint – (use awk, grep, xargs)
• My crontab list is pretty big. I want to find out if there are any duplicates in the list. Hint – (use awk, uniq)
• How do I identify a blocked port? (Hint – listening)
• I uploaded a file via ftp. It is garbled. What went wrong?
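The two Unix hints (awk, grep, xargs, uniq) can be illustrated with a minimal sketch. It is one possible approach, not a model answer. The function names `find_duplicate_entries` and `pids_for_user` are hypothetical, and both helpers read from stdin so they can be tried on sample text before being pointed at a live system.

```shell
#!/bin/sh
# Hypothetical helper: print crontab entries that occur more than once.
# Comment lines and blank lines are dropped first so only real entries
# are compared; uniq -d needs sorted input, hence the sort.
find_duplicate_entries() {
    grep -v -e '^[[:space:]]*#' -e '^[[:space:]]*$' | sort | uniq -d
}

# Hypothetical helper: print the PIDs owned by exactly the given user.
# Input is "user pid" pairs, one per line.
# The exact comparison ($1 == u) keeps 'kim' from matching 'kimberly',
# which a plain "grep kim" would not do.
pids_for_user() {
    awk -v u="$1" '$1 == u { print $2 }'
}

# Typical use (not run here):
#   crontab -l | find_duplicate_entries
#   ps -e -o user= -o pid= | pids_for_user kim | xargs kill
```

The kill pipeline is left as a comment on purpose: it is destructive, and `ps` may truncate long user names on some systems, so the output of `pids_for_user` is worth checking by eye first.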

Typical Questions (Most of these questions are meant for experienced professionals. Fresh candidates, please do not get overwhelmed with these questions.)

General Questions

Please refer for more help: http://devisql.blogspot.com

• When you make an index 'unusable' and now issue a truncate command on the table, what will happen to the status of the index? Will it be usable or unusable?
• User A has a table 'emp' and he creates a view, emp_v, on emp with a condition where emp_salary < 100000 and creates a public synonym as emp on emp_v.
o What will user A see when he runs select * from emp?
o What will user B see when he runs select * from emp?
• tableB has 100 rows. You run 'create table tableA as select * from tableB' with each of the following in the where clause: 1) 1=1 2) 1=2. How many rows will be in tableA in each case?
• Is a full table scan (FTS) good or bad? Why does the CBO decide FTS is better than an Index Scan?
• What happens behind the scenes when a cursor is a) Opened b) Fetched and c) Closed?
• Name a few implicit cursor attributes.
• Explain iostat, vmstat and netstat.
• How does a PL/SQL syntax error differ from a runtime error?
• Oracle hard parse vs. soft parse
• You have two tables A and B and you run 'insert into tableA select * from tableB' and then run 'create table tableC as select * from tableA'. Now you run 'delete from tableB' and then execute a rollback. How many rows will there be in each of these tables, tableA, tableB and tableC?
• You have a table named 'customer' with 100 million rows, and you need to add a column with a not-null constraint and a predefined default value to this column. What is the best approach, and what will happen if you run this sql in production? ALTER TABLE customer ADD (cust_credit_rating number default 1 not null).
• Your sql script must exit with an error should anyone other than the specified user run it. It should be a one-line sql. Hint – on sqlerror, how do you generate this error?
• Why do we use dimensional modeling instead of ER modeling for data warehousing applications?
• Why can't we use a copy of our transactional system to meet our data warehousing needs? If we are unable to use a copy of the transaction system as DW, then why don't we use it as a staging environment?
• What are the three most important themes in a data warehouse? Give examples. Partial Answer - Drilling Down, Drilling Across and Handling Time
• Why can't we have just one single fact table instead of more than one fact table?
• How would you go about deleting 10 million records from a 100 million record table in production after an erroneous load? Hint – Try an alternate method other than delete, like create table xyz as select * from, or if it is a partitioned table, try dropping the partition.
• Primary key update
• Your company XYZ has acquired a new company. You are asked to load their customer information into the data warehouse. Your company's cust_id is numeric, whereas theirs is a string. How can you handle this situation? How can you handle the customer records that already exist in your data warehouse? Hint – Use of surrogate key
• What is Changed Data Capture and how will you handle it in case of an OLTP environment like SAP, Oracle Apps?
• What is ODS? Why do you need them? Where would you place them?
• What do you mean by Drilling Up and Drilling around? Hint – subtract and sharing of related fact
• When you extract data from source systems during the night, your extract process keeps failing with a "snapshot too old" error. How will you fix this issue? Hint – your DBA has provided you with alter session privilege
• What is the difference between 'drop index' and 'alter index unusable' and rebuild? Why do we prefer the second method?
• How do you handle duplicate records from a Relational Database?
• You are required to create one million records (sequence numbers) in a table using one sql command. Hint – Take advantage of Cartesian product and union
• How do you perform debugging in a Stored Procedure?
• How can you transform a sub-query involving the IN clause to a Join? E.g. select emp_no, name from emp where dept_id in (select dept_id from dept where dept_name='HR')
• How can you transform a statement involving an OR condition to a UNION ALL? E.g. select emp_no, name, title from emp where title='MANAGER' or title='DIRECTOR'
• How to get a count of the different data values in a column?
• How can you get count/sum of 'ranges of data' values in a column?
• Give me sql for each of the following: 1) TOP N rows from a table 2) EVERY Nth row from a table 3) Rows X to Y from a table
• What does this sql do? SELECT TO_CHAR (ADD_MONTHS (DATE '2010-01-01', ROWNUM-1), 'YYYYMM') FROM dual CONNECT BY ROWNUM < 13
• How can I use the above sql to fill in the values in a fact that does not have data for certain months?

Informatica PowerCenter related:

http://learnbi.com/ or http://leandatamodeling.com

• What are the key differences between PowerCenter 7 and 8? Give one significant change version 8 is offering.
• What changes do we need to make to run a version 7 'pmcmd' command on version 8?
• What is the purpose of the 'infacmd' command and the domains.infa file?
• Can we create a mapping without using a source qualifier? Hint – Normalizer
• Can we use a flat file as a lookup?
• How can you transpose rows to columns and vice versa and what transformations would you choose?
• In the update strategy, instead of using DD_INSERT I put the value as 1. Will this work?
• If you check all the ports as group by in the Aggregator transformation, how many rows would be output?
• How can you limit the number of running sessions in a workflow?
• What is the difference between Union and Joiner Transformation?
• Your source data is a flat file and it has empid, deptid and 12 columns for salaries of each month (jan-dec). Now your requirement is to create one record for each month and also a running total of the salary. What type of transformations will you use?
• What is the use of an indirect file list in PowerCenter?
• In your environment, there are several mappings in version 7 and these mappings use external procedures (AEP). You need to move these mappings to version 8. What steps do you take?
• Your customer has a requirement that in a day you should not run a workflow more than once. How can you achieve this by means of workflow control?
• You have a requirement to alert you of any long running sessions in your workflow. How can you create a workflow that will send you email for sessions running more than 30 minutes?
• You want to attach a file as an email attachment from a particular directory using the 'email task' in Informatica. How will you do it? Hint – %a
• You have two sets of sessions (mappings) in a workflow. You wish to run one set and you want to selectively 'turn off' the second set of mappings by means of 'just' one parameter so that these sessions/mappings would just run and exit with no data. How can you manage this using a parameter file? Hint – 1=2
• One of the source files you load contains a formula column:
emp_id, emp_name, dept_id, salary, commission_formula (formula_column), salary_date
123, Scott, d01, 4500, 2*3.4, 01/20/2005
453, King, d02, 8500, 5/2*1.4, 01/31/2006
You need to evaluate the commission formula and calculate the commission with the salary (i.e. commission_formula*salary). You can use any method: shell script, procedure, Informatica mapping or workflow control. How can you accomplish this?
• Your mapping is loading the target data in a flat file, but there are many new line characters that cause your application to fail. How would you handle these characters before they are loaded into the flat file? Hint – ASCII equivalent of CR, CHR(13), CHR(10)
• How to delete duplicate records from a source database/flat files? Can we use post sql to delete these records? In case of a flat file, how can you delete duplicates before it starts loading?
• Join two or more tables and then pull out two columns from each table into the source qualifier. Now, pull out one column from the source qualifier into an Expression transformation and then do a "generate SQL" in the source qualifier. How many columns will show up in the generated SQL?
• You are asked to document the source to target dependency of all Informatica mappings and you know there are some repository views (REP_TBL_MAPPING) provided by Informatica that can do this job. But these views return no rows. How can you rectify this issue? Hint – MX Data; what change would you make to load OPB_TARG_TBL_EXPR, which provides a crucial link to REP_TBL_MAPPING?
• You are required to perform "bulk loading" using Informatica on Oracle. What action would you perform at the Informatica + Oracle level for a successful load? Hint – alter session privilege, Environment SQL (where is it?)
• Your session failed and when you try to open a log file, it complains that the session details are not available. How would you trace the error? What log file would you look for? Hint – What other types of logs are available on that server?
• Your business user says the revenue report does not tally with the SAP (source system) report though the ETL process did not fail today. How will you identify the exact issue? Hint – reject records, primary key
• There are around 100 sessions in a workflow. You have specified variables in the sessions, worklets and workflows. These variables are all over the place. You are supposed to change variable names from variable(x) to myvariable(x+1). What approach would you take? Hint – Export XML file and use of regular expressions?
• When you export a workflow from Repository Manager, what does this xml contain? Workflow only? Hint – Question itself contains the hint
• What is the difference between a Mapping parameter and a variable?
• How do you set a mapping variable during a session run? How do you reset them?
• What are the pitfalls of using the Sequence Generator transformation, or in general, why do we avoid it?
• What precautions do you need to take when you use a reusable Sequence Generator transformation for concurrent sessions?
• Tell me what the alternative methods to the Sequence Generator transformation are. How will you go about using the same when you use them in concurrent sessions?
• Is a negative increment possible in Sequence Generator? If yes, how would you accomplish it? Hint – Expression
• How can you handle sequence generation over 2 billion records?
• Which directory does Informatica look in for the parameter file and what happens if it is missing when you start the session? Does the session stop after it starts? Hint – Does your source qualifier have a dependency on a mapping parameter?
• What happens when a particular parameter is missing?
• Informatica is complaining that the server could not be reached. What steps would you take? Hint – Hosts file
• You observe lately some sluggishness in getting repository objects. What action would you perform? Hint – Repository objects
• Informatica server suddenly stops after starting. How can you rectify this issue? Hint – Server variables, path?
• What is a parameter file and how would you dynamically create them using a mapping and what transformation would you choose? Hint – Normalizer
• What is the difference between connected & unconnected lookup?

• If you have more than two sources, how will you use the joiner transformation to join these sources?
• If you have more than one pipeline in your mapping, how will you change the order of load?
• What approach would you take so that your source qualifier SQL override is database independent? Hint – Next question
• What is ANSI SQL?
• What is a mapplet?
• What are an active and a passive transformation?
• Can we use an active transformation inside a mapplet?
• What is the difference between Source Qualifier and Joiner?
• How do you override SQL in Lookup?
• How do you change the order by clause in lookups?
• How many types of lookups are there?
• What is a dynamic lookup and what is the significance of NewLookupRow? How will you use them for rejecting duplicate records?
• What are the benefits of connected vs. unconnected or vice versa?
• In an unconnected lookup, can it only have an output port and will the mapping work if you don't check the return port and why? Hint – Try un-checking Return port
• You have more than five mappings that use the same lookup. How can you manage the lookup? Hint – Persistence
• What types of caches are available in lookups?
• How will you improve the performance of a lookup?
• What is the sorted input option in Source Qualifier?
• How will you increase the performance of an aggregator transformation other than using sorted input?
• What is Incremental Aggregation?
• What is Incremental Loading?
• What will happen if you copy the mapping from one repository to another repository and there is no identical source?
• Difference between router & filter transformation, and where do you need to place the filter transformation to get better performance?
• What is a joiner?
• How do you perform recovery? What happens internally?
• Which process writes the information into repository tables?
• What is the difference between an abort and a stop in a session?
• What is DTM?
• What is update strategy? What is data driven?
• An Aggregator transformation has 4 ports (sum(col1), group by col2, col3). Which port should be the output?
• What is a parameter file? Explain a scenario when you would use one.
• What is the difference between a surrogate key and a primary key?
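Two of the Informatica questions allow "any method", including a shell script: cleaning a flat file before it is loaded (stray CHR(13) characters, duplicate records) and refusing to run a workflow more than once a day. The sketch below shows one way to do both outside PowerCenter. The function names and the marker-file location are assumptions made for illustration and are not part of any Informatica tooling.

```shell
#!/bin/sh
# Hypothetical pre-load helper: strip carriage returns (CHR(13)) and drop
# exact duplicate records from a flat file on stdin. awk keeps the first
# occurrence of each record, so the original order is preserved.
clean_flat_file() {
    tr -d '\r' | awk '!seen[$0]++'
}

# Hypothetical guard: succeeds only the first time it is called on a given
# day. $1 is an assumed directory for a per-day marker file.
run_once_per_day() {
    marker_dir=$1
    marker="$marker_dir/workflow_ran_$(date +%Y%m%d)"
    if [ -e "$marker" ]; then
        return 1        # a marker for today exists: already ran
    fi
    : > "$marker"       # record today's run
}

# Typical use (not run here; start_workflow stands for whatever command
# launches the workflow in your environment):
#   clean_flat_file < source.dat > source_clean.dat
#   run_once_per_day /var/tmp && start_workflow
```

Line feeds embedded inside a field (CHR(10)) are not handled here, because a line-oriented tool cannot tell them apart from record ends. Those are better replaced inside the mapping before the target file is written.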

Oracle 10g New Features:

1) DB configuration assistant
2) Renaming Tablespaces
3) Drop Database
4) Automatic undo retention
5) Allows you to manually shrink the overall size of a table, removing unused space and thus resetting the high-water mark
6) Automated collection of statistics, collection of data dictionary statistics
7) Automatic SGA Tuning
8) Sorted Hash Clusters
9) Virtual Private Database with new features
• Column-level privacy
• New VPD policies
• Support for parallel query
10) Database recovery
• Easier recovery through the resetlogs command
• Changes to the alter database archivelog command
• New and changed Oracle Database 10g backup commands
11) Flashback Database
12) Oracle Data Pump (previously exp and imp)
• Data Pump provides new methods of moving data between databases
• Importing specific database object types, e.g. INCLUDE=TABLE:"LIKE 'EMP%'"
• Restartability
13) Bigfile Tablespaces
• The maximum BFT size varies based on the database block size. It can range anywhere from 8 terabytes to 128 terabytes
14) Cross-Platform Transportable Tablespaces
15) Enhanced Merge Functionality
• Optional delete Clause
• on Condition
16) New Materialized View, Query Rewrite, and Summary Management DDL Features
17) Partitioning New Features
18) SQL and PL/SQL regular expressions
19) Partition outer join

Oracle 11g New Features:

1) Automatic Memory Management
a. The memory_target Parameter
b. The memory_max_target Parameter
2) Automatic Diagnostic Repository (ADR) for fault diagnosability
3) Snapshot Standby Databases – allow you to open a physical standby database and change data and structures in that database
4) Datapump New Features
• Deprecation of Export utility
• Compression of dump file sets
• Encryption improvement
5) Oracle Database Advisors
6) Partitioning
a. Interval partitioning
b. Extended composite partitioning
i. Composite range-range partitioning
ii. Composite list-range partitioning
iii. Composite list-hash partitioning
iv. Composite list-list partitioning
c. Reference partitioning
i. Form a parent-child relationship and logically equi-partition these tables
d. System partitioning
e. System-managed domain indexes
7) Virtual Columns
8) The Pivot and Unpivot Clauses
9) Invisible Indexes
• An invisible index is an index that the optimizer cannot see and therefore will not consider while it generates execution plans.
