ASSIGNMENT OF DATA WAREHOUSE AND DATA MINING

CSE-501

SUBMITTED BY: ABHINAV MAHAJAN   ROLL NO: RB27T2A22   REG. NO: 3070070116

SUBMITTED TO: Ms. NEHA BHATEJA

Q1: A data warehouse tends to be as much as four times as large as the operational databases that feed it, because it is designed to support robust query and analysis. How can you manage this amount of data?

Ans: The Data Warehouse manages its data in SQL Server and OLAP databases, and can take advantage of Microsoft SQL Server 2008 R2 Parallel Data Warehouse with its massively parallel processing (MPP) architecture to gain scalable performance, flexibility, and hardware choice. Organizations need actionable and timely business insights from rapidly growing data, for example to identify user trends or to analyze the effectiveness of a campaign. The Analysis modules in Commerce Server Business Desk are used to analyze the data in the Data Warehouse; the resulting reports and population segments can then drive updates to a site, targeting content to specific user groups or promoting specific products.

There are many different ways to manage a large amount of data:

o Parallel processing: When the server onto which the data is being loaded runs short of capacity, the data is loaded onto multiple servers instead, so that total system throughput is increased through parallel processing across those servers. Parallelism can be performed in two ways: horizontal parallelism and vertical parallelism.

o Compression: Surplus data in the data warehouse is compressed using suitable techniques and tools, so that only the relevant, frequently used data is kept uncompressed. (A sketch combining this with the next point follows the list.)

o Distribution on the basis of probability of access: The data on the server is physically tiered by how likely it is to be accessed. Data that is accessed most often and is most important sits at the top, moderately used data comes next, and rarely used data sits at the bottom.

o Monitoring ad hoc query access: Ad hoc query access grows over time and must be carefully monitored, because new, inexperienced users tend to run requests against base tables rather than against summary or aggregate tables to produce totals.

o Keeping pace with technology: Advances in network technology, and hardware and software that require more rapid release changes, must be applied on an ongoing basis.

o Training: An ongoing training program for business analysts, executives, and decision-support tool programmers keeps everyone informed about how to use the current version of the data warehouse or data mart and how to find the information they need.
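The compression and access-probability points can be combined into a simple tiering policy. The following is a minimal Python sketch assuming an in-memory key-value store; the TieredStore class, the 0.5 cutoff, and the access probabilities are illustrative inventions, not features of SQL Server or any other warehouse product:

import zlib

# Hot rows (high probability of access) stay uncompressed for fast reads;
# cold rows are zlib-compressed to reclaim space, trading CPU for storage.
class TieredStore:
    def __init__(self, hot_threshold=0.5):
        self.hot_threshold = hot_threshold  # hypothetical access-probability cutoff
        self.hot = {}   # key -> uncompressed bytes
        self.cold = {}  # key -> compressed bytes

    def put(self, key, payload, access_probability):
        if access_probability >= self.hot_threshold:
            self.hot[key] = payload
        else:
            self.cold[key] = zlib.compress(payload)

    def get(self, key):
        if key in self.hot:
            return self.hot[key]
        return zlib.decompress(self.cold[key])  # transparent decompression

store = TieredStore()
store.put("today_sales", b"row data ...", access_probability=0.9)
store.put("archive_1998", b"row data ..." * 100, access_probability=0.01)
print(len(store.get("archive_1998")))  # cold data is decompressed on demand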

Q2: How is multidimensional OLAP different from multirelational OLAP?

Ans:

Multidimensional OLAP (MOLAP)
• The database is stored in a special, usually proprietary, structure that is optimized for multidimensional analysis.
• The data is stored efficiently in multidimensional arrays.
• Query response time is very fast because the data is mostly pre-calculated, but this puts a practical limit on database size: the time taken to calculate the database and the space required to hold the pre-calculated values both grow quickly (see the sketch after this comparison).
• When a request needs only summarized data, the pre-calculated structure fulfills it directly.
• It requires a large amount of disk space; storage is the main problem faced by MOLAP.
• It is suitable for implementing standalone databases and small data warehouses.
• MOLAP databases and data warehouse databases have no linkage between them that can be used for query purposes.

Multirelational OLAP (ROLAP)
• The database is a standard relational database, and the database model is a multidimensional model, often referred to as a star or snowflake model or schema.
• It stores all the aggregations and data within the relational database itself.
• Query performance is largely governed by the complexity of the SQL and by the number and size of the tables being joined, so retrieving data generally takes more time than in MOLAP.
• There can be a linkage between ROLAP databases and data warehouse databases.
• Maintenance is the main problem faced by ROLAP.
• It is the more scalable solution.
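The core trade-off can be seen in a few lines of Python. This is a toy sketch, not the implementation of any OLAP engine: the facts list, rolap_total, and the cube dictionary are invented stand-ins for a relational fact table and a pre-calculated multidimensional structure:

from collections import defaultdict

# Toy fact rows: (game, spectator category, charge).
facts = [
    ("cricket", "student", 5.0),
    ("cricket", "adult", 10.0),
    ("football", "student", 4.0),
    ("football", "adult", 8.0),
    ("cricket", "student", 5.0),
]

# ROLAP-style: aggregate on demand by scanning the relational data;
# response time depends on the size of the scan and of any joins.
def rolap_total(game):
    return sum(charge for g, _, charge in facts if g == game)

# MOLAP-style: pre-calculate every per-game total once; queries become
# constant-time lookups, at the cost of the time and space needed to
# build and hold the pre-computed values.
cube = defaultdict(float)
for game, _, charge in facts:
    cube[game] += charge

print(rolap_total("cricket"))  # computed at query time -> 20.0
print(cube["cricket"])         # pre-calculated lookup  -> 20.0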

Q3: How can we map the data warehouse to a multiprocessor architecture? Elaborate.

Ans: We can map the data warehouse to a multiprocessor architecture with the help of three DBMS software architectures:

Shared-memory architecture: All the major components, that is the processors, the memory, and the entire database, are utilized by a single RDBMS, with every processor working against the same shared memory.

Shared-disk architecture: The entire database is accessed across multiple RDBMS servers. All the data distributed across the disks can be accessed equally by all the processors, and each RDBMS server has the authority or permission to make changes to the shared database.

Shared-nothing architecture: Each processor in the system has its own memory and its own local disk, and the processors exchange data only over an interconnection network, which is what differentiates this architecture from the other two. In a shared-nothing RDBMS, the execution of queries is parallelized across the multiple processing nodes. Performance decreases when the data is distributed unevenly across the nodes, because the most heavily loaded node becomes the bottleneck (a sketch follows below).
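Here is a minimal Python sketch of the shared-nothing idea, using one process per "node"; the modulo partitioning and the local_sum helper are illustrative assumptions, not the internals of any particular MPP RDBMS:

from multiprocessing import Pool

# Each "node" owns its local partition of the measures (shared nothing):
# no memory or disk is shared, and only the small partial results travel
# back over the interconnect (here, the process boundary).
def local_sum(partition):
    return sum(partition)

if __name__ == "__main__":
    rows = list(range(1_000_000))      # stand-in for fact-table measures
    n_nodes = 4
    # Horizontal partitioning: row i goes to node i % n_nodes. A skewed
    # split at this step is exactly the "uneven distribution" that hurts
    # shared-nothing performance.
    partitions = [rows[i::n_nodes] for i in range(n_nodes)]
    with Pool(n_nodes) as pool:
        partials = pool.map(local_sum, partitions)
    print(sum(partials) == sum(rows))  # True: same answer, computed in parallel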

Part B

Q4: Suppose that a data warehouse consists of four dimensions, date, spectator, location, and game, and two measures, count and charge, where charge is the fare that a spectator pays when watching a game on a given date. Spectators may be students, adults, or seniors, with each category having its own charge rate. Draw a star schema diagram for the data warehouse.

Ans: The fact table sits at the centre of the star, carrying one foreign key per dimension plus the two measures; each dimension table hangs off it:

Fact table: Date id, Spectator id, Location id, Game id, Count, Charge

Date dimension: Date id, Day, Month, Year
Spectator dimension: Spectator id, Spectator name, Spectator category, Spectator address
Game dimension: Game id, Game name, Game description, No. of players
Location dimension: Location id, Colony, City, Country
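The same schema can be written out as DDL. This is a sketch using SQLite purely for illustration; the table and column names (fact_attendance, dim_date, and so on) are hypothetical renderings of the diagram above:

import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date      (date_id INTEGER PRIMARY KEY, day INTEGER, month INTEGER, year INTEGER);
CREATE TABLE dim_spectator (spectator_id INTEGER PRIMARY KEY, name TEXT, category TEXT, address TEXT);
CREATE TABLE dim_game      (game_id INTEGER PRIMARY KEY, name TEXT, description TEXT, no_of_players INTEGER);
CREATE TABLE dim_location  (location_id INTEGER PRIMARY KEY, colony TEXT, city TEXT, country TEXT);

-- The fact table holds the two measures and one foreign key per dimension.
CREATE TABLE fact_attendance (
    date_id         INTEGER REFERENCES dim_date(date_id),
    spectator_id    INTEGER REFERENCES dim_spectator(spectator_id),
    game_id         INTEGER REFERENCES dim_game(game_id),
    location_id     INTEGER REFERENCES dim_location(location_id),
    spectator_count INTEGER,  -- measure: count
    charge          REAL      -- measure: fare paid on that date
);
""")

# A typical star join: roll charge up by game name.
conn.execute("INSERT INTO dim_game VALUES (1, 'cricket', 'league match', 11)")
conn.execute("INSERT INTO fact_attendance VALUES (1, 1, 1, 1, 2, 10.0)")
for row in conn.execute("""
        SELECT g.name, SUM(f.charge)
        FROM fact_attendance f JOIN dim_game g ON f.game_id = g.game_id
        GROUP BY g.name"""):
    print(row)  # ('cricket', 10.0)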

Q5: Discuss the components of the metadata interchange standard framework.

Ans: The metadata interchange standard framework was designed to handle issues such as the exchanging, sharing, and managing of metadata. It defines two metamodels:

1. Application metamodel: the tables that hold the metadata in tabular form.
2. Metadata metamodel: the set of objects that the metadata interchange standard can be used to describe.

The main components of the metadata interchange standard framework are:

 The Standard Metadata Model: describes the ASCII file format used to represent the metadata being exchanged among different data sources.
 The Standard Access Framework: specifies the Application Programming Interfaces that a vendor must support for the proper exchange of metadata.
 Tool Profile: gives detail about which aspects of the interchange standard model each tool supports.
 The User Configuration: a file that lets a customer constrain the metadata being propagated from one tool to another, and that helps determine whether the metadata model file has been imported by any of the tools. (A toy export/import sketch follows below.)

[Diagram: Tool 1 through Tool 5, each with its own Tool Profile, exchanging metadata through the standard.]
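To make the flow concrete, here is a toy Python sketch of one tool exporting metadata to a neutral file and another tool importing it subject to a user configuration. The JSON layout, the file name, and the field names are invented for this example; the actual standard specifies its own ASCII file format:

import json

# Tool A exports a description of its metadata to a neutral exchange file.
metadata = {
    "table": "fact_attendance",
    "columns": [
        {"name": "charge", "type": "REAL", "description": "fare paid"},
        {"name": "spectator_count", "type": "INTEGER"},
    ],
}
with open("exchange.meta", "w") as f:
    json.dump(metadata, f)

# The user configuration constrains what may propagate between tools.
user_config = {"allowed_tables": ["fact_attendance"]}

# Tool B imports only what the user configuration allows.
with open("exchange.meta") as f:
    incoming = json.load(f)
if incoming["table"] in user_config["allowed_tables"]:
    print("imported", incoming["table"])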

Q6: Discuss the benefits of a metadata repository.

Ans: A metadata repository manages metadata through the capture, maintenance, and presentation of the information that describes an organization's data and processes. It consists of tools that increase the business and technical understanding of data, and it increases the flexibility, control, and reliability of the application development process.

The benefits are:

• Improve Quality:
 Metadata is the input needed to build the data profiles that are the foundation of data quality.
 Increased data quality reduces data scrap and rework.
 Less time is spent verifying the quality of information.
 Dependency on undocumented associate knowledge is reduced.
 It provides the architecture and infrastructure to maintain and present data quality expectations.

• Cost Reduction:
 Fewer project hours are required for legacy discovery.
 Rework due to poor or inaccessible information is reduced.

• Shortened Delivery:
 Project legacy discovery can be automated or accessed electronically.
 Development efficiencies are maximized, creating a more nimble and responsive organization.

• Competitive Advantage:
 Deeper insight into our customers (and ourselves) through improved data quality, definitions, structures, processes, business uses of our data, and its flow through the enterprise.
 Improved decision making through a better understanding of information and processes.

• Flexibility and Control:
 You can build and deploy metadata repository capabilities in increments over time, which allows you to start quickly without too much complexity or too much time spent on development.
 You can customize it to your requirements, because your technicians have full control over design and functionality: reports, end-user interfaces, meta tutorials, and any other usage of the metadata repository.
