
ASSIGNMENT NO 1

AIM: Import legacy data from different sources (Excel, SQL Server, Oracle, etc.) and
load it into the target system. (You can download a sample database such as AdventureWorks,
Northwind, FoodMart, etc.)
Objective: To load the legacy data into the target system
Software/hardware used: POWER BI
Operating System used: Open Source Operating System (Ubuntu/Linux)
Theory:
Data from external sources, generated as CSV files, is loaded into the legacy data container. The
user can select the mode (load from server or load from client); server loading is recommended
for large data volumes. A load sequence is generated for each data load, and a load must be
locked before it can be used in the Mapping. The user can define the date format, thousand
separator format and decimal format, and these override the formats defined in the source.

Legacy Load

The first step is to load the Legacy Data from outside Data Migration Manager. This is done by
first defining the Legacy Data Header, which consists of the Migration Project ID, the Source
from which the data originates, and the Legacy Data Table Name.

When the Header is defined, the next step is to define how the Data File is structured, that is, which
Character set, Field Delimiter, String Delimiter, Date format and Number format are used in the
Data File that will be imported. The last step is to define where the data file is located and how to
import it, from Server or Client.

When the file is imported, a Load Sequence number is created, so it is possible to keep track
of different imports of the same Table over time.
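A minimal sketch of how these pieces might be modelled in code. The class and field names here are illustrative assumptions, not DMM's actual API: the header ties together project, source and table, the file definition carries the parsing formats, and each import gets the next load-sequence number.

from dataclasses import dataclass
from itertools import count

# Hypothetical structures mirroring the Legacy Data Header and the
# file definition described above (names are illustrative).
@dataclass
class LegacyDataHeader:
    migration_project_id: str   # Migration Project ID
    source: str                 # system the data originates from
    legacy_table_name: str      # Legacy Data Table Name

@dataclass
class LegacyFileDefinition:
    charset: str = "utf-8"
    field_delimiter: str = ","
    string_delimiter: str = '"'
    date_format: str = "%Y-%m-%d"     # overrides the format defined in the source
    thousand_separator: str = ","
    decimal_separator: str = "."

# Each import of the same table gets a new load sequence number,
# so individual loads can be tracked over time.
_load_sequence = count(1)

def import_file(header: LegacyDataHeader, file_def: LegacyFileDefinition,
                path: str, mode: str = "server") -> int:
    seq = next(_load_sequence)
    print(f"Load {seq}: {header.legacy_table_name} from {header.source} "
          f"({mode} load, file={path})")
    return seq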

Loading from Client

• Loading from client reads the whole file into a buffer using the default encoding on the
user's machine and, based on the delimiter provided, breaks it into lines.
• It then reads line by line and splits each line into columns using the string delimiter
provided (see the sketch after this list).
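A rough illustration of that two-step client-side parse, assuming a comma field delimiter and a double-quote string delimiter; Python's csv module handles the string-delimiter logic:

import csv

def load_from_client(path: str, field_delimiter: str = ",",
                     string_delimiter: str = '"',
                     encoding: str = "utf-8") -> list[list[str]]:
    """Read the whole file with the client's encoding, break it into
    lines, then split each line into columns."""
    with open(path, encoding=encoding, newline="") as f:
        text = f.read()                      # whole file into the buffer
    lines = text.splitlines()                # break into lines
    reader = csv.reader(lines, delimiter=field_delimiter,
                        quotechar=string_delimiter)
    return [row for row in reader]           # break each line into columns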

Loading from Server


• For better performance, data is loaded using Oracle's external tables concept (a sketch
follows this list).
• Database directories must be set up at the time of the initial DMM installation.
• If a load fails, the error messages logged in the generated bad file are read and added to the
legacy load message column.
• When the same file is loaded again, all temporary files are deleted and regenerated.
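For illustration only, here is the kind of external-table DDL that server-side loading relies on, submitted through the python-oracledb driver. The directory, table, column and file names, and the connection details, are assumptions rather than DMM's actual objects; the directory (dmm_load_dir) would have to exist already, set up at installation time. Rows that fail to load are written to the bad file named below, which is what gets read back into the load message column.

import oracledb  # python-oracledb driver

DDL = """
CREATE TABLE legacy_customer_ext (
    customer_id   VARCHAR2(20),
    customer_name VARCHAR2(100)
)
ORGANIZATION EXTERNAL (
    TYPE ORACLE_LOADER
    DEFAULT DIRECTORY dmm_load_dir
    ACCESS PARAMETERS (
        RECORDS DELIMITED BY NEWLINE
        BADFILE 'legacy_customer.bad'
        FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
    )
    LOCATION ('legacy_customer.csv')
)
"""

# Illustrative credentials and service name.
with oracledb.connect(user="dmm", password="***", dsn="localhost/XEPDB1") as conn:
    with conn.cursor() as cur:
        cur.execute(DDL)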

Lock Legacy Load

When the file is imported, the Data Load needs to be locked before it can be used. Only one Data
Load can be locked at a time for a Legacy Table. This means that, if the Legacy Table
has the same structure for each Load, Data Migration Manager can use the same Mapping to
handle different Loads from the Legacy Table. To make sure the data is structured the same
way, a Legacy Table Definition is created when the Load is locked. Data Migration
Manager can then compare the table structures of the different Loads and raise a warning if they are
not the same.
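A sketch of that structure comparison, assuming (as an illustration) that a table definition is just an ordered list of (column name, data type) pairs captured when a load is locked:

def compare_load_structures(locked_definition, new_load_columns):
    """Warn if a new load's column structure differs from the definition
    captured when the earlier load was locked."""
    warnings = []
    if len(locked_definition) != len(new_load_columns):
        warnings.append("column count differs between loads")
    for (name_a, type_a), (name_b, type_b) in zip(locked_definition, new_load_columns):
        if name_a != name_b or type_a != type_b:
            warnings.append(f"column mismatch: {name_a} {type_a} vs {name_b} {type_b}")
    return warnings

# Example: the second load added a column, so a warning is raised.
locked = [("customer_id", "VARCHAR2"), ("name", "VARCHAR2")]
new    = [("customer_id", "VARCHAR2"), ("name", "VARCHAR2"), ("email", "VARCHAR2")]
print(compare_load_structures(locked, new))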

Conclusion: We have successfully loaded the data into the target system.
ASSIGNMENT NO 2
AIM: Perform the Extraction, Transformation and Loading (ETL) process to construct the
database in SQL Server.
Objective: To construct the database in the SQL server
Software Used: Power BI, SQL Server 2022, Visual Studio 2022, SQL Server Management
Studio 18, Microsoft Analysis Services
Operating System: Open Source Operating System
THEORY:
What is ETL?
ETL is a process that extracts the data from different source systems, then transforms the data
(like applying calculations, concatenations, etc.) and finally loads the data into the Data Warehouse
system. Full form of ETL is Extract, Transform and Load.
It is tempting to think that creating a Data Warehouse is simply a matter of extracting data from
multiple sources and loading it into the Data Warehouse database. This is far from the truth; it
requires a complex ETL process. The ETL process demands active input from various stakeholders,
including developers, analysts, testers and top executives, and is technically challenging.

In order to maintain its value as a tool for decision-makers, a Data Warehouse system needs to change
as the business changes. ETL is a recurring activity (daily, weekly, monthly) of a Data Warehouse
system and needs to be agile, automated, and well documented.

• It helps companies analyze their business data to make critical business decisions.
• Transactional databases cannot answer the complex business questions that an ETL-built
Data Warehouse can.
• A Data Warehouse provides a common data repository.
• ETL provides a method of moving data from various sources into a Data Warehouse.
• As data sources change, the Data Warehouse is updated accordingly.
• A well-designed and documented ETL system is almost essential to the success of a Data
Warehouse project.
• ETL allows verification of data transformation, aggregation and calculation rules.
• The ETL process allows sample data comparison between the source and the target system.
• The ETL process can perform complex transformations and requires an extra (staging) area to
store the data.
• ETL helps migrate data into a Data Warehouse, converting it to the various formats and
types needed to adhere to one consistent system.
• ETL is a predefined process for accessing and manipulating source data into the target
database.
• ETL in a data warehouse offers deep historical context for the business.
• It helps improve productivity because it codifies and reuses transformation logic without
requiring additional technical skills.
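As a concrete, simplified illustration of the three steps described above, here is a sketch that extracts from a CSV source, applies a small transformation, and loads the result into SQL Server. The file name, column names, staging table and connection string are all assumptions made for the example.

import pandas as pd
from sqlalchemy import create_engine

# Extract: read the source data (a CSV export is assumed here).
source = pd.read_csv("sales_export.csv")

# Transform: example concatenation and calculation.
source["full_name"] = source["first_name"] + " " + source["last_name"]
source["line_total"] = source["quantity"] * source["unit_price"]

# Load: write into a staging table of the Data Warehouse on SQL Server.
# The ODBC connection string below is illustrative only.
engine = create_engine(
    "mssql+pyodbc://user:password@localhost/DataWarehouse"
    "?driver=ODBC+Driver+17+for+SQL+Server"
)
source.to_sql("stg_sales", engine, if_exists="replace", index=False)
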
ETL Tools
Many ETL tools are available in the market. Here are some of the most prominent ones:

1. MarkLogic:

MarkLogic is a data warehousing solution which makes data integration easier and faster using
an array of enterprise features. It can query different types of data like documents, relationships,
and metadata.

https://www.marklogic.com/product/getting-started/

2. Oracle:

Oracle is the industry-leading database. It offers a wide range of Data Warehouse
solutions for both on-premises and in the cloud. It helps to optimize customer experiences by
increasing operational efficiency.

https://www.oracle.com/index.html

3. Amazon RedShift:

Amazon Redshift is a data warehouse tool. It is a simple and cost-effective way to analyze all types
of data using standard SQL and existing BI tools. It also allows running complex queries against
petabytes of structured data.

https://aws.amazon.com/redshift/?nc2=h_m1


Best practices ETL process


The following are best practices for the ETL process:

Never try to cleanse all the data:

Every organization would like all of its data to be clean, but most are not ready to pay for it
or to wait. Cleaning it all would simply take too long, so it is better not to try to
cleanse all the data.
Never skip cleansing altogether:

Always plan to clean something because the biggest reason for building the Data Warehouse is to
offer cleaner and more reliable data.

Determine the cost of cleansing the data:

Before cleansing all the dirty data, it is important to determine the cleansing cost for every
dirty data element.

To speed up query processing, have auxiliary views and indexes:

To reduce storage costs, store summarized data on disk or tape. Also, weigh the trade-off between
the volume of data to be stored and how much detail is actually used; choosing an appropriate
level of granularity helps decrease storage costs.
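A small sketch of what "auxiliary views and indexes" can look like on SQL Server, submitted here through pyodbc. The table, view and index names, and the connection string, are made up for illustration.

import pyodbc

# Illustrative connection string; adjust server, database and credentials.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=localhost;"
    "DATABASE=DataWarehouse;Trusted_Connection=yes;"
)
cur = conn.cursor()

# Summarized (auxiliary) view: pre-groups detail rows by month and product.
cur.execute("""
    CREATE VIEW dbo.v_sales_monthly AS
    SELECT product_id, YEAR(order_date) AS yr, MONTH(order_date) AS mo,
           SUM(line_total) AS total_sales
    FROM dbo.fact_sales
    GROUP BY product_id, YEAR(order_date), MONTH(order_date)
""")

# Index supporting the grouping columns used above.
cur.execute("CREATE INDEX ix_fact_sales_product_date "
            "ON dbo.fact_sales (product_id, order_date)")
conn.commit()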

Conclusion: Thus we have performed the Extraction, Transformation and Loading (ETL)
process to construct the database in SQL Server.
ASSIGNMENT NO 3
AIM:
Create the cube with suitable dimension and fact tables based on the ROLAP, MOLAP and HOLAP models.
OBJECTIVES:
To create the fact tables based on the ROLAP, MOLAP and HOLAP models.
SOFTWARE REQUIREMENT:

Power BI, SQL Server 2022, Visual Studio 2022, SQL Server Management Studio 18,
Microsoft Analysis Services
Operating System: Open Source Operating System
THEORY:

What is OLAP?

OLAP (Online Analytical Processing) was introduced into the business intelligence (BI) space
over 20 years ago, at a time when computer hardware and software technology were not nearly as
powerful as they are today. OLAP introduced a groundbreaking way for business users (typically
analysts) to easily perform multidimensional analysis of large volumes of business data.

Aggregating, grouping, and joining data are the most difficult types of queries for a relational
database to process. The magic behind OLAP derives from its ability to pre-calculate and pre-
aggregate data. Otherwise, end users would be spending most of their time waiting for query results
to be returned by the database. However, it is also what causes OLAP-based solutions to be
extremely rigid and IT-intensive.

What is ROLAP?

ROLAP stands for Relational Online Analytical Processing. ROLAP stores data in columns and
rows (also known as relational tables) and retrieves the information on demand through user
submitted queries. A ROLAP database can be accessed through complex SQL queries to calculate
information. ROLAP can handle large data volumes, but the larger the data, the slower the
processing times.
Because queries are made on-demand, ROLAP does not require the storage and pre-computation
of information. However, the disadvantage of ROLAP implementations is the potential
performance constraints and scalability limitations that result from large and inefficient join
operations between large tables. Examples of popular ROLAP products include Metacube by
Stanford Technology Group, Red Brick Warehouse by Red Brick Systems, and AXSYS Suite by
Information Advantage.
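A hedged sketch of the kind of on-demand relational query ROLAP issues against star-schema tables; the fact and dimension table and column names, and the connection string, are assumptions for the example.

import pandas as pd
from sqlalchemy import create_engine

# Illustrative connection; the tables assume a simple star schema.
engine = create_engine(
    "mssql+pyodbc://user:password@localhost/DataWarehouse"
    "?driver=ODBC+Driver+17+for+SQL+Server"
)

# ROLAP answers each question on demand with a relational query:
# here, total sales by year and product category (join + group + aggregate).
query = """
    SELECT d.calendar_year, p.category, SUM(f.sales_amount) AS total_sales
    FROM fact_sales f
    JOIN dim_date d    ON f.date_key = d.date_key
    JOIN dim_product p ON f.product_key = p.product_key
    GROUP BY d.calendar_year, p.category
"""
result = pd.read_sql(query, engine)
print(result.head())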

What is MOLAP?

MOLAP stands for Multidimensional Online Analytical Processing. MOLAP uses a
multidimensional cube that accesses stored data through various combinations. Data is
pre-computed, pre-summarized, and stored (a difference from ROLAP, where queries are served
on demand).

A multicube approach has proved successful in MOLAP products. In this approach, a series of
dense, small, precalculated cubes make up a hypercube. Tools that incorporate MOLAP include
Oracle Essbase, IBM Cognos, and Apache Kylin.

Its simple interface makes MOLAP easy to use, even for inexperienced users. Its speedy data
retrieval makes it the best for “slicing and dicing” operations. One major disadvantage of MOLAP
is that it is less scalable than ROLAP, as it can handle a limited amount of data.
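To make the pre-computation idea concrete, here is a small sketch that builds a tiny in-memory "cube" of pre-aggregated values with pandas, so that slicing and dicing becomes a cheap lookup rather than a fresh query. This is only an analogy with illustrative data, not how a MOLAP engine is actually implemented.

import pandas as pd

# Detail-level facts (illustrative data).
facts = pd.DataFrame({
    "year":    [2023, 2023, 2024, 2024],
    "region":  ["East", "West", "East", "West"],
    "product": ["Bikes", "Bikes", "Helmets", "Bikes"],
    "sales":   [100.0, 250.0, 80.0, 300.0],
})

# Pre-compute and store the aggregates up front (the MOLAP idea):
cube = pd.pivot_table(facts, values="sales",
                      index=["year", "region"], columns="product",
                      aggfunc="sum", fill_value=0)

# "Slicing and dicing" is now a lookup into the stored aggregates.
print(cube.loc[(2024, "East")])   # all products for East, 2024
print(cube["Bikes"])              # Bikes across all year/region cells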

What is HOLAP?

HOLAP stands for Hybrid Online Analytical Processing. As the name suggests, the HOLAP
storage mode connects attributes of both MOLAP and ROLAP. Since HOLAP involves storing
part of your data in a ROLAP store and another part in a MOLAP store, developers get the benefits
of both.

With this use of the two OLAPs, the data is stored in both multidimensional databases and
relational databases. The decision to access one of the databases depends on which is most
appropriate for the requested processing application or type. This setup allows much more
flexibility for handling data. For theoretical processing, the data is stored in a multidimensional
database. For heavy processing, the data is stored in a relational database.

Microsoft Analysis Services and SAP AG BI Accelerator are products that run off HOLAP.

Conclusion: Thus we have created the cube with suitable dimension and fact tables based on the ROLAP,
MOLAP and HOLAP models.
ASSIGNMENT NO 4
AIM: Import the data warehouse data in Microsoft Excel and create the Pivot table and Pivot
Chart.
OBJECTIVE: To import the data into Excel and create the pivot table and chart.
SOFTWARE REQUIREMENT:
Power BI, SQL Server 2022, Visual Studio 2022, SQL Server Management Studio 18,
Microsoft Analysis Services
Operating System: Open Source Operating System
THEORY:
What is Pivot Chart?
A Pivot Chart is a built-in tool for summarizing your Pivot Table data interactively in an Excel
spreadsheet. It is the visual representation of a pivot table in Excel; in other words, the Pivot
Table and Pivot Chart are linked to each other.

A pivot chart is a useful tool, especially when the user is dealing with large amounts of
data. For instance, suppose an XYZ company has 200 employees and HR has maintained each
employee's working hours in Excel. Now HR wants to find the employee who has taken the
least leave in the entire year, so that the employee can be rewarded for sincerity and
devotion to the company. Manually browsing through the entire list would be
time-consuming, and the results could be inaccurate. Microsoft Excel provides built-in
features, the pivot table and the pivot chart, to cater to such tasks. Pivot Charts enable
instant reorganization and visual understanding of your data, facilitating the complete process.
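The HR example above maps directly onto a pivot aggregation. Here is a sketch with pandas (the records and column names are assumptions) that finds the employee with the fewest leave days, which is the same summary a pivot table or pivot chart in Excel would show.

import pandas as pd

# Illustrative attendance records; in practice these would come from the
# HR workbook (e.g. pd.read_excel("attendance.xlsx")).
records = pd.DataFrame({
    "employee":   ["Asha", "Ravi", "Meena", "Asha", "Ravi"],
    "month":      ["Jan", "Jan", "Jan", "Feb", "Feb"],
    "leave_days": [1, 0, 2, 0, 1],
})

# Pivot: total leave per employee per month, plus a grand total.
pivot = pd.pivot_table(records, values="leave_days",
                       index="employee", columns="month",
                       aggfunc="sum", fill_value=0, margins=True,
                       margins_name="Total")
print(pivot)

# The employee with the minimum total leave (drop the grand-total row first).
totals = pivot["Total"].drop("Total")
print("Fewest leave days:", totals.idxmin())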

Advantages of Pivot Charts

1. Pivot charts are a powerful way of interpreting data pictorially.
2. Pivot charts make the process of visualizing data effortless.
3. Pivot charts are widely used for data analysis.
4. Pivot charts effectively support drawing conclusions from data and form the basis
of statistical calculations.
5. Pivot charts efficiently handle massive amounts of raw data by correlating it
through pivot filtering and pivot slicing.
Limitations of Pivot Charts

o Pivot Charts do not allow you to create reports based on Multi-select / Checkbox
field types.
o Whenever you insert a new field into an existing Pivot Table for which a Pivot Chart
has already been created, Excel automatically adds the new field in the last
column. It is impossible to change this order or make the added field (column)
appear somewhere in the middle of the remaining columns.

Conclusion: Thus we have imported the data warehouse data into Microsoft Excel and created the
Pivot Table and Pivot Chart.
