
1. Source layer. A data warehouse system uses heterogeneous sources of data. That data is originally stored in corporate relational databases or legacy databases, or it may come from information systems outside the corporate walls.

2. Data staging. The data stored in the sources should be extracted, cleansed to remove inconsistencies and fill gaps, and integrated to merge heterogeneous sources into one common schema. The so-called Extraction, Transformation, and Loading (ETL) tools can merge heterogeneous schemata, extract, transform, cleanse, validate, filter, and load source data into a data warehouse (Jarke et al., 2000). Technologically speaking, this stage deals with problems that are typical for distributed information systems, such as inconsistent data management and incompatible data structures (Zhuge et al., 1996). Section 1.4 deals with a few points that are relevant to data staging.

3. Data warehouse layer. Information is stored in one logically centralized single repository: a data warehouse. The data warehouse can be directly accessed, but it can also be used as a source for creating data marts, which partially replicate data warehouse contents and are designed for specific enterprise departments. Meta-data repositories (section 1.6) store information on sources, access procedures, data staging, users, data mart schemata, and so on.

4. Analysis. In this layer, integrated data is efficiently and flexibly accessed to issue reports, dynamically analyze information, and simulate hypothetical business scenarios. Technologically speaking, it should feature aggregate data navigators, complex query optimizers, and user-friendly GUIs. Section 1.7 deals with different types of decision-making support analyses.

What is ETL?
ETL is an abbreviation of Extract, Transform and Load. In this process, an ETL tool extracts the data from different RDBMS source systems, then transforms the data by applying calculations, concatenations, etc., and then loads the data into the Data Warehouse system.

It's tempting to think that creating a Data warehouse is simply a matter of extracting data from multiple sources and loading it into the database of a Data warehouse. This is far from the truth: a complex ETL process is required. The ETL process requires active inputs from various stakeholders, including developers, analysts, testers, and top executives, and is technically challenging.

In order to maintain its value as a tool for decision-makers, a Data warehouse system needs to change with the business. ETL is a recurring activity (daily, weekly, monthly) of a Data warehouse system and needs to be agile, automated, and well documented.

In this tutorial, you will learn-

 What is ETL?
 Why do you need ETL?
 ETL Process in Data Warehouses
 Step 1) Extraction
 Step 2) Transformation
 Step 3) Loading
 ETL tools
 Best practices ETL process

Why do you need ETL?


There are many reasons for adopting ETL in the organization:

 It helps companies to analyze their business data for making critical business decisions.
 Transactional databases cannot answer complex business questions that can be answered by ETL.
 A Data Warehouse provides a common data repository.
 ETL provides a method of moving the data from various sources into a data warehouse.
 As data sources change, the Data Warehouse is updated accordingly.
 A well-designed and documented ETL system is almost essential to the success of a Data Warehouse project.
 ETL allows verification of data transformation, aggregation, and calculation rules.
 The ETL process allows sample data comparison between the source and the target system.
 The ETL process can perform complex transformations and requires a staging area to store the data.
 ETL helps to migrate data into a Data Warehouse, converting the various formats and types to adhere to one consistent system.
 ETL is a predefined process for accessing and manipulating source data into the target database.
 ETL offers deep historical context for the business.
 It helps to improve productivity because it codifies and reuses processes without a need for technical skills.

ETL Process in Data Warehouses


ETL is a 3-step process
Step 1) Extraction
In this step, data is extracted from the source system into the staging area. Transformations, if any, are done in the staging area so that the performance of the source system is not degraded. Also, if corrupted data is copied directly from the source into the Data warehouse database, rollback will be a challenge. The staging area gives an opportunity to validate extracted data before it moves into the Data warehouse.

A Data warehouse needs to integrate systems that have different DBMSs, hardware, operating systems, and communication protocols. Sources could include legacy applications like mainframes, customized applications, point-of-contact devices like ATMs and call switches, text files, spreadsheets, ERP systems, and data from vendors and partners, among others.

Hence one needs a logical data map before data is extracted and loaded physically. This data map describes the
relationship between sources and target data.
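For example, one row of a (purely illustrative) logical data map might record that a hypothetical source column CRM.CUSTOMER.CUST_NAME feeds the target column DW.DIM_CUSTOMER.CUSTOMER_NAME, with the transformation rule "trim whitespace and convert to upper case"; the actual columns and rules depend entirely on your own systems.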

Three Data Extraction methods:

1. Full Extraction
2. Partial Extraction- without update notification.
3. Partial Extraction- with update notification

Irrespective of the method used, extraction should not affect the performance and response time of the source systems. These source systems are live production databases. Any slowdown or locking could affect the company's bottom line.

Some validations are done during Extraction (a sample check is sketched after this list):

 Reconcile records with the source data
 Make sure that no spam/unwanted data is loaded
 Data type check
 Remove all types of duplicate/fragmented data
 Check whether all the keys are in place or not
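As one concrete example of such a check, a duplicate-key query against a staging table could look like the sketch below; the table and column names (stg_orders, order_id) are invented for illustration and are not part of any specific tool.

-- List business keys that appear more than once in the staging table
SELECT order_id, COUNT(*) AS occurrences
FROM stg_orders
GROUP BY order_id
HAVING COUNT(*) > 1;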

Step 2) Transformation
Data extracted from the source server is raw and not usable in its original form. Therefore it needs to be cleansed, mapped, and transformed. In fact, this is the key step where the ETL process adds value and changes data so that insightful BI reports can be generated.

In this step, you apply a set of functions to the extracted data. Data that does not require any transformation is called direct move or pass-through data.

In the transformation step, you can perform customized operations on the data. For instance, the user may want a sum-of-sales revenue figure that is not in the database, or the first name and the last name may sit in different columns of a table; it is possible to concatenate them before loading, as in the sketch below.
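A minimal SQL sketch of these two transformations, assuming hypothetical staging tables stg_customer and stg_sales (all names and columns are invented for illustration):

-- Derive a full name from separate first/last name columns
SELECT customer_id,
       first_name || ' ' || last_name AS full_name
FROM stg_customer;

-- Derive sum-of-sales revenue, which is not stored as such in the source
SELECT customer_id,
       SUM(quantity * unit_price) AS sales_revenue
FROM stg_sales
GROUP BY customer_id;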

Following are typical data integrity problems:

1. Different spellings of the same person, like Jon, John, etc.
2. Multiple ways to denote a company name, like Google, Google Inc.
3. Use of different names, like Cleaveland, Cleveland.
4. Different account numbers may be generated by various applications for the same customer.
5. Required fields are sometimes left blank in the source data.
6. Invalid products collected at the POS, as manual entry can lead to mistakes.

Validations done during this stage (a cleansing sketch follows this list):

 Filtering – select only certain columns to load
 Using rules and lookup tables for data standardization
 Character set conversion and encoding handling
 Conversion of units of measurement, like date/time conversion, currency conversions, numerical conversions, etc.
 Data threshold validation checks. For example, age cannot be more than two digits.
 Data flow validation from the staging area to the intermediate tables.
 Required fields should not be left blank.
 Cleaning (for example, mapping NULL to 0, or Gender Male to "M" and Female to "F", etc.)
 Splitting a column into multiple columns and merging multiple columns into a single column.
 Transposing rows and columns.
 Using lookups to merge data.
 Applying any complex data validation (e.g., if the first two columns in a row are empty then automatically reject the row from processing).
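Several of these rules can be expressed directly in SQL. The sketch below (the stg_customer table and its columns are made up for illustration) maps NULL amounts to 0, standardizes the gender code, and rejects rows whose first two columns are both empty:

SELECT customer_id,
       COALESCE(order_amount, 0) AS order_amount,  -- map NULL to 0
       CASE gender
            WHEN 'Male'   THEN 'M'
            WHEN 'Female' THEN 'F'
       END AS gender_code,                         -- standardize gender values
       first_name || ' ' || last_name AS full_name -- merge two columns into one
FROM stg_customer
WHERE first_name IS NOT NULL
   OR last_name IS NOT NULL;                       -- reject rows with both columns empty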

Step 3) Loading
Loading data into the target data warehouse database is the last step of the ETL process. In a typical Data warehouse, a huge volume of data needs to be loaded in a relatively short period (nights). Hence, the load process should be optimized for performance.

In case of load failure, recovery mechanisms should be configured to restart from the point of failure without loss of data integrity. Data Warehouse admins need to monitor, resume, or cancel loads as per prevailing server performance.

Types of Loading:

 Initial Load — populating all the Data Warehouse tables.
 Incremental Load — applying ongoing changes periodically, as needed.
 Full Refresh — erasing the contents of one or more tables and reloading them with fresh data.

Load verification (a row-count reconciliation sketch follows this list):

 Ensure that the key field data is neither missing nor null.
 Test modeling views based on the target tables.
 Check combined values and calculated measures.
 Data checks in the dimension table as well as the history table.
 Check the BI reports on the loaded fact and dimension tables.
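For instance, a simple reconciliation of row counts between a staging table and a target fact table might be sketched as follows (the table names stg_sales and fact_sales are illustrative only):

-- Compare the number of rows staged with the number of rows loaded
SELECT (SELECT COUNT(*) FROM stg_sales)  AS staged_rows,
       (SELECT COUNT(*) FROM fact_sales) AS loaded_rows
FROM dual;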

ETL tools
There are many Data Warehousing tools available in the market. Here are some of the most prominent ones:

1. MarkLogic:

MarkLogic is a data warehousing solution which makes data integration easier and faster using an array of enterprise
features. It can query different types of data like documents, relationships, and metadata.
http://developer.marklogic.com/products

2. Oracle:

Oracle is the industry-leading database. It offers a wide range of choice of Data Warehouse solutions for both on-premises
and in the cloud. It helps to optimize customer experiences by increasing operational efficiency.

https://www.oracle.com/index.html

3. Amazon RedShift:

Amazon Redshift is a data warehouse tool. It is a simple and cost-effective tool to analyze all types of data using standard
SQL and existing BI tools. It also allows running complex queries against petabytes of structured data.

https://aws.amazon.com/redshift/?nc2=h_m1


Best practices ETL process


Never try to cleanse all the data:

Every organization would like to have all the data clean, but most of them are not ready to pay for it or to wait for it. To clean it all would simply take too long, so it is better not to try to cleanse all the data.

Never skip cleansing entirely:

Always plan to clean something, because the biggest reason for building the Data Warehouse is to offer cleaner and more reliable data.

Determine the cost of cleansing the data:

Before cleansing all the dirty data, it is important for you to determine the cleansing cost for every dirty data element.

To speed up query processing, have auxiliary views and indexes:

To reduce storage costs, store summarized data on tapes or other low-cost storage. A trade-off between the volume of data to be stored and how much of its detail is actually used is also required; trade off the level of granularity of the data to decrease storage costs.

Summary:

 ETL is an abbreviation of Extract, Transform and Load.
 ETL provides a method of moving the data from various sources into a data warehouse.
 In the first step, extraction, data is extracted from the source system into the staging area.
 In the transformation step, the data extracted from the source is cleansed and transformed.
 Loading data into the target data warehouse is the last step of the ETL process.

The ODS is the part of the data warehouse architecture where you collect and integrate the data and ensure its completeness and accuracy; it provides data that is nearly current, ahead of the warehouse. It is like "instant mix food" for hungry people: the ODS provides data to the impatient business analyst for analysis.

Operational Data Store (ODS)
Definition - What does Operational Data Store (ODS) mean?
An operational data store (ODS) is a type of database that collects data from multiple sources for processing, after which it sends
the data to operational systems and data warehouses.
It provides a central interface or platform for all operational data used by enterprise systems and applications.

Techopedia explains Operational Data Store (ODS)


An ODS is used to store short term data or data currently in use by operational systems or applications, prior to storage in a data
warehouse or data repository. Thus, it serves as an intermediate database.
An ODS helps clean and organize data and ensure that it meets business and regulatory requirements. It only supports low level
data and allows for the application of limited queries.

Logical Extraction Methods


There are two kinds of logical extraction:

 Full Extraction
 Incremental Extraction

Full Extraction

The data is extracted completely from the source system. Since this extraction reflects all the data currently available on the source
system, there’s no need to keep track of changes to the data source since the last successful extraction. The source data will be
provided as-is and no additional logical information (for example, timestamps) is necessary on the source site. An example for a full
extraction may be an export file of a distinct table or a remote SQL statement scanning the complete source table.
Incremental Extraction

At a specific point in time, only the data that has changed since a well-defined event back in history will be extracted. This event may be the last time of extraction or a more complex business event like the last booking day of a fiscal period. To identify this delta change, there must be a way to identify all the information that has changed since this specific time event. This information can be provided either by the source data itself, such as an application column reflecting the last-changed timestamp, or by a change table where an appropriate additional mechanism keeps track of the changes besides the originating transactions. In most cases, using the latter method means adding extraction logic to the source system. A timestamp-driven extraction is sketched below.
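As a sketch only (the orders table and its last_changed_ts column are assumptions made for illustration, not part of any particular source system), a timestamp-driven incremental extraction could be written as:

-- Extract only the rows changed since the previous successful extraction;
-- :last_extraction_time holds the stored timestamp of the previous run
SELECT *
FROM orders
WHERE last_changed_ts > :last_extraction_time;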

Full Extraction
In this method, data is completely extracted from the source system. The source data will be provided as-is and no additional logical information is necessary on the source system. Since it is a complete extraction, there is no need to track the source system for changes.

For example, exporting a complete table in the form of a flat file.

Incremental Extraction
In incremental extraction, the changes in the source data need to be tracked since the last successful extraction. Only these changes in the data are extracted and then loaded. Identifying the last changed data is itself a complex process and may involve additional logic on the source system.

1. Full load: an entire data dump that takes place the first time a data source is loaded into the warehouse.

2. Incremental load: the delta between target and source data is loaded at regular intervals. The last extract date is stored so that only records added after this date are loaded.
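The generic SQL statements that follow illustrate common loading patterns: INSERT ... SELECT copies qualifying rows from one table into another (for an incremental load, the WHERE condition would typically filter on the last extract date), CREATE TABLE ... AS SELECT builds a new target table from a query, and single-row INSERT statements add individual records.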


INSERT INTO table2
SELECT * FROM table1
WHERE condition;

INSERT INTO table2 (column1, column2, column3, ...)


SELECT column1, column2, column3, ...
FROM table1
WHERE condition;

CREATE TABLE new_table_name AS


SELECT column1, column2,...
FROM existing_table_name
WHERE ....;

INSERT INTO suppliers (supplier_id, supplier_name) VALUES (1000, 'IBM');

INSERT INTO suppliers (supplier_id, supplier_name) VALUES (2000, 'Microsoft');

INSERT INTO suppliers (supplier_id, supplier_name) VALUES (3000, 'Google');

Example - Insert into Multiple Tables


You can also use the INSERT ALL statement to insert multiple rows into more than one table in one command.
For example, if you wanted to insert into both the suppliers and customers table, you could run the following SQL statement:

INSERT ALL
INTO suppliers (supplier_id, supplier_name) VALUES (1000, 'IBM')
INTO suppliers (supplier_id, supplier_name) VALUES (2000, 'Microsoft')
INTO customers (customer_id, customer_name, city) VALUES (999999, 'Anderson Construction', 'New York')
SELECT * FROM dual;

This example will insert 2 records into the suppliers table and 1 record into the customers table. As a further example, the following INSERT ALL statement inserts 8 rows into a single table T1:

insert all
  into T1 (Id) values (1)
  into T1 (Id) values (1)
  into T1 (Id) values (0)
  into T1 (Id) values (1)
  into T1 (Id) values (1)
  into T1 (Id) values (1)
  into T1 (Id) values (0)
  into T1 (Id) values (1)
SELECT * FROM dual;




Linux Cat Command Usage with Examples


The cat command (short for "concatenate") is one of the most frequently used commands in Linux/Unix and Apple Mac OS X operating systems. The cat command allows us to create single or multiple files, view the contents of a file, concatenate files, and redirect output to the terminal or to files. It is a standard Unix program used to concatenate and display files. The cat command displays file contents on the screen; it concatenates FILE(s), or standard input, to standard output. With no FILE, or when FILE is -, it reads standard input. You can also use the cat command to quickly create a file. The cat command can read and write data from standard input and output devices. It has three main functions related to manipulating text files: creating them, displaying them, and combining them.

The cat command is used for:

Display text file on screen

Create a new text file

Read text file

Modifying file

File concatenation

The basic syntax of cat command is as follows:

$ cat filename

OR
$ cat > filename

OR

$ cat [options] filename

Cat Command Examples

1) To view a file using cat command, you can use the following command.

$ cat filename

2) You can create a new file with the name file1.txt using the following cat command and you can type the text you want to insert in the file.
Make sure you type ‘Ctrl-d’ at the end to save the file.

$ cat > file1.txt

This is my new file in Linux.

The cat command is very useful.

Thanks

Now you can display the contents of the file file1.txt by using the following command.

$ cat file1.txt

This is my new file in Linux.

The cat command is very useful.

Thanks

3) Suppose you have created two sample files and you need to concatenate them. First view their contents:

$ cat sample1.txt

This is my first sample text file

$ cat sample2.txt

This is my second sample text file

Now you can concatenate these two files and can save to another file named sample3.txt. For this, use the below given command.

$ cat sample1.txt sample2.txt > sample3.txt

$ cat sample3.txt

This is my first sample text file

This is my second sample text file

4) To display contents of all txt files, use the following command.

$ cat *.txt
This is my first sample text file

This is my second sample text file

5) To display the contents of a file with line number, use the following command.

$ cat -n file1.txt

1 This is my new file in Linux.

2 The cat command is very useful.

3 Thanks

6) To copy the content of one file to another file, you can use the greater-than '>' symbol with the cat command.

$ cat file2.txt > file1.txt

7) To append the contents of one file to another, you can use the double greater than ‘>>’ symbol with the cat command.

$ cat sample1.txt >> sample2.txt

5. Specifying the search string as a regular expression pattern.

grep "^[0-9].*" file.txt

This will search for the lines which start with a number. Regular expressions are a huge topic and are not covered in full here; this example just shows that grep accepts regular expression patterns.

6. Checking for whole words in a file.

By default, grep matches the given string/pattern even if it is found as a substring in a file. The -w option makes grep match only whole words.

grep -w "world" file.txt

7. Displaying the lines before the match.

Sometimes, if you are searching for an error in a log file, it is good to see the lines around the error line to understand the cause of the error.

grep -B 2 "Error" file.txt

This will print the matched lines along with the two lines before each match.

8. Displaying the lines after the match.

grep -A 3 "Error" file.txt


This will display the matched lines along with the three lines after the matched lines.

9. Displaying the lines around the match

grep -C 5 "Error" file.txt

This will display the matched lines and also five lines before and after the matched lines.

10. Searching for a string in all files recursively

You can search for a string in all the files under the current directory and its sub-directories with the help of the -r option.

grep -r "string" *

11. Inverting the pattern match

You can display the lines that do not match the specified search string pattern using the -v option.

grep -v "string" file.txt

12. Displaying the non-empty lines

You can remove the blank lines using the grep command.

grep -v "^$" file.txt

13. Displaying the count of matching lines.

We can find the number of lines that match the given string/pattern. Note that -c counts matching lines, not the total number of matches within them.

grep -c "string" file.txt

14. Displaying the file names that match the pattern.

We can display just the files that contain the given string/pattern.

grep -l "string" *

15. Display the file names that do not contain the pattern.

We can display the files which do not contain the matched string/pattern.

grep -L "string" *

16. Displaying only the matched pattern.

By default, grep displays the entire line which has the matched string. We can make grep display only the matched string by using the -o option.

grep -o "string" file.txt

17. Displaying the line numbers.

We can make the grep command display the line number of each line that contains the matched string using the -n option.

grep -n "string" file.txt

18. Displaying the position of the matched string in the line

The -b option makes the grep command display the byte offset of the matched string in the file.

grep -o -b "string" file.txt

19. Matching the lines that start with a string

The ^ regular expression pattern specifies the start of a line. This can be used in grep to match the lines which start with the given string or pattern.

grep "^start" file.txt

20. Matching the lines that end with a string

The $ regular expression pattern specifies the end of a line. This can be used in grep to match the lines which end with the given string or pattern.

grep "end$" file.txt

Consider the below text file as an input.

>cat file.txt

unix is great os. unix is opensource. unix is free os.

learn operating system.

unixlinux which one you choose.

Sed Command Examples

1. Replacing or substituting string

Sed command is mostly used to replace the text in a file. The below simple sed command replaces the word "unix" with "linux" in the file.

>sed 's/unix/linux/' file.txt

linux is great os. unix is opensource. unix is free os.

learn operating system.

linuxlinux which one you choose.

Here the "s" specifies the substitution operation. The "/" are delimiters. The "unix" is the search pattern and the "linux" is the replacement string.

By default, the sed command replaces the first occurrence of the pattern in each line and it won't replace the second, third...occurrence in the line.

2. Replacing the nth occurrence of a pattern in a line.

Use the /1, /2 etc flags to replace the first, second occurrence of a pattern in a line. The below command replaces the second occurrence of the word
"unix" with "linux" in a line.

>sed 's/unix/linux/2' file.txt

unix is great os. linux is opensource. unix is free os.

learn operating system.

unixlinux which one you choose.


3. Replacing all the occurrence of the pattern in a line.

The substitute flag /g (global replacement) specifies the sed command to replace all the occurrences of the string in the line.

>sed 's/unix/linux/g' file.txt

linux is great os. linux is opensource. linux is free os.

learn operating system.

linuxlinux which one you choose.

4. Replacing from nth occurrence to all occurrences in a line.

Use the combination of /1, /2 etc and /g to replace all the patterns from the nth occurrence of a pattern in a line. The following sed command replaces
the third, fourth, fifth... "unix" word with "linux" word in a line.

>sed 's/unix/linux/3g' file.txt

unix is great os. unix is opensource. linux is free os.

learn operating system.

unixlinux which one you choose.

5. Changing the slash (/) delimiter

You can use any delimiter other than the slash. As an example if you want to change the web url to another url as

>sed 's/http:\/\//www/' file.txt

In this case the url contains the delimiter character which we used. In that case you have to escape the slash with a backslash character, otherwise the

Using too many backslashes makes the sed command look awkward. In this case we can change the delimiter to another character as shown in the
below example.

>sed 's_http://_www_' file.txt

>sed 's|http://|www|' file.txt

6. Using & as the matched string

There might be cases where you want to search for the pattern and replace that pattern by adding some extra characters to it. In such cases & comes
in handy. The & represents the matched string.

>sed 's/unix/{&}/' file.txt

{unix} is great os. unix is opensource. unix is free os.

learn operating system.


{unix}linux which one you choose.

>sed 's/unix/{&&}/' file.txt

{unixunix} is great os. unix is opensource. unix is free os.

learn operating system.

{unixunix}linux which one you choose.

7. Using \1,\2 and so on to \9

The first pair of parentheses specified in the pattern represents \1, the second represents \2, and so on. The \1, \2 references can be used in the replacement string to make changes to the source string. As an example, if you want to replace the word "unix" in a line with the word doubled, like "unixunix", use the sed command as below.

>sed 's/\(unix\)/\1\1/' file.txt

unixunix is great os. unix is opensource. unix is free os.

learn operating system.

unixunixlinux which one you choose.

The parentheses need to be escaped with the backslash character. Another example: if you want to switch the words "unixlinux" to "linuxunix", the sed command is

>sed 's/\(unix\)\(linux\)/\2\1/' file.txt

unix is great os. unix is opensource. unix is free os.

learn operating system.

linuxunix which one you choose.

Another example is switching the first three characters in a line

>sed 's/^\(.\)\(.\)\(.\)/\3\2\1/' file.txt

inux is great os. unix is opensource. unix is free os.

aelrn operating system.

inuxlinux which one you choose.

8. Duplicating the replaced line with /p flag

The /p print flag prints the replaced line twice on the terminal. If a line does not have the search pattern and is not replaced, then the /p prints that line
only once.

>sed 's/unix/linux/p' file.txt


linux is great os. unix is opensource. unix is free os.

linux is great os. unix is opensource. unix is free os.

learn operating system.

linuxlinux which one you choose.

linuxlinux which one you choose.

9. Printing only the replaced lines

Use the -n option along with the /p print flag to display only the replaced lines. Here the -n option suppresses the duplicate rows generated by the /p
flag and prints the replaced lines only one time.

>sed -n 's/unix/linux/p' file.txt

linux is great os. unix is opensource. unix is free os.

linuxlinux which one you choose.

If you use -n alone without /p, then the sed does not print anything.

10. Running multiple sed commands.

You can run multiple sed commands by piping the output of one sed command as input to another sed command.

>sed 's/unix/linux/' file.txt| sed 's/os/system/'

linux is great system. unix is opensource. unix is free os.

learn operating system.

linuxlinux which one you chosysteme.

Sed provides -e option to run multiple sed commands in a single sed command. The above output can be achieved in a single sed command as shown
below.

>sed -e 's/unix/linux/' -e 's/os/system/' file.txt

linux is great system. unix is opensource. unix is free os.

learn operating system.

linuxlinux which one you chosysteme.

11. Replacing string on a specific line number.

You can restrict the sed command to replace the string on a specific line number. An example is

>sed '3 s/unix/linux/' file.txt

unix is great os. unix is opensource. unix is free os.


learn operating system.

linuxlinux which one you choose.

The above sed command replaces the string only on the third line.

12. Replacing string on a range of lines.

You can specify a range of line numbers to the sed command for replacing a string.

>sed '1,3 s/unix/linux/' file.txt

linux is great os. unix is opensource. unix is free os.

learn operating system.

linuxlinux which one you choose.

Here the sed command replaces the lines with range from 1 to 3. Another example is

>sed '2,$ s/unix/linux/' file.txt

linux is great os. unix is opensource. unix is free os.

learn operating system.

linuxlinux which one you choose.

Here $ indicates the last line in the file. So the sed command replaces the text from second line to last line in the file.

13. Replace on lines which match a pattern.

You can specify a pattern for the sed command to match in a line. If the pattern match occurs, the sed command looks for the string to be replaced, and if it finds it, the sed command replaces the string.

>sed '/linux/ s/unix/centos/' file.txt

unix is great os. unix is opensource. unix is free os.

learn operating system.

centoslinux which one you choose.

Here the sed command first looks for the lines which has the pattern "linux" and then replaces the word "unix" with "centos".

14. Deleting lines.

You can delete the lines of a file by specifying a line number or a range of line numbers.

>sed '2 d' file.txt

>sed '5,$ d' file.txt


1. Delete first line or header line

The d option in sed command is used to delete a line. The syntax for deleting a line is:

> sed 'Nd' file

Here N indicates Nth line in a file. In the following example, the sed command removes the first line in a file.

> sed '1d' file

unix

fedora

debian

ubuntu

2. Delete last line or footer line or trailer line

The following sed command is used to remove the footer line in a file. The $ indicates the last line of a file.

> sed '$d' file

linux

unix

fedora

debian

3. Delete particular line

This is similar to the first example. The below sed command removes the second line in a file.

> sed '2d' file

linux

fedora

debian

ubuntu

4. Delete range of lines

The sed command can be used to delete a range of lines. The syntax is shown below:
> sed 'm,nd' file

Here m and n are min and max line numbers. The sed command removes the lines from m to n in the file. The following sed command deletes the lines ranging from
2 to 4:

> sed '2,4d' file

linux

ubuntu

5. Delete lines other than the first line or header line

Use the negation (!) operator with d option in sed command. The following sed command removes all the lines except the header line.

> sed '1!d' file

linux

6. Delete lines other than last line or footer line

> sed '$!d' file

ubuntu

7. Delete lines other than the specified range

> sed '2,4!d' file

unix

fedora

debian

Here the sed command removes lines other than 2nd, 3rd and 4th.

8. Delete first and last line

You can specify the list of lines you want to remove in sed command with semicolon as a delimiter.

> sed '1d;$d' file

unix

fedora
debian

9. Delete empty lines or blank lines

> sed '/^$/d' file

The ^$ pattern tells the sed command to delete empty lines. However, this sed command does not remove lines that contain only spaces.

Sed Command to Delete Lines - Based on Pattern Match

In the following examples, the sed command deletes the lines in file which match the given pattern.

10. Delete lines that begin with specified character

> sed '/^u/d' file

linux

fedora

debian

^ is to specify the starting of the line. Above sed command removes all the lines that start with character 'u'.

11. Delete lines that end with specified character

> sed '/x$/d' file

fedora

debian

ubuntu

$ is to indicate the end of the line. The above command deletes all the lines that end with character 'x'.

12. Delete lines which are in upper case or capital letters

> sed '/^[A-Z]*$/d' file

13. Delete lines that contain a pattern

> sed '/debian/d' file

linux
unix

fedora

ubuntu

14. Delete lines starting from a pattern till the last line

> sed '/fedora/,$d' file

linux

unix

Here the sed command removes the line that matches the pattern fedora and also deletes all the lines to the end of the file which appear next to this matching line.

15. Delete last line only if it contains the pattern

> sed '${/ubuntu/d;}' file

linux

unix

fedora

debian

Here $ indicates the last line. If you want to delete Nth line only if it contains a pattern, then in place of $ place the line number.

Note: In all the above examples, the sed command prints the contents of the file on the unix or linux terminal with the lines removed only in the output; the sed command does not remove the lines from the source file. To remove the lines from the source file itself, use the -i option with the sed command.

> sed -i '1d' file

If you don't wish to delete the lines from the original source file, you can redirect the output of the sed command to another file.

sed '1d' file > newfile


What is a fact? What are the types of facts?

It is a central component of a multi-dimensional model which contains the measures to be analysed. Facts are related to
dimensions.

Types of facts are

 Additive Facts
 Semi-additive Facts

 Non-additive Facts

Data mining is an analytical process designed to explore data. There are four main types of data mining tasks:

 Regression (predictive)
 Association Rule Discovery (descriptive)
 Classification (predictive)
 Clustering (descriptive)
Business Intelligence is the conversion of data into usable information for companies.

Data Mining is commonly defined as the analysis of data for relationships and
patterns that have not previously been discovered by applying statistical and
mathematical methods. Business intelligence (BI) describes processes and
procedures for systematically gathering, storing, analyzing, and providing access
to data to help enterprises in making better operative and strategic business
decisions. BI applications include the activities of decision support systems,
management information systems, query and reporting, online analytical
processing (OLAP), statistical analysis, forecasting, and data mining.

A Key Performance Indicator (KPI) is a measurable value that demonstrates how effectively a company is achieving key business objectives.
Organizations use KPIs to evaluate their success at reaching targets.

Understanding the MicroStrategy architecture


A MicroStrategy system is built around a three-tier or four-tier structure. The diagram
below illustrates a four-tier system.
 The first tier, at the bottom, consists of two databases: the data warehouse, which contains
the information that your users analyze; and the MicroStrategy metadata, which contains
information about your MicroStrategy projects. For an introduction to these databases,
see Storing information: the data warehouse and Indexing your data: MicroStrategy
metadata.

 The second tier consists of MicroStrategy Intelligence Server, which executes your
reports against the data warehouse. For an introduction to Intelligence Server,
see Processing your data: Intelligence Server.

If MicroStrategy Developer users connect via a two-tier project source (also called a direct
connection), they can access the data warehouse without Intelligence Server. For more
information on two-tier project sources, see Tying it all together: projects and project sources.
 The third tier in this system is MicroStrategy Web or Mobile Server, which delivers the reports
to a client. For an introduction to MicroStrategy Web, see Administering MicroStrategy Web
and Mobile.

 The last tier is the MicroStrategy Web client or MicroStrategy Mobile app, which
provides documents and reports to the users.

MicroStrategy Intelligence Server


MicroStrategy Intelligence Server is an analytical server optimized for enterprise querying, reporting, and OLAP analysis. The important
functions of MicroStrategy Intelligence Server are:

• Sharing objects

• Sharing data

• Managing the sharing of data and objects in a controlled and secure environment

• Protecting the information in the metadata

 2-Tier Architecture (Direct Mode)


 In the two tier architecture the client connects directly to the metadata and data
Warehouse. A 2-tier connection mode connects the project to the metadata via an
Open Database Connectivity (ODBC) data source name (DSN). The following
diagram shows the 2-tier architecture.



 3-Tier Architecture (Server Mode)
 In the 3-tier architecture the client connects to the metadata and the data
Warehouse through the Intelligence Server. Following diagram shows the 3-tier
architecture.



 Multi-Tier Architecture (Server Mode)
 In the Multi-tier architecture the client connects to the Intelligence Server (IS)
through the Web server from the web browser. The IS in turn connects to the
metadata and the data Warehouse. The following diagram shows the multi-tier architecture.



 Difference between 2, 3, 4 tier connection?
 In 2 tier architecture, the Micro Strategy Desktop itself queries against the Data
warehouse and the Metadata without the Intermediate tier of the Intelligence
server.

 The 3 Tier architecture comprises an Intelligence server between Micro Strategy
Desktop and the data Warehouse and the Metadata.

 The 4-tier architecture is the same as the 3-tier architecture except that it has an additional component, MicroStrategy Web.

 Intelligence Server is the architectural foundation of the MicroStrategy platform. It serves as a central point for the MicroStrategy metadata so you can manage thousands of end-user requests.

 You are very limited in what you can do with 2-tier architecture. Things like clustering, mobile, distribution services, report services, OLAP services, scheduling, governing, Intelligent Cubes, and project administration are only available via Intelligence Server.



A primary key is a special constraint on a column or set of columns. A primary key constraint ensures
that the column(s) so designated have no NULL values, and that every value is unique. Physically, a
primary key is implemented by the database system using a unique index, and all the columns in the
primary key must have been declared NOT NULL. A table may have only one primary key, but it may
be composite (consist of more than one column).

A surrogate key is any column or set of columns that can be declared as the primary key instead of a
"real" or natural key. Sometimes there can be several natural keys that could be declared as the
primary key, and these are all called candidate keys. So a surrogate is a candidate key. A table could
actually have more than one surrogate key, although this would be unusual. The most common type
of surrogate key is an auto-incrementing integer, such as an auto_increment column in MySQL, a sequence in Oracle, or an identity column in SQL Server.
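As an illustrative sketch only (the table and column names are invented; the identity syntax shown is Oracle 12c-style, with auto_increment or IDENTITY playing the same role in other systems), a dimension table might carry a generated surrogate key alongside a unique natural key:

-- customer_sk is the surrogate primary key; customer_number is the natural/business key
CREATE TABLE dim_customer (
    customer_sk     INTEGER GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    customer_number VARCHAR2(20) NOT NULL UNIQUE,
    customer_name   VARCHAR2(100)
);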

Natural Keys

If a key’s attribute is used for identification independently of the database scheme, it is called a
Natural Key. In layman’s language, it means keys are natural if people use them, for example, SSN,
Invoice ID, Tax ID, Vehicle ID, person unique identifiers, etc. The attributes of a natural key always
exist in real world.

Pros:

 No additional Index is required.

 It can be used as a search key.

Cons:

 While using strings, joins are a bit slower compared to int data-type joins, and storage requirements are higher. Since storage is higher, fewer data values get stored per index page. Also, reading strings is a two-step process in some RDBMSs: one step to get the actual length of the string and a second to actually perform the read operation to get the value.

 Locking contentions can arise while using application driven generation mechanism for the
key.

 You can’t enter a record until the value is known since the value has some meaning.

Surrogate Keys

Surrogate keys have no “business” meaning and their only purpose is to identify a record in the
table. They are always generated independently of the current row data. Their generation can be
managed by the database system or the server itself.

Pros:
 Business Logic is not in the keys.

 Small 4-byte key (the surrogate key will most likely be an integer and SQL Server for example
requires only 4 bytes to store it, if a bigint, then 8 bytes).

 Joins are very fast.

 No locking contentions because of unique constraint (this refers to the waits that get
developed when two sessions are trying to insert the same unique business key) as the
surrogates get generated by the Database and are cached.

Cons:

 An additional index is needed.

 They cannot be used as a search key.

 If it is database controlled, for products that support multiple databases, different


implementations are needed, example: identity in SS2k, before triggers and sequences in
Oracle, identity/sequence in DB2 UDB.

 Always requires a join when browsing the child tables.

Sometimes the primary key is made up of real data and these are normally referred to as natural
keys, while other times the key is generated when a new record is inserted into a table.

When a primary key is generated at runtime, it is called a surrogate key. A surrogate key is typically a
numeric value.

A surrogate key in a database is a unique identifier for either an entity in the modeled world or an
object in the database. The surrogate key is not derived from application data, unlike a natural (or
business) key which is derived from application data.

The surrogate is internally generated by the system but is nevertheless visible to the user or application. The value contains no semantic meaning.


Surrogate keys are keys that have no business meaning and are solely used to identify a record in the table.

Such keys are either database generated (example: Identity in SQL Server, Sequence in Oracle, Sequence/Identity in DB2 UDB, etc.) or system generated values (like values generated via a table in the schema).
Surrogate key = an artificial key generated internally that has no real meaning outside the DB (e.g. a UniqueIdentifier or an Int with the Identity property set, etc.). Implemented in SQL Server by Primary Key Constraints on a column (or columns).

Natural key = uniquely identifies an instance (or record) using real, meaningful data as provided to the database (e.g. an email address might qualify as a natural key). A natural key is not a database object in itself, but a column (or columns) that forms a natural key can be enforced using unique constraints, unique indexes, or can be the primary key. Or it can just be described as a natural key in a data model with nothing specifically implemented.

Candidate keys = keys that uniquely identify an instance; any key (natural or surrogate) that could be chosen as the primary key is a candidate key.

Surrogate Key Overview


A surrogate key is a system generated (could be GUID, sequence, etc.)
value with no business meaning that is used to uniquely identify a record
in a table.

Types of Data Models


There are mainly three different types of data models:
1. Conceptual: This Data Model defines WHAT the system contains.
This model is typically created by Business stakeholders and Data
Architects. The purpose is to organize, scope and define business
concepts and rules.
2. Logical: Defines HOW the system should be implemented regardless of the DBMS. This model is typically created by Data Architects and Business Analysts. The purpose is to develop a technical map of rules and data structures.
3. Physical: This Data Model describes HOW the system will be
implemented using a specific DBMS system. This model is typically
created by DBA and developers. The purpose is actual
implementation of the database.

Conceptual Model
The main aim of this model is to establish the entities, their attributes, and
their relationships. In this Data modeling level, there is hardly any detail
available of the actual Database structure.
The three basic tenets of a Data Model are
Entity: A real-world thing
Attribute: Characteristics or properties of an entity
Relationship: Dependency or association between two entities
For example:
 Customer and Product are two entities. Customer number and name
are attributes of the Customer entity
 Product name and price are attributes of product entity
 Sale is the relationship between the customer and product

Characteristics of a conceptual data model

 Offers organisation-wide coverage of the business concepts.
 This type of data model is designed and developed for a business audience.
 The conceptual model is developed independently of hardware specifications like data storage capacity or location, and of software specifications like DBMS vendor and technology. The focus is to represent data as a user will see it in the "real world."

Conceptual data models, also known as domain models, create a common vocabulary for all stakeholders by establishing basic concepts and scope.

Logical Data Model


Logical data models add further information to the conceptual model elements. They define the structure of the data elements and set the relationships between them.
The advantage of the logical data model is that it provides the foundation on which the physical model is based. However, the modeling structure remains generic.
At this data modeling level, no primary or secondary key is defined. At this level, you need to verify and adjust the connector details that were set earlier for the relationships.

Characteristics of a Logical data model

 Describes data needs for a single project but could integrate with other logical data models based on the scope of the project.
 Designed and developed independently from the DBMS.
 Data attributes will have datatypes with exact precisions and lengths.
 Normalization is typically applied to the model up to 3NF.

Physical Data Model


A Physical Data Model describes the database-specific implementation of the data model. It offers an abstraction of the database and helps generate the schema, thanks to the richness of meta-data offered by a Physical Data Model.
This type of data model also helps to visualize the database structure. It helps to model database column keys, constraints, indexes, triggers, and other RDBMS features.

Characteristics of a physical data model:

 The physical data model describes data needs for a single project or application, though it may be integrated with other physical data models based on project scope.
 The data model contains relationships between tables, addressing the cardinality and nullability of the relationships.
 Developed for a specific version of a DBMS, location, data storage, or technology to be used in the project.
 Columns have exact datatypes, lengths, and default values assigned.
 Primary and foreign keys, views, indexes, access profiles, authorizations, etc. are defined. A small DDL sketch follows this list.
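A tiny hypothetical fragment of a physical model, expressed as Oracle-style DDL with invented names and types, shows how these characteristics appear in practice (it assumes the dim_customer table from the earlier sketch exists):

CREATE TABLE fact_sales (
    sale_id     INTEGER      NOT NULL PRIMARY KEY,             -- exact datatype and primary key
    customer_sk INTEGER      NOT NULL REFERENCES dim_customer, -- foreign key to a dimension table
    sale_date   DATE         NOT NULL,
    amount      NUMBER(10,2) DEFAULT 0                         -- precision, length, and default value
);

CREATE INDEX ix_fact_sales_date ON fact_sales (sale_date);     -- index defined at the physical level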

Advantages and Disadvantages of Data Model:


Advantages of Data model:
 The main goal of designing a data model is to make certain that data objects offered by the functional team are represented accurately.
 The data model should be detailed enough to be used for building the physical database.
 The information in the data model can be used for defining the relationships between tables, primary and foreign keys, and stored procedures.
 The data model helps the business to communicate within and across organizations.
 The data model helps to document data mappings in the ETL process.
 It helps to recognize correct sources of data to populate the model.

Disadvantages of Data model:

 To develop a data model, one should know the characteristics of the physically stored data.
 This is a navigational system, which makes application development and management complex and requires detailed knowledge of the underlying data.
 Even a small change made in the structure requires modification in the entire application.
 There is no set data manipulation language in DBMS.

Conclusion
 Data modeling is the process of developing a data model for the data to be stored in a database.
 Data models ensure consistency in naming conventions, default values, semantics, and security while ensuring the quality of the data.
 The data model structure helps to define the relational tables, primary and foreign keys, and stored procedures.
 There are three types of data models: conceptual, logical, and physical.
 The main aim of the conceptual model is to establish the entities, their attributes, and their relationships.
 The logical data model defines the structure of the data elements and sets the relationships between them.
 A physical data model describes the database-specific implementation of the data model.
 The main goal of designing a data model is to make certain that data objects offered by the functional team are represented accurately.
 The biggest drawback is that even a small change made in the structure requires modification in the entire application.


An OLAP cube is a method of storing data in a multidimensional form, generally for reporting purposes. In OLAP cubes, data (measures) are categorized by dimensions. OLAP cubes are often pre-summarized across dimensions to drastically improve query time over relational databases. The functional difference between ROLAP, MOLAP, and HOLAP is how the information is stored; in all cases, the users see the data as a cube of dimensions and facts.

ROLAP - detailed data is stored in a relational database in 3NF, star, or snowflake form. Queries must summarize data on the fly.

MOLAP - data is stored in multidimensional form - dimensions and facts stored together. You can think of this as a persistent cube. The level of detail is determined by the intersection of the dimension hierarchies.

HOLAP - data is stored using a combination of relational and multi-dimensional storage. Summary data might persist as a cube, while detail data is stored relationally, but transitioning between the two is invisible to the end-user.

ROLAP

Advantages -
The advantages of this model are that it can handle a large amount of data and can leverage all the functionalities of the relational database.

Disadvantages -
The disadvantages are that the performance is slow and each ROLAP report is an SQL query with all the limitations of the genre. It is also limited by SQL functionalities.

ROLAP vendors have tried to mitigate this problem by building into the tool out-of-the-box complex functions as well as
providing the users with an ability to define their own functions.

MOLAP

Advantages -
The advantages of this model are that it provides excellent query performance and the cubes are built for fast data retrieval. All calculations are pre-generated when the cube is created and can be easily applied while querying data.

Disadvantages -
The disadvantages of this model are that it can handle only a limited amount of data. Since all calculations have been pre-built
when the cube was created, the cube cannot be derived from a large volume of data. This deficiency can be bypassed by
including only summary level calculations while constructing the cube. This model also requires huge additional investment as
cube technology is proprietary and the knowledge base may not exist in the organization.

Key Differences Between ROLAP and MOLAP


1. ROLAP stands for Relational Online Analytical Processing whereas; MOLAP stands for
Multidimensional Online Analytical Processing.
2. In both the cases, ROLAP and MOLAP data is stored in the main warehouse. In ROLAP
data is directly fetched from the main warehouse whereas, in MOLAP data is fetched
from the proprietary databases MDDBs.
3. In ROLAP, data is stored in the form of relational tables but, in MOLAP data is stored in
the form of a multidimensional array made of data cubes.
4. ROLAP deals with large volumes of data whereas, MOLAP deals with limited data
summaries kept in MDDBs.
5. ROLAP engines use complex SQL to fetch data from the data warehouse. However,
MOLAP engine creates prefabricated and precalculated datacubes to present
multidimensional view of data to a user and to manage data sparsity in data cubes,
MOLAP uses Sparse matrix technology.
6. ROLAP engine creates a multidimensional view of data dynamically whereas, MOLAP
statically stores multidimensional view of data in proprietary databases MDDBs for a
user to view it from there.
7. As ROLAP creates a multidimensional view of data dynamically, it is slower than MOLAP, which does not spend time creating a multidimensional view of data at query time.

Conclusion:
Which one to opt for between ROLAP and MOLAP depends upon the performance and complexity of the query. MOLAP becomes the choice of a user who wants faster query performance.

MOLAP – Multidimensional OnLine Analytical Processes


(not to be confused with Mobile OnLine Analytical Processes)
MOLAP is the more traditional OLAP type. In MOLAP, both the source data and the aggregation calculations are stored in a multidimensional format. This
type is the fastest option for data retrieval, but it also requires the most storage space.
MOLAP systems are more optimized for fast query performance and retrieval of summarized data. The limitations in MOLAP are that it is not very scalable
and can only handle limited amounts of data since calculations are predefined in the cube.

ROLAP – Relational OnLine Analytical Processes


(not to be confused with Remote OnLine Analytical Processes)
ROLAP stores all data, including aggregations, in the source relational database. This type of storage is good for enterprises that need larger data
warehousing. ROLAP uses an SQL reporting tool to query data directly from the data warehouse.
ROLAP’s advantages include better scalability, enabling it to handle huge amounts of data, and the ability to efficiently manage both numeric and textual
data.

HOLAP – Hybrid OnLine Analytical Processes (Combination of MOLAP & ROLAP)


HOLAP attempts to combine the best features of MOLAP and ROLAP in a single system. HOLAP systems store larger amounts of data in relational tables,
and aggregations are stored in the pre-calculated cubes, offering better scalability, quick data processing and flexibility in accessing data sources.

Slice:-
The slice operation performs a selection on one dimension of the given cube, thus creating a sub-cube.

Dice:-
The dice operation performs a selection on two or more dimensions of a given cube and creates a sub-cube.

Roll-up:-
The roll-up operation (also called drill-up or aggregation) performs aggregation on a data cube, either by climbing up a concept hierarchy for a dimension or by dimension reduction.

Drill-down:-
Drill-down is the reverse operation of roll-up. It allows users to navigate among different levels of data, from most summarized (up) to most detailed (down). Drill-down refers to the process of viewing data at a level of increased detail, while roll-up refers to the process of viewing data with decreasing detail.

Pivot:-
Pivot, also known as rotation, changes the dimensional orientation of the cube, i.e. it rotates the axes to view the data from different perspectives.

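In a relational (ROLAP) setting these operations correspond to ordinary SQL. The sketch below is illustrative only; the denormalized sales_cube table and its columns (sales_year, sales_quarter, region, product, sales_amount) are hypothetical.

-- Slice: fix a single member of one dimension.
SELECT region, product, SUM(sales_amount) AS sales
FROM   sales_cube
WHERE  sales_year = 2014
GROUP  BY region, product;

-- Dice: restrict two or more dimensions at once.
SELECT region, product, SUM(sales_amount) AS sales
FROM   sales_cube
WHERE  sales_year IN (2013, 2014)
AND    region IN ('EAST', 'WEST')
GROUP  BY region, product;

-- Roll-up: aggregate to a coarser level (drop the quarter level).
SELECT sales_year, SUM(sales_amount) AS sales
FROM   sales_cube
GROUP  BY sales_year;

-- Drill-down: the reverse, adding the quarter level back in.
SELECT sales_year, sales_quarter, SUM(sales_amount) AS sales
FROM   sales_cube
GROUP  BY sales_year, sales_quarter;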


Oracle Scalar Functions

This SQL tutorial provides a summary of some of the most common Oracle Scalar Functions.

Oracle Scalar Functions allow you to perform different calculations on data values.
These functions operate on single rows only and produce one result per row. There
are several types of Scalar Functions; this tutorial covers the following:

 String functions – functions that perform operations on character values.


 Numeric functions – functions that perform operations on numeric values.
 Date functions – functions that perform operations on date values.
 Conversion functions – functions that convert column data types.
 NULL-related Functions – functions for handling null values.

Note: this tutorial focuses on Oracle Scalar Functions; Group (aggregate) Functions are
covered in a separate lesson.
Oracle String Functions

CONCAT - Returns text strings concatenated.
    SELECT CONCAT('Hello', 'World') FROM dual
    -- Result: 'HelloWorld'

INSTR - Returns the location of a substring in a string.
    SELECT INSTR('hello', 'e') FROM dual
    -- Result: 2

LENGTH - Returns the number of characters of the specified string expression.
    SELECT LENGTH('hello') FROM dual
    -- Result: 5

RTRIM - Returns a character string after truncating all trailing blanks.
    SELECT RTRIM(' hello ') FROM dual
    -- Result: ' hello'

LTRIM - Returns a character expression after it removes leading blanks.
    SELECT LTRIM(' hello ') FROM dual
    -- Result: 'hello '

REPLACE - Replaces all occurrences of a specified string value with another string value.
    SELECT REPLACE('hello', 'e', '$') FROM dual
    -- Result: 'h$llo'

REVERSE - Returns the reverse order of a string value.
    SELECT REVERSE('hello') FROM dual
    -- Result: 'olleh'

SUBSTR - Returns part of a text.
    SELECT SUBSTR('hello', 2, 3) FROM dual
    -- Result: 'ell'

LOWER - Returns a character expression after converting uppercase character data to lowercase.
    SELECT LOWER('HELLO') FROM dual
    -- Result: 'hello'

UPPER - Returns a character expression with lowercase character data converted to uppercase.
    SELECT UPPER('hello') FROM dual
    -- Result: 'HELLO'

INITCAP - Returns a character expression with the first letter of each word in uppercase and all other letters in lowercase.
    SELECT INITCAP('hello') FROM dual
    -- Result: 'Hello'
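Scalar functions can be nested, with the output of one feeding the next. A small illustration using only the functions above (the literal values are arbitrary):

    SELECT UPPER(SUBSTR('hello world', 1, 5)) FROM dual
    -- Result: 'HELLO'

    SELECT INITCAP(LTRIM(RTRIM('  etl testing  '))) FROM dual
    -- Result: 'Etl Testing'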

Oracle Date Functions


ADD_MONTHS - Returns the specified date with n additional months.
    SELECT ADD_MONTHS('05-JAN-2001', 4) FROM dual
    -- Result: '05-MAY-2001'

EXTRACT - Returns the specified component (DAY, MONTH, YEAR, ...) of a date.
    SELECT EXTRACT(DAY FROM SYSDATE) FROM dual
    -- Result: 16 (when run on the 16th of the month)

LAST_DAY - Returns a date representing the last day of the month for the specified date.
    SELECT LAST_DAY('15-AUG-2014') FROM dual
    -- Result: '31-AUG-2014'

MONTHS_BETWEEN - Returns the count of months between the specified start date and end date.
    SELECT MONTHS_BETWEEN('01-MAY-2010', '01-JAN-2010') FROM dual
    -- Result: 4

NEXT_DAY - Returns the first date of the named weekday that is later than the specified date.
    SELECT NEXT_DAY('30-AUG-2014', 'Sunday') FROM dual
    -- Result: '31-AUG-2014'

SYSDATE - Returns the current database system date. This value is derived from the operating system of the computer on which the Oracle instance is running.
    SELECT SYSDATE FROM dual
    -- Result: (current date)

Oracle Numeric Functions

TRUNC - Truncates a number to the specified number of decimal places (zero by default), discarding the fractional part rather than rounding it.
    SELECT TRUNC(59.9) FROM dual
    -- Result: 59

CEIL - Returns the smallest integer that is greater than or equal to the specified numeric expression.
    SELECT CEIL(59.1) FROM dual
    -- Result: 60

ROUND - Returns a numeric value, rounded to the specified length or precision.
    SELECT ROUND(59.9) FROM dual
    -- Result: 60

    SELECT ROUND(59.1) FROM dual
    -- Result: 59
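Both TRUNC and ROUND accept an optional second argument giving the number of decimal places to keep; a quick illustration:

    SELECT TRUNC(59.987, 2) FROM dual
    -- Result: 59.98 (digits beyond the second decimal place are simply dropped)

    SELECT ROUND(59.987, 2) FROM dual
    -- Result: 59.99 (the value is rounded rather than truncated)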

Oracle Conversion Functions

TO_CHAR - Converts a date or number to a string.
    SELECT TO_CHAR(1506) FROM dual
    -- Result: the string value '1506'

    SELECT TO_CHAR(1507, '$9,999') FROM dual
    -- Result: the string value '$1,507'

    SELECT TO_CHAR(sysdate, 'dd/mm/yyyy') FROM dual
    -- Result: the current date as a string, e.g. '01/01/2015'

TO_DATE - Converts a string value to a date.
    SELECT TO_DATE('01-MAY-2015') FROM dual
    -- Result: the date value '01-MAY-2015'

    SELECT TO_DATE('01/05/2015', 'dd/mm/yyyy') FROM dual
    -- Result: the date value '01-MAY-2015'

TO_NUMBER - Converts a string value to a number.
    SELECT TO_NUMBER('9432') FROM dual
    -- Result: the numeric value 9432

    SELECT TO_NUMBER('$9,324', '$9,999') FROM dual
    -- Result: the numeric value 9324

Oracle NULL-Related Functions

NVL - Replaces NULL with the specified replacement value.
    SELECT NVL(NULL, 'Somevalue') FROM dual
    -- Result: Somevalue


Database Testing Checklist for test engineers


Some time back I decided to create a checklist for database testing, to ensure we cover every aspect of testing. With the help of my colleagues, Neha and Ankit, we prepared this list. Add one more
column to this list to record the result of each check.
Database Testing Checklist

1) Data Integrity

1. Is the complete data in the database stored in tables?
2. Is the data organized logically?
3. Is the data stored in the tables correct?
4. Is there any unnecessary data present?
5. Is the data present in the correct table?
6. Is the data present in the correct field within the table?
7. Is the stored data correct with respect to the data updated from the front end?
8. Are LTRIM and RTRIM performed on data before it is inserted into the database? (A trailing/leading-space check is sketched below.)
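A hedged sketch of how check 8 might be verified after the fact; the customer table and its columns are hypothetical:

    -- Rows whose value still carries leading or trailing spaces.
    SELECT customer_id, customer_name
    FROM   customer
    WHERE  customer_name <> LTRIM(RTRIM(customer_name))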

2) Field Validations

1. Is the 'Allow Null' condition removed at the database level for fields that are mandatory on the UI?
2. Is 'Null' as a literal value not allowed in the database?
3. Is a field non-mandatory on the UI when NULL values are allowed for that field in the database?
4. Is the field length specified on the UI the same as the field length specified in the table that stores the same element?
5. Is the length of each field of sufficient size?
6. Is the data type of each field as per the specifications?
7. Do all similar fields have the same names across tables?
8. Is there any computed field in the database?

3) Constraints

1. Are the required primary key constraints created on the tables?
2. Are the required foreign key constraints created on the tables?
3. Are valid references made for each foreign key?
4. Is the data type of the primary key the same as that of the corresponding foreign key in the two tables?
5. Is 'Allow Null' disallowed for the primary key?
6. Does the foreign key contain only non-NULL values?
(A data-dictionary query for checks 1 and 2 is sketched below.)
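One way to review checks 1 and 2 in Oracle is to query the data dictionary. A minimal sketch, assuming the table belongs to the current user and using the hypothetical table name ORDERS:

    -- List primary (P) and foreign (R) key constraints for one table.
    SELECT constraint_name, constraint_type, r_constraint_name
    FROM   user_constraints
    WHERE  table_name = 'ORDERS'
    AND    constraint_type IN ('P', 'R')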

4) Stored Procedures / Functions

1. Are proper coding conventions followed?
2. Is proper handling done for exceptions?
3. Are all conditions/loops covered by the input data?
4. Is TRIM applied when data is fetched from the database?
5. Does executing the stored procedure manually give the correct result?
6. Does executing the stored procedure manually update the table fields as expected?
7. Does execution of the stored procedure fire the required triggers?
8. Are all the stored procedures/functions used by the application (i.e. no unused stored procedures present)?
9. Do stored procedures have the 'Allow Null' condition checked at the database level?
10. Are all the stored procedures and functions executed successfully when the database is blank?

5) Triggers

1. Are proper coding conventions followed in the triggers?
2. Are the triggers executed for the respective DML transactions?
3. Does the trigger update the data correctly once executed?

6) Indexes

1. Are the required clustered indexes created on the tables?
2. Are the required non-clustered indexes created on the tables?

7) Transactions

1. Are the transactions performed correctly?
2. Is the data committed if the transaction is executed successfully?
3. Is the data rolled back if the transaction is not executed successfully?
4. Is the data rolled back if the transaction is not executed successfully and multiple databases are involved in the transaction?
5. Are all the transactions executed by using the TRANSACTION keyword? (A commit/rollback sketch follows this section.)
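A minimal PL/SQL sketch of the commit-on-success / rollback-on-failure pattern behind checks 2 and 3; the accounts table and its columns are hypothetical:

    BEGIN
      -- Hypothetical transfer between two account rows.
      UPDATE accounts SET balance = balance - 100 WHERE account_id = 1;
      UPDATE accounts SET balance = balance + 100 WHERE account_id = 2;
      COMMIT;        -- persist both changes only if both statements succeed
    EXCEPTION
      WHEN OTHERS THEN
        ROLLBACK;    -- undo partial work on any error
        RAISE;       -- re-raise so the failure is visible to the caller
    END;
    /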

8) Security

1. Is the data secured from unauthorized access?
2. Are different user roles created with different permissions?
3. Do all the users have the appropriate access to the database?

9) Performance

1. Does the database perform as expected (within the expected time) when a query is executed for a small number of records?
2. Does the database perform as expected (within the expected time) when a query is executed for a large number of records?
3. Does the database perform as expected (within the expected time) when multiple users access the same data?
4. Is performance profiling done?
5. Is performance benchmarking done?
6. Is a query execution plan created? (An EXPLAIN PLAN sketch follows this section.)
7. Is database testing done when the server is behind a load balancer?
8. Is the database normalized?
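For check 6, Oracle can display the optimizer's plan for a statement. A hedged sketch; the orders table and the predicate are hypothetical:

    EXPLAIN PLAN FOR
      SELECT * FROM orders WHERE order_date >= DATE '2015-01-01';

    -- Display the plan that was just generated.
    SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);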

10) Miscellaneous

1. Are log events added in the database for all login events?
2. Is it verified in SQL Profiler that queries are executed only through stored procedures (i.e. no direct query is visible in Profiler)?
3. Do the scheduled jobs execute on time?
4. Is a test-data development tool available?

11) SQL Injection

1. Is the query parameterized? (A bind-variable sketch follows this section.)
2. Does the URL contain any query details?
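For check 1, dynamic SQL in Oracle can be parameterized with bind variables instead of string concatenation. A minimal PL/SQL sketch; the users table and its user_name column are hypothetical:

    DECLARE
      v_name  VARCHAR2(30) := 'alice';
      v_count NUMBER;
    BEGIN
      -- :name is a bind variable; the input value is never concatenated into
      -- the SQL text, so it cannot alter the statement's structure.
      EXECUTE IMMEDIATE
        'SELECT COUNT(*) FROM users WHERE user_name = :name'
        INTO v_count
        USING v_name;
    END;
    /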

12) Backup and Recovery

1. Is a timely backup of the database taken?


ETL Test Scenarios and Test Cases


Based on my experience, I have prepared a broad set of test scenarios and test cases to validate the
ETL process. I will keep updating this content.

 Mapping doc validation
1. Verify whether the mapping doc provides the corresponding ETL information. A change log should be maintained in every mapping doc.
2. Define a default test strategy for cases where the mapping doc omits optional information (e.g. data type lengths).

 Structure validation
1. Validate the source and target table structures against the corresponding mapping doc.
2. The source data type and the target data type should be the same.
3. The lengths of the data types in both source and target should be equal.
4. Verify that data field types and formats are specified.
5. The source data type length should not be less than the target data type length.
6. Validate the names of the columns in the table against the mapping doc.
(A metadata-comparison query is sketched below.)
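A hedged sketch of a structure check in Oracle, comparing column definitions of a source and a target table through the data dictionary; SRC_CUSTOMER and TGT_CUSTOMER are hypothetical table names:

    -- Columns present in both tables whose type or length differs.
    SELECT s.column_name,
           s.data_type   AS src_type, t.data_type   AS tgt_type,
           s.data_length AS src_len,  t.data_length AS tgt_len
    FROM   all_tab_columns s
    JOIN   all_tab_columns t ON t.column_name = s.column_name
    WHERE  s.table_name = 'SRC_CUSTOMER'
    AND    t.table_name = 'TGT_CUSTOMER'
    AND   (s.data_type <> t.data_type OR s.data_length <> t.data_length)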

 Constraint validation
Ensure that the constraints are defined for each specific table as expected.

 Data consistency issues
1. The data type and length for a particular attribute may vary across files or tables even though the semantic definition is the same.
Example: an account number may be defined as NUMBER(9) in one field or table and as VARCHAR2(11) in another table.
2. Misuse of integrity constraints: when referential integrity constraints are misused, foreign key values may be left "dangling" or inadvertently deleted.
Example: an account record is missing, but its dependent records are not deleted.
 Data completeness issues
Ensure that all expected data is loaded into the target table.
1. Compare record counts between source and target, and check for any rejected records. (A count-comparison query is sketched below.)
2. Check that data is not truncated in the columns of the target table.
3. Check boundary values (e.g. only data for years >= 2008 has to be loaded into the target).
4. Compare unique values of key fields between the source data and the data loaded into the warehouse. This is a valuable technique that points out a variety of possible data errors without doing a full validation on all fields.
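A minimal sketch of check 1, assuming hypothetical source and target tables SRC_ORDERS and TGT_ORDERS:

    -- Row counts side by side; a mismatch signals rejected or missing records.
    SELECT (SELECT COUNT(*) FROM src_orders) AS src_count,
           (SELECT COUNT(*) FROM tgt_orders) AS tgt_count
    FROM   dual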

 Data correctness issues
1. Data that is misspelled or inaccurately recorded.
2. Null, non-unique, or out-of-range data may be stored when integrity constraints are disabled.
Example: the primary key constraint is disabled during an import; data with null unique identifiers is then entered into the existing data.

 Data transformation
1. Create a spreadsheet of input-data scenarios and expected results, and validate these with the business customer. This is an excellent requirements-elicitation step during design and can also be used as part of testing.
2. Create test data that includes all scenarios. Utilize an ETL developer to automate the process of populating data sets from the scenario spreadsheet, to allow for versatility and mobility, because scenarios are likely to change.
3. Utilize data profiling results to compare the range and distribution of values in each field between the target and the source data.
4. Validate accurate processing of ETL-generated fields, for example surrogate keys.
5. Validate that the data types within the warehouse are the same as specified in the data model or design.
6. Create data scenarios between tables that test referential integrity.
7. Validate parent-to-child relationships in the data. Create data scenarios that test the handling of orphaned child records.

 Data quality
1. Number check: if the source uses a numbering format such as xx_30 but the target stores only 30, the value has to be loaded without the prefix (xx_); this needs to be validated.
2. Date check: dates have to follow the agreed date format, and it should be the same across all records. Standard format: yyyy-mm-dd, etc.
3. Precision check: the precision value should display as expected in the target table.
Example: the source holds 19.123456, but in the target it should display as 19.123 or as a rounded-off value.
4. Data check: based on business logic, records that do not meet certain criteria should be filtered out.
Example: only records whose date_sid >= 2008 and GLAccount != 'CM001' should be loaded into the target table.
5. Null check: a few columns should display "Null" based on the business requirement.
Example: the Termination Date column should display null unless the "Active status" column is "T" or "Deceased".
Note: data cleanness rules will be decided during the design phase only.

 Null validation
Verify that no null values are present in columns for which "Not Null" is specified.

 Duplicate check
1. Validate that the unique key, the primary key and any other columns that should be unique as per the business requirements do not contain duplicate rows.
2. Check whether duplicate values exist in any column that is built by extracting multiple columns from the source and combining them into one column.
3. Sometimes, as per the client requirements, we need to ensure that there are no duplicates in a combination of multiple columns within the target only. (A duplicate-check query is sketched below.)
Example: one policy holder can take multiple policies and have multiple claims. In this case we need to verify the combination of CLAIM_NO, CLAIMANT_NO, COVEREGE_NAME, EXPOSURE_TYPE, EXPOSURE_OPEN_DATE, EXPOSURE_CLOSED_DATE, EXPOSURE_STATUS, PAYMENT.
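A hedged sketch of check 3, using a hypothetical target table TGT_CLAIM and a two-column combination for brevity:

    -- Combinations that occur more than once are duplicates.
    SELECT claim_no, claimant_no, COUNT(*) AS occurrences
    FROM   tgt_claim
    GROUP  BY claim_no, claimant_no
    HAVING COUNT(*) > 1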

 Date validation
Date values are used in many areas of ETL development, for example:
1. To know the row creation date, e.g. CRT_TS.
2. To identify active records from the ETL development perspective, e.g. VLD_FROM, VLD_TO.
3. To identify active records from the business requirements perspective, e.g. CLM_EFCTV_T_TS, CLM_EFCTV_FROM_TS.
4. Sometimes the updates and inserts are generated based on the date values.
Possible test scenarios to validate the date values (see the query sketched below):
a. From_Date should not be greater than To_Date.
b. The format of the date values should be correct.
c. Date values should not contain junk values or null values.
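A minimal sketch of scenario (a), assuming a hypothetical target table TGT_POLICY that carries the VLD_FROM and VLD_TO columns mentioned above:

    -- Rows whose validity window is inverted or missing a boundary.
    SELECT *
    FROM   tgt_policy
    WHERE  vld_from > vld_to
    OR     vld_from IS NULL
    OR     vld_to   IS NULL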

 Complete data validation (using MINUS and INTERSECT)
1. To validate the complete data set between the source and target tables, a MINUS query is the best solution.
2. We need to run source MINUS target and target MINUS source.
3. If a MINUS query returns any rows, those should be considered mismatching rows.
4. We also need to find the matching rows between source and target using an INTERSECT statement.
5. The count returned by INTERSECT should match the individual counts of the source and target tables.
6. If the MINUS queries return 0 rows and the INTERSECT count is less than the source count or the target table count, then we can conclude that duplicate rows exist. (See the sketch below.)
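A hedged sketch of these checks, assuming hypothetical SRC_ORDERS and TGT_ORDERS tables with identical column lists:

    -- Rows present in the source but missing or different in the target.
    SELECT * FROM src_orders
    MINUS
    SELECT * FROM tgt_orders;

    -- Rows present in the target but not in the source.
    SELECT * FROM tgt_orders
    MINUS
    SELECT * FROM src_orders;

    -- Count of rows common to both; compare it with each table's own count.
    SELECT COUNT(*)
    FROM  (SELECT * FROM src_orders
           INTERSECT
           SELECT * FROM tgt_orders);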

 Some useful test scenarios
1. Verify that the extraction process did not extract duplicate data from the source. (This usually matters in repeatable processes where, at point zero, we need to extract all data from the source file, but during subsequent intervals we only need to capture the modified and new rows.)
2. The QA team should maintain a set of SQL statements that are automatically run at this stage to validate that no duplicate data has been extracted from the source systems.

 Data cleanness
Unnecessary columns should be deleted before loading into the staging area.
Example 1: suppose the telephone number and the STD code are in different columns and the requirement says they should be in one column; with the help of an expression transformation we concatenate the values into one column.
Example 2: if a name column carries extra spaces, the spaces have to be trimmed before loading into the staging area; with the help of an expression transformation the spaces are trimmed. (Both examples are sketched in plain SQL below.)
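Expression transformations are tool-specific, but the equivalent logic in plain SQL looks like the following; the src_contact table and its columns are hypothetical:

    -- Example 1: concatenate STD code and telephone number into one column.
    -- Example 2: trim stray spaces from the name column.
    SELECT std_code || telephone_no    AS full_phone,
           LTRIM(RTRIM(customer_name)) AS customer_name
    FROM   src_contact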

