
1. Source layer. A data warehouse system uses heterogeneous sources of data. That data is originally stored in corporate relational databases or legacy databases, or it may come from information systems outside the corporate walls.

2. Data staging. The data stored in the sources should be extracted, cleansed to remove inconsistencies and fill gaps, and integrated to merge heterogeneous sources into one common schema. The so-called Extraction, Transformation, and Loading (ETL) tools can merge heterogeneous schemata, extract, transform, cleanse, validate, filter, and load source data into a data warehouse (Jarke et al., 2000). Technologically speaking, this stage deals with problems that are typical for distributed information systems, such as inconsistent data management and incompatible data structures (Zhuge et al., 1996). Section 1.4 deals with a few points that are relevant to data staging.

3. Data warehouse layer. Information is stored in one logically centralized single repository: a data warehouse. The data warehouse can be directly accessed, but it can also be used as a source for creating data marts, which partially replicate data warehouse contents and are designed for specific enterprise departments. Meta-data repositories (section 1.6) store information on sources, access procedures, data staging, users, data mart schemata, and so on.

4. Analysis. In this layer, integrated data is efficiently and flexibly accessed to issue reports, dynamically analyze information, and simulate hypothetical business scenarios. Technologically speaking, it should feature aggregate data navigators, complex query optimizers, and user-friendly GUIs. Section 1.7 deals with different types of decision-making support analyses.

What is ETL?
ETL is an abbreviation of Extract, Transform and Load. In this process, an ETL tool extracts the data from different RDBMS source systems, then transforms the data by applying calculations, concatenations, etc., and then loads the data into the Data Warehouse system.

It's tempting to think that creating a Data warehouse is simply a matter of extracting data from multiple sources and loading it into the database of a Data warehouse. This is far from the truth: a complex ETL process is required. The ETL process requires active inputs from various stakeholders, including developers, analysts, testers, and top executives, and is technically challenging.

In order to maintain its value as a tool for decision-makers, a Data warehouse system needs to change with the business. ETL is a recurring activity (daily, weekly, monthly) of a Data warehouse system and needs to be agile, automated, and well documented.

In this tutorial, you will learn-

 What is ETL?
 Why do you need ETL?
 ETL Process in Data Warehouses
 Step 1) Extraction
 Step 2) Transformation
 Step 3) Loading
 ETL tools
 Best practices ETL process

Why do you need ETL?


There are many reasons for adopting ETL in the organization:

 It helps companies to analyze their business data for making critical business decisions.
 Transactional databases cannot answer complex business questions that can be answered by ETL.
 A Data Warehouse provides a common data repository.
 ETL provides a method of moving the data from various sources into a data warehouse.
 As data sources change, the Data Warehouse is updated accordingly.
 A well-designed and documented ETL system is almost essential to the success of a Data Warehouse project.
 ETL allows verification of data transformation, aggregation, and calculation rules.
 The ETL process allows sample data comparison between the source and the target system.
 The ETL process can perform complex transformations and requires a staging area to store the data.
 ETL helps to migrate data into a Data Warehouse, converting the various formats and types to adhere to one consistent system.
 ETL is a predefined process for accessing and manipulating source data into the target database.
 ETL offers deep historical context for the business.
 It helps to improve productivity because it codifies and reuses processes without a need for technical skills.

ETL Process in Data Warehouses


ETL is a 3-step process
Step 1) Extraction
In this step, data is extracted from the source system into the staging area. Transformations, if any, are done in the staging area so that the performance of the source system is not degraded. Also, if corrupted data is copied directly from the source into the Data warehouse database, rollback will be a challenge. The staging area gives an opportunity to validate extracted data before it moves into the Data warehouse.

A Data warehouse needs to integrate systems that have different DBMSs, hardware, operating systems, and communication protocols. Sources could include legacy applications like mainframes, customized applications, point-of-contact devices like ATMs and call switches, text files, spreadsheets, ERP systems, and data from vendors and partners, among others.

Hence one needs a logical data map before data is extracted and loaded physically. This data map describes the
relationship between sources and target data.
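For example, one row of a (purely illustrative) logical data map might record that a hypothetical source column CRM.CUSTOMER.CUST_NAME feeds the target column DW.DIM_CUSTOMER.CUSTOMER_NAME, with the transformation rule "trim whitespace and convert to upper case"; the actual columns and rules depend entirely on your own systems.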

Three Data Extraction methods:

1. Full Extraction
2. Partial Extraction- without update notification.
3. Partial Extraction- with update notification

Irrespective of the method used, extraction should not affect the performance and response time of the source systems. These source systems are live production databases. Any slowdown or locking could affect the company's bottom line.

Some validations are done during Extraction (a sample check is sketched after this list):

 Reconcile records with the source data
 Make sure that no spam/unwanted data is loaded
 Data type check
 Remove all types of duplicate/fragmented data
 Check whether all the keys are in place or not
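As one concrete example of such a check, a duplicate-key query against a staging table could look like the sketch below; the table and column names (stg_orders, order_id) are invented for illustration and are not part of any specific tool.

-- List business keys that appear more than once in the staging table
SELECT order_id, COUNT(*) AS occurrences
FROM stg_orders
GROUP BY order_id
HAVING COUNT(*) > 1;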

Step 2) Transformation
Data extracted from the source server is raw and not usable in its original form. Therefore it needs to be cleansed, mapped, and transformed. In fact, this is the key step where the ETL process adds value and changes data so that insightful BI reports can be generated.

In this step, you apply a set of functions to the extracted data. Data that does not require any transformation is called direct move or pass-through data.

In the transformation step, you can perform customized operations on the data. For instance, the user may want a sum-of-sales revenue figure that is not in the database, or the first name and the last name may sit in different columns of a table; it is possible to concatenate them before loading, as in the sketch below.
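A minimal SQL sketch of these two transformations, assuming hypothetical staging tables stg_customer and stg_sales (all names and columns are invented for illustration):

-- Derive a full name from separate first/last name columns
SELECT customer_id,
       first_name || ' ' || last_name AS full_name
FROM stg_customer;

-- Derive sum-of-sales revenue, which is not stored as such in the source
SELECT customer_id,
       SUM(quantity * unit_price) AS sales_revenue
FROM stg_sales
GROUP BY customer_id;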

Following are typical data integrity problems:

1. Different spellings of the same person, like Jon, John, etc.
2. Multiple ways to denote a company name, like Google, Google Inc.
3. Use of different names, like Cleaveland, Cleveland.
4. Different account numbers may be generated by various applications for the same customer.
5. Required fields are sometimes left blank in the source data.
6. Invalid products collected at the POS, as manual entry can lead to mistakes.

Validations done during this stage (a cleansing sketch follows this list):

 Filtering – select only certain columns to load
 Using rules and lookup tables for data standardization
 Character set conversion and encoding handling
 Conversion of units of measurement, like date/time conversion, currency conversions, numerical conversions, etc.
 Data threshold validation checks. For example, age cannot be more than two digits.
 Data flow validation from the staging area to the intermediate tables.
 Required fields should not be left blank.
 Cleaning (for example, mapping NULL to 0, or Gender Male to "M" and Female to "F", etc.)
 Splitting a column into multiple columns and merging multiple columns into a single column.
 Transposing rows and columns.
 Using lookups to merge data.
 Applying any complex data validation (e.g., if the first two columns in a row are empty then automatically reject the row from processing).
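Several of these rules can be expressed directly in SQL. The sketch below (the stg_customer table and its columns are made up for illustration) maps NULL amounts to 0, standardizes the gender code, and rejects rows whose first two columns are both empty:

SELECT customer_id,
       COALESCE(order_amount, 0) AS order_amount,  -- map NULL to 0
       CASE gender
            WHEN 'Male'   THEN 'M'
            WHEN 'Female' THEN 'F'
       END AS gender_code,                         -- standardize gender values
       first_name || ' ' || last_name AS full_name -- merge two columns into one
FROM stg_customer
WHERE first_name IS NOT NULL
   OR last_name IS NOT NULL;                       -- reject rows with both columns empty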

Step 3) Loading
Loading data into the target data warehouse database is the last step of the ETL process. In a typical Data warehouse, a huge volume of data needs to be loaded in a relatively short period (nights). Hence, the load process should be optimized for performance.

In case of load failure, recovery mechanisms should be configured to restart from the point of failure without loss of data integrity. Data Warehouse admins need to monitor, resume, or cancel loads as per prevailing server performance.

Types of Loading:

 Initial Load — populating all the Data Warehouse tables.
 Incremental Load — applying ongoing changes periodically, as needed.
 Full Refresh — erasing the contents of one or more tables and reloading them with fresh data.

Load verification (a row-count reconciliation sketch follows this list):

 Ensure that the key field data is neither missing nor null.
 Test modeling views based on the target tables.
 Check combined values and calculated measures.
 Data checks in the dimension table as well as the history table.
 Check the BI reports on the loaded fact and dimension tables.
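For instance, a simple reconciliation of row counts between a staging table and a target fact table might be sketched as follows (the table names stg_sales and fact_sales are illustrative only):

-- Compare the number of rows staged with the number of rows loaded
SELECT (SELECT COUNT(*) FROM stg_sales)  AS staged_rows,
       (SELECT COUNT(*) FROM fact_sales) AS loaded_rows
FROM dual;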

ETL tools
There are many Data Warehousing tools available in the market. Here are some of the most prominent ones:

1. MarkLogic:

MarkLogic is a data warehousing solution which makes data integration easier and faster using an array of enterprise
features. It can query different types of data like documents, relationships, and metadata.
http://developer.marklogic.com/products

2. Oracle:

Oracle is the industry-leading database. It offers a wide range of choice of Data Warehouse solutions for both on-premises
and in the cloud. It helps to optimize customer experiences by increasing operational efficiency.

https://www.oracle.com/index.html

3. Amazon RedShift:

Amazon Redshift is a data warehouse tool. It is a simple and cost-effective tool to analyze all types of data using standard
SQL and existing BI tools. It also allows running complex queries against petabytes of structured data.

https://aws.amazon.com/redshift/?nc2=h_m1


Best practices ETL process


Never try to cleanse all the data:

Every organization would like to have all the data clean, but most of them are not ready to pay for it or to wait for it. To clean it all would simply take too long, so it is better not to try to cleanse all the data.

Never skip cleansing entirely:

Always plan to clean something, because the biggest reason for building the Data Warehouse is to offer cleaner and more reliable data.

Determine the cost of cleansing the data:

Before cleansing all the dirty data, it is important for you to determine the cleansing cost for every dirty data element.

To speed up query processing, have auxiliary views and indexes:

To reduce storage costs, store summarized data on tapes or other low-cost storage. A trade-off between the volume of data to be stored and how much of its detail is actually used is also required; trade off the level of granularity of the data to decrease storage costs.

Summary:

 ETL is an abbreviation of Extract, Transform and Load.
 ETL provides a method of moving the data from various sources into a data warehouse.
 In the first step, extraction, data is extracted from the source system into the staging area.
 In the transformation step, the data extracted from the source is cleansed and transformed.
 Loading data into the target data warehouse is the last step of the ETL process.

The ODS is the part of the data warehouse architecture where you collect and integrate the data and ensure its completeness and accuracy; it provides data that is nearly current, ahead of the warehouse. It is like "instant mix food" for hungry people: the ODS provides data to the impatient business analyst for analysis.

Operational Data Store (ODS)
Definition - What does Operational Data Store (ODS) mean?
An operational data store (ODS) is a type of database that collects data from multiple sources for processing, after which it sends
the data to operational systems and data warehouses.
It provides a central interface or platform for all operational data used by enterprise systems and applications.

Techopedia explains Operational Data Store (ODS)


An ODS is used to store short term data or data currently in use by operational systems or applications, prior to storage in a data
warehouse or data repository. Thus, it serves as an intermediate database.
An ODS helps clean and organize data and ensure that it meets business and regulatory requirements. It only supports low level
data and allows for the application of limited queries.

Logical Extraction Methods


There are two kinds of logical extraction:

 Full Extraction
 Incremental Extraction

Full Extraction

The data is extracted completely from the source system. Since this extraction reflects all the data currently available on the source
system, there’s no need to keep track of changes to the data source since the last successful extraction. The source data will be
provided as-is and no additional logical information (for example, timestamps) is necessary on the source site. An example for a full
extraction may be an export file of a distinct table or a remote SQL statement scanning the complete source table.
Incremental Extraction

At a specific point in time, only the data that has changed since a well-defined event back in history will be extracted. This event may be the last time of extraction or a more complex business event like the last booking day of a fiscal period. To identify this delta change, there must be a way to identify all the information that has changed since this specific time event. This information can be provided either by the source data itself, such as an application column reflecting the last-changed timestamp, or by a change table where an appropriate additional mechanism keeps track of the changes besides the originating transactions. In most cases, using the latter method means adding extraction logic to the source system. A timestamp-driven extraction is sketched below.
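As a sketch only (the orders table and its last_changed_ts column are assumptions made for illustration, not part of any particular source system), a timestamp-driven incremental extraction could be written as:

-- Extract only the rows changed since the previous successful extraction;
-- :last_extraction_time holds the stored timestamp of the previous run
SELECT *
FROM orders
WHERE last_changed_ts > :last_extraction_time;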

Full Extraction
In this method, data is completely extracted from the source system. The source data will be provided as-is and no additional logical information is necessary on the source system. Since it is a complete extraction, there is no need to track the source system for changes.

For example, exporting a complete table in the form of a flat file.

Incremental Extraction
In incremental extraction, the changes in the source data need to be tracked since the last successful extraction. Only these changes in the data are extracted and then loaded. Identifying the last changed data is itself a complex process and may involve additional logic on the source system.

1. Full load: an entire data dump that takes place the first time a data source is loaded into the warehouse.

2. Incremental load: the delta between target and source data is loaded at regular intervals. The last extract date is stored so that only records added after this date are loaded.
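The generic SQL statements that follow illustrate common loading patterns: INSERT ... SELECT copies qualifying rows from one table into another (for an incremental load, the WHERE condition would typically filter on the last extract date), CREATE TABLE ... AS SELECT builds a new target table from a query, and single-row INSERT statements add individual records.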


INSERT INTO table2
SELECT * FROM table1
WHERE condition;

INSERT INTO table2 (column1, column2, column3, ...)


SELECT column1, column2, column3, ...
FROM table1
WHERE condition;

CREATE TABLE new_table_name AS


SELECT column1, column2,...
FROM existing_table_name
WHERE ....;

INSERT INTO suppliers (supplier_id, supplier_name) VALUES (1000, 'IBM');

INSERT INTO suppliers (supplier_id, supplier_name) VALUES (2000, 'Microsoft');

INSERT INTO suppliers (supplier_id, supplier_name) VALUES (3000, 'Google');

Example - Insert into Multiple Tables


You can also use the INSERT ALL statement to insert multiple rows into more than one table in one command.
For example, if you wanted to insert into both the suppliers and customers table, you could run the following SQL statement:

INSERT ALL
INTO suppliers (supplier_id, supplier_name) VALUES (1000, 'IBM')
INTO suppliers (supplier_id, supplier_name) VALUES (2000, 'Microsoft')
INTO customers (customer_id, customer_name, city) VALUES (999999, 'Anderson Construction', 'New York')
SELECT * FROM dual;

This example will insert 2 records into the suppliers table and 1 record into the customers table. As a further example, the following INSERT ALL statement inserts 8 rows into a single table T1:

insert all
  into T1 (Id) values (1)
  into T1 (Id) values (1)
  into T1 (Id) values (0)
  into T1 (Id) values (1)
  into T1 (Id) values (1)
  into T1 (Id) values (1)
  into T1 (Id) values (0)
  into T1 (Id) values (1)
SELECT * FROM dual;




Linux Cat Command Usage with Examples


The cat command (short for "concatenate") is one of the most frequently used commands in Linux/Unix and Apple Mac OS X operating systems. The cat command allows us to create single or multiple files, view the contents of a file, concatenate files, and redirect output to the terminal or to files. It is a standard Unix program used to concatenate and display files. The cat command displays file contents on the screen; it concatenates FILE(s), or standard input, to standard output. With no FILE, or when FILE is -, it reads standard input. You can also use the cat command to quickly create a file. The cat command can read and write data from standard input and output devices. It has three main functions related to manipulating text files: creating them, displaying them, and combining them.

The cat command is used for:

Display text file on screen

Create a new text file

Read text file

Modifying file

File concatenation

The basic syntax of cat command is as follows:

$ cat filename

OR
$ cat > filename

OR

$ cat [options] filename

Cat Command Examples

1) To view a file using cat command, you can use the following command.

$ cat filename

2) You can create a new file with the name file1.txt using the following cat command and you can type the text you want to insert in the file.
Make sure you type ‘Ctrl-d’ at the end to save the file.

$ cat > file1.txt

This is my new file in Linux.

The cat command is very useful.

Thanks

Now you can display the contents of the file file1.txt by using the following command.

$ cat file1.txt

This is my new file in Linux.

The cat command is very useful.

Thanks

3) Suppose you have created two sample files and you need to concatenate them. First view their contents:

$ cat sample1.txt

This is my first sample text file

$ cat sample2.txt

This is my second sample text file

Now you can concatenate these two files and can save to another file named sample3.txt. For this, use the below given command.

$ cat sample1.txt sample2.txt > sample3.txt

$ cat sample3.txt

This is my first sample text file

This is my second sample text file

4) To display contents of all txt files, use the following command.

$ cat *.txt
This is my first sample text file

This is my second sample text file

5) To display the contents of a file with line number, use the following command.

$ cat -n file1.txt

1 This is my new file in Linux.

2 The cat command is very useful.

3 Thanks

6) To copy the content of one file to another file, you can use the greater-than '>' symbol with the cat command.

$ cat file2.txt > file1.txt

7) To append the contents of one file to another, you can use the double greater than ‘>>’ symbol with the cat command.

$ cat sample1.txt >> sample2.txt

5. Specifying the search string as a regular expression pattern.

grep "^[0-9].*" file.txt

This will search for the lines which start with a number. Regular expressions are a huge topic and are not covered in full here; this example just shows that grep accepts regular expression patterns.

6. Checking for whole words in a file.

By default, grep matches the given string/pattern even if it is found as a substring in a file. The -w option makes grep match only whole words.

grep -w "world" file.txt

7. Displaying the lines before the match.

Sometimes, if you are searching for an error in a log file, it is good to see the lines around the error line to understand the cause of the error.

grep -B 2 "Error" file.txt

This will print the matched lines along with the two lines before each match.

8. Displaying the lines after the match.

grep -A 3 "Error" file.txt


This will display the matched lines along with the three lines after the matched lines.

9. Displaying the lines around the match

grep -C 5 "Error" file.txt

This will display the matched lines and also five lines before and after the matched lines.

10. Searching for a string in all files recursively

You can search for a string in all the files under the current directory and its sub-directories with the help of the -r option.

grep -r "string" *

11. Inverting the pattern match

You can display the lines that do not match the specified search string pattern using the -v option.

grep -v "string" file.txt

12. Displaying the non-empty lines

You can remove the blank lines using the grep command.

grep -v "^$" file.txt

13. Displaying the count of matching lines.

We can find the number of lines that match the given string/pattern. Note that -c counts matching lines, not the total number of matches within them.

grep -c "string" file.txt

14. Displaying the file names that match the pattern.

We can display just the files that contain the given string/pattern.

grep -l "string" *

15. Display the file names that do not contain the pattern.

We can display the files which do not contain the matched string/pattern.

grep -L "string" *

16. Displaying only the matched pattern.

By default, grep displays the entire line which has the matched string. We can make grep display only the matched string by using the -o option.

grep -o "string" file.txt

17. Displaying the line numbers.

We can make the grep command display the line number of each line that contains the matched string using the -n option.

grep -n "string" file.txt

18. Displaying the position of the matched string in the line

The -b option makes the grep command display the byte offset of the matched string in the file.

grep -o -b "string" file.txt

19. Matching the lines that start with a string

The ^ regular expression pattern specifies the start of a line. This can be used in grep to match the lines which start with the given string or pattern.

grep "^start" file.txt

20. Matching the lines that end with a string

The $ regular expression pattern specifies the end of a line. This can be used in grep to match the lines which end with the given string or pattern.

grep "end$" file.txt

Consider the below text file as an input.

>cat file.txt

unix is great os. unix is opensource. unix is free os.

learn operating system.

unixlinux which one you choose.

Sed Command Examples

1. Replacing or substituting string

Sed command is mostly used to replace the text in a file. The below simple sed command replaces the word "unix" with "linux" in the file.

>sed 's/unix/linux/' file.txt

linux is great os. unix is opensource. unix is free os.

learn operating system.

linuxlinux which one you choose.

Here the "s" specifies the substitution operation. The "/" are delimiters. The "unix" is the search pattern and the "linux" is the replacement string.

By default, the sed command replaces the first occurrence of the pattern in each line and it won't replace the second, third...occurrence in the line.

2. Replacing the nth occurrence of a pattern in a line.

Use the /1, /2 etc flags to replace the first, second occurrence of a pattern in a line. The below command replaces the second occurrence of the word
"unix" with "linux" in a line.

>sed 's/unix/linux/2' file.txt

unix is great os. linux is opensource. unix is free os.

learn operating system.

unixlinux which one you choose.


3. Replacing all the occurrence of the pattern in a line.

The substitute flag /g (global replacement) specifies the sed command to replace all the occurrences of the string in the line.

>sed 's/unix/linux/g' file.txt

linux is great os. linux is opensource. linux is free os.

learn operating system.

linuxlinux which one you choose.

4. Replacing from nth occurrence to all occurrences in a line.

Use the combination of /1, /2 etc and /g to replace all the patterns from the nth occurrence of a pattern in a line. The following sed command replaces
the third, fourth, fifth... "unix" word with "linux" word in a line.

>sed 's/unix/linux/3g' file.txt

unix is great os. unix is opensource. linux is free os.

learn operating system.

unixlinux which one you choose.

5. Changing the slash (/) delimiter

You can use any delimiter other than the slash. As an example if you want to change the web url to another url as

>sed 's/http:\/\//www/' file.txt

In this case the url contains the delimiter character which we used. In that case you have to escape the slash with a backslash character, otherwise the

Using too many backslashes makes the sed command look awkward. In this case we can change the delimiter to another character as shown in the
below example.

>sed 's_http://_www_' file.txt

>sed 's|http://|www|' file.txt

6. Using & as the matched string

There might be cases where you want to search for the pattern and replace that pattern by adding some extra characters to it. In such cases & comes
in handy. The & represents the matched string.

>sed 's/unix/{&}/' file.txt

{unix} is great os. unix is opensource. unix is free os.

learn operating system.


{unix}linux which one you choose.

>sed 's/unix/{&&}/' file.txt

{unixunix} is great os. unix is opensource. unix is free os.

learn operating system.

{unixunix}linux which one you choose.

7. Using \1,\2 and so on to \9

The first pair of parentheses specified in the pattern represents \1, the second represents \2, and so on. The \1, \2 references can be used in the replacement string to make changes to the source string. As an example, if you want to replace the word "unix" in a line with the word doubled, like "unixunix", use the sed command as below.

>sed 's/\(unix\)/\1\1/' file.txt

unixunix is great os. unix is opensource. unix is free os.

learn operating system.

unixunixlinux which one you choose.

The parentheses need to be escaped with the backslash character. Another example: if you want to switch the words "unixlinux" to "linuxunix", the sed command is

>sed 's/\(unix\)\(linux\)/\2\1/' file.txt

unix is great os. unix is opensource. unix is free os.

learn operating system.

linuxunix which one you choose.

Another example is switching the first three characters in a line

>sed 's/^\(.\)\(.\)\(.\)/\3\2\1/' file.txt

inux is great os. unix is opensource. unix is free os.

aelrn operating system.

inuxlinux which one you choose.

8. Duplicating the replaced line with /p flag

The /p print flag prints the replaced line twice on the terminal. If a line does not have the search pattern and is not replaced, then the /p prints that line
only once.

>sed 's/unix/linux/p' file.txt


linux is great os. unix is opensource. unix is free os.

linux is great os. unix is opensource. unix is free os.

learn operating system.

linuxlinux which one you choose.

linuxlinux which one you choose.

9. Printing only the replaced lines

Use the -n option along with the /p print flag to display only the replaced lines. Here the -n option suppresses the duplicate rows generated by the /p
flag and prints the replaced lines only one time.

>sed -n 's/unix/linux/p' file.txt

linux is great os. unix is opensource. unix is free os.

linuxlinux which one you choose.

If you use -n alone without /p, then the sed does not print anything.

10. Running multiple sed commands.

You can run multiple sed commands by piping the output of one sed command as input to another sed command.

>sed 's/unix/linux/' file.txt| sed 's/os/system/'

linux is great system. unix is opensource. unix is free os.

learn operating system.

linuxlinux which one you chosysteme.

Sed provides -e option to run multiple sed commands in a single sed command. The above output can be achieved in a single sed command as shown
below.

>sed -e 's/unix/linux/' -e 's/os/system/' file.txt

linux is great system. unix is opensource. unix is free os.

learn operating system.

linuxlinux which one you chosysteme.

11. Replacing string on a specific line number.

You can restrict the sed command to replace the string on a specific line number. An example is

>sed '3 s/unix/linux/' file.txt

unix is great os. unix is opensource. unix is free os.


learn operating system.

linuxlinux which one you choose.

The above sed command replaces the string only on the third line.

12. Replacing string on a range of lines.

You can specify a range of line numbers to the sed command for replacing a string.

>sed '1,3 s/unix/linux/' file.txt

linux is great os. unix is opensource. unix is free os.

learn operating system.

linuxlinux which one you choose.

Here the sed command replaces the lines with range from 1 to 3. Another example is

>sed '2,$ s/unix/linux/' file.txt

linux is great os. unix is opensource. unix is free os.

learn operating system.

linuxlinux which one you choose.

Here $ indicates the last line in the file. So the sed command replaces the text from second line to last line in the file.

13. Replace on lines which match a pattern.

You can specify a pattern for the sed command to match in a line. If the pattern match occurs, the sed command looks for the string to be replaced, and if it finds it, the sed command replaces the string.

>sed '/linux/ s/unix/centos/' file.txt

unix is great os. unix is opensource. unix is free os.

learn operating system.

centoslinux which one you choose.

Here the sed command first looks for the lines which has the pattern "linux" and then replaces the word "unix" with "centos".

14. Deleting lines.

You can delete the lines of a file by specifying a line number or a range of line numbers.

>sed '2 d' file.txt

>sed '5,$ d' file.txt


1. Delete first line or header line

The d option in sed command is used to delete a line. The syntax for deleting a line is:

> sed 'Nd' file

Here N indicates Nth line in a file. In the following example, the sed command removes the first line in a file.

> sed '1d' file

unix

fedora

debian

ubuntu

2. Delete last line or footer line or trailer line

The following sed command is used to remove the footer line in a file. The $ indicates the last line of a file.

> sed '$d' file

linux

unix

fedora

debian

3. Delete particular line

This is similar to the first example. The below sed command removes the second line in a file.

> sed '2d' file

linux

fedora

debian

ubuntu

4. Delete range of lines

The sed command can be used to delete a range of lines. The syntax is shown below:
> sed 'm,nd' file

Here m and n are min and max line numbers. The sed command removes the lines from m to n in the file. The following sed command deletes the lines ranging from
2 to 4:

> sed '2,4d' file

linux

ubuntu

5. Delete lines other than the first line or header line

Use the negation (!) operator with d option in sed command. The following sed command removes all the lines except the header line.

> sed '1!d' file

linux

6. Delete lines other than last line or footer line

> sed '$!d' file

ubuntu

7. Delete lines other than the specified range

> sed '2,4!d' file

unix

fedora

debian

Here the sed command removes lines other than 2nd, 3rd and 4th.

8. Delete first and last line

You can specify the list of lines you want to remove in sed command with semicolon as a delimiter.

> sed '1d;$d' file

unix

fedora
debian

9. Delete empty lines or blank lines

> sed '/^$/d' file

The ^$ pattern tells the sed command to delete empty lines. However, this sed command does not remove lines that contain only spaces.

Sed Command to Delete Lines - Based on Pattern Match

In the following examples, the sed command deletes the lines in file which match the given pattern.

10. Delete lines that begin with specified character

> sed '/^u/d' file

linux

fedora

debian

^ is to specify the starting of the line. Above sed command removes all the lines that start with character 'u'.

11. Delete lines that end with specified character

> sed '/x$/d' file

fedora

debian

ubuntu

$ is to indicate the end of the line. The above command deletes all the lines that end with character 'x'.

12. Delete lines which are in upper case or capital letters

> sed '/^[A-Z]*$/d' file

13. Delete lines that contain a pattern

> sed '/debian/d' file

linux
unix

fedora

ubuntu

14. Delete lines starting from a pattern till the last line

> sed '/fedora/,$d' file

linux

unix

Here the sed command removes the line that matches the pattern fedora and also deletes all the lines to the end of the file which appear next to this matching line.

15. Delete last line only if it contains the pattern

> sed '${/ubuntu/d;}' file

linux

unix

fedora

debian

Here $ indicates the last line. If you want to delete Nth line only if it contains a pattern, then in place of $ place the line number.

Note: In all the above examples, the sed command prints the contents of the file on the unix or linux terminal with the lines removed only in the output; the sed command does not remove the lines from the source file. To remove the lines from the source file itself, use the -i option with the sed command.

> sed -i '1d' file

If you don't wish to delete the lines from the original source file, you can redirect the output of the sed command to another file.

sed '1d' file > newfile


What is a fact? What are the types of facts?

It is a central component of a multi-dimensional model which contains the measures to be analysed. Facts are related to
dimensions.

Types of facts are

 Additive Facts
 Semi-additive Facts

 Non-additive Facts

Data mining is an analytical process designed to explore data. There are four main types of data mining tasks:

 Regression (predictive)
 Association Rule Discovery (descriptive)
 Classification (predictive)
 Clustering (descriptive)
Business Intelligence is the conversion of data into usable information for companies.

Data Mining is commonly defined as the analysis of data for relationships and
patterns that have not previously been discovered by applying statistical and
mathematical methods. Business intelligence (BI) describes processes and
procedures for systematically gathering, storing, analyzing, and providing access
to data to help enterprises in making better operative and strategic business
decisions. BI applications include the activities of decision support systems,
management information systems, query and reporting, online analytical
processing (OLAP), statistical analysis, forecasting, and data mining.

A Key Performance Indicator (KPI) is a measurable value that demonstrates how effectively a company is achieving key business objectives.
Organizations use KPIs to evaluate their success at reaching targets.

Understanding the MicroStrategy architecture


A MicroStrategy system is built around a three-tier or four-tier structure. The diagram
below illustrates a four-tier system.
 The first tier, at the bottom, consists of two databases: the data warehouse, which contains
the information that your users analyze; and the MicroStrategy metadata, which contains
information about your MicroStrategy projects. For an introduction to these databases,
see Storing information: the data warehouse and Indexing your data: MicroStrategy
metadata.

 The second tier consists of MicroStrategy Intelligence Server, which executes your
reports against the data warehouse. For an introduction to Intelligence Server,
see Processing your data: Intelligence Server.

If MicroStrategy Developer users connect via a two-tier project source (also called a direct
connection), they can access the data warehouse without Intelligence Server. For more
information on two-tier project sources, see Tying it all together: projects and project sources.
 The third tier in this system is MicroStrategy Web or Mobile Server, which delivers the reports
to a client. For an introduction to MicroStrategy Web, see Administering MicroStrategy Web
and Mobile.

 The last tier is the MicroStrategy Web client or MicroStrategy Mobile app, which
provides documents and reports to the users.

MicroStrategy Intelligence Server


MicroStrategy Intelligence Server is an analytical server optimized for enterprise querying, reporting, and OLAP analysis. The important
functions of MicroStrategy Intelligence Server are:

• Sharing objects

• Sharing data

• Managing the sharing of data and objects in a controlled and secure environment

• Protecting the information in the metadata

 2-Tier Architecture (Direct Mode)


 In the two tier architecture the client connects directly to the metadata and data
Warehouse. A 2-tier connection mode connects the project to the metadata via an
Open Database Connectivity (ODBC) data source name (DSN). The following
diagram shows the 2-tier architecture.



 3-Tier Architecture (Server Mode)
 In the 3-tier architecture the client connects to the metadata and the data
Warehouse through the Intelligence Server. Following diagram shows the 3-tier
architecture.



 Multi-Tier Architecture (Server Mode)
 In the Multi-tier architecture the client connects to the Intelligence Server (IS)
through the Web server from the web browser. The IS in turn connects to the
metadata and the data Warehouse. The following diagram shows the multi-tier architecture.



 Difference between 2, 3, 4 tier connection?
 In 2 tier architecture, the Micro Strategy Desktop itself queries against the Data
warehouse and the Metadata without the Intermediate tier of the Intelligence
server.

 The 3 Tier architecture comprises an Intelligence server between Micro Strategy
Desktop and the data Warehouse and the Metadata.

 The 4-tier architecture is the same as the 3-tier architecture except that it has an additional component, MicroStrategy Web.

 Intelligence Server is the architectural foundation of the MicroStrategy platform. It serves as a central point for the MicroStrategy metadata so you can manage thousands of end-user requests.

 You are very limited in what you can do with 2-tier architecture. Things like clustering, mobile, distribution services, report services, OLAP services, scheduling, governing, Intelligent Cubes, and project administration are only available via Intelligence Server.



A primary key is a special constraint on a column or set of columns. A primary key constraint ensures
that the column(s) so designated have no NULL values, and that every value is unique. Physically, a
primary key is implemented by the database system using a unique index, and all the columns in the
primary key must have been declared NOT NULL. A table may have only one primary key, but it may
be composite (consist of more than one column).

A surrogate key is any column or set of columns that can be declared as the primary key instead of a
"real" or natural key. Sometimes there can be several natural keys that could be declared as the
primary key, and these are all called candidate keys. So a surrogate is a candidate key. A table could
actually have more than one surrogate key, although this would be unusual. The most common type
of surrogate key is an auto-incrementing integer, such as an auto_increment column in MySQL, a sequence in Oracle, or an identity column in SQL Server.
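As an illustrative sketch only (the table and column names are invented; the identity syntax shown is Oracle 12c-style, with auto_increment or IDENTITY playing the same role in other systems), a dimension table might carry a generated surrogate key alongside a unique natural key:

-- customer_sk is the surrogate primary key; customer_number is the natural/business key
CREATE TABLE dim_customer (
    customer_sk     INTEGER GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    customer_number VARCHAR2(20) NOT NULL UNIQUE,
    customer_name   VARCHAR2(100)
);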

Natural Keys

If a key’s attribute is used for identification independently of the database scheme, it is called a
Natural Key. In layman’s language, it means keys are natural if people use them, for example, SSN,
Invoice ID, Tax ID, Vehicle ID, person unique identifiers, etc. The attributes of a natural key always
exist in real world.

Pros:

 No additional Index is required.

 It can be used as a search key.

Cons:

 While using strings, joins are a bit slower compared to int data-type joins, and storage requirements are higher. Since storage is higher, fewer data values get stored per index page. Also, reading strings is a two-step process in some RDBMSs: one step to get the actual length of the string and a second to actually perform the read operation to get the value.

 Locking contentions can arise while using application driven generation mechanism for the
key.

 You can’t enter a record until the value is known since the value has some meaning.

Surrogate Keys

Surrogate keys have no “business” meaning and their only purpose is to identify a record in the
table. They are always generated independently of the current row data. Their generation can be
managed by the database system or the server itself.

Pros:
 Business Logic is not in the keys.

 Small 4-byte key (the surrogate key will most likely be an integer and SQL Server for example
requires only 4 bytes to store it, if a bigint, then 8 bytes).

 Joins are very fast.

 No locking contentions because of unique constraint (this refers to the waits that get
developed when two sessions are trying to insert the same unique business key) as the
surrogates get generated by the Database and are cached.

Cons:

 An additional index is needed.

 They cannot be used as a search key.

 If it is database controlled, for products that support multiple databases, different


implementations are needed, example: identity in SS2k, before triggers and sequences in
Oracle, identity/sequence in DB2 UDB.

 Always requires a join when browsing the child tables.

Sometimes the primary key is made up of real data and these are normally referred to as natural
keys, while other times the key is generated when a new record is inserted into a table.

When a primary key is generated at runtime, it is called a surrogate key. A surrogate key is typically a
numeric value.

A surrogate key in a database is a unique identifier for either an entity in the modeled world or an
object in the database. The surrogate key is not derived from application data, unlike a natural (or
business) key which is derived from application data.

The surrogate is internally generated by the system but is nevertheless visible to the user or application. The value contains no semantic meaning.


Surrogate keys are keys that have no business meaning and are solely used to identify a record in the table.

Such keys are either database generated (example: Identity in SQL Server, Sequence in Oracle, Sequence/Identity in DB2 UDB, etc.) or system generated values (like values generated via a table in the schema).
Surrogate key = an artificial key generated internally that has no real meaning outside the DB (e.g. a UniqueIdentifier or an Int with the Identity property set, etc.). Implemented in SQL Server by Primary Key Constraints on a column (or columns).

Natural key = uniquely identifies an instance (or record) using real, meaningful data as provided to the database (e.g. an email address might qualify as a natural key). A natural key is not a database object in itself, but a column (or columns) that forms a natural key can be enforced using unique constraints, unique indexes, or can be the primary key. Or it can just be described as a natural key in a data model with nothing specifically implemented.

Candidate keys = keys that uniquely identify an instance; any key (natural or surrogate) that could be chosen as the primary key is a candidate key.

Surrogate Key Overview


A surrogate key is a system generated (could be GUID, sequence, etc.)
value with no business meaning that is used to uniquely identify a record
in a table.

Types of Data Models


There are mainly three different types of data models:
1. Conceptual: This Data Model defines WHAT the system contains.
This model is typically created by Business stakeholders and Data
Architects. The purpose is to organize, scope and define business
concepts and rules.
2. Logical: Defines HOW the system should be implemented regardless of the DBMS. This model is typically created by Data Architects and Business Analysts. The purpose is to develop a technical map of rules and data structures.
3. Physical: This Data Model describes HOW the system will be
implemented using a specific DBMS system. This model is typically
created by DBA and developers. The purpose is actual
implementation of the database.

Conceptual Model
The main aim of this model is to establish the entities, their attributes, and
their relationships. In this Data modeling level, there is hardly any detail
available of the actual Database structure.
The three basic tenets of a Data Model are
Entity: A real-world thing
Attribute: Characteristics or properties of an entity
Relationship: Dependency or association between two entities
For example:
 Customer and Product are two entities. Customer number and name
are attributes of the Customer entity
 Product name and price are attributes of product entity
 Sale is the relationship between the customer and product

Characteristics of a conceptual data model

 Offers organisation-wide coverage of the business concepts.
 This type of data model is designed and developed for a business audience.
 The conceptual model is developed independently of hardware specifications like data storage capacity or location, and of software specifications like DBMS vendor and technology. The focus is to represent data as a user will see it in the "real world."

Conceptual data models, also known as domain models, create a common vocabulary for all stakeholders by establishing basic concepts and scope.

Logical Data Model


Logical data models add further information to the conceptual model elements. They define the structure of the data elements and set the relationships between them.
The advantage of the logical data model is that it provides the foundation on which the physical model is based. However, the modeling structure remains generic.
At this data modeling level, no primary or secondary key is defined. At this level, you need to verify and adjust the connector details that were set earlier for the relationships.

Characteristics of a Logical data model

 Describes data needs for a single project but could integrate with other logical data models based on the scope of the project.
 Designed and developed independently from the DBMS.
 Data attributes will have datatypes with exact precisions and lengths.
 Normalization is typically applied to the model up to 3NF.

Physical Data Model


A Physical Data Model describes the database-specific implementation of the data model. It offers an abstraction of the database and helps generate the schema, thanks to the richness of meta-data offered by a Physical Data Model.
This type of data model also helps to visualize the database structure. It helps to model database column keys, constraints, indexes, triggers, and other RDBMS features.

Characteristics of a physical data model:

 The physical data model describes data needs for a single project or application, though it may be integrated with other physical data models based on project scope.
 The data model contains relationships between tables, addressing the cardinality and nullability of the relationships.
 Developed for a specific version of a DBMS, location, data storage, or technology to be used in the project.
 Columns have exact datatypes, lengths, and default values assigned.
 Primary and foreign keys, views, indexes, access profiles, authorizations, etc. are defined. A small DDL sketch follows this list.
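A tiny hypothetical fragment of a physical model, expressed as Oracle-style DDL with invented names and types, shows how these characteristics appear in practice (it assumes the dim_customer table from the earlier sketch exists):

CREATE TABLE fact_sales (
    sale_id     INTEGER      NOT NULL PRIMARY KEY,             -- exact datatype and primary key
    customer_sk INTEGER      NOT NULL REFERENCES dim_customer, -- foreign key to a dimension table
    sale_date   DATE         NOT NULL,
    amount      NUMBER(10,2) DEFAULT 0                         -- precision, length, and default value
);

CREATE INDEX ix_fact_sales_date ON fact_sales (sale_date);     -- index defined at the physical level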

Advantages and Disadvantages of Data Model:


Advantages of Data model:
 The main goal of designing a data model is to make certain that data objects offered by the functional team are represented accurately.
 The data model should be detailed enough to be used for building the physical database.
 The information in the data model can be used for defining the relationships between tables, primary and foreign keys, and stored procedures.
 The data model helps the business to communicate within and across organizations.
 The data model helps to document data mappings in the ETL process.
 It helps to recognize correct sources of data to populate the model.

Disadvantages of Data model:

 To develop a data model, one should know the characteristics of the physically stored data.
 This is a navigational system, which makes application development and management complex and requires detailed knowledge of the underlying data.
 Even a small change made in the structure requires modification in the entire application.
 There is no set data manipulation language in DBMS.

Conclusion
 Data modeling is the process of developing a data model for the data to be stored in a database.
 Data models ensure consistency in naming conventions, default values, semantics, and security while ensuring the quality of the data.
 The data model structure helps to define the relational tables, primary and foreign keys, and stored procedures.
 There are three types of data models: conceptual, logical, and physical.
 The main aim of the conceptual model is to establish the entities, their attributes, and their relationships.
 The logical data model defines the structure of the data elements and sets the relationships between them.
 A physical data model describes the database-specific implementation of the data model.
 The main goal of designing a data model is to make certain that data objects offered by the functional team are represented accurately.
 The biggest drawback is that even a small change made in the structure requires modification in the entire application.


An OLAP cube is a method of storing data in a multidimensional form, generally for reporting purposes. In OLAP cubes, data (measures) are categorized by dimensions. OLAP cubes are often pre-summarized across dimensions to drastically improve query time over relational databases. The functional difference between ROLAP, MOLAP, and HOLAP is how the information is stored; in all cases, the users see the data as a cube of dimensions and facts.

ROLAP - detailed data is stored in a relational database in 3NF, star, or snowflake form. Queries must summarize data on the fly.

MOLAP - data is stored in multidimensional form - dimensions and facts stored together. You can think of this as a persistent cube. The level of detail is determined by the intersection of the dimension hierarchies.

HOLAP - data is stored using a combination of relational and multi-dimensional storage. Summary data might persist as a cube, while detail data is stored relationally, but transitioning between the two is invisible to the end-user.

ROLAP

Advantages -
The advantages of this model are that it can handle a large amount of data and can leverage all the functionalities of the relational database.

Disadvantages -
The disadvantages are that the performance is slow and each ROLAP report is an SQL query with all the limitations of the genre. It is also limited by SQL functionalities.

ROLAP vendors have tried to mitigate this problem by building into the tool out-of-the-box complex functions as well as
providing the users with an ability to define their own functions.

MOLAP

Advantages -
The advantages of this model are that it provides excellent query performance and the cubes are built for fast data retrieval. All calculations are pre-generated when the cube is created and can be easily applied while querying data.

Disadvantages -
The disadvantages of this model are that it can handle only a limited amount of data. Since all calculations have been pre-built
when the cube was created, the cube cannot be derived from a large volume of data. This deficiency can be bypassed by
including only summary level calculations while constructing the cube. This model also requires huge additional investment as
cube technology is proprietary and the knowledge base may not exist in the organization.

Key Differences Between ROLAP and MOLAP


1. ROLAP stands for Relational Online Analytical Processing whereas; MOLAP stands for
Multidimensional Online Analytical Processing.
2. In both the cases, ROLAP and MOLAP data is stored in the main warehouse. In ROLAP
data is directly fetched from the main warehouse whereas, in MOLAP data is fetched
from the proprietary databases MDDBs.
3. In ROLAP, data is stored in the form of relational tables but, in MOLAP data is stored in
the form of a multidimensional array made of data cubes.
4. ROLAP deals with large volumes of data whereas, MOLAP deals with limited data
summaries kept in MDDBs.
5. ROLAP engines use complex SQL to fetch data from the data warehouse. However,
MOLAP engine creates prefabricated and precalculated datacubes to present
multidimensional view of data to a user and to manage data sparsity in data cubes,
MOLAP uses Sparse matrix technology.
6. ROLAP engine creates a multidimensional view of data dynamically whereas, MOLAP
statically stores multidimensional view of data in proprietary databases MDDBs for a
user to view it from there.
7. As ROLAP creates a multidimensional view of data dynamically, it is slower than MOLAP, which does not spend time creating a multidimensional view of data at query time.

Conclusion:
Which one to opt for between ROLAP and MOLAP depends upon the performance and complexity of the query. MOLAP becomes the choice of a user who wants faster query performance.

MOLAP – Multidimensional OnLine Analytical Processes


(not to be confused with Mobile OnLine Analytical Processes)
MOLAP is the more traditional OLAP type. In MOLAP, both the source data and the aggregation calculations are stored in a multidimensional format. This
type is the fastest option for data retrieval, but it also requires the most storage space.
MOLAP systems are more optimized for fast query performance and retrieval of summarized data. The limitations in MOLAP are that it is not very scalable
and can only handle limited amounts of data since calculations are predefined in the cube.

ROLAP – Relational OnLine Analytical Processes


(not to be confused with Remote OnLine Analytical Processes)
ROLAP stores all data, including aggregations, in the source relational database. This type of storage is good for enterprises that need larger data
warehousing. ROLAP uses an SQL reporting tool to query data directly from the data warehouse.
ROLAP’s advantages include better scalability, enabling it to handle huge amounts of data, and the ability to efficiently manage both numeric and textual
data.

HOLAP – Hybrid OnLine Analytical Processes (Combination of MOLAP & ROLAP)


HOLAP attempts to combine the best features of MOLAP and ROLAP in a single system. HOLAP systems store larger amounts of data in relational tables,
and aggregations are stored in the pre-calculated cubes, offering better scalability, quick data processing and flexibility in accessing data sources.

Slice:-
The slice operation performs a selection on one dimension of the given cube, thus creating a sub-cube.

Dice:-
The dice operation performs a selection on two or more dimensions of a given cube and creates a sub-cube.

Roll-up:-
The roll-up operation (also called drill-up or aggregation) performs aggregation on a data cube, either by climbing up a concept hierarchy for a dimension or by dimension reduction.

Drill-down:-
Drill-down is the reverse operation of roll-up. It allows users to navigate among different levels of data, from most summarized (up) to most detailed (down). Drill-down refers to the process of viewing data at a level of increased detail, while roll-up refers to the process of viewing data with decreasing detail.

Pivot:-
Pivot, also known as rotation, changes the dimensional orientation of the cube, i.e. it rotates the axes to view the data from different perspectives.

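In a relational (ROLAP) setting these operations correspond to ordinary SQL. The sketch below is illustrative only; the denormalized sales_cube table and its columns (sales_year, sales_quarter, region, product, sales_amount) are hypothetical.

-- Slice: fix a single member of one dimension.
SELECT region, product, SUM(sales_amount) AS sales
FROM   sales_cube
WHERE  sales_year = 2014
GROUP  BY region, product;

-- Dice: restrict two or more dimensions at once.
SELECT region, product, SUM(sales_amount) AS sales
FROM   sales_cube
WHERE  sales_year IN (2013, 2014)
AND    region IN ('EAST', 'WEST')
GROUP  BY region, product;

-- Roll-up: aggregate to a coarser level (drop the quarter level).
SELECT sales_year, SUM(sales_amount) AS sales
FROM   sales_cube
GROUP  BY sales_year;

-- Drill-down: the reverse, adding the quarter level back in.
SELECT sales_year, sales_quarter, SUM(sales_amount) AS sales
FROM   sales_cube
GROUP  BY sales_year, sales_quarter;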


Oracle Scalar Functions

This SQL tutorial provides a summary of some of the most common Oracle Scalar Functions.

Oracle Scalar Functions allow you to perform different calculations on data values.
These functions operate on single rows only and produce one result per row. There
are several types of Scalar Functions; this tutorial covers the following:

 String functions – functions that perform operations on character values.


 Numeric functions – functions that perform operations on numeric values.
 Date functions – functions that perform operations on date values.
 Conversion functions – functions that convert column data types.
 NULL-related Functions – functions for handling null values.

Note: this tutorial focuses on Oracle Scalar Functions; Group (aggregate) Functions are
covered in a separate lesson.
Oracle String Functions

CONCAT - Returns text strings concatenated.
    SELECT CONCAT('Hello', 'World') FROM dual
    -- Result: 'HelloWorld'

INSTR - Returns the location of a substring in a string.
    SELECT INSTR('hello', 'e') FROM dual
    -- Result: 2

LENGTH - Returns the number of characters of the specified string expression.
    SELECT LENGTH('hello') FROM dual
    -- Result: 5

RTRIM - Returns a character string after truncating all trailing blanks.
    SELECT RTRIM(' hello ') FROM dual
    -- Result: ' hello'

LTRIM - Returns a character expression after it removes leading blanks.
    SELECT LTRIM(' hello ') FROM dual
    -- Result: 'hello '

REPLACE - Replaces all occurrences of a specified string value with another string value.
    SELECT REPLACE('hello', 'e', '$') FROM dual
    -- Result: 'h$llo'

REVERSE - Returns the reverse order of a string value.
    SELECT REVERSE('hello') FROM dual
    -- Result: 'olleh'

SUBSTR - Returns part of a text.
    SELECT SUBSTR('hello', 2, 3) FROM dual
    -- Result: 'ell'

LOWER - Returns a character expression after converting uppercase character data to lowercase.
    SELECT LOWER('HELLO') FROM dual
    -- Result: 'hello'

UPPER - Returns a character expression with lowercase character data converted to uppercase.
    SELECT UPPER('hello') FROM dual
    -- Result: 'HELLO'

INITCAP - Returns a character expression with the first letter of each word in uppercase and all other letters in lowercase.
    SELECT INITCAP('hello') FROM dual
    -- Result: 'Hello'
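Scalar functions can be nested, with the output of one feeding the next. A small illustration using only the functions above (the literal values are arbitrary):

    SELECT UPPER(SUBSTR('hello world', 1, 5)) FROM dual
    -- Result: 'HELLO'

    SELECT INITCAP(LTRIM(RTRIM('  etl testing  '))) FROM dual
    -- Result: 'Etl Testing'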

Oracle Date Functions


ADD_MONTHS - Returns the specified date with n additional months.
    SELECT ADD_MONTHS('05-JAN-2001', 4) FROM dual
    -- Result: '05-MAY-2001'

EXTRACT - Returns the specified component (DAY, MONTH, YEAR, ...) of a date.
    SELECT EXTRACT(DAY FROM SYSDATE) FROM dual
    -- Result: 16 (when run on the 16th of the month)

LAST_DAY - Returns a date representing the last day of the month for the specified date.
    SELECT LAST_DAY('15-AUG-2014') FROM dual
    -- Result: '31-AUG-2014'

MONTHS_BETWEEN - Returns the count of months between the specified start date and end date.
    SELECT MONTHS_BETWEEN('01-MAY-2010', '01-JAN-2010') FROM dual
    -- Result: 4

NEXT_DAY - Returns the first date of the named weekday that is later than the specified date.
    SELECT NEXT_DAY('30-AUG-2014', 'Sunday') FROM dual
    -- Result: '31-AUG-2014'

SYSDATE - Returns the current database system date. This value is derived from the operating system of the computer on which the Oracle instance is running.
    SELECT SYSDATE FROM dual
    -- Result: (current date)

Oracle Numeric Functions

TRUNC - Truncates a number to the specified number of decimal places (zero by default), discarding the fractional part rather than rounding it.
    SELECT TRUNC(59.9) FROM dual
    -- Result: 59

CEIL - Returns the smallest integer that is greater than or equal to the specified numeric expression.
    SELECT CEIL(59.1) FROM dual
    -- Result: 60

ROUND - Returns a numeric value, rounded to the specified length or precision.
    SELECT ROUND(59.9) FROM dual
    -- Result: 60

    SELECT ROUND(59.1) FROM dual
    -- Result: 59
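Both TRUNC and ROUND accept an optional second argument giving the number of decimal places to keep; a quick illustration:

    SELECT TRUNC(59.987, 2) FROM dual
    -- Result: 59.98 (digits beyond the second decimal place are simply dropped)

    SELECT ROUND(59.987, 2) FROM dual
    -- Result: 59.99 (the value is rounded rather than truncated)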

Oracle Conversion Functions

TO_CHAR - Converts a date or number to a string.
    SELECT TO_CHAR(1506) FROM dual
    -- Result: the string value '1506'

    SELECT TO_CHAR(1507, '$9,999') FROM dual
    -- Result: the string value '$1,507'

    SELECT TO_CHAR(sysdate, 'dd/mm/yyyy') FROM dual
    -- Result: the current date as a string, e.g. '01/01/2015'

TO_DATE - Converts a string value to a date.
    SELECT TO_DATE('01-MAY-2015') FROM dual
    -- Result: the date value '01-MAY-2015'

    SELECT TO_DATE('01/05/2015', 'dd/mm/yyyy') FROM dual
    -- Result: the date value '01-MAY-2015'

TO_NUMBER - Converts a string value to a number.
    SELECT TO_NUMBER('9432') FROM dual
    -- Result: the numeric value 9432

    SELECT TO_NUMBER('$9,324', '$9,999') FROM dual
    -- Result: the numeric value 9324

Oracle NULL-Related Functions

NVL - Replaces NULL with the specified replacement value.
    SELECT NVL(NULL, 'Somevalue') FROM dual
    -- Result: Somevalue


Database Testing Checklist for test engineers


Some time back I decided to create a checklist for database testing, to ensure we cover every aspect of testing. With the help of my colleagues, Neha and Ankit, we prepared this list. Add one more
column to this list to record the result of each check.
Database Testing Checklist

1) Data Integrity

1. Is the complete data in the database stored in tables?
2. Is the data organized logically?
3. Is the data stored in the tables correct?
4. Is there any unnecessary data present?
5. Is the data present in the correct table?
6. Is the data present in the correct field within the table?
7. Is the stored data correct with respect to the data updated from the front end?
8. Are LTRIM and RTRIM performed on data before it is inserted into the database? (A trailing/leading-space check is sketched below.)
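A hedged sketch of how check 8 might be verified after the fact; the customer table and its columns are hypothetical:

    -- Rows whose value still carries leading or trailing spaces.
    SELECT customer_id, customer_name
    FROM   customer
    WHERE  customer_name <> LTRIM(RTRIM(customer_name))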

2) Field Validations

1. Is the 'Allow Null' condition removed at the database level for fields that are mandatory on the UI?
2. Is 'Null' as a literal value not allowed in the database?
3. Is a field non-mandatory on the UI when NULL values are allowed for that field in the database?
4. Is the field length specified on the UI the same as the field length specified in the table that stores the same element?
5. Is the length of each field of sufficient size?
6. Is the data type of each field as per the specifications?
7. Do all similar fields have the same names across tables?
8. Is there any computed field in the database?

3) Constraints

1. Are the required primary key constraints created on the tables?
2. Are the required foreign key constraints created on the tables?
3. Are valid references made for each foreign key?
4. Is the data type of the primary key the same as that of the corresponding foreign key in the two tables?
5. Is 'Allow Null' disallowed for the primary key?
6. Does the foreign key contain only non-NULL values?
(A data-dictionary query for checks 1 and 2 is sketched below.)
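One way to review checks 1 and 2 in Oracle is to query the data dictionary. A minimal sketch, assuming the table belongs to the current user and using the hypothetical table name ORDERS:

    -- List primary (P) and foreign (R) key constraints for one table.
    SELECT constraint_name, constraint_type, r_constraint_name
    FROM   user_constraints
    WHERE  table_name = 'ORDERS'
    AND    constraint_type IN ('P', 'R')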

4) Stored Procedures / Functions

1. Are proper coding conventions followed?
2. Is proper handling done for exceptions?
3. Are all conditions/loops covered by the input data?
4. Is TRIM applied when data is fetched from the database?
5. Does executing the stored procedure manually give the correct result?
6. Does executing the stored procedure manually update the table fields as expected?
7. Does execution of the stored procedure fire the required triggers?
8. Are all the stored procedures/functions used by the application (i.e. no unused stored procedures present)?
9. Do stored procedures have the 'Allow Null' condition checked at the database level?
10. Are all the stored procedures and functions executed successfully when the database is blank?

5) Triggers

1. Are proper coding conventions followed in the triggers?
2. Are the triggers executed for the respective DML transactions?
3. Does the trigger update the data correctly once executed?

6) Indexes

1. Are the required clustered indexes created on the tables?
2. Are the required non-clustered indexes created on the tables?

7) Transactions

1. Are the transactions performed correctly?
2. Is the data committed if the transaction is executed successfully?
3. Is the data rolled back if the transaction is not executed successfully?
4. Is the data rolled back if the transaction is not executed successfully and multiple databases are involved in the transaction?
5. Are all the transactions executed by using the TRANSACTION keyword? (A commit/rollback sketch follows this section.)
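A minimal PL/SQL sketch of the commit-on-success / rollback-on-failure pattern behind checks 2 and 3; the accounts table and its columns are hypothetical:

    BEGIN
      -- Hypothetical transfer between two account rows.
      UPDATE accounts SET balance = balance - 100 WHERE account_id = 1;
      UPDATE accounts SET balance = balance + 100 WHERE account_id = 2;
      COMMIT;        -- persist both changes only if both statements succeed
    EXCEPTION
      WHEN OTHERS THEN
        ROLLBACK;    -- undo partial work on any error
        RAISE;       -- re-raise so the failure is visible to the caller
    END;
    /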

8) Security

1. Is the data secured from unauthorized access?
2. Are different user roles created with different permissions?
3. Do all the users have the appropriate access to the database?

9) Performance

1. Does the database perform as expected (within the expected time) when a query is executed for a small number of records?
2. Does the database perform as expected (within the expected time) when a query is executed for a large number of records?
3. Does the database perform as expected (within the expected time) when multiple users access the same data?
4. Is performance profiling done?
5. Is performance benchmarking done?
6. Is a query execution plan created? (An EXPLAIN PLAN sketch follows this section.)
7. Is database testing done when the server is behind a load balancer?
8. Is the database normalized?
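For check 6, Oracle can display the optimizer's plan for a statement. A hedged sketch; the orders table and the predicate are hypothetical:

    EXPLAIN PLAN FOR
      SELECT * FROM orders WHERE order_date >= DATE '2015-01-01';

    -- Display the plan that was just generated.
    SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);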

10) Miscellaneous

1. Are log events added in the database for all login events?
2. Is it verified in SQL Profiler that queries are executed only through stored procedures (i.e. no direct query is visible in Profiler)?
3. Do the scheduled jobs execute on time?
4. Is a test-data development tool available?

11) SQL Injection

1. Is the query parameterized? (A bind-variable sketch follows this section.)
2. Does the URL contain any query details?
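For check 1, dynamic SQL in Oracle can be parameterized with bind variables instead of string concatenation. A minimal PL/SQL sketch; the users table and its user_name column are hypothetical:

    DECLARE
      v_name  VARCHAR2(30) := 'alice';
      v_count NUMBER;
    BEGIN
      -- :name is a bind variable; the input value is never concatenated into
      -- the SQL text, so it cannot alter the statement's structure.
      EXECUTE IMMEDIATE
        'SELECT COUNT(*) FROM users WHERE user_name = :name'
        INTO v_count
        USING v_name;
    END;
    /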

12) Backup and Recovery

1. Is a timely backup of the database taken?


ETL Test Scenarios and Test Cases


Based on my experience, I have prepared a broad set of test scenarios and test cases to validate the
ETL process. I will keep updating this content.

 Mapping doc validation
1. Verify whether the mapping doc provides the corresponding ETL information. A change log should be maintained in every mapping doc.
2. Define a default test strategy for cases where the mapping doc omits optional information (e.g. data type lengths).

 Structure validation
1. Validate the source and target table structures against the corresponding mapping doc.
2. The source data type and the target data type should be the same.
3. The lengths of the data types in both source and target should be equal.
4. Verify that data field types and formats are specified.
5. The source data type length should not be less than the target data type length.
6. Validate the names of the columns in the table against the mapping doc.
(A metadata-comparison query is sketched below.)
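A hedged sketch of a structure check in Oracle, comparing column definitions of a source and a target table through the data dictionary; SRC_CUSTOMER and TGT_CUSTOMER are hypothetical table names:

    -- Columns present in both tables whose type or length differs.
    SELECT s.column_name,
           s.data_type   AS src_type, t.data_type   AS tgt_type,
           s.data_length AS src_len,  t.data_length AS tgt_len
    FROM   all_tab_columns s
    JOIN   all_tab_columns t ON t.column_name = s.column_name
    WHERE  s.table_name = 'SRC_CUSTOMER'
    AND    t.table_name = 'TGT_CUSTOMER'
    AND   (s.data_type <> t.data_type OR s.data_length <> t.data_length)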

 Constraint validation
Ensure that the constraints are defined for each specific table as expected.

 Data consistency issues
1. The data type and length for a particular attribute may vary across files or tables even though the semantic definition is the same.
Example: an account number may be defined as NUMBER(9) in one field or table and as VARCHAR2(11) in another table.
2. Misuse of integrity constraints: when referential integrity constraints are misused, foreign key values may be left "dangling" or inadvertently deleted.
Example: an account record is missing, but its dependent records are not deleted.
 Data completeness issues
Ensure that all expected data is loaded into the target table.
1. Compare record counts between source and target, and check for any rejected records. (A count-comparison query is sketched below.)
2. Check that data is not truncated in the columns of the target table.
3. Check boundary values (e.g. only data for years >= 2008 has to be loaded into the target).
4. Compare unique values of key fields between the source data and the data loaded into the warehouse. This is a valuable technique that points out a variety of possible data errors without doing a full validation on all fields.
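A minimal sketch of check 1, assuming hypothetical source and target tables SRC_ORDERS and TGT_ORDERS:

    -- Row counts side by side; a mismatch signals rejected or missing records.
    SELECT (SELECT COUNT(*) FROM src_orders) AS src_count,
           (SELECT COUNT(*) FROM tgt_orders) AS tgt_count
    FROM   dual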

 Data correctness issues
1. Data that is misspelled or inaccurately recorded.
2. Null, non-unique, or out-of-range data may be stored when integrity constraints are disabled.
Example: the primary key constraint is disabled during an import; data with null unique identifiers is then entered into the existing data.

 Data transformation
1. Create a spreadsheet of input-data scenarios and expected results, and validate these with the business customer. This is an excellent requirements-elicitation step during design and can also be used as part of testing.
2. Create test data that includes all scenarios. Utilize an ETL developer to automate the process of populating data sets from the scenario spreadsheet, to allow for versatility and mobility, because scenarios are likely to change.
3. Utilize data profiling results to compare the range and distribution of values in each field between the target and the source data.
4. Validate accurate processing of ETL-generated fields, for example surrogate keys.
5. Validate that the data types within the warehouse are the same as specified in the data model or design.
6. Create data scenarios between tables that test referential integrity.
7. Validate parent-to-child relationships in the data. Create data scenarios that test the handling of orphaned child records.

 Data quality
1. Number check: if the source uses a numbering format such as xx_30 but the target stores only 30, the value has to be loaded without the prefix (xx_); this needs to be validated.
2. Date check: dates have to follow the agreed date format, and it should be the same across all records. Standard format: yyyy-mm-dd, etc.
3. Precision check: the precision value should display as expected in the target table.
Example: the source holds 19.123456, but in the target it should display as 19.123 or as a rounded-off value.
4. Data check: based on business logic, records that do not meet certain criteria should be filtered out.
Example: only records whose date_sid >= 2008 and GLAccount != 'CM001' should be loaded into the target table.
5. Null check: a few columns should display "Null" based on the business requirement.
Example: the Termination Date column should display null unless the "Active status" column is "T" or "Deceased".
Note: data cleanness rules will be decided during the design phase only.

 Null validation
Verify that no null values are present in columns for which "Not Null" is specified.

 Duplicate check
1. Validate that the unique key, the primary key and any other columns that should be unique as per the business requirements do not contain duplicate rows.
2. Check whether duplicate values exist in any column that is built by extracting multiple columns from the source and combining them into one column.
3. Sometimes, as per the client requirements, we need to ensure that there are no duplicates in a combination of multiple columns within the target only. (A duplicate-check query is sketched below.)
Example: one policy holder can take multiple policies and have multiple claims. In this case we need to verify the combination of CLAIM_NO, CLAIMANT_NO, COVEREGE_NAME, EXPOSURE_TYPE, EXPOSURE_OPEN_DATE, EXPOSURE_CLOSED_DATE, EXPOSURE_STATUS, PAYMENT.
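A hedged sketch of check 3, using a hypothetical target table TGT_CLAIM and a two-column combination for brevity:

    -- Combinations that occur more than once are duplicates.
    SELECT claim_no, claimant_no, COUNT(*) AS occurrences
    FROM   tgt_claim
    GROUP  BY claim_no, claimant_no
    HAVING COUNT(*) > 1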

 Date validation
Date values are used in many areas of ETL development, for example:
1. To know the row creation date, e.g. CRT_TS.
2. To identify active records from the ETL development perspective, e.g. VLD_FROM, VLD_TO.
3. To identify active records from the business requirements perspective, e.g. CLM_EFCTV_T_TS, CLM_EFCTV_FROM_TS.
4. Sometimes the updates and inserts are generated based on the date values.
Possible test scenarios to validate the date values (see the query sketched below):
a. From_Date should not be greater than To_Date.
b. The format of the date values should be correct.
c. Date values should not contain junk values or null values.
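A minimal sketch of scenario (a), assuming a hypothetical target table TGT_POLICY that carries the VLD_FROM and VLD_TO columns mentioned above:

    -- Rows whose validity window is inverted or missing a boundary.
    SELECT *
    FROM   tgt_policy
    WHERE  vld_from > vld_to
    OR     vld_from IS NULL
    OR     vld_to   IS NULL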

 Complete data validation (using MINUS and INTERSECT)
1. To validate the complete data set between the source and target tables, a MINUS query is the best solution.
2. We need to run source MINUS target and target MINUS source.
3. If a MINUS query returns any rows, those should be considered mismatching rows.
4. We also need to find the matching rows between source and target using an INTERSECT statement.
5. The count returned by INTERSECT should match the individual counts of the source and target tables.
6. If the MINUS queries return 0 rows and the INTERSECT count is less than the source count or the target table count, then we can conclude that duplicate rows exist. (See the sketch below.)
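A hedged sketch of these checks, assuming hypothetical SRC_ORDERS and TGT_ORDERS tables with identical column lists:

    -- Rows present in the source but missing or different in the target.
    SELECT * FROM src_orders
    MINUS
    SELECT * FROM tgt_orders;

    -- Rows present in the target but not in the source.
    SELECT * FROM tgt_orders
    MINUS
    SELECT * FROM src_orders;

    -- Count of rows common to both; compare it with each table's own count.
    SELECT COUNT(*)
    FROM  (SELECT * FROM src_orders
           INTERSECT
           SELECT * FROM tgt_orders);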

 Some useful test scenarios
1. Verify that the extraction process did not extract duplicate data from the source. (This usually matters in repeatable processes where, at point zero, we need to extract all data from the source file, but during subsequent intervals we only need to capture the modified and new rows.)
2. The QA team should maintain a set of SQL statements that are automatically run at this stage to validate that no duplicate data has been extracted from the source systems.

 Data cleanness
Unnecessary columns should be deleted before loading into the staging area.
Example 1: suppose the telephone number and the STD code are in different columns and the requirement says they should be in one column; with the help of an expression transformation we concatenate the values into one column.
Example 2: if a name column carries extra spaces, the spaces have to be trimmed before loading into the staging area; with the help of an expression transformation the spaces are trimmed. (Both examples are sketched in plain SQL below.)
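Expression transformations are tool-specific, but the equivalent logic in plain SQL looks like the following; the src_contact table and its columns are hypothetical:

    -- Example 1: concatenate STD code and telephone number into one column.
    -- Example 2: trim stray spaces from the name column.
    SELECT std_code || telephone_no    AS full_phone,
           LTRIM(RTRIM(customer_name)) AS customer_name
    FROM   src_contact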

