You are on page 1of 21

lOMoARcPSD|29080144

SUMMER TRAINING PROJECT REPORT ON

SQL FOR DATA ANALYTICS


In partial fulfillment of

B.Tech 3rd yr (Computer Science & Engg.)

Submitted To: Submitted By:


Ms. Vishali Nakul Rohilla
(A.P, CSE 21/CS/148
VCE, Rohtak)

1
lOMoARcPSD|29080144

STUDENT DECLARATION

I, Nakul Rohilla , enrolled as a student of Vaish College of Engineering, hereby declare that the
practical training report titled "SQL for Data Analytics" is the result of my own work and has
been undertaken during the practical training period from 27 June to 8 August.

I affirm that the content of this report is based on my own research, experiences, and
observations during the practical training period.

Date: 09-01-2024

Signature:-

Nakul Rohilla

Acknowledgment
2
lOMoARcPSD|29080144

I take this opportunity to convey my sincere thanks and gratitude to all


those who have directly or indirectly helped and contributed towards the
completion of this project.
First and foremost, I realized that the degree of relevance of the learning
being imparted in the class is very high. The learning enabled us to get a
better understanding of the Data Analysis subject which I studied.
I would also like to thank our batch mates for the discussions that I had with
them. All these have resulted in the enrichment of our knowledge and their
inputs have helped us to incorporate relevant issues into our project. During
the project, I realized that the degree of relevance of the learning being
imparted in the class is very high. The learning enabled us to get a better
understanding of the nitty-gritty of the subject which we studied.

Nakul Rohilla
CSE Vth SEM

3
lOMoARcPSD|29080144

1. Table of Content

Student Declaration ……………………………………………………………..2

ACKNOWLEDGMENT ................................................................................................... 3

1. TABLE OF CONTENTS ............................................................................................. 4

2. ABOUT TRAINING & COMPANY ............................................................................ 5

3. INTRODUCTION ..........................................................................................................8

4. TECHNOLOGY USED IN PROJECT ....................................................................... 9

4.1 Project Code ....................................................................................................................................... 10

4.2 Project Screenshot.............................................................................................................. 15

5. APPLICATIONS ......................................................................................................... 19

6. FUTURE WORK & CONCLUSION ......................................................................... 20

7. REFERENCES ........................................................................................................ 21

4
lOMoARcPSD|29080144

2. About Training & Company

I am completed my training from Internshala in data analytics for SQL and many of
Indian and foreign teacher uploaded their content.
Internshala is an internship and online training platform, based in Gurgaon, India. Founded by
Sarvesh Agrawal, an IIT Madras alumnus, in 2011, the website helps students find internships
with organisations in India.
The platform, which was founded in 2010, started out as a WordPress blog that aggregated
internships across India and articles on education, technology and skill gap. Internshala
launched its online trainings in 2014. As of 2018, the platform had 3.5 million students and
80,000 companies.

I am attached My Training Certificate which Internshala gives after completion of courses :-

5
lOMoARcPSD|29080144

6
lOMoARcPSD|29080144

7
lOMoARcPSD|29080144

3. Introduction

In database analytics is the technology that allows data processing to be conducted within
the database by building analytic logic into the database itself. Doing so eliminates the time
and effort required to transform data and move it back and forth between a database and a
separate analytics application.

PostgreSQL is an advanced object-relational database management system that supports an


extended subset of the SQL standard, including transactions, foreign keys, subqueries, triggers,
user-defined types and functions.

Here I am attaching my project page

8
lOMoARcPSD|29080144

4. Technology used in Project

In this project I used PostgreSQL which is a powerful, open source object-


relational database system that uses and extends the SQL language
combined with many features that safely store and scale the most
complicated data workloads. The origins of PostgreSQL date back to 1986 as
part of the POSTGRES project at the University of California at Berkeley and
has more than 30 years of active development on the core platform.

PostgreSQL has earned a strong reputation for its proven architecture,


reliability, data integrity, robust feature set, extensibility, and the dedication
of the open source community behind the software to consistently deliver
performant and innovative solutions. PostgreSQL runs on all major
operating systems, has been ACID compliant since 2001, and has powerful
add-ons such as the popular Post GIS geospatial database extender. It is no
surprise that PostgreSQL has become the open source relational database
of choice for many people and organisations.

9
lOMoARcPSD|29080144

4.1 Technical Details of Project/Study

PostgreSQL is a relational database management system (RDBMS). That means it is a


system for managing data stored in relations. Relation is essentially a mathematical term
for table.

4.1 Project Code

-- 1. Create a table named 8deliveries9 with appropriate data types for columns.
create table deliveries (id
int, inning int, over int, ball
int,

batsman char (100), non_striker


char (100), bowler char(100),
batsman_runs int, extra_runs int,
total_runs int, is_wicket int,
dismissal_kind char(50),
player_dismissed char(100),

fielder char(100), extras_type


char(25), batting_team
char(50), bowling_team
char(50)); -- drop table
deliveries;

select * from deliveries;

-- 2.Create a table named 8matches9 with appropriate data types for columns.
create table matches( id
int primary key, city
char(50),

date date ,

player_of_match char(100),

venue char(100),
neutral_venue int, team_1
char(50), team_2 char(50),
toss_winner char(50),
toss_decision char(10),
winner char(50), result
char(10), result_margin int,

10
lOMoARcPSD|29080144

eliminator char(5), method


varchar(5), umpire_1
char(50), umpire_2
char(50)); -- drop table
matches;

select * from matches;

-- 3.Import data from csv file 9IPL_matches.csv9 attached in resources to 8matches9.


copy matches from 'C:\Program Files\PostgreSQL\14\data\dataset\IPL_matches_.csv' CSV header;

-- 4.Import data from csv file 9IPL_Ball.csv9 attached in resources to 8deliveries9.


copy deliveries from 'C:\Program Files\PostgreSQL\14\data\dataset\IPL_Ball.csv' CSV header;

-- 5.Select the top 20 rows of the deliveries table. select * from


deliveries

order by id limit 20;

-- 6.Select the top 20 rows of the matches table. select * from


matches

order by id limit 20;

-- 7.Fetch data of all the matches played on 2nd May 2013. select * from
matches where date = '02-05-2013';

-- 8.Fetch data of all the matches where the margin of victory is more than 100 runs. select * from
matches where result = 'runs' and result_margin > '100' order by result_margin desc; select * from
matches where result = 'runs' and result_margin > '100' order by result_margin desc;

-- 9.Fetch data of all the matches where the final scores of both teams tied and order it in descending
order of -- the date.

select * from matches where

result = 'tie' order by


date desc;

-- 10.Get the count of cities that have hosted an IPL match.

select count (distinct city) as "Total_City_Hosted" from matches;

-- 11.Create table deliveries_v02 with all the columns of deliveries and an additional column ball_result

11
lOMoARcPSD|29080144

-- containing value boundary, dot or other depending on the total_run (boundary for >= 4, dot for 0 and other for --
any other number) create table deliveries_v02 as select *,

CASE WHEN total_runs >= 4 then 'boundary'

WHEN total_runs = 0 THEN 'dot' else 'other'

END as ball_result

FROM deliveries;

select * from deliveries_v02;

-- drop table deliveries_v02;

-- 12.Write a query to fetch the total number of boundaries and dot balls

select ball_result, count (*) from deliveries_v02 where ball_result <>'1,2,3' group by ball_result order by count
desc;

select ball_result, count (*) from deliveries_v02 group by ball_result;

-- 13.Write a query to fetch the total number of boundaries scored by each team select
batting_team,count(*) from deliveries_v02 where ball_result = 'boundary' group by batting_team order
by count;

-- 14.Write a query to fetch the total number of dot balls bowled by each team select
bowling_team, count(*) from deliveries_v02 where ball_result = 'dot' group by bowling_team

order by count;

-- 15.Write a query to fetch the total number of dismissals by dismissal kinds select dismissal_kind,
count (*) from deliveries where dismissal_kind <> 'NA' group by dismissal_kind

order by count desc;

-- 16.Write a query to get the top 5 bowlers who conceded maximum extra runs

select distinct "bowler" as Bowler, count (extra_runs) as "Max_extra_runs" from deliveries group by bowler

order by "Max_extra_runs" desc limit 5;

-- select bowler, count (extra_runs) from deliveries group by bowler

-- order by count desc limit 5;

-- 17.Write a query to create a table named deliveries_v03 with all the columns of deliveries_v02 table and

12
lOMoARcPSD|29080144

-- two additional column (named venue and match_date) of venue and date from table matches create table
deliveries_v03 AS SELECT a.*, b.venue, b.match_date from deliveries_v02 as a left join (select max(venue) as
venue, max(date) as match_date, id from matches group by id) as b on a.id = b.id;

-- select * from deliveries_v03;

-- 18.Write a query to fetch the total runs scored for each venue and order it in the descending order of

-- total runs scored.

select sum (total_runs) as Venue_runs,(select venue from matches where deliveries.id = matches.id)
from deliveries group by venue order by "venue_runs" desc;

-- select venue, sum(total_runs) as runs from deliveries_v03 group by venue order by runs desc;

-- 19.Write a query to fetch the year-wise total runs scored at Eden Gardens and order it in the descending order

-- of total runs scored.

select extract(year from match_date) as IPL_year, sum(total_runs) as runs from deliveries_v03 where venue = 'Eden
Gardens' group by IPL_year order by runs desc;

-- 20.Get unique team1 names from the matches table, you will notice that there are two entries for

-- Rising Pune Supergiant one with Rising Pune Supergiant and another one with Rising Pune Supergiants.

-- Your task is to create a matches_corrected table with two additional columns team1_corr and team2_corr
containing

-- team names with replacing Rising Pune Supergiants with Rising Pune Supergiant. Now analyse these newly
created

-- columns. select distinct team_1 from


matches;

create table matches_corrected as select *, replace(team_1, 'Rising Pune Supergiants', 'Rising Pune

Supergiant') as team1_corr

, replace(team_2, 'Rising Pune Supergiants', 'Rising Pune Supergiant') as team2_corr from matches;

select distinct team1_corr from matches_corrected;

-- 21.Create a new table deliveries_v04 with the first column as ball_id containing information of match_id, inning,

-- over and ball separated by 8-9 (For ex. 335982-1-0-1 match_id-inning-over-ball) and rest of the columns
same as

-- deliveries_v03)

create table deliveries_v04 AS SELECT concat(id,'-',inning,'-',over,'-',ball) as ball_id, * from deliveries_v03 ;

13
lOMoARcPSD|29080144

-- select * from deliveries_v04;

-- 22.Compare the total count of rows and total count of distinct ball_id in deliveries_v04; select * from
deliveries_v04 limit 20;

select count(distinct ball_id) from deliveries_v04; select count(*)


from deliveries_v04;

-- 23.Create table deliveries_v05 with all columns of deliveries_v04 and an additional column for row number

-- partition over ball_id. (HINT : row_number() over (partition by ball_id) as r_num) drop table
deliveries_v05;

create table deliveries_v05 as select *, row_number() over (partition by ball_id) as r_num from deliveries_v04;

select * from deliveries_v05;

-- 24.Use the r_num created in deliveries_v05 to identify instances where ball_id is repeating.

-- (HINT : select * from deliveries_v05 WHERE r_num=2;) select * from


deliveries_v05 WHERE r_num=2;

-- 25.Use subqueries to fetch data of all the ball_id which are repeating.

-- (HINT: SELECT * FROM deliveries_v05 WHERE ball_id in (select BALL_ID from deliveries_v05 WHERE r_num=2);

SELECT * FROM deliveries_v05 WHERE ball_id in (select BALL_ID from deliveries_v05 WHERE r_num=2);

14
lOMoARcPSD|29080144

4.2 Project Snapshot

15
lOMoARcPSD|29080144

16
lOMoARcPSD|29080144

17
lOMoARcPSD|29080144

18
lOMoARcPSD|29080144

5. Applications

It is a form of business intelligence which helps in solving specific problems and challenges.
For instance, through data analytics, an e-commerce company can track customer
behaviour and use the insights to improve the entire experience.

19
lOMoARcPSD|29080144

6. Future Work & Conclusion

Now that we have analyzed our data, the last step is to draw your conclusions.
Conclusions summarize whether the experiment or survey results support or contradict
the original hypothesis. Teams should include key facts from your team's background
research to help explain the results.

20
lOMoARcPSD|29080144

7. References

1.https://www.google.com/

2.https://www.postgresql.org/docs/

21

You might also like