
DBT20CS353 - ASSIGNMENT 3 - DATA DISTRIBUTION AND DISTRIBUTED QUERY PROCESSING

NAME : CHAITHRA
SRN: PES1UG20CS543
SECTION: I
(a) Load a million rows of data into a (transaction) table that should have distributed storage across multiple drives of your PC. A program may be used to create the data; if so, submit the program code.

PYTHON CODE TO LOAD A MILLION ROWS OF DATA

bk => Book table
import random
import string
from decimal import Decimal

import mysql.connector

# Establish a connection to the MySQL database
mydb = mysql.connector.connect(
    host="localhost",
    user="root",
    password="",
    database="pes1ug20cs543_library"
)

# Create a cursor object to execute SQL queries
mycursor = mydb.cursor()

# Define the number of rows to generate
num_rows = 1000000

# Parameterized INSERT statement for the bk (book) table
sql = ("INSERT INTO bk (`ISBN`, `Title`, `Cost`, `IsReserved`, `Edition`, "
       "`PubliPlace`, `Publisher`, `CopyYr`, `ShelfID`, `SubName`) "
       "VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s)")

# Generate and insert data into the table
for i in range(num_rows):
    isbn = i + 1
    title = ''.join(random.choices(string.ascii_letters, k=random.randint(1, 100)))
    cost = Decimal(random.uniform(0, 1000)).quantize(Decimal('0.01'))
    is_reserved = random.randint(0, 1)
    edition = random.randint(1, 100)
    publi_place = ''.join(random.choices(string.ascii_letters, k=random.randint(1, 30)))
    publisher = ''.join(random.choices(string.ascii_letters, k=random.randint(1, 30)))
    copy_year = random.randint(1900, 2023)
    shelf_id = random.randint(1, 1000)
    sub_name = ''.join(random.choices(string.ascii_letters, k=random.randint(1, 30)))

    # Insert the row into the table
    val = (isbn, title, cost, is_reserved, edition, publi_place, publisher,
           copy_year, shelf_id, sub_name)
    mycursor.execute(sql, val)

# Commit the changes to the database
mydb.commit()

# Close the database connection
mydb.close()
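Once the load completes, a quick sanity check (a minimal query, assuming the bk table above) confirms the row count:

SELECT COUNT(*) FROM bk;
-- expected result: 1000000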

SQL QUERIES TO PARTITION AND INSERT DATA INTO THE TABLE


CREATE TABLE partition_table(
ISBN int(11) NOT NULL,
Title varchar(100) NOT NULL,
Cost decimal(5,2) NOT NULL,
IsReserved tinyint(1) NOT NULL,
Edition int(11) NOT NULL,
PubliPlace varchar(30) DEFAULT NULL,
Publisher varchar(30) NOT NULL,
CopyYr decimal(4,0) NOT NULL,
ShelfID int(11) DEFAULT NULL,
SubName varchar(30) DEFAULT NULL,
PRIMARY KEY (ISBN)
);

Partition based on range (cost column)


Note: MySQL's RANGE partitioning expression must evaluate to an integer, so the DECIMAL Cost column is wrapped in FLOOR() here. The partitioning column must also be part of every unique key, which is why the primary key is redefined to include Cost. To place partitions on different drives, each PARTITION clause can additionally specify a DATA DIRECTORY option (supported by InnoDB for partitioned tables).

ALTER TABLE partition_table
DROP PRIMARY KEY,
ADD PRIMARY KEY (ISBN, Cost)
PARTITION BY RANGE (FLOOR(Cost)) (
    PARTITION p0 VALUES LESS THAN (100),
    PARTITION p1 VALUES LESS THAN (500),
    PARTITION p2 VALUES LESS THAN (900),
    PARTITION p3 VALUES LESS THAN MAXVALUE
);

INSERT INTO partition_table
(ISBN, Title, Cost, IsReserved, Edition, PubliPlace, Publisher, CopyYr, ShelfID, SubName)
SELECT ISBN, Title, Cost, IsReserved, Edition, PubliPlace, Publisher, CopyYr, ShelfID, SubName
FROM bk;
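To confirm how the copied rows are distributed across the partitions, the information schema can be queried; a sketch, assuming the database name used in the loader above (note that TABLE_ROWS is an estimate for InnoDB tables):

SELECT PARTITION_NAME, TABLE_ROWS
FROM INFORMATION_SCHEMA.PARTITIONS
WHERE TABLE_SCHEMA = 'pes1ug20cs543_library'
  AND TABLE_NAME = 'partition_table';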
(b) Measure the performance of this humongous table using multiple (min 6) varieties of queries with different table combinations; record and compare the results.

(c) The Explain/Analyze plan outputs must be part of the submission in addition to the query results.

1) Cost of the book between 101 and 499

select * from bk where cost between 101 and 499;

Original table – rows accessed: 899770

Partitioned table – rows accessed: 398138, all from partition p1, which covers the cost range 100 to 500.

Analyzing the above query:

Original table scan → 1027.1 ms
Partitioned table scan → 430.53 ms
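The row counts and partition names above can be read from the EXPLAIN output, and the timings from EXPLAIN ANALYZE (available in MySQL 8.0.18+); a sketch of the statements, assuming the tables above:

EXPLAIN SELECT * FROM bk WHERE Cost BETWEEN 101 AND 499;
EXPLAIN SELECT * FROM partition_table WHERE Cost BETWEEN 101 AND 499;
-- the partitions column of the second plan should list only p1

EXPLAIN ANALYZE SELECT * FROM partition_table WHERE Cost BETWEEN 101 AND 499;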
2) Select the books with cost > 400

select isbn, cost, title from bk where cost > 400;

The bk (book) table scan reads more rows than the partitioned-table scan.

Original table → 518.6 ms
Partitioned table → 505.68 ms

Since only 3 of the 4 partitions were used (p1, p2, p3; p0 was pruned), the scan of the partitioned table took slightly less time than the scan of the original table.

3) Join query

SELECT b.ISBN, b.Title, bc.CopyID
FROM bk b
JOIN issue bc ON b.ISBN = bc.ISBN
WHERE b.Cost < 500;
Since this query joins only 9 rows with cost < 500, we compare the time required to perform that join: the partitioned table takes less time (0.1581) than the original table (0.2345).

Only two partitions were used (p0 and p1).
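The partition usage can be verified with EXPLAIN on the partitioned variant of the join (a sketch; the issue table's columns are assumed to match the query above):

EXPLAIN SELECT b.ISBN, b.Title, bc.CopyID
FROM partition_table b
JOIN issue bc ON b.ISBN = bc.ISBN
WHERE b.Cost < 500;
-- the partitions column should list only p0 and p1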

4) Aggregate

Original table rows scanned → 899770
Partitioned table rows scanned → 493166

Original table scan → 742.43 ms
Partitioned table scan → 484.03 ms

Only partitions p2 and p3 were scanned, so the partitioned table took less time than the original table.
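The aggregate query text itself is not reproduced here; a form consistent with the reported scan of only p2 and p3 (the partitions holding costs of 500 and above) would be, for example:

SELECT COUNT(*), AVG(Cost)
FROM partition_table
WHERE Cost >= 500;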
5) ORDER BY AND GROUP BY

Original table scan → 5330.9 ms
Partitioned table scan → 4728.4 ms
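The query used is not shown here; a representative ORDER BY plus GROUP BY query over this schema (columns as defined in the CREATE TABLE above) might be:

SELECT Publisher, COUNT(*) AS num_books, AVG(Cost) AS avg_cost
FROM partition_table
GROUP BY Publisher
ORDER BY num_books DESC;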

6) Index scan

[Screenshots: EXPLAIN/ANALYZE outputs for the original table without and with an index, and for the partitioned table without and with an index]

Original table without index scan → 953.36 ms
Original table with index scan → 11.33 ms
Partitioned table without index scan → 476.43 ms
Partitioned table with index scan → 0.139 ms
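The index definition is not shown here; a sketch consistent with these measurements would be a secondary index on the filtered column of each table (index names hypothetical), after which the timed query is re-run:

CREATE INDEX idx_bk_cost ON bk (Cost);
CREATE INDEX idx_part_cost ON partition_table (Cost);

-- e.g. a cost-filtered lookup that can use the index
SELECT ISBN, Cost, Title FROM partition_table WHERE Cost > 990;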

The partitioned table with an index scan took the least time of all the cases. Hence we can conclude that partitioning a table reduces the time required to scan it, since only the required partitions are scanned.
