
11/11/2020

General Information
CS102: Database Management System
 Course Instructor: Muhammad Abulaish
   Email: abulaish@sau.ac.in
   Tel (O): 24195 (148)
   Office: Room No. 310
 Teaching Assistant (TA): Jayati Gulati <gulati.jayati@gmail.com>
 Lecture: 11:30am-1:00pm & 3:00 pm to 4:30 pm (Tuesday)
 Lab: Open
 Course Web Page: www.abulaish.com/dbms20m.html


Course Structure & Evaluation
 The course has three parts
   Lectures (to discuss basic concepts and algorithms)
   Lab (SQL & PL/SQL)
   Presentations (self study on some recent advances in database and its applications)
 Lecture slides will be available at course web page.
 Evaluation:
   Quiz: 30% (At least three quizzes)
   Presentations: 10% (group of maximum three students)
   Lab assignment: 10%
   Final Exam: 50%

Prerequisite
 Knowledge of
   Data Structures and Algorithms (mainly tree and graph)
   Discrete Mathematics (mainly set theory)
 

Teaching Materials
 Text Book
   R. Elmasri and S. B. Navathe: Fundamentals of Database Systems, Pearson Education.
 Reference Books
   A. Silberschatz, H. F. Korth and S. Sudarshan: Database System Concepts, McGraw-Hill.
   Web resources (mainly for SQL & PL/SQL)

Topics
 Data Modeling
   ER Model
   EER (Enhanced ER) Model
 Database Design
   Relational Database Elements & Constraints
   Database Decomposition & Normalization
 Database Language
   Relational Algebra and Relational Calculus
   Query Optimization
 Database Storage and Indexing
   File Organization and Indexing (Dynamic Multilevel Indexes using B-Tree and B+-Tree)
 Emerging Databases
   Data warehouse, Graph database & NoSQL database


Data
 Data is the Latin plural of datum
 Used to represent unprocessed facts and figures without any added interpretation or analysis.
 Generally associated with some entity and often viewed as the lowest level of abstraction from which information and knowledge are derived.
 Data may be unstructured, semi-structured, or structured
 Example: The price of petrol is Rs. 80 per liter

Information
 Information is interpreted (processed) data so that it has meaning for the user.
 “The price of petrol has risen from Rs. 74 to Rs. 80 per liter” – is information for a person who tracks petrol prices.
 Data becomes information when it is processed for some purpose and adds value for the recipient.
 A set of raw sales figures – Data
 Sales report (chart plotting, trend analysis) – Information
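The step from data to information can be illustrated with the petrol example: two raw price readings are data, and the computed change is information for someone tracking prices. The following is only a minimal sketch; the function name `price_change` and the returned field names are assumptions made for illustration.

```python
def price_change(old_price: float, new_price: float) -> dict:
    """Turn two raw price readings (data) into a short summary (information)."""
    change = new_price - old_price
    percent = 100.0 * change / old_price
    return {"change": change, "percent": round(percent, 1)}


# Raw data from the slide: Rs. 74 and Rs. 80 per liter.
summary = price_change(74, 80)
print(f"Petrol rose by Rs. {summary['change']} per liter ({summary['percent']}%)")
```

The two numbers alone say nothing about a trend; the processed summary is what adds value for the recipient.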

Knowledge
 Knowledge is a fluid mix of information, experience and insight that may benefit the individual or the organization.
 “When petrol prices go up by Rs. 6 per liter, it is likely that bus fare will rise by 12%” – knowledge.
OR, knowledge is meta-information about the patterns hidden in the data
The patterns must be discovered automatically!!!

Summarized View
 Data – as in databases, spreadsheets, text files…
 Information – Processed data
 Knowledge – Fluid mix of information, experience, and insight
 The boundaries between data, information, and knowledge are fuzzy
 What is data to one person is information to someone else.

Data Categories & Mining Terminologies
Data are stored in Documents (A file)
 Unstructured: a file stored on your PC (Text Mining), 90%
 Semi-structured: a web page stored on WWW (Web Mining)
   WLM, WSM, WCM
 Structured: a database (Data Mining)
 Opinion Mining & Sentiment Analysis, Advice Mining

Another Data Categorization
 Quantitative vs Categorical
 Quantitative data
   Discrete
   Continuous
 Categorical data
   Nominal
   Ordinal
   Ratio-scaled


Limitations of Computing Machines and Data Deluge

The Huber Taxonomy of Data Set Sizes

Descriptor   Data Set Size in Bytes   Storage Mode
Tiny         10^2                     Piece of Paper
Small        10^4                     A Few Pieces of Paper
Medium       10^6                     A Floppy Disk
Large        10^8                     Multiple Floppy Disks
Huge         10^10                    Hard Disk
Massive      10^12                    Multiple Hard Disks, e.g. RAID Storage
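As a rough illustration of the taxonomy, a data set size can be mapped to its descriptor. The sketch below assumes each listed size is an upper bound for its class, which the table itself does not state; the function name and the fallback label are hypothetical.

```python
# Nominal sizes in bytes for each Huber descriptor, taken from the table.
# Assumption: each nominal size is treated as the upper bound of its class.
HUBER_SIZES = [
    ("tiny", 10**2),
    ("small", 10**4),
    ("medium", 10**6),
    ("large", 10**8),
    ("huge", 10**10),
    ("massive", 10**12),
]


def huber_descriptor(size_in_bytes: int) -> str:
    """Return the first descriptor whose nominal size is >= the given size."""
    for name, nominal in HUBER_SIZES:
        if size_in_bytes <= nominal:
            return name
    return "beyond massive"  # hypothetical label, not part of the taxonomy


print(huber_descriptor(5_000_000))  # a 5 MB file falls in the "large" class here
```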

Algorithmic Complexity

Algorithm                                                     Complexity
Plot a scatterplot                                            O(n^(1/2))
Calculate means, variances, kernel density estimates          O(n)
Calculate fast Fourier transforms                             O(n log(n))
Calculate singular value decomposition of an r x c matrix;    O(nc)
  solve a multiple linear regression
Solve most clustering algorithms                              O(n^2)

No. of Operations for Algorithms of Various Computational Complexities and Various Data Set Sizes

n        n^(1/2)   n        n log(n)   n^(3/2)   n^2
tiny     10        10^2     2x10^2     10^3      10^4
small    10^2      10^4     4x10^4     10^6      10^8
medium   10^3      10^6     6x10^6     10^9      10^12
large    10^4      10^8     8x10^8     10^12     10^16
huge     10^5      10^10    10^11      10^15     10^20
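The operation counts in the table can be recomputed from n for each complexity class. The sketch below assumes a base-10 logarithm, since that is what reproduces entries such as 2x10^2 for n = 10^2; the function name `operation_counts` is chosen for illustration.

```python
import math


def operation_counts(n: int) -> dict:
    """Approximate operation counts for the five complexity classes.

    Assumption: log is base 10, which matches the n log(n) column of the table.
    """
    return {
        "n^(1/2)": math.sqrt(n),
        "n": n,
        "n log(n)": n * math.log10(n),
        "n^(3/2)": n ** 1.5,
        "n^2": n ** 2,
    }


# Data set sizes from the Huber taxonomy (tiny, small, medium).
for name, n in [("tiny", 10**2), ("small", 10**4), ("medium", 10**6)]:
    print(name, operation_counts(n))
```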

Computational Feasibility on a Pentium PC (10 MegaFLOPs)

n        n^(1/2)          n               n log(n)          n^(3/2)         n^2
tiny     10^-6 seconds    10^-5 seconds   2x10^-5 seconds   .0001 seconds   .001 seconds
small    10^-5 seconds    .001 seconds    .004 seconds      .1 seconds      10 seconds
medium   .0001 seconds    .1 seconds      .6 seconds        1.67 minutes    1.16 days
large    .001 seconds     10 seconds      1.3 minutes       1.16 days       31.7 years
huge     .01 seconds      16.7 minutes    2.78 hours        3.17 years      317,000 years

Computational Feasibility on a Silicon Graphics Onyx Workstation (300 MegaFLOPs)

n        n^(1/2)             n                   n log(n)            n^(3/2)             n^2
tiny     3.3x10^-8 seconds   3.3x10^-7 seconds   6.7x10^-7 seconds   3.3x10^-6 seconds   3.3x10^-5 seconds
small    3.3x10^-7 seconds   3.3x10^-5 seconds   1.3x10^-4 seconds   3.3x10^-3 seconds   .33 seconds
medium   3.3x10^-6 seconds   3.3x10^-3 seconds   .02 seconds         3.3 seconds         55 minutes
large    3.3x10^-5 seconds   .33 seconds         2.7 seconds         55 minutes          1.04 years
huge     3.3x10^-4 seconds   33 seconds          5.5 minutes         38.2 days           10,464 years
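Each entry in these feasibility tables appears to be the operation count divided by the machine's rated speed in floating-point operations per second. The following sketch works under that assumption; the helper names `runtime_seconds` and `humanize` are hypothetical, and a 365-day year is assumed.

```python
def runtime_seconds(operations: float, flops: float) -> float:
    """Estimated running time: operation count divided by operations per second."""
    return operations / flops


def humanize(seconds: float) -> str:
    """Express a duration in the largest unit that keeps the value >= 1."""
    units = [
        ("years", 365 * 24 * 3600),  # assumption: 365-day year
        ("days", 24 * 3600),
        ("hours", 3600),
        ("minutes", 60),
    ]
    for unit, size in units:
        if seconds >= size:
            return f"{seconds / size:.2f} {unit}"
    return f"{seconds:g} seconds"


PENTIUM_FLOPS = 10e6  # 10 MegaFLOPs, as stated for the Pentium PC

# Medium data set (n = 10^6) with an O(n^2) algorithm: 10^12 operations.
print(humanize(runtime_seconds(10**12, PENTIUM_FLOPS)))  # prints "1.16 days"
```

Changing the `flops` argument to 300e6 gives the corresponding estimate for the 300 MegaFLOPs workstation.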


Computational Feasibility on an Intel Paragon XP/S A4 (4.2 GigaFLOPs)

n        n^(1/2)             n                   n log(n)            n^(3/2)             n^2
tiny     2.4x10^-9 seconds   2.4x10^-8 seconds   4.8x10^-8 seconds   2.4x10^-7 seconds   2.4x10^-6 seconds
small    2.4x10^-8 seconds   2.4x10^-6 seconds   9.5x10^-6 seconds   2.4x10^-4 seconds   .024 seconds
medium   2.4x10^-7 seconds   2.4x10^-4 seconds   .0014 seconds       .24 seconds         4.0 minutes
large    2.4x10^-6 seconds   .024 seconds        .19 seconds         4.0 minutes         27.8 days
huge     2.4x10^-5 seconds   2.4 seconds         24 seconds          66.7 hours          761 years

Computational Feasibility on a TeraFLOP Grand Challenge Computer (1000 GigaFLOPs)

n        n^(1/2)         n               n log(n)          n^(3/2)         n^2
tiny     10^-11 seconds  10^-10 seconds  2x10^-10 seconds  10^-9 seconds   10^-8 seconds
small    10^-10 seconds  10^-8 seconds   4x10^-8 seconds   10^-6 seconds   10^-4 seconds
medium   10^-9 seconds   10^-6 seconds   6x10^-6 seconds   .001 seconds    1 second
large    10^-8 seconds   10^-4 seconds   8x10^-4 seconds   1 second        2.8 hours
huge     10^-7 seconds   .01 seconds     .1 seconds        16.7 minutes    3.2 years

Types of Computers for Interactive Feasibility (Response Time < 1 Second)

n        n^(1/2)             n                   n log(n)            n^(3/2)             n^2
tiny     Personal Computer   Personal Computer   Personal Computer   Personal Computer   Personal Computer
small    Personal Computer   Personal Computer   Personal Computer   Personal Computer   Super Computer
medium   Personal Computer   Personal Computer   Personal Computer   Super Computer      Teraflop Computer
large    Personal Computer   Workstation         Super Computer      Teraflop Computer   ---
huge     Personal Computer   Super Computer      Teraflop Computer   ---                 ---

Types of Computers for Feasibility (Response Time < 1 Week)

n        n^(1/2)             n                   n log(n)            n^(3/2)             n^2
tiny     Personal Computer   Personal Computer   Personal Computer   Personal Computer   Personal Computer
small    Personal Computer   Personal Computer   Personal Computer   Personal Computer   Personal Computer
medium   Personal Computer   Personal Computer   Personal Computer   Personal Computer   Personal Computer
large    Personal Computer   Personal Computer   Personal Computer   Personal Computer   Teraflop Computer
huge     Personal Computer   Personal Computer   Personal Computer   Super Computer      ---
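One way to read these two tables is as a search for the least powerful machine that finishes within the response-time limit. The sketch below assumes the four machine classes correspond to the four FLOP ratings quoted on the earlier feasibility slides (Pentium PC, Onyx workstation, Intel Paragon, TeraFLOP computer) and that the limit is inclusive; the tables may use slightly different cutoffs, so individual cells can differ from what this computes.

```python
# Assumed mapping of machine class to rated speed in FLOPs (from the earlier slides).
MACHINES = [
    ("Personal Computer", 10e6),    # Pentium PC, 10 MegaFLOPs
    ("Workstation", 300e6),         # Silicon Graphics Onyx, 300 MegaFLOPs
    ("Super Computer", 4.2e9),      # Intel Paragon XP/S A4, 4.2 GigaFLOPs
    ("Teraflop Computer", 1e12),    # 1000 GigaFLOPs
]

ONE_SECOND = 1.0
ONE_WEEK = 7 * 24 * 3600.0


def slowest_feasible_machine(operations: float, limit_seconds: float):
    """Return the first (slowest) machine that finishes within the limit, else None."""
    for name, flops in MACHINES:
        if operations / flops <= limit_seconds:
            return name
    return None


print(slowest_feasible_machine(10**8, ONE_SECOND))  # large data set, O(n)
print(slowest_feasible_machine(10**16, ONE_WEEK))   # large data set, O(n^2)
```

A result of `None` corresponds to a `---` cell: none of the four machines meets the limit.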

Data Mining Applications
 Almost every information system

Resources
 Graph database (neo4j): https://neo4j.com/developer/graph-database/
 NoSQL database (MongoDB): https://www.mongodb.com/nosql-explained
 Database related conferences






