Saudi Electronic University

26/12/2021
College of Computing and Informatics
Data Science Pre-Master Program
IT244
Introduction to Database
Week 1
Orientation and Introduction
Contents
1. Understand the Major Objective of Saudi Vision 2030
2. Demonstrate use of the Saudi Digital Library to search for
knowledge
3. Databases and Database Users
4. Introduction to NOSQL
Weekly Learning Outcomes
1. Understand the Major Objective of Saudi Vision 2030.
2. Demonstrate use of the Saudi Digital Library to search for
knowledge.
3. Explain the basic concepts of databases.
Required Reading
1. Chapter 1: The Complete chapter
2. Chapter 24: Introduction to NOSQL
(Fundamentals of Database Systems, Global Edition, 7th Edition (2017) by Ramez
Elmasri & Shamkant Navathe)

Recommended Reading
Saudi Digital Library: https://sdl.edu.sa/sdlportal/en/publishers.aspx
Introduction to Database Management Systems:
https://courses.cs.vt.edu/cs4604/Spring21/pdfs/1-intro.pdf

This Presentation is mainly dependent on the textbook: Fundamentals of Database Systems, Global Edition, 7th Edition (2017) by Ramez Elmasri & Shamkant Navathe
• Saudi Vision 2030

Saudi Vision 2030

• Under the leadership of the Custodian of the Two Holy Mosques,
Vision 2030 was launched, a roadmap drawn up by His Royal
Highness the Crown Prince, to harness the strengths God
bestowed upon us – our strategic position, investment power and
place at the center of Arab and Islamic worlds. The full attention
of the Kingdom, and our Leadership, is on harnessing our
potential to achieve our ambitions.

Reference: https://www.vision2030.gov.sa/
Saudi Vision 2030

• Vision 2030 Draws on the Nation’s Intrinsic Strengths

1. Saudi Arabia is the land of the Two Holy Mosques, which positions the
Kingdom at the heart of the Arab and Islamic worlds
2. Saudi Arabia is using its investment power to create a more diverse
and sustainable economy
3. The Kingdom is using its strategic location to build its role as an
integral driver of international trade and to connect three continents:
Africa, Asia and Europe

Reference: https://www.vision2030.gov.sa/
• Saudi Digital Library

Saudi Digital Library

• The Saudi Digital Library is the largest academic collection of
information sources in the Arab world, with more than 310,000
scientific references covering all academic disciplines. Its content
is continuously updated, so it accumulates a large body of knowledge
in the long run. The library has contracts with more than 300 global
publishers. It won the Arab Federation for Libraries and Information
‘know’ award for outstanding projects in the Arab world in 2010.

Reference: https://sdl.edu.sa/sdlportal/en/publishers.aspx
Saudi Digital Library

• Advantages:
  • Central management: one body manages this huge content and keeps it
    constantly updated.
  • Shared benefit: resources from any one university become available
    to the other universities, in any scientific field.
  • Enhances the standing of universities in academic accreditation
    evaluations, through rich, modern sources from the best global
    publishers.
  • Bridges the gap between Saudi universities: emerging universities
    get the same service as the major Saudi universities.

Reference: https://sdl.edu.sa/sdlportal/en/publishers.aspx
• Databases and Database Users
Chapter 1 Outline
• Types of Databases and Database Applications
• Basic Definitions
• Typical DBMS Functionality
• Example of a Database (UNIVERSITY)
• Main Characteristics of the Database Approach
• Types of Database Users
• Advantages of Using the Database Approach
• When Not to Use Databases
• Introduction to NOSQL
Basic Definitions (1)
• Data
  • Known facts that can be recorded and have an implicit meaning.
  • Example: the names, telephone numbers, and addresses of the people you know
• Database
  • A collection of related data in a DBMS.
  • Example: a list of names and addresses, or the computerized catalog of a large library
• Defining a database
  • Involves specifying the data types, structures, and constraints of the data to be stored in the
    database.
• Meta-data
  • The database definition or descriptive information is also stored by the DBMS in the form of
    a database catalog or dictionary.
• Database Management System (DBMS)
  • A computerized system that enables users to create and maintain a database. It is a
    general-purpose software system that facilitates the processes of defining, constructing,
    manipulating, and sharing databases among various users and applications.
• Database System
  • The database and DBMS software together; sometimes, the application programs and
    interfaces are also included.
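The definitions above (defining a database, constructing it, and the catalog of meta-data) can be sketched with Python's built-in `sqlite3` module. This is only an illustration: SQLite stands in for a general DBMS, and the `contact` table, its columns, and the sample row are assumptions made up for the example.

```python
import sqlite3

# In-memory database, so the sketch leaves no files behind.
conn = sqlite3.connect(":memory:")

# Defining: specify data types, structure, and constraints.
# (The "contact" table and its columns are hypothetical.)
conn.execute(
    """
    CREATE TABLE contact (
        name    TEXT NOT NULL,
        phone   TEXT,
        address TEXT
    )
    """
)

# Constructing: store the data itself.
conn.execute(
    "INSERT INTO contact VALUES (?, ?, ?)",
    ("Sara", "555-0100", "Riyadh"),
)

# Meta-data: the DBMS keeps the definition in its own catalog.
# In SQLite the catalog is the sqlite_master table.
catalog = conn.execute("SELECT type, name FROM sqlite_master").fetchall()

# PRAGMA table_info returns one row per column; index 1 is the column name.
columns = [row[1] for row in conn.execute("PRAGMA table_info(contact)")]

print(catalog)
print(columns)
```

The point of the sketch is that the description of the data lives inside the database system itself, next to the data.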
Basic Definitions (2)
• Manipulating a database
  • Includes querying the database to retrieve specific data, updating the database, and
    generating reports from the data.
• Sharing a database
  • Allows multiple users and programs to access the database simultaneously.
• Application program
  • Accesses the database by sending queries or requests for data to the DBMS.
• Query
  • A query causes some data to be retrieved from the database.
• Transaction
  • May cause some data to be read from and some data to be written into the database.
• Protection
  • Includes system protection against hardware or software malfunction (or crashes) and
    security protection against unauthorized or malicious access.
• Maintenance
  • A typical large database has a life cycle of many years, so the DBMS must allow the
    system to evolve as requirements change over time.
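The difference between a query (read only) and a transaction (reads and writes that succeed or fail together) can be illustrated with a minimal sketch. It assumes SQLite via Python's `sqlite3` module; the `account` table, its `CHECK` constraint, and the `transfer` helper are hypothetical names introduced for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " id INTEGER PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one transaction (all or nothing)."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected a negative balance;
        # the credit made by the first UPDATE was rolled back too.
        return False


def balances(conn):
    """A query: it only retrieves data."""
    return conn.execute("SELECT id, balance FROM account ORDER BY id").fetchall()


ok = transfer(conn, 1, 2, 30)       # succeeds
failed = transfer(conn, 1, 2, 500)  # would overdraw account 1, so it is undone
final = balances(conn)
print(ok, failed, final)
```

In the failed transfer the credit is applied first and then undone by the rollback, which is the behavior the Transaction and Protection definitions describe.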
Simplified database system environment
Implicit Properties of a Database

• A database represents some aspect of the real world, called
  the miniworld or the universe of discourse (UoD). Changes to
  the miniworld are reflected in the database.
• A database is a logically coherent collection of data with some
  inherent meaning. A random assortment of data cannot
  correctly be referred to as a database.
• A database is designed, built, and populated with data for a
  specific purpose. It has an intended group of users and some
  preconceived applications in which these users are interested.
Example of a Database: UNIVERSITY Application (1)
• Mini-world for the example:
  • Part of a UNIVERSITY environment.
• Some mini-world entities:
  • INSTRUCTORs
  • STUDENTs
  • DEPARTMENTs
  • COURSEs
  • SECTIONs (of COURSEs)
• Some mini-world relationships:
  • SECTIONs are of specific COURSEs
  • STUDENTs take SECTIONs
  • COURSEs have prerequisite COURSEs
  • INSTRUCTORs teach SECTIONs
  • COURSEs are offered by DEPARTMENTs
  • STUDENTs major in DEPARTMENTs
• Note: The above entities and relationships are typically expressed in the
  ENTITY-RELATIONSHIP data model
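As a rough sketch, the entities and relationships listed above could be stored as tables linked by foreign keys. The table and column names below are assumptions, not the textbook's schema; the instructor is simplified to a text column, and the prerequisite relationship is left out to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked

# Hypothetical schema: one table per entity, foreign keys for relationships.
conn.executescript(
    """
    CREATE TABLE department (dept_code TEXT PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE course (
        course_no TEXT PRIMARY KEY,
        title     TEXT NOT NULL,
        dept_code TEXT NOT NULL REFERENCES department(dept_code)  -- offered by
    );
    CREATE TABLE section (
        section_id INTEGER PRIMARY KEY,
        course_no  TEXT NOT NULL REFERENCES course(course_no),    -- section of
        instructor TEXT NOT NULL                                  -- taught by
    );
    CREATE TABLE student (
        student_no INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        major      TEXT REFERENCES department(dept_code)          -- majors in
    );
    CREATE TABLE enrollment (                                     -- students take sections
        student_no INTEGER REFERENCES student(student_no),
        section_id INTEGER REFERENCES section(section_id),
        PRIMARY KEY (student_no, section_id)
    );
    INSERT INTO department VALUES ('CS', 'Computer Science');
    INSERT INTO course VALUES ('IT244', 'Introduction to Database', 'CS');
    INSERT INTO section VALUES (1, 'IT244', 'Instructor A');
    INSERT INTO student VALUES (17, 'Student A', 'CS');
    INSERT INTO enrollment VALUES (17, 1);
    """
)

# Follow the relationships: which student takes which course, taught by whom?
rows = conn.execute(
    """
    SELECT s.name, c.title, sec.instructor
    FROM enrollment e
    JOIN student s   ON s.student_no = e.student_no
    JOIN section sec ON sec.section_id = e.section_id
    JOIN course c    ON c.course_no = sec.course_no
    """
).fetchall()
print(rows)
```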
[Figures: Example of a Database, UNIVERSITY Application (2) and (3)]
Typical DBMS Functionality
• Define a particular database in terms of its data types, structures, and
  constraints
• Construct or load the initial database contents on a secondary storage
  medium
• Manipulate the database:
  • Retrieval: querying, generating reports
  • Modification: insertions, deletions, and updates to its content
  • Accessing/changing the database through Web applications
• Processing and sharing by a set of concurrent users and application
  programs
• Protection or security measures to prevent unauthorized access
• “Active” processing to take internal actions on data
• Presentation and visualization of data
• Maintaining the database and associated programs over its lifetime
Main Characteristics of the Database Approach (1)
• Self-describing nature of a database system:
  • A DBMS catalog stores the description of a particular database
  • The description is called meta-data
  • This allows the DBMS software to be integrated with different database
    applications
• Insulation between programs and data (called program-data independence):
  • Allows changing data structures and data storage organization without having to
    change the DBMS access programs.
  • Accomplished through data abstraction
  • A data model is used to hide storage details and present the users with a conceptual
    view of the database.
  • Programs refer to the data model constructs rather than data storage details.
Main Characteristics of the Database Approach (2)
• Support of multiple views of the data:
  • Each user may see a different view of the database, which
    describes only the data of interest to that user.
• Sharing of data and multi-user transaction processing:
  • Allows a set of user transactions to access and update the
    database concurrently (at the same time).
  • Concurrency control within the DBMS guarantees that each
    transaction is correctly executed or aborted
  • The recovery subsystem ensures each completed transaction has its
    effect permanently recorded in the database
  • OLTP (Online Transaction Processing) is a major part of database
    applications (allows hundreds of concurrent transactions to
    execute per second)
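Multiple views and program-data independence can be illustrated with a small sketch, again assuming SQLite through Python's `sqlite3` module. The `student` table, the `cs_roster` view, and the later `email` column are hypothetical names chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE student (
        student_no INTEGER PRIMARY KEY,
        name  TEXT,
        major TEXT,
        gpa   REAL
    );
    INSERT INTO student VALUES (1, 'Student A', 'CS', 3.7),
                               (2, 'Student B', 'MATH', 3.1);

    -- A view that shows one class of user only the data of interest to them.
    CREATE VIEW cs_roster AS
        SELECT student_no, name FROM student WHERE major = 'CS';
    """
)

# A program written against the view never touches the gpa column.
roster = conn.execute("SELECT * FROM cs_roster").fetchall()

# Changing the stored structure does not require changing that program.
conn.execute("ALTER TABLE student ADD COLUMN email TEXT")
roster_after = conn.execute("SELECT * FROM cs_roster").fetchall()

print(roster, roster_after)
```

The program reading `cs_roster` gets the same answer before and after the table gains a column, which is the insulation the slide describes.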
Example of meta-data in a simplified database catalog
Types of Database Users (Actors on the Scene)
• Database administrators:
  • Responsible for authorizing/controlling access to the database; coordinating
    and monitoring its use; acquiring software and hardware resources; and
    monitoring efficiency of operations. The DBA is accountable for security
    breaches and poor system response time.
• Database designers:
  • Responsible for defining database structure, constraints, and transactions;
    they communicate with users to understand their needs.
• End-users: use the database for queries, reports, and updating the database
  content. Can be categorized into:
  • Casual end-users: access the database occasionally when needed.
  • Naïve (or parametric) end-users: the largest section of the end-user population.
    • Use previously implemented and tested programs (called “canned transactions”) to
      access/update the database. Examples are bank tellers, hotel reservation clerks, and sales clerks.
  • Sophisticated end-users:
    • These include business analysts, scientists, engineers, etc. Many use tools in the form of
      software packages that work closely with the stored database.
  • Stand-alone end-users:
    • Mostly maintain personal databases using ready-to-use packaged applications.
Types of Database Applications
• Traditional applications:
  • Numeric and textual databases in business applications
• More recent applications:
  • Multimedia databases (images, videos, voice, etc.)
  • Geographic Information Systems (GIS)
  • Data warehouses
  • Real-time and active databases
  • Many other applications
Advantages of Using the Database Approach
• Controlling redundancy in data storage and in development and
  maintenance efforts.
• Restricting unauthorized access to data.
• Providing persistent storage for program objects.
• Providing storage structures (e.g. indexes) for efficient query
  processing.
• Providing backup and recovery services.
• Providing multiple interfaces to different classes of users.
• Representing complex relationships among data.
• Enforcing integrity constraints on the database.
• Permitting inferencing and actions using rules and triggers.
• Allowing multiple “views” of the same data.
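Three of the advantages listed above (storage structures such as indexes, integrity constraints, and triggers) can be sketched together. The example assumes SQLite via Python's `sqlite3`; the `grade` and `grade_audit` tables, the index, and the trigger are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Integrity constraint: only valid letter grades may be stored.
    CREATE TABLE grade (
        student_no INTEGER,
        course_no  TEXT,
        letter     TEXT CHECK (letter IN ('A', 'B', 'C', 'D', 'F'))
    );

    -- Storage structure: an index to speed up lookups by student.
    CREATE INDEX grade_by_student ON grade (student_no);

    -- Trigger: an action the DBMS takes by itself when a grade changes.
    CREATE TABLE grade_audit (student_no INTEGER, old_letter TEXT, new_letter TEXT);
    CREATE TRIGGER log_grade_change AFTER UPDATE OF letter ON grade
    BEGIN
        INSERT INTO grade_audit VALUES (OLD.student_no, OLD.letter, NEW.letter);
    END;

    INSERT INTO grade VALUES (17, 'IT244', 'B');
    """
)

# The update fires the trigger, which records the change.
conn.execute("UPDATE grade SET letter = 'A' WHERE student_no = 17")
audit = conn.execute("SELECT * FROM grade_audit").fetchall()

# The DBMS, not the application, rejects data that violates the constraint.
try:
    conn.execute("INSERT INTO grade VALUES (18, 'IT244', 'Z')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(audit, rejected)
```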
Additional Implications of Using the Database Approach

• Potential for enforcing standards:
  • Crucial for the success of database applications in large organizations.
    Standards refer to data item names, display formats, screens, report
    structures, meta-data, etc.
• Reduced application development time:
  • The time needed to add each new application is reduced.
• Flexibility to change data storage structures:
  • Storage structures may evolve to improve performance, or because of
    new requirements.
• Availability of up-to-date information:
  • Extremely important for on-line transaction systems such as airline, hotel, and car
    reservations.
• Economies of scale:
  • Wasteful overlap of resources and personnel can be avoided by consolidating
    data and applications across departments.
Historical Development of Database Technology
• Early database applications using hierarchical and network systems:
  • Started in the mid-1960s and continued through the 1970s and 1980s. They were based on
    three main paradigms: hierarchical systems, network model–based systems, and inverted
    file systems.
• Relational model based systems:
  • The relational model was introduced in 1970 and was heavily researched and experimented
    with at IBM Research and several universities. Relational DBMS products emerged in the early
    1980s and now exist on almost all types of computers, from small personal computers to
    large servers.
• Object-oriented and emerging applications:
  • Object databases were introduced in the late 1980s and early 1990s. Their use has not taken
    off much. Many relational DBMSs have incorporated object database concepts, leading
    to a new category called object-relational databases (ORDBs).
  • Extended relational systems add further capabilities (e.g. for multimedia data, XML,
    spatial, and other data types).
• Data on the Web and e-commerce applications:
  • Starting in the 1990s, e-commerce emerged as a major application on the Web. The
    critical information on e-commerce Web pages is dynamically extracted data from
    DBMSs, such as flight information, product prices, and product availability.
  • The eXtensible Markup Language (XML) is one standard for interchanging data among
    various types of databases and Web pages.
Extending Database Capabilities
• New functionality is being added to DBMSs in the following
  areas:
  • Scientific applications
  • XML (eXtensible Markup Language)
  • Image storage and management
  • Audio and video data management
  • Data warehousing and data mining
  • Spatial data management and Geographic Information Systems
  • Time series and historical data management
  • Collecting and fusing data from distributed sensors
When Not to Use a DBMS
• Main inhibitors (costs) of using a DBMS:
  • High initial investment and possible need for additional hardware.
  • Overhead for providing generality, security, concurrency control, recovery,
    and other functions.
• When a DBMS may be unnecessary:
  • If the database and applications are simple, well defined, and not expected
    to change.
  • If there are stringent real-time requirements that may not be met because
    of DBMS overhead.
  • If access to data by multiple users is not required.
• When no DBMS may suffice:
  • If the database system is not able to handle the complexity of data because of
    modeling limitations.
  • If the database users need special operations not supported by the DBMS.
  • When DBMS overhead makes it impossible to achieve the needed application
    performance.
• Introduction to NOSQL

Adapted from slides and/or materials by P. Hoekstra, J. Lu, A. Lakshman, P. Malik, J. Lin, R. Sunderraman, T. Ivarsson, J. Pokorny, N. Lynch, S. Gilbert, J. Widom, R. Jin, P. McFadin, C. Nakhli, and R. Ho
Background
• Relational databases are suitable for conventional business applications
• In the first decade of the twenty-first century, the proliferation of social
  media Web sites, large e-commerce companies, Web search indexes, and
  cloud storage/backup led to a surge in the amount of data stored on large
  databases and massive servers.
• Employing a DBMS/RDBMS for web-based applications creates problems
• New types of database systems were necessary to manage these huge
  databases with fast search and retrieval as well as reliable and safe storage of
  nontraditional types of data, such as social media posts and tweets.
• Web-based applications require more than the DBMS/RDBMS can provide
  • Explosion of social media sites with very large data sizes
  • Many cloud-based applications provide simple storage solutions

Issues with scaling up

• The best way to provide ACID and a rich query model is to have the dataset
  on a single machine
• Limits to scaling up (or vertical scaling: make a “single” machine
  more powerful) → the dataset is just too big!
• Scaling out (or horizontal scaling: adding more smaller/cheaper
  servers) is a better choice
• Different approaches for horizontal scaling (multi-node database):
  • Master/Slave
  • Sharding (partitioning)

Scaling out RDBMS
• Master/Slave
  • All writes are written to the master
  • All reads are performed against the replicated slave databases
  • Critical reads may be incorrect, as writes may not have been propagated down
  • Large datasets can pose problems, as the master needs to duplicate data to slaves
• Sharding (partitioning)
  • Scales well for both reads and writes
  • Not transparent: the application needs to be partition-aware
  • Can no longer have relationships/joins across partitions
  • Loss of referential integrity across shards
• Other ways
  • Multi-master replication
  • INSERT only, not UPDATEs/DELETEs
  • No JOINs, thereby reducing query time
    • This involves de-normalizing data
  • In-memory databases
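The sharding points above (the application must be partition-aware, and there are no joins across partitions) can be sketched in a few lines. This is a toy model under stated assumptions: the shard names, the `shard_for` routing function, and the `ShardedStore` class are hypothetical, and plain dictionaries stand in for database servers.

```python
import hashlib

SHARDS = ["shard-0", "shard-1", "shard-2"]  # hypothetical node names


def shard_for(key: str) -> str:
    """Route a key to a shard using a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return SHARDS[int.from_bytes(digest[:8], "big") % len(SHARDS)]


class ShardedStore:
    """Each shard is modelled as a dict; a real system would use servers."""

    def __init__(self):
        self.nodes = {name: {} for name in SHARDS}

    def put(self, key, value):
        # The application (not the database) decides where the row lives.
        self.nodes[shard_for(key)][key] = value

    def get(self, key):
        # A single-key read touches exactly one shard.
        return self.nodes[shard_for(key)].get(key)

    def scan_all(self):
        # Anything spanning keys must visit every shard and merge
        # the results in application code.
        merged = {}
        for node in self.nodes.values():
            merged.update(node)
        return merged


store = ShardedStore()
for user in ["alice", "bob", "carol", "dave"]:
    store.put(user, {"name": user})

print(store.get("alice"))
print({name: len(rows) for name, rows in store.nodes.items()})
```

Simple modulo routing like this reshuffles most keys whenever a shard is added or removed, which is one reason real systems tend to use schemes such as consistent hashing.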
NOSQL (1)
• The name:
  • Stands for Not Only SQL
  • The term NOSQL was introduced by Carlo Strozzi in 1998 to name his file-based
    database
  • It was reintroduced by Eric Evans when an event was organized to
    discuss open source distributed databases
  • Eric states that “… but the whole point of seeking alternatives is that you need
    to solve a problem that relational databases are a bad fit for. …”

NOSQL (2)

• Key features (also advantages):
  • Non-relational
  • Doesn’t require a schema
  • Data are replicated to multiple nodes (so, identical and fault-tolerant)
    and can be partitioned:
    • Down nodes are easily replaced
    • No single point of failure
  • Horizontally scalable
  • Cheap, easy to implement (open-source)
  • Massive write performance
  • Fast key-value access

NOSQL (3)
• Disadvantages:
  • Doesn’t fully support relational features
    • No join, group by, or order by operations (except within partitions)
    • No referential integrity constraints across partitions
  • No declarative query language (e.g., SQL) → more programming
  • Relaxed ACID (see CAP theorem) → fewer guarantees
  • No easy integration with other applications that support SQL

NOSQL categories

1. Key-value
   • Example: DynamoDB, Voldemort, Scalaris
2. Document-based
   • Example: MongoDB, CouchDB
3. Column-based
   • Example: BigTable, Cassandra, HBase
4. Graph-based
   • Example: Neo4J, InfoGrid
• “No-schema” is a common characteristic of most NOSQL storage
  systems
• Provide “flexible” data types

Key-value
• Focus on scaling to huge amounts of data
• Designed to handle massive load
• Based on Amazon’s Dynamo paper
• Data model: (global) collection of key-value pairs
• Dynamo ring partitioning and replication
• Example (DynamoDB):
  • Items have one or more attributes (name, value)
  • An attribute can be single-valued or multi-valued, like a set
  • Items are combined into a table
• Basic API access:
  • get(key): extract the value given a key
  • put(key, value): create or update the value given its key
  • delete(key): remove the key and its associated value
  • execute(key, operation, parameters): invoke an operation on the value (given its key), which is a
    special data structure (e.g. List, Set, Map, etc.)
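The four API calls above can be mimicked with a minimal in-memory sketch. It is not the API of DynamoDB or any other product: the `KeyValueStore` class is hypothetical, a Python dict stands in for the distributed store, and `execute` simply calls a method on the stored value.

```python
class KeyValueStore:
    """Toy single-node key-value store exposing get/put/delete/execute."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        """Extract the value given a key (None if the key is absent)."""
        return self._data.get(key)

    def put(self, key, value):
        """Create or update the value given its key."""
        self._data[key] = value

    def delete(self, key):
        """Remove the key and its associated value, if present."""
        self._data.pop(key, None)

    def execute(self, key, operation, *parameters):
        """Invoke an operation on the stored value (e.g. a list or set)."""
        value = self._data[key]
        return getattr(value, operation)(*parameters)


store = KeyValueStore()
store.put("user:1", {"name": "Jaroslav"})
store.put("cart:1", ["book"])

# Modify the stored list in place instead of reading and rewriting it.
store.execute("cart:1", "append", "pen")
cart = store.get("cart:1")

store.delete("user:1")
missing = store.get("user:1")

print(cart, missing)
```

Every access goes through the key; the store offers no way to ask "which carts contain a pen?" without scanning all values.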

Key-value
Pros:
• Very fast
• Very scalable (horizontally distributed to nodes based on key)
• Simple data model
• Eventual consistency
• Fault-tolerance

Cons:
• Can’t model more complex data structures such as objects

Document-based
• Can model more complex objects
• Inspired by Lotus Notes
• Data model: collection of documents
• Document: JSON (JavaScript Object Notation, a data model of key-value
  pairs that supports objects, records, structs, lists, arrays, maps, dates,
  and Booleans, with nesting), XML, or other semi-structured formats.
• Example (MongoDB) document:
  {Name: "Jaroslav",
   Address: "Malostranske nám. 25, 118 00 Praha 1",
   Grandchildren: {Claire: "7", Barbara: "6", Magda: "3", Kirsten: "1", Otis: "3", Richard: "1"},
   Phones: ["123-456-7890", "234-567-8963"]
  }
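The example document maps directly onto nested Python dictionaries and lists. The sketch below only mirrors the data model; it does not use MongoDB or its driver. The second document and the `find` helper are hypothetical additions that show documents in one collection need not share the same fields.

```python
import json

# A "collection" is modelled as a plain list of documents.
people = [
    {
        "Name": "Jaroslav",
        "Address": "Malostranske nám. 25, 118 00 Praha 1",
        "Grandchildren": {
            "Claire": "7", "Barbara": "6", "Magda": "3",
            "Kirsten": "1", "Otis": "3", "Richard": "1",
        },
        "Phones": ["123-456-7890", "234-567-8963"],
    },
    # Hypothetical second document: different fields, no fixed schema.
    {"Name": "Eva", "Phones": []},
]


def find(collection, **criteria):
    """Return documents whose top-level fields equal the given values."""
    return [
        doc for doc in collection
        if all(doc.get(field) == value for field, value in criteria.items())
    ]


match = find(people, Name="Jaroslav")
ages = match[0]["Grandchildren"]  # nested object reached in one step

# Documents serialize to JSON and back without loss.
round_trip = json.loads(json.dumps(people[0], ensure_ascii=False))

print(ages["Claire"], round_trip == people[0])
```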

Document-based

Name      | Producer  | Data model | Querying
----------|-----------|------------|---------
MongoDB   | 10gen     | object-structured documents stored in collections; each object has a primary key called ObjectId | manipulations with objects in collections (find object or objects via simple selections and logical expressions, delete, update)
Couchbase | Couchbase | document as a list of named (structured) items (JSON document) | by key and key range, views via JavaScript and MapReduce

Column-based
• Based on Google’s BigTable paper
• Like column-oriented relational databases (store data in column order) but with a
  twist
• Tables similar to an RDBMS, but able to handle semi-structured data
• Data model:
  • Collection of column families
  • Column family = (key, value) where value = set of related columns (standard, super)
  • Indexed by row key, column key, and timestamp
• Allow key-value pairs to be stored (and retrieved on key) in a massively parallel system
  • Storing principle: big hashed distributed tables
  • Properties: partitioning (horizontally and/or vertically), high availability, etc.
  • Completely transparent to the application

* Better term: extendible records

Column-based
• One column family can have a variable
  number of columns
• Cells within a column family are sorted “physically”
• Very sparse: most cells have null values
• Comparison: RDBMS vs column-based NOSQL
  • Query on multiple tables
    • RDBMS: must fetch data from several places on disk and glue it together
    • Column-based NOSQL: only fetches the column families of those columns that are required
      by a query (all columns in a column family are stored together on the disk, so
      multiple rows can be retrieved in one read operation → data locality)
• Example (Cassandra column family, timestamps removed for simplicity):
UserProfile = {
  Cassandra = { emailAddress: "casandra@apache.org", age: "20" }
  TerryCho = { emailAddress: "terry.cho@apache.org", gender: "male" }
  Cath = { emailAddress: "cath@apache.org", age: "20", gender: "female", address: "Seoul" }
}
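The `UserProfile` column family can be modelled as a dictionary of rows, where each row holds only the columns it actually has. This is a sketch of the data model only, not of Cassandra's storage engine or client API; the helper names `get_column` and `column_slice` are hypothetical.

```python
# Row key -> {column name -> value}; rows may have different columns (sparse).
user_profile = {
    "Cassandra": {"emailAddress": "casandra@apache.org", "age": "20"},
    "TerryCho": {"emailAddress": "terry.cho@apache.org", "gender": "male"},
    "Cath": {
        "emailAddress": "cath@apache.org",
        "age": "20",
        "gender": "female",
        "address": "Seoul",
    },
}


def get_column(column_family, row_key, column):
    """Look up one cell by row key and column key (None if the cell is absent)."""
    return column_family.get(row_key, {}).get(column)


def column_slice(column_family, column):
    """Collect one column across all rows, skipping rows that lack it."""
    return {
        row_key: columns[column]
        for row_key, columns in sorted(column_family.items())
        if column in columns
    }


ages = column_slice(user_profile, "age")
print(ages)
print(get_column(user_profile, "TerryCho", "age"))
```

Missing cells cost nothing here because they are simply not stored, which is how a very sparse table stays cheap.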
Column-based

Name       | Producer                     | Data model | Querying
-----------|------------------------------|------------|---------
BigTable   | Google                       | set of couples (key, {value}) | selection (by combination of row, column, and time stamp ranges)
HBase      | Apache                       | groups of columns (a BigTable clone) | JRuby IRB-based shell (similar to SQL)
Hypertable | Hypertable                   | like BigTable | HQL (Hypertable Query Language)
Cassandra  | Apache (originally Facebook) | columns, groups of columns corresponding to a key (supercolumns) | simple selections on key, range queries, column or columns ranges
PNUTS      | Yahoo                        | (hashed or ordered) tables, typed arrays, flexible schema | selection and projection from a single table (retrieve an arbitrary single record by primary key, range queries, complex predicates, ordering, top-k)

Graph-based
• Focus on modeling the structure of data (interconnectivity)
• Scales to the complexity of data
• Inspired by mathematical graph theory (G = (V, E))
• Data model:
  • (Property graph) nodes and edges
  • Nodes may have properties (including ID)
  • Edges may have labels or roles
  • Key-value pairs on both
• Interfaces and query languages vary
• Single-step vs path expressions vs full recursion
• Example:
  • Neo4j, FlockDB, Pregel, InfoGrid …
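A property graph and the difference between a single-step query and full recursion can be sketched without any graph database. The node and edge data, the `KNOWS` label, and the two helper functions are hypothetical; real systems such as Neo4j expose their own query languages instead.

```python
from collections import deque

# Nodes: id -> properties (key-value pairs).
nodes = {
    1: {"label": "Person", "name": "Ann"},
    2: {"label": "Person", "name": "Bob"},
    3: {"label": "Person", "name": "Cid"},
}

# Edges: (source id, label, target id, properties).
edges = [
    (1, "KNOWS", 2, {"since": 2019}),
    (2, "KNOWS", 3, {"since": 2021}),
    (3, "KNOWS", 1, {"since": 2022}),  # closes a cycle
]


def neighbors(node_id, label):
    """Single-step query: follow one outgoing edge with the given label."""
    return [dst for src, lab, dst, _props in edges if src == node_id and lab == label]


def reachable(start, label):
    """Full recursion: every node reachable over edges with the given label."""
    seen = {start}
    queue = deque([start])
    order = []
    while queue:
        current = queue.popleft()
        for nxt in neighbors(current, label):
            if nxt not in seen:  # the seen set stops cycles from looping forever
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


direct = neighbors(1, "KNOWS")
everyone = [nodes[n]["name"] for n in reachable(1, "KNOWS")]
print(direct, everyone)
```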

Conclusion

• NOSQL databases cover only a part of data-intensive cloud applications
  (mainly Web applications)
• Problems with cloud computing:
  • SaaS (Software as a Service, or on-demand software) applications require
    enterprise-level functionality, including ACID transactions, security, and other features
    associated with commercial RDBMS technology, i.e. NOSQL should not be the only
    option in the cloud
• Hybrid solutions:
  • Voldemort with MySQL as one of its storage backends
  • Deal with NOSQL data as semi-structured data
    → integrating RDBMS and NOSQL via SQL/XML

Main Reference
1. Chapter 1: The Complete chapter
2. Chapter 24: Introduction to NOSQL
(Fundamentals of Database Systems, Global Edition, 7th Edition (2017) by Ramez
Elmasri & Shamkant Navathe)

Additional References
Saudi Digital Library:
https://sdl.edu.sa/sdlportal/en/publishers.aspx
Introduction to Database Management Systems:
https://courses.cs.vt.edu/cs4604/Spring21/pdfs/1-intro.pdf

Thank You
