
INFORMATION

RETRIEVAL TECHNIQUES
Lecture # 14

• Index Construction

ACKNOWLEDGEMENTS
The presentation of this lecture has been taken from the following
sources
1. “Introduction to information retrieval” by Prabhakar Raghavan,
Christopher D. Manning, and Hinrich Schütze
2. “Managing gigabytes” by Ian H. Witten, Alistair Moffat, Timothy C.
Bell
3. “Modern information retrieval” by Ricardo Baeza-Yates and Berthier Ribeiro-Neto
4. “Web Information Retrieval” by Stefano Ceri, Alessandro Bozzon,
Marco Brambilla
Outline
• Scaling index construction
• Memory Hierarchy
• Hard Disk Tracks and Sectors
• Hard Disk Blocks
• Hardware basics

Scaling index construction
• In-memory index construction does not scale
• Can’t stuff entire collection into memory, sort, then write back
• How can we construct an index for very large collections?
• Taking into account the hardware constraints we just learned
about . . .
• Memory, disk, speed, etc.
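One common answer to this question, sketched below under assumed details (the block size, file handling, and helper names are illustrative, not a prescribed implementation), is blocked, sort-based construction: build postings for one block of documents at a time in memory, write each block's sorted run to disk, then merge the runs.

```python
import heapq
import itertools
import os
import tempfile

def build_index(docs, block_size=2):
    """Blocked, sort-based index construction (illustrative sketch):
    accumulate (term, doc_id) pairs for a block of documents in memory,
    write each block's sorted postings to a disk run, then k-way merge
    the runs into one inverted index."""
    run_files = []
    for start in range(0, len(docs), block_size):
        # Collect and sort the (term, doc_id) pairs for this block.
        pairs = sorted(
            {(term, doc_id)
             for doc_id, text in docs[start:start + block_size]
             for term in text.lower().split()})
        fd, path = tempfile.mkstemp(text=True)
        with os.fdopen(fd, "w") as f:
            f.writelines(f"{t}\t{d}\n" for t, d in pairs)
        run_files.append(path)

    def read_run(path):
        # Stream one sorted run back from disk.
        with open(path) as f:
            for line in f:
                term, doc_id = line.rstrip("\n").split("\t")
                yield term, int(doc_id)

    # Merge all runs and group consecutive postings by term.
    index = {}
    merged = heapq.merge(*(read_run(p) for p in run_files))
    for term, group in itertools.groupby(merged, key=lambda pair: pair[0]):
        index[term] = sorted({doc_id for _, doc_id in group})
    for p in run_files:
        os.remove(p)
    return index

docs = [(1, "new home sales top forecasts"),
        (2, "home sales rise in july"),
        (3, "increase in home sales in july")]
index = build_index(docs)
print(index["home"])  # [1, 2, 3]
```

Because each run is written sequentially and merged sequentially, disk access stays block-friendly, which is exactly what the hardware constraints below reward.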
Memory Hierarchy

Moore’s Law

Hard Disk Tracks and Sectors

Hard Disk Blocks

Disk Access Time
• Access time = (seek time) + (rotational delay) + (transfer time)
• Seek time – moving the head to the right track
• Rotational delay – wait until the right sector comes below the head
• Transfer time – read/transfer the data

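The formula above can be checked with a back-of-the-envelope calculation. The numbers here are assumptions (a typical 7200 RPM drive, so the expected rotational delay is half a rotation; a 100 MB/s transfer rate), not figures from the slide:

```python
# Rough disk access time estimate with assumed, but typical, parameters.
seek_time = 5e-3            # 5 ms to move the head to the right track
rotational_delay = 4.17e-3  # half a rotation at 7200 RPM: 0.5 * 60/7200 s
transfer_rate = 100e6       # 100 MB/s sustained transfer (assumed)
bytes_to_read = 64 * 1024   # one 64 KB block

access_time = seek_time + rotational_delay + bytes_to_read / transfer_rate
print(f"{access_time * 1e3:.2f} ms")  # prints "9.83 ms"
```

Note that the seek and rotational delay dominate: the actual data transfer contributes well under a millisecond here.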
Hardware basics
• Access to data in memory is much faster than access to data on disk.
• Disk seeks: No data is transferred from disk while the disk head is
being positioned.
• Therefore: Transferring one large chunk of data from disk to memory
is faster than transferring many small chunks.
• Disk I/O is block-based: Reading and writing of entire blocks (as
opposed to smaller chunks).
• Block sizes: 8 KB to 256 KB.
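The "one large chunk beats many small chunks" claim follows directly from the fact that every separate read pays a seek. A quick comparison, using assumed parameter values in the same range as the hardware table later in the lecture:

```python
seek_time = 5e-3                # 5 ms per seek (assumed)
per_byte_transfer = 2e-8        # 0.02 µs per byte (assumed)
total_bytes = 10 * 1024 * 1024  # 10 MB to read either way

# One contiguous read: a single seek, then stream all the data.
one_chunk = seek_time + total_bytes * per_byte_transfer

# 1000 scattered 10 KB reads: a seek before every chunk.
n_chunks = 1000
many_chunks = n_chunks * (seek_time
                          + (total_bytes / n_chunks) * per_byte_transfer)

print(f"one chunk: {one_chunk:.3f} s, many chunks: {many_chunks:.3f} s")
```

The transfer time is identical in both cases; the 999 extra seeks make the scattered version roughly 24 times slower under these assumptions.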
Inverted Index

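As a reminder of the structure being built, here is a minimal inverted index: a dictionary mapping each term to a sorted postings list of document IDs, plus the standard two-pointer intersection for Boolean AND queries. The toy documents are illustrative only:

```python
from collections import defaultdict

docs = {1: "new home sales top forecasts",
        2: "home sales rise in july",
        3: "increase in home sales in july"}

# Dictionary: term -> sorted postings list of doc IDs.
index = defaultdict(list)
for doc_id in sorted(docs):
    for term in sorted(set(docs[doc_id].split())):
        index[term].append(doc_id)

def intersect(p1, p2):
    """Merge-style intersection of two sorted postings lists."""
    result, i, j = [], 0, 0
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            result.append(p1[i]); i += 1; j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return result

print(intersect(index["sales"], index["july"]))  # [2, 3]
```

Postings lists are kept sorted by doc ID precisely so that this linear-time merge intersection works.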
Hardware basics (Cont.)
• Servers used in IR systems now typically have several GB of main
memory, sometimes tens of GB.
• Available disk space is several (2–3) orders of magnitude larger.
• Fault tolerance is very expensive: It’s much cheaper to use many
regular machines rather than one fault tolerant machine.
Distributed computing
• Distributed computing is a field of computer science that studies
distributed systems. A distributed system is a software system in
which components located on networked computers communicate
and coordinate their actions by passing messages. (Wikipedia)

Hardware assumptions
symbol   statistic                                           value
s        average seek time                                   5 ms = 5 × 10⁻³ s
b        transfer time per byte                              0.02 μs = 2 × 10⁻⁸ s
         processor’s clock rate                              10⁹ s⁻¹
p        low-level operation (e.g., compare & swap a word)   0.01 μs = 10⁻⁸ s
         size of main memory                                 several GB
         size of disk space                                  1 TB or more
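These symbols can be plugged directly into cost estimates. For example, using s and b from the table (the 100 MB file size is an assumption for illustration), a sequential pass over a postings file costs one seek plus the per-byte transfer, while p prices the equivalent amount of in-memory work:

```python
s = 5e-3   # average seek time (from the table)
b = 2e-8   # transfer time per byte (from the table)
p = 1e-8   # time per low-level operation, e.g. compare & swap (from the table)

# Time to read an assumed 100 MB postings file in one sequential pass:
n_bytes = 100 * 10**6
read_time = s + n_bytes * b     # one seek, then stream the bytes

# Compare: performing 100 million low-level in-memory operations.
compute_time = n_bytes * p

print(f"read: {read_time:.3f} s, compute: {compute_time:.3f} s")
```

Even a single sequential disk pass costs about twice as much as 10⁸ in-memory operations here, which is why index construction algorithms minimize disk passes above all else.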
Resources for today’s lecture
• Chapter 4 of IIR
• MG Chapter 5
• Original publication on MapReduce: Dean and Ghemawat (2004)
• Original publication on SPIMI: Heinz and Zobel (2003)
