
• Standard dimensionality reduction methods

  - Singular value decomposition (SVD)


A = U S Vᵀ   (A: m x n,  U: m x r,  S: r x r,  Vᵀ: r x n)

• A: input data matrix: an m x n matrix (e.g., m documents, n terms)
  (r: rank of the matrix A; often r < min(m, n))
• U: left singular vectors: an m x r matrix (m documents, r concepts)
• S: singular values: an r x r diagonal matrix (strength of each 'concept')
• V: right singular vectors: an n x r matrix (n terms, r concepts)

• U, V: column orthonormal (checked in the numpy sketch below)
  - UᵀU = I; VᵀV = I (I: identity matrix)
  - Columns are orthogonal unit vectors, hence they define an r-dimensional subspace
  - U defines an r-dim subspace in Rᵐ
  - V defines an r-dim subspace in Rⁿ
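A minimal numpy sketch of these properties; the 6 x 4 matrix here is hypothetical random data, not the course example:

import numpy as np

m, n = 6, 4
A = np.random.rand(m, n)                  # hypothetical data matrix

# Thin SVD: U is m x k, s holds the k singular values, Vt is k x n,
# where k = min(m, n)
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Columns of U and V are orthonormal: U^T U = I and V^T V = I
print(np.allclose(U.T @ U, np.eye(U.shape[1])))     # True
print(np.allclose(Vt @ Vt.T, np.eye(Vt.shape[0])))  # True (Vt @ Vt.T = V^T V)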

• Projecting A onto V and U produces embeddings (see the sketch below):
  - Since A = U S Vᵀ, AV = U S gives the row embeddings
  - Since A = U S Vᵀ, UᵀA = S Vᵀ gives the column embeddings
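Both identities can be checked directly in numpy; a self-contained sketch, again on hypothetical random data:

import numpy as np

A = np.random.rand(6, 4)                  # hypothetical data matrix
U, s, Vt = np.linalg.svd(A, full_matrices=False)
V, S = Vt.T, np.diag(s)

# Row embeddings: projecting the rows of A onto V gives U S
print(np.allclose(A @ V, U @ S))          # True

# Column embeddings: projecting the columns of A onto U gives S V^T
print(np.allclose(U.T @ A, S @ Vt))       # True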

Ex: compute document & word embeddings

Step 1: given a corpus of documents, convert it to BOW vectors → get a document-term matrix
  - Use term frequencies (tf), or normalize using tf-idf
             data  science  spark  Stanford  learning
document 1     10       15      3         0        10
document 2      0        9      2         8         2
document 3      1        2     20         0         4
document 4     14       11      1        32         2
document 5      5        1      7        12         5
document 6      6        3      5         1         1
document 7      2        3      5         2         7
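For the steps that follow, one way to hold this table in code is a plain numpy array (rows are documents 1-7, columns the five terms):

import numpy as np

# Document-term matrix from the table above
# columns: data, science, spark, Stanford, learning
A = np.array([
    [10, 15,  3,  0, 10],
    [ 0,  9,  2,  8,  2],
    [ 1,  2, 20,  0,  4],
    [14, 11,  1, 32,  2],
    [ 5,  1,  7, 12,  5],
    [ 6,  3,  5,  1,  1],
    [ 2,  3,  5,  2,  7],
])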

Step 2: apply SVD to the document-term matrix
and pick a value r ≤ rank(A).
Here, we set r = 3.

A ≈ U x S x Vᵀ:

A (7 x 5):
10 15  3  0 10
 0  9  2  8  2
 1  2 20  0  4
14 11  1 32  2
 5  1  7 12  5
 6  3  5  1  1
 2  3  5  2  7

U (7 x 3):
-0.30  0.41 -0.79
-0.25  0.03 -0.12
-0.14  0.74  0.50
-0.83 -0.40  0.12
-0.33  0.11  0.31
-0.13  0.20 -0.04
-0.14  0.27 -0.05

S (3 x 3):
42.7    0    0
   0 23.8    0
   0    0 16.7

Vᵀ (3 x 5):
-0.41 -0.40 -0.20 -0.77 -0.20
 0.06  0.21  0.78 -0.45  0.37
-0.26 -0.63  0.55  0.40 -0.28
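This step can be reproduced in numpy, up to rounding and up to the sign of each singular vector (an SVD is only determined up to a ±1 flip of paired left/right vectors):

import numpy as np

A = np.array([[10, 15,  3,  0, 10],
              [ 0,  9,  2,  8,  2],
              [ 1,  2, 20,  0,  4],
              [14, 11,  1, 32,  2],
              [ 5,  1,  7, 12,  5],
              [ 6,  3,  5,  1,  1],
              [ 2,  3,  5,  2,  7]])

U, s, Vt = np.linalg.svd(A, full_matrices=False)  # s is sorted in decreasing order

r = 3                                    # keep the top-3 'concepts'
Ur, sr, Vtr = U[:, :r], s[:r], Vt[:r, :]

# Rank-3 approximation of A
A3 = Ur @ np.diag(sr) @ Vtr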
Step 3: compute the embedding of each document as
emb_doc = [<doc, v1>, <doc, v2>, <doc, v3>]
where v1, v2, v3 are the columns of V (the rows of Vᵀ above).

For document 1:
• <doc, v1> = <[10, 15, 3, 0, 10], v1> = -12.7
• <doc, v2> = <[10, 15, 3, 0, 10], v2> = 9.79
• <doc, v3> = <[10, 15, 3, 0, 10], v3> = -13.9

emb1 = [-12.7, 9.79, -13.9]
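A sketch of this step in numpy, computing all seven document embeddings at once as the rows of A Vr (which equal the rows of Ur diag(sr)); signs and low-order digits may differ from the slide's printed values:

import numpy as np

A = np.array([[10, 15,  3,  0, 10],
              [ 0,  9,  2,  8,  2],
              [ 1,  2, 20,  0,  4],
              [14, 11,  1, 32,  2],
              [ 5,  1,  7, 12,  5],
              [ 6,  3,  5,  1,  1],
              [ 2,  3,  5,  2,  7]])

U, s, Vt = np.linalg.svd(A, full_matrices=False)
Vr = Vt[:3, :].T                # 5 x 3: columns are v1, v2, v3

doc_emb = A @ Vr                # 7 x 3: one 3-d embedding per document
print(doc_emb[0])               # slide reports emb1 = [-12.7, 9.79, -13.9];
                                # expect a match up to sign and rounding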
