You are on page 1 of 4

Web Mining

19BCE2483

Anubhav Bhandary

prob.1.

Vectorize the contents of the web pages and create a data frame for at least 5 web pages.

———————

Code:

import re

import pandas as pd

from bs4 import BeautifulSoup

from nltk.tokenize import word_tokenize

from nltk.corpus import stopwords

from urllib import request

# Problem 1: fetch several web pages, tokenize and clean their text, and
# build a term-frequency DataFrame (one row per page, one column per word).

# English stopwords to drop from every page's token stream.
sw = set(stopwords.words('english'))

urls = ["https://vit.ac.in/academics/home",
        "https://vit.ac.in/admissions/international/overview",
        "https://vit.ac.in/internationalrelations/SAP",
        "https://vit.ac.in/placements/internship",
        "https://vit.ac.in/academics/library"]

# One cleaned token list per page.  (Renamed from `list`, which shadowed
# the builtin and would have broken any later call such as list(...).)
page_tokens = []
for url in urls:
    html = request.urlopen(url).read().decode('utf8')
    raw = BeautifulSoup(html, 'html.parser').get_text()
    print(len(raw))  # rough size of the extracted text, for inspection
    cleaned = []
    for token in word_tokenize(raw):
        # Collapse non-alphanumeric runs to spaces, then trim.
        word = re.sub(r"[^a-zA-Z0-9]+", " ", token).strip()
        # Keep only meaningful words: longer than 2 chars and not a stopword.
        if len(word) > 2 and word not in sw:
            cleaned.append(word)
    page_tokens.append(cleaned)

# Vocabulary = union of cleaned tokens across all pages.
vocab = set()
for tokens in page_tokens:
    vocab |= set(tokens)

# Term-frequency table: d[word][i] = count of `word` on page i.
# Iterating page_tokens (instead of the original hard-coded range(5))
# keeps the script correct if the url list changes length.
d = {word: [tokens.count(word) for tokens in page_tokens] for word in vocab}

df = pd.DataFrame(d)
# Fixed: the original line used a garbled curly quote (to_csv(‘fileA.csv')),
# which is a syntax error.
df.to_csv('fileA.csv')

Output: (The screenshot of the spreadsheet is not complete.)

Prob.2.

import pandas as pd
import math
import numpy as np


def log_scale(df):
    """Dampen raw term frequencies with log(1 + tf), cell by cell, in place.

    Returns the same DataFrame for convenience.
    """
    # len(df) replaces the original hard-coded 5 rows.
    for i in range(len(df)):
        for j in range(len(df.columns)):
            df.iloc[i, j] = math.log(1 + df.iloc[i, j])
    return df


def cosine(a, b):
    """Cosine similarity of two equal-length numeric vectors."""
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))


def most_similar_pair(vectors, verbose=False):
    """Return (i, j, score) for the most similar pair of row vectors.

    score is rounded to 5 decimals, matching the original output format.
    Returns None when no pair has positive similarity (the original code
    raised NameError on `similar_d` in that case).
    If verbose, prints every pair's score as the original script did.
    """
    best = None
    best_score = 0
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            score = cosine(vectors[i], vectors[j])
            if score > best_score:
                best = (i, j, round(score, 5))
                best_score = score
            if verbose:
                print(i, j, round(score, 5))
    return best


if __name__ == "__main__":
    # index_col=0 keeps the CSV's row-index column out of the term vectors.
    # The original read it back as an extra "Unnamed: 0" feature, which
    # silently skewed every cosine similarity.
    df = pd.read_csv("fileA.csv", index_col=0)
    log_scale(df)
    print(df.head())
    # One log-scaled term vector per document row.
    vectors = [list(df.iloc[i]) for i in range(len(df))]
    print("The most similar document is : ", most_similar_pair(vectors, verbose=True))

Output:

You might also like