FILTERING
A SEMINAR REPORT
Submitted by
G JAYANDRAN [RA1811027020039]
M MUSTAFA [RA1811027020041]
A RUSHAB [RA1811027020046]
Under the guidance of
Mrs. R. SUJEETHA
(Assistant Professor, Department of Computer Science and Engineering)
degree of
BACHELOR OF TECHNOLOGY
in
BONAFIDE CERTIFICATE
SIGNATURE SIGNATURE
Submitted for the Viva Voce Examination held on 07-12-2020 at SRM Institute of Science
and Technology , Ramapuram Campus, Chennai -600089
• In an era of advanced technology, people rely heavily on modern applications for buying
accessories, watching movies, finding learning resources, and meeting other daily needs.
As more goods and services become available online, organizations increasingly rely on
machine-learning-based technologies. These technologies help them reach their target
users with less effort than earlier methods of advertisement. They also improve the user
experience by recommending related products, so that users do not have to turn to
competitors to find what they want. We are developing a system that understands the
user's requirements and recommends products and services by comparing the user's
searches with their previous history. The model compares several machine learning
algorithms for recommending products from users' buying patterns and returns more
accurate results for a search. It also keeps the user's buying and viewing history, which
makes it easier for the model to predict and recommend the best option.
• Key Words: Machine Learning, Filtering, Recommendation System
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
LIST OF ABBREVIATIONS
1. INTRODUCTION
1.1 Background
1.2 Problem Statement
1.3 Objective
1.4 Scope of Project
2. LITERATURE REVIEW
3. SYSTEM SPECIFICATION
LIST OF TABLES
CHAPTER 1
INTRODUCTION
1.1 Background
A recommender system, or recommendation system, is a subclass of information filtering
system that seeks to predict the "rating" or "preference" a user would give to an item. Such
systems are primarily used in commercial applications. Collaborative filtering refers to
user-to-user association. It follows the idea that if two or more people have identical
interests in one area, they are likely to be attracted to similar products or items in some
other category as well. Both implicit and explicit user ratings are used to compute the
similarity between two or more users. Implicit ratings are derived from the user's browsing
pattern and click-through rate, whereas explicit ratings are given by the users themselves.
The "people you may know", suggested posts, similar pages you may like, and suggested pokes
displayed on Facebook are examples of collaborative filtering. These recommendations are
based on features such as the number of mutual friends, similar pages liked, the number of
mutual groups, and the locations a user has been to or belongs to. For example, if two
users have mutual friends, then it is possible that the two may know each other as
1.3 Objective
The objective of our project is to give course recommendations using collaborative
filtering.
The objective of recommender systems is to provide recommendations based on
recorded information on the users' preferences. These systems use information
filtering techniques to process information and provide the user with potentially more
relevant items.
A recommender system can predict whether a particular user would prefer an item
based on the user's profile. Recommender systems are beneficial to both service
providers and users. They reduce the transaction costs of finding and selecting items
in an online shopping environment.
CHAPTER 2
LITERATURE REVIEW
2.1 LITERATURE OVERVIEW
To build a system that can automatically recommend items to users based on the preferences
of other users, the first step is to find similar users or items. The second step is to predict the
ratings of the items that are not yet rated by a user. So, you will need the answers to these
questions:
How do you determine which users or items are similar to one another?
Given that you know which users are similar, how do you determine the rating that a user
would give to an item based on the ratings of similar users?
How do you measure the accuracy of the ratings you calculate?
• The first two questions don’t have single answers. Collaborative filtering is a family of
algorithms where there are multiple ways to find similar users or items and multiple ways to
calculate rating based on ratings of similar users. Depending on the choices you make, you
end up with a type of collaborative filtering approach.
• One important thing to keep in mind is that in an approach based purely on collaborative
filtering, the similarity is not calculated using factors like the age of users, genre of the
movie, or any other data about users or items. It is calculated only on the basis of the rating
(explicit or implicit) a user gives to an item.
• The third question for how to measure the accuracy of your predictions also has multiple
answers, which include error calculation techniques that can be used in many places and not
just recommenders based on collaborative filtering.
• One of the approaches to measure the accuracy of your result is the Root Mean Square
Error (RMSE), in which you predict ratings for a test dataset of user-item pairs whose rating
values are already known. The difference between the known value and the predicted value
would be the error. Square all the error values for the test set, find the average (or mean), and
then take the square root of that average to get the RMSE.
• Another metric to measure accuracy is the Mean Absolute Error (MAE), in which you find
the magnitude of each error by taking its absolute value and then average all the error
values.
<name> (2020) incorporated collaborative filtering following two approaches, the
Memory-Based approach and the Model-Based approach, to achieve an efficient active
filtering system for recommending movies.
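The two error metrics described above, RMSE and MAE, can be sketched in a few lines of Python. This is a minimal illustration only: the function names and the small test set of known and predicted ratings are hypothetical and are not taken from the project.

```python
import math

def rmse(actual, predicted):
    """Root Mean Square Error: square each error, average, take the square root."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

def mae(actual, predicted):
    """Mean Absolute Error: average of the absolute errors."""
    absolute_errors = [abs(a - p) for a, p in zip(actual, predicted)]
    return sum(absolute_errors) / len(absolute_errors)

# Hypothetical test set of user-item pairs whose ratings are already known
known_ratings = [4.0, 3.0, 5.0, 2.0]
predicted_ratings = [3.0, 3.0, 4.0, 4.0]

rmse_value = rmse(known_ratings, predicted_ratings)
mae_value = mae(known_ratings, predicted_ratings)
print("RMSE:", rmse_value)
print("MAE:", mae_value)
```

Because RMSE squares each error before averaging, the single large error (2 versus 4) weighs more heavily in RMSE than in MAE.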
The Memory-Based approach takes a matrix of users' preferences for items and uses this
matrix to predict missing preferences and recommend the items with high predicted values.
Simply stated, it covers “Item-Item CF and User-Item CF”.
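As a rough sketch of how a memory-based method can fill in a missing preference, the example below predicts one user's rating for an item as a similarity-weighted average of other users' ratings. The rating data, the function names, and the choice of cosine similarity over co-rated items are all assumptions made for illustration; the paper under review may use a different weighting.

```python
import math

# Hypothetical user-item rating matrix; a missing key means "not rated yet"
ratings = {
    "alice": {"A": 5, "B": 3, "C": 4},
    "bob":   {"A": 4, "B": 2, "C": 5, "D": 4},
    "carol": {"A": 1, "B": 5, "D": 2},
}

def cosine_similarity(u, v):
    """Cosine similarity computed over the items both users have rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = math.sqrt(sum(u[i] ** 2 for i in common))
    norm_v = math.sqrt(sum(v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v)

def predict_rating(user, item, ratings):
    """Similarity-weighted average of the ratings other users gave to the item."""
    numerator = 0.0
    denominator = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        sim = cosine_similarity(ratings[user], other_ratings)
        numerator += sim * other_ratings[item]
        denominator += abs(sim)
    return numerator / denominator if denominator else None

predicted = predict_rating("alice", "D", ratings)
print("Predicted rating of alice for D:", predicted)
```

Here bob's ratings resemble alice's more closely than carol's do, so the prediction lands nearer to bob's rating of 4 than to carol's rating of 2.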
The Model-Based approach, also described as a KNN collaborative system, uses cosine
similarity to calculate the distance between the target movie and every other movie in the
dataset, and then ranks the top ‘K’ nearest (most similar) movies. This methodology was
implemented using two datasets.
Item-based CF obtained the better results. The system proposed by the paper was observed
to be more reliable and accurate than the existing system. The KNN approach, on the other
hand, turned out to be faster and predictive in nature, with a low calculation time.
The paper explains content-based filtering but focuses mainly on collaborative filtering
and its two popular approaches:
1. Memory-Based Approach: It takes a matrix of users' preferences for items and uses this
matrix to predict missing preferences and recommend the items with high predicted values.
Simply stated, it covers “Item-Item CF and User-Item CF”.
2. Model-Based Approach: A KNN collaborative recommendation system is proposed that uses
cosine similarity to calculate the distance between one course and every other course in
the dataset and then ranks the top ‘K’ nearest (most similar) courses.
The paper also implements the proposed system with two datasets.
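The KNN ranking described above can be sketched as follows: each course is represented by the vector of ratings it received, cosine similarity is computed between the target course and every other course, and the ‘K’ most similar courses are returned. The course names, rating vectors, and function names are hypothetical and serve only as an illustration.

```python
import math

# Hypothetical rating vectors: one entry per user, 0 means "not rated"
course_vectors = {
    "Python Basics":    [5, 4, 0, 1],
    "Data Structures":  [4, 5, 1, 0],
    "Machine Learning": [5, 5, 0, 0],
    "Art History":      [0, 1, 5, 4],
}

def cosine_similarity(a, b):
    """Cosine of the angle between two rating vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def top_k_similar(target, vectors, k):
    """Rank every other course by similarity to the target and keep the top k."""
    scores = [
        (cosine_similarity(vectors[target], vec), name)
        for name, vec in vectors.items()
        if name != target
    ]
    scores.sort(reverse=True)
    return [name for _score, name in scores[:k]]

neighbours = top_k_similar("Python Basics", course_vectors, k=2)
print(neighbours)
```

Courses rated highly by the same users end up with nearly parallel vectors, so they rank first; a course liked by a different group of users ranks last.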
E-learning systems have been around for years, providing students and learners with virtual
educational environments in which they do not need others' assistance in the process of
learning. The major goal of such systems is to improve the education and learning levels of
users by leveraging the system’s facilities. The core component of a working and efficient
e-learning system is its recommender system. Because personalization is so important for
e-learning platforms, developing recommender systems for them has become one of the most
important research areas in the field. We use hybrid recommender systems. The architecture
of the proposed e-learning system is combined with a neural-network-based matching
algorithm powered by user-specific patterns and system capabilities. The advantage of the
hybrid approach is the presence of a knowledge base that supports the decision-making
process and helps to improve it. However, hidden patterns in the system’s log and in users’
behavior are not explored in this architecture.
2.2 WHAT IS RECOMMENDATION SYSTEM & ITS TYPES
A recommendation engine is a system that suggests products, services, information to users
based on analysis of data. Notwithstanding, the recommendation can derive from a variety of
Recommendation systems are quickly becoming the primary way for users to expose to the
density and product overload, a recommendation engine provides an efficient way for
Types:
Collaborative filtering is one of the most sought-after, most widely implemented, and most
mature technologies available in the market. Collaborative recommender systems aggregate
ratings or recommendations of objects, recognize commonalities between users on the basis
of their ratings, and generate new recommendations based on inter-user comparisons. The
greatest strength of collaborative techniques is that they are completely independent of
any machine-readable representation of the objects being recommended, and they work well
for complex objects where variations in taste are responsible for much of the variation in
preferences. Collaborative filtering is based on the assumption that people who agreed in
the past will agree in the future and that they will like objects similar to those they
liked in the past.
This system aims to categorize users based on their attributes and make recommendations
based on demographic classes. Many industries have taken this kind of approach because it is
not very complex and is easy to implement. In a demographic-based recommender system, the
algorithms first need proper market research in the specified region, accompanied by a
short survey to gather data for categorization. Demographic techniques form
“people-to-people” correlations like collaborative ones, but use different data. The
benefit of a demographic approach is that it does not require a history of user ratings of
the kind needed by collaborative and content-based recommender systems.
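A demographic-based recommender of the kind described above can be sketched by grouping survey responses by demographic class and recommending the most popular items in the user's class. The survey rows, class labels, and function names below are hypothetical; note that no rating history for the individual user is needed.

```python
from collections import Counter, defaultdict

# Hypothetical survey data: (user, demographic class, item the user liked)
survey = [
    ("u1", "student", "Python Basics"),
    ("u2", "student", "Python Basics"),
    ("u3", "student", "Statistics"),
    ("u4", "professional", "Project Management"),
    ("u5", "professional", "Project Management"),
    ("u6", "professional", "Python Basics"),
]

def build_class_profiles(rows):
    """Count how often each item is liked within each demographic class."""
    profiles = defaultdict(Counter)
    for _user, demo_class, item in rows:
        profiles[demo_class][item] += 1
    return profiles

def recommend_for_class(profiles, demo_class, n=1):
    """Return the n most popular items for the given demographic class."""
    return [item for item, _count in profiles[demo_class].most_common(n)]

profiles = build_class_profiles(survey)
student_pick = recommend_for_class(profiles, "student")
professional_pick = recommend_for_class(profiles, "professional")
print(student_pick, professional_pick)
```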
Depending on the availability of training data and test data and on how the learning method
is evaluated, the following types of machine learning algorithms can be distinguished:
a) Supervised learning
In supervised learning (learning "with a teacher"), the algorithm receives training data in
which the output value for each input is known. This is one of the most popular learning
methods.
b) Unsupervised learning
In contrast to supervised learning, the algorithm receives training data that does not
specify which output value should be obtained from the input data. In this scenario,
assessing how well the algorithm has learned from the training data can be troublesome.
c) Semi-supervised learning
In semi-supervised learning, the training data consists of samples that have the expected
output value as well as samples that do not. This method is popular when the input data is
easy to obtain but the output data is much more expensive.
d) Reinforcement learning
The training and testing phases are combined in a reinforcement approach. The learning
algorithm collects data by interacting with the environment. Depending on the action taken,
it receives a reward or a penalty. The purpose of this method is to maximize the reward
received by the algorithm.
TABLE: -1
DESCRIPTION
The collaborative filtering technique has several advantages over other techniques.
Its main advantage over the content-based filtering technique is that it improves
the performance of the recommender system and gives better recommendations,
because it also considers the interests and past history of similar users
when making suggestions to the user.
CHAPTER 3
SYSTEM SPECIFICATION
3.1 System Requirements
To accomplish the project, we required:
Anaconda Navigator
Python 3.6
NumPy Library
PyAutoGUI Library
CHAPTER 4
IMPLEMENTATION
4.1 Explanation About Project
This project aims to develop a tool that recommends courses to users based on their
interests. People like to get recommendations easily, without any difficulty. For that
purpose, instead of relying on the internet or on other people's references, we built a
recommendation system using collaborative filtering, which gives users better choices for
studying. Using machine learning techniques and algorithms, the system reads the user's
inputs and recommends further courses. First, the user enters the subjects they already
know, together with a rating for each; the system then computes and displays the best
courses to learn next.
LIBRARIES
To build an active data filtering system of any type, certain support packages are
needed. The methodology that we propose in this project uses the Python programming
language and its libraries as the backbone. The libraries are installed using pip from the
terminal. The required libraries are:
1. NumPy
NumPy is a low-level library written in C (and Fortran) for high level mathematical
functions. NumPy cleverly overcomes the problem of running slower algorithms on
Python by using multidimensional arrays and functions that operate on arrays. Any
algorithm can then be expressed as a function on arrays, allowing the algorithms to be
run quickly.
2. SciPy
It is a library that uses NumPy for more mathematical functions. SciPy uses NumPy
arrays as the basic data structure, and comes with modules for various commonly used
tasks in scientific programming, including linear algebra, integration (calculus),
ordinary differential equation solving, and signal processing.
3. Pandas
Pandas is a data manipulation library based on NumPy which provides many useful
functions for accessing, indexing, merging, and grouping data easily. The main data
structure (Data Frame) is close to what could be found in the R statistical package;
that is, heterogeneous data tables with name indexing, time series operations, and
auto-alignment of data.
4.2 How to do it?
Open up the application.
Get suggestions.
Step-2: Create a new account and log in using the credentials. The credentials of the
newly created account are stored in an SQL database.
Step-3: Enter the previously studied subjects. The system uses this input to understand the
user's requirements and gives recommendations by comparing the input with the user's
previous history. The model compares various machine learning algorithms for recommendation
based on users' buying patterns and gives more accurate results for a search. It also keeps
the user's buying and viewing history, so it is easier for the model to predict and
recommend the best option.
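The account creation and login in Step-2 can be sketched with the sqlite3 module from the Python standard library. The appendix listing does not show the table layout or the queries, so the table name `user`, its columns, and the function names below are assumptions made for illustration. The sketch uses an in-memory database and stores the password as entered to keep the example short.

```python
import sqlite3

def create_user_table(db):
    # Hypothetical schema: one row per account, username must be unique
    db.execute(
        "CREATE TABLE IF NOT EXISTS user ("
        "username TEXT PRIMARY KEY, password TEXT NOT NULL)"
    )

def register(db, username, password):
    """Insert a new account; return False if the username is already taken."""
    try:
        db.execute(
            "INSERT INTO user (username, password) VALUES (?, ?)",
            (username, password),
        )
        db.commit()
        return True
    except sqlite3.IntegrityError:
        return False

def login(db, username, password):
    """Return True if a row with these credentials exists."""
    row = db.execute(
        "SELECT 1 FROM user WHERE username = ? AND password = ?",
        (username, password),
    ).fetchone()
    return row is not None

db = sqlite3.connect(":memory:")  # in-memory database for the example
create_user_table(db)
created = register(db, "student1", "pw123")
duplicate = register(db, "student1", "other")
login_ok = login(db, "student1", "pw123")
login_bad = login(db, "student1", "wrong")
print(created, duplicate, login_ok, login_bad)
```

The `?` placeholders let sqlite3 bind the values safely instead of building the SQL string by hand.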
4.7 RESULT
Login Page Interface: Figure 3
This is the interface shown when the user opens the application; the user clicks Login to
enter the credentials.
This is the interface where the user enters the credentials to reach the personal dashboard.
After Login click “Get Recommendations” Figure 5
This is the interface where the user needs to click Get Recommendations.
This is the interface for entering the knowledge already gained and rating it, so that the
application can use it.
After Entering Gained Knowledge Course Figure 7
After entering the gained knowledge and its rating, click Get Recommendations.
After clicking Get Recommendations, the output shows the courses you can study next, along
with some recommended courses that do not require prerequisite knowledge.
CHAPTER 5
CONCLUSION
Using machine learning techniques and algorithms, the system consolidates several levels of
filtering before recommending a course to the user. The data passes through each filtering
funnel (content-based filtering, collaborative filtering, and demographic-based filtering),
and the items with the highest favorable scores are then presented to the user. The hybrid
nature of the model helps it achieve robustness.
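The funnel described above can be illustrated with a small sketch in which candidate courses pass through three filters in turn and the survivors are ranked by a combined score. Everything in it is hypothetical: the course data, the field names, the threshold, and the way the score is combined are assumptions for illustration, not the scoring used in the project.

```python
# Hypothetical course catalogue with the signal each funnel stage needs
courses = {
    "Deep Learning":    {"topics": {"python", "ml"}, "peer_rating": 4.5,
                         "audience": {"student", "professional"}},
    "Web Design":       {"topics": {"html"}, "peer_rating": 4.8,
                         "audience": {"student"}},
    "Statistics":       {"topics": {"ml", "math"}, "peer_rating": 3.9,
                         "audience": {"student"}},
    "Data Engineering": {"topics": {"python", "sql"}, "peer_rating": 2.5,
                         "audience": {"professional"}},
}

def hybrid_recommend(courses, known_topics, demo_class,
                     min_peer_rating=3.0, top_n=2):
    # Funnel 1, content-based: keep courses sharing a topic with the user
    stage1 = {n: c for n, c in courses.items() if c["topics"] & known_topics}
    # Funnel 2, collaborative: keep courses that similar users rated well
    stage2 = {n: c for n, c in stage1.items()
              if c["peer_rating"] >= min_peer_rating}
    # Funnel 3, demographic: keep courses aimed at the user's class
    stage3 = {n: c for n, c in stage2.items() if demo_class in c["audience"]}

    # Assumed combined score: topic overlap plus peer rating
    def score(name):
        course = stage3[name]
        return len(course["topics"] & known_topics) + course["peer_rating"]

    return sorted(stage3, key=score, reverse=True)[:top_n]

picks = hybrid_recommend(courses, {"python", "ml"}, "student")
print(picks)
```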
FUTURE SCOPE
Since the recommender system is a project based on machine learning algorithms and
artificial intelligence, it requires a large amount of SQL database storage and should
provide close to 100% accuracy. Many enhancements remain to be done, such as:
• Data should be updated regularly.
• Since the data will grow in scale in the future, content-based filtering based on the
user's interests and previously gained knowledge will be required.
• The recommender system should give all the related outputs based on the user input.
• The application should be compatible with the latest technology and should be more user
friendly.
• Recommender systems are an important research field today. The rapid increase in data
size, such as the number of items and users across sites, calls for big data analysis
techniques such as Spark, MapReduce, and Apache Hadoop. The recommender system is used to
recommend courses to users according to their interests and previously gained knowledge.
APPENDIX
PYTHON CODE
import tkinter as tk
from tkinter import *
from tkinter import messagebox as ms
import sqlite3
from user_specific import *
from user_data import user_record
from similarity_matrix import sim_matrix
from inverted_index import List_of_Courses

# make database and users (if not exists already) table at programme start up
with sqlite3.connect('quit.db') as db:
    c = db.cursor()

def cleanup(array,array2):
    # Copy each line into array2 without its last character (the trailing newline)
    for items in array:
        array2.append(items[:len(items)-1])

# Module-level counters shared by the p1 page: i = current row index, j = vertical offset
i,j=0,0

class s(tk.Tk):
    # Main window: creates every page once and raises the one to be shown
    def __init__(self,*args,**kwargs):
        tk.Tk.__init__(self,*args,**kwargs)
        container=tk.Frame(self,bg="blue")
        container.pack(expand=True,fill="both")
        self.frame={}
        for F in (startpage,main,p1,cr,page2,logout):
            frame=F(container,self)
            self.frame[F]=frame
            frame.grid(row=0,column=0,sticky="nsew")
        self.show_frame(startpage)

    def show_frame(self,container):
        frame=self.frame[container]
        frame.tkraise()

class startpage(tk.Frame):
    def __init__(self,parent,controller):
        tk.Frame.__init__(self,parent)
        label=tk.Label(self,text="WELCOME!! CLICK BELOW TO GO TO LOGIN PAGE",
                       font=("Ariel",24),bg="white",fg="blue")
        label.pack(padx=300,pady=100)
        button1=Button(self,text="Login",font=("Ariel",24),bg="red",fg="black",
                       command=lambda: controller.show_frame(main))
        button1.pack(ipadx=200,pady=100)

class page2(tk.Frame):
    def __init__(self,parent,controller):
        tk.Frame.__init__(self,parent)
        button1=Button(self,text="Get recommendations",font=("Ariel",24),bg="white",fg="blue",
                       command=lambda: controller.show_frame(p1))
        button1.pack(padx=290,pady=20)
        button1=Button(self,text="Logout",font=("Ariel",24),bg="red",fg="black",
                       command=lambda: controller.show_frame(logout))
        button1.pack(ipadx=250,pady=300)

class logout(tk.Frame):
    def __init__(self,parent,controller):
        tk.Frame.__init__(self,parent)
        label1=Label(self,text="You have been successfully logged out",
                     font=("Ariel",24),bg="white",fg="blue")
        label1.pack(padx=290,pady=20)
        button1=Button(self,text="Click here to go to login again",
                       font=("Ariel",24),bg="red",fg="black",
                       command=lambda: controller.show_frame(main))
        button1.pack(ipadx=250,pady=300)

class main(tk.Frame):
    def __init__(self,parent,controller):
        tk.Frame.__init__(self,parent)
        self.controller = controller
        self.username = StringVar()
        self.password = StringVar()
        self.label1 = Label(self,text='LOGIN',font=('',35),pady=10)
        self.label1.pack(padx=50,pady=50)
        self.label1=Label(self,text='Username: ',font=('',20),pady=5,padx=5).pack()
        Entry(self,textvariable=self.username,bd=5,font=('',15)).pack()
        self.label3=Label(self,text='Password: ',font=('',20),pady=5,padx=5).pack()
        Entry(self,textvariable=self.password,bd=5,font=('',15),show='*').pack()
        self.log=Button(self,text=' Login ',bd=3,font=('',15),padx=5,pady=5,
                        command=self.login).pack()
        self.create=Button(self,text=' Create Account ',bd=3,font=('',15),padx=5,pady=5,
                           command=lambda: controller.show_frame(cr)).pack()

    # Login function
    def login(self):
        # Establish connection
        with sqlite3.connect('quit.db') as db:
            c = db.cursor()
        else:
            ms.showerror('Oops!','Username Not Found.')

class cr(tk.Frame):
    def __init__(self,parent,controller):
        tk.Frame.__init__(self,parent)
        self.n_username = StringVar()
        self.n_password = StringVar()
        self.label1 = Label(self,text='CREATE ACCOUNT',font=('',35),pady=10)
        self.label1.pack(padx=50,pady=50)
        Label(self,text='Username: ',font=('',20),pady=5,padx=5).pack()
        Entry(self,textvariable=self.n_username,bd=5,font=('',15)).pack()
        Label(self,text='Password: ',font=('',20),pady=5,padx=5).pack()
        Entry(self,textvariable=self.n_password,bd=5,font=('',15),show='*').pack()
        Button(self,text='Create Account',bd=3,font=('',15),padx=5,pady=5,
               command=self.new_user).pack()
        Button(self,text='Go to Login',bd=3,font=('',15),padx=5,pady=5,
               command=lambda: controller.show_frame(main)).pack()

    def new_user(self):
        # Establish connection
        with sqlite3.connect('quit.db') as db:
            c = db.cursor()

class p1(tk.Frame):
    # Page where the user enters known subjects with ratings and asks for recommendations
    def __init__(self,parent,controller):
        tk.Frame.__init__(self,parent)
        self.array=[]
        self.text=StringVar()
        i,j=0,0
        self.items=StringVar()
        # Load the list of course names, one per line
        topics = open('Course-List.txt')
        self.array.extend(topics.readlines())
        self.topic_new=[]
        self.l=[]
        cleanup(self.array,self.topic_new)
        self.var = []     # one StringVar per subject drop-down
        self.var2 = []    # one StringVar per rating spinbox
        self.var3=StringVar()
        self.var4=StringVar()
        self.var5=StringVar()
        self.var8=StringVar()
        self.subject_rating={}
        self.sub=[]
        entry = Entry(self,textvariable=self.var8,justify=CENTER)
        entry.place(x=300,y=140)
        self.button1 = Button(self,bg="blue",activebackground="pink",
                              text="Back to Previous page",fg="black",font=("Ariel",18),
                              command=lambda: controller.show_frame(page2))
        self.button1.place(x=700,y=10)
        label = Label(self,bg="white",fg="blue",font=("Ariel",30),textvariable=self.var5)
        self.var5.set("Course recommendation system")
        label.place(x=10,y=10)
        label2 = Label(self,bg="white",fg="black",font=("Ariel",18),textvariable=self.var4)
        self.var4.set("Please Enter your subjects")
        label2.place(x=10, y=100)
        self.var.append(StringVar())
        subject1 = OptionMenu(self,self.var[0],*list(self.topic_new))
        self.var[0].set("Enter The Subject")
        subject1.place(x=10,y=140)
        self.var2.append(StringVar())
        #Entry(root,textvariable = var2[0],justify = CENTER).place(x=300,y=140)
        Spinbox(self,textvariable=self.var2[0],from_=1,to=5).place(x=300,y=140)
        self.var2[0].set("")
        self.add = Button(self,bg="blue",activebackground="pink",text="Add",fg="black",
                          font=("Ariel",18),command=self.action)
        self.add.place(x=10,y=200)
        self.submit=Button(self,bg="red",activebackground="pink",text="Submit",fg="black",
                           font=("Ariel",18),command=self.action2)
        self.submit.place(x=100,y=200)
        self.submit1=Button(self,bg="green",activebackground="black",
                            text="Get recommendations",fg="black",font=("Ariel",18),
                            command=self.action3)
        self.submit1.place(x=10,y=300)

    def action2(self):
        ms.showinfo('Success!','Ratings successfully submitted')

    def action(self):
        # Store the current subject and rating, then add a new input row below it
        global i,j
        self.sub.append(self.var[i].get())
        self.subject_rating[self.sub[i]]=int(self.var2[i].get())
        i = i+1
        self.var.append(StringVar())
        self.var2.append(StringVar())
        OptionMenu(self,self.var[i],*list(self.topic_new)).place(x=10,y=180+j)
        self.var[i].set("Enter Subject")
        self.add.place(x=10,y=220+j)
        self.submit.place(x=100,y=220+j)
        self.submit1.place(x=10,y=320+j)
        #Entry(root,textvariable = var2[i],justify = CENTER).place(x=300,y=180+j)
        Spinbox(self,textvariable=self.var2[i],from_=1,to=5).place(x=300,y=180+j)
        self.var2[i].set("")
        j=j+40

    def action3(self):
        # Store the last row, compute the recommendations and list them
        global i,j
        self.sub.append(self.var[i].get())
        self.subject_rating[self.sub[i]]=int(self.var2[i].get())
        print(self.subject_rating)
        l = Course_recommendation(self.subject_rating,user_record)
        self.display = Label(self,bg="blue",fg="black",font=("Ariel",24),textvariable=self.text)
        self.display.place(x=500,y=300)
        self.text.set("You May Also Like")
        scrollbar=Scrollbar(self)
        scrollbar.pack(side="right")
        mylist=Listbox(self,bg="black",bd=5,fg="red",width=50,height=10,font=('',16),
                       yscrollcommand=scrollbar.set)
        k=0
        for items in l:
            subject=StringVar()
            mylist.insert(END,items)
        mylist.place(x=500,y=350+k)
        scrollbar.config(command=mylist.yview)

app=s()
app.mainloop()

root = Tk()
root.title('Course Recommender System')
root.geometry("750x750")
array=[]
i,j=0,0
topics = open('Course-List.txt')
array.extend(topics.readlines())

def cleanup(array,array2):
    # Copy each line into array2 without its last two characters
    for items in array:
        array2.append(items[:len(items)-2])

def action():
    # Store the current subject and rating, then add a new input row below it
    global i,j
    sub.append(var[i].get())
    subject_rating[sub[i]]=int(var2[i].get())
    i = i+1
    var.append(StringVar())
    var2.append(StringVar())
    OptionMenu(root,var[i],*list(topic_new)).place(x=10,y=180+j)
    var[i].set("Enter Subject")
    add.place(x=10,y=220+j)
    submit1.place(x=10,y=320+j)
    submit.place(x=100,y=220+j)
    #Entry(root,textvariable = var2[i],justify = CENTER).place(x=300,y=180+j)
    Spinbox(root,textvariable=var2[i],from_=1,to=5).place(x=300,y=180+j)
    var2[i].set("")
    j=j+40

def action2():
    ms.showinfo('Success!','Ratings successfully submitted')

def action3():
    sub.append(var[i].get())
    subject_rating[sub[i]]=int(var2[i].get())
    print(subject_rating)
    l = Course_recommendation(subject_rating,user_record)
    # l contains a list of top 10 courses to be recommended
    text = StringVar()
    display = Label(bg="white",fg="black",font=("Ariel",24),textvariable=text)
    display.place(x=10,y=360+j)
    text.set("Recommended Courses For you")
    k=0
    for items in l:
        subject = StringVar()
        Label(bg="white",fg="black",font=("Ariel",18),textvariable=subject).place(x=10,y=420+j+k)
        subject.set(items)
        k=k+40

topic_new=[]
cleanup(array,topic_new)
var = []
var2 = []
var3=StringVar()
var4=StringVar()
var5=StringVar()
var8=StringVar()
subject_rating={}
sub=[]

DATA PREPROCESSING
import json
import os
import math
import urllib.request
from nltk.corpus import stopwords
from nltk.tokenize import RegexpTokenizer, word_tokenize
from nltk.stem import PorterStemmer

ps = PorterStemmer()
tokenizer = RegexpTokenizer(r'\w+')

def create_data_files(project_name,base_url,filename):
    # Append to the output file if it already exists, otherwise create it
    queue = project_name+'/'+filename+'.txt'
    if os.path.isfile(queue):
        append_to_file(queue,base_url)
    else:
        write_file(queue,base_url)

def write_file(path,data):
    # Helper not shown in the original listing; assumed to create the file and write the data
    with open(path,'w') as file:
        file.write(data)

def append_to_file(path,data):
    with open(path,'a') as file:
        file.write(data+'\n')

i=0
final_ans_total="{"
with open('IRDATA.json') as json_data:
    json_response = json.load(json_data)
    for course in json_response['courses']:
        #course=json_response['courses'][0]
        i=i+1
        course_title=course['title']
        course_level=course['level']
        course_summary=course['summary'].lower()
        stop_words=set(stopwords.words("english"))
        # Tokenize the summary into words and sort them so equal stems are adjacent
        #words=word_tokenize(course_summary)
        words=tokenizer.tokenize(course_summary)
        words.sort()
        filtered_sentence=[]
        new_str=""
        final_str="'"+course_title+"':[{'level':'"+course_level+"'},{"
        # Drop the stop words
        for w in words:
            if w not in stop_words:
                filtered_sentence.append(w)
        #print(filtered_sentence)
        # Walk through the stemmed words and emit each distinct stem once
        freq=1
        prev=ps.stem(filtered_sentence[0])
        for itr in range(1,len(filtered_sentence)):
            w=ps.stem(filtered_sentence[itr])
            curr=w
            if(curr==prev):
                freq=freq+1
            else:
                # freq counts the repeats, but a constant 1 is written for every term
                final_str+="'"+prev+"': "+str(1)+","
                new_str+=prev+"\n"
                freq=1
                prev=curr
        final_str+="}]"
        #print(final_str)
        final_ans_total+=final_str+","
        create_data_files('final_IR',new_str,'myfile_27_oct_pl')
final_ans_total+="}"
#print(final_ans_total)
create_data_files('final_IR',final_ans_total,'myfile_27_oct')
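
The preprocessing steps in the listing above (lower-casing, tokenizing, removing stop words, stemming, and counting terms) can be tried out without NLTK using the simplified sketch below. It is an illustration under stated assumptions: the short stop-word set and the crude suffix-stripping `simple_stem` function are stand-ins for the NLTK stop-word list and the Porter stemmer used in the project, and the sample summary is hypothetical.

```python
import re
from collections import Counter

# Small stand-in for the NLTK English stop-word list
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "for", "with"}

def simple_stem(word):
    """Crude suffix stripper used here in place of the Porter stemmer."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[:-len(suffix)]
    return word

def term_frequencies(summary):
    """Lower-case, tokenize on word characters, drop stop words, stem, count."""
    tokens = re.findall(r"\w+", summary.lower())
    stems = [simple_stem(t) for t in tokens if t not in STOP_WORDS]
    return Counter(stems)

freqs = term_frequencies(
    "Learning the basics of Python: Python lists and Python loops"
)
print(freqs)
```

Using a Counter avoids the need to sort the words and compare neighbours, and it keeps the frequency of every term, including the last one.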