BACHELOR OF ENGINEERING
IN
COMPUTER SCIENCE & ENGINEERING
Mentor Signature
Quora is a question-and-answer platform, much like Stack Overflow. However, Quora is a
general-purpose Q&A platform, so its questions contain far less code than those on Stack Overflow.
One of the many problems Quora faces is duplicate questions. Duplication degrades the
experience for both the asker and the answerer. When a question duplicates an earlier one, we
can simply show the asker the answers to the earlier question, and answerers do not have to
repeat their answers to what is essentially the same question.
For example, we have a question like “How can I be a good geologist?” and there are some
answers to that question. Later someone else asks another question like “What should I do to be a
great geologist?”.
We can see that both questions ask the same thing, even though the wording of the two differs.
So the answers will be the same for both questions, which means we can simply show the answers to
the first question. That way the person asking gets answers immediately, and people who have
already answered the first question don't have to repeat themselves.
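As a rough illustration of why this is hard, the sketch below measures plain word overlap (Jaccard similarity) between the two example questions. The function names are assumptions made for this example and are not part of the project's code.

```python
import re
from typing import Set


def tokenize(question: str) -> Set[str]:
    """Lowercase the question and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", question.lower()))


def jaccard_similarity(q1: str, q2: str) -> float:
    """Fraction of distinct words shared by the two questions (0.0 to 1.0)."""
    a, b = tokenize(q1), tokenize(q2)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


q1 = "How can I be a good geologist?"
q2 = "What should I do to be a great geologist?"
score = jaccard_similarity(q1, q2)  # 4 shared words out of 12 distinct words
```

The two questions share only a third of their distinct words even though they mean the same thing, which suggests that word overlap alone is not enough and motivates a learned model.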
FEASIBILITY STUDY
1) Financial feasibility: -
The project uses the Python programming language and techniques such as machine
learning, deep learning and NLP. There is no need to buy any costly software to
complete this project; all the software required is available for free.
The project can be completed on a basic computer with minimum specifications.
2) Technological feasibility: -
Techniques that use machine learning clustering and classification algorithms have
been shown to achieve better performance on benchmarks. In this project the data is
first divided into clusters, and then, for each cluster, the algorithm that gives the
best performance on that cluster is used.
Programming device (laptop)
Programming tools (easily available)
Programming individual
This is an end-to-end project, built to industry standard, that platforms like Quora
can use to identify duplicate questions. It provides facilities such as detailed
logging and is easy to implement.
METHODOLOGY/PLANNING OF WORK
The project has been divided into three modules as given below:
2. Model Training
In model training, we first divide the data into clusters. Each cluster is then
trained separately, using the model that is most appropriate for that
cluster.
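The cluster-then-train idea can be sketched as follows. This is only a minimal illustration under stated assumptions: the toy rows, the choice of question length as the clustering feature, and the per-cluster "model" (a single similarity threshold) are all hypothetical and stand in for whatever features and algorithms the project actually uses.

```python
from statistics import mean

# Hypothetical toy rows: (question length in words, similarity score, is_duplicate).
DATA = [
    (5, 0.9, 1), (6, 0.8, 1), (7, 0.5, 0), (8, 0.4, 0),
    (20, 0.6, 1), (22, 0.5, 1), (25, 0.3, 0), (24, 0.2, 0),
]


def nearest(centres, value):
    """Index of the cluster centre closest to value."""
    return min(range(len(centres)), key=lambda i: abs(centres[i] - value))


def kmeans_1d(values, k=2, iterations=10):
    """Tiny 1-D k-means (requires k >= 2); returns the k cluster centres."""
    ordered = sorted(values)
    # Deterministic start: centres spread evenly over the sorted values.
    centres = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in values:
            groups[nearest(centres, v)].append(v)
        centres = [mean(g) if g else c for g, c in zip(groups, centres)]
    return centres


def fit_threshold(rows):
    """Per-cluster 'model': the similarity cut-off with the best training accuracy."""
    def accuracy(t):
        return sum((sim >= t) == bool(label) for sim, label in rows) / len(rows)
    return max(sorted({sim for sim, _ in rows}), key=accuracy)


# Step 1: cluster the data. Step 2: train one model per cluster.
centres = kmeans_1d([length for length, _, _ in DATA])
clusters = {}
for length, sim, label in DATA:
    clusters.setdefault(nearest(centres, length), []).append((sim, label))
thresholds = {c: fit_threshold(rows) for c, rows in clusters.items()}


def predict(length, similarity):
    """Route the pair to its cluster, then apply that cluster's model."""
    return similarity >= thresholds[nearest(centres, length)]
```

With this toy data, the same similarity score of 0.6 is classified as not duplicate for a short question pair and as duplicate for a long one, which is the kind of behaviour that per-cluster models are meant to allow.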
3. Final model prediction and deployment
In this step, after training, the model is saved in a pickle file, which is a binary file.
The binary format keeps the saved model compact, and the model can be loaded later
to detect duplicate question pairs. The finalised model is hosted using
Streamlit. Maintenance and logging are also done from the cloud itself.
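Saving and reloading with Python's standard pickle module might look like the sketch below. A plain dictionary of parameters stands in for the trained model here; the parameter values and the file name model.pkl are assumptions made for illustration.

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for a trained model: just a dictionary of parameters.
model = {"centres": [6.5, 22.75], "thresholds": {0: 0.8, 1: 0.5}}

# Assumed file name; a temporary directory keeps the example self-contained.
model_path = os.path.join(tempfile.mkdtemp(), "model.pkl")

# Save: pickle files are binary, so the file is opened in "wb" mode.
with open(model_path, "wb") as f:
    pickle.dump(model, f)

# Load later, for example when the prediction app starts up.
with open(model_path, "rb") as f:
    restored = pickle.load(f)
```

A fitted model object would be saved and loaded with the same two calls. Since unpickling can execute arbitrary code, only pickle files from a trusted source should be loaded.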
We can try different machine learning and deep learning algorithms. Advanced neural network
models such as LSTM, GRU and BERT can give better accuracy. Choosing a machine learning model
is a matter of trial and error: we do not know in advance whether a given model will perform
better. The project therefore leaves room for various advanced models to improve results. It also
includes hand-engineered features, which are intended to enhance the accuracy of the model.
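A minimal sketch of what hand-engineered features for a question pair could look like is given below. The feature names and the particular features chosen are assumptions for illustration, not the project's actual feature set.

```python
def extract_features(q1, q2):
    """Return a few simple hand-crafted features for a pair of questions."""
    w1, w2 = q1.lower().split(), q2.lower().split()
    s1, s2 = set(w1), set(w2)
    common = len(s1 & s2)
    total = len(s1) + len(s2)
    return {
        # Absolute difference in the number of words.
        "len_diff": abs(len(w1) - len(w2)),
        # Number of distinct words that appear in both questions.
        "common_words": common,
        # Shared words relative to the combined vocabulary size of both questions.
        "word_share": common / total if total else 0.0,
        # Whether the questions start or end with the same word (1 or 0).
        "same_first_word": int(bool(w1) and bool(w2) and w1[0] == w2[0]),
        "same_last_word": int(bool(w1) and bool(w2) and w1[-1] == w2[-1]),
    }


features = extract_features(
    "How can I be a good geologist?",
    "What should I do to be a great geologist?",
)
```

Each pair of questions becomes a small numeric record like this, which can then be passed to a classifier alongside any learned text representations.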
SOFTWARE REQUIREMENTS:
WINDOWS 8
Python version 3.7.4
NLTK
Keras
TensorFlow
HARDWARE REQUIREMENTS:
RAM: 8GB
Hard disk: 128 GB SSD
Processor: i5 8th generation