CERTIFICATE
DECLARATION
ACKNOWLEDGEMENT
CONTENT
SYNOPSIS
1. INTRODUCTION
2. SYSTEM STUDY
2.1.1 DRAWBACKS
2.2.1 FEATURES
5. CONCLUSION
BIBLIOGRAPHY
APPENDICES
B. TABLE STRUCTURE
C. SAMPLE CODING
D. SAMPLE INPUT
E. SAMPLE OUTPUT
YouTube Transcript Summarizer
SYNOPSIS
Compact representations of video data can support efficient video browsing. Such representations
give the user information about the content of the sequence being examined while preserving its
essential message. We propose a method for automatically generating summaries of longer videos.
Our approach involves two tasks: first, splitting the video into smaller, coherent segments, and
second, ranking those segments. The segmentation is based on an analysis of word frequency in
speech transcripts. The summary is then made by selecting the segments with the highest scores
relative to their duration, and these are presented to the user. We designed and conducted a user
study to assess the quality of the summaries. Our proposed algorithm is compared with a random
segment selection scheme using a statistical analysis of the user study results. In the end, the
user can read a summarized version of the video they want to know about.
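The segment selection described above can be sketched in a few lines of Python. This is a minimal illustration rather than the project's actual algorithm: the function names, the `(text, duration)` segment format, the greedy time budget, and the reading of "scores depending on the length of time" as a score-to-duration ratio are all assumptions.

```python
import re
from collections import Counter


def score_segments(segments):
    """Score each (text, duration_seconds) segment by summed word frequency.

    Word frequencies are counted over the whole transcript, so segments that
    use the transcript's most common words score higher.
    """
    words_per_segment = [re.findall(r"[a-z']+", text.lower()) for text, _ in segments]
    freq = Counter(word for words in words_per_segment for word in words)
    return [sum(freq[word] for word in words) for words in words_per_segment]


def select_segments(segments, budget_seconds):
    """Greedily pick segments by score-to-duration ratio within a time budget.

    Assumes every duration is positive. Returns the chosen segment indices in
    their original order.
    """
    scores = score_segments(segments)
    order = sorted(range(len(segments)),
                   key=lambda i: scores[i] / segments[i][1],
                   reverse=True)
    chosen, used = [], 0.0
    for i in order:
        duration = segments[i][1]
        if used + duration <= budget_seconds:
            chosen.append(i)
            used += duration
    return sorted(chosen)


segments = [
    ("neural networks learn from data", 10.0),
    ("um so anyway", 5.0),
    ("networks learn weights from data", 10.0),
]
print(select_segments(segments, budget_seconds=20))
```

With a 20-second budget the two content-heavy segments are kept and the filler segment is dropped.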
The video is summarized using a Python API and NLP (Natural Language Processing). An API, or
Application Programming Interface, is an interface that code can use to send and receive data,
typically from a server. APIs are widely used to retrieve data, and data retrieval is the focus
here. To receive data from an API, we need to make a request. APIs are used across the web.
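As a small illustration of making a request, the sketch below builds a GET request for the `/summarize/api` endpoint that appears in the sample coding appendix. The host and port (`http://localhost:5000`) and the helper name `build_summary_request` are assumptions; the request is only constructed here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_summary_request(base_url, youtube_url):
    """Build a GET request that passes the video URL as a query parameter."""
    query = urlencode({"youtube_url": youtube_url})
    return Request(f"{base_url}/summarize/api?{query}", method="GET")


# Assumed local development address; adjust to wherever the backend runs.
req = build_summary_request("http://localhost:5000",
                            "https://www.youtube.com/watch?v=cs1e0fRyI18")
print(req.get_method(), req.full_url)
```

Sending it would be a matter of passing `req` to `urllib.request.urlopen` once the backend is running.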
INTRODUCTION
A large number of video recordings are made and shared online every day. It is difficult to
spend time watching videos that may be longer than expected, and the effort may be wasted if
they do not contain the information we need. Automatically summarizing the transcripts of such
videos lets us quickly find the important patterns in a video and saves the time and effort of
going through its entire content.
This project gives us hands-on experience with state-of-the-art NLP techniques for abstractive
text summarization and applies them in a practical project. The summarizer is a Chrome extension
that works with YouTube to extract the key points of a video and make them accessible to the user.
The summary is customizable at the user's request, allowing varying degrees of summarization. Key
points from the summarization process, together with their corresponding timestamps, are then
presented to the user through a small UI next to the video feed. This allows the user to navigate
to the more important sections of the video and get to the key points more efficiently.
1.1 ORGANIZATION PROFILE
SEATTLE TECHNOLOGIES
Seattle Technologies provides principled technology services that promise outstanding business
exposure for its clients. Its services are aimed at helping clients evolve efficient business
models.
With a highly skilled team, the company refines its development and execution procedures for the
benefit of its clients. This enables the time-bound delivery of challenging solutions with
confidence.
1.2 SYSTEM CONFIGURATION
2. SYSTEM STUDY
The existing YouTube Transcript Summarizer system efficiently addresses the prevalent issue of
navigating through vast amounts of online video content by offering users a streamlined solution
for obtaining concise summaries of YouTube videos. Through a combination of Flask-based
backend infrastructure, Python APIs for transcript retrieval, Hugging Face transformers for text
summarization, and a user-friendly Chrome extension interface, the system provides an
accessible and efficient means for users to extract key information from videos. By empowering
users to quickly discern relevant content and saving them valuable time and resources, the
system significantly enhances the overall experience of consuming online video content while
also aiding in content screening and improving content quality.
2.1.1 Drawbacks
1. The existing YouTube Transcript Summarizer system, while effective in providing users with
concise summaries of video content, may have certain drawbacks.
2. One potential limitation lies in the accuracy and comprehensiveness of the summarization
process. Depending on the complexity of the video content and the capabilities of the text
summarization models used, there may be instances where key information is not accurately
captured or essential context is lost in the summarization process.
4. Furthermore, the system's dependency on third-party APIs and libraries could introduce
vulnerabilities related to data privacy and security, raising concerns regarding the confidentiality
of user data and the potential for unauthorized access to sensitive information.
5. Overall, while the YouTube Transcript Summarizer system offers valuable benefits in terms of
time-saving and content accessibility, it's essential to acknowledge and address these potential
drawbacks to ensure the reliability and integrity of the summarization process.
Most methods for video summarization do not use one of the most important sources of
information in a video sequence: the spoken text, or natural-language content. For sequences
such as speeches, seminars and instructional programs that do not have a transcript, we can
obtain one by applying speech recognition to the audio and then pass it to our summarizer.
YouTube Transcript Summarizer is a tool that automatically generates a summary from the
transcript of the video's audio. The work involves developing and debugging different techniques
and algorithms for natural language processing (NLP) and information extraction, as well as
implementing and testing them on a large dataset of YouTube transcripts. The system uses several
APIs, such as a Flask API for testing and a Python API for retrieving YouTube video data, and
languages such as HTML, CSS and JavaScript for developing the browser extension.
2.2.1 Features
The technique used in text summarization is Natural Language Processing (NLP) analysis based
on information-extraction techniques. This approach, which uses artificial intelligence
techniques, involves a comprehensive analysis of the source text's meaning to construct a source
representation for a specific application.
Then, a summary representation is generated using this source representation, and the summary
text is produced. However, methods that rely on statistical processing to extract sentences for the
summary often produce summaries that lack coherence.
Although NLP-based techniques generate better summaries, the knowledge base required for
such systems is usually vast and complex. Furthermore, such systems are typically limited to a
specific application domain and are challenging to generalize to other domains.
Natural language processing (NLP) is a field of computer science and artificial intelligence
focused on enabling computers to understand and respond to human language, both written and
spoken.
There are several tasks associated with NLP, including speech recognition (converting voice data
into text), part of speech tagging (determining the part of speech of a word or text based on
context), word sense disambiguation (determining the correct meaning of a word with multiple
meanings), co-reference resolution (identifying when two words refer to the same entity),
sentiment analysis (extracting attitudes, emotions, sarcasm, and other subjective qualities from
text), and natural language generation (putting structured information into human language).
Overall, NLP aims to give computers the ability to understand and use language much like
humans do, which has many potential applications in areas such as customer service, chatbots,
voice assistants, and more.
The file system design for the YouTube Transcript Summarizer project aims to organize and
manage the storage and retrieval of data effectively. It provides structure and logic for handling
different types of information, ensuring ease of access, identification, and management.
1. AdminLogin.cshtml:
- Allows admins to authenticate and access the application using their credentials.
2. DesignDetails.cshtml:
- Displays detailed information about design elements, including design ID, name, description,
and price details.
3. NewUserRegister.cshtml:
- Presents a form for users to register with the application by providing their details such as
name, gender, contact information, and address.
4. UserBooking.cshtml:
- Displays available design options and booking options for users to select from.
5. DesignApproval.cshtml:
6. AdminReport.cshtml:
- Generates reports and analytics for administrators to track application usage, user activities,
and performance metrics.
- Presents insights and data visualizations to facilitate decision-making and strategic planning.
The file system design for the YouTube Transcript Summarizer project ensures the orderly
storage and organization of different components and functionalities. By separating data into
individual files and giving each file a distinct purpose and name, the design facilitates easy
identification, retrieval, and management of information, contributing to the overall efficiency
and usability of the application.
Input design for the YouTube Transcript Summarizer system focuses on making it easy for users
to provide the necessary information while minimizing errors. By creating user-friendly input
interfaces and implementing validation checks, the system ensures that users can enter data
accurately and efficiently.
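One validation check of this kind is confirming that the submitted link is a YouTube watch URL before any further processing. The sketch below is illustrative only: the helper name `extract_video_id`, the list of accepted hosts, and the assumption that video IDs are 11 characters drawn from letters, digits, `_` and `-` are not taken from the project.

```python
import re
from urllib.parse import urlparse, parse_qs

# Assumption: video IDs are 11 characters from this alphabet.
VIDEO_ID_PATTERN = re.compile(r"^[A-Za-z0-9_-]{11}$")
# Assumption: only these hosts are accepted.
ALLOWED_HOSTS = ("www.youtube.com", "youtube.com", "m.youtube.com")


def extract_video_id(url):
    """Return the video ID from a YouTube watch URL, or None if the input is invalid."""
    parsed = urlparse(url.strip())
    if parsed.scheme not in ("http", "https"):
        return None
    if parsed.hostname not in ALLOWED_HOSTS:
        return None
    values = parse_qs(parsed.query).get("v", [])
    if values and VIDEO_ID_PATTERN.match(values[0]):
        return values[0]
    return None


print(extract_video_id("https://www.youtube.com/watch?v=cs1e0fRyI18&start_radio=1"))
print(extract_video_id("not a url"))
```

Returning `None` instead of raising lets the interface show a friendly message and ask the user to re-enter the link.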
2. Upload videos and Add Link
3.3 OUTPUT DESIGN
Output design for the YouTube Transcript Summarizer system involves presenting the
summarized content to users in a clear and understandable format. The primary objective is to
ensure that users can easily access and comprehend the summarized information.
ViewUserSummarizerVideo
The database design for the YouTube Transcript Summarizer project embodies the
essence of efficient information management. It serves as a cohesive repository of interrelated
data, devoid of unnecessary redundancy, providing seamless access to users while ensuring quick
and efficient performance.
This table consists of design id, name, design information details, and price details. Here Design
id is a primary key.
AdminLogin Table:
Stores admin login credentials, enabling secure access to the application.
Fields:
Username (Primary Key): Unique identifier for admin login.
Password: Encrypted password for authentication.
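A minimal sketch of such a table, using Python's built-in `sqlite3` module with an in-memory database. The report does not say how the password is protected, so the salted PBKDF2 hash, the extra `Salt` column, the iteration count, and the helper names below are all assumptions made for illustration.

```python
import hashlib
import hmac
import os
import sqlite3

ITERATIONS = 100_000  # assumed work factor, not taken from the report


def hash_password(password, salt=None):
    """Derive a salted PBKDF2-HMAC-SHA256 digest; generates a salt if none is given."""
    salt = salt if salt is not None else os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE AdminLogin ("
    "Username TEXT PRIMARY KEY, "      # unique identifier for admin login
    "Salt BLOB NOT NULL, "             # hypothetical column, needed to re-derive the hash
    "PasswordHash BLOB NOT NULL)"
)


def add_admin(conn, username, password):
    """Store a new admin; raises sqlite3.IntegrityError if the username exists."""
    salt, digest = hash_password(password)
    conn.execute("INSERT INTO AdminLogin VALUES (?, ?, ?)", (username, salt, digest))


def check_admin(conn, username, password):
    """Return True only if the username exists and the password matches."""
    row = conn.execute(
        "SELECT Salt, PasswordHash FROM AdminLogin WHERE Username = ?", (username,)
    ).fetchone()
    if row is None:
        return False
    salt, stored = row
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, stored)
```

Storing a one-way hash rather than a reversible encryption means the stored value cannot be turned back into the password if the table is exposed.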
VideoInfo Table:
The database design embodies the core principles of integration, efficiency, and accessibility. By
minimizing redundancy and optimizing data relationships, it ensures smooth information
retrieval and management, aligning with the overarching objective of facilitating easy, quick, and
cost-effective access to data for all users.
3.5 SYSTEM DEVELOPMENT
User Registration
This module lets users register with the application.
Registration is mandatory because it is required to view videos
and post comments. The user selects a username and password
at the time of registration, and the username must be unique.
Preprocessing
Content extracted from YouTube video subtitles may contain
unwanted features such as emoticons, symbols, web links and repetitive
stop words, which are removed by preprocessing techniques.
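A minimal sketch of this kind of preprocessing using regular expressions. The stop-word list is a tiny illustrative sample, and the function and pattern names are hypothetical; a real system would likely use a fuller stop-word list from an NLP library.

```python
import re

# Illustrative sample only; real stop-word lists are much longer.
STOP_WORDS = {"a", "an", "the", "and", "or", "is", "are", "to", "of", "in", "so", "um", "uh"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
NON_TEXT_RE = re.compile(r"[^a-z0-9\s']")


def preprocess(text):
    """Lowercase the text and remove web links, symbols/emoticons and stop words."""
    text = URL_RE.sub(" ", text.lower())
    text = NON_TEXT_RE.sub(" ", text)  # drops emoticons, symbols and punctuation
    words = [word for word in text.split() if word not in STOP_WORDS]
    return " ".join(words)


print(preprocess("Um, so the model is GREAT :) see https://example.com/x for more!!"))
# prints: model great see for more
```

Cleaning this aggressively suits frequency-based scoring; a transformer summarizer generally expects the original casing and punctuation, so it may need a lighter pass.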
a) Tokenization:
The given document is treated as a string, and each individual
word in it is identified; that is, the document string is divided
into units, or tokens.
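Tokenization can be sketched with the standard library alone. The two helpers below are hypothetical and use simple regular expressions; an NLP library tokenizer would handle more edge cases.

```python
import re


def sentence_tokenize(text):
    """Split text into sentences at whitespace that follows '.', '!' or '?'."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def word_tokenize(text):
    """Split text into word tokens, keeping simple contractions such as "don't" whole."""
    return re.findall(r"[A-Za-z0-9]+(?:'[A-Za-z]+)?", text)


print(sentence_tokenize("First point. Second point! Third?"))
print(word_tokenize("Don't stop, it's working!"))
```

Auto-generated YouTube captions often lack punctuation, so sentence splitting of this kind may return the whole transcript as a single sentence.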
4. TESTING AND IMPLEMENTATION
Testing and implementation are crucial phases in the development lifecycle of the YouTube
Transcript Summarizer project. These stages ensure that the application functions as intended,
meets user requirements, and operates smoothly across different environments.
1. Testing:
- Unit Testing: Conduct unit tests to validate individual components such as functions, classes,
and modules. Ensure that each unit performs its intended function correctly.
- Integration Testing: Test the integration of different modules and components to verify that
they work together seamlessly. Validate data flow and interactions between various parts of the
application.
- User Acceptance Testing (UAT): Involve users or stakeholders to perform UAT to ensure
that the application meets their expectations and requirements. Gather feedback and make
necessary adjustments based on user input.
- Performance Testing: Assess the performance of the application under different loads and
conditions. Measure response times, resource utilization, and scalability to identify and address
performance bottlenecks.
- Security Testing: Conduct security testing to identify and mitigate potential vulnerabilities
such as SQL injection, cross-site scripting (XSS), and authentication issues. Ensure that sensitive
data is handled securely and access controls are enforced properly.
- Compatibility Testing: Test the application across different web browsers, operating
systems, and devices to ensure compatibility and consistent user experience.
- Regression Testing: Perform regression tests to verify that new changes or updates do not
introduce any unintended side effects or regressions in existing functionality.
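As one illustration of unit testing, the sketch below tests a small helper with Python's built-in `unittest` module. The helper `join_transcript` is hypothetical; it mirrors the step in the sample coding that joins transcript fragments into one string, but it is not the project's own function.

```python
import unittest


def join_transcript(segments):
    """Join the 'text' field of each transcript segment into one string."""
    return " ".join(seg["text"].strip() for seg in segments if seg.get("text"))


class JoinTranscriptTest(unittest.TestCase):
    def test_joins_in_order(self):
        segments = [{"text": "hello"}, {"text": "world"}]
        self.assertEqual(join_transcript(segments), "hello world")

    def test_skips_empty_segments(self):
        # Segments with empty or missing text should not add stray spaces.
        segments = [{"text": "hello"}, {"text": ""}, {"start": 1.0}]
        self.assertEqual(join_transcript(segments), "hello")

    def test_empty_input(self):
        self.assertEqual(join_transcript([]), "")


def run_tests():
    """Run the test case and return the result object instead of exiting."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(JoinTranscriptTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)


result = run_tests()
```

The same test case can also be discovered and run from the command line with `python -m unittest`, which is the more usual arrangement inside a project.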
2. Implementation:
- Deployment Planning: Plan the deployment of the application, considering factors such as
server infrastructure, deployment environment (development, staging, production), and
deployment strategy (manual deployment, continuous integration/continuous deployment).
- Monitoring and Logging: Set up monitoring and logging mechanisms to monitor application
performance, track errors, and troubleshoot issues in real-time. Use monitoring tools such as
Prometheus, Grafana, or ELK stack to gain insights into application behavior and performance.
By thoroughly testing the application and following a systematic implementation process, you
can ensure the successful deployment and operation of the YouTube Transcript Summarizer
project, meeting user needs and delivering a reliable and efficient solution.
5. CONCLUSION
In conclusion, our website saves time for the user. Instead of sitting through the filler
content of a video, the user can first see the main content of a YouTube video and decide
whether the video is right for them. This saves the user's time and effort and reduces the
burden of searching for the right YouTube video.
Our website will also provide multi-language summarization and support for text-to-speech. We
are confident that our project will effectively address the needs of users by saving their time
and effort. Our approach aims to provide users with only the relevant and useful information on
the topics that interest them, eliminating the need to watch lengthy videos. The time saved can
be used for further knowledge acquisition and exploration.
This project offers a significant time-saving solution for users by providing a summary of the
video without the need to watch the entire video. It also helps users to identify any inappropriate or
harmful content before watching the video. Furthermore, the project offers an excellent user interface
experience by using Chrome extensions. Users can easily obtain the summarized text without having to
copy and paste the video URL into third-party applications or terminals.
BIBLIOGRAPHY
WEBSITES:
• Flask: https://flask.palletsprojects.com/en/3.0.x/
• CSS: https://www.tutorialspoint.com/css/index.html
REFERENCE BOOKS:
1. Patil, S. et al. “Multilingual Speech and Text Recognition and Translation using Image.”
International journal of engineering research and technology 5 (2016).
DFD Level:2
B. TABLE STRUCTURE
Login
Table B.2
Table B.2 Answer
C. SAMPLE CODING
import urllib.parse
from datetime import datetime

from flask import Flask, jsonify, make_response, request
from transformers import T5ForConditionalGeneration, T5Tokenizer
from youtube_transcript_api import YouTubeTranscriptApi

app = Flask(__name__)

# Load the tokenizer and the model weights once at startup,
# rather than downloading them again on every request
tokenizer = T5Tokenizer.from_pretrained('t5-base')
model = T5ForConditionalGeneration.from_pretrained('t5-base')


# defining 2 endpoints
@app.route('/')
def hello_world():
    return "It's working"


@app.route('/time', methods=['GET'])
def get_time():
    return str(datetime.now())


# Error handling
@app.errorhandler(404)
def not_found(error):
    return make_response(jsonify({'error': 'Not found'}), 404)


# Get the transcript from the YouTube Transcript API
def transcript(video_id):
    # get_transcript is the call used in this listing; other releases of
    # youtube_transcript_api may expose a different interface
    segments = YouTubeTranscriptApi.get_transcript(video_id)
    # Join the text fragments with spaces so that words do not run together
    return ' '.join(segment['text'] for segment in segments)


def summary(text):
    # encode the text into a tensor of integers using the tokenizer;
    # input longer than 512 tokens is truncated
    inputs = tokenizer.encode("summarize: " + text, return_tensors="pt",
                              max_length=512, truncation=True)
    # generate the summarization output with beam search
    outputs = model.generate(inputs, max_length=150, min_length=40,
                             length_penalty=2.0, num_beams=4,
                             no_repeat_ngram_size=2, early_stopping=True)
    # decode the best sequence and drop special tokens such as <pad> and </s>
    return tokenizer.decode(outputs[0], skip_special_tokens=True)


@app.route('/summarize/api', methods=['GET'])
def get_summarize():
    # Example:
    # /summarize/api?youtube_url=https://www.youtube.com/watch?v=cs1e0fRyI18
    youtube_url = request.args.get('youtube_url', '')
    url_data = urllib.parse.urlparse(youtube_url)
    query = urllib.parse.parse_qs(url_data.query)
    if 'v' not in query:
        return jsonify({'error': 'Missing or invalid youtube_url'}), 400
    video_id = query['v'][0]
    text = transcript(video_id)
    data = {'responseText': summary(text)}
    return jsonify(data), 200


if __name__ == '__main__':
    app.run()
D. SAMPLE INPUTS
LOGIN PAGE ( USER )
E. SAMPLE OUTPUTS
SUMMARIZER PAGE (USER)