
CONTENT

CERTIFICATE

DECLARATION

ACKNOWLEDGEMENT

CONTENT

SYNOPSIS
1. INTRODUCTION

1.1 ORGANIZATION PROFILE

1.2 SYSTEM SPECIFICATION

1.2.1 HARDWARE SPECIFICATION

1.2.2 SOFTWARE SPECIFICATION

2. SYSTEM STUDY

2.1 EXISTING SYSTEM

2.1.1 DRAWBACKS

2.2 PROPOSED SYSTEM

2.2.1 FEATURES

3. SYSTEM DESIGN AND DEVELOPMENT

3.1 FILE DESIGN

3.2 INPUT DESIGN

3.3 OUTPUT DESIGN

3.4 DATABASE DESIGN

3.5 SYSTEM DEVELOPMENT

3.5.1 DESCRIPTION OF MODULES

4. TESTING AND IMPLEMENTATION

5. CONCLUSION

BIBLIOGRAPHY

APPENDICES

A. DATA FLOW DIAGRAM

B. TABLE STRUCTURE

C. SAMPLE CODING

D. SAMPLE INPUT

E. SAMPLE OUTPUT
YouTube Transcript Summarizer

SYNOPSIS

Condensed presentations of video data can support active video browsing. Such presentations
give the user information about the content of the sequence being examined while preserving
its essential message. We propose a way to generate summaries of longer videos automatically.
Our approach involves two tasks: first, splitting the video into smaller, coherent segments,
and second, ranking those segments. The proposed algorithms for these tasks are based on an
analysis of word frequency in the speech transcript. The summary is then built by selecting
the segments with the highest scores relative to their duration, and these segments are
presented to the user. We designed and conducted a user study to check the quality of the
generated summaries. Our proposed algorithm is compared with a random segment selection
scheme through a statistical analysis of the user study results. In the end, the user sees a
summary of the content of the video of interest.
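The ranking step described above can be illustrated with a minimal sketch. It assumes the transcript has already been split into segments, each with its text and a positive duration in seconds. The `Segment` type, the function names, and the greedy selection by score-to-duration ratio are illustrative assumptions, not the exact algorithm used in the project.

```python
from collections import Counter
from dataclasses import dataclass
import re


@dataclass
class Segment:
    text: str
    duration: float  # seconds, assumed to be positive


def words(text):
    # Lowercase word tokens; a deliberately simple tokenizer for this sketch
    return re.findall(r"[a-z']+", text.lower())


def score_segments(segments):
    # Word frequencies are counted over the whole transcript
    freq = Counter(w for seg in segments for w in words(seg.text))
    # A segment's score is the sum of the frequencies of its words
    return [sum(freq[w] for w in words(seg.text)) for seg in segments]


def select_segments(segments, max_duration):
    # Greedily pick segments with the best score-to-duration ratio
    # until the time budget is used up, then restore the original order
    scores = score_segments(segments)
    order = sorted(range(len(segments)),
                   key=lambda i: scores[i] / segments[i].duration,
                   reverse=True)
    chosen, used = [], 0.0
    for i in order:
        if used + segments[i].duration <= max_duration:
            chosen.append(i)
            used += segments[i].duration
    return [segments[i] for i in sorted(chosen)]
```

Calling `select_segments(segments, 60)` would return the segments that fit into a one-minute summary, in the order in which they appear in the video.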

The video is summarized using a Python API and NLP (Natural Language Processing). An API, or
Application Programming Interface, is an interface, typically exposed by a server, through
which code can send and receive data. APIs are widely used to retrieve data, and retrieval is
the focus here.

To receive data from an API, we make a request. Requests of this kind are used all across the
web.
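As a small illustration of such a request, the sketch below builds a GET request for the `/summarize/api` endpoint and its `youtube_url` parameter, both taken from the sample coding in Appendix C. The base address `http://127.0.0.1:5000` is an assumption (a locally running Flask development server), and `build_summary_request` is a hypothetical helper name.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_summary_request(base_url, youtube_url):
    # Encode the video URL as a query parameter so that its own
    # "?" and "=" characters do not break the request URL
    query = urlencode({"youtube_url": youtube_url})
    return Request(f"{base_url}/summarize/api?{query}", method="GET")


req = build_summary_request("http://127.0.0.1:5000",
                            "https://www.youtube.com/watch?v=cs1e0fRyI18")
print(req.full_url)
```

With the server running, passing `req` to `urllib.request.urlopen` would send the request and return the JSON response.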
INTRODUCTION

A large number of videos are recorded and shared online every day. Watching them takes time,
since a video may be longer than expected, and the effort is wasted if the video does not
contain the information we are looking for. Summarizing the transcripts of such videos
automatically lets us quickly find the important points and saves the time and effort of
going through the entire content.

This project gives us hands-on experience with state-of-the-art NLP techniques for abstractive
text summarization, and it implements an interesting concept that is suitable for
practitioners and makes a refreshing professional project. The summarizer is a Chrome
extension that works with YouTube to extract the key points of a video and make them
accessible to the user. The summary is customizable on request, allowing different degrees of
summarization. Key points from the summarization process, together with their corresponding
time-stamps, are presented to the user through a small UI next to the video feed. This lets
the user navigate directly to the more important sections of the video and reach the key
points more efficiently.
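The time-stamped key points mentioned above could be rendered along the following lines. This is only a sketch: it assumes each key point carries a start time in seconds, and `format_timestamp` and `jump_link` are hypothetical helper names.

```python
def format_timestamp(seconds):
    # Convert a start time in seconds to m:ss or h:mm:ss
    total = int(seconds)
    hours, rest = divmod(total, 3600)
    minutes, secs = divmod(rest, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes}:{secs:02d}"


def jump_link(video_id, seconds):
    # YouTube watch URLs accept a start offset in whole seconds via "t"
    return f"https://www.youtube.com/watch?v={video_id}&t={int(seconds)}s"
```

The extension UI could show `format_timestamp(start)` as the label of each key point and use `jump_link(video_id, start)` as its target.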
1.1 ORGANIZATION PROFILE

SEATTLE TECHNOLOGIES

Seattle Technologies is a custom software development and solutions company based in
Coimbatore, India. Seattle Technologies has experience in providing complex and diverse
enterprise software development solutions to a large range of clients. The company has
extensive experience in software development and takes pride in its customer retention rate.

Seattle Technologies applies sound technology in its services, aiming to give its clients
strong business exposure. It focuses on providing services that help clients evolve efficient
business models.

Working with a highly skilled team, we keep refining our development and execution procedures
for the benefit of our clients. This enables us to deliver challenging solutions on time and
with confidence.
1.2 SYSTEM CONFIGURATION

1.2.1 HARDWARE REQUIREMENTS

Processor : Modern multi-core processor.

RAM : Minimum 8 GB.

Hard Disk : Minimum 256 GB.

1.2.2 SOFTWARE REQUIREMENTS

Environment : Visual Studio .Net 2010

Front End : Web browser (testing), Chrome browser (extension).

Language : HTML, CSS, JavaScript (Front End), Python (Back End).

Back End : Flask framework.

2. SYSTEM STUDY

2.1 EXISTING SYSTEM

The existing YouTube Transcript Summarizer system efficiently addresses the prevalent issue of
navigating through vast amounts of online video content by offering users a streamlined solution
for obtaining concise summaries of YouTube videos. Through a combination of Flask-based
backend infrastructure, Python APIs for transcript retrieval, Hugging Face transformers for text
summarization, and a user-friendly Chrome extension interface, the system provides an
accessible and efficient means for users to extract key information from videos. By empowering
users to quickly discern relevant content and saving them valuable time and resources, the
system significantly enhances the overall experience of consuming online video content while
also aiding in content screening and improving content quality.

2.1.1 Drawbacks

1. The existing YouTube Transcript Summarizer system, while effective in providing users with
concise summaries of video content, may have certain drawbacks.

2. One potential limitation lies in the accuracy and comprehensiveness of the summarization
process. Depending on the complexity of the video content and the capabilities of the text
summarization models used, there may be instances where key information is not accurately
captured or essential context is lost in the summarization process.

3. Additionally, the system's reliance on automated summarization techniques may result in
occasional inaccuracies or misinterpretations, particularly in cases where the content is nuanced
or requires contextual understanding.

4. Furthermore, the system's dependency on third-party APIs and libraries could introduce
vulnerabilities related to data privacy and security, raising concerns regarding the confidentiality
of user data and the potential for unauthorized access to sensitive information.
5. Overall, while the YouTube Transcript Summarizer system offers valuable benefits in terms of
time-saving and content accessibility, it's essential to acknowledge and address these potential
drawbacks to ensure the reliability and integrity of the summarization process.

2.2 PROPOSED SYSTEM

Most video summarization methods do not use one of the most important sources of information
in a video sequence: the spoken text, or natural-language content. For sequences such as
speeches, seminars, and instructional programs that have no transcript, one can be obtained by
applying speech recognition to the audio and then passed to our summarizer. YouTube Transcript
Summarizer is a tool that automatically generates a summary from the transcript of a video's
audio. The work involves developing and debugging different techniques and algorithms for
natural language processing (NLP) and information extraction, as well as implementing and
testing them on a large dataset of YouTube transcripts. The system uses several APIs and
frameworks: the Flask API for testing, a Python API for retrieving the YouTube video
transcript, and HTML, CSS, and JavaScript for developing the web browser extension.

2.2.1 Features

The technique used for text summarization is Natural Language Processing (NLP) analysis based
on information-extraction techniques. This approach, which draws on artificial intelligence,
involves a comprehensive analysis of the meaning of the source text in order to construct a
source representation for a specific application.

Then, a summary representation is generated using this source representation, and the summary
text is produced. However, methods that rely on statistical processing to extract sentences for the
summary often produce summaries that lack coherence.

Although NLP-based techniques generate better summaries, the knowledge base required for
such systems is usually vast and complex. Furthermore, such systems are typically limited to a
specific application domain and are challenging to generalize to other domains.

Natural language processing (NLP) is a field of computer science and artificial intelligence
focused on enabling computers to understand and respond to human language, both written and
spoken.

This involves combining techniques from computational linguistics, statistical modeling,
machine learning, and deep learning to process human language in the form of text or voice data,
and to determine the intended meaning and sentiment behind it.

There are several tasks associated with NLP, including speech recognition (converting voice data
into text), part of speech tagging (determining the part of speech of a word or text based on
context), word sense disambiguation (determining the correct meaning of a word with multiple
meanings), co-reference resolution (identifying when two words refer to the same entity),
sentiment analysis (extracting attitudes, emotions, sarcasm, and other subjective qualities from
text), and natural language generation (putting structured information into human language).

Overall, NLP aims to give computers the ability to understand and use language much like
humans do, which has many potential applications in areas such as customer service, chatbots,
voice assistants, and more.

3. SYSTEM DESIGN AND DEVELOPMENT

3.1 FILE DESIGN

The file system design for the YouTube Transcript Summarizer project aims to organize and
manage the storage and retrieval of data effectively. It provides structure and logic for handling
different types of information, ensuring ease of access, identification, and management.

1. AdminLogin.cshtml:

- Razor view file for the admin login page.

- Contains HTML markup and Razor syntax for rendering the login form.

- Allows admins to authenticate and access the application using their credentials.

2. DesignDetails.cshtml:

- Razor view file for the design details information page.

- Displays detailed information about design elements, including design ID, name, description,
and price details.

- Provides users with comprehensive information to make informed decisions.

3. NewUserRegister.cshtml:

- Razor view file for the new user registration page.

- Presents a form for users to register with the application by providing their details such as
name, gender, contact information, and address.

- Captures user registration details to create new user accounts.

4. UserBooking.cshtml:

- Razor view file for the user booking page.

- Allows users to book design services or schedule appointments.

- Displays available design options and booking options for users to select from.

5. DesignApproval.cshtml:

- Razor view file for the design approval page.

- Enables administrators to review and approve design submissions or requests.

- Provides functionality for admins to manage design approvals efficiently.

6. AdminReport.cshtml:

- Razor view file for the admin report page.

- Generates reports and analytics for administrators to track application usage, user activities,
and performance metrics.

- Presents insights and data visualizations to facilitate decision-making and strategic planning.

The file system design for the YouTube Transcript Summarizer project ensures the orderly
storage and organization of different components and functionalities. By separating data into
individual files and giving each file a distinct purpose and name, the design facilitates easy
identification, retrieval, and management of information, contributing to the overall efficiency
and usability of the application.

3.2 INPUT DESIGN

Input design for the YouTube Transcript Summarizer system focuses on making it easy for users
to provide the necessary information while minimizing errors. By creating user-friendly input
interfaces and implementing validation checks, the system ensures that users can enter data
accurately and efficiently.

1. New User Summarizer Page

This page is used to summarize the user's data.
2. Upload videos and Add Link

This page is used by the admin to add a new link.
3.3 OUTPUT DESIGN

Output design for the YouTube Transcript Summarizer system involves presenting the
summarized content to users in a clear and understandable format. The primary objective is to
ensure that users can easily access and comprehend the summarized information.

ViewUserSummarizerVideo

3.4 DATABASE DESIGN

The database design for the YouTube Transcript Summarizer project embodies the
essence of efficient information management. It serves as a cohesive repository of interrelated
data, devoid of unnecessary redundancy, providing seamless access to users while ensuring quick
and efficient performance.

This table consists of design id, name, design information details, and price details. Here Design
id is a primary key.

AdminLogin Table:
- Stores admin login credentials, enabling secure access to the application.
- Fields:
  - Username (Primary Key): Unique identifier for admin login.
  - Password: Encrypted password for authentication.

VideoInfo Table:
- Houses comprehensive details of YouTube videos, facilitating easy retrieval and
  management.
- Fields:
  - VideoID (Primary Key): Unique identifier for each video.
  - Title: Title of the YouTube video.
  - URL: URL of the video.
  - Description: Description of the video content.
  - PublishedAt: Date and time of video publication.
  - ChannelID: Identifier for the video channel.
  - ThumbnailURL: URL of the video thumbnail.

The database design embodies the core principles of integration, efficiency, and accessibility. By
minimizing redundancy and optimizing data relationships, it ensures smooth information
retrieval and management, aligning with the overarching objective of facilitating easy, quick, and
cost-effective access to data for all users.
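The two tables listed in this section could be created as in the sketch below. The report does not name a database engine, so the use of SQLite through Python's built-in `sqlite3` module is an assumption, as are the column types and the `create_database` and `add_video` helper names.

```python
import sqlite3

# Column names follow the AdminLogin and VideoInfo field lists;
# the column types are assumptions made for this sketch
SCHEMA = """
CREATE TABLE IF NOT EXISTS AdminLogin (
    Username TEXT PRIMARY KEY,
    Password TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS VideoInfo (
    VideoID      TEXT PRIMARY KEY,
    Title        TEXT NOT NULL,
    URL          TEXT NOT NULL,
    Description  TEXT,
    PublishedAt  TEXT,
    ChannelID    TEXT,
    ThumbnailURL TEXT
);
"""


def create_database(path=":memory:"):
    # ":memory:" gives a throwaway database that is convenient for testing
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_video(conn, video_id, title, url):
    # Parameter placeholders keep user-supplied values out of the SQL text
    conn.execute(
        "INSERT INTO VideoInfo (VideoID, Title, URL) VALUES (?, ?, ?)",
        (video_id, title, url),
    )
    conn.commit()
```

Because `VideoID` is the primary key, inserting the same video twice raises `sqlite3.IntegrityError`, which is one way the design avoids redundant rows.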

3.5 SYSTEM DEVELOPMENT

3.5.1 DESCRIPTION OF MODULES

User Registration
This module lets users register with the application.
Registration is mandatory, since it is required to view videos
and post comments. The user selects a username and password
at the time of registration, and the username must be unique.
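One common way to store the password chosen at registration is as a salted hash rather than as plain text. The sketch below uses PBKDF2 from Python's standard library; the report does not say how passwords are protected, so the choice of PBKDF2, the iteration count, and the function names are all assumptions.

```python
import hashlib
import hmac
import os


def hash_password(password, salt=None, iterations=200_000):
    # A fresh random salt is generated unless one is supplied
    salt = salt if salt is not None else os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"),
                                 salt, iterations)
    return salt, digest


def verify_password(password, salt, expected_digest, iterations=200_000):
    # Recompute the digest with the stored salt and compare
    _, digest = hash_password(password, salt, iterations)
    # Constant-time comparison avoids leaking how many bytes matched
    return hmac.compare_digest(digest, expected_digest)
```

At registration the salt and digest would be stored next to the username; at login, `verify_password` checks the entered password against them.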

Video URL Upload process


This is the first module the user works with. The user logs in
with a username and password; once the login matches, the authorized
user can access the system. This module is used to upload sample video
URLs to the application.

Preprocessing
Content extracted from YouTube video subtitles may contain
unwanted features such as emoticons, symbols, web links, and repetitive
stop words, which are removed by preprocessing techniques.
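A minimal sketch of this cleaning step is shown below. The regular expressions, the handling of bracketed caption cues such as [Music], and the `clean_transcript` name are assumptions; the sketch also assumes English text, since it keeps only ASCII letters, digits, and basic punctuation.

```python
import re

# Web links beginning with http://, https://, or www.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
# Bracketed caption cues such as [Music] or [Applause]
CUE_RE = re.compile(r"\[[^\]]*\]")
# Anything that is not a letter, digit, whitespace, or basic punctuation
SYMBOL_RE = re.compile(r"[^A-Za-z0-9\s.,!?'-]")


def clean_transcript(text):
    text = URL_RE.sub(" ", text)
    text = CUE_RE.sub(" ", text)
    text = SYMBOL_RE.sub(" ", text)
    # Collapse the runs of whitespace left behind by the removals
    return re.sub(r"\s+", " ", text).strip()
```

Stop words are not handled here; they are removed after tokenization.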

a) Tokenization:
The given document is treated as a string, and the individual
words in it are identified; that is, the document string is divided
into units, or tokens.

b) Removal of stop words:
In this step, common words such as a, an, but, and, of, and the
are removed.
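The two steps above can be sketched as follows. The stop-word set is a small illustrative sample rather than the list used in the project, and the function names are hypothetical.

```python
import re

# A small illustrative stop-word list; real lists are much longer
STOP_WORDS = {"a", "an", "and", "but", "of", "the", "is", "in", "to"}


def tokenize(text):
    # Split the document string into lowercase word tokens
    return re.findall(r"[a-z0-9']+", text.lower())


def remove_stop_words(tokens):
    # Keep only the tokens that are not in the stop-word list
    return [t for t in tokens if t not in STOP_WORDS]
```

For example, `remove_stop_words(tokenize(text))` turns a cleaned transcript into the list of content words that later steps work with.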

4. TESTING AND IMPLEMENTATION
Testing and implementation are crucial phases in the development lifecycle of the YouTube
Transcript Summarizer project. These stages ensure that the application functions as intended,
meets user requirements, and operates smoothly across different environments.

1. Testing :

- Unit Testing: Conduct unit tests to validate individual components such as functions, classes,
and modules. Ensure that each unit performs its intended function correctly.

- Integration Testing: Test the integration of different modules and components to verify that
they work together seamlessly. Validate data flow and interactions between various parts of the
application.

- User Acceptance Testing (UAT): Involve users or stakeholders to perform UAT to ensure
that the application meets their expectations and requirements. Gather feedback and make
necessary adjustments based on user input.

- Performance Testing: Assess the performance of the application under different loads and
conditions. Measure response times, resource utilization, and scalability to identify and address
performance bottlenecks.

- Security Testing: Conduct security testing to identify and mitigate potential vulnerabilities
such as SQL injection, cross-site scripting (XSS), and authentication issues. Ensure that sensitive
data is handled securely and access controls are enforced properly.

- Compatibility Testing: Test the application across different web browsers, operating
systems, and devices to ensure compatibility and consistent user experience.

- Regression Testing: Perform regression tests to verify that new changes or updates do not
introduce any unintended side effects or regressions in existing functionality.
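As an example of the unit testing described above, the sketch below tests a small helper that extracts the video ID from a YouTube URL, the same parsing step that the sample coding performs inline. The `extract_video_id` helper and the test class are hypothetical and only illustrate the approach with Python's built-in `unittest` module.

```python
import unittest
from urllib.parse import urlparse, parse_qs


def extract_video_id(youtube_url):
    # Return the "v" query parameter, or None when it is absent
    query = parse_qs(urlparse(youtube_url).query)
    values = query.get("v")
    return values[0] if values else None


class ExtractVideoIdTest(unittest.TestCase):
    def test_watch_url(self):
        url = "https://www.youtube.com/watch?v=cs1e0fRyI18"
        self.assertEqual(extract_video_id(url), "cs1e0fRyI18")

    def test_extra_parameters(self):
        url = "https://www.youtube.com/watch?v=cs1e0fRyI18&list=RDcs1e0fRyI18"
        self.assertEqual(extract_video_id(url), "cs1e0fRyI18")

    def test_missing_id(self):
        self.assertIsNone(extract_video_id("https://www.youtube.com/"))
```

Saved in a file such as `test_video_id.py`, the tests can be run with `python -m unittest`. Rerunning them after every change also serves as a simple regression check.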


2. Implementation:
- Deployment Planning: Plan the deployment of the application, considering factors such as
server infrastructure, deployment environment (development, staging, production), and
deployment strategy (manual deployment, continuous integration/continuous deployment).

- Configuration Management: Manage configuration settings, environment variables, and
application parameters for different deployment environments. Use configuration files or
environment-specific settings to ensure consistency and portability.

- Database Migration: If necessary, perform database migrations to update database schemas,
seed initial data, or apply schema changes. Use migration tools or scripts to automate the process
and maintain data integrity.

- Continuous Integration/Continuous Deployment (CI/CD): Implement CI/CD pipelines to
automate the build, testing, and deployment processes. Use CI/CD tools such as Jenkins, GitLab
CI/CD, or Azure DevOps to streamline development workflows and ensure consistent
deployments.

- Monitoring and Logging: Set up monitoring and logging mechanisms to monitor application
performance, track errors, and troubleshoot issues in real-time. Use monitoring tools such as
Prometheus, Grafana, or ELK stack to gain insights into application behavior and performance.

- Documentation: Document the deployment process, configuration settings, and operational
procedures to facilitate maintenance and troubleshooting. Provide user documentation and guides
to help users understand how to use the application effectively.
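The configuration management item above can be illustrated with a small sketch that reads settings from environment variables and falls back to development defaults. The variable names (`SUMMARIZER_DEBUG`, `SUMMARIZER_MODEL`, `SUMMARIZER_MAX_LENGTH`) and the `Settings` class are hypothetical; the defaults `t5-base` and `150` mirror the values used in the sample coding.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    debug: bool
    model_name: str
    max_summary_length: int


def load_settings(env=os.environ):
    # Each value falls back to a development default when the
    # corresponding environment variable is not set
    return Settings(
        debug=env.get("SUMMARIZER_DEBUG", "0") == "1",
        model_name=env.get("SUMMARIZER_MODEL", "t5-base"),
        max_summary_length=int(env.get("SUMMARIZER_MAX_LENGTH", "150")),
    )
```

Each deployment environment (development, staging, production) can then set its own values without any change to the code.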

By thoroughly testing the application and following a systematic implementation process, you
can ensure the successful deployment and operation of the YouTube Transcript Summarizer
project, meeting user needs and delivering a reliable and efficient solution.


5. CONCLUSION
In conclusion, our website saves the user time. Instead of watching an entire video, including
the parts that are of no use, the user can first see the main content of a YouTube video and
judge whether the video is right for them. This reduces the time and effort spent searching
for the right YouTube video.

Our website will also provide multi-language summarization and make a text-to-speech option
available. We are confident that our project will effectively address the needs of users by
saving their time and effort. Our approach aims to provide users with only the relevant and
useful information on the topics that interest them, eliminating the need to watch lengthy
videos. The time saved can be used for further knowledge acquisition and exploration.

This project offers a significant time-saving solution for users by providing a summary of the
video without the need to watch the entire video. It also helps users to identify any inappropriate or
harmful content before watching the video. Furthermore, the project offers an excellent user interface
experience by using Chrome extensions. Users can easily obtain the summarized text without having to
copy and paste the video URL into third-party applications or terminals.


BIBLIOGRAPHY
WEBSITES:

• Introduction to python: https://www.w3schools.com/python/python_intro.asp

• HTML tags: https://en.wikipedia.org/wiki/HTML

• Flask: https://flask.palletsprojects.com/en/3.0.x/

• CSS: https://www.tutorialspoint.com/css/index.html.

REFERENCE BOOKS:

1. S. Patil et al., "Multilingual Speech and Text Recognition and Translation using Image,"
International Journal of Engineering Research and Technology, 5 (2016).

2. S. Sah, S. Kulhare, A. Gray, S. Venugopalan, E. Prud'Hommeaux and R. Ptucha,
"Semantic Text Summarization of Long Videos," IEEE Winter Conference on
Applications of Computer Vision (WACV), (2017).

3. A. Dilawari and M. U. G. Khan, "ASoVS: Abstractive Summarization of Video
Sequences," in IEEE Access, 7 (2019).

4. C.-Y. Lin, "ROUGE: A Package for Automatic Evaluation of Summaries," in
Proceedings of 2004, Association for Computational Linguistics, Barcelona, Spain.

5. A. E. B. Ajmal and R. P. Haroon, "Maximal marginal relevance based Malayalam text
summarization with successive thresholds," International Journal on Cybernetics and
Informatics, 5, 2 (2016).
APPENDICES

A. DATA FLOW DIAGRAM

DFD Level:2

B. TABLE STRUCTURE

Login

User            Video           Summary

User_id         Video_id        Summarize_id
User_name       User_id         Video_id
Password        Video_link      Summary_text
Email           Upload_date     Generation_date

User Accessing Extension

Table B.2

Table B.2 Answer

C. SAMPLE CODING

from datetime import datetime
from urllib.parse import urlparse, parse_qs

from flask import Flask, jsonify, make_response, request
from transformers import T5ForConditionalGeneration, T5Tokenizer
from youtube_transcript_api import YouTubeTranscriptApi

# Define a variable to hold the app
app = Flask(__name__)

# Load the tokenizer and the model weights once at startup
# instead of downloading them again on every request
tokenizer = T5Tokenizer.from_pretrained('t5-base')
model = T5ForConditionalGeneration.from_pretrained('t5-base')


# Two simple endpoints for checking that the server is running
@app.route('/')
def hello_world():
    return "It's working"


@app.route('/time', methods=['GET'])
def get_time():
    return str(datetime.now())


# Error handling
@app.errorhandler(404)
def not_found(error):
    return make_response(jsonify({'error': 'Not found'}), 404)


# Helper (not a route): get the transcript from the YouTube Transcript API
# and join the caption fragments into one string.
# get_transcript is the static call of older youtube_transcript_api
# releases; newer releases may expose a different interface.
def transcript(video_id):
    segments = YouTubeTranscriptApi.get_transcript(video_id)
    return ' '.join(segment['text'] for segment in segments)


# Helper (not a route): summarize a transcript string with T5
def summary(text):
    # Encode the text into a tensor of integers using the tokenizer;
    # input beyond 512 tokens is truncated
    inputs = tokenizer.encode("summarize: " + text, return_tensors="pt",
                              max_length=512, truncation=True)
    # Generate the summary with beam search
    outputs = model.generate(inputs, max_length=150, min_length=40,
                             length_penalty=2.0, num_beams=4,
                             no_repeat_ngram_size=2, early_stopping=True)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)


@app.route('/summarize/api', methods=['GET'])
def get_summarize():
    # youtube_url = "https://www.youtube.com/watch?v=cs1e0fRyI18"
    youtube_url = request.args.get('youtube_url', '')
    query = parse_qs(urlparse(youtube_url).query)
    if 'v' not in query:
        return jsonify({'error': 'youtube_url must contain a v parameter'}), 400
    video_id = query['v'][0]
    summary_text = summary(transcript(video_id))
    return jsonify({'responseText': summary_text}), 200


# Serve the app when this file is run
if __name__ == '__main__':
    app.run(debug=True)

D. SAMPLE INPUTS
LOGIN PAGE ( USER )

Language translation (Admin)

E. SAMPLE OUTPUTS
SUMMARIZER PAGE (USER)
