Project Explanation PDF
Nowadays the Internet has become an indispensable source of information for everyday activities.
The Internet holds a large amount of information hidden in web pages. Web pages also contain
noisy data such as advertisements, page settings, navigation buttons and notices, and some of
the data is hidden in images on the page. Extracting the data hidden in these images is still
difficult. In this project, the hidden objects (data) in images are extracted from the webpage,
and an NLP model is proposed to filter the data.
Requirements
1. Python
2. HTML
3. Django
4. NLP
5. PostgreSQL
Modules
1. HTML
2. Django
3. NLTK
4. database
Project plan
The project is built from the front end, through the middleware, to the database. I will discuss each layer one by one.
Django sample
Introduction:
Django is a popular Python web framework, meaning it is a third-party Python library used for
developing web applications.
Python installation
sudo add-apt-repository ppa:jonathonf/python-3.6
3. Click next
4. Click install
5. Click close
6. Open command prompt
7. Type python and press enter
Django install
>> py --version
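Once py --version confirms the interpreter, Django is typically installed with py -m pip install django, and running django-admin startproject mysite generates the layout shown below. The short script here is a sketch, not part of the project, that reports the running Python version and whether a Django distribution is installed; django_version is a hypothetical helper name.

```python
# Sketch: report the Python version and the installed Django version (if any),
# using only the standard library.
import sys
from importlib import metadata


def django_version():
    """Return the installed Django version string, or None if Django is absent."""
    try:
        return metadata.version("Django")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    version = django_version()
    if version is None:
        print("Django is not installed; try: py -m pip install django")
    else:
        print("Django", version)
```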
mysite/
    manage.py
    mysite/
        __init__.py
        settings.py
        urls.py
        asgi.py
        wsgi.py
● The outer mysite/ root directory is a container for your project. Its name doesn’t
matter to Django; you can rename it to anything you like.
● manage.py: A command-line utility that lets you interact with this Django project in
various ways. You can read all the details about manage.py in the “django-admin and
manage.py” page of the Django documentation.
● The inner mysite/ directory is the actual Python package for your project. Its name is
the Python package name you’ll need to use to import anything inside it (e.g.
mysite.urls).
● __init__.py: An empty file that tells Python that this directory should be considered
a Python package. If you’re a Python beginner, read more about packages in the official
Python docs.
● settings.py: Settings/configuration for this Django project. The “Django settings”
documentation explains how settings work.
● urls.py: The URL declarations for this Django project; a “table of contents” of your
Django-powered site. You can read more about URLs in the “URL dispatcher” documentation.
● asgi.py: An entry-point for ASGI-compatible web servers to serve your project. See
“How to deploy with ASGI” in the Django documentation for more details.
● wsgi.py: An entry-point for WSGI-compatible web servers to serve your project. See
“How to deploy with WSGI” in the Django documentation for more details.
myproject/
views.py
manage.py
The manage.py file is automatically created in each Django project. It does the same thing as
django-admin but also sets the DJANGO_SETTINGS_MODULE environment variable so
that it points to your project’s settings.py file. The django-admin script should be on your
system path if you installed Django via pip.
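The generated manage.py sets the variable with os.environ.setdefault("DJANGO_SETTINGS_MODULE", "mysite.settings") and then passes sys.argv to execute_from_command_line from django.core.management. Because setdefault only writes the variable when it is not already set, a value exported in the shell takes precedence. The sketch below isolates that one behavior without importing Django; point_at_settings is a hypothetical name.

```python
# Sketch of the settings-module step performed by manage.py (Django itself
# is not imported here).
import os


def point_at_settings(default="mysite.settings"):
    """Set DJANGO_SETTINGS_MODULE only if it is unset, then return its value."""
    os.environ.setdefault("DJANGO_SETTINGS_MODULE", default)
    return os.environ["DJANGO_SETTINGS_MODULE"]
```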
Run Application
[Figure: output]
Print text
<!DOCTYPE html>
<html>
<head>
<title>Page Title</title>
</head>
<body>
<h1>This is a Heading</h1>
<p>This is a paragraph.</p>
</body>
</html>
Home page
Registration page
Admin page
Login page
URL form page
[Figure: output]
Logout
NLTK
NLTK is a leading platform for building Python programs to work with human
language data. It provides easy-to-use interfaces to over 50 corpora and lexical
resources such as WordNet, along with a suite of text processing libraries for
classification, tokenization, stemming, tagging, parsing, and semantic reasoning,
wrappers for industrial-strength NLP libraries, and an active discussion forum.
The dataset used in this project is the Artificial intelligence Raw Dataset.
Requests
To fetch the web page, you're going to use requests, one of the most popular and useful
Python packages out there.
Requests allows you to send HTTP/1.1 requests using Python. With it, you can add content
such as headers, form data, multipart files, and parameters through simple Python calls.
It also lets you access the response data in the same way.
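With requests, fetching a page is requests.get(url, headers=...) followed by reading the response's .text attribute. So that this sketch runs without third-party packages, it swaps in the standard library's urllib.request, which follows the same pattern: build a request with headers, open it, read the body. The fetch_html name and the User-Agent string are illustrative assumptions.

```python
# Sketch: download a page with urllib.request (a stand-in for requests.get).
from urllib.request import Request, urlopen

# Hypothetical User-Agent value; some sites reject requests without one.
DEFAULT_HEADERS = {"User-Agent": "Mozilla/5.0 (project-sketch)"}


def fetch_html(url, headers=None, timeout=10):
    """Download url and return its body decoded as text."""
    request = Request(url, headers=headers or DEFAULT_HEADERS)
    with urlopen(request, timeout=timeout) as response:
        # Use the charset the server declares, falling back to UTF-8.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Passing one of the URLs listed at the end of this document returns that page's HTML as a string, ready for the text-extraction step.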
Get the text from the HTML
Here you will use the BeautifulSoup package to extract the text from the HTML.
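BeautifulSoup does this in one call, for example BeautifulSoup(html, "html.parser").get_text(). The sketch below shows the same idea with the standard library's html.parser so it runs without installing bs4: it collects every text node and skips the contents of <script> and <style>, which hold code rather than readable text. TextExtractor and html_to_text are hypothetical names.

```python
# Sketch: extract readable text from HTML with the standard library parser.
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text only when we are outside script/style blocks.
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def html_to_text(html):
    """Return the page's text nodes joined by single spaces."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)
```

Run on the sample page shown earlier, it returns "Page Title This is a Heading This is a paragraph."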
Tokenize the text
You want to tokenize your text, that is, split it into a list of words. Essentially, you
want to split off the parts of the text that are separated by whitespace.
To do this, you're going to use a powerful tool called regular expressions. A regular
expression is a sequence of characters that defines a search pattern.
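A minimal regex tokenizer, assuming the pattern \w+ (runs of letters, digits and underscores) and lower-casing so that "AI" and "ai" count as the same word. NLTK offers the same approach as RegexpTokenizer(r"\w+") in nltk.tokenize; this sketch uses Python's re module directly. Note that \w+ also splits on punctuation, not only on whitespace.

```python
# Sketch: split text into lower-cased word tokens with a regular expression.
import re

# \w+ matches runs of word characters; everything else acts as a separator.
WORD_PATTERN = re.compile(r"\w+")


def tokenize(text):
    """Return the lower-cased word tokens found in text."""
    return [token.lower() for token in WORD_PATTERN.findall(text)]
```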
Remove stop words
It is common practice to remove words that appear a lot in the English language,
such as 'the', 'of' and 'a' (known as stop words), because they carry little
information about the topic of the text.
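In NLTK the full English list comes from stopwords.words("english") in nltk.corpus, after a one-time nltk.download("stopwords"). The sketch below hard-codes a small illustrative subset instead, so it runs without downloading the corpus; STOP_WORDS and remove_stop_words are hypothetical names.

```python
# Sketch: drop common English stop words from a token list.
# STOP_WORDS is a small hand-picked sample for illustration only.
STOP_WORDS = {"the", "of", "a", "an", "and", "is", "in", "to", "it", "that"}


def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Return tokens that are not stop words, preserving their order."""
    return [token for token in tokens if token.lower() not in stop_words]
```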
Create a histogram diagram
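Use nltk.FreqDist() to build a frequency distribution of the words, then call the
plot() method of the resulting object to draw the histogram.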
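nltk.FreqDist is a subclass of Python's collections.Counter, so a plain Counter shows the counting step without NLTK installed. This sketch takes the n most common tokens and renders them as a text histogram with one bar per word; top_words and text_histogram are hypothetical helpers. With NLTK, nltk.FreqDist(tokens).plot(20) draws a similar chart of the 20 most common words using matplotlib.

```python
# Sketch: count word frequencies and render a simple text histogram.
from collections import Counter


def top_words(tokens, n=10):
    """Return the n most frequent tokens as (word, count) pairs."""
    return Counter(tokens).most_common(n)


def text_histogram(pairs, bar="#"):
    """Render (word, count) pairs as one left-aligned bar line per word."""
    width = max((len(word) for word, _ in pairs), default=0)
    return "\n".join(f"{word:<{width}} {bar * count}" for word, count in pairs)


if __name__ == "__main__":
    tokens = ["ai", "learning", "ai", "data", "learning", "ai"]
    print(text_histogram(top_words(tokens, 3)))
```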
Database connection
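Django connects to PostgreSQL through its built-in backend django.db.backends.postgresql, which needs a driver such as psycopg2 installed (for example py -m pip install psycopg2-binary). A sketch of the DATABASES setting in mysite/settings.py follows; the database name, user, host, port and environment-variable names are placeholders, not values from this project. After editing it, python manage.py migrate creates Django's tables in the new database.

```python
# Sketch of the DATABASES setting in mysite/settings.py for PostgreSQL.
# All names and defaults below are hypothetical placeholders; the password
# is read from an environment variable instead of being hard-coded.
import os

DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": os.environ.get("POSTGRES_DB", "projectdb"),
        "USER": os.environ.get("POSTGRES_USER", "postgres"),
        "PASSWORD": os.environ.get("POSTGRES_PASSWORD", ""),
        "HOST": os.environ.get("POSTGRES_HOST", "localhost"),
        "PORT": os.environ.get("POSTGRES_PORT", "5432"),
    }
}
```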
URLs:
https://www.zdnet.com/article/what-is-ai-everything-you-need-to-know-about-artificial-intelligence/