This project is aimed at predicting how strong a storm is and what impact storms have on the affected place. At the core of forecasting work is climatology, the study of climates and how they change. A basic understanding of how storms work relies on the historical weather record and a good understanding of the time of year when parts of the country are at the greatest risk, and of which areas of the middle and southern United States are affected. Usually, when warm, moist air left over from winter cyclones meets winds from the jet stream, it creates high winds, tornadoes and dangerous hail.
The analysis starts from understanding the system, its intention and requirements, then moves to testing the basic functionality and drilling down into the details until all the possible issues are discovered.
The testing process includes activities such as identifying, documenting, reviewing, scripting and executing test cases for predicting the level of impact of a storm.
The accuracy of the existing system is well below 100%; it predicts values that are only about 80% true to the actual values.
In the existing system, there are anomalies with respect to null input values, i.e., the output is improper when null, negative or irrational values are supplied for prediction of the level of a storm.
The estimated loss in dollars, the number of injuries and the fatalities that can occur due to the storm are not displayed.
In the proposed system, the anomalies with respect to null input values are eliminated and the correct level of the storm is predicted. The accuracy of the existing system is enhanced so that the output approaches 100%.
The predicted estimated loss in dollars, the number of injuries and the fatalities that can occur due to the storm are presented to the user.
This application uses different classification algorithms to predict the level of storm based on
predictor values namely magnitude, length and width of the storm. This also provides information
on the accuracy with which level of storm is predicted.
The classification algorithms used are the Support Vector Classifier, the Random Forest Classifier and the K-Nearest Neighbor Classifier.
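A minimal sketch of how these three scikit-learn classifiers could be compared on the predictor columns named above; the file name dataset.csv, the 70/30 split and the column names are assumptions, not the project's exact pipeline.

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

data = pd.read_csv('dataset.csv')
x = data[['mag', 'len', 'wid']]   # magnitude, length, width
y = data['imp']                   # level of storm
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3)

# Fit each classifier and report its accuracy on the held-out test set
for name, model in [('SVM', SVC()),
                    ('Random Forest', RandomForestClassifier()),
                    ('KNN', KNeighborsClassifier(n_neighbors=3))]:
    model.fit(x_train, y_train)
    print(name, accuracy_score(y_test, model.predict(x_test)))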
Description
1-(om)
Tornado number – A count of tornadoes during the year. Prior to 2007, these numbers were assigned to the tornado as the information arrived in the NWS database. Since 2007, the numbers may have been assigned in sequential (temporal) order after event date/times are converted to CST. However, do not use “om” to count the sequence of tornadoes through the year, as sometimes new entries come in late, or corrections are made, and the data are not re-sequenced.
NOTE: Tornado segments that cross state borders or cover more than 4 counties will have the same om number.
2-(yr)
Year, 1950-2009
3-(mo)
Month, 1-12
4-(dy)
Day, 1-31
5-(date)
Date in yyyy-mm-dd format
6-(time)
Time in HH:MM:SS
7-(tz)
Time zone – All times, except for ?=unknown and 9=GMT, were converted to 3=CST. This should be accounted for when building queries for GMT summaries such as 12z-12z.
8-(st)
Two-letter state (postal) abbreviation
9-(stf)
State FIPS number (Note some Puerto Rico codes are incorrect)
10-(stn)
State number – number of this tornado, in this state, in this year: May not be sequential in some
years.
NOTE: discontinued in 2008. This number can be calculated in a spreadsheet by sorting and after
accounting for border crossing tornadoes and 4+ county segments.
11-(mag)
F-scale (EF-scale after Jan. 2007): values -9, 0, 1, 2, 3, 4, 5 (-9=unknown). Or, hail size in inches. Or, wind speed in knots (1 knot=1.15 mph).
12-(in)
Injuries - when summing for state totals use sn=1, not sg=1
13-(fat)
Fatalities - when summing for state totals use sn=1, not sg=1
14-(loss)
Estimated property loss information – Prior to 1996 this is a categorization of tornado damage by dollar amount (0 or blank=unknown; 1=<$50; 2=$50-$500; 3=$500-$5,000; 4=$5,000-$50,000; 5=$50,000-$500,000; 6=$500,000-$5,000,000; 7=$5,000,000-$50,000,000; 8=$50,000,000-$500,000,000; 9=$500,000,000-$5,000,000,000). When summing for state totals use sn=1, not sg=1. From 1996, this is tornado property damage in millions of dollars. Note: this may change to whole dollar amounts in the future. Entry of 0 does not mean $0.
15-(closs)
Estimated crop loss in millions of dollars (started in 2007). Entry of 0 does not mean $0.
16-(slat)
Starting latitude in decimal degrees
17-(slon)
Starting longitude in decimal degrees
18-(elat)
Ending latitude in decimal degrees
19-(elon)
Ending longitude in decimal degrees
20-(len)
Length in miles
21-(wid)
Width in yards
Understanding these fields is critical to counting state tornadoes, totaling state fatalities/losses.
23-(sn) State number: 1 or 0 (1 = entire track information is in this state)
1, 1, 1 = Entire record for the track of the tornado (unless all 4 fips codes are non-zero)
1, 0, -9 = Continuing county fips code information only from 1, 1, 1 record, above (same om)
2, 0, 1 = A two-state tornado (st=state of touchdown, other fields summarize entire track)
2, 1, 2 = First state segment for a two-state (2, 0, 1) tornado (same state as above, same om)
2, 1, 2 = Second state segment for a two-state (2, 0, 1) tornado (state tracked into, same om)
2, 0, -9 = Continuing county fips for a 2, 1, 2 record that exceeds 4 counties (same om)
3, 0, 1 = A three-state tornado (st=state of touchdown, other fields summarize entire track)
3, 1, 2 = First state segment for a three-state (3, 0, 1) tornado (state same as 3, 0, 1, same om)
3, 1, 2 = Second state segment for a three-state (3, 0, 1) tornado (2nd state tracked into, same om
as the initial 3, 0, 1 record)
3, 1, 2 = Third state segment for a three-state (3, 0, 1) tornado (3rd state tracked into, same om as
the initial 3, 0, 1 record)
28-(f4) 4th County FIPS code – Additional counties will be included in sg=-9 records with same
om number
Tornado database file updated to add the “fc” field for estimated F-scale rating in 2016. Valid for records altered between 1950-1982.
29-(fc)
Between 1953 and 1982, 1864 CONUS tornadoes were coded in the official database with an unknown F-scale rating (-9). The table below explains how these tornado records were modified to provide an estimated F-scale rating. All changed records are identified in the database by the “fc” field (fc=1 if the F-scale was changed from -9 to another value; fc=0 for all unchanged F-scale ratings).
IF property loss equals: | Then set F-scale equal to: | IF path length <=5 miles, add: | IF path length >5 miles, add:
0,1 (<$50)        | 0 | 0  | +1
2,3 (up to $5K)   | 1 | -1 | +1
4,5 (up to $500K) | 2 | -1 | +1
6,7 (up to $50M)  | 3 | -1 | +1
8,9 (up to $5B)*  | 4 | -1 | +1
F5: None
30-(imp)
Level of storm
Low/L (mag=1)
Medium/M (mag=2)
Serious/S (mag=3)
High/H (mag=4)
Extreme/E (mag=5)
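For illustration only, the mapping above can be written as a small lookup; this mirrors the labels listed but is not the project's actual code.

# Illustrative lookup for the storm-level labels listed above
LEVELS = {1: 'Low/L', 2: 'Medium/M', 3: 'Serious/S', 4: 'High/H', 5: 'Extreme/E'}

def storm_level(mag):
    return LEVELS.get(mag, 'Unknown')

print(storm_level(3))   # Serious/S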
The Software Requirement Specification (SRS) is the starting point of the software development activity. As systems grew more complex, it became evident that the goals of the entire system could not be easily comprehended; hence the need for the requirements phase arose. A software project is initiated by the client's needs. The SRS is the means of translating the ideas in the minds of the clients (the input) into a formal document (the output of the requirements phase).
1) Problem/Requirement Analysis:
This is the first and more nebulous of the two activities; it deals with understanding the problem, the goals and the constraints.
2) Requirement Specification:
Here, the focus is on specifying what has been found during analysis. Issues such as representation, specification languages and tools, and checking that the specifications are complete are addressed during this activity.
The requirements phase terminates with the production of the validated SRS document. Producing the SRS document is the basic goal of this phase.
Role of SRS
The purpose of the Software Requirement Specification is to reduce the communication gap between the clients and the developers. The Software Requirement Specification is the medium through which the client and user needs are accurately specified. It forms the basis of software development. A good SRS should satisfy all the parties involved in the system.
A Software Requirements Specification (SRS) is a document that describes the nature of a project, software or application. In simple words, an SRS document is a manual for a project, provided it is prepared before you kick-start the project/application. This document is also known as an SRS report or software document. A software document is primarily prepared for a project, software or any kind of application.
There are a set of guidelines to be followed while preparing the software requirement specification
document. This includes the purpose, scope, functional and nonfunctional requirements, software
and hardware requirements of the project. In addition to this, it also contains the information about
environmental conditions required, safety and security requirements, software quality attributes of
the project etc.
The purpose of SRS (Software Requirement Specification) document is to describe the external
behavior of the application developed or software. It defines the operations, performance and
interfaces and quality assurance requirement of the application or software. The complete software
requirements for the system are captured by the SRS.
This section introduces the requirement specification document for Storm Forecasting using
Machine Learning which enlists functional as well as non-functional requirements.
For documenting the functional requirements, the set of functionalities supported by the system
are to be specified. A function can be specified by identifying the state at which data is to be input
to the system, its input data domain, the output domain, and the type of processing to be carried
on the input data to obtain the output data.
Functional requirements define specific behavior or function of the application. Following are the
functional requirements:
2.1.1.2. It is achieved by creating user-friendly screens for data entry that can handle a large volume of data. The goal of designing input is to make data entry easier and free from errors. The data entry screen is designed in such a way that all the data manipulations can be performed. It also provides viewing facilities.
2.1.1.3. When the data is entered, it is checked for validity. Data can be entered with the help of screens. Appropriate messages are provided as and when needed so that the user is never left confused. Thus the objective of input design is to create an input layout that is easy to follow.
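The following is a minimal validation sketch for the requirement above; the field names and messages are illustrative assumptions, not the project's actual code.

# Hypothetical helper: reject null, non-numeric and negative predictor values
def validate_inputs(magnitude, length, width):
    errors = []
    for name, value in [('magnitude', magnitude), ('length', length), ('width', width)]:
        if value is None or str(value).strip() == '':
            errors.append(name + ' is required')
            continue
        try:
            number = float(value)
        except ValueError:
            errors.append(name + ' must be a number')
            continue
        if number < 0:
            errors.append(name + ' cannot be negative')
    return errors

print(validate_inputs('4', '', '-10'))   # ['length is required', 'width cannot be negative']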
A non-functional requirement is a requirement that specifies criteria that can be used to judge the operation of a system, rather than specific behaviors. Essentially, these are the constraints within which the system must work. Following are the non-functional requirements:
Performance:
The performance of the developed application can be evaluated using the following approach:
Measuring enables you to identify how the performance of your application stands in relation to
your defined performance goals and helps you to identify the bottlenecks that affect your
application performance. It helps you identify whether your application is moving toward or away
from your performance goals. Defining what you will measure, that is, your metrics, and defining
the objectives for each metric is a critical part of your testing plan.
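As a hedged illustration of such a measurement, the sketch below times predictions against a latency goal; the synthetic data and the 100 ms target are assumptions, not figures from this document.

import time
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Synthetic stand-in data; in the project this would be the fitted storm classifier
X, y = make_classification(n_samples=1000, n_features=3, n_informative=3, n_redundant=0)
model = SVC().fit(X, y)

start = time.perf_counter()
model.predict(X)
elapsed_ms = (time.perf_counter() - start) * 1000
print(f'Prediction latency: {elapsed_ms:.1f} ms for {len(X)} rows (example goal: < 100 ms)')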
www.spc.noaa.gov
Different organizations have different phases in STLC however generic Software Test Life Cycle
(STLC) for waterfall development model consists of the following phases.
1. Requirements Analysis
2. Test Planning
3. Test Analysis
4. Test Design
5. Test Construction and Verification
6. Test Execution and Bug Reporting
7. Final Testing and Implementation
8. Post Implementation
1. Requirements Analysis
In this phase testers analyze the customer requirements and work with developers during the design phase to see which requirements are testable and how they are going to test those requirements.
It is very important to start testing activities from the requirements phase itself, because the cost of fixing a defect is much lower if it is found in the requirements phase rather than in later phases.
2. Test Planning
In this phase all the planning about testing is done: what needs to be tested, how the testing will be done, the test strategy to be followed, what the test environment will be, what test methodologies will be followed, hardware and software availability, resources, risks, etc. A high-level test plan document is created which includes all the planning inputs mentioned above and is circulated to the stakeholders.
Usually, the IEEE 829 test plan template is used for test planning.
3. Test Analysis
After test planning phase is over test analysis phase starts, in this phase we need to dig deeper into
project and figure out what testing needs to be carried out in each SDLC phase.
Automation activities are also decided in this phase, if automation needs to be done for software
product, how will the automation be done, how much time will it take to automate and which
features need to be automated.
Non-functional testing areas (stress and performance testing) are also analyzed and defined in this phase.
4. Test Design
In this phase various black-box and white-box test design techniques are used to design the test cases. Testers start writing test cases by following those design techniques; if automation testing needs to be done, then the automation scripts also need to be written in this phase.
5. Test Construction and Verification
In this phase testers prepare more test cases by keeping in mind the positive and negative scenarios, end-user scenarios, etc. All the test cases and automation scripts need to be completed in this phase and reviewed by the stakeholders. The test plan document should also be finalized and verified by reviewers.
6. Test Execution and Bug Reporting
Once the unit testing is done by the developers and the test team gets the test build, the test cases are executed and defects are reported in a bug tracking tool. After the test execution is complete and all the defects are reported, test execution reports are created and circulated to the project stakeholders.
After developers fix the bugs raised by testers, they give another build with the fixes to the testers. The testers do re-testing and regression testing to ensure that the defect has been fixed and has not affected any other areas of the software.
Testing is an iterative process, i.e. if a defect is found and fixed, testing needs to be done after every defect fix.
After the tester ensures that the defects have been fixed and no more critical defects remain in the software, the build is given for final testing.
7. Final Testing and Implementation
In this phase the final testing is done for the software; non-functional testing like stress, load and performance testing is performed in this phase. The software is also verified in a production-like environment. Final test execution reports and documents are prepared in this phase.
8. Post Implementation
In this phase the test environment is cleaned up and restored to its default state, process review meetings are held and the lessons learnt are documented. A document is prepared to cope with similar problems in future releases.
Phase         | Activities                                                              | Deliverables
Planning      | Create high-level test plan                                             | Test plan, refined specification
Design        | Test cases are revised; select which test cases to automate            | Revised test cases, test data sets, risk assessment sheet
Final testing | Execute remaining stress and performance tests, complete documentation | Test results and different metrics on test efforts
1. Python
History of Python
Python was developed by Guido van Rossum in the late eighties and early nineties at the National
Research Institute for Mathematics and Computer Science in the Netherlands.
Python is derived from many other languages, including ABC, Modula-3, C, C++, Algol-68,
SmallTalk, and Unix shell and other scripting languages.
Python is copyrighted. Like Perl, Python source code is available under an open-source license (the Python Software Foundation License, which is GPL-compatible).
Python is now maintained by a core development team at the institute, although Guido van
Rossum still holds a vital role in directing its progress.
Importance of Python
Python is Interpreted − Python is processed at runtime by the interpreter. You do not
need to compile your program before executing it. This is similar to PERL and PHP.
Python is Interactive − You can actually sit at a Python prompt and interact with the
interpreter directly to write your programs.
Easy-to-learn − Python has few keywords, simple structure, and a clearly defined syntax.
This allows the student to pick up the language quickly.
Easy-to-read − Python code is more clearly defined and visible to the eyes.
A broad standard library − The bulk of Python's library is very portable and cross-platform compatible on UNIX, Windows, and Macintosh.
Interactive Mode − Python has support for an interactive mode which allows interactive
testing and debugging of snippets of code.
Portable − Python can run on a wide variety of hardware platforms and has the same
interface on all platforms.
Extendable − You can add low-level modules to the Python interpreter. These modules
enable programmers to add to or customize their tools to be more efficient.
GUI Programming − Python supports GUI applications that can be created and ported to
many system calls, libraries and windows systems, such as Windows MFC, Macintosh,
and the X Window system of Unix.
Scalable − Python provides a better structure and support for large programs than shell
scripting.
It can be used as a scripting language or can be compiled to byte-code for building large
applications.
It provides very high-level dynamic data types and supports dynamic type checking.
It can be easily integrated with C, C++, COM, ActiveX, CORBA, and Java.
scikit-learn – provides the machine learning algorithms used for the data analysis and data mining tasks in this project.
Hypertext Markup Language (HTML) is the standard markup language for creating web
pages and web applications. With Cascading Style Sheets (CSS) and JavaScript, it forms a triad of
cornerstone technologies for the World Wide Web. Web browsers receive HTML documents from
a web server or from local storage and render them into multimedia web pages. HTML describes
the structure of a web page semantically and originally included cues for the appearance of the
document.
HTML elements are the building blocks of HTML pages. With HTML constructs, images and
other objects, such as interactive forms, may be embedded into the rendered page. It provides a
means to create structured documents by denoting structural semantics for text such as headings,
paragraphs, lists, links, quotes and other items. HTML elements are delineated by tags, written
using angle brackets. Tags such as <img /> and <input /> introduce content into the page directly.
Others such as <p>...</p> surround and provide information about document text and may include
other tags as sub-elements. Browsers do not display the HTML tags, but use them to interpret the
content of the page.
HTML can embed programs written in a scripting language such as JavaScript which affect the
behavior and content of web pages. Inclusion of CSS defines the look and layout of content.
The World Wide Web Consortium (W3C), maintainer of both the HTML and the CSS standards,
has encouraged the use of CSS over explicit presentational HTML since 1997.
History of HTML
From 1991 to 1999, HTML developed from version 1 to version 4.
In year 2000, the World Wide Web Consortium (W3C) recommended XHTML 1.0. The XHTML
syntax was strict, and the developers were forced to write valid and "well-formed" code.
In 2004, the W3C decided to close down the development of HTML in favor of XHTML.
In 2004 - 2006, the WHATWG gained support by the major browser vendors.
Features of HTML5:
Web Workers: Certain web applications use heavy scripts to perform functions. Web
Workers use separate background threads for processing and it does not affect the
performance of a web page.
Video: You can embed video without third-party proprietary plug-ins or codec. Video
becomes as easy as embedding an image.
Canvas: This feature allows a web developer to render graphics on the fly. As with video,
there is no need for a plug in.
Application caches: Web pages will start storing more and more information locally on
the visitor's computer. It works like cookies, but where cookies are small, the new feature
allows for much larger files. Google Gears is an excellent example of this in action.
Geolocation: Best known for use on mobile devices, geolocation is coming with
HTML5.
Cascading Style Sheets (CSS) is a style sheet language used for describing the presentation of a
document written in a markup language. Although most often used to set the visual style of web
pages and user interfaces written in HTML and XHTML, the language can be applied to
any XML document, including plain XML, SVG and XUL, and is applicable to rendering
in speech, or on other media. Along with HTML and JavaScript, CSS is a cornerstone technology
used by most websites to create visually engaging webpages, user interfaces for web applications,
and user interfaces for many mobile applications.
Features of CSS:
Eventually, CSS3 -- along with HTML5 -- is going to be the future of the web. You should begin making your web pages compatible with these latest specifications. This section explores some of the exciting new features in CSS3, which are going to change the way developers who used CSS2 build websites.
Selectors
In addition to the selectors that were available in CSS2, CSS3 introduces some new selectors. Using
these selectors you can choose DOM elements based on their attributes. So you don't need to
specify classes and IDs for every element. Instead, you can utilize the attribute field to style them.
Rounded Corners
Rounded corner elements can spruce up a website, but creating a rounded corner requires a
designer to write a lot of code. Adjusting the height, width and positioning of these elements
is a never-ending chore because any change in content can break them.
CSS3 addresses this problem by introducing the border-radius property, which gives you the same
rounded-corner effect and you don't have to write all the code. Here are examples for displaying
rounded corners in different places of a website.
Border Image
Another exciting feature in CSS 3 is the ability to swap out a border with an image. The property
border-image allows you to specify an image to display instead of a plain solid-coloured border.
Browser Dependent
The only major limitation of CSS is that its performance depends largely on browser support.
Besides compatibility, all browsers (and their many versions) function differently. So your CSS
needs to account for all these variations.
However, in case your CSS styling isn’t fully supported by a browser, people will still be able to
experience the HTML functionalities. Therefore, you should always have a well-structured HTML
along with good CSS.
Difficult to retrofit in old websites
The instinctive reaction after learning the many advantages of CSS is to integrate it into your
existing website. Sadly, this isn’t a simple process. CSS style sheets, especially the latest versions,
have to be integrated into the HTML code at the ground level and must also be compatible with
HTML versions. Retrofitting CSS into older websites is a slow tedious process.
There is also the risk of breaking the old HTML code altogether and thus making the site dead.
It’s best to wait till you redesign your website from scratch.
As you can see from above points, the advantages of CSS development outweigh its limitations. It
is a very useful web development tool that every programmer must master along with basic
HTML.
The Unified Modeling Language allows the software engineer to express an analysis model using
the modeling notation that is governed by a set of syntactic, semantic and pragmatic rules.
A UML system is represented using five different views that describe the system from distinctly different perspectives. Each view is defined by a set of diagrams, as follows:
Structural model view:
i. In this model, the data and functionality are viewed from inside the system.
ii. This model view models the static structures.
Implementation model view:
In this view, the structural and behavioral aspects of the system are represented as they are to be built.
Environmental model view:
In this view, the structural and behavioral aspects of the environment in which the system is to be implemented are represented.
To model a system, the most important aspect is to capture its dynamic behavior. To clarify in a bit more detail, dynamic behavior means the behavior of the system when it is running/operating.
Static behavior alone is not sufficient to model a system; dynamic behavior is more important than static behavior. In UML there are five diagrams available to model the dynamic nature, and the use case diagram is one of them. Since the use case diagram is dynamic in nature, there should be some internal or external factors for making the interaction. These internal and external agents are known as actors. Use case diagrams therefore consist of actors, use cases and their relationships. The diagram is used to model the system/subsystem of an application. A single use case diagram captures a particular functionality of a system, so to model the entire system a number of use case diagrams are used.
Use case diagrams are used to gather the requirements of a system including internal and external
influences. These requirements are mostly design requirements. So when a system is analyzed to
gather its functionalities use cases are prepared and actors are identified.
Sequence diagrams describe interactions among classes in terms of an exchange of messages over
time. They're also called event diagrams. A sequence diagram is a good way to visualize and
validate various runtime scenarios. These can help to predict how a system will behave and to
discover responsibilities a class may need to have in the process of modeling a new system.
The aim of a sequence diagram is to define event sequences, which would have a desired outcome.
The focus is more on the order in which messages occur than on the message per se. However, the
majority of sequence diagrams will communicate what messages are sent and the order in which
they tend to occur.
Class roles describe the way an object will behave in context. Use the UML object symbol to
illustrate class roles, but don't list object attributes.
Activation boxes represent the time an object needs to complete a task. When an object is busy
executing a process or waiting for a reply message, use a thin gray rectangle placed vertically on
its lifeline.
Messages
Messages are arrows that represent communication between objects. Use half-arrowed lines to
represent asynchronous messages.
Asynchronous messages are sent from an object that will not wait for a response from the receiver
before continuing its tasks.
Lifelines
Lifelines are vertical dashed lines that indicate the object's presence over time.
Objects can be terminated early using an arrow labeled "<< destroy >>" that points to an X. This
object is removed from memory. When that object's lifeline ends, you can place an X at the end of
its lifeline to denote a destruction occurrence.
Loops
A repetition or loop within a sequence diagram is depicted as a rectangle. Place the condition for
exiting the loop at the bottom left corner in square brackets [ ].
Guards
When modeling object interactions, there will be times when a condition must be met for a message
to be sent to an object. Guards are conditions that need to be used throughout UML diagrams to
control flow.
Step 1:
Step 2:
import pandas as pd
import numpy as np
import warnings
Step 4:
Create two HTML pages, one for input and one for output (input.html and result.html)
Step 5:
Step 6:
Step 7:
Stop
# Imports and the form-handling lines below were omitted from the original listing;
# they are reconstructed here. The form field names (magnitude, length, width) are
# assumptions, and Svm/Knn are the classifier wrapper classes shown later in this report.
from flask import Flask, render_template, request

app = Flask(__name__)

@app.route('/')
def student():
    return render_template('input.html')

@app.route('/result', methods=['POST'])
def result():
    magnitude = request.form['magnitude']
    length = request.form['length']
    width = request.form['width']
    # SVM prediction (s1 = predicted class, a = accuracy score)
    Svm.getinput(magnitude, length, width)
    result1 = Svm.svmresult()
    s1 = result1[0][0]
    a = result1[1][0]
    if s1 == 0:
        r1 = "VERY LOW"
    elif s1 == 1:
        r1 = "LOW"
    elif s1 == 2:
        r1 = "MEDIUM"
    elif s1 == 3:
        r1 = "STRONG"
    elif s1 == 4:
        r1 = "HIGH"
    else:
        r1 = "VERY HIGH"
    print(r1)
    # KNN prediction (s2 = predicted class, a2 = accuracy score)
    Knn.getinput(magnitude, length, width)
    result2 = Knn.knnresult()
    s2 = result2[0][0]
    a2 = result2[1][0]
    if s2 == 0:
        r = "VERY LOW"
    elif s2 == 1:
        r = "LOW"
    elif s2 == 2:
        r = "MEDIUM"
    elif s2 == 3:
        r = "STRONG"
    elif s2 == 4:
        r = "HIGH"
    else:
        r = "VERY HIGH"
    return render_template("result.html", r1=r1, r=r, a=a, a2=a2)

if __name__ == '__main__':
    app.run(debug=True)
<!DOCTYPE html>
<html>
<head>
<style>
div{
background-color: #2c198c;
color: white;
font-size:350%;
align:top;
text-align:center;
}
body{
background-color:#FFFFFF;
}
</style>
</head>
<body>
<!-- The heading text and form markup were omitted from the original listing; the field
     names below are assumptions chosen to match the Flask view shown earlier. -->
<div>Storm Forecasting</div>
<center>
<br><br>
<form action="/result" method="POST">
Magnitude: <input type="text" name="magnitude"><br>
Length: <input type="text" name="length"><br>
Width: <input type="text" name="width"><br>
<input type="submit" value="Predict">
</form>
<br><br><br>
<b>Units:<br></b>
Magnitude in Knots (1 knot=1.15 mph)<br>
Length in miles<br>
Width in yards
</center>
</body>
</html>
<!DOCTYPE html>
<html>
<head>
<style>
div{
background-color: #2c198c;
color: white;
font-size:400%;
align:top;
text-align:center;
}
body{
background-color:#FFFFFF;
}
</style>
</head>
<body>
<!-- The heading text and result markup were omitted from the original listing; the
     template variables below are the ones passed by the Flask view (r1, a: SVM; r, a2: KNN). -->
<div>Prediction Result</div>
<center>
<p>Level of storm (SVM): {{ r1 }} (accuracy: {{ a }})</p>
<p>Level of storm (KNN): {{ r }} (accuracy: {{ a2 }})</p>
</center>
</body>
</html>
# The imports below were omitted from the original listing and are added here:
import warnings
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix)

class Svm():
    l1 = []

    @staticmethod
    def getinput(mag, len, wid):
        # Store the user-supplied predictors: magnitude, length, width
        Svm.l1.append(int(mag))
        Svm.l1.append(float(len))
        Svm.l1.append(float(wid))
        print(Svm.l1)

    @staticmethod
    def svmresult():
        print(Svm.l1)
        defadv = pd.read_csv('C:\\Users\\HRIDDHI\\Desktop\\dataset.csv')
        number = LabelEncoder()
        defadv['imp'] = number.fit_transform(defadv['imp'].astype('str'))
        # Drop the columns that are not used for prediction
        # (the defadv[:5] previews from the original notebook listing are omitted here)
        defadv.drop(['mo'], axis=1, inplace=True)
        defadv.drop(['yr'], axis=1, inplace=True)
        defadv.drop(['dy'], axis=1, inplace=True)
        defadv.drop(['time'], axis=1, inplace=True)
        defadv.drop(['stn'], axis=1, inplace=True)
        defadv.drop(['slat'], axis=1, inplace=True)
        defadv.drop(['slon'], axis=1, inplace=True)
        defadv.drop(['elat'], axis=1, inplace=True)
        warnings.filterwarnings("ignore")
        df = pd.DataFrame({'Magnitude': defadv.mag, 'injuries': defadv.inj,
                           'fatalities': defadv.fat, 'loss': defadv.loss,
                           'croploss': defadv.closs, 'length': defadv.len,
                           'width': defadv.wid})
        x = defadv[['mag', 'len', 'wid']]
        y = defadv['imp']
        # The train/test split was omitted from the original listing; a 70/30 split is assumed
        x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3)
        svc = SVC(probability=True)
        svc.fit(x_train, y_train)
        ypred = svc.predict(x_test)
        yt = []
        yp = []
        for i in y_test:
            yt.append(i)
        for j in ypred:
            yp.append(j)
        dec = svc.decision_function(x)
        acc = accuracy_score(y_test, ypred)
        precision = precision_score(y_test, ypred, average='macro')
        recall = recall_score(y_test, ypred, average='macro')
        f1score = f1_score(y_test, ypred, average='macro')
        l2 = [acc, precision, recall, f1score]
        cm1 = confusion_matrix(y_test, svc.predict(x_test))
        print(cm1)
        print(l2)
        return svc.predict([Svm.l1]), l2, yt, yp
# Assumes the same imports as the Svm listing above, plus:
from sklearn.neighbors import KNeighborsClassifier

class Knn():
    l1 = []

    @staticmethod
    def getinput(mag, len, wid):
        Knn.l1.append(int(mag))
        Knn.l1.append(float(len))
        Knn.l1.append(float(wid))

    @staticmethod
    def knnresult():
        defadv = pd.read_csv('C:\\Users\\HRIDDHI\\Desktop\\dataset.csv')
        number = LabelEncoder()
        defadv['imp'] = number.fit_transform(defadv['imp'].astype('str'))
        defadv.drop(['mo'], axis=1, inplace=True)
        defadv.drop(['yr'], axis=1, inplace=True)
        defadv.drop(['dy'], axis=1, inplace=True)
        defadv.drop(['time'], axis=1, inplace=True)
        defadv.drop(['tz'], axis=1, inplace=True)
        defadv.drop(['stn'], axis=1, inplace=True)
        defadv.drop(['slat'], axis=1, inplace=True)
        defadv.drop(['slon'], axis=1, inplace=True)
        defadv.drop(['elat'], axis=1, inplace=True)
        warnings.filterwarnings("ignore")
        df = pd.DataFrame({'Magnitude': defadv.mag, 'injuries': defadv.inj,
                           'fatalities': defadv.fat, 'loss': defadv.loss,
                           'croploss': defadv.closs, 'length': defadv.len,
                           'width': defadv.wid})
        # The predictor/target selection, train/test split and the metric list l2
        # were omitted from the original listing; they are reconstructed here to
        # mirror the Svm class above.
        x = defadv[['mag', 'len', 'wid']]
        y = defadv['imp']
        x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3)
        neigh = KNeighborsClassifier(n_neighbors=3)
        neigh.fit(x_train, y_train)
        ypred1 = neigh.predict(x_test)
        print(ypred1)
        acc = accuracy_score(y_test, ypred1)
        precision = precision_score(y_test, ypred1, average='macro')
        recall = recall_score(y_test, ypred1, average='macro')
        f1score = f1_score(y_test, ypred1, average='macro')
        l2 = [acc, precision, recall, f1score]
        cm1 = confusion_matrix(y_test, neigh.predict(x_test))
        print(cm1)
        return neigh.predict([Knn.l1]), l2
Libraries imported
The following machine learning libraries are used in this project for testing:
1. sklearn/scikit-learn
Classification report
Classification report is used to evaluate a model’s predictive power. It is one of the most critical
steps in machine learning.
After you have trained and fitted your machine learning model, it is important to evaluate its performance. The classification report provides the following metrics for each class:
Precision
Recall
F1-score
Support
The first step is importing the classification_report library.
from sklearn.metrics import classification_report
Once the library has been imported you can now run the classification report with this Python
command:
print(classification_report(y_test,predictions))
y_test is the dependent variable from your test data set (from the train-test split of the data).
predictions is the output of your model.
Make sure that the y_test variable comes before the predictions variable in the Python call.
If not, the report will describe the wrong model performance, which will lead to a wrong evaluation.
In a sample classification_report output, the model predicted about 85% of the classifications correctly on average, and for class 0.0 it predicted 86% of the test data correctly.
Classification_report is also useful when comparing two models with different specifications
against each other and determining which model is better to use.
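A toy example of the call order described above; the labels are invented purely for illustration.

from sklearn.metrics import classification_report

y_test = [0, 0, 1, 1, 1, 2]          # true labels from the test split
predictions = [0, 1, 1, 1, 2, 2]     # labels predicted by the model
print(classification_report(y_test, predictions))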
Confusion matrix
A confusion matrix is a summary of prediction results on a classification problem. The number of
correct and incorrect predictions are summarized with count values and broken down by each class.
This is the key to the confusion matrix.
1. You need a test dataset or a validation dataset with expected outcome values.
2. Make a prediction for each row in your test dataset.
3. From the expected outcomes and predictions count:
a. The number of correct predictions for each class.
b. The number of incorrect predictions for each class, organized by the class that was predicted.
These numbers are then organized into a table, or a matrix as follows:
Expected down the side: each row of the matrix corresponds to an actual (expected) class.
Predicted across the top: each column of the matrix corresponds to a predicted class.
The counts of correct and incorrect classifications are then filled into the table.
The total number of correct predictions for a class goes into the expected row for that class value and the predicted column for that same class value.
In the same way, the total number of incorrect predictions for a class goes into the expected row for that class value and the predicted column for the class value that was actually predicted.
This matrix can be used for 2-class problems where it is very easy to understand, but can easily be
applied to problems with 3 or more class values, by adding more rows and columns to the
confusion matrix.
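A small sketch of this procedure using scikit-learn; the labels are invented for illustration.

from sklearn.metrics import confusion_matrix

expected  = [0, 0, 1, 1, 2, 2]   # actual outcomes from the test set
predicted = [0, 1, 1, 1, 2, 0]   # model predictions for the same rows
print(confusion_matrix(expected, predicted))
# In scikit-learn's output, each row is an actual class and each column a predicted class.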
ROC curve
The ROC curve stands for Receiver Operating Characteristic curve, and is used to visualize
the performance of a classifier. When evaluating a new model performance, accuracy can be very
sensitive to unbalanced class proportions. The ROC curve is insensitive to this lack of balance in
the data set.
2) Generate actual and predicted values. First let us use a good prediction probabilities array:
actual = [1,1,1,0,0,0]
predictions = [0.9,0.9,0.9,0.1,0.1,0.1]
3) Then we need to calculate the fpr and tpr for all thresholds of the classification. This is where the roc_curve call comes into play. In addition we calculate the auc, or area under the curve, which is a single summary value in [0, 1] that is easier to report and use for other purposes. You usually want to have a high auc value from your classifier.
4) Finally we plot the fpr vs tpr as well as our auc for our very good classifier.
Figure 8 (Graph 1) shows how a perfect classifier's ROC curve looks.
Here the classifier did not make a single error. The AUC is maximal at 1.00. Let’s see what
happens when we introduce some errors in the prediction.
actual = [1,1,1,0,0,0]
predictions = [0.9,0.9,0.1,0.1,0.1,0.1]
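A sketch of steps 2 to 4 using the imperfect predictions above; note that roc_curve as used here applies to binary labels.

import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, auc

actual = [1, 1, 1, 0, 0, 0]
predictions = [0.9, 0.9, 0.1, 0.1, 0.1, 0.1]

# fpr/tpr across all thresholds, and the area under the curve
fpr, tpr, thresholds = roc_curve(actual, predictions)
roc_auc = auc(fpr, tpr)

plt.plot(fpr, tpr, label=f'AUC = {roc_auc:.2f}')
plt.xlabel('False positive rate')
plt.ylabel('True positive rate')
plt.legend()
plt.show()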
The sklearn.preprocessing package provides several common utility functions and transformer
classes to change raw feature vectors into a representation that is more suitable for the downstream
estimators.
In general, learning algorithms benefit from standardization of the data set. If some outliers are
present in the set, robust scalers or transformers are more appropriate. The behavior of the different
scalers, transformers, and normalizers on a dataset containing marginal outliers is highlighted in the scikit-learn example “Compare the effect of different scalers on data with outliers”.
The preprocessing module further provides a utility class StandardScaler that implements the
Transformer API to compute the mean and standard deviation on a training set so as to be able to
later reapply the same transformation on the testing set. This class is hence suitable for use in the
early steps of a sklearn.pipeline.Pipeline:
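A minimal sketch of this pattern on a toy dataset standing in for the project's predictors; the pipeline itself is illustrative, not the project's code.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Toy data standing in for the project's predictors
X, y = load_iris(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.3)

# StandardScaler learns mean/std on the training set, and the same
# transformation is reapplied to the test set inside the pipeline
pipe = Pipeline([('scale', StandardScaler()), ('svc', SVC())])
pipe.fit(x_train, y_train)
print(pipe.score(x_test, y_test))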
Label encoding:
LabelEncoder is a utility class to help normalize labels such that they contain only values
between 0 and n_classes-1. This is sometimes useful for writing efficient Cython routines.
LabelEncoder can be used as follows:
It can also be used to transform non-numerical labels (as long as they are hashable and comparable)
to numerical labels:
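The usage referred to above, following the scikit-learn documentation pattern; the non-numerical example uses this project's storm-level labels purely for illustration.

from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
le.fit([1, 2, 2, 6])
print(le.classes_)                 # [1 2 6]
print(le.transform([1, 1, 2, 6]))  # [0 0 1 2]

# Non-numerical labels, as done for the 'imp' column in this project:
le2 = LabelEncoder()
print(le2.fit_transform(['L', 'M', 'S', 'H', 'E', 'M']))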
sklearn.model_selection.train_test_split(*arrays, **options):
Split arrays or matrices into random train and test subsets
Quick utility that wraps input validation and next(ShuffleSplit().split(X, y)) and application to
input data into a single call for splitting (and optionally subsampling) data in a oneliner.
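The usage pattern from the scikit-learn documentation, shown here on a small toy array; the test_size and random_state values are illustrative.

import numpy as np
from sklearn.model_selection import train_test_split

X, y = np.arange(10).reshape((5, 2)), range(5)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
print(X_train, y_train)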
Feature selection:
The classes in the sklearn.feature_selection module can be used for feature
selection/dimensionality reduction on sample sets, either to improve estimators’ accuracy scores
or to boost their performance on very high-dimensional datasets.
Removing features with low variance:
VarianceThreshold is a simple baseline approach to feature selection. It removes all features whose
variance doesn’t meet some threshold. By default, it removes all zero-variance features, i.e.
features that have the same value in all samples.
Univariate feature selection works by selecting the best features based on univariate statistical
tests. It can be seen as a preprocessing step to an estimator. Scikit-learn exposes feature selection
routines as objects that implement the transform method.
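Two small sketches of the approaches above on toy matrices; the values are made up for illustration.

from sklearn.feature_selection import VarianceThreshold, SelectKBest, chi2

# Remove zero-variance (constant) features: the first and last columns are dropped
X = [[0, 2, 0, 3],
     [0, 1, 4, 3],
     [0, 1, 1, 3]]
print(VarianceThreshold().fit_transform(X))

# Univariate selection: keep the k best features according to a statistical test
X2 = [[1, 2, 3], [4, 5, 6], [7, 8, 1], [2, 9, 4]]
y2 = [0, 0, 1, 1]
print(SelectKBest(chi2, k=2).fit_transform(X2, y2))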
2. matplotlib
Matplotlib:
Matplotlib is a Python 2D plotting library which produces publication quality figures in a variety
of hardcopy formats and interactive environments across platforms. Matplotlib can be used in
Python scripts, the Python and IPython shells, the Jupyter notebook, web application servers, and
four graphical user interface toolkits.
Matplotlib tries to make easy things easy and hard things possible. You can generate plots,
histograms, power spectra, bar charts, errorcharts, scatterplots, etc., with just a few lines of code.
For examples, see the sample plots and thumbnails gallery.
For simple plotting the pyplot module provides a MATLAB-like interface, particularly when
combined with IPython. For the power user, you have full control of line styles, font properties,
axes properties, etc, via an object oriented interface or via a set of functions familiar to MATLAB
users.
Pyplot and pylab:
Pylab combines pyplot with numpy into a single namespace. This is convenient for interactive work, but for programming it is recommended that the namespaces be kept separate.
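A short pyplot example with the namespaces kept separate, as recommended above.

import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 2 * np.pi, 100)
plt.plot(x, np.sin(x), label='sin(x)')
plt.xlabel('x')
plt.ylabel('sin(x)')
plt.legend()
plt.show()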
3. NumPy
Besides its obvious scientific uses, NumPy can also be used as an efficient multi-dimensional
container of generic data. Arbitrary data-types can be defined. This allows NumPy to seamlessly
and speedily integrate with a wide variety of databases.
import numpy as np
4. Pandas
import pandas as pd
Library features
The library is highly optimized for performance, with critical code paths written in Cython or C.
Support Vector Machines(SVMs) have been extensively researched in the data mining and
machine learning communities for the last decade and actively applied to applications in various
domains. SVMs are typically used for learning classification, regression, or ranking functions, for
which they are called classifying SVM, support vector regression (SVR), or ranking SVM (or
RankSVM) respectively. Two special properties of SVMs are that they achieve (1) high generalization by maximizing the margin and (2) efficient learning of nonlinear functions via the kernel trick.
Working:
• Family of machine-learning algorithms that are used for mathematical and
engineering problems including for example handwriting digit recognition, object
recognition, speaker identification, face detections in images and target detection.
• Task: Assume we are given a set S of points x_i ∈ R^n with i = 1, 2, ..., N. Each point x_i belongs to either of two classes and thus is given a label y_i ∈ {-1, 1}. The goal is to establish the equation of a hyperplane that divides S, leaving all the points of the same class on the same side.
• SVM performs classification by constructing an N-dimensional hyperplane that
optimally separates the data into two categories.
Kernel Functions:
• Kernel Function computes the similarity of two data points in the feature space using
dot product.
• The selection of an appropriate kernel function is important, since the kernel function
defines the feature space in which the training set examples will be classified.
• The kernel expresses prior knowledge about the phenomenon being modeled,
encoded as a similarity measure between two vectors.
• A support vector machine can locate a separating hyperplane in the feature space and
classify points in that space without even representing the space explicitly, simply by
defining a kernel function, that plays the role of the dot product in the feature space.
Identify the right hyper-plane (Scenario-1): Here, we have three hyper-planes (A, B and
C). Now, identify the right hyper-plane to classify star and circle
You need to remember a thumb rule to identify the right hyper-plane: “Select the hyper-
plane which segregates the two classes better”. In this scenario, hyper-plane “B”
has excellently performed this job.
Identify the right hyper-plane (Scenario-2): Here, we have three hyper-planes (A, B and
C) and all are segregating the classes well. Now, How can we identify the right hyper-
plane?
Here, maximizing the distances between nearest data point (either class) and hyper-plane
will help us to decide the right hyper-plane. This distance is called as Margin. Let’s look
at the below snapshot:
Above, you can see that the margin for hyper-plane C is high as compared to both A and B. Hence, we name the right hyper-plane as C. Another compelling reason for selecting the hyper-plane with the higher margin is robustness: if we select a hyper-plane having a low margin, then there is a high chance of mis-classification.
Identify the right hyper-plane (Scenario-3): Some of you may have selected hyper-plane B as it has a higher margin compared to A. But here is the catch: SVM selects the hyper-plane which classifies the classes accurately prior to maximizing the margin. Here, hyper-plane B has a classification error and A has classified all points correctly. Therefore, the right hyper-plane is A.
Can we classify two classes (Scenario-4)? Below, I am unable to segregate the two classes using a straight line, as one of the stars lies in the territory of the other (circle) class as an outlier.
As I have already mentioned, one star at other end is like an outlier for star class. SVM has
a feature to ignore outliers and find the hyper-plane that has maximum margin. Hence, we
can say, SVM is robust to outliers.
SVM can solve this problem. Easily! It solves this problem by introducing an additional feature. Here, we will add a new feature z = x^2 + y^2. Now, let's plot the data points on the axes x and z:
In SVM, it is easy to have a linear hyper-plane between these two classes. But another burning question which arises is: do we need to add this feature manually to have a hyper-plane? No, SVM has a technique called the kernel trick. These are functions which take a low-dimensional input space and transform it into a higher-dimensional space, i.e. they convert a non-separable problem into a separable problem; these functions are called kernels. They are mostly useful in non-linear separation problems. Simply put, they do some extremely complex data transformations, then find out the process to separate the data based on the labels or outputs you have defined.
When we look at the hyper-plane in original input space it looks like a circle:
Kernel Parameter:
SVM plays an important role in classification. Here different kernel parameters are
used as a tuning parameter to improve the classification accuracy. There are mainly
four different types of kernels (Linear, Polynomial, RBF, and Sigmoid) that are popular
in SVM classifier.
Gamma Parameter:
Gamma is the parameters for a nonlinear support vector machine (SVM) with a
Gaussian radial basis function kernel. A standard SVM seeks to find a margin that
separates all positive and negative examples. Gamma is the free parameter of the
Gaussian radial basis function.
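For illustration, this is how the kernel and gamma parameters appear in scikit-learn's SVC; the values shown are not tuned for this dataset.

from sklearn.svm import SVC

linear_svc  = SVC(kernel='linear')
poly_svc    = SVC(kernel='poly', degree=3)
rbf_svc     = SVC(kernel='rbf', gamma=0.1, C=1.0)   # gamma is the RBF free parameter
sigmoid_svc = SVC(kernel='sigmoid')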
2. K-Nearest Neighbour
KNN is a very simple algorithm: it has no model other than storing the entire dataset, so there is no learning required.
Efficient implementations can store the data using complex data structures like k-d trees to make
look-up and matching of new patterns during prediction efficient.
Because the entire training dataset is stored, you may want to think carefully about the
consistency of your training data. It might be a good idea to curate it, update it often as new data
becomes available and remove erroneous and outlier data.
To determine which of the K instances in the training dataset are most similar to a new input a
distance measure is used. For real-valued input variables, the most popular distance measure
is Euclidean distance.
Euclidean distance is calculated as the square root of the sum of the squared differences between
a new point (x) and an existing point (xi) across all input attributes j.
Euclidean is a good distance measure to use if the input variables are similar in type (e.g. all
measured widths and heights). Manhattan distance is a good measure to use if the input variables
are not similar in type (such as age, gender, height, etc.).
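The two distance measures written out, assuming each instance is a list of numeric attribute values.

import math

def euclidean_distance(x, xi):
    # square root of the sum of squared attribute differences
    return math.sqrt(sum((xj - xij) ** 2 for xj, xij in zip(x, xi)))

def manhattan_distance(x, xi):
    # sum of absolute attribute differences
    return sum(abs(xj - xij) for xj, xij in zip(x, xi))

print(euclidean_distance([60, 2.5, 100], [55, 1.0, 80]))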
The value for K can be found by algorithm tuning. It is a good idea to try many different values
for K (e.g. values from 1 to 21) and see what works best for your problem.
The computational complexity of KNN increases with the size of the training dataset. For very
large training sets, KNN can be made stochastic by taking a sample from the training dataset
from which to calculate the K-most similar instances.
KNN has been around for a long time and has been very well studied. As such, different
disciplines have different names for it, for example:
When KNN is used for regression problems the prediction is based on the mean or the median of
the K-most similar instances.
When KNN is used for classification, the output can be calculated as the class with the highest
frequency from the K-most similar instances. Each instance in essence votes for their class and
the class with the most votes is taken as the prediction.
Class probabilities can be calculated as the normalized frequency of samples that belong to each
class in the set of K most similar instances for a new data instance. For example, in a binary
classification problem (class is 0 or 1):
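A toy sketch of the majority vote and the class-probability calculation above, for a binary problem with K = 5 neighbours; the neighbour classes are invented.

from collections import Counter

neighbour_classes = [0, 1, 1, 0, 1]            # classes of the 5 nearest neighbours
votes = Counter(neighbour_classes)
prediction = votes.most_common(1)[0][0]        # class with the highest frequency
p_class_1 = votes[1] / len(neighbour_classes)  # normalized frequency: 3/5 = 0.6
print(prediction, p_class_1)                   # 1 0.6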
If you are using K and you have an even number of classes (e.g. 2) it is a good idea to choose a
K value with an odd number to avoid a tie. And the inverse, use an even number for K when you
have an odd number of classes.
Ties can be broken consistently by expanding K by 1 and looking at the class of the next most
similar instance in the training dataset.
From the above table, the efficiency of SVM is greater than the efficiency of KNN. Hence, on comparing the above two algorithms, we say that the SVM algorithm is more efficient than KNN.
Therefore we use SVM algorithm to predict the level of storm for the parameters given by the
user.
Test Cases:
Output:
Output:
The accuracy of the existing system will be made approximately up to 100% to provide more accurate information about the level of a storm.
The anomalies with respect to null input values can be eliminated to predict the correct level of a storm.
Providing brief data about the amount of estimated loss in dollars and the number of injuries and fatalities that can take place due to the impact of a storm educates users about the consequences caused by storms.
However, in recent years, with the advancement in technology, it has become possible to forecast storms correctly using machine learning techniques, namely Support Vector Machine (SVM) and K-Nearest Neighbor (KNN).
In our project, we implemented these algorithms to predict the level of storm based on parameters
like magnitude, length and width of the storm and determine the roc curve for the best working
algorithm.
1. https://www.ncdc.noaa.gov/stormevents/ftp.jsp
2. https://www.kaggle.com/jtennis/spctornado/data
3. https://machinelearningmastery.com/
4. http://stackabuse.com/using-machine-learning-to-predict-the-weather-part-1/
5. http://www.spc.noaa.gov/wcm/data/SPC_severe_database_description.pdf