This project is aimed at predicting how strong a storm is and what impact storms have on the affected place. At the core of forecasting work is climatology, the study of climates and how they change. A basic understanding of how storms work relies on the historical weather record and a good understanding of the time of year when parts of the country are at the greatest risk, and of which areas of the middle and southern United States are affected. Usually, when warm, moist air left over from winter cyclones meets winds from the jet stream, it creates high winds, tornadoes and dangerous hail.
The analysis starts from understanding the system, its intention and requirements, then moves to testing the basic functionality and drilling down into the details until all the possible issues are discovered.
The testing process includes activities such as identifying, documenting, reviewing, scripting and executing test cases for predicting the level of impact of a storm.
The accuracy of the existing system is well below 100%; it predicts values that are only about 80% true to the actual values.
In the existing system, there are anomalies with respect to null input values, i.e., the output is improper when null, negative or irrational values are supplied for prediction of the level of a storm.
The estimated loss in dollars, the number of injuries and the fatalities that can occur due to the storm are not displayed.
In the proposed system, the anomalies with respect to null input values are eliminated and the correct level of the storm is predicted. The accuracy of the existing system is enhanced so that the output approaches 100%.
The predicted estimated loss in dollars, the number of injuries and the fatalities that can occur due to the storm are presented to the user.
This application uses different classification algorithms to predict the level of storm based on
predictor values namely magnitude, length and width of the storm. This also provides information
on the accuracy with which level of storm is predicted.
The classification algorithms used are the Support Vector Classifier, the Random Forest Classifier and the K-Nearest Neighbor Classifier.
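A minimal sketch of how these three scikit-learn classifiers could be compared on the predictor columns named above; the file name dataset.csv, the 70/30 split and the column names are assumptions, not the project's exact pipeline.

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

data = pd.read_csv('dataset.csv')
x = data[['mag', 'len', 'wid']]   # magnitude, length, width
y = data['imp']                   # level of storm
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3)

# Fit each classifier and report its accuracy on the held-out test set
for name, model in [('SVM', SVC()),
                    ('Random Forest', RandomForestClassifier()),
                    ('KNN', KNeighborsClassifier(n_neighbors=3))]:
    model.fit(x_train, y_train)
    print(name, accuracy_score(y_test, model.predict(x_test)))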
Description
1-(om)
Tornado number – A count of tornadoes during the year. Prior to 2007, these numbers were assigned to the tornado as the information arrived in the NWS database. Since 2007, the numbers may have been assigned in sequential (temporal) order after event date/times are converted to CST. However, do not use “om” to count the sequence of tornadoes through the year, as sometimes new entries come in late, or corrections are made, and the data are not re-sequenced.
NOTE: Tornado segments that cross state borders or cover more than 4 counties will have the same om number.
2-(yr)
Year, 1950-2009
3-(mo)
Month, 1-12
4-(dy)
Day, 1-31
5-(date)
Date in yyyy-mm-dd format
6-(time)
Time in HH:MM:SS
7-(tz)
Time zone – All times, except for ?=unknown and 9=GMT, were converted to 3=CST. This should be accounted for when building queries for GMT summaries such as 12z-12z.
8-(st)
Two-letter state (postal) abbreviation
9-(stf)
State FIPS number (Note some Puerto Rico codes are incorrect)
10-(stn)
State number – number of this tornado, in this state, in this year: May not be sequential in some
years.
NOTE: discontinued in 2008. This number can be calculated in a spreadsheet by sorting and after
accounting for border crossing tornadoes and 4+ county segments.
11-(mag)
F-scale (EF-scale after Jan. 2007): values -9, 0, 1, 2, 3, 4, 5 (-9=unknown). Or, hail size in inches. Or, wind speed in knots (1 knot=1.15 mph).
12-(in)
Injuries - when summing for state totals use sn=1, not sg=1
13-(fat)
Fatalities - when summing for state totals use sn=1, not sg=1
14-(loss)
Estimated property loss information – Prior to 1996 this is a categorization of tornado damage by dollar amount (0 or blank=unknown; 1=<$50; 2=$50-$500; 3=$500-$5,000; 4=$5,000-$50,000; 5=$50,000-$500,000; 6=$500,000-$5,000,000; 7=$5,000,000-$50,000,000; 8=$50,000,000-$500,000,000; 9=$500,000,000-$5,000,000,000). When summing for state totals use sn=1, not sg=1. From 1996, this is tornado property damage in millions of dollars. Note: this may change to whole dollar amounts in the future. Entry of 0 does not mean $0.
15-(closs)
Estimated crop loss in millions of dollars (started in 2007). Entry of 0 does not mean $0.
16-(slat)
Starting latitude in decimal degrees
17-(slon)
Starting longitude in decimal degrees
18-(elat)
Ending latitude in decimal degrees
19-(elon)
Ending longitude in decimal degrees
20-(len)
Length in miles
21-(wid)
Width in yards
Understanding these fields is critical to counting state tornadoes, totaling state fatalities/losses.
23-(sn) State number: 1 or 0 (1 = entire track information is in this state)
1, 1, 1 = Entire record for the track of the tornado (unless all 4 fips codes are non-zero)
1, 0, -9 = Continuing county fips code information only from 1, 1, 1 record, above (same om)
2, 0, 1 = A two-state tornado (st=state of touchdown, other fields summarize entire track)
2, 1, 2 = First state segment for a two-state (2, 0, 1) tornado (same state as above, same om)
2, 1, 2 = Second state segment for a two-state (2, 0, 1) tornado (state tracked into, same om)
2, 0, -9 = Continuing county fips for a 2, 1, 2 record that exceeds 4 counties (same om)
3, 0, 1 = A three-state tornado (st=state of touchdown, other fields summarize entire track)
3, 1, 2 = First state segment for a three-state (3, 0, 1) tornado (state same as 3, 0, 1, same om)
3, 1, 2 = Second state segment for a three-state (3, 0, 1) tornado (2nd state tracked into, same om
as the initial 3, 0, 1 record)
3, 1, 2 = Third state segment for a three-state (3, 0, 1) tornado (3rd state tracked into, same om as
the initial 3, 0, 1 record)
28-(f4) 4th County FIPS code – Additional counties will be included in sg=-9 records with same
om number
Tornado database file updated to add the “fc” field for estimated F-scale rating in 2016. Valid for records altered between 1950-1982.
29-(fc)
Between 1953 and 1982, 1864 CONUS tornadoes were coded in the official database with an unknown F-scale rating (-9). The table below explains how these tornado records were modified to provide an estimated F-scale rating. All changed records are identified in the database by the “fc” field (fc=1 if the F-scale was changed from -9 to another value; fc=0 for all unchanged F-scale ratings).
IF property loss equals: | Then set F-scale equal to: | IF path length <=5 miles, add: | IF path length >5 miles, add:
0,1 (<$50)        | 0 | 0  | +1
2,3 (up to $5K)   | 1 | -1 | +1
4,5 (up to $500K) | 2 | -1 | +1
6,7 (up to $50M)  | 3 | -1 | +1
8,9 (up to $5B)*  | 4 | -1 | +1
F5: None
30-(imp)
Level of storm
Low/L (mag=1)
Medium/M (mag=2)
Serious/S (mag=3)
High/H (mag=4)
Extreme/E (mag=5)
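For illustration only, the mapping above can be written as a small lookup; this mirrors the labels listed but is not the project's actual code.

# Illustrative lookup for the storm-level labels listed above
LEVELS = {1: 'Low/L', 2: 'Medium/M', 3: 'Serious/S', 4: 'High/H', 5: 'Extreme/E'}

def storm_level(mag):
    return LEVELS.get(mag, 'Unknown')

print(storm_level(3))   # Serious/S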
The Software Requirement Specification (SRS) is the starting point of the software development activity. As systems grew more complex, it became evident that the goals of the entire system could not be easily comprehended; hence the need for the requirements phase arose. A software project is initiated by the client's needs. The SRS is the means of translating the ideas in the minds of the clients (the input) into a formal document (the output of the requirements phase).
1) Problem/Requirement Analysis:
This is the first and more nebulous of the two activities; it deals with understanding the problem, the goals and the constraints.
2) Requirement Specification:
Here, the focus is on specifying what has been found during analysis. Issues such as representation, specification languages and tools, and checking that the specifications are complete are addressed during this activity.
The requirements phase terminates with the production of the validated SRS document. Producing the SRS document is the basic goal of this phase.
Role of SRS
The purpose of the Software Requirement Specification is to reduce the communication gap between the clients and the developers. The Software Requirement Specification is the medium through which the client and user needs are accurately specified. It forms the basis of software development. A good SRS should satisfy all the parties involved in the system.
A Software Requirements Specification (SRS) is a document that describes the nature of a project, software or application. In simple words, an SRS document is a manual for a project, provided it is prepared before you kick-start the project/application. This document is also known as an SRS report or software document. A software document is primarily prepared for a project, software or any kind of application.
There are a set of guidelines to be followed while preparing the software requirement specification
document. This includes the purpose, scope, functional and nonfunctional requirements, software
and hardware requirements of the project. In addition to this, it also contains the information about
environmental conditions required, safety and security requirements, software quality attributes of
the project etc.
The purpose of SRS (Software Requirement Specification) document is to describe the external
behavior of the application developed or software. It defines the operations, performance and
interfaces and quality assurance requirement of the application or software. The complete software
requirements for the system are captured by the SRS.
This section introduces the requirement specification document for Storm Forecasting using
Machine Learning which enlists functional as well as non-functional requirements.
For documenting the functional requirements, the set of functionalities supported by the system
are to be specified. A function can be specified by identifying the state at which data is to be input
to the system, its input data domain, the output domain, and the type of processing to be carried
on the input data to obtain the output data.
Functional requirements define specific behavior or function of the application. Following are the
functional requirements:
2.1.1.2. It is achieved by creating user-friendly screens for data entry that can handle a large volume of data. The goal of designing input is to make data entry easier and free from errors. The data entry screen is designed in such a way that all the data manipulations can be performed. It also provides viewing facilities.
2.1.1.3. When the data is entered, it is checked for validity. Data can be entered with the help of screens. Appropriate messages are provided as and when needed so that the user is never left confused. Thus the objective of input design is to create an input layout that is easy to follow.
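The following is a minimal validation sketch for the requirement above; the field names and messages are illustrative assumptions, not the project's actual code.

# Hypothetical helper: reject null, non-numeric and negative predictor values
def validate_inputs(magnitude, length, width):
    errors = []
    for name, value in [('magnitude', magnitude), ('length', length), ('width', width)]:
        if value is None or str(value).strip() == '':
            errors.append(name + ' is required')
            continue
        try:
            number = float(value)
        except ValueError:
            errors.append(name + ' must be a number')
            continue
        if number < 0:
            errors.append(name + ' cannot be negative')
    return errors

print(validate_inputs('4', '', '-10'))   # ['length is required', 'width cannot be negative']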
A non-functional requirement is a requirement that specifies criteria that can be used to judge the operation of a system, rather than specific behaviors. Essentially, these are the constraints within which the system must work. Following are the non-functional requirements:
Performance:
The performance of the developed application can be evaluated using the following approach:
Measuring enables you to identify how the performance of your application stands in relation to
your defined performance goals and helps you to identify the bottlenecks that affect your
application performance. It helps you identify whether your application is moving toward or away
from your performance goals. Defining what you will measure, that is, your metrics, and defining
the objectives for each metric is a critical part of your testing plan.
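As a hedged illustration of such a measurement, the sketch below times predictions against a latency goal; the synthetic data and the 100 ms target are assumptions, not figures from this document.

import time
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Synthetic stand-in data; in the project this would be the fitted storm classifier
X, y = make_classification(n_samples=1000, n_features=3, n_informative=3, n_redundant=0)
model = SVC().fit(X, y)

start = time.perf_counter()
model.predict(X)
elapsed_ms = (time.perf_counter() - start) * 1000
print(f'Prediction latency: {elapsed_ms:.1f} ms for {len(X)} rows (example goal: < 100 ms)')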
www.spc.noaa.gov
Different organizations have different phases in STLC however generic Software Test Life Cycle
(STLC) for waterfall development model consists of the following phases.
1. Requirements Analysis
2. Test Planning
3. Test Analysis
4. Test Design
5. Test Construction and Verification
6. Test Execution and Bug Reporting
7. Final Testing and Implementation
8. Post Implementation
1. Requirements Analysis
In this phase testers analyze the customer requirements and work with developers during the design phase to see which requirements are testable and how they are going to test those requirements.
It is very important to start testing activities from the requirements phase itself, because the cost of fixing a defect is much lower if it is found in the requirements phase rather than in later phases.
2. Test Planning
In this phase all the planning about testing is done: what needs to be tested, how the testing will be done, the test strategy to be followed, what the test environment will be, what test methodologies will be followed, hardware and software availability, resources, risks, etc. A high-level test plan document is created which includes all the planning inputs mentioned above and is circulated to the stakeholders.
Usually, the IEEE 829 test plan template is used for test planning.
3. Test Analysis
After test planning phase is over test analysis phase starts, in this phase we need to dig deeper into
project and figure out what testing needs to be carried out in each SDLC phase.
Automation activities are also decided in this phase, if automation needs to be done for software
product, how will the automation be done, how much time will it take to automate and which
features need to be automated.
Non-functional testing areas (stress and performance testing) are also analyzed and defined in this phase.
4. Test Design
In this phase various black-box and white-box test design techniques are used to design the test cases. Testers start writing test cases by following those design techniques; if automation testing needs to be done, then the automation scripts also need to be written in this phase.
5. Test Construction and Verification
In this phase testers prepare more test cases by keeping in mind the positive and negative scenarios, end-user scenarios, etc. All the test cases and automation scripts need to be completed in this phase and reviewed by the stakeholders. The test plan document should also be finalized and verified by reviewers.
6. Test Execution and Bug Reporting
Once the unit testing is done by the developers and the test team gets the test build, the test cases are executed and defects are reported in a bug tracking tool. After the test execution is complete and all the defects are reported, test execution reports are created and circulated to the project stakeholders.
After developers fix the bugs raised by testers, they give another build with the fixes to the testers. The testers do re-testing and regression testing to ensure that the defect has been fixed and has not affected any other areas of the software.
Testing is an iterative process, i.e. if a defect is found and fixed, testing needs to be done after every defect fix.
After the tester ensures that the defects have been fixed and no more critical defects remain in the software, the build is given for final testing.
7. Final Testing and Implementation
In this phase the final testing is done for the software; non-functional testing like stress, load and performance testing is performed in this phase. The software is also verified in a production-like environment. Final test execution reports and documents are prepared in this phase.
8. Post Implementation
In this phase the test environment is cleaned up and restored to its default state, process review meetings are held and the lessons learnt are documented. A document is prepared to cope with similar problems in future releases.
Phase         | Activities                                                              | Deliverables
Planning      | Create high-level test plan                                             | Test plan, refined specification
Design        | Test cases are revised; select which test cases to automate            | Revised test cases, test data sets, risk assessment sheet
Final testing | Execute remaining stress and performance tests, complete documentation | Test results and different metrics on test efforts
1. Python
History of Python
Python was developed by Guido van Rossum in the late eighties and early nineties at the National
Research Institute for Mathematics and Computer Science in the Netherlands.
Python is derived from many other languages, including ABC, Modula-3, C, C++, Algol-68,
SmallTalk, and Unix shell and other scripting languages.
Python is copyrighted. Like Perl, Python source code is available under an open-source license (the Python Software Foundation License, which is GPL-compatible).
Python is now maintained by a core development team at the institute, although Guido van
Rossum still holds a vital role in directing its progress.
Importance of Python
Python is Interpreted − Python is processed at runtime by the interpreter. You do not
need to compile your program before executing it. This is similar to PERL and PHP.
Python is Interactive − You can actually sit at a Python prompt and interact with the
interpreter directly to write your programs.
Easy-to-learn − Python has few keywords, simple structure, and a clearly defined syntax.
This allows the student to pick up the language quickly.
Easy-to-read − Python code is more clearly defined and visible to the eyes.
A broad standard library − The bulk of Python's library is very portable and cross-platform compatible on UNIX, Windows, and Macintosh.
Interactive Mode − Python has support for an interactive mode which allows interactive
testing and debugging of snippets of code.
Portable − Python can run on a wide variety of hardware platforms and has the same
interface on all platforms.
Extendable − You can add low-level modules to the Python interpreter. These modules
enable programmers to add to or customize their tools to be more efficient.
GUI Programming − Python supports GUI applications that can be created and ported to
many system calls, libraries and windows systems, such as Windows MFC, Macintosh,
and the X Window system of Unix.
Scalable − Python provides a better structure and support for large programs than shell
scripting.
It can be used as a scripting language or can be compiled to byte-code for building large
applications.
It provides very high-level dynamic data types and supports dynamic type checking.
It can be easily integrated with C, C++, COM, ActiveX, CORBA, and Java.
scikit-learn – provides the machine learning algorithms used for the data analysis and data mining tasks in this project.
Hypertext Markup Language (HTML) is the standard markup language for creating web
pages and web applications. With Cascading Style Sheets (CSS) and JavaScript, it forms a triad of
cornerstone technologies for the World Wide Web. Web browsers receive HTML documents from
a web server or from local storage and render them into multimedia web pages. HTML describes
the structure of a web page semantically and originally included cues for the appearance of the
document.
HTML elements are the building blocks of HTML pages. With HTML constructs, images and
other objects, such as interactive forms, may be embedded into the rendered page. It provides a
means to create structured documents by denoting structural semantics for text such as headings,
paragraphs, lists, links, quotes and other items. HTML elements are delineated by tags, written
using angle brackets. Tags such as <img /> and <input /> introduce content into the page directly.
Others such as <p>...</p> surround and provide information about document text and may include
other tags as sub-elements. Browsers do not display the HTML tags, but use them to interpret the
content of the page.
HTML can embed programs written in a scripting language such as JavaScript which affect the
behavior and content of web pages. Inclusion of CSS defines the look and layout of content.
The World Wide Web Consortium (W3C), maintainer of both the HTML and the CSS standards,
has encouraged the use of CSS over explicit presentational HTML since 1997.
History of HTML
From 1991 to 1999, HTML developed from version 1 to version 4.
In year 2000, the World Wide Web Consortium (W3C) recommended XHTML 1.0. The XHTML
syntax was strict, and the developers were forced to write valid and "well-formed" code.
In 2004, the W3C decided to close down the development of HTML in favor of XHTML.
In 2004 - 2006, the WHATWG gained support by the major browser vendors.
Features of HTML5:
Web Workers: Certain web applications use heavy scripts to perform functions. Web
Workers use separate background threads for processing and it does not affect the
performance of a web page.
Video: You can embed video without third-party proprietary plug-ins or codec. Video
becomes as easy as embedding an image.
Canvas: This feature allows a web developer to render graphics on the fly. As with video,
there is no need for a plug in.
Application caches: Web pages will start storing more and more information locally on
the visitor's computer. It works like cookies, but where cookies are small, the new feature
allows for much larger files. Google Gears is an excellent example of this in action.
Geolocation: Best known for use on mobile devices, geolocation is coming with
HTML5.
Cascading Style Sheets (CSS) is a style sheet language used for describing the presentation of a
document written in a markup language. Although most often used to set the visual style of web
pages and user interfaces written in HTML and XHTML, the language can be applied to
any XML document, including plain XML, SVG and XUL, and is applicable to rendering
in speech, or on other media. Along with HTML and JavaScript, CSS is a cornerstone technology
used by most websites to create visually engaging webpages, user interfaces for web applications,
and user interfaces for many mobile applications.
Features of CSS:
Eventually, CSS3 -- along with HTML5 -- is going to be the future of the web. You should begin making your web pages compatible with these latest specifications. This section explores some of the exciting new features in CSS3, which are going to change the way developers who used CSS2 build websites.
Selectors
In addition to the selectors that were available in CSS2, CSS3 introduces some new selectors. Using
these selectors you can choose DOM elements based on their attributes. So you don't need to
specify classes and IDs for every element. Instead, you can utilize the attribute field to style them.
Rounded Corners
Rounded corner elements can spruce up a website, but creating a rounded corner requires a
designer to write a lot of code. Adjusting the height, width and positioning of these elements
is a never-ending chore because any change in content can break them.
CSS3 addresses this problem by introducing the border-radius property, which gives you the same
rounded-corner effect and you don't have to write all the code. Here are examples for displaying
rounded corners in different places of a website.
Border Image
Another exciting feature in CSS 3 is the ability to swap out a border with an image. The property
border-image allows you to specify an image to display instead of a plain solid-coloured border.
Browser Dependent
The only major limitation of CSS is that its performance depends largely on browser support.
Besides compatibility, all browsers (and their many versions) function differently. So your CSS
needs to account for all these variations.
However, in case your CSS styling isn’t fully supported by a browser, people will still be able to
experience the HTML functionalities. Therefore, you should always have a well-structured HTML
along with good CSS.
Difficult to retrofit in old websites
The instinctive reaction after learning the many advantages of CSS is to integrate it into your
existing website. Sadly, this isn’t a simple process. CSS style sheets, especially the latest versions,
have to be integrated into the HTML code at the ground level and must also be compatible with
HTML versions. Retrofitting CSS into older websites is a slow tedious process.
There is also the risk of breaking the old HTML code altogether and thus making the site dead.
It’s best to wait till you redesign your website from scratch.
As you can see from above points, the advantages of CSS development outweigh its limitations. It
is a very useful web development tool that every programmer must master along with basic
HTML.
The Unified Modeling Language allows the software engineer to express an analysis model using
the modeling notation that is governed by a set of syntactic, semantic and pragmatic rules.
A UML system is represented using five different views that describe the system from distinctly different perspectives. Each view is defined by a set of diagrams, as follows:
Structural model view:
i. In this model, the data and functionality are viewed from inside the system.
ii. This model view models the static structures.
Implementation model view:
In this view, the structural and behavioral aspects of the system are represented as they are to be built.
Environmental model view:
In this view, the structural and behavioral aspects of the environment in which the system is to be implemented are represented.
To model a system, the most important aspect is to capture its dynamic behavior. To clarify in a bit more detail, dynamic behavior means the behavior of the system when it is running/operating.
Static behavior alone is not sufficient to model a system; dynamic behavior is more important than static behavior. In UML there are five diagrams available to model the dynamic nature, and the use case diagram is one of them. Since the use case diagram is dynamic in nature, there should be some internal or external factors for making the interaction. These internal and external agents are known as actors. Use case diagrams therefore consist of actors, use cases and their relationships. The diagram is used to model the system/subsystem of an application. A single use case diagram captures a particular functionality of a system, so to model the entire system a number of use case diagrams are used.
Use case diagrams are used to gather the requirements of a system including internal and external
influences. These requirements are mostly design requirements. So when a system is analyzed to
gather its functionalities use cases are prepared and actors are identified.
Sequence diagrams describe interactions among classes in terms of an exchange of messages over
time. They're also called event diagrams. A sequence diagram is a good way to visualize and
validate various runtime scenarios. These can help to predict how a system will behave and to
discover responsibilities a class may need to have in the process of modeling a new system.
The aim of a sequence diagram is to define event sequences, which would have a desired outcome.
The focus is more on the order in which messages occur than on the message per se. However, the
majority of sequence diagrams will communicate what messages are sent and the order in which
they tend to occur.
Class roles describe the way an object will behave in context. Use the UML object symbol to
illustrate class roles, but don't list object attributes.
Activation boxes represent the time an object needs to complete a task. When an object is busy
executing a process or waiting for a reply message, use a thin gray rectangle placed vertically on
its lifeline.
Messages
Messages are arrows that represent communication between objects. Use half-arrowed lines to
represent asynchronous messages.
Asynchronous messages are sent from an object that will not wait for a response from the receiver
before continuing its tasks.
Lifelines
Lifelines are vertical dashed lines that indicate the object's presence over time.
Objects can be terminated early using an arrow labeled "<< destroy >>" that points to an X. This
object is removed from memory. When that object's lifeline ends, you can place an X at the end of
its lifeline to denote a destruction occurrence.
Loops
A repetition or loop within a sequence diagram is depicted as a rectangle. Place the condition for
exiting the loop at the bottom left corner in square brackets [ ].
Guards
When modeling object interactions, there will be times when a condition must be met for a message
to be sent to an object. Guards are conditions that need to be used throughout UML diagrams to
control flow.
Step 1:
Step 2:
import pandas as pd
import numpy as np
import warnings
Step 4:
Create two HTML pages, one for input and one for output (input.html and result.html)
Step 5:
Step 6:
Step 7:
Stop
# Imports and the form-handling lines below were omitted from the original listing;
# they are reconstructed here. The form field names (magnitude, length, width) are
# assumptions, and Svm/Knn are the classifier wrapper classes shown later in this report.
from flask import Flask, render_template, request

app = Flask(__name__)

@app.route('/')
def student():
    return render_template('input.html')

@app.route('/result', methods=['POST'])
def result():
    magnitude = request.form['magnitude']
    length = request.form['length']
    width = request.form['width']
    # SVM prediction (s1 = predicted class, a = accuracy score)
    Svm.getinput(magnitude, length, width)
    result1 = Svm.svmresult()
    s1 = result1[0][0]
    a = result1[1][0]
    if s1 == 0:
        r1 = "VERY LOW"
    elif s1 == 1:
        r1 = "LOW"
    elif s1 == 2:
        r1 = "MEDIUM"
    elif s1 == 3:
        r1 = "STRONG"
    elif s1 == 4:
        r1 = "HIGH"
    else:
        r1 = "VERY HIGH"
    print(r1)
    # KNN prediction (s2 = predicted class, a2 = accuracy score)
    Knn.getinput(magnitude, length, width)
    result2 = Knn.knnresult()
    s2 = result2[0][0]
    a2 = result2[1][0]
    if s2 == 0:
        r = "VERY LOW"
    elif s2 == 1:
        r = "LOW"
    elif s2 == 2:
        r = "MEDIUM"
    elif s2 == 3:
        r = "STRONG"
    elif s2 == 4:
        r = "HIGH"
    else:
        r = "VERY HIGH"
    return render_template("result.html", r1=r1, r=r, a=a, a2=a2)

if __name__ == '__main__':
    app.run(debug=True)
<!DOCTYPE html>
<html>
<head>
<style>
div{
background-color: #2c198c;
color: white;
font-size:350%;
align:top;
text-align:center;
}
body{
background-color:#FFFFFF;
}
</style>
</head>
<body>
<!-- The heading text and form markup were omitted from the original listing; the field
     names below are assumptions chosen to match the Flask view shown earlier. -->
<div>Storm Forecasting</div>
<center>
<br><br>
<form action="/result" method="POST">
Magnitude: <input type="text" name="magnitude"><br>
Length: <input type="text" name="length"><br>
Width: <input type="text" name="width"><br>
<input type="submit" value="Predict">
</form>
<br><br><br>
<b>Units:<br></b>
Magnitude in Knots (1 knot=1.15 mph)<br>
Length in miles<br>
Width in yards
</center>
</body>
</html>
<!DOCTYPE html>
<html>
<head>
<style>
div{
background-color: #2c198c;
color: white;
font-size:400%;
align:top;
text-align:center;
}
body{
background-color:#FFFFFF;
}
</style>
</head>
<body>
<!-- The heading text and result markup were omitted from the original listing; the
     template variables below are the ones passed by the Flask view (r1, a: SVM; r, a2: KNN). -->
<div>Prediction Result</div>
<center>
<p>Level of storm (SVM): {{ r1 }} (accuracy: {{ a }})</p>
<p>Level of storm (KNN): {{ r }} (accuracy: {{ a2 }})</p>
</center>
</body>
</html>
# The imports below were omitted from the original listing and are added here:
import warnings
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix)

class Svm():
    l1 = []

    @staticmethod
    def getinput(mag, len, wid):
        # Store the user-supplied predictors: magnitude, length, width
        Svm.l1.append(int(mag))
        Svm.l1.append(float(len))
        Svm.l1.append(float(wid))
        print(Svm.l1)

    @staticmethod
    def svmresult():
        print(Svm.l1)
        defadv = pd.read_csv('C:\\Users\\HRIDDHI\\Desktop\\dataset.csv')
        number = LabelEncoder()
        defadv['imp'] = number.fit_transform(defadv['imp'].astype('str'))
        # Drop the columns that are not used for prediction
        # (the defadv[:5] previews from the original notebook listing are omitted here)
        defadv.drop(['mo'], axis=1, inplace=True)
        defadv.drop(['yr'], axis=1, inplace=True)
        defadv.drop(['dy'], axis=1, inplace=True)
        defadv.drop(['time'], axis=1, inplace=True)
        defadv.drop(['stn'], axis=1, inplace=True)
        defadv.drop(['slat'], axis=1, inplace=True)
        defadv.drop(['slon'], axis=1, inplace=True)
        defadv.drop(['elat'], axis=1, inplace=True)
        warnings.filterwarnings("ignore")
        df = pd.DataFrame({'Magnitude': defadv.mag, 'injuries': defadv.inj,
                           'fatalities': defadv.fat, 'loss': defadv.loss,
                           'croploss': defadv.closs, 'length': defadv.len,
                           'width': defadv.wid})
        x = defadv[['mag', 'len', 'wid']]
        y = defadv['imp']
        # The train/test split was omitted from the original listing; a 70/30 split is assumed
        x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3)
        svc = SVC(probability=True)
        svc.fit(x_train, y_train)
        ypred = svc.predict(x_test)
        yt = []
        yp = []
        for i in y_test:
            yt.append(i)
        for j in ypred:
            yp.append(j)
        dec = svc.decision_function(x)
        acc = accuracy_score(y_test, ypred)
        precision = precision_score(y_test, ypred, average='macro')
        recall = recall_score(y_test, ypred, average='macro')
        f1score = f1_score(y_test, ypred, average='macro')
        l2 = [acc, precision, recall, f1score]
        cm1 = confusion_matrix(y_test, svc.predict(x_test))
        print(cm1)
        print(l2)
        return svc.predict([Svm.l1]), l2, yt, yp
# Assumes the same imports as the Svm listing above, plus:
from sklearn.neighbors import KNeighborsClassifier

class Knn():
    l1 = []

    @staticmethod
    def getinput(mag, len, wid):
        Knn.l1.append(int(mag))
        Knn.l1.append(float(len))
        Knn.l1.append(float(wid))

    @staticmethod
    def knnresult():
        defadv = pd.read_csv('C:\\Users\\HRIDDHI\\Desktop\\dataset.csv')
        number = LabelEncoder()
        defadv['imp'] = number.fit_transform(defadv['imp'].astype('str'))
        defadv.drop(['mo'], axis=1, inplace=True)
        defadv.drop(['yr'], axis=1, inplace=True)
        defadv.drop(['dy'], axis=1, inplace=True)
        defadv.drop(['time'], axis=1, inplace=True)
        defadv.drop(['tz'], axis=1, inplace=True)
        defadv.drop(['stn'], axis=1, inplace=True)
        defadv.drop(['slat'], axis=1, inplace=True)
        defadv.drop(['slon'], axis=1, inplace=True)
        defadv.drop(['elat'], axis=1, inplace=True)
        warnings.filterwarnings("ignore")
        df = pd.DataFrame({'Magnitude': defadv.mag, 'injuries': defadv.inj,
                           'fatalities': defadv.fat, 'loss': defadv.loss,
                           'croploss': defadv.closs, 'length': defadv.len,
                           'width': defadv.wid})
        # The predictor/target selection, train/test split and the metric list l2
        # were omitted from the original listing; they are reconstructed here to
        # mirror the Svm class above.
        x = defadv[['mag', 'len', 'wid']]
        y = defadv['imp']
        x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3)
        neigh = KNeighborsClassifier(n_neighbors=3)
        neigh.fit(x_train, y_train)
        ypred1 = neigh.predict(x_test)
        print(ypred1)
        acc = accuracy_score(y_test, ypred1)
        precision = precision_score(y_test, ypred1, average='macro')
        recall = recall_score(y_test, ypred1, average='macro')
        f1score = f1_score(y_test, ypred1, average='macro')
        l2 = [acc, precision, recall, f1score]
        cm1 = confusion_matrix(y_test, neigh.predict(x_test))
        print(cm1)
        return neigh.predict([Knn.l1]), l2
Libraries imported
The following machine learning libraries are used in this project for testing:
1. sklearn/scikit-learn
Classification report
Classification report is used to evaluate a model’s predictive power. It is one of the most critical
steps in machine learning.
After you have trained and fitted your machine learning model, it is important to evaluate its performance. The classification report provides the following metrics for each class:
Precision
Recall
F1-score
Support
The first step is importing the classification_report library.
from sklearn.metrics import classification_report
Once the library has been imported you can now run the classification report with this Python
command:
print(classification_report(y_test,predictions))
y_test is the dependent variable from your test data set (from the train-test split of the data).
predictions is the output of your model.
Make sure that the y_test variable comes before the predictions variable in the Python call.
If not, the report will describe the wrong model performance, which will lead to a wrong evaluation.
In a sample classification_report output, the model predicted about 85% of the classifications correctly on average, and for class 0.0 it predicted 86% of the test data correctly.
Classification_report is also useful when comparing two models with different specifications
against each other and determining which model is better to use.
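A toy example of the call order described above; the labels are invented purely for illustration.

from sklearn.metrics import classification_report

y_test = [0, 0, 1, 1, 1, 2]          # true labels from the test split
predictions = [0, 1, 1, 1, 2, 2]     # labels predicted by the model
print(classification_report(y_test, predictions))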
Confusion matrix
A confusion matrix is a summary of prediction results on a classification problem. The number of
correct and incorrect predictions are summarized with count values and broken down by each class.
This is the key to the confusion matrix.
1. You need a test dataset or a validation dataset with expected outcome values.
2. Make a prediction for each row in your test dataset.
3. From the expected outcomes and predictions count:
a. The number of correct predictions for each class.
b. The number of incorrect predictions for each class, organized by the class that was predicted.
These numbers are then organized into a table, or a matrix as follows:
Expected down the side: each row of the matrix corresponds to an actual (expected) class.
Predicted across the top: each column of the matrix corresponds to a predicted class.
The counts of correct and incorrect classifications are then filled into the table.
The total number of correct predictions for a class goes into the expected row for that class value and the predicted column for that same class value.
In the same way, the total number of incorrect predictions for a class goes into the expected row for that class value and the predicted column for the class value that was actually predicted.
This matrix can be used for 2-class problems where it is very easy to understand, but can easily be
applied to problems with 3 or more class values, by adding more rows and columns to the
confusion matrix.
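A small sketch of this procedure using scikit-learn; the labels are invented for illustration.

from sklearn.metrics import confusion_matrix

expected  = [0, 0, 1, 1, 2, 2]   # actual outcomes from the test set
predicted = [0, 1, 1, 1, 2, 0]   # model predictions for the same rows
print(confusion_matrix(expected, predicted))
# In scikit-learn's output, each row is an actual class and each column a predicted class.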
ROC curve
The ROC curve stands for Receiver Operating Characteristic curve, and is used to visualize
the performance of a classifier. When evaluating a new model performance, accuracy can be very
sensitive to unbalanced class proportions. The ROC curve is insensitive to this lack of balance in
the data set.
2) Generate actual and predicted values. First let us use a good prediction probabilities array:
actual = [1,1,1,0,0,0]
predictions = [0.9,0.9,0.9,0.1,0.1,0.1]
3) Then we need to calculate the fpr and tpr for all thresholds of the classification. This is where the roc_curve call comes into play. In addition we calculate the auc, or area under the curve, which is a single summary value in [0, 1] that is easier to report and use for other purposes. You usually want to have a high auc value from your classifier.
4) Finally we plot the fpr vs tpr as well as our auc for our very good classifier.
Figure 8 (Graph 1) shows how a perfect classifier's ROC curve looks.
Here the classifier did not make a single error. The AUC is maximal at 1.00. Let’s see what
happens when we introduce some errors in the prediction.
actual = [1,1,1,0,0,0]
predictions = [0.9,0.9,0.1,0.1,0.1,0.1]
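A sketch of steps 2 to 4 using the imperfect predictions above; note that roc_curve as used here applies to binary labels.

import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, auc

actual = [1, 1, 1, 0, 0, 0]
predictions = [0.9, 0.9, 0.1, 0.1, 0.1, 0.1]

# fpr/tpr across all thresholds, and the area under the curve
fpr, tpr, thresholds = roc_curve(actual, predictions)
roc_auc = auc(fpr, tpr)

plt.plot(fpr, tpr, label=f'AUC = {roc_auc:.2f}')
plt.xlabel('False positive rate')
plt.ylabel('True positive rate')
plt.legend()
plt.show()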
The sklearn.preprocessing package provides several common utility functions and transformer
classes to change raw feature vectors into a representation that is more suitable for the downstream
estimators.
In general, learning algorithms benefit from standardization of the data set. If some outliers are
present in the set, robust scalers or transformers are more appropriate. The behavior of the different
scalers, transformers, and normalizers on a dataset containing marginal outliers is highlighted in the scikit-learn example “Compare the effect of different scalers on data with outliers”.
The preprocessing module further provides a utility class StandardScaler that implements the
Transformer API to compute the mean and standard deviation on a training set so as to be able to
later reapply the same transformation on the testing set. This class is hence suitable for use in the
early steps of a sklearn.pipeline.Pipeline:
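A minimal sketch of this pattern on a toy dataset standing in for the project's predictors; the pipeline itself is illustrative, not the project's code.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Toy data standing in for the project's predictors
X, y = load_iris(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.3)

# StandardScaler learns mean/std on the training set, and the same
# transformation is reapplied to the test set inside the pipeline
pipe = Pipeline([('scale', StandardScaler()), ('svc', SVC())])
pipe.fit(x_train, y_train)
print(pipe.score(x_test, y_test))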
Label encoding:
LabelEncoder is a utility class to help normalize labels such that they contain only values
between 0 and n_classes-1. This is sometimes useful for writing efficient Cython routines.
LabelEncoder can be used as follows:
It can also be used to transform non-numerical labels (as long as they are hashable and comparable)
to numerical labels:
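The usage referred to above, following the scikit-learn documentation pattern; the non-numerical example uses this project's storm-level labels purely for illustration.

from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
le.fit([1, 2, 2, 6])
print(le.classes_)                 # [1 2 6]
print(le.transform([1, 1, 2, 6]))  # [0 0 1 2]

# Non-numerical labels, as done for the 'imp' column in this project:
le2 = LabelEncoder()
print(le2.fit_transform(['L', 'M', 'S', 'H', 'E', 'M']))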
sklearn.model_selection.train_test_split(*arrays, **options):
Split arrays or matrices into random train and test subsets
Quick utility that wraps input validation and next(ShuffleSplit().split(X, y)) and application to
input data into a single call for splitting (and optionally subsampling) data in a oneliner.
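The usage pattern from the scikit-learn documentation, shown here on a small toy array; the test_size and random_state values are illustrative.

import numpy as np
from sklearn.model_selection import train_test_split

X, y = np.arange(10).reshape((5, 2)), range(5)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
print(X_train, y_train)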
Feature selection:
The classes in the sklearn.feature_selection module can be used for feature
selection/dimensionality reduction on sample sets, either to improve estimators’ accuracy scores
or to boost their performance on very high-dimensional datasets.
Removing features with low variance:
VarianceThreshold is a simple baseline approach to feature selection. It removes all features whose
variance doesn’t meet some threshold. By default, it removes all zero-variance features, i.e.
features that have the same value in all samples.
Univariate feature selection works by selecting the best features based on univariate statistical
tests. It can be seen as a preprocessing step to an estimator. Scikit-learn exposes feature selection
routines as objects that implement the transform method.
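Two small sketches of the approaches above on toy matrices; the values are made up for illustration.

from sklearn.feature_selection import VarianceThreshold, SelectKBest, chi2

# Remove zero-variance (constant) features: the first and last columns are dropped
X = [[0, 2, 0, 3],
     [0, 1, 4, 3],
     [0, 1, 1, 3]]
print(VarianceThreshold().fit_transform(X))

# Univariate selection: keep the k best features according to a statistical test
X2 = [[1, 2, 3], [4, 5, 6], [7, 8, 1], [2, 9, 4]]
y2 = [0, 0, 1, 1]
print(SelectKBest(chi2, k=2).fit_transform(X2, y2))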
2. matplotlib
Matplotlib:
Matplotlib is a Python 2D plotting library which produces publication quality figures in a variety
of hardcopy formats and interactive environments across platforms. Matplotlib can be used in
Python scripts, the Python and IPython shells, the Jupyter notebook, web application servers, and
four graphical user interface toolkits.
Matplotlib tries to make easy things easy and hard things possible. You can generate plots,
histograms, power spectra, bar charts, errorcharts, scatterplots, etc., with just a few lines of code.
For examples, see the sample plots and thumbnails gallery.
For simple plotting the pyplot module provides a MATLAB-like interface, particularly when
combined with IPython. For the power user, you have full control of line styles, font properties,
axes properties, etc, via an object oriented interface or via a set of functions familiar to MATLAB
users.
Pyplot and pylab:
Pylab combines pyplot with numpy into a single namespace. This is convenient for interactive work, but for programming it is recommended that the namespaces be kept separate.
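A short pyplot example with the namespaces kept separate, as recommended above.

import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 2 * np.pi, 100)
plt.plot(x, np.sin(x), label='sin(x)')
plt.xlabel('x')
plt.ylabel('sin(x)')
plt.legend()
plt.show()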
3. NumPy
Besides its obvious scientific uses, NumPy can also be used as an efficient multi-dimensional
container of generic data. Arbitrary data-types can be defined. This allows NumPy to seamlessly
and speedily integrate with a wide variety of databases.
import numpy as np
4. Pandas
import pandas as pd
Library features
The library is highly optimized for performance, with critical code paths written in Cython or C.
Support Vector Machines(SVMs) have been extensively researched in the data mining and
machine learning communities for the last decade and actively applied to applications in various
domains. SVMs are typically used for learning classification, regression, or ranking functions, for
which they are called classifying SVM, support vector regression (SVR), or ranking SVM (or
RankSVM) respectively. Two special properties of SVMs are that they achieve (1) high generalization by maximizing the margin and (2) efficient learning of nonlinear functions via the kernel trick.
Working:
• Family of machine-learning algorithms that are used for mathematical and
engineering problems including for example handwriting digit recognition, object
recognition, speaker identification, face detections in images and target detection.
• Task: Assume we are given a set S of points x_i ∈ R^n with i = 1, 2, ..., N. Each point x_i belongs to either of two classes and thus is given a label y_i ∈ {-1, 1}. The goal is to establish the equation of a hyperplane that divides S, leaving all the points of the same class on the same side.
• SVM performs classification by constructing an N-dimensional hyperplane that
optimally separates the data into two categories.
Kernel Functions:
• Kernel Function computes the similarity of two data points in the feature space using
dot product.
• The selection of an appropriate kernel function is important, since the kernel function
defines the feature space in which the training set examples will be classified.
• The kernel expresses prior knowledge about the phenomenon being modeled,
encoded as a similarity measure between two vectors.
• A support vector machine can locate a separating hyperplane in the feature space and
classify points in that space without even representing the space explicitly, simply by
defining a kernel function, that plays the role of the dot product in the feature space.
Identify the right hyper-plane (Scenario-1): Here, we have three hyper-planes (A, B and
C). Now, identify the right hyper-plane to classify star and circle
You need to remember a thumb rule to identify the right hyper-plane: “Select the hyper-
plane which segregates the two classes better”. In this scenario, hyper-plane “B”
has excellently performed this job.
Identify the right hyper-plane (Scenario-2): Here, we have three hyper-planes (A, B and
C) and all are segregating the classes well. Now, How can we identify the right hyper-
plane?
Here, maximizing the distances between nearest data point (either class) and hyper-plane
will help us to decide the right hyper-plane. This distance is called as Margin. Let’s look
at the below snapshot:
Above, you can see that the margin for hyper-plane C is high as compared to both A and B. Hence, we name the right hyper-plane as C. Another compelling reason for selecting the hyper-plane with the higher margin is robustness: if we select a hyper-plane having a low margin, then there is a high chance of mis-classification.
Identify the right hyper-plane (Scenario-3): Some of you may have selected hyper-plane B as it has a higher margin compared to A. But here is the catch: SVM selects the hyper-plane which classifies the classes accurately prior to maximizing the margin. Here, hyper-plane B has a classification error and A has classified all points correctly. Therefore, the right hyper-plane is A.
Can we classify two classes (Scenario-4)? Below, I am unable to segregate the two classes using a straight line, as one of the stars lies in the territory of the other (circle) class as an outlier.
As I have already mentioned, one star at other end is like an outlier for star class. SVM has
a feature to ignore outliers and find the hyper-plane that has maximum margin. Hence, we
can say, SVM is robust to outliers.
SVM can solve this problem. Easily! It solves this problem by introducing an additional feature. Here, we will add a new feature z = x^2 + y^2. Now, let's plot the data points on the axes x and z:
In SVM, it is easy to have a linear hyper-plane between these two classes. But another burning question which arises is: do we need to add this feature manually to have a hyper-plane? No, SVM has a technique called the kernel trick. These are functions which take a low-dimensional input space and transform it into a higher-dimensional space, i.e. they convert a non-separable problem into a separable problem; these functions are called kernels. They are mostly useful in non-linear separation problems. Simply put, they do some extremely complex data transformations, then find out the process to separate the data based on the labels or outputs you have defined.
When we look at the hyper-plane in original input space it looks like a circle:
Kernel Parameter:
SVM plays an important role in classification. Here different kernel parameters are
used as a tuning parameter to improve the classification accuracy. There are mainly
four different types of kernels (Linear, Polynomial, RBF, and Sigmoid) that are popular
in SVM classifier.
Gamma Parameter:
Gamma is the parameters for a nonlinear support vector machine (SVM) with a
Gaussian radial basis function kernel. A standard SVM seeks to find a margin that
separates all positive and negative examples. Gamma is the free parameter of the
Gaussian radial basis function.
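For illustration, this is how the kernel and gamma parameters appear in scikit-learn's SVC; the values shown are not tuned for this dataset.

from sklearn.svm import SVC

linear_svc  = SVC(kernel='linear')
poly_svc    = SVC(kernel='poly', degree=3)
rbf_svc     = SVC(kernel='rbf', gamma=0.1, C=1.0)   # gamma is the RBF free parameter
sigmoid_svc = SVC(kernel='sigmoid')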
2. K-Nearest Neighbour
KNN is a very simple algorithm: it has no model other than storing the entire dataset, so there is no learning required.
Efficient implementations can store the data using complex data structures like k-d trees to make
look-up and matching of new patterns during prediction efficient.
Because the entire training dataset is stored, you may want to think carefully about the
consistency of your training data. It might be a good idea to curate it, update it often as new data
becomes available and remove erroneous and outlier data.
To determine which of the K instances in the training dataset are most similar to a new input a
distance measure is used. For real-valued input variables, the most popular distance measure
is Euclidean distance.
Euclidean distance is calculated as the square root of the sum of the squared differences between
a new point (x) and an existing point (xi) across all input attributes j.
Euclidean is a good distance measure to use if the input variables are similar in type (e.g. all
measured widths and heights). Manhattan distance is a good measure to use if the input variables
are not similar in type (such as age, gender, height, etc.).
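The two distance measures written out, assuming each instance is a list of numeric attribute values.

import math

def euclidean_distance(x, xi):
    # square root of the sum of squared attribute differences
    return math.sqrt(sum((xj - xij) ** 2 for xj, xij in zip(x, xi)))

def manhattan_distance(x, xi):
    # sum of absolute attribute differences
    return sum(abs(xj - xij) for xj, xij in zip(x, xi))

print(euclidean_distance([60, 2.5, 100], [55, 1.0, 80]))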
The value for K can be found by algorithm tuning. It is a good idea to try many different values
for K (e.g. values from 1 to 21) and see what works best for your problem.
The computational complexity of KNN increases with the size of the training dataset. For very
large training sets, KNN can be made stochastic by taking a sample from the training dataset
from which to calculate the K-most similar instances.
KNN has been around for a long time and has been very well studied. As such, different
disciplines have different names for it, for example:
When KNN is used for regression problems the prediction is based on the mean or the median of
the K-most similar instances.
When KNN is used for classification, the output can be calculated as the class with the highest
frequency from the K-most similar instances. Each instance in essence votes for their class and
the class with the most votes is taken as the prediction.
Class probabilities can be calculated as the normalized frequency of samples that belong to each
class in the set of K most similar instances for a new data instance. For example, in a binary
classification problem (class is 0 or 1):
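A toy sketch of the majority vote and the class-probability calculation above, for a binary problem with K = 5 neighbours; the neighbour classes are invented.

from collections import Counter

neighbour_classes = [0, 1, 1, 0, 1]            # classes of the 5 nearest neighbours
votes = Counter(neighbour_classes)
prediction = votes.most_common(1)[0][0]        # class with the highest frequency
p_class_1 = votes[1] / len(neighbour_classes)  # normalized frequency: 3/5 = 0.6
print(prediction, p_class_1)                   # 1 0.6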
If you are using K and you have an even number of classes (e.g. 2) it is a good idea to choose a
K value with an odd number to avoid a tie. And the inverse, use an even number for K when you
have an odd number of classes.
Ties can be broken consistently by expanding K by 1 and looking at the class of the next most
similar instance in the training dataset.
From the above table, the efficiency of SVM is greater than the efficiency of KNN. Hence, on comparing the above two algorithms, we say that the SVM algorithm is more efficient than KNN.
Therefore we use SVM algorithm to predict the level of storm for the parameters given by the
user.
Test Cases:
Output:
Output:
The accuracy of the existing system will be made approximately up to 100% to provide more accurate information about the level of a storm.
The anomalies with respect to null input values can be eliminated to predict the correct level of a storm.
Providing brief data about the amount of estimated loss in dollars and the number of injuries and fatalities that can take place due to the impact of a storm educates users about the consequences caused by storms.
However, in recent years, with the advancement in technology, it has become possible to forecast storms correctly using machine learning techniques, namely Support Vector Machine (SVM) and K-Nearest Neighbor (KNN).
In our project, we implemented these algorithms to predict the level of storm based on parameters
like magnitude, length and width of the storm and determine the roc curve for the best working
algorithm.
1. https://www.ncdc.noaa.gov/stormevents/ftp.jsp
2. https://www.kaggle.com/jtennis/spctornado/data
3. https://machinelearningmastery.com/
4. http://stackabuse.com/using-machine-learning-to-predict-the-weather-part-1/
5. http://www.spc.noaa.gov/wcm/data/SPC_severe_database_description.pdf