INTRODUCTION
Data Mining
Classification
A classification task begins with a data set in which the class assignments are
known. For example, a classification model that predicts credit risk could be developed
based on observed data for many loan applicants over a period of time. In addition to the
historical credit rating, the data might track employment history, home ownership or
rental, years of residence, number and type of investments, and so on. Credit rating would
be the target, the other attributes would be the predictors, and the data for each customer
would constitute a case.
Regression
A regression task begins with a data set in which the target values are known. For
example, a regression model that predicts house values could be developed based on
observed data for many houses over a period of time. In addition to the value, the data
might track the age of the house, square footage, number of rooms, and so on.
House value would be the target, the other attributes would be the predictors, and the data
for each house would constitute a case.
Clustering
Clustering is the process of grouping abstract objects into classes of similar objects.
In cluster analysis, we first partition the set of data into groups based on data
similarity and then assign labels to the groups.
Summarization
Summarization maps data into subsets with an associated compact description. For
example, at a higher level, summary information, instead of detailed call records, is
presented to the sales manager for customer analysis.
Dependency Modeling
CLUSTERING
Clustering is the process of dividing the data elements into classes or clusters so
that the items in the same class are as similar as possible, and the items in different
classes are as dissimilar as possible. The division can depend, for example, on a time
series, i.e., on the behavior of the user over time.
Clustering analysis finds clusters of data objects that are similar in some sense to
one another. The members of a cluster are more like each other than they are like
members of other clusters. The goal of clustering analysis is to find high-quality clusters
such that the inter-cluster similarity is low and the intra-cluster similarity is high.
Clustering, like classification, is used to segment the data. Unlike classification,
clustering models segment data into groups that were not previously defined.
Classification models segment data by assigning it to previously-defined classes, which
are specified in a target. Clustering models do not use a target.
Clustering is useful for exploring data. If there are many cases and no obvious
groupings, clustering algorithms can be used to find natural groupings. Clustering can
also serve as a useful data pre-processing step to identify homogeneous groups to build
supervised models.
Requirements of clustering
In data mining, efforts have focused on finding methods for efficient and effective
cluster analysis in large databases. Active themes of research focus on:
Scalability: Many clustering algorithms work well on small data sets containing
fewer than several hundred data objects; however, a large database may contain
millions of objects. Highly scalable clustering algorithms are needed.
Ability to deal with different types of attributes: Many algorithms are designed
to cluster interval-based data. However, applications may require clustering for
other types of data, such as binary, nominal, and ordinal data, or mixtures of
these data types.
Discovery of clusters with arbitrary shapes: Many clustering algorithms
determine clusters based on Euclidean or Manhattan distance measures.
Algorithms based on such distance measures tend to find spherical clusters with
similar size and density. However, a cluster could be of any shape. It is important
to develop algorithms that can detect clusters of arbitrary shape.
Advantages of Clustering
Disadvantages of Clustering
1.2 Existing System
1.2.1 Disadvantages
The k-means algorithm cannot handle high-dimensional datasets such as
images, videos, etc.
Traditional fuzzy ARM algorithms are not able to mine large amounts of
data efficiently.
The traditional algorithms work with only a very small number of attributes/
dimensions.
Classification of large amounts of data is also very difficult.
1.3 Proposed System
1.3.1 Description of Proposed System
The proposed system is an extension of the existing system: it improves the
fuzzy ARM algorithm and builds a framework of fuzzy-based clustering for more
accurate results.
After that we perform the clustering by using fuzzy c-means. Fuzzy c-means is a
method of clustering which allows one piece of data to belong to two or more clusters.
This method is frequently used in pattern recognition. It is based on minimization of an
objective function.
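For reference, the objective function minimized by fuzzy c-means can be written in its standard form (this equation is supplied here for completeness and is not taken from the project code):

J_m = Σ_{i=1}^{N} Σ_{k=1}^{C} u_k(x_i)^m · ||x_i − center_k||² ,  m > 1

where u_k(x_i) is the degree of membership of point x_i in cluster k, center_k is the k-th cluster center, N is the number of points, and C is the number of clusters.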
SURF Algorithm:
SURF is used in computer vision tasks such as the recognition of objects in scenes,
object tracking, and the extraction of points of interest. This algorithm is part of the
artificial intelligence techniques able to train a system to interpret images and determine
their content.
Detection:
The SURF algorithm is based on the same principles and steps as SIFT, but it
uses a different scheme and should provide better results: it works much faster. In order
to detect characteristic points in a scale-invariant manner, the SIFT approach uses
cascaded filters, where the difference of Gaussians (DoG) is calculated on progressively
rescaled images.
Integral Image
Instead of Gaussian averaging of the image, box filters (an approximation) are used.
Convolving the image with a box filter is much faster if the integral image is used. The
integral image is defined as:
S(x, y) = Σ_{i=0}^{i≤x} Σ_{j=0}^{j≤y} I(i, j)
The sum of the original image within a rectangle D can be evaluated quickly using
this integral image: summing I(x, y) over the selected area requires only four
evaluations of S(x, y), at the rectangle's corners A, B, C, and D.
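As a minimal illustration of this idea (a sketch in MATLAB, not the project's code; it assumes the test image cameraman.tif shipped with the Image Processing Toolbox), the integral image can be built with two cumulative sums, after which any box sum costs four lookups:

I = im2double(imread('cameraman.tif'));    % any grayscale test image
S = cumsum(cumsum(I, 1), 2);               % S(x, y) = sum of I over [1..x, 1..y]
S = padarray(S, [1 1], 0, 'pre');          % leading zero row/column so border boxes work

% Sum of I over rows r1..r2 and columns c1..c2 from the four corner values A, B, C, D
r1 = 40; r2 = 80; c1 = 60; c2 = 120;
boxSum = S(r2+1, c2+1) - S(r1, c2+1) - S(r2+1, c1) + S(r1, c1);

assert(abs(boxSum - sum(sum(I(r1:r2, c1:c2)))) < 1e-8)   % sanity check vs. direct sum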
SURF uses a BLOB detector based on the Hessian to find points of interest. The
determinant of the Hessian matrix expresses the extent of the response and is an
expression of a local change around the area.
The detector is based on the Hessian matrix, due to its high accuracy. More
precisely, BLOB structures are detected at places where the determinant of the Hessian is
maximal. In contrast to the Hessian-Laplace detector of Mikolajczyk and Schmid, it
relies on the determinant of the Hessian also for scale selection, as done by Lindeberg.
Given a point x = (x, y) in an image I, the Hessian matrix H(x, σ) at x at scale σ is
defined as follows:
H(x, σ) = | Lxx(x, σ)  Lxy(x, σ) |
          | Lxy(x, σ)  Lyy(x, σ) |
Gaussian filters are optimal for scale-space analysis, but in practice they must be
quantized and cropped. This leads to a loss of repeatability under image rotations around
odd multiples of π/4; this weakness holds for Hessian-based detectors in general, with
repeatability peaking around multiples of π/2 because of the square shape of the filter.
However, the detectors still work well, and the discretization has only a slight effect on
performance. Since real filters are not ideal in any case, and given the success of Lowe
with LoG approximations, the approximation of the Hessian matrix is pushed further
with box filters. These approximations of the second-order Gaussian derivatives can be
evaluated at very low cost using integral images, so the calculation time is independent
of the filter size. The approximated derivatives are denoted Dyy and Dxy (and
correspondingly Dxx).
The 9×9 box filters are approximations of a Gaussian with σ = 1.2 and represent
the lowest level (highest spatial resolution) of the BLOB response maps. They are
denoted Dxx, Dyy, and Dxy. The weights applied to the rectangular regions are kept
simple for CPU efficiency:

det(Happrox) = Dxx·Dyy − (w·Dxy)²
The relative weighting w of the filter responses is used to balance the expression
for the Hessian determinant. It is necessary for the conservation of energy between the
Gaussian kernels and the approximated Gaussian kernels:
w = ( ||Lxy(1.2)||F · ||Dyy(9)||F ) / ( ||Lyy(1.2)||F · ||Dxy(9)||F ) ≈ 0.9

The 0.9 factor thus appears as a correction factor for using box filters instead of
Gaussians. Several det(H) images can then be generated for several filter sizes; this is
called multi-resolution analysis.
The approximation of the determinant of the Hessian matrix represents the BLOB
response of the image at location x. These responses are stored in a BLOB response map
over different scales.
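The sketch below illustrates how such a response map can be built with simplified 9×9 box masks. The mask layout follows the figures in Bay et al.'s SURF paper and is an assumption here, not the project's exact filters; plain conv2 is used for clarity where real SURF would evaluate the boxes through the integral image:

% Simplified 9x9 box approximations of the second-order Gaussian derivatives
Dyy = zeros(9); Dyy(1:3, 3:7) = 1; Dyy(4:6, 3:7) = -2; Dyy(7:9, 3:7) = 1;
Dxx = Dyy';                                   % x-direction mask is the transposed layout
Dxy = zeros(9); Dxy(2:4, 2:4) = 1; Dxy(2:4, 6:8) = -1;
Dxy(6:8, 2:4) = -1; Dxy(6:8, 6:8) = 1;

I   = im2double(imread('cameraman.tif'));
Rxx = conv2(I, Dxx, 'same');                  % box-filter responses
Ryy = conv2(I, Dyy, 'same');
Rxy = conv2(I, Dxy, 'same');
w   = 0.9;                                    % relative weight from the text above
detH = Rxx.*Ryy - (w*Rxy).^2;                 % BLOB response map at the 9x9 (s = 1.2) scale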
Interest points can be found at different scales, partly because the search for
correspondences often requires comparing images in which they are seen at different
scales. Scale spaces are generally implemented as an image pyramid: images are
repeatedly smoothed with a Gaussian filter and then subsampled to reach the next higher
level of the pyramid. Therefore, several layers of det(H) maps with various mask sizes
are calculated:
The scale space is divided into a number of octaves, where an octave refers to a
series of response maps covering a doubling of scale. In SURF, the lowest level of
the scale space is obtained from the output of the 9×9 filters.
Scale spaces are implemented by applying box filters of different size. Therefore,
the scale space is analyzed by up-scaling the filter size rather than iteratively reducing the
image size. The output of the above 9×9 filter is considered as the initial scale layer, to
which we will refer as scale s = 1.2 (corresponding to Gaussian derivatives with σ = 1.2).
The following layers are obtained by filtering the image with gradually bigger masks,
taking into account the discrete nature of integral images and the specific structure of our
filters. Specifically, this results in filters of size 9×9, 15×15, 21×21, 27×27, etc. In order
to localize interest points in the image and over scales, non-maximum suppression in a
3*3*3 neighborhood is applied. The maxima of the determinant of the Hessian matrix are
then interpolated in scale and image space with the method proposed by Brown et al.
Scale space interpolation is especially important in our case, as the difference in scale
between the first layers of every octave is relatively large.
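A minimal sketch of this 3×3×3 non-maximum suppression is given below; the response stack R and the threshold are synthetic stand-ins, not the project's implementation:

R = randn(64, 64, 5);          % stand-in stack of det(H) maps (rows x cols x scales)
thresh = 2;                    % hypothetical response threshold
isMax = false(size(R));
[nr, nc, ns] = size(R);
for s = 2:ns-1
    for r = 2:nr-1
        for c = 2:nc-1
            nbhd = R(r-1:r+1, c-1:c+1, s-1:s+1);   % 3x3x3 neighbourhood
            v = R(r, c, s);
            % keep only strict local maxima above the threshold
            if v >= thresh && v == max(nbhd(:)) && sum(nbhd(:) == v) == 1
                isMax(r, c, s) = true;
            end
        end
    end
end
[rows, cols, scales] = ind2sub(size(isMax), find(isMax));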
Fuzzy C-means Clustering
In fuzzy clustering, each point has a degree of belonging to clusters, as in fuzzy logic,
rather than belonging completely to just one cluster. Thus, points on the edge of a cluster
may be in the cluster to a lesser degree than points in the center of the cluster. For each
point x we have a coefficient giving the degree of being in the k-th cluster, u_k(x).
Usually, the sum of those coefficients is defined to be 1:

∀x : Σ_{k=1}^{num. clusters} u_k(x) = 1
With fuzzy c-means, the centroid of a cluster is the mean of all points, weighted by their
degree of belonging to the cluster:
center_k = ( Σ_x u_k(x)^m · x ) / ( Σ_x u_k(x)^m )
The degree of belonging is related to the inverse of the distance to the cluster center:
u_k(x) = 1 / d(center_k, x)
Then the coefficients are normalized and fuzzified with a real parameter m > 1 so that
their sum is 1:

u_k(x) = 1 / Σ_j ( d(center_k, x) / d(center_j, x) )^(2/(m−1))

For m equal to 2, this is equivalent to normalizing the coefficients linearly to make their
sum 1. When m is close to 1, the cluster center closest to the point is given much more
weight than the others, and the algorithm is similar to k-means.
The fuzzy c-means algorithm is very similar to the k-means algorithm: compute the
centroid for each cluster, using the formula above; then, for each point, compute its
coefficients of being in the clusters, using the formula; and repeat until the assignments
stabilize. The algorithm minimizes intra-cluster variance as well, but it has the same
problems as k-means: the minimum is a local minimum, and the results depend on the
initial choice of weights. The expectation-maximization algorithm is a more statistically
formalized method which includes some of these ideas, such as partial membership in
classes. It has better convergence properties and is in general preferred to fuzzy c-means.
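If the Fuzzy Logic Toolbox is available, MATLAB's fcm function implements exactly this alternating update of centers and memberships. A small usage sketch on synthetic data (variable names are illustrative only):

data = [randn(50,2); randn(50,2) + 4];     % two loose groups of 2-D points
[center, U] = fcm(data, 2);                % U(k,i) = membership of point i in cluster k
maxU = max(U);                             % harden: assign each point to its top cluster
idx1 = find(U(1,:) == maxU);
idx2 = find(U(2,:) == maxU);
plot(data(idx1,1), data(idx1,2), 'og', data(idx2,1), data(idx2,2), 'xr');
hold on
plot(center(:,1), center(:,2), 'k*', 'MarkerSize', 12)

The clustering sample code in Section 5.2 appears to follow this same pattern (maxU, index1/index2 and the center plotting).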
1.3.3 Advantages:
We are using the SURF algorithm, which is several times faster than the SIFT
algorithm.
The SURF algorithm is used to detect structures or very significant points in an
image, for a discriminating description of these areas relative to their neighbouring
points.
The recognition of images or objects, one of the most important applications of
computer vision, builds on local descriptors such as SIFT.
Unlike k-means, where a data point must exclusively belong to one cluster center,
here each data point is assigned a membership to every cluster center, as a result of
which a data point may belong to more than one cluster.
It gives better results for overlapped data sets and is comparatively better than the
k-means algorithm.
2. LITERATURE SURVEY
Fuzzy association rules can also be derived from high-dimensional numerical datasets,
like image datasets, in order to train fuzzy associative classifiers or clustering
algorithms. Traditional fuzzy ARM algorithms are not able to mine rules from them
efficiently, since such algorithms are meant to deal with datasets with relatively few
attributes/dimensions. Hence FAR-HD was proposed, a fuzzy ARM
algorithm designed specifically for large high-dimensional datasets. FAR-HD processes
fuzzy frequent item sets in a DFS manner using a two-phased multiple-partition tidlist-
based strategy. It also uses a byte-vector representation of tidlists, with the tidlists
stored in the main memory in a compressed form (using a fast generic compression
method). Additionally, FAR-HD uses Fuzzy Clustering to convert each numerical vector
of the original input dataset to a fuzzy-cluster based representation, which is ultimately
used for the actual Fuzzy ARM process. FAR-HD has been compared experimentally
with Fuzzy Apriori (7–15 times faster), which is the most popular Fuzzy ARM algorithm.
The important features of FAR-HD are that it uses a two phased processing technique,
and a tidlist approach for calculating the frequency of item sets. It also uses a
generic compression algorithm (zlib) to compress tidlists while processing them in order
to fit more tidlists in the same amount of memory allocated/available. zlib provides
very good compression ratios on all kinds of data and datasets. The distinctive feature of
datasets with high dimensions is that they have association rules with many items, i.e.,
the average rule length is very high. In order to deal with such association rules, the
itemset generation and processing in FAR-HD is done in a DFS-like fashion as in
ARMOR, as opposed to the BFS-like fashion of Apriori, which is optimized for large
datasets with fewer attributes/dimensions.
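As an illustration of the tidlist idea only (a toy sketch, not FAR-HD itself): when a tidlist is held as a bit-vector over transactions, the frequency of an itemset reduces to an AND followed by a sum, which is what makes the in-memory, compressed tidlist strategy attractive:

nTrans = 1000;
tidA  = rand(1, nTrans) < 0.30;     % transactions containing item A (synthetic)
tidB  = rand(1, nTrans) < 0.40;     % transactions containing item B (synthetic)
tidAB = tidA & tidB;                % tidlist of the 2-itemset {A, B}
suppAB = sum(tidAB) / nTrans;       % relative support of {A, B}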
2.4 SIFT:
For any object there are many features, interesting points on the object that can be
extracted to provide a “feature” description of the object. This description can then be
used when attempting to locate the object in an image containing many other objects.
There are many considerations when extracting these features and how to record them.
SIFT image features provide a set of features of an object that are not affected by many
of the complications experienced in other methods, such as object scaling and rotation.
While allowing an object to be recognised in a larger image, SIFT image features also
allow objects in multiple images of the same location, taken from different positions
within the environment, to be recognised. SIFT features are also very resilient to the
effects of “noise” in the image.
The SIFT approach, for image feature generation, takes an image and transforms it into a
large collection of local feature vectors. Each of these feature vectors is invariant to any
scaling, rotation or translation of the image. To aid the extraction of these features, the
SIFT algorithm applies a 4-stage filtering approach:
Step 1: Scale-Space Extrema Detection
This stage of the filtering attempts to identify those locations and scales that are
identifiable from different views of the same object. This can be efficiently
achieved using a "scale space" function. Further, it has been shown that under
reasonable assumptions it must be based on the Gaussian function. The scale
space is defined by the function:
L(x, y, σ) = G(x, y, σ) * I(x, y)
Various techniques can then be used to detect stable keypoint locations in the
scale space. Difference of Gaussians is one such technique, locating scale-space
extrema, D(x, y, σ), by computing the difference between two images, one with
scale k times the other. D(x, y, σ) is then given by:

D(x, y, σ) = L(x, y, kσ) − L(x, y, σ)
To detect the local maxima and minima of D(x, y, σ), each point is compared with
its 8 neighbours at the same scale and its 9 neighbours up and down one scale. If
this value is the minimum or maximum of all these points, then this point is an
extremum.
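A short sketch of this computation (assuming imgaussfilt from the Image Processing Toolbox; the image and the constants are placeholders):

I = im2double(imread('cameraman.tif'));
sigma = 1.6; k = sqrt(2);              % hypothetical base scale and scale step
L1 = imgaussfilt(I, sigma);            % L(x, y, sigma)   = G(sigma) * I
L2 = imgaussfilt(I, k*sigma);          % L(x, y, k*sigma) = G(k*sigma) * I
D  = L2 - L1;                          % D(x, y, sigma) = L(x, y, k*sigma) - L(x, y, sigma)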
Step 2: Keypoint Localisation
This stage attempts to eliminate more points from the list of keypoints by finding
those that have low contrast or are poorly localised on an edge. This is achieved
by calculating the Laplacian.
If the function value is below a threshold value then this point is excluded. This
removes extrema with low contrast. To eliminate extrema based on poor
localisation, it is noted that in these cases there is a large principal curvature
across the edge but a small curvature in the perpendicular direction in the
difference-of-Gaussian function. If this difference is below the ratio of the largest
to the smallest eigenvalue, from the 2x2 Hessian matrix at the location and scale
of the keypoint, the keypoint is rejected.
Step 3: Orientation Assignment
This step aims to assign a consistent orientation to the keypoints based on local
image properties. The keypoint descriptor, described below, can then be
represented relative to this orientation, achieving invariance to rotation. The
approach taken to find an orientation is to sample the gradient magnitudes and
orientations in a region around the keypoint, accumulate them into an orientation
histogram, and assign the keypoint the orientation corresponding to the
histogram peak.
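A compact sketch of that standard recipe follows (a hypothetical 16x16 patch around a keypoint; the Gaussian weighting of the votes is omitted for brevity):

I = im2double(imread('cameraman.tif'));
patch = I(101:116, 101:116);               % window centred on a hypothetical keypoint
[Gx, Gy] = imgradientxy(patch);
mag = hypot(Gx, Gy);                       % gradient magnitude
ang = mod(atan2d(Gy, Gx), 360);            % gradient orientation in [0, 360)
h = zeros(1, 36);                          % 36 bins of 10 degrees each
for b = 1:36
    inBin = ang >= (b-1)*10 & ang < b*10;
    h(b) = sum(mag(inBin));                % votes weighted by gradient magnitude
end
[~, bestBin] = max(h);
orientation = (bestBin - 0.5) * 10;        % histogram peak -> keypoint orientation (degrees)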
Step 4: Keypoint Descriptor
The local gradient data, used above, is also used to create keypoint descriptors.
The gradient information is rotated to line up with the orientation of the keypoint
and then weighted by a Gaussian with a variance of 1.5 times the keypoint scale.
This data is then used to create a set of histograms over a window centred on the
keypoint. The resulting vectors are known as SIFT keys and are used in a nearest-
neighbours approach to identify possible objects in an image.
3. SYSTEM ANALYSIS
A requirement is a feature that the system must have or a constraint that it must
satisfy to be accepted by the client. Requirements engineering aims at defining the
requirements of the system under construction. Requirements engineering includes two
main activities: requirements elicitation, which results in a specification of the system
that the client understands, and analysis, which results in an analysis model that the
developer can unambiguously interpret. A requirement is a statement about what the
proposed system will do. Requirements can be divided into two major categories:
functional requirements and non-functional requirements.
Functional requirements describe the interactions between the system and its
environment independent of its implementation. The environment includes the user and
any other external system with which the system interacts. Functional requirements
capture the intended behavior of the system; this behavior may be expressed as
services, tasks or functions the system is required to perform.
Consider two images such that image2 is similar to image1. Now we have
to find the interesting points of each image by applying the SURF algorithm. The SURF
algorithm is used for object recognition. The standard version of SURF is several
times faster than SIFT.
After that we perform the clustering by using fuzzy c-means. In fuzzy clustering,
each point has a degree of belonging to clusters, as in fuzzy logic, rather than belonging
to just one cluster. FAR-HD is the process used on high-dimensional datasets to
generate frequent itemsets. Traditional fuzzy ARM algorithms have failed to mine rules
from high-dimensional data efficiently, since they are meant to deal with relatively
few attributes, so we use FAR-HD, which processes frequent itemsets using a
two-phased multiple-partition approach designed especially for large high-dimensional
datasets. By using fuzzy associative classification (FAC) we classify the above clusters.
Non-functional requirements describe the aspects of the system that are not
directly related to the functional behavior of the system. Non-functional requirements
include a broad variety of requirements that apply to many different aspects of the
system, from usability to performance.
3.2 Object Oriented Analysis
In the case of object-oriented analysis the process varies, but the two are
identical at use-case analysis. The steps involved in the analysis phase are:
• System models
• Scenarios
• UML diagrams
3.2.1 Use-Case Diagram
An important part of the Unified Modeling Language (UML) is the facilities for
drawing use case diagrams. Use cases are used during the analysis phase of a project to
identify and partition system functionality. They separate the system into actors and use
cases. Actors represent roles that can be played by users of the system. Those users can be
humans, other computers, pieces of hardware, or even other software systems. Use cases
describe the behavior of the system.
[Use-case diagram: the Administrator actor is connected to the use cases Upload Image1, Upload Image2 (similar to Image1), Apply SURF, Apply FCM, and Apply FAR-HD, all enclosed in a system boundary rectangle.]
3.3 System Requirements
3.3.2 Software Requirements
4. SYSTEM DESIGN
4.1 Introduction
Purpose
The purpose of the class diagram is to model the static view of an application. Class
diagrams are the only UML diagrams that can be directly mapped to object-oriented
languages, and they are thus widely used at the time of construction.
UML diagrams like the activity diagram and sequence diagram can only give the
sequence flow of the application, but the class diagram is a bit different, so it is the most
popular UML diagram in the coder community. The purpose of the class diagram can
be summarized as:
Forward and reverse engineering.
Active Class
Active classes initiate and control the flow of activity, while passive classes store
data and serve other classes. Illustrate active classes with a thicker border.
Visibility
Use visibility markers to signify who can access the information contained in a
class. Private visibility hides information from anything outside the class partition. Public
visibility allows all other classes to view the marked information. Protected visibility
allows child classes to access information inherited from a parent class.
Associations
Multiplicity (Cardinality)
Constraint
Generalization
In our class diagram the classes are Admin, SURF algorithm, Fuzzy c-mean, Feature
Detector, Descriptor Extractor, Feature 2D, and SURF. The responsibility of the admin is
to load the images. The Feature Detector has methods such as Detect and Detect
Implementation, which are used to detect the features in an image and to implement that
detection. The Descriptor Extractor computes these feature points and implements the
computed values. By using SURF we implement the detect and compute operations.
After we get the interesting points we group them into clusters.
[Class diagram: Admin (applies the SURF algorithm), Fuzzy C-mean, Feature Detector (+Detect(), #Detect Implementation()), Descriptor Extractor (+Compute(), #Compute Implementation()), Feature 2D, and SURF (~Detect Implementation(), ~Compute Implementation()).]
4.1.2 Sequence Diagram
Activation: Activations, modeled as rectangular boxes on the lifeline, indicate when the object is performing an action.
In our sequence diagram we have four objects: Admin, Apply SURF, Fuzzy c-mean,
and Fuzzy Association. First, the admin uploads two images. After that we apply the
SURF algorithm and get the interesting points; then we apply fuzzy c-means so that
the similar points are grouped into clusters, and finally we apply the FAR-HD algorithm
to generate the frequent itemsets.
[Sequence diagram: lifelines Administrator, Apply SURF, Fuzzy Clustering, and Fuzzy Association, with messages 1: Upload Image1(), 2: Upload Image2(), 5: Apply FCM(), and 7: Apply FAR-HD().]
4.1.3 Activity Diagram
Final state: An activity may have more than one final node; the first one reached stops all flow in the activity.
In our activity diagram the flow starts from loading the two similar
images; later we apply the SURF algorithm to get the interesting points. After that we
apply FCM and get the clustered values; later we apply FAR-HD to generate the
frequent itemsets.
[Activity diagram: START → Apply SURF → Apply FCM → Apply FAR-HD.]
4.1.4 StateChart Diagram
In our state chart diagram we have four states, starting from the idle state; later
we upload the images and apply the SURF algorithm, after which we get the interesting
points, and later we apply FCM to make the clusters.
[State chart diagram: states include Idle and Apply FCM (entry/Apply clustering; exit/Clusters are generated).]
4.2 Architecture Design
System architecture is the conceptual model that defines the structure, behavior,
and more views of a system. An architecture description is a formal description and
representation of a system, organized in a way that supports reasoning about
the structures of the system. System architecture can comprise system components, the
externally visible properties of those components, the relationships (e.g. the behavior)
between them.
An architectural design is the design of the entire software system; it gives a high-
level overview of the software system, such that the reader can more easily follow the
more detailed descriptions in the later sections. It provides information on the
decomposition of the system into modules (classes), dependencies between modules,
hierarchy and partitioning of the software modules.
[Architecture diagram: the SURF interesting points are passed to FCM for clustering.]
Representation of SURF: In this section we first consider high-dimensional data
sets such as images, videos, etc. By applying the SURF algorithm we get the interesting
points. Traditionally SIFT is used to get the interesting points, but SURF is an
extension of SIFT which is several times faster.
Fuzzy C-Mean: The fuzzy c-means (FCM) algorithm is commonly used for
clustering. The performance of the FCM algorithm depends on the selection of the
initial clusters; if the initial clusters are good, the final clusters can be found very
quickly and the processing time can be drastically reduced. It is a data clustering
technique in which a dataset is grouped into n clusters with every data point in the
dataset belonging to every cluster to a certain degree. For example, a certain data
point that lies close to the center of a cluster will have a high degree of belonging to
that cluster and another data point that lies far away from the center of a cluster will
have a low degree of belonging to that cluster.
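A small worked example (with fuzzifier m = 2, using the membership formula given earlier): if a point lies at distance 1 from center_1 and distance 3 from center_2, then

u_1 = 1 / (1 + (1/3)²) = 0.9  and  u_2 = 1 / (1 + (3/1)²) = 0.1,

so the point belongs strongly to the nearer cluster while still retaining a small membership in the farther one, and the memberships sum to 1.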
4.3 User Interface Design
Over time, the user was provided the ability to interact with a computer online, and the
user interface was a nearly blank display screen with a command line, a keyboard, and a
set of commands and computer responses that were exchanged. This command-line
interface led to one in which menus (lists of choices written in text) predominated. And,
finally, the graphical user interface (GUI) arrived, originating mainly in Xerox's Palo
Alto Research Centre, adopted and enhanced by Apple Computer, and finally effectively
standardized by Microsoft in its Windows operating systems.
The user interface can arguably include the total "user experience," which may include
the aesthetic appearance of the device, response time, and the content that is presented to
the user within the context of the user interface.
5. IMPLEMENTATION
About MATLAB
MATLAB has an excellent set of graphic tools. Plotting a given data set or the
results of computation is possible with very few commands. You are highly encouraged
to plot mathematical functions and results of analysis as often as possible. Trying to
understand mathematical equations with graphics is an efficient way of learning
mathematics. Being able to plot mathematical functions and data freely is the most
important step.
Key Features
Mathematical functions for linear algebra, statistics, Fourier analysis, filtering,
optimization, numerical integration, and solving ordinary differential equations
Built-in graphics for visualizing data and tools for creating custom plots
Development tools for improving code quality and maintainability and
maximizing performance
Tools for building applications with custom graphical interfaces
Advantages
A very large (and growing) database of built-in algorithms for image processing
and computer vision applications
MATLAB allows you to test algorithms immediately without recompilation. You
can type something at the command line or execute a section in the editor and
immediately see the results, greatly facilitating algorithm development.
The MATLAB Desktop environment, which allows you to work interactively
with your data, helps you to keep track of files and variables, and simplifies
common programming/debugging tasks
The ability to read in a wide variety of both common and domain-specific image
formats.
The ability to call external libraries, such as OpenCV
Clearly written documentation with many examples, as well as online resources
such as web seminars ("webinars").
Bi-annual updates with new algorithms, features, and performance enhancements
If you are already using MATLAB for other purposes, such as simulation,
optimization, statistics, or data analysis, then there is a very quick learning curve
for using it in image processing.
The ability to process both still images and video.
Technical support from a well-staffed, professional organization (assuming your
maintenance is up-to-date)
A large user community with lots of free code and knowledge sharing
The ability to auto-generate C code, using MATLAB Coder, for a large (and
growing) subset of image processing and mathematical functions, which you
could then use in other environments, such as embedded systems or as a
component in other software.
MATLAB is a software development environment that offers high-performance
numerical computation, data analysis, visualization capabilities and application
development tools.
MATLAB’s built-in graphing tools and GUI builder ensure that you customise
your data and models to help you interpret your data more easily for quicker
decision making.
5.2 Sample Source Code
function ipts=OpenSurf(img,Options)
% OPENSURF is an implementation of SURF (Speeded Up Robust Features), which
% detects landmark points in an image and describes them by vectors robust
% against rotation, scaling and noise. It can be used in the same way as SIFT
% (Scale-Invariant Feature Transform).
% inputs,
%   img : The 2D input image (color or greyscale)
%   Options : A struct with options (optional)
% outputs,
%   Ipts : A structure with the information about all detected Landmark points
% Add subfunctions to Matlab Search path
functionname='OpenSurf.m';
functiondir=which(functionname);
functiondir=functiondir(1:end-length(functionname));
addpath([functiondir '/SubFunctions'])
% Process inputs
defaultoptions=struct('tresh',0.0002,'octaves',5,'init_sample',2,'upright',false,'extended',false,'verbose',false);
if(~exist('Options','var')),
    Options=defaultoptions;
else
    tags = fieldnames(defaultoptions);
    % Copy every option the caller did not set from the defaults
    for i=1:length(tags)
        if(~isfield(Options,tags{i})), Options.(tags{i})=defaultoptions.(tags{i}); end
    end
    if(length(tags)~=length(fieldnames(Options))),
        warning('OpenSurf:unknownoption','unknown options found');
    end
end
% Create Integral Image
iimg=IntegralImage_IntegralImage(img);
% Extract the interest points with the Fast-Hessian detector
FastHessianData.thresh = Options.tresh;
FastHessianData.octaves = Options.octaves;
FastHessianData.init_sample = Options.init_sample;
FastHessianData.img = iimg;
ipts = FastHessian_getIpoints(FastHessianData,Options.verbose);
% Describe the landmark points (descriptor call as in the OpenSurf library)
if(~isempty(ipts))
    ipts = SurfDescriptor_DecribePoints(iimg,Options.upright,Options.extended,ipts,Options.verbose);
end
Sample Code to Get Interesting Points in Two Similar Images
% Load images
I1=im2double(imread('TestImages/11KD1A0549.jpg'));
I2=im2double(imread('TestImages/11KD1A0550.jpg'));
Options.upright=true;
Options.tresh=0.0001;
Ipts1=OpenSurf(I1,Options);
Ipts2=OpenSurf(I2,Options);
D1 = reshape([Ipts1.descriptor],64,[]);
D2 = reshape([Ipts2.descriptor],64,[]);
err=zeros(1,length(Ipts1));
cor1=1:length(Ipts1);
cor2=zeros(1,length(Ipts1));
% For each descriptor in image 1, find the nearest descriptor in image 2
for i=1:length(Ipts1)
    distance=sum((D2-repmat(D1(:,i),[1 length(Ipts2)])).^2,1);
    [err(i),cor2(i)]=min(distance);
end
[err, ind]=sort(err);
cor1=cor1(ind);
cor2=cor2(ind);
Pos1=[[Ipts1(cor1).y]',[Ipts1(cor1).x]'];
Pos2=[[Ipts2(cor2).y]',[Ipts2(cor2).x]'];
Pos1=Pos1(1:30,:);
Pos2=Pos2(1:30,:);
no1=(numel(Pos1));
no2=(numel(Pos2));
I(:,1:size(I1,2),:)=I1; I(:,size(I1,2)+1:size(I1,2)+size(I2,2),:)=I2;
Pos1(:,3)=1; Pos2(:,3)=1;
M=Pos1'/Pos2';
functionname='OpenSurf.m';
functiondir=which(functionname);
functiondir=functiondir(1:end-length(functionname));
addpath([functiondir '/WarpFunctions'])
I1_warped=affine_warp(I1,M,'bicubic');
% Show the result: lines between matched interest points of the side-by-side
% images (a minimal plotting sketch)
figure, imshow(I); hold on;
plot([Pos1(:,2) Pos2(:,2)+size(I1,2)]', [Pos1(:,1) Pos2(:,1)]', '-g');
Sample Code for Clustering

% Cluster the matched point coordinates with fuzzy c-means; the fcm call and
% index computation are a reconstruction (fcm requires the Fuzzy Logic Toolbox)
[center,U] = fcm(Pos1(:,1:2),2);
maxU = max(U);
index1 = find(U(1,:) == maxU);
index2 = find(U(2,:) == maxU);
line(Pos1(index1,1),Pos1(index1,2),'linestyle',...
 'none','marker', 'o','color','g');
line(Pos1(index2,1),Pos1(index2,2),'linestyle',...
 'none','marker', 'x','color','r');
hold on
plot(center(1,1),center(1,2),'ko','markersize',15,'LineWidth',2)
plot(center(2,1),center(2,2),'kx','markersize',15,'LineWidth',2)
Sample Code for Warping of Images
function Iout=affine_warp(Iin,M,mode)
% Affine transformation of an image (rotation, translation, resize)
% inputs,
%   Iin : The input image
%   M : The 3x3 affine transformation matrix
%   mode : The interpolation mode
% output,
%   Iout : The transformed image

% Make all x,y pixel indices
[x,y]=ndgrid(0:size(Iin,1)-1,0:size(Iin,2)-1);
% mean= size(Iin)/2;
%xd=x-mean(1);
%yd=y-mean(2);
xd=x;
yd=y;
switch(mode)
case 0
Interpolation='bilinear';
Boundary='replicate';
case 1
Interpolation='bilinear';
Boundary='zero';
case 2
Interpolation='bicubic';
Boundary='replicate';
otherwise
Interpolation='bicubic';
Boundary='zero';
end
% Calculate the backwards-transformed coordinates (reconstructed step)
Tlocalx = M(1,1)*xd + M(1,2)*yd + M(1,3);
Tlocaly = M(2,1)*xd + M(2,2)*yd + M(2,3);
Iout=image_interpolation(Iin,Tlocalx,Tlocaly,Interpolation,Boundary);
function Iout = image_interpolation(Iin,Tlocalx,Tlocaly,Interpolation,Boundary,ImageSize)
% (signature inferred from the call in affine_warp above)
% inputs,
%   Iin : 2D greyscale or color input image
%   Tlocalx, Tlocaly : (backwards) transformation coordinates for every pixel
%   Interpolation: 'nearest', 'bilinear' or 'bicubic'
%   Boundary: 'zero' or 'replicate'
%   ImageSize : size of the output image (optional)
% outputs,
%   Iout : The interpolated image
if(~isa(Iin,'double')), Iin=double(Iin); end
switch(lower(Interpolation))
case 'nearest'
xBas0=round(Tlocalx);
yBas0=round(Tlocaly);
case 'bilinear'
xBas0=floor(Tlocalx);
yBas0=floor(Tlocaly);
xBas1=xBas0+1;
yBas1=yBas0+1;
tx=Tlocalx-xBas0;
ty=Tlocaly-yBas0;
perc0=(1-tx).*(1-ty);
perc1=(1-tx).*ty;
perc2=tx.*(1-ty);
perc3=tx.*ty;
case 'bicubic'
xBas0=floor(Tlocalx);
yBas0=floor(Tlocaly);
tx=Tlocalx-xBas0;
ty=Tlocaly-yBas0;
% Catmull-Rom bicubic weights (reconstructed step)
vec_qx0=0.5*(-tx+2*tx.^2-tx.^3); vec_qx1=0.5*(2-5*tx.^2+3*tx.^3);
vec_qx2=0.5*(tx+4*tx.^2-3*tx.^3); vec_qx3=0.5*(-tx.^2+tx.^3);
vec_qy0=0.5*(-ty+2*ty.^2-ty.^3); vec_qy1=0.5*(2-5*ty.^2+3*ty.^3);
vec_qy2=0.5*(ty+4*ty.^2-3*ty.^3); vec_qy3=0.5*(-ty.^2+ty.^3);
% Determine 1D neighbour coordinates
xn0=xBas0-1; xn1=xBas0; xn2=xBas0+1; xn3=xBas0+2;
yn0=yBas0-1; yn1=yBas0; yn2=yBas0+1; yn3=yBas0+2;
otherwise
error('image_interpolation:inputs','unknown interpolation method');
end
switch(lower(Interpolation))
case 'nearest'
check_xBas0=(xBas0<0)|(xBas0>(size(Iin,1)-1));
check_yBas0=(yBas0<0)|(yBas0>(size(Iin,2)-1));
xBas0=min(max(xBas0,0),size(Iin,1)-1);
yBas0=min(max(yBas0,0),size(Iin,2)-1);
case 'bilinear'
check_xBas0=(xBas0<0)|(xBas0>(size(Iin,1)-1));
check_yBas0=(yBas0<0)|(yBas0>(size(Iin,2)-1));
check_xBas1=(xBas1<0)|(xBas1>(size(Iin,1)-1));
check_yBas1=(yBas1<0)|(yBas1>(size(Iin,2)-1));
xBas0=min(max(xBas0,0),size(Iin,1)-1);
yBas0=min(max(yBas0,0),size(Iin,2)-1);
xBas1=min(max(xBas1,0),size(Iin,1)-1);
yBas1=min(max(yBas1,0),size(Iin,2)-1);
case 'bicubic'
check_xn0=(xn0<0)|(xn0>(size(Iin,1)-1));
check_xn1=(xn1<0)|(xn1>(size(Iin,1)-1));
check_xn2=(xn2<0)|(xn2>(size(Iin,1)-1));
check_xn3=(xn3<0)|(xn3>(size(Iin,1)-1));
check_yn0=(yn0<0)|(yn0>(size(Iin,2)-1));
check_yn1=(yn1<0)|(yn1>(size(Iin,2)-1));
check_yn2=(yn2<0)|(yn2>(size(Iin,2)-1));
check_yn3=(yn3<0)|(yn3>(size(Iin,2)-1));
xn0=min(max(xn0,0),size(Iin,1)-1);
xn1=min(max(xn1,0),size(Iin,1)-1);
xn2=min(max(xn2,0),size(Iin,1)-1);
xn3=min(max(xn3,0),size(Iin,1)-1);
yn0=min(max(yn0,0),size(Iin,2)-1);
yn1=min(max(yn1,0),size(Iin,2)-1);
yn2=min(max(yn2,0),size(Iin,2)-1);
yn3=min(max(yn3,0),size(Iin,2)-1);
end
% Default output size and channel count (reconstructed step)
if(~exist('ImageSize','var')), ImageSize=[size(Iin,1) size(Iin,2)]; end
lo=size(Iin,3);
Iout=zeros([ImageSize(1:2) lo]);
for i=1:lo    % process each color channel
Iin_one=Iin(:,:,i);
switch(lower(Interpolation))
case 'nearest'
intensity_xyz0=Iin_one(1+xBas0+yBas0*size(Iin,1));
switch(lower(Boundary))
case 'zero'
intensity_xyz0(check_xBas0|check_yBas0)=0;
otherwise
end
Iout_one=intensity_xyz0;
case 'bilinear'
intensity_xyz0=Iin_one(1+xBas0+yBas0*size(Iin,1));
intensity_xyz1=Iin_one(1+xBas0+yBas1*size(Iin,1));
intensity_xyz2=Iin_one(1+xBas1+yBas0*size(Iin,1));
intensity_xyz3=Iin_one(1+xBas1+yBas1*size(Iin,1));
switch(lower(Boundary))
case 'zero'
intensity_xyz0(check_xBas0|check_yBas0)=0;
intensity_xyz1(check_xBas0|check_yBas1)=0;
intensity_xyz2(check_xBas1|check_yBas0)=0;
intensity_xyz3(check_xBas1|check_yBas1)=0;
otherwise
end
Iout_one=intensity_xyz0.*perc0+intensity_xyz1.*perc1+intensity_xyz2.*perc2+intensity
_xyz3.*perc3;
case 'bicubic'
Iy0x0=Iin_one(1+xn0+yn0*size(Iin,1));Iy0x1=Iin_one(1+xn1+yn0*size(Iin,1));
Iy0x2=Iin_one(1+xn2+yn0*size(Iin,1));Iy0x3=Iin_one(1+xn3+yn0*size(Iin,1));
Iy1x0=Iin_one(1+xn0+yn1*size(Iin,1));Iy1x1=Iin_one(1+xn1+yn1*size(Iin,1));
Iy1x2=Iin_one(1+xn2+yn1*size(Iin,1));Iy1x3=Iin_one(1+xn3+yn1*size(Iin,1));
Iy2x0=Iin_one(1+xn0+yn2*size(Iin,1));Iy2x1=Iin_one(1+xn1+yn2*size(Iin,1));
Iy2x2=Iin_one(1+xn2+yn2*size(Iin,1));Iy2x3=Iin_one(1+xn3+yn2*size(Iin,1));
Iy3x0=Iin_one(1+xn0+yn3*size(Iin,1));Iy3x1=Iin_one(1+xn1+yn3*size(Iin,1));
Iy3x2=Iin_one(1+xn2+yn3*size(Iin,1));Iy3x3=Iin_one(1+xn3+yn3*size(Iin,1));
switch(lower(Boundary))
case 'zero'
Iy0x0(check_yn0|check_xn0)=0;Iy0x1(check_yn0|check_xn1)=0;
Iy0x2(check_yn0|check_xn2)=0;Iy0x3(check_yn0|check_xn3)=0;
Iy1x0(check_yn1|check_xn0)=0;Iy1x1(check_yn1|check_xn1)=0;
Iy1x2(check_yn1|check_xn2)=0;Iy1x3(check_yn1|check_xn3)=0;
Iy2x0(check_yn2|check_xn0)=0;Iy2x1(check_yn2|check_xn1)=0;
Iy2x2(check_yn2|check_xn2)=0;Iy2x3(check_yn2|check_xn3)=0;
Iy3x0(check_yn3|check_xn0)=0;Iy3x1(check_yn3|check_xn1)=0;
Iy3x2(check_yn3|check_xn2)=0;Iy3x3(check_yn3|check_xn3)=0;
otherwise
end
Iout_one=vec_qy0.*(vec_qx0.*Iy0x0+vec_qx1.*Iy0x1+vec_qx2.*Iy0x2+vec_qx3.*Iy0x
3)+...
vec_qy1.*(vec_qx0.*Iy1x0+vec_qx1.*Iy1x1+vec_qx2.*Iy1x2+vec_qx3.*Iy1x3)+...
vec_qy2.*(vec_qx0.*Iy2x0+vec_qx1.*Iy2x1+vec_qx2.*Iy2x2+vec_qx3.*Iy2x3)+...
vec_qy3.*(vec_qx0.*Iy3x0+vec_qx1.*Iy3x1+vec_qx2.*Iy3x2+vec_qx3.*Iy3x3);
end
Iout(:,:,i)=reshape(Iout_one, ImageSize);
end
6. TESTING
For testing to be successful, proper selection of test cases is essential.
There are two different approaches for selecting test cases: functional and structural. In
functional testing, the software or the module to be tested is treated as a black box, and
the test cases are decided based on the specifications of the system or module. For this
reason, this form of testing is also called "black box testing".
The focus here is on testing the external behavior of the system. In structural
testing, the test cases are decided based on the logic of the module to be tested. A
common approach here is to achieve some type of coverage of the statements in the
code. The two forms of testing are complementary: one tests the external behavior, the
other tests the internal structure. Often structural testing is used for lower levels of
testing, while functional testing is used for higher levels.
Testing commences with a test plan. This plan identifies all testing related activities that must be performed and
specifies the schedule, allocates the resources, and specifies guidelines for testing. The
test plan specifies conditions that should be tested; different units to be tested, and the
manner in which the module will be integrated together. Then for different test unit, a
test case specification document is produced, which lists all the different test cases,
together with the expected outputs, that will be used for testing. During the testing of the
unit the specified test cases are executed and the actual results are compared with the
expected outputs. The final output of the testing phase is the testing report and the error
report, or a set of such reports. Each test report contains a set of test cases and the result
of executing the code with the test cases. The error report describes the errors
encountered and the action taken to remove the error.
Testing is a process which reveals the errors in a program. It is the major quality
measure employed during software development. During testing, the program is executed
with a set of conditions known as test cases and the output is evaluated to determine
whether the program is performing as expected. In order to make sure that the system
does not have errors, the different levels of testing strategies applied at differing
phases of software development are as follows.
Unit Testing is done on individual modules as they are completed and become
executable. It is confined only to the designer's requirements.
6.1.2 Each module can be tested using the following two strategies
Internal system design is not considered in this type of testing. Tests are based on
the requirements and the functionality. This testing is used to find the errors in the
following categories:
Integration testing ensures that the software and the subsystems work together as
a whole. It tests the interfaces of all the modules to make sure that the modules behave
properly when integrated together.
6.1.4 System Testing
It involves in-house testing of the entire system before delivery to the user.
Its aim is to satisfy the client and to verify that the system meets all the requirements
of the client's specifications.
6.1.5 Acceptance Testing
It is a pre-delivery testing in which the entire system is tested at the client's site
on real-world data to find errors.
The system was tested and implemented successfully, thus ensuring that all the
requirements listed in the software requirements specification are completely
fulfilled. In case of erroneous input, corresponding error messages are displayed.
It was a good idea to do our stress testing early, because it gave us time to fix
some of the unexpected exceptions and stability problems that only occur when the
components are exposed to very high transaction volumes.
The successful output screens are placed in the output screens section.
6.2 Test cases
7. SCREEN SHOTS
7.3 Generating Frequent Itemsets
7.5 Analysis of Similarity between Images
7.6 Getting of Interesting Points for Different Images
7.8 Generating Frequent Itemsets
7.10 Analysis of Similarity between Images
8. CONCLUSION
This project provides a solution for organizing high-dimensional data, such as
images, into clusters. A fuzzy mechanism is used to arrange only the similar data into
clusters.
The purpose of the current study was to present a method to cluster high-dimensional
data efficiently. Association rule mining was studied for very large high-dimensional
data in the image domain. The SURF algorithm is capable of finding interesting points
in two similar image datasets, and we apply clustering techniques to the datasets to
increase the speed and efficiency on large high-dimensional data. Here we considered
images as data; in the future the approach can also be applied to videos.
REFERENCES