
Introduction to Machine Learning

DATA COLLECTION, STATISTICAL AND NETWORK ANALYSIS

INSTRUCTOR: MR. SUNEEL SARSWAT

MADE BY: BUSHRA KAMBO


ROLL NO: 6

COURSE: PGDDS 2017-2018


INSTITUTE: NISM & UNIVERSITY OF MUMBAI
INTRODUCTION
• Machine learning is an application of artificial intelligence (AI) that provides
systems the ability to automatically learn and improve from experience
without being explicitly programmed.

• Machine learning focuses on the development of computer programs that
can access data and use it to learn for themselves.

• The process of learning begins with observations or data, such as examples,


direct experience, or instruction, in order to look for patterns in data and
make better decisions in the future based on the examples that we provide.

• The primary aim is to allow computers to learn automatically, without
human intervention or assistance, and to adjust their actions accordingly.
Some machine learning methods
• Machine learning algorithms are often categorized as supervised or
unsupervised.

• Supervised machine learning algorithms can apply what has been learned
in the past to new data using labeled examples to predict future events.
Starting from the analysis of a known training dataset, the learning
algorithm produces an inferred function to make predictions about the
output values. The system is able to provide targets for any new input after
sufficient training. The learning algorithm can also compare its output with
the correct, intended output and find errors in order to modify the model
accordingly.
• In contrast, unsupervised machine learning algorithms are used when the
information used to train is neither classified nor labeled. Unsupervised
learning studies how systems can infer a function to describe a hidden
structure from unlabeled data. The system doesn’t figure out the right
output, but it explores the data and can draw inferences from datasets to
describe hidden structures from unlabeled data.

• Semi-supervised machine learning algorithms fall somewhere in between
supervised and unsupervised learning, since they use both labeled and
unlabeled data for training – typically a small amount of labeled data and a
large amount of unlabeled data. Systems that use this method can
considerably improve learning accuracy. Usually, semi-supervised learning
is chosen when labeling the acquired data requires skilled and relevant
resources. Acquiring unlabeled data, by contrast, generally doesn’t require
additional resources.
• Reinforcement learning is a learning method in which an agent
interacts with its environment by producing actions and discovering errors or
rewards. Trial-and-error search and delayed reward are the most relevant
characteristics of reinforcement learning. This method allows machines and
software agents to automatically determine the ideal behavior within a
specific context in order to maximize their performance. Simple reward
feedback is required for the agent to learn which action is best; this is known
as the reinforcement signal.

• Machine learning enables analysis of massive quantities of data. While it


generally delivers faster, more accurate results in order to identify profitable
opportunities or dangerous risks, it may also require additional time and
resources to train it properly. Combining machine learning with AI and
cognitive technologies can make it even more effective in processing large
volumes of information.
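To make the supervised case described above concrete, here is a minimal sketch of learning from labeled examples and then checking predictions against known answers. The data points, the labels "A" and "B", and the choice of a 1-nearest-neighbour rule are all assumptions made for illustration; they are not taken from the project.

```python
import math

# Toy labeled training set: (features, label). Values are made up for illustration.
training = [
    ((1.0, 1.0), "A"),
    ((1.2, 0.8), "A"),
    ((4.0, 4.2), "B"),
    ((3.8, 4.0), "B"),
]


def predict(point, examples):
    """Return the label of the training example closest to `point`."""
    nearest = min(examples, key=lambda ex: math.dist(point, ex[0]))
    return nearest[1]


def error_rate(examples, labeled_points):
    """Fraction of labeled points whose predicted label differs from the true one."""
    wrong = sum(1 for x, y in labeled_points if predict(x, examples) != y)
    return wrong / len(labeled_points)


# New inputs with known answers, used to compare output with the intended output.
held_out = [((0.9, 1.1), "A"), ((4.1, 3.9), "B")]
predictions = [predict(x, training) for x, _ in held_out]
```

Comparing `predictions` with the known labels in `held_out` is the "find errors" step; in a real system that error would be used to adjust the model.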
ABSTRACT

• This presentation is prepared to explain the analysis done on the project data
and the output.

• The analysis includes the type of data used, the patterns identified in the
data, the data when represented as an output in various forms, different
types of results / insights extracted from the data and the high level
explanation of the output data results.
FACEBOOK DATA
SOURCE DATA:
• -When we connect with the facebook data center using the API and retrieve the
user data, the first format that we get the data in is the raw format.

• - This data is stored in a variable for further processing. Below is an output of such
data.

• - This raw data is in a python dictionary data type. A deeper analysis of the data
reveals that the dictionary contains key and value pairs of data, but certain
values have 2 or 3 levels of data, so these value pairs are further stored as sub-
dictionaries in a list in the main dictionary.

• - Such is the complexity of the raw data available. **(Briefly explained in report)
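As a rough illustration of the nesting described above, the sketch below builds a small dictionary of the same general shape and measures how many levels deep it goes. The field names (`friends`, `data`, `summary`, `total_count`) and all values are assumptions chosen for the example, not the actual response retrieved in the project.

```python
# Hypothetical raw record: key names and values are illustrative only.
raw_user = {
    "id": "1001",
    "name": "Example User",
    "friends": {
        "data": [  # sub-dictionaries stored in a list
            {"name": "Friend One", "id": "2001"},
            {"name": "Friend Two", "id": "2002"},
        ],
        "summary": {"total_count": 2},
    },
}


def nesting_depth(value):
    """Count how many dictionary levels are nested inside `value`."""
    if isinstance(value, dict):
        return 1 + max((nesting_depth(v) for v in value.values()), default=0)
    if isinstance(value, list):
        return max((nesting_depth(v) for v in value), default=0)
    return 0


depth = nesting_depth(raw_user)

# Reaching a nested value means walking key by key through each level.
friend_names = [friend["name"] for friend in raw_user["friends"]["data"]]
```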
CLEANED AND ARRANGED DATA FOR FURTHER USE:

• - The data shown in the above step is difficult to take in at a glance and is
readable mainly by professionals. To be used in further analysis, the data
must be cleansed, structured and arranged.

• - Only the useful part of the data is to be extracted: the part that makes sense,
provides impactful insights on analysis and supports useful visualizations.

• - Below is a sneak peek into the data that is worked upon as per the above-
mentioned points:
• - Python code was written to extract this data from the complex
dictionary tree shown in step 1.

• - This code has been explained in the solution analysis phase.

• - Also, the data in step 1 contained a lot of information, in effect the user’s entire
Facebook profile information.

• - So we have to cut it down to the information that we want to analyse and then
work towards arranging it in the structured array.

**(Briefly explained in report)
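The cleaning step described above can be sketched as follows: keep only the fields needed for analysis and arrange them as flat rows. This is not the project's code (which is covered in the report); the input records, their key names and the helper `to_row` are hypothetical.

```python
# Hypothetical raw records: key names and values are illustrative only.
raw_users = [
    {
        "id": "1001",
        "name": "Example User",
        "hometown": {"name": "Somewhere"},  # profile detail we do not need
        "friends": {
            "data": [{"name": "Friend One"}, {"name": "Friend Two"}],
            "summary": {"total_count": 2},
        },
    },
    {"id": "1002", "name": "Another User"},  # record with no friends field
]


def to_row(user):
    """Reduce one nested record to the few flat fields used for analysis."""
    friends = user.get("friends", {})
    return {
        "name": user.get("name", ""),
        "friend_count": friends.get("summary", {}).get("total_count", 0),
        "friend_names": [f.get("name", "") for f in friends.get("data", [])],
    }


rows = [to_row(user) for user in raw_users]
```

Using `.get` with defaults keeps the extraction from failing when a record lacks one of the nested levels.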


REPRESENTATION AS A GRAPH CHART FOR STATISTICAL ANALYSIS

• - Now that we have some meaningful data to analyse, we can perform various
steps to do so. One such step performed here is the statistical analysis using a pie
chart.

• - From the data above in step 2, let us understand what it is that we aim to analyse
here through the chart.

• - One entity of Facebook data is the number of friends of a person.

• - So for the first user, the friends have been charted in the graph.
The labels are the names of those friends. For a pie chart, a measure of how
much each slice must weigh has to be provided, but a list of friend names
alone carries no such measure.
• - To add this weight to each slice in the pie chart, we have derived a second level of
measurement. This is the relative number of friends each friend of the user
has.

• - Using a command in the pie chart plotter, we can assign the numbers to
each label as a “relative percentage” of the others.

• - This is a percentage share of the total, loosely referred to here as a
“percentile” weight.

• - Using this technique, we can gauge the number of friends of a friend of a
user and how many more friends that person has, as compared to
other friends of the user.
To understand deeply, let us look at the graph output first:

Next we look at the data, we list the number of friends for each person:

• Aditya Prabhu – 949
• Sudhir Jain – 352
• Sunny Sharma – 853
• Keyur Golani – 1412
• Nupur Desai – 621
• Kunal Patel – 2788
• Monika Asawa – 448
• Parth Makwana – 0
• Avadhesh Patel – 836

- So the plot function will give the data with the highest number the highest percentage, relative to others.
This way, percentile weight of each person is calculated.

- Total of all the people =
949 + 352 + 853 + 1412 + 621 + 2788 + 448 + 0 + 836 = 8259

Hence, percentile ratio for each person is as follows:

• Aditya Prabhu – 949/8259 = 11.5%
• Sudhir Jain – 352/8259 = 4.3%
• Sunny Sharma – 853/8259 = 10.3%
• Keyur Golani – 1412/8259 = 17.1%
• Nupur Desai – 621/8259 = 7.5%
• Kunal Patel – 2788/8259 = 33.8%
• Monika Asawa – 448/8259 = 5.4%
• Parth Makwana – 0/8259 = 0%
• Avadhesh Patel – 836/8259 = 10.1%
• - The pie is formed because of this weight. This provides us a very important
insight of data analysis in percentile view.
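The arithmetic above can be reproduced with a few lines of Python. The friend counts are the ones listed on the slide. The plotting helper `plot_pie` is only a sketch that assumes matplotlib was the plotter, with `autopct` as the command that prints each slice's relative percentage; the slides do not name the library.

```python
# Friend counts as listed on the slide.
friend_counts = {
    "Aditya Prabhu": 949,
    "Sudhir Jain": 352,
    "Sunny Sharma": 853,
    "Keyur Golani": 1412,
    "Nupur Desai": 621,
    "Kunal Patel": 2788,
    "Monika Asawa": 448,
    "Parth Makwana": 0,
    "Avadhesh Patel": 836,
}

total = sum(friend_counts.values())

# Percentage share of the total for each friend, rounded to one decimal place.
shares = {
    name: round(100 * count / total, 1) for name, count in friend_counts.items()
}


def plot_pie(counts):
    """Draw the pie chart (assumes matplotlib is installed; imported lazily)."""
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    # autopct prints each slice's share of the total on the chart.
    ax.pie(list(counts.values()), labels=list(counts.keys()), autopct="%1.1f%%")
    ax.set_title("Relative number of friends")
    plt.show()
```

Calling `plot_pie(friend_counts)` would draw the chart; the `shares` dictionary holds the same percentages without needing any plotting library.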
REPRESENTATION OF THE NETWORK GRAPH – NODES AND EDGES PLOT

• - This data is fetched using a facebook API that has all its data stored as
graphs. The graphs are made up of nodes and edges. Hence, as part of our
analysis, we will plot this connection of nodes and edges.
• - The plot is developed using Python code. It is shown below:
• - The nodes represent various entities. And the edges show the relation between these entities.

• - As you can see here, “me”, the user under analysis, is at the center.

• - Since we are analysing the relationship with the facebook friends of the user, each friend that
we saw in the earlier sections is represented as a node.
• - The red dot is the node, with name of the person displayed on it.

• - As a next step, the graph shows the connection between the user “me” and each of his friends.
This is shown using a line joining the two red nodes.

• - This overall representation of connection of the nodes is called a network graph of the user
“me”.

• - It also has several output parameters like number of nodes, number of edges, average degree,
etc.
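The output parameters mentioned above can be computed directly from a list of edges. The sketch below assumes the graph contains exactly one edge from “me” to each friend named in the earlier section and no friend-to-friend edges; the project itself may have used a graph library for this, which the slides do not specify.

```python
# Friends of the user, as named in the earlier section.
friends = [
    "Aditya Prabhu", "Sudhir Jain", "Sunny Sharma", "Keyur Golani",
    "Nupur Desai", "Kunal Patel", "Monika Asawa", "Parth Makwana",
    "Avadhesh Patel",
]

# One undirected edge between "me" and each friend (assumed structure).
edges = [("me", friend) for friend in friends]

# Adjacency map: each node points to the set of nodes it is connected to.
adjacency = {}
for a, b in edges:
    adjacency.setdefault(a, set()).add(b)
    adjacency.setdefault(b, set()).add(a)

num_nodes = len(adjacency)
num_edges = len(edges)

# Each undirected edge contributes to the degree of two nodes.
average_degree = 2 * num_edges / num_nodes

degree_of_me = len(adjacency["me"])
```

For this star-shaped graph the centre node has a degree equal to the number of friends, while every friend node has degree 1.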
CONCLUSION

• Hence, we looked at the various sections of the project and the data used in
each one of them.
• We also saw the output of each module’s data processing and the
meaning of the various terms associated with it. We followed the data
analysis process for this project step by step.
Thank You
