• Supervised machine learning algorithms learn from labeled examples and apply
what they have learned to new data in order to predict future events.
Starting from the analysis of a known training dataset, the learning
algorithm produces an inferred function to make predictions about the
output values. After sufficient training, the system can provide targets for
any new input. The learning algorithm can also compare its output with
the correct, intended output and find errors in order to modify the model
accordingly.
• In contrast, unsupervised machine learning algorithms are used when the
training data is neither classified nor labeled. Unsupervised learning
studies how systems can infer a function that describes a hidden structure
in unlabeled data. The system is not given a right output to find; instead,
it explores the data and draws inferences that describe these hidden
structures.
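As a minimal sketch of the two ideas above, the plain-Python code below fits a straight line to labeled (x, y) pairs by gradient descent (supervised: the error against the known outputs drives every update) and groups unlabeled numbers with a simple 1-D k-means (unsupervised: no correct output is supplied). The function names, data points and hyperparameters are illustrative assumptions, not part of this project's code.

```python
# Sketch: supervised vs. unsupervised learning in plain Python.
# All data points and hyperparameters are illustrative assumptions.


def fit_line(xs, ys, lr=0.01, epochs=5000):
    """Supervised: learn y = w*x + b from labeled (x, y) pairs.

    Each epoch compares the model's outputs with the correct outputs
    and uses the error (gradient of the mean squared error) to adjust w and b.
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2 * sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def kmeans_1d(points, k, iters=20):
    """Unsupervised: group unlabeled numbers into k clusters (k-means).

    Assumes `points` contains at least k distinct values.
    """
    distinct = sorted(set(points))
    step = len(distinct) / k
    # Deterministic start: k evenly spaced values from the sorted data.
    centroids = [distinct[int(i * step)] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its nearest centroid.
            nearest = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster (keep it if empty).
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters


# Labeled data generated from y = 2x + 1: the fit should land close to w=2, b=1.
w, b = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(f"w={w:.2f}, b={b:.2f}, prediction for x=5: {w * 5 + b:.2f}")

# Unlabeled data with two visible groups; no "right answer" is given.
centroids, clusters = kmeans_1d([1.0, 1.5, 2.0, 10.0, 10.5, 11.0], k=2)
print(centroids, clusters)
```

The line fit only works because every training pair carries its correct output; the clustering discovers the two groups from the values alone.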
• This presentation explains the analysis performed on the project data
and its output.
• The analysis covers the type of data used, the patterns identified in the
data, the output represented in various forms, the different types of
results and insights extracted from the data, and a high-level
explanation of the results.
FACEBOOK DATA
SOURCE DATA:
• When we connect to Facebook through the API and retrieve the user data,
the data first arrives in a raw format.
• This raw data is a Python dictionary. A closer look shows that the
dictionary contains key-value pairs, but certain values have two or three
levels of nesting, so they are stored as sub-dictionaries inside lists within
the main dictionary.
• This shows how complex the raw data is. (Explained briefly in the report.)
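To make the nesting concrete, here is a sketch built on a hypothetical response dictionary; the field names (`id`, `name`, `friends`, `data`, `summary`) and all values are assumptions for illustration, not the project's real data. The hypothetical helper `depth` counts levels of nesting, and `leaf_paths` lists the key path to every leaf value.

```python
# Hypothetical raw response: field names and values are illustrative only.
raw = {
    "id": "100",
    "name": "Me",
    "friends": {
        "data": [
            {"id": "101", "name": "Alice"},
            {"id": "102", "name": "Bob"},
        ],
        "summary": {"total_count": 2},
    },
}


def depth(value):
    """Count the levels of dict/list nesting in `value` (a plain value is 0)."""
    if isinstance(value, dict):
        return 1 + max((depth(v) for v in value.values()), default=0)
    if isinstance(value, list):
        return 1 + max((depth(v) for v in value), default=0)
    return 0


def leaf_paths(value, prefix=()):
    """Yield (path, leaf) pairs, where path is the tuple of keys/indices."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from leaf_paths(child, prefix + (key,))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            yield from leaf_paths(child, prefix + (index,))
    else:
        yield prefix, value


print(depth(raw["friends"]))  # the "friends" value alone is nested 3 levels deep
for path, leaf in leaf_paths(raw):
    print(path, "->", leaf)
```

A path such as `("friends", "data", 0, "name")` shows how deep one must reach (dictionary, dictionary, list, dictionary) just to read a single friend's name.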
CLEANED AND ARRANGED DATA FOR FURTHER USE:
• The raw data shown in the previous step is hard to take in at a glance and is
readable mainly by specialists. Before it can be used in further analysis, the
data must be cleaned, structured and arranged.
• Only the useful parts of the data should be extracted: those that make sense,
provide meaningful insights and support useful visualizations.
• Below is a sneak peek at the data that was prepared according to the points
above:
• A Python script was written to extract this data from the complex
dictionary tree shown in step 1.
• The data in step 1 also contained a lot of information, essentially the user's
entire Facebook profile.
• So we first had to narrow it down to the information we want to analyse, and
then arrange that information into a structured array.
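The narrowing-down step might look like the following sketch. It assumes a hypothetical dictionary shape (a `friends` entry holding a `data` list) plus some extra profile fields that are simply ignored; `FriendRow` and `extract_friends` are hypothetical names, and the project's actual script is not reproduced here.

```python
from typing import NamedTuple


class FriendRow(NamedTuple):
    """One row of the cleaned table (hypothetical schema)."""
    user: str
    friend_id: str
    friend_name: str


def extract_friends(raw):
    """Keep only the user's name and each friend's id and name.

    Every other field in the raw dictionary is ignored.
    """
    user = raw.get("name", "")
    friends = raw.get("friends", {}).get("data", [])
    return [FriendRow(user, f["id"], f["name"]) for f in friends]


# Hypothetical raw dictionary with extra profile fields we do not need.
raw = {
    "id": "100",
    "name": "Me",
    "birthday": "01/01/1990",
    "hometown": {"id": "200", "name": "Somewhere"},
    "friends": {
        "data": [
            {"id": "101", "name": "Alice", "email": "alice@example.com"},
            {"id": "102", "name": "Bob"},
        ],
        "summary": {"total_count": 2},
    },
}

rows = extract_friends(raw)
for row in rows:
    print(row.user, row.friend_id, row.friend_name)
```

Each row is a flat record with fixed fields, which is the kind of structure a chart or a table library can consume directly.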
• Now that we have meaningful data, we can analyse it in several ways. One
analysis performed here is a statistical analysis using a pie chart.
• Using the data from step 2, let us understand what we aim to analyse
through this chart.
• For the first user, the friends are charted in the graph, with the friends'
names as the labels. A pie chart needs a weight for each slice, but a plain
list of friends does not provide one.
• To give each slice a weight, we derived a second level of measurement: the
relative number of friends that each of the user's friends has.
• Using an option in the pie chart plotter, each label is then shown with its
value as a “relative percentage” of the total.
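The weighting can be sketched in plain Python. The counts below are made-up placeholders, and `pie_percentages` is a hypothetical helper that computes each slice's share of the total, which is the arithmetic behind a "relative percentage" label.

```python
def pie_percentages(weights):
    """Return each label's share of the total weight, as a percentage."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return {label: 100 * w / total for label, w in weights.items()}


# Hypothetical weights: how many friends each of the user's friends has.
friend_counts = {"Alice": 150, "Bob": 50, "Carol": 300}

shares = pie_percentages(friend_counts)
for label, pct in shares.items():
    # One decimal place, like a "%1.1f%%" percentage label on a pie chart.
    print(f"{label}: {pct:.1f}%")
```

In matplotlib, `pyplot.pie` takes the weights together with `labels=` and an `autopct` format string such as `"%1.1f%%"`, which prints each wedge's percentage on the chart; the snippet above only reproduces that arithmetic without plotting.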
• This data is fetched using a Facebook API that stores all its data as
graphs made up of nodes and edges. Hence, as part of our analysis, we plot
these nodes and the edges connecting them.
• The plot is generated using Python code, shown below:
• The nodes represent various entities, and the edges show the relationships between these entities.
• As you can see here, “me”, the user under analysis, is at the center.
• Since we are analysing the user's relationships with their Facebook friends, each friend
that we saw in the earlier sections is represented as a node.
• Each node is drawn as a red dot with the person's name displayed on it.
• The graph also shows the connection between the user “me” and each of his friends,
drawn as a line joining the two red nodes.
• This overall representation of connected nodes is called the network graph of the user
“me”.
• The graph also has several output parameters, such as the number of nodes, the number of
edges and the average degree.
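The summary parameters can be computed from an adjacency list without any graphing library. This is a sketch assuming an undirected graph that, as in the plot described above, only links “me” to each friend; the friend names are placeholders and `build_ego_graph` and `graph_summary` are hypothetical helpers. The average degree is 2E / N, because every undirected edge adds one to the degree of each of its two endpoints.

```python
def build_ego_graph(center, friends):
    """Undirected adjacency sets linking `center` to each friend.

    Assumes `friends` does not contain `center` itself (no self-loops).
    """
    adjacency = {center: set()}
    for friend in friends:
        adjacency[center].add(friend)
        adjacency.setdefault(friend, set()).add(center)
    return adjacency


def graph_summary(adjacency):
    """Number of nodes, number of edges and average degree of the graph."""
    num_nodes = len(adjacency)
    degree_sum = sum(len(neighbours) for neighbours in adjacency.values())
    return {
        "nodes": num_nodes,
        "edges": degree_sum // 2,  # each undirected edge is counted at both ends
        "average_degree": degree_sum / num_nodes if num_nodes else 0.0,
    }


# Placeholder friend names; "me" is the user under analysis.
graph = build_ego_graph("me", ["Alice", "Bob", "Carol"])
print(graph_summary(graph))
```

With networkx, the same figures are available from `G.number_of_nodes()` and `G.number_of_edges()`, and the average degree follows as `2 * G.number_of_edges() / G.number_of_nodes()`.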
CONCLUSION
• We walked through the various sections of the project and the data used in
each of them.
• We also saw the output of the data processing modules and the meaning of the
various terms associated with them, following the data analysis process for
this project in detail.
Thank You