Student ID : 48790983
Report 3
The report draws on data from three sources, and the datasets are
analyzed together. There are over 300 data entries, all of them
numerical. The data is structured, and because every source comes
directly from the Australian government website, it is accurate and
well organized; hence the data quality is high. The data size is
195 MB and the file is in PDF format. Some attributes had missing
data: some missing values were replaced with “0” and others were
removed. The date format was changed from month names to month
numbers.
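The two cleaning steps described above could be sketched as follows. This is only a minimal illustration in plain Python, not the report authors' actual code; the field names (`month`, `unemployed`), the helper `clean_row` and the sample rows are hypothetical, and treating `None` or an empty string as "missing" is an assumption.

```python
MONTHS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]
# Map lower-cased month names to month numbers, e.g. "january" -> 1.
MONTH_NUMBER = {name.lower(): i for i, name in enumerate(MONTHS, start=1)}


def clean_row(row):
    """Return a copy of row with a numeric month and missing values set to 0."""
    cleaned = dict(row)
    cleaned["month"] = MONTH_NUMBER[row["month"].strip().lower()]
    for key, value in cleaned.items():
        # Assumption: None and empty strings count as missing.
        if value is None or value == "":
            cleaned[key] = 0
    return cleaned


# Hypothetical sample rows; the real dataset is not reproduced here.
rows = [
    {"month": "January", "unemployed": 5.2},
    {"month": "february", "unemployed": None},
]
cleaned_rows = [clean_row(r) for r in rows]
```

Replacing missing values with 0 keeps every row but can pull averages down, while removing rows keeps the averages honest but shrinks the dataset, so the choice affects later results.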
First, data preprocessing was done, in which the authors trimmed the
data to fit their requirements. Data exploration was then carried
out for factors such as COVID-19, education and age, in order to find
the relationships between these factors using linear analysis.
Further analytic methods used here are regression lines, logistic
regression and graphs.
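One simple way to check for a linear relationship between two factors is the Pearson correlation coefficient. The sketch below assumes this measure for illustration; the report does not state which statistic its authors used, and the function name `pearson_r` and the sample values are made up.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / math.sqrt(var_x * var_y)


# Made-up illustrative values, not taken from the report.
years_of_education = [10, 12, 14, 16, 18]
unemployment_rate = [9.0, 7.5, 6.0, 4.5, 3.0]
r = pearson_r(years_of_education, unemployment_rate)
```

A value of `r` close to -1 or 1 suggests a strong linear relationship, while a value near 0 suggests little or none.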
The questions I would like to ask are: what initiatives has the
government planned to address unemployment; in what ways could
government support help reduce unemployment; what challenges do job
seekers anticipate in acquiring a job in their own field; and how
long they expect it to take, and how sure they are, of getting a job
after education or after gaining any additional skills required for
employment.
Report 4
B. What was the format of the data used? What can you say about
the data size and specific format? Is missing data involved?
In this report, the two datasets come from the same source: one is a
month-by-month crime record and the other is a district-wise crime
record. Both are structured data in a tabular format. The
district-wise data allows comparison not only between months but
also between districts, which can be useful for identifying patterns
or differences in crime rates across areas. After loading the CSV
data into Python, the authors cleaned up the formatting. The report
file is in PDF format, and there are about 100 data entries, most of
them numeric. The data size is 99.5 MB. As the data was acquired
from the government, no missing data is expected and the data
quality is high.
The report shows that data cleaning was first done in Python, after
which the month data was merged with the years. Using different
types of graphs, plots and map visualization, the authors checked
whether there is any relationship in the data, "making the data
confess". Finally, a linear regression model was fitted to find the
trend in how the number of assault cases changes over time.
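The last two steps, combining month and year into one time axis and fitting a trend line, can be sketched in plain Python. This assumes an ordinary least-squares fit; the helper names `month_index` and `fit_line` and the sample records are hypothetical and are not taken from the report.

```python
def month_index(year, month, base_year):
    """Combine year and month into a single running month counter."""
    return (year - base_year) * 12 + (month - 1)


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical (year, month, assault count) records.
records = [(2020, 11, 100), (2020, 12, 104), (2021, 1, 108), (2021, 2, 112)]
xs = [month_index(y, m, base_year=2020) for y, m, _ in records]
ys = [count for _, _, count in records]
slope, intercept = fit_line(xs, ys)
```

Here `slope` is the estimated change in assault cases per month: a positive value means cases are rising over time, a negative value means they are falling.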
D. How can such analysis affect decisions? For application problem,
what are the important questions that need to be solved?
This kind of analysis can affect the initial decisions made by
others. After analyzing the data deeply and using the relationships
found, better decisions can be made. The most important problem that
needs to be solved is how to control crime rates in the state.