
DOCENA, FRANCIS C.

BSIT 3D

MODULE 1 Answer:

Assessment 1. Introduction to Data Science / Evolution of Data Science

1. Identify at least five skill areas of a data scientist.

Teamwork
Python
Advanced Statistics
Data Visualization
Business Savvy

2. Identify the seven main categories of data.

Nominal
Ordinal
Binary
Count
Time
Interval
Useless

3. Identify the year when the significant events in the evolution of data science took place.

Event and Year

- Leo Breiman published the paper "Statistical Modeling: The Two Cultures". He distinguished between a statistical focus on models that explain the data and an algorithmic focus on models that can actually predict it. The role of a data scientist has become very broad.
  Year: 2001

- The term "data science" came to prominence in discussions of the need for statisticians to join with computer scientists to bring mathematical rigor to the computational analysis of large data sets.
  Year: 1990

- C.F. Jeff Wu's public lecture "Statistics = Data Science?"
  Year: 1997

- William S. Cleveland published an action plan for creating a university department.
  Year: 2001

- Papers by Alan Turing on the topics of computable numbers and artificial intelligence were published (2 different years).
  Years: 1936, 1950

Assessment 2. Introduction to Data Science (2)

1. List down major differences between Supervised and Unsupervised Machine Learning

Supervised Machine Learning

- The goal of supervised learning is to train the model so that it can predict the output when it is given new data.
- A supervised learning model takes direct feedback to check whether it is predicting the correct output.
- A supervised learning model predicts the output.
- In supervised learning, input data is provided to the model along with the output.

Unsupervised Machine Learning

- The goal of unsupervised learning is to find hidden patterns and useful insights in an unknown dataset.
- An unsupervised learning model does not take any feedback.
- An unsupervised learning model finds the hidden patterns in data.
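The difference can be illustrated with a small sketch. This is only an illustration under assumptions, not part of the module: the data values, the labels, and the helper names `predict_nearest` and `two_means` are all made up for the example. The supervised part is given inputs together with their outputs (labels) and predicts the output for new data using a 1-nearest-neighbor rule. The unsupervised part gets the same inputs without labels and only groups them, using a tiny k-means with two clusters.

```python
# Toy 1-D data: hours studied (hypothetical values, for illustration only).
hours = [1.0, 1.5, 2.0, 8.0, 8.5, 9.0]
labels = ["fail", "fail", "fail", "pass", "pass", "pass"]  # known outputs


def predict_nearest(x, inputs, outputs):
    """Supervised: return the label of the closest labeled input (1-nearest neighbor)."""
    nearest = min(range(len(inputs)), key=lambda i: abs(inputs[i] - x))
    return outputs[nearest]


def two_means(values, iterations=10):
    """Unsupervised: split unlabeled values into two groups with a tiny k-means (k=2)."""
    centers = [min(values), max(values)]  # simple deterministic starting centers
    groups = [[], []]
    for _ in range(iterations):
        groups = [[], []]
        for v in values:
            # Assign each value to its nearest center.
            idx = 0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            groups[idx].append(v)
        # Move each center to the mean of its group (keep it if the group is empty).
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return groups


# Supervised: labels are provided, and the model predicts an output for new data.
supervised_prediction = predict_nearest(7.0, hours, labels)

# Unsupervised: no labels are provided; the model only finds groups in the data.
unsupervised_groups = two_means(hours)

print(supervised_prediction)
print(unsupervised_groups)
```

Note that `two_means` never sees `labels`; it finds the two groups from the input values alone, while `predict_nearest` cannot work without them.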

2. What are the drawbacks of having too much information?

Information overload has several drawbacks: it can make our brains less productive, easily tired, and distracted. Students and researchers can avoid information overload by managing information carefully and making better use of internet resources.

Module Assessment

1. Identify and discuss the facets of data in Data Science.

- Data science is focused on making sense of complex datasets and on building predictive models from that data. There are many facets of data science, including:

• Cleaning, filtering, reorganizing, augmenting, and aggregating data

• Visualizing data

• Data analysis, statistics, and modeling
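A few of the facets listed above can be sketched in a short example. This is a minimal illustration under assumptions: the records, the field names `city` and `sales`, and the numbers are hypothetical and are not taken from the module. It shows cleaning, filtering, aggregating, and one simple statistic. Visualization and modeling are left out to keep the sketch short.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw records; None and an empty string stand in for missing values.
raw = [
    {"city": " Manila ", "sales": "120"},
    {"city": "Cebu", "sales": None},
    {"city": "manila", "sales": "80"},
    {"city": "Cebu", "sales": "50"},
    {"city": "Davao", "sales": ""},
]

# Cleaning: normalize city names and convert sales text to numbers.
# Filtering: drop rows whose sales value is missing.
clean = [
    {"city": row["city"].strip().title(), "sales": float(row["sales"])}
    for row in raw
    if row["sales"]
]

# Aggregating: group the sales values by city.
by_city = defaultdict(list)
for row in clean:
    by_city[row["city"]].append(row["sales"])

# Analysis: one simple summary statistic (the mean) per city.
average_sales = {city: mean(values) for city, values in by_city.items()}
print(average_sales)
```

Real projects usually do these steps with a data library, but the order of the steps is the same: clean first, then filter, then aggregate and analyze.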

2. Among the data scientists, who do you think has the greatest contribution in the
existence of data science? Support your answer with a brief explanation.

- Over the past few years, there has been a lot of hype in the media about "data science". Geoffrey Hinton has made the greatest contribution to the existence of data science because he is called the Godfather of Deep Learning. Mr. Hinton is best known for his work on neural networks and artificial intelligence. He holds a Ph.D. in artificial intelligence and is credited for his exemplary work on neural nets.

3. What is data science, and what are the skills needed for you to be a data scientist?
- A Data Scientist is responsible for compiling and analyzing large data sets — both
structured and unstructured. These roles combine math, statistics, and computer
science skills to make sense of big data and then use the information to create business
solutions.

4. Enumerate the 4 V’s in big data, and expound why data science is essential.

- Velocity is accelerating. Streams of tweets, Facebook posts, financial data, and other data are being generated at an ever-increasing rate by more individuals. While velocity increases data volume (sometimes enormously), it also has the potential to shorten the data retention or application window.

- Variety is much greater than ever before. As processing power has increased, models that formerly relied on only a few variables now have access to hundreds of them.

- Volume: You may have heard on more than one occasion that Big Data is nothing more than business intelligence, but in a very large format. More data, however, does not necessarily mean it is Big Data. Obviously, Big Data needs a certain amount of data, but having a huge amount of data does not necessarily mean that you are working on Big Data.

- Veracity: This V refers to both data quality and availability. When it comes to traditional business analytics, the sources of the data are going to be much smaller in both quantity and variety. However, the organization will have more control over them, and their veracity will be greater.
