
AI for Future Workforce
Module 17: Data Import and Processing
Legal Disclaimers
The Intel® Digital Readiness Programs and Intel® AI for Future Workforce program are developed by Intel Corporation.

© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others. All rights reserved. Program dates and lesson plans are subject to change.

Intel technologies may require enabled hardware, software, or service activation.

No product or component can be absolutely secure.

Results have been estimated or simulated.

Intel does not control or audit third-party data. You should consult other sources to evaluate accuracy.

Your costs and results may vary.

Learning Outcomes
At the end of this workshop, you will be able to:
• Demonstrate importing data in Python
• Demonstrate automation of data downloading
• Use percentile ranges
• Understand boxplots and histograms
• Understand the difference between errors and outliers
• Investigate and handle erroneous and missing data

Acquiring and Exploring Data
AI for Future Workforce
Self-Directed Learning
Notebook: Data import and processing

Key Takeaways

Obtaining Data
• There are numerous sources of data for your machine learning project. List some of them!
• List several ways to download data from the internet to your computer!
• What can Pandas be used for?
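
As one possible illustration of the questions above, the sketch below automates a download with Python's standard `urllib` module and then loads the file with Pandas. The helper name `download_csv`, the file names, and the sample rows are hypothetical. A temporary local file stands in for a real internet source so that the example runs offline.

```python
import tempfile
import urllib.request
from pathlib import Path

import pandas as pd


def download_csv(url: str, destination: Path) -> Path:
    """Download the file at `url` to `destination` unless it already exists."""
    if not destination.exists():
        urllib.request.urlretrieve(url, destination)
    return destination


# Stand-in for a remote dataset: a small CSV written to a temporary folder.
# In a real project, `url` would point to a file on the internet instead.
workdir = Path(tempfile.mkdtemp())
source = workdir / "source.csv"
source.write_text("city,temperature\nA,30.5\nB,28.0\nC,31.2\n")

local_copy = download_csv(source.as_uri(), workdir / "downloaded.csv")

# Pandas reads the downloaded CSV into a DataFrame for further processing
df = pd.read_csv(local_copy)
print(df.head())
```

Skipping the download when the file already exists is a design choice that avoids fetching the same data again each time the notebook is re-run.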

Sequential Coding Game
Sequential Coding Notebook

Sequential Coding Game
• Open “Seq_Coding” notebook
• You will code sequentially in teams of 4
• Each person has 4 minutes of coding

Sequential Coding Game
• The first member is called to the computer, reads the task on the screen, and codes for 4 minutes
• After 4 minutes, the second member approaches the computer, and the first member has 30 seconds to explain the task and what their code is meant to do
• The second member codes for another 4 minutes before the next member approaches the computer
• The last member of the team is expected to understand and finish the code in the given 4 minutes, and then has 1 minute to explain to the class the code written by the team

Congratulations!

Self-Directed Learning
Notebook: Basic Data Processing and Visualization

Key Takeaways

Basic Data Processing and Visualization
• We can use describe() to learn more about our data. What can we learn about our data using describe()?
• A boxplot helps visualize the distribution of the data. What are percentile values and the interquartile range?
• What is a histogram for?
• What is a scatter plot for?
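
To make these questions concrete, here is a minimal sketch using a small, made-up series of values. It calls describe(), computes the 25th and 75th percentiles and the interquartile range (IQR), and flags points beyond 1.5 × IQR from the quartiles. That 1.5 × IQR rule is a common boxplot convention and is assumed here; it is not taken from the workshop notebook.

```python
import pandas as pd

# Hypothetical measurements; 100 is deliberately far from the rest
values = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 100], name="value")

# describe() reports count, mean, std, min, the 25%/50%/75% percentiles, and max
print(values.describe())

# Percentiles: the value below which a given share of the data falls
q1 = values.quantile(0.25)  # 25th percentile
q3 = values.quantile(0.75)  # 75th percentile

# Interquartile range: the spread of the middle 50% of the data
iqr = q3 - q1

# Common boxplot convention (an assumption here): points beyond
# 1.5 * IQR from the quartiles are drawn as individual outlier markers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = values[(values < lower_fence) | (values > upper_fence)]

print("Q1:", q1, "Q3:", q3, "IQR:", iqr)
print("Flagged values:", outliers.tolist())
```

With matplotlib installed, `values.plot.box()` and `values.plot.hist()` draw the corresponding boxplot and histogram of the same series.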

Self-Directed Learning
Notebook: Handling Erroneous and Missing Data

Key Takeaways

Handling Erroneous and Missing Data
• What is the difference between erroneous data and outliers?
• How do you handle erroneous and missing data?
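
One way to think about these questions is sketched below with a made-up table of ages. A negative age is impossible, so it is treated as erroneous data. An age of 97 is unusual but plausible, so it is kept as an outlier. The column names, the values, and the choice of median filling are assumptions for illustration; the right treatment depends on the dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical data: -3 is an error, NaN is missing, 97 is a valid outlier
df = pd.DataFrame({
    "name": ["Ana", "Ben", "Chen", "Dev", "Eva"],
    "age": [25, -3, np.nan, 31, 97],
})

cleaned = df.copy()

# Step 1: mark impossible values as missing so they are handled consistently
cleaned.loc[cleaned["age"] < 0, "age"] = np.nan

# Step 2: count how much data is missing before deciding what to do
missing_count = int(cleaned["age"].isna().sum())
print("Missing ages:", missing_count)

# Option A: drop rows with a missing age
dropped = cleaned.dropna(subset=["age"])

# Option B: fill missing ages with the median of the valid ones
median_age = cleaned["age"].median()
filled = cleaned.assign(age=cleaned["age"].fillna(median_age))

print(dropped)
print(filled)
```

Dropping rows keeps only observed values but shrinks the dataset, while filling keeps every row at the cost of introducing estimated values.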

Project (Part 1 of 2)
It's time to get into teams of 4 for a project!

The Project
The project is for each team to collect, process, and present the data they have obtained.

Do Plan and Strategize Before Starting to Code
• What topic are you interested in?
• Where do you plan to obtain your data from?
• How would you determine what you need to do to process the data?
• How would you divide tasks for your team to ensure that you can complete the task in the time given?

What are the Different Levels?
• Level 1: Download a dataset of at least 5,000 rows
• Level 2: Explore the data and determine how to pre-process it
• Level 3: Pre-process the data (outliers, missing data, etc.)
• Level 4: Present the techniques used during data pre-processing and the results
• Level 5: Suggest at least 5 ways the data might be used in an industry of your choice

Half-Time!

Each Team Will Share Their Progress
• Which level(s) do you think you have achieved?
• Describe your biggest success so far
• Describe the biggest obstacle you want to overcome before your project is ready to be showcased
• Does anyone have similar challenges?
• Would any others like to help/offer advice?

Project (Part 2 of 2)

The Project
The project is for each team to collect, process, and present the data they have obtained.

What are the Different Levels?
• Level 1: Download a dataset of at least 5,000 rows
• Level 2: Explore the data and determine how to pre-process it
• Level 3: Pre-process the data (outliers, missing data, etc.)
• Level 4: Present the techniques used during data pre-processing and the results
• Level 5: Suggest at least 5 ways the data might be used in an industry of your choice

Project Presentation

Possible Room Layout
10 x 4 tables per team

Presentation Sequence for Each Team
1. What dataset did you download?
2. What made you download that particular dataset?
3. What application do you think the dataset can be used for?
4. How did you divide the work within your team?
5. Which techniques did you use to download, process, and analyze the data?

Congratulations to all teams for your efforts!

Let’s Discuss the Projects!
• What was your approach?
• What challenges did you face?
• How did you overcome them?
• How would you improve your process?

Recap
Learning Outcomes
At the end of this workshop, you will be able to:
• Demonstrate importing data in Python
• Demonstrate automation of data downloading
• Use percentile ranges
• Understand boxplots and histograms
• Understand the difference between errors and outliers
• Investigate and handle erroneous and missing data

Quiz

Link here

Reflection
• How would I apply what I have learnt today beyond the context of the class?
• What do I need to be careful about when building an artificial intelligence application? Are there privacy or safety considerations?
• How do I see what I have learnt today helping in today's industries?

