Pavlos Antoniou
Data Science job ads
• Indeed.com is the biggest job
site in the U.S.
• Most job ads in data science
involve Python & SQL
• People frequently compare R &
Python, but when it comes to
getting a data science job,
there are only about half as many
job ads for R as for Python.
– Statisticians → R
– Machine Learning people → Python
Source: https://www.kdnuggets.com/2019/06/data-science-jobs-report.html
Why Python?
• One of the most widely used languages for data science projects
and applications
• Open source, high level language
• Simple syntax, easy to use
– Python’s syntax (the words and symbols used to write a program) is simple
and intuitive. Many of its keywords are plain English words!
– Easy to learn even without having any programming background
• Provides great libraries to deal with data:
– import, store, handle, visualize
– perform statistical analysis
– train algorithms, build models to cluster or predict quantities (outside of the
scope of the current course)
Fundamental knowledge pillars
• Understand data types
– integers, strings, floating point numbers, booleans, lists, dictionaries, tuples,
sets
• Learn loops and conditionals
– Loops execute a block of code repeatedly; conditionals decide whether a
block of code runs at all, including when a loop should stop.
• Learn data structures for handling data
– Read data from files, perform basic statistical analysis, filter, slice, select,
sort, fill missing values, aggregate
• Learn useful libraries for visualizing data
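As a rough preview of the first two pillars, the sketch below creates one value of each listed data type and then uses a loop with a conditional. All variable names and values are made up for illustration.

```python
# One example value for each built-in data type listed above
count = 3                      # integer
price = 2.5                    # floating point number
name = "Anaconda"              # string
is_free = True                 # boolean
scores = [70, 85, 90, 40]      # list (ordered, can be changed)
point = (1.0, 2.0)             # tuple (ordered, cannot be changed)
tools = {"conda", "pip"}       # set (unique items, no order)
ages = {"Ann": 31, "Bob": 27}  # dictionary (key -> value)

# The for loop visits every score; the conditional decides which to keep
passed = []
for score in scores:
    if score >= 50:
        passed.append(score)

# A while loop repeats until its condition becomes false
total = 0
i = 0
while i < len(scores):
    total += scores[i]
    i += 1

print(passed)
print(total)
```

The same sum could be written as sum(scores); the explicit while loop is only there to show how the stopping condition works.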
Course Schedule
• Lecture 1: Setting up your Python Environment – Install Anaconda
• Lecture 2: Introduction to Jupyter
• Lectures 3-4: Python Standard Library (data types, conditionals, loops, built-in
functions)
• Lecture 5: Exercises on Python Standard Library
• Lecture 6: Introduction to NumPy
• Lecture 7: Introduction to SciPy
• Lecture 8: Exercises on NumPy and SciPy
• Lectures 9-10: Introduction to Pandas (Data Manipulation)
• Lecture 11: Exercises on Pandas
• Lectures 12-13: Introduction to Matplotlib, Seaborn & Plotly (Data Visualization)
• Lecture 14: Introduction to Timeseries
• Lecture 15: Exercises on Timeseries (Data Handling and Visualization)
This Lecture
• Set up a Python computing environment for scientific computing
• There are two main ways people set up Python for scientific
computing on their own machine:
1. By downloading the official Python distribution and installing package by
package with tools like apt-get, pip, etc.
2. By downloading and installing a Python distribution that contains binaries of
many of the scientific packages needed. The major distributions of these
are Anaconda and Enthought Canopy.
• In this course, we will use Anaconda, with its associated package
manager, conda. It has become the de facto package
manager/distribution for scientific use.
Available Python Distributions
• Official Python website: https://www.python.org/
– Available for Windows, macOS, Unix: https://www.python.org/downloads/
– Includes basic data types and associated functions to operate on them
– However, you need to install important data science packages (numpy,
pandas, scikit-learn) yourself using pip (Python's package installer)
• Anaconda Data Science Platform: https://www.anaconda.com/
– Free for individuals, open-source, easy-to-install Python distribution with
over 1,500 packages (including the package management system conda)
and a GUI named Anaconda Navigator
• Through Anaconda Navigator you can gain access to some very useful applications
such as Jupyter Notebook and Spyder IDE
– Backed by the company Anaconda, Inc., with free community support
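Whichever distribution you choose, you can check from Python itself whether the data science packages mentioned above are available. The following is a minimal sketch; the helper name installed is made up for this example, and it only checks that a module can be found, not which version it is.

```python
from importlib.util import find_spec


def installed(module_names):
    """Map each top-level module name to True if Python can find it."""
    return {name: find_spec(name) is not None for name in module_names}


# scikit-learn is imported under the module name "sklearn"
status = installed(["numpy", "pandas", "sklearn"])
for name, ok in status.items():
    print(f"{name}: {'installed' if ok else 'missing'}")
```

With a full Anaconda installation all three would be expected to show as installed; with a fresh python.org installation they would typically show as missing until installed with pip.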
Install Anaconda
• Step 1: Go to https://www.anaconda.com
• Step 2: Download the version of Anaconda for your operating
system (prefer the 64-bit installer). For all available installers see here:
https://www.anaconda.com/products/distribution#Downloads
• Step 3: Double-click on the executable file
– To start the installation, open the downloaded installer in your
Downloads folder
• Step 4: Click Next
• Step 5: Click “I Agree” to accept the terms and conditions
• Step 6: Select who to install Anaconda for
– This step asks whether you want to install Anaconda just for yourself or for
all users of this PC. Click “Just Me” or “All Users”, depending on your
preference. Both options work, but selecting “All Users” requires admin
privileges.
• Step 7: Select the installation location
– Make sure the selected drive has at least as much free space as the
installer says is required.
– To check whether Python is already installed (relevant for the next step),
open a command prompt, type “python --version” and hit the Enter key.
– If no version number is printed (for example, you get an error that the command is not found), Python is not installed or is not on your PATH. Otherwise, you will see the installed version.
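Once some Python interpreter is available, the same information can also be read from inside Python. The sketch below is only an illustration; the function name describe_python is made up for this example.

```python
import platform
import sys


def describe_python():
    """Collect basic facts about the interpreter running this code."""
    return {
        "version": platform.python_version(),   # version number as text, e.g. "3.x.y"
        "executable": sys.executable,            # full path of this interpreter
        "is_python3": sys.version_info.major == 3,
    }


for key, value in describe_python().items():
    print(f"{key}: {value}")
```

The executable path is useful later on: if it points inside an Anaconda folder, you are running Anaconda's Python rather than another installation.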
• Step 8: Select the environment variables
– If you are installing Python for the first time
• Check the “Add Anaconda to my PATH environment variable” option. This lets you
use Anaconda from your regular command prompt.
– If you already have Python installed
• Leave the “Add Anaconda to my PATH environment variable” option unchecked.
• Leaving it unchecked means that you will have to use Anaconda Prompt in
order to use Anaconda.
Courtesy of https://medium.com/@kumarankita764/new-features-of-anaconda-5-3-5bfdfe9b4240
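To see the effect of the PATH choice in Step 8, you can ask Python where (or whether) a command is found on PATH. This is a small sketch using the standard library; the helper name find_on_path is made up for this example.

```python
import shutil


def find_on_path(command):
    """Return the full path of a command found on PATH, or None."""
    return shutil.which(command)


for command in ("python", "conda"):
    location = find_on_path(command)
    print(f"{command}: {location if location else 'not found on PATH'}")
```

If you left the PATH option unchecked, conda may be reported as not found in a regular command prompt but found when the script is run from Anaconda Prompt.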
Getting started with Anaconda
• Conda is a command-line tool that you run from Anaconda Prompt
on Windows and from a terminal on macOS and Linux.
• Navigator is a desktop graphical user interface (GUI) that allows you
to launch applications and easily manage conda packages,
environments, and channels without using command-line
commands.
• You can try both conda and Navigator to see which is right for you
to manage your packages and environments.
– You can even switch between them, and the work you do with one can be
viewed in the other
• We will try a simple programming exercise in Python using both
Anaconda Navigator and conda to get familiar with them
Python program using Anaconda Navigator
• Launch Anaconda Navigator on Windows:
– Go to the Start Menu, type “anaconda navigator”, and open the app that appears.
• Launch Anaconda Navigator on macOS:
– Open Launchpad, then click the Anaconda Navigator icon.
(*) Applications that can be used to write and run programs (Integrated Development Environment – IDE)
Anaconda Navigator
Run Python in Spyder IDE
• Launch Spyder by clicking Spyder’s Launch button.
• In the new file on the left, delete any placeholder text, then type or copy/paste print("Hello Anaconda")
• In the top menu, click File → Save As and name your new program hello.py
• Run your new program by clicking the triangle Run button.
• You can see your program’s output in the bottom right Console pane.
• From Spyder’s top menu bar, select File → Quit
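The one-line program above is enough for this exercise. As a slightly more structured variant, hello.py could be organised as in the sketch below; the function name greeting is made up for this example.

```python
# hello.py


def greeting(name):
    """Build the greeting text for the given name."""
    return f"Hello {name}"


if __name__ == "__main__":
    # This part runs only when the file is executed as a script
    print(greeting("Anaconda"))
```

Putting the text in a function makes it easy to reuse with a different name, while the output for this exercise stays the same as the one-line version.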
Run Python in JupyterLab
• Launch JupyterLab by clicking JupyterLab’s Launch button.
• This will launch a new browser window (or a new tab in an existing
browser window) showing the JupyterLab interface with a Launcher tab.
• In the middle of the Launcher tab there is a button labeled with the
Python version you installed. Click it and create a new Notebook.
• Rename your Notebook
– Right-click on the filename (shown on the tab). You can name it whatever
you like, but for this example we’ll use MyFirstAnacondaNotebook.ipynb
• In the first cell of the Notebook, type print("Hello Anaconda")
• Save your Notebook by either clicking the save and create
checkpoint icon or by selecting File → Save Notebook in top menu.
• Run your new program by clicking the Run button or selecting Run
from the top menu.
• From JupyterLab’s top menu bar, select File → Shut Down
• Click the Shut Down button in the popup window