
Python

• Python is a general-purpose programming language.
• Created in the late 1980s by Guido van Rossum, it is now one of the most popular languages in the world.
• It is routinely used by system administrators and web developers.
• Scientists use Python because of rich libraries such as NumPy, SciPy, pandas, and Matplotlib.
• The ease of use of Python and its dynamic nature make it a very productive language.
• IPython: an interactive command-line terminal for Python.
• Created by Fernando Perez in 2001, IPython offers an enhanced read-eval-print loop (REPL) environment particularly well adapted to scientific computing.
• In other words, IPython is a powerful interface to the Python language.

• Scripts: Besides IPython, the most common way to use Python is to write scripts, files with the .py extension.
• A script contains a list of commands to execute in order.
• It runs from start to finish and displays some output.
• With IPython, you generally write one command at a time and get the results instantly. This is a completely different way of working with Python. When analyzing data or running computational models, you need this sort of interactivity to explore them efficiently.
• Notebook
• In 2011, IPython introduced a new tool named the Notebook.
• Inspired by scientific programs like Mathematica or Sage, the Notebook offers a modern and
powerful web interface to Python.
• Compared to the original IPython terminal, the Notebook offers a more convenient text editor,
the possibility to write rich text, and improved graphical capabilities.
• Since it is a web interface, it can integrate many of the existing web libraries for data visualization.
• Jupyter Notebook: In 2015, the IPython developers made a major code reorganization of their
ever-growing project.
• The Notebook is now called the Jupyter Notebook. This interface can be used not only with
Python but with dozens of other languages such as R and Julia. IPython is now the name of the
Python backend (aka kernel).
In conclusion, IPython and Jupyter are great interfaces to the Python language. If you're learning
Python, using the IPython terminal or the Jupyter Notebook is highly recommended.
Jupyter Notebook

• Jupyter Notebook is a web application that allows you to create and share documents that contain:
• live code (e.g. Python code)
• visualizations
• explanatory text (written in Markdown syntax)
Setting Up Jupyter Notebook
• Visit the project’s website at http://www.jupyter.org
• Two options:
• Try it in your browser: you can access a hosted version of Jupyter Notebook. This gives you direct access without needing to install it on your computer.
• Install the Notebook (2 methods):
• Installing Jupyter Notebook using Python’s package manager pip
• Installing Jupyter Notebook by installing the Anaconda distribution
Anaconda Distribution
• Anaconda download page: https://www.anaconda.com/download/
• Download and execute the installer of your choice.
What is an ipynb File?
• Quick answer: each .ipynb file is one notebook, so each time you create a new notebook, a new .ipynb file will be created.
• Detailed answer: each .ipynb file is a text file that describes the contents of your notebook in a format called JSON.
• Each cell and its contents, including image attachments that have been converted into strings of text, is listed therein along with some metadata.
• You can edit the metadata yourself by selecting “Edit > Edit Notebook Metadata” from the menu bar in the notebook.
• You can also view the contents of your notebook files by selecting “Edit” from the controls on the dashboard.
• In most cases, there's no reason you should ever need to edit your notebook metadata manually.
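Since an .ipynb file is plain JSON, you can inspect one with Python's json module. A minimal sketch using a made-up two-cell notebook dictionary (the cell contents are illustrative, not from any real file):

```python
import json

# A minimal notebook structure: a list of cells plus some metadata
nb = {
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Hello"]},
        {"cell_type": "code", "metadata": {}, "execution_count": None,
         "outputs": [], "source": ["print('hi')"]},
    ],
    "metadata": {"kernelspec": {"name": "python3", "display_name": "Python 3"}},
    "nbformat": 4,
    "nbformat_minor": 5,
}

text = json.dumps(nb, indent=1)   # this is roughly what an .ipynb file looks like on disk
loaded = json.loads(text)         # reading a notebook back is plain JSON parsing
print(loaded["nbformat"], len(loaded["cells"]))
```

Each cell carries its type, its source, and (for code cells) its outputs, which is why opening an .ipynb file in a text editor shows readable JSON.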
Cells
• Cells form the body of a notebook. There are two main cell types:
• A code cell contains code to be executed in the kernel. When the
code is run, the notebook displays the output below the code cell
that generated it.
• A Markdown cell contains text formatted using Markdown and
displays its output in-place when the Markdown cell is run.
https://www.markdownguide.org/cheat-sheet/
Keyboard Shortcuts
• In a Jupyter Notebook, there is always one “active” cell highlighted
with a border whose color denotes its current mode:
• Green outline — cell is in "edit mode"
• Blue outline — cell is in "command mode"
• Toggle between command and edit mode with Esc and Enter, respectively.
• Once in command mode:
• Scroll up and down your cells with your Up and Down keys.
• Press A or B to insert a new cell above or below the active cell.
• M will transform the active cell to a Markdown cell.
• Y will set the active cell to a code cell.
• D + D (D twice) will delete the active cell.
• Z will undo cell deletion.
• Hold Shift and press Up or Down to select multiple cells at once. With multiple cells
selected, Shift + M will merge your selection.
• Ctrl + Shift + -, in edit mode, will split the active cell at the cursor.
• Tab completion: type the library name, then a period ‘.’, then press Tab to see the available functions and attributes.
Loading Python Libraries
In [ ]: #Import Python Libraries
import numpy as np
import scipy as sp
import pandas as pd
import matplotlib as mpl
import seaborn as sns

Press Shift+Enter to execute the Jupyter cell.

Popular Python Data Science Libraries
Data manipulation and analysis
• Pandas: for the manipulation and analysis of data.
• Provides many easy-to-use functions for data manipulation and analysis, built around its core data structures.
• The data structures provided by pandas are DataFrames (which handle two-dimensional data) and Series (which handle one-dimensional data).
• It works well with labelled and relational datasets.
• It lets you load data into a DataFrame and perform common analysis tasks: find missing values, plot the data with a histogram, drop columns of null values, and more.
• It covers data manipulation, optimization, visualization, and data wrangling.
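The missing-value workflow mentioned above can be sketched on a tiny made-up DataFrame (the column names and values are illustrative):

```python
import numpy as np
import pandas as pd

# A small DataFrame with one missing value
df = pd.DataFrame({
    "name": ["a", "b", "c", "d"],
    "score": [1.0, np.nan, 3.0, 4.0],
})

print(df.isna().sum())    # count missing values per column
clean = df.dropna()       # drop rows that contain nulls
print(clean.shape)        # (3, 2): one row was dropped
```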
Data processing
• NumPy: an abbreviated form of Numerical Python, used for scientific computing.
• It provides a large number of functions to deal with high-dimensional arrays, matrices, and linear algebra.
• A wide range of operations can be performed on arrays and matrices using the methods provided by this Python package.
• It also provides various tools for integrating C, C++, and Fortran code.
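A minimal sketch of the array and linear-algebra operations mentioned above (the data is made up):

```python
import numpy as np

a = np.arange(6).reshape(2, 3)   # a 2x3 array
b = np.ones((2, 3))

print(a + b)             # elementwise addition on whole arrays
print(a.T @ a)           # matrix product via the transpose (3x3 result)
print(np.linalg.norm(b)) # one of NumPy's linear-algebra routines
```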
Data processing
• SciPy: a Python library employed in both Data Science and Scientific Computing.
• Provides libraries for math, science, and engineering.
• NumPy, Matplotlib, and pandas are libraries that fall under the SciPy project umbrella.
• It includes different modules for image processing, linear algebra, integration, interpolation of data, etc.
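Two of the modules mentioned above, integration and interpolation, can be sketched as follows (the integrand and sample points are made up):

```python
import numpy as np
from scipy import integrate, interpolate

# Numerical integration: integrate x^2 from 0 to 1 (exact answer is 1/3)
value, err = integrate.quad(lambda x: x ** 2, 0.0, 1.0)
print(value)

# Interpolation: build a linear interpolant through a few sample points
x = np.array([0.0, 1.0, 2.0])
y = x ** 2
f = interpolate.interp1d(x, y)
print(f(1.5))   # linear estimate between (1, 1) and (2, 4), i.e. 2.5
```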
Machine Learning and NLP
• Scikit-learn: a popular Python library for implementing machine learning algorithms.
• It helps in quickly implementing popular machine learning algorithms such as linear regression and logistic regression, as well as data preprocessing and dimensionality reduction tasks.
• This Python library is built on NumPy, SciPy, and Matplotlib.
• NLTK: stands for Natural Language Toolkit. It is an open-source library for working with human-language datasets. It is very useful for problems like text analytics, sentiment analysis, analyzing linguistic structure, etc.
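A minimal scikit-learn sketch of the linear regression mentioned above, fit on a tiny made-up noise-free dataset (y = 2x + 1):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Four samples of y = 2x + 1, one feature per sample
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = 2.0 * X.ravel() + 1.0

model = LinearRegression().fit(X, y)
print(model.coef_[0], model.intercept_)   # recovers slope 2.0 and intercept 1.0
```

The same `fit`/`predict` pattern applies across scikit-learn's estimators, which is what makes swapping algorithms quick.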
Deep Learning
• TensorFlow: an open-source framework by Google for end-to-end machine learning and deep learning solutions.
• It gives users low-level control to design and train highly scalable and complex neural networks.
• It is available for both desktop and mobile and supports an extensive number of programming languages through wrappers.
Deep Learning

• Keras: an open-source, high-level deep learning library.
• It gives the flexibility of using either TensorFlow or Theano as a backend.
• Keras provides a simple high-level API for developing deep learning models.
• It is suitable for quick prototyping and for developing neural network models for industrial use.
• Primary uses of Keras include classification, text generation and summarization, tagging, translation, speech recognition, etc.
Visualization
• Matplotlib: a popular 2D plotting Python library for data visualization, inspired by MATLAB.
• Produces high-quality two-dimensional figures such as bar charts, distribution plots, histograms, scatterplots, etc. in a few lines of code.
• Like MATLAB, it gives users control over low-level details such as line styles, font properties, and axes properties, via an object-oriented interface or via a set of functions.
• Plotly: a popular open-source Python graphing library for high-quality, interactive visualization.
• In addition to 2D graphs, it also supports 3D plotting. Plotly is used extensively for in-browser visualization of data.
• Seaborn: built on top of the Matplotlib package.
• It is a Python library that provides tools for statistical graphics.
You can use any combination of the above for effective data visualization.
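A minimal Matplotlib sketch showing a line style and axes properties set through the object-oriented interface (the Agg backend is used so no window is needed; the filename sine.png is arbitrary):

```python
import matplotlib
matplotlib.use("Agg")   # non-interactive backend: render without a display
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 100)
fig, ax = plt.subplots()
ax.plot(x, np.sin(x), linestyle="--", label="sin(x)")  # line style via keyword
ax.set_xlabel("x")       # axes properties via the object-oriented interface
ax.set_ylabel("sin(x)")
ax.legend()
fig.savefig("sine.png")  # write the figure to a file
```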
• Try it online: https://jupyter.org/try
Hello Data Science World! With IRIS Dataset
• The Iris dataset is the “Hello World” of Data Science.
• The Iris dataset contains five columns: Petal Length, Petal Width, Sepal Length, Sepal Width, and Species Type.
• Iris is a flowering plant; researchers measured various features of different iris flowers and recorded them digitally.
• import pandas as pd
• data = pd.read_csv('../data/iris.csv')
• data.head(): display the top rows of the dataset with their columns
• data.sample(10): display 10 rows selected at random
• data.columns: display the number and names of the columns
• data.shape: display the shape (rows, columns) of the data
• data.info(): display a summary of all columns
• print(data): display the whole dataset
Try to read the first 10, 20, 50 records!
Can you guess how to view the last few records?
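These commands can be tried without the iris.csv file by building a small stand-in DataFrame (the values below are made up, not real iris measurements). Viewing the last few records is done with tail():

```python
import pandas as pd

# Stand-in for pd.read_csv('../data/iris.csv')
data = pd.DataFrame({
    "SepalLength": [5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 6.3],
    "Species": ["setosa"] * 6 + ["virginica"],
})

print(data.head())    # first 5 rows by default
print(data.head(3))   # first 3 rows
print(data.tail(2))   # the last few records
print(data.shape)     # (7, 2)
```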
Slicing the rows
• # data[start:end]
• # start is inclusive whereas end is exclusive
• print(data[10:21])  # prints rows 10 to 20
• # you can also save it in a variable for further use in analysis
• sliced_data = data[10:21]
• print(sliced_data)
Slicing and dicing (2)
• It is also possible not to specify these indexes.
• If you don't specify the begin index, Python figures out that you want your slice to start at the beginning.
• If you don't specify the end index, the slice will go to the last element.

Try:
• Create first_six: first 6 rows by omitting the begin index.
• Create last_four: last 4 rows omitting the end index.
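The two exercises can be sketched on a stand-in DataFrame of 10 made-up rows, so first_six and last_four together cover all of them:

```python
import pandas as pd

data = pd.DataFrame({"x": range(10)})   # stand-in for the iris DataFrame

first_six = data[:6]    # omit the begin index: the slice starts at the beginning
last_four = data[6:]    # omit the end index: the slice runs to the last row
print(len(first_six), len(last_four))   # 6 4
```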
Selecting columns
• specific_data = data["Species"]
• specific_data = data[["Id", "Species"]]
• # data[["column_name1", "column_name2", "column_name3"]]
• # now we will print the first 10 rows of the specific_data dataframe
• print(specific_data.head(10))
• data['Species'].unique()
• data.groupby('Species').size()
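A runnable sketch of column selection, unique(), and groupby() on a small made-up DataFrame (the grouping column is called Species here):

```python
import pandas as pd

data = pd.DataFrame({
    "Id": [1, 2, 3, 4],
    "Species": ["setosa", "setosa", "versicolor", "virginica"],
})

specific_data = data[["Id", "Species"]]   # double brackets select columns as a DataFrame
print(specific_data.head(10))

print(data["Species"].unique())           # distinct values in the column
print(data.groupby("Species").size())     # row count per species
```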
Select a column or columns in a dataframe
• For pandas objects (Series, DataFrame), the indexing operator [] only accepts:
1. a column name or list of column names, to select column(s)
2. a slice or Boolean array, to select row(s)
i.e. it only refers to one dimension of the dataframe.
• So if you’re choosing one column, you can get away with passing in just the name of the column:
df["columnname"]
• But if you’re choosing multiple columns, you have to pass in a container that contains multiple values. The most commonly used data type is the list, which also happens to be defined using square brackets:
columns = ["c1", "c2"]
df[columns]
• Or, to simplify:
df[["c1", "c2"]]
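The difference between passing one name and a list of names shows up in the returned type: a single name yields a Series, a list yields a DataFrame (df and its columns here are made up):

```python
import pandas as pd

df = pd.DataFrame({"c1": [1, 2], "c2": [3, 4], "c3": [5, 6]})

single = df["c1"]             # one column name -> a Series
multiple = df[["c1", "c2"]]   # a list of names -> a DataFrame

print(type(single).__name__)    # Series
print(type(multiple).__name__)  # DataFrame
```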
