You are on page 1of 16

Data Science

BS CS 6th/ADP
By
Mr. Shakeel Waris
Introduction To Python in Data Science
Data is the new Oil. This statement shows how every modern IT
system is driven by capturing, storing and analyzing data for
various needs. Be it about making decision for business,
forecasting weather, studying protein structures in biology or
designing a marketing campaign. All of these scenarios involve
a multidisciplinary approach of using mathematical models,
statistics, graphs, databases and of course the business or
scientific logic behind the data analysis. So we need a
programming language which can cater to all these diverse
needs of data science. Python shines bright as one such language
as it has numerous libraries and built in features which makes it
easy to tackle the needs of Data science.
 In this tutorial we will cover these the various techniques used in
data science using the Python programming language.
Introduction To Python in Data Science
Data science is the process of deriving knowledge
and insights from a huge and diverse set of data
through organizing, processing and analyzing
the data. It involves many different disciplines
like mathematical and statistical modeling,
extracting data from it source and applying data
visualization techniques. Often it also involves
handling big data technologies to gather both
structured and unstructured data. Below we will
see some example scenarios where Data science
is used.
Python in Data Science

The programming requirements of data science


demands a very versatile yet flexible language which
is simple to write the code but can handle highly
complex mathematical processing. Python is most
suited for such requirements as it has already
established itself both as a language for general
computing as well as scientific computing. More
over it is being continuously upgraded in form of
new addition to its plethora of libraries aimed at
different programming requirements. Below we will
discuss such features of python which makes it the
preferred language for data science.
Python in Data Science

 A simple and easy to learn language which achieves result in


fewer lines of code than other similar languages like R. Its
simplicity also makes it robust to handle complex scenarios
with minimal code and much less confusion on the general
flow of the program.
 It is cross platform, so the same code works in multiple
environments without needing any change. That makes it
perfect to be used in a multi-environment setup easily.
 It executes faster than other similar languages used for data
analysis like R and MATLAB.
 Its excellent memory management capability, especially
garbage collection makes it versatile in gracefully managing
very large volume of data transformation, slicing, dicing and
visualization.
Python in Data Science

 Most importantly Python has got a very large


collection of libraries which serve as special purpose
analysis tools. For example – the NumPy package
deals with scientific computing and its array needs
much less memory than the conventional python list
for managing numeric data. And the number of such
packages is continuously growing.
 Python has packages which can directly use the code
from other languages like Java or C. This helps in
optimizing the code performance by using existing
code of other languages, whenever it gives a better
result.
Getting Python

The most up-to-date and current source


code, binaries, documentation, news, etc.,
is available on the official website of
Python https://www.python.org/
You can download Python documentation
from https://www.python.org/doc/. The
documentation is available in HTML,
PDF, and PostScript formats.
Installing Python

 Python distribution is available for a wide


variety of platforms. You need to download
only the binary code applicable for your
platform and install Python.
 If the binary code for your platform is not
available, you need a C compiler to compile the
source code manually. Compiling the source
code offers more flexibility in terms of choice
of features that you require in your installation.
 Here is a quick overview of installing Python
on various platforms −
Unix and Linux Installation

 Here are the simple steps to install Python on


Unix/Linux machine.
 Open a Web browser and go to
https://www.python.org/downloads/.
 Follow the link to download zipped source code
available for Unix/Linux.
 Download and extract files.
 Editing the Modules/Setup file if you want to customize
some options.
 run ./configure script
 make
 make install
Windows Installation

 Here are the steps to install Python on Windows machine.


 Open a Web browser and go to
https://www.python.org/downloads/.
 Follow the link for the Windows installer python-XYZ.msi
file where XYZ is the version you need to install.
 To use this installer python-XYZ.msi, the Windows system
must support Microsoft Installer 2.0. Save the installer file
to your local machine and then run it to find out if your
machine supports MSI.
 Run the downloaded file. This brings up the Python install
wizard, which is really easy to use. Just accept the default
settings, wait until the install is finished, and you are done.
Setting path at Windows

To add the Python directory to the path


for a particular session in Windows −
At the command prompt − type path
%path%;C:\Python and press Enter.
Note − C:\Python is the path of the
Python directory.
Python Environment Variables
Pandas
 Pandas is an open-source Python Library used for high-
performance data manipulation and data analysis using
its powerful data structures. Python with pandas is in use
in a variety of academic and commercial domains,
including Finance, Economics, Statistics, Advertising,
Web Analytics, and more. Using Pandas, we can
accomplish five typical steps in the processing and
analysis of data, regardless of the origin of data — load,
organize, manipulate, model, and analyse the data.
 Below are the some of the important features of Pandas
which is used specifically for Data processing and Data
analysis work.
Key Features of Pandas

 Fast and efficient Data Frame object with default and customized indexing.
 Tools for loading data into in-memory data objects from different file
formats.
 Data alignment and integrated handling of missing data.
 Reshaping and pivoting of date sets.
 Label-based slicing, indexing and sub setting of large data sets.
 Columns from a data structure can be deleted or inserted.
 Group by data for aggregation and transformations.
 High performance merging and joining of data.
 Time Series functionality.

Pandas deals with the following three data structures −


 Series
 Data Frame
 These data structures are built on top of Numpy array, making them fast and
efficient.
Dimension & Description
Data Frame

You might also like