Python for Data Science

Data structures let us store collections of data, relate them to one another, and perform
operations on them. Examples include arrays, lists, and dictionaries.
Several Python packages help us work with structured data in code. Let us take a
look at some of these packages:
1. Pandas
Pandas is a software library for data manipulation and analysis. The name is derived
from the term ‘panel data’. The two primary data structures of Pandas are the Series (1-D)
and the DataFrame (2-D).
Pandas works well with different kinds of data:
1. Tabular data, as in an SQL table or Excel spreadsheet
2. Ordered or unordered time-series data
3. Observational or statistical data sets
2. NumPy

NumPy, which stands for Numerical Python, is the fundamental package for
mathematical and logical operations on arrays in Python. A NumPy array is a
homogeneous collection of data: every element has the same type.

Run the following code cell to import the NumPy module: `import numpy as np`
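
As a minimal sketch of what "homogeneous" means in practice, the cell below builds a one-dimensional array with `np.array` and checks its element type. The variable names and values are illustrative assumptions, not part of the original notes.

```python
import numpy as np

# A one-dimensional array created from a Python list (illustrative values).
one_dimensional_array = np.array([1.2, 2.4, 3.5, 4.7, 6.1])

# Mixing integers and floats still gives a single element type:
# NumPy converts the integer to a float so the array stays homogeneous.
mixed_input = np.array([1, 2.5])

print(one_dimensional_array.ndim)   # number of dimensions
print(one_dimensional_array.dtype)  # element type shared by all values
print(mixed_input.dtype)
```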

Two-dimensional arrays

You can also use `np.array` to create a two-dimensional matrix. To create a two-dimensional matrix,
specify an extra layer of square brackets. For example, the following call creates a 3x2 matrix:
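
A minimal sketch of such a call, with made-up values: each inner pair of square brackets is one row, so three rows of two values give a 3x2 matrix.

```python
import numpy as np

# Three inner lists (rows), each with two values (columns): a 3x2 matrix.
two_dimensional_array = np.array([[6, 5], [11, 7], [4, 8]])

print(two_dimensional_array)
print(two_dimensional_array.shape)  # (rows, columns)
```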
Three-dimensional arrays
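
The same idea can be extended by one more layer of square brackets. The sketch below, with illustrative values, stacks two 2x2 matrices into a single three-dimensional array.

```python
import numpy as np

# Two blocks, each holding two rows of two values: shape (2, 2, 2).
three_dimensional_array = np.array([
    [[1, 2], [3, 4]],
    [[5, 6], [7, 8]],
])

print(three_dimensional_array.ndim)
print(three_dimensional_array.shape)
```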

To populate a matrix with all zeroes, call `np.zeros`. To populate a matrix with all ones, call `np.ones`.
 Creation of array:
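
A minimal sketch of both calls, assuming a 2x3 shape chosen only for illustration. The shape is passed as a tuple of (rows, columns), and both functions return floating-point values by default.

```python
import numpy as np

# A 2x3 matrix filled with zeros.
all_zeros = np.zeros((2, 3))

# A 2x3 matrix filled with ones.
all_ones = np.ones((2, 3))

print(all_zeros)
print(all_ones)
```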

## Populate arrays with sequences of numbers


You can populate an array with a sequence of numbers:

import numpy as np

sequence_of_integers = np.arange(5, 12)

print(sequence_of_integers)

[ 5 6 7 8 9 10 11]

Notice that `np.arange` generates a sequence that includes the lower bound (5) but not the upper bound (12).

## Populate arrays with random numbers


NumPy provides various functions to populate matrices with random numbers across certain ranges. For
example, `np.random.randint` generates random integers between a low and high value. The following call
populates a 6-element vector with random integers between 50 and 100. Because the `high` bound is
exclusive, the call passes `high=101` so that 100 can be drawn. The values will differ on each run.

import numpy as np

random_integers_between_50_and_100 = np.random.randint(low=50, high=101, size=(6))

print(random_integers_between_50_and_100)
[72 76 63 95 64 83]

Statistical Methods:

1. Mean - the average of all values.

2. Mode - the most frequently occurring value.

3. Median - the middle value after all values are sorted (the average of the two middle values when
the number of values is even).

4. Variance - the average of the squared differences from the mean. It is calculated by taking the
difference between each number in the data set and the mean, squaring the differences to make them
positive, and dividing the sum of the squares by the number of values in the data set.

5. Standard deviation - the square root of the variance. It summarizes how far the data points typically
deviate from the mean, in the same units as the data.
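
The five measures above can be sketched as follows. The heights are made-up values, and `statistics.mode` from the Python standard library is used for the mode because NumPy has no mode function of its own. NumPy's `var` and `std` divide by the number of values by default, which matches the definition of variance given above.

```python
import numpy as np
from statistics import mode

# Hypothetical heights (in cm) of five classmates.
heights_cm = np.array([150, 160, 160, 170, 180])

mean_height = heights_cm.mean()             # average of all values
median_height = np.median(heights_cm)       # middle value after sorting
mode_height = mode(heights_cm.tolist())     # most frequently occurring value
variance = heights_cm.var()                 # mean of squared differences from the mean
standard_deviation = heights_cm.std()       # square root of the variance

print(mean_height, median_height, mode_height)
print(variance, standard_deviation)
```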

Activity: Apply the statistical methods to the following data:


1) Create and store the heights of any 5 of your classmates.
2) Create and store the marks of any 5 subjects.
Pandas

At the very basic level, Pandas objects can be thought of as enhanced versions of NumPy structured arrays
in which the rows and columns are identified with labels rather than simple integer indices.
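
As a rough illustration of label-based access, the sketch below builds a small `DataFrame` whose rows and columns are identified by names. The student names, subjects, and marks are hypothetical.

```python
import pandas as pd

# Rows are labelled by student name, columns by subject (all values made up).
marks = pd.DataFrame(
    {"maths": [82, 91], "science": [75, 88]},
    index=["Asha", "Ravi"],
)

# Look up a single value by row label and column label.
asha_science = marks.loc["Asha", "science"]

print(marks)
print(asha_science)
```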

Pandas provides a host of useful tools, methods, and functionality on top of the basic data structures, but
nearly everything that follows will require an understanding of what these structures are.

Thus, before we go any further, let's introduce these three fundamental Pandas data structures: the
``Series``, ``DataFrame``, and ``Index``.

## The Pandas Series Object

A Pandas ``Series`` is a one-dimensional array of indexed data.

We will start our code sessions with the standard NumPy and Pandas imports. A ``Series`` can then be
created from a list or array as follows:

import numpy as np

import pandas as pd

data = pd.Series([0.25, 0.5, 0.75, 1.0])

data

As the output shows when the cell is run, the ``Series`` wraps both a sequence of values and a sequence
of indices, which we can access with the ``values`` and ``index`` attributes.
The ``values`` are simply a familiar NumPy array.
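
A short sketch of inspecting both attributes, reusing the same ``Series`` as above. When no index is supplied, Pandas assigns the default integer index 0, 1, 2, and so on.

```python
import numpy as np
import pandas as pd

data = pd.Series([0.25, 0.5, 0.75, 1.0])

series_values = data.values  # the underlying NumPy array
series_index = data.index    # the default integer index

print(series_values)
print(series_index)
print(data[1])  # access an element through its index label
```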

### Index as ordered set

Pandas objects are designed to facilitate operations such as joins across datasets, which depend on many
aspects of set arithmetic.

The ``Index`` object follows many of the conventions used by Python's built-in ``set`` data structure, so that
unions, intersections, differences, and other combinations can be computed in a familiar way:
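
A minimal sketch using the set-style methods of ``Index``. The two index objects and their names are illustrative assumptions. The method names are used here rather than the operators ``&``, ``|`` and ``^``, because operator behaviour on ``Index`` objects has changed between Pandas versions.

```python
import pandas as pd

# Two hypothetical indices with some labels in common.
ind_a = pd.Index([1, 3, 5, 7, 9])
ind_b = pd.Index([2, 3, 5, 7, 11])

common = ind_a.intersection(ind_b)               # labels in both
combined = ind_a.union(ind_b)                    # labels in either
only_in_a = ind_a.difference(ind_b)              # labels in ind_a but not ind_b
in_one_only = ind_a.symmetric_difference(ind_b)  # labels in exactly one

print(common)
print(combined)
print(only_in_a)
print(in_one_only)
```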

Matplotlib - Data Visualisation

1. Bar chart creation
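
One possible way to create a bar chart is sketched below, using Matplotlib's `bar` method on an axes object. The subjects and marks are made-up data, and the title and axis labels are only suggestions.

```python
import matplotlib.pyplot as plt

# Hypothetical marks in five subjects.
subjects = ["Maths", "Science", "English", "History", "Art"]
marks = [78, 85, 69, 74, 90]

fig, ax = plt.subplots()
ax.bar(subjects, marks)  # one bar per subject, height equal to the mark
ax.set_xlabel("Subject")
ax.set_ylabel("Marks")
ax.set_title("Marks by subject")

# In a notebook the chart renders inline; in a script, call plt.show().
```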


Activity:

Practice the concepts taught in the session using the Jupyter notebook at the link below:

https://colab.research.google.com/drive/1tHKuj-geNoJVTYGnBoVchXuln35nQ4ot
