
Python for Data Science: Numpy and Pandas-2

1. What are NumPy and Pandas?

Tags: #dataframe #data #array 

NumPy is a Python library used for mathematical and scientific computing and can deal with
multidimensional data. NumPy allows us to create ndarrays, which are n-dimensional array objects
optimized for computational efficiency, and provides a variety of functions for accessing, manipulating,
operating on, and exporting ndarrays.

Pandas is a Python library that provides fast, flexible, and expressive data structures designed to make
working with structured (tabular, multidimensional, potentially heterogeneous) data both easy and
intuitive. It is built on top of NumPy and provides a variety of functions for accessing,
manipulating, analyzing, and exporting structured data.
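
As a small illustration of the two libraries, the sketch below creates an ndarray and a dataframe. The column names and values are made up for the example.

```python
import numpy as np
import pandas as pd

# A 2-D ndarray: every element shares one datatype.
arr = np.array([[1, 2, 3], [4, 5, 6]])
print(arr.shape)   # (2, 3)
print(arr.ndim)    # 2

# A dataframe: labelled columns that may hold different datatypes.
df = pd.DataFrame({"name": ["Asha", "Ben"], "score": [91.5, 78.0]})
print(df.dtypes)

# A numeric column can be handed back to NumPy as an ndarray.
scores = df["score"].to_numpy()
print(type(scores))
```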

2. What are the differences between a Python list and a NumPy array?

Tags: #datatypes #difference 

| List | NumPy Array (ndarray) |
|---|---|
| Can have elements of different datatypes | All elements are of the same datatype |
| Included in core Python | Not included in core Python; part of the NumPy library |
| Do not support element-wise operations | Support element-wise operations |
| Require more memory for storage, as the datatype of each element is stored separately | Require less memory for storage, as the datatype of each element is not stored separately |
| Computationally less efficient | Computationally more efficient |
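
The element-wise and datatype rows of the comparison can be seen in a short sketch. The variable names are illustrative only.

```python
import numpy as np

py_list = [1, 2, 3]
np_arr = np.array([1, 2, 3])

# Multiplying a list repeats it; multiplying an array scales each element.
print(py_list * 2)   # [1, 2, 3, 1, 2, 3]
print(np_arr * 2)    # [2 4 6]

# A list keeps mixed types as they are.
mixed_list = [1, "a", 2.5]
print([type(x).__name__ for x in mixed_list])   # ['int', 'str', 'float']

# An array converts its elements to one common datatype.
mixed_arr = np.array([1, 2.5])
print(mixed_arr.dtype)   # float64
```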

3. I used the drop() function to drop a column from the dataframe but the changes are not
reflected in the original data. How can I resolve this?

Tags: #dataframe #function #column

The drop() function has a parameter 'inplace' which is set to False by default. This ensures that the
function does not make changes to the original dataframe and instead returns a copy of the dataframe with
the specified changes. To change the original data, either set the inplace parameter to True or assign the
returned dataframe back to the original variable.
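
The behaviour can be checked with a small sketch. The dataframe and its column names are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"EmpID": [1, 2], "Name": ["Asha", "Ben"], "Age": [30, 41]})

# Default (inplace=False): df itself is unchanged, the result is a new dataframe.
df.drop("Age", axis=1)
print(list(df.columns))   # ['EmpID', 'Name', 'Age']

# Option 1: keep the returned dataframe.
df_no_age = df.drop("Age", axis=1)
print(list(df_no_age.columns))   # ['EmpID', 'Name']

# Option 2: modify df directly. With inplace=True the call returns None.
result = df.drop("Age", axis=1, inplace=True)
print(result)             # None
print(list(df.columns))   # ['EmpID', 'Name']
```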
 

4. What is Data Aggregation?

Tags: #addition #maximum 

Aggregation refers to performing an operation on multiple rows of a column to produce a single summary
value. Some examples of aggregation functions are as follows:

• sum: returns the sum of the values along the requested axis.
• min: returns the minimum of the values along the requested axis.
• max: returns the maximum of the values along the requested axis.
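
A minimal sketch of these aggregations on a small, made-up dataframe:

```python
import pandas as pd

df = pd.DataFrame({"math": [80, 65, 90], "physics": [70, 85, 60]})

# axis=0 (the default) aggregates down the rows: one value per column.
col_sum = df.sum()
# axis=1 aggregates across the columns: one value per row.
row_sum = df.sum(axis=1)

print(df["math"].min())   # 65
print(df["math"].max())   # 90

# Several aggregations at once.
summary = df.agg(["sum", "min", "max"])
print(summary)
```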

5. I am trying to read a CSV file but am getting this error:

FileNotFoundError: [Errno 2] No such file or directory:

How to resolve?

Tags: #error #import #df 

FileNotFoundError is a very common error, caused by a mismatch between the directory in which the code
searches for the file and the directory where the file is actually located.

The error is usually resolved by checking the following:

1. The data to be loaded and the code notebook that loads it are stored in the same folder
(otherwise, the full path to the file has to be given).
2. The name of the dataset is correct (check for lowercase and uppercase, check for
spaces, etc.)
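
One way to carry out both checks from the notebook itself is sketched below. The helper name read_csv_checked and the file name data.csv are hypothetical and only serve the example.

```python
from pathlib import Path

import pandas as pd


def read_csv_checked(filename):
    """Read a CSV file, or explain where Python looked if it is missing."""
    path = Path(filename)
    if not path.exists():
        # Show the full path that was searched and the CSV files that do exist here.
        print("Not found:", path.resolve())
        print("Working directory:", Path.cwd())
        print("CSV files here:", [p.name for p in Path.cwd().glob("*.csv")])
        return None
    return pd.read_csv(path)


# "data.csv" is a placeholder; replace it with the name of your dataset.
df = read_csv_checked("data.csv")
```

Comparing the printed path and file list with the actual file name usually reveals a wrong folder or a spelling difference.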

6. What is the difference between .iloc and .loc commands in pandas?

Tags: #pandas #select #filter #access 

loc is a label-based indexing method to access rows and columns of pandas objects. When using the loc
method on a dataframe, we specify which rows and columns to access by using the following format:

dataframe.loc[row labels, column labels]

iloc is an integer-based indexing method to access rows and columns of pandas objects. When using the
iloc method on a dataframe, we specify which rows and which columns we want by their integer positions,
using the following format:

dataframe.iloc[row indices, column indices]
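
The difference is easiest to see when the row labels are not the same as the row positions. The sketch below uses a made-up dataframe with row labels 10, 20, and 30.

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["Asha", "Ben", "Chen"], "Age": [30, 41, 25]},
    index=[10, 20, 30],
)

# loc uses labels: the row labelled 20, the column labelled "Name".
print(df.loc[20, "Name"])   # Ben

# iloc uses positions: the second row (1), the first column (0).
print(df.iloc[1, 0])        # Ben

# Slices differ: loc includes the end label, iloc excludes the end position.
by_label = df.loc[10:20, "Name"]   # rows labelled 10 and 20
by_position = df.iloc[0:2, 0]      # rows at positions 0 and 1
```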

7. Why does `head` need parentheses `( )` but `shape` does not?

Tags: #attribute #error #rows #columns

'shape' is an attribute of a pandas object. It describes the dataframe (the number of rows and columns)
and is simply read, without any arguments, so no parentheses are needed.
'head' is a function (method) provided by pandas and is computed only when we call it, which is what the
parentheses do. Whenever called, it will return the first 5 rows of the data by default; a different number
of rows can be passed inside the parentheses.
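
A short sketch on a made-up dataframe of 10 rows shows the difference:

```python
import pandas as pd

df = pd.DataFrame({"x": range(10), "y": range(10, 20)})

# Attribute: read it directly, no parentheses.
print(df.shape)          # (10, 2)

# Method: call it with parentheses, optionally passing the number of rows.
print(len(df.head()))    # 5
print(len(df.head(3)))   # 3

# Without parentheses, head is just the method object, not the rows.
print(callable(df.head))   # True
```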

8. What is the use of axis in dataframe.drop function?

Ans: The axis parameter is used to define whether columns or rows are to be dropped.

df.drop("EmpID", axis=1)  # axis=1 means you want to drop a column; axis=0 means you want to remove a row
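
Both values of axis are shown in the sketch below on a made-up dataframe. The columns= and index= keywords of drop are an alternative way to say the same thing.

```python
import pandas as pd

df = pd.DataFrame({"EmpID": [101, 102, 103], "Name": ["Asha", "Ben", "Chen"]})

# axis=1: "EmpID" is looked up among the column labels.
no_col = df.drop("EmpID", axis=1)
print(list(no_col.columns))   # ['Name']

# axis=0: 0 is looked up among the row labels.
no_row = df.drop(0, axis=0)
print(list(no_row.index))     # [1, 2]

# Equivalent keyword forms.
same_col = df.drop(columns="EmpID")
same_row = df.drop(index=0)
```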

9. Why do we get [10,15,20] as the output of the code below? Please elaborate.

vec=np.linspace(10,20,3)
print(vec)

The linspace function of NumPy is used to get evenly spaced samples in the given interval.

vec=np.linspace(10,20,3)

In the above code, 10 denotes the starting point, 20 denotes the ending point, and 3 denotes the number of
samples required. Please note that both the start and end points are included while calculating the
evenly spaced samples.

The formula below can be used to calculate the spacing between consecutive samples:

spacing = (End - Start) / (N - 1)

where End is the ending point, Start is the starting point, and N is the number of points.

Here the spacing is (20 - 10) / (3 - 1) = 5, which gives the samples 10, 15, and 20.
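
The formula can be checked against NumPy directly. The sketch below also uses the retstep and endpoint parameters of np.linspace, which are not part of the original question.

```python
import numpy as np

start, end, n = 10, 20, 3

# Spacing from the formula (End - Start) / (N - 1).
spacing = (end - start) / (n - 1)
print(spacing)   # 5.0

vec = np.linspace(start, end, n)
print(vec)       # [10. 15. 20.]

# retstep=True also returns the spacing that NumPy used.
vec2, returned_step = np.linspace(start, end, n, retstep=True)
print(returned_step)   # 5.0

# With endpoint=False the end point is left out of the samples.
print(np.linspace(10, 20, 5, endpoint=False))   # [10. 12. 14. 16. 18.]
```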

10. Please explain what is meant by Uniform Random Variable and Standard Normal Random Variable.

Ans: Uniform Random Variable (np.random.rand()) - this will generate uniform random numbers in the
range of 0-1 (1 itself is not included). If you generate a large number of these random numbers, the mean
of all the numbers will get close to 0.5. Try running the code below:

d=np.random.rand(100,100)
d.mean()

Standard Normal Random Variable (np.random.randn()) - this will generate random numbers that have a
mean of 0 and a standard deviation of 1. You will study Standard Normal Distribution in-depth in the
upcoming course. Try observing the mean and std of the output of the randn function:

e=np.random.randn(100,100)
e.mean()
e.std()
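
The same experiment can be made repeatable by fixing the seed first. The seed value 0 below is an arbitrary choice for illustration; the exact numbers printed depend on it, but the means and standard deviation should land near the stated values.

```python
import numpy as np

np.random.seed(0)   # arbitrary seed, only so that reruns give the same numbers

# 10,000 uniform random numbers between 0 and 1: mean close to 0.5.
d = np.random.rand(100, 100)
print(d.mean())
print(d.min() >= 0, d.max() < 1)   # True True

# 10,000 standard normal random numbers: mean close to 0, std close to 1.
e = np.random.randn(100, 100)
print(e.mean())
print(e.std())
```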

11. Encountered an error while replacing values using the np.where function.

Here the issue is that vec has been defined as a list. A Python list does not support element-wise
comparison, so a condition such as vec > 25 passed to np.where raises a TypeError instead of checking
each value. With a list, the comparison has to be made on one element at a time to get the desired
output. Check the code below:

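import numpy as np

vec=[2, 4, 22, 67, 30, 77, 40, 28]
print(vec)

# Check only the element at index 3; passing the whole list as the last argument would replace every element.
vec[3]=int(np.where(vec[3]>25,2,vec[3]))
print(vec)


The above step is to be performed for each element of the list.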

An easier alternative is to convert the given list into an array and then write the where code. If an
array is given, the where statement will compare and check the values one by one. This is an
application where arrays are handy. Try using the code below, where the given list has been converted
into an array, and check the output:

vec=np.array([2, 4, 22, 67, 30, 77, 40, 28])

print(vec)

vec=np.where(vec>25,2,vec)

print(vec)

12. What is the difference between "del" and "drop" function?

Below are the main differences between del and drop function:

• drop operates on both columns and rows; del operates on column only.
• drop can operate on multiple items at a time; del operates only on one at a time.
• drop can operate in-place or return a copy; del is an in-place operation only.
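
The three differences can be seen in a small sketch on a made-up dataframe:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9], "D": [0, 0, 0]})

# del removes one column and always changes df itself.
del df["D"]
print(list(df.columns))        # ['A', 'B', 'C']

# drop can remove several columns at once and returns a copy by default.
smaller = df.drop(columns=["B", "C"])
print(list(smaller.columns))   # ['A']

# drop also works on rows.
fewer_rows = df.drop(index=[0, 1])
print(list(fewer_rows.index))  # [2]

# df was not changed by the two drop calls above.
print(df.shape)                # (3, 3)
```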


13. Got the below error while defining a Matrix.

The error occurred because the outer pair of square brackets was missing in the code:

np.array([1,2,3,1],[4,5,6,4],[7,8,9,7])

Correct code:

np.array([[1,2,3,1],[4,5,6,4],[7,8,9,7]]) # have added the required square brackets

14. When to use iloc command and when to use loc command?

Both loc and iloc are used for data slicing of a pandas dataframe. The difference is that loc is a
label-based selection method, which means that we have to pass the name of the row or column which we
want to select, whereas iloc is an integer-position-based selection method, which means that we have to
pass the integer index of the row or column we want to select. Use loc when you know the labels and iloc
when you know the positions.
