
9/7/2021 STARTUP_CASE_STUDY(GRP - 3)

Indian Startup Case Study


Problem Statement: To perform an Indian startup case study analysis

Importing Necessary Libraries

In [1]: #importing necessary libraries

import numpy as np

import pandas as pd

import matplotlib.pyplot as plt

import seaborn as sns

Reading Data
In [2]: data_1 = pd.read_csv('./Datasets/startup_funding.csv')

data = data_1.copy()

data.head()

Out[2]:
   Sr No  Date dd/mm/yyyy  Startup Name                  Industry Vertical    SubVertical                            City Location  Investors Name
0  1      09/01/2020       BYJU’S                        E-Tech               E-learning                             Bengaluru      Tiger Global Management
1  2      13/01/2020       Shuttl                        Transportation       App based shuttle service              Gurgaon        Susquehanna Growth Equity
2  3      09/01/2020       Mamaearth                     E-commerce           Retailer of baby and toddler products  Bengaluru      Sequoia Capital India
3  4      02/01/2020       https://www.wealthbucket.in/  FinTech              Online Investment                      New Delhi      Vinod Khatumal
4  5      02/01/2020       Fashor                        Fashion and Apparel  Embroiled Clothes For Women            Mumbai         Sprout Venture Partners

In [3]: data.shape

Out[3]: (3044, 10)

Cleaning Data
In [4]: data.isnull().sum()

Out[4]: Sr No                   0
        Date dd/mm/yyyy         0
        Startup Name            0
        Industry Vertical     171
        SubVertical           936
        City Location         180
        Investors Name         24
        InvestmentnType         4
        Amount in USD         960
        Remarks              2625
        dtype: int64
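
The raw counts above are easier to compare as a share of the rows; for instance, 2625 missing Remarks out of 3044 rows is roughly 86%. The snippet below is a minimal sketch of that calculation on a small made-up frame (`toy` and its values are hypothetical, not taken from the dataset).

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the startup data (values are made up for illustration).
toy = pd.DataFrame({
    "City": ["Mumbai", np.nan, "Pune", np.nan],
    "Remarks": [np.nan, np.nan, np.nan, "Series A"],
    "StartupName": ["A", "B", "C", "D"],
})

# isnull().mean() gives the fraction of missing values per column;
# multiplying by 100 turns it into a percentage.
missing_pct = (toy.isnull().mean() * 100).sort_values(ascending=False)
print(missing_pct)
```

Sorting in descending order puts the sparsest columns first, which helps when deciding which columns can support an analysis at all.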

In [5]: # changing the names of the columns inside the data
# (the last name is cut off in the export; "Remarks" matches the tenth column in Out[4])
data.columns = ["SNo", "Date", "StartupName", "IndustryVertical", "SubVertical",
                "City", "InvestorsName", "InvestmentType", "AmountInUSD", "Remarks"]

# need to extract year from Date column, so check its dtype first
data.Date.dtype

Out[5]: dtype('O')

In [6]: # lets clean the strings
def clean_string(x):
    # strip the escaped non-breaking-space sequences left over in the raw text;
    # note that str() also turns missing values (NaN) into the string "nan"
    return str(x).replace("\\xc2\\xa0", "").replace("\\\\xc2\\\\xa0", "")

# lets apply the function to clean the data
for col in ["StartupName", "IndustryVertical", "SubVertical", "City",
            "InvestorsName", "InvestmentType", "AmountInUSD", "Remarks"]:
    data[col] = data[col].apply(clean_string)
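
Because the cleaning step wraps every value in `str()`, missing values become the literal string "nan" and are no longer detected by `isnull()` or `notnull()`. The sketch below illustrates the effect on a two-element series and shows one possible variant that leaves missing values untouched; `clean_string_keep_nan` is a hypothetical helper, not part of the notebook.

```python
import numpy as np
import pandas as pd

def clean_string_keep_nan(x):
    """Hypothetical variant of clean_string that leaves missing values untouched."""
    if pd.isna(x):
        return x
    return str(x).replace("\\xc2\\xa0", "")

# One value with the escaped non-breaking-space debris, one missing value.
cities = pd.Series(["Mumbai\\xc2\\xa0", np.nan])

# Behaviour of the str()-based cleaning: NaN is converted to the text "nan".
as_str = cities.apply(lambda x: str(x).replace("\\xc2\\xa0", ""))

# Behaviour of the variant: NaN stays a real missing value.
kept = cities.apply(clean_string_keep_nan)

print(as_str.isnull().sum())
print(kept.isnull().sum())
```

Keeping real missing values means later filters such as `notnull()` still work, at the cost of having to handle NaN explicitly in any string operation that follows.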

Checking the trend of investments by plotting the number of fundings done in each year
In [9]: # to find out issues in Date column like . and // in place of / in some dates .

unique_dates = data.Date.unique().tolist()

# unique_dates

In [12]: # removing issue in Date column ('.' and '//' used in place of '/')
# regex=False makes '.' a literal dot rather than a regex wildcard
data.Date = data.Date.str.replace('.', '/', regex=False)
data.Date = data.Date.str.replace('//', '/', regex=False)

# extracting year from date column
year = data.Date.str.split('/', expand=True)[2]

# counting fundings per year, sorted in chronological order
year = year.value_counts().sort_index()
x = year.index
y = year.values

# plotting line plot
plt.plot(x, y)
plt.title('Trend of investments')
plt.xlabel("Year")
plt.ylabel("Number of Fundings")
plt.show()

# printing the first three years and their funding counts
for i in range(3):
    print('Year : ', x[i], ', No. of fundings : ', y[i])


Year : 015 , No. of fundings : 1


Year : 2015 , No. of fundings : 935

Year : 2016 , No. of fundings : 993
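
The stray year `015` in the output above indicates that at least one malformed date survives the separator fixes and ends up as its own category. One possible way to surface such rows is to parse the column strictly and inspect whatever fails. The sketch below assumes day/month/year strings; the sample values in `raw` are illustrative only.

```python
import pandas as pd

# Illustrative date strings: one clean, two with separator issues, one with a short year.
raw = pd.Series(["05/07/2017", "12/05.2015", "22/01//2015", "01/07/015"])

# Same separator fixes as in the notebook, with literal (non-regex) matching.
fixed = raw.str.replace(".", "/", regex=False).str.replace("//", "/", regex=False)

# Strict day/month/year parsing; anything that does not match becomes NaT.
parsed = pd.to_datetime(fixed, format="%d/%m/%Y", errors="coerce")

# Fundings per year, ignoring the rows that could not be parsed.
print(parsed.dt.year.value_counts().sort_index())

# The rows that need a manual fix.
print("Unparseable dates:", fixed[parsed.isna()].tolist())
```

With `errors="coerce"` the malformed entries are isolated instead of silently forming a separate "year", so they can be corrected or dropped before plotting.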

In [13]: # function to clean the AmountInUSD column
def clean_amount(x):
    # keep only the digit characters (the digit list is cut off in the export
    # after '7'; '8' and '9' are assumed here)
    x = ''.join([c for c in str(x) if c in ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']])
    x = str(x).replace(",", "").replace("+", "")
    x = str(x).lower().replace("undisclosed", "")
    x = str(x).lower().replace("n/a", "")
    # undisclosed or missing amounts end up empty and get the sentinel -999
    if x == '':
        x = '-999'
    return x

# lets apply the function on the column
data["AmountInUSD"] = data["AmountInUSD"].apply(lambda x: float(clean_amount(x)))

# lets plot the column after cleaning it
plt.rcParams['figure.figsize'] = (15, 3)
data['AmountInUSD'].plot(kind='line', color='black')
plt.title('Distribution of Amount', fontsize=15)
plt.show()
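
The cleaning function above replaces undisclosed amounts with the sentinel -999, which then takes part in any later sum or average. An alternative worth considering is to convert such entries to NaN, which pandas skips in aggregations. The following is a minimal sketch under that assumption; the sample strings in `raw_amount` are made up.

```python
import pandas as pd

# Made-up amounts: Indian-style separators, an undisclosed value, "N/A", a trailing '+', a missing value.
raw_amount = pd.Series(["20,00,00,000", "undisclosed", "N/A", "1,50,000+", None])

# Strip thousands separators and '+' signs, then let to_numeric coerce
# anything non-numeric (e.g. "undisclosed") to NaN instead of a sentinel.
amount = pd.to_numeric(
    raw_amount.str.replace(",", "", regex=False).str.replace("+", "", regex=False),
    errors="coerce",
)

print(amount)
print("Disclosed total:", amount.sum())  # NaN values are skipped by sum()
```

The trade-off is that NaN has to be handled explicitly when plotting or exporting, whereas a sentinel keeps the column fully numeric but distorts totals.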

Top 10 Indian cities with the highest number of startups
In [14]: # dropping rows having NaN values in City column
data_temp = data.copy()
data_temp = data_temp[data_temp['City'].notnull()]

# sorting out issues in city names
def separateCity(city):
    # keep only the first city when several are listed as "A / B"
    return city.split('/')[0].strip()

data_temp['City'] = data_temp['City'].apply(separateCity)
data_temp['City'] = data_temp['City'].replace('Delhi', 'New Delhi')
data_temp['City'] = data_temp['City'].replace('bangalore', 'Bangalore')

In [15]: ## Counting startups in each city
# note: this counts `data`, not the cleaned `data_temp` from In [14],
# so the city-name fixes are not reflected in the output below
city_num = data.City.value_counts()[0:10]
city = city_num.index
num_city = city_num.values

## plotting a pie chart showing the percentage share of each city in no. of startups
plt.rcParams['figure.figsize'] = (15, 9)
# the original call also passes a wedgeprops argument that is cut off in the export
plt.pie(num_city, labels=city, autopct='%.2f%%', startangle=90)
plt.show()

for i in range(len(city)):
    print('City : ', city[i], ' , Number of Startups :', num_city[i])


City : Bangalore , Number of Startups : 701

City : Mumbai , Number of Startups : 568

City : New Delhi , Number of Startups : 424

City : Gurgaon , Number of Startups : 291

City : nan , Number of Startups : 180

City : Bengaluru , Number of Startups : 141

City : Pune , Number of Startups : 105

City : Hyderabad , Number of Startups : 99

City : Chennai , Number of Startups : 97

City : Noida , Number of Startups : 93
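
The output above lists Bangalore and Bengaluru as separate entries and counts the string "nan" as a city, so the ranking understates the largest hub. One way to address this is to map known spelling variants to a single canonical name before counting. The sketch below assumes a small alias table; `CITY_ALIASES` and `normalize_city` are hypothetical names, and the sample values are made up.

```python
import pandas as pd

# Hypothetical alias table; extend it after inspecting the unique city values.
CITY_ALIASES = {
    "bengaluru": "Bangalore",
    "bangalore": "Bangalore",
    "delhi": "New Delhi",
    "new delhi": "New Delhi",
    "gurugram": "Gurgaon",
    "gurgaon": "Gurgaon",
}

def normalize_city(city):
    """Return a canonical city name, or None for missing values."""
    if pd.isna(city) or str(city).strip().lower() == "nan":
        return None
    first = str(city).split("/")[0].strip()  # keep the first of "A / B"
    return CITY_ALIASES.get(first.lower(), first)

cities = pd.Series(["Bangalore", "Bengaluru", "bangalore / SFO", "Delhi", "nan", "Pune"])

# value_counts() drops missing values by default, so "nan" no longer appears.
counts = cities.map(normalize_city).value_counts()
print(counts)
```

Looking up the lower-cased name makes the mapping case-insensitive, while unknown cities fall through unchanged.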

Calculating percentage of funding each city has got!


In [16]: data_temp['City'] = data_temp['City'].apply(separateCity)
data_temp['City'] = data_temp['City'].replace('Delhi', 'New Delhi')
data_temp['City'] = data_temp['City'].replace('bangalore', 'Bangalore')

# Removing ',' in Amount column and converting it to float
# (the end of this line is cut off in the export; the replace arguments are assumed from the comment above)
data_temp['AmountInUSD'] = data_temp['AmountInUSD'].apply(lambda x: float(str(x).replace(',', '')))
data_temp['AmountInUSD'] = pd.to_numeric(data_temp['AmountInUSD'])

# Calculating citywise amount of funding received.
# (line cut off in the export; descending order and the top-10 slice are assumed from the output below)
city_amount = data_temp.groupby('City')['AmountInUSD'].sum().sort_values(ascending=False)[0:10]
city = city_amount.index
amountCity = city_amount.values

## calculating percentage of the funding each city has received
perAmount = np.true_divide(amountCity, amountCity.sum()) * 100

for i in range(len(city)):
    print(city[i], format(perAmount[i], '.2f'), '%')

plt.bar(city, perAmount, color=sns.color_palette("flare"))

Bangalore 31.10 %

Bengaluru 23.45 %

Mumbai 13.51 %


Gurgaon 9.52 %

New Delhi 9.18 %

Noida 3.50 %

nan 3.46 %

Gurugram 2.36 %

Chennai 1.96 %

Pune 1.95 %

Out[16]: <BarContainer object of 10 artists>
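
The ten percentages printed above add up to roughly 100, which suggests they are shares of the top-ten total rather than of all funding, and the -999 sentinel for undisclosed amounts is still part of each city's sum. The sketch below shows one possible way to compute shares of all disclosed funding instead; the frame `toy` and its values are made up for illustration.

```python
import pandas as pd

toy = pd.DataFrame({
    "City": ["Bangalore", "Bangalore", "Mumbai", "Pune", "Mumbai"],
    "AmountInUSD": [500.0, 300.0, 150.0, 50.0, -999.0],  # -999 marks an undisclosed amount
})

# Drop the sentinel rows so they do not reduce a city's total.
disclosed = toy[toy["AmountInUSD"] >= 0]

# Total disclosed funding per city, largest first.
city_total = disclosed.groupby("City")["AmountInUSD"].sum().sort_values(ascending=False)

# Share of all disclosed funding, rounded to two decimals.
city_share = (city_total / city_total.sum() * 100).round(2)
print(city_share.head(10))
```

Dividing by the overall total before slicing keeps the percentages comparable even if the number of cities shown changes.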
