Startup Case Study

9/7/2021 STARTUP_CASE_STUDY(GRP - 3)
Indian Startup Case Study

Problem Statement: To perform an Indian startup case study analysis
Importing necessary libraries
In [1]: #importing necessary libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
Reading Data
In [2]: data_1 = pd.read_csv('./Datasets/startup_funding.csv')
data = data_1.copy()
data.head()
Out[2]:
   Sr No  Date dd/mm/yyyy  Startup Name                  Industry Vertical    SubVertical                            City Location  Investors Name
0  1      09/01/2020       BYJU’S                        E-Tech               E-learning                             Bengaluru      Tiger Global Management
1  2      13/01/2020       Shuttl                        Transportation       App based shuttle service              Gurgaon        Susquehanna Growth Equity
2  3      09/01/2020       Mamaearth                     E-commerce           Retailer of baby and toddler products  Bengaluru      Sequoia Capital India
3  4      02/01/2020       https://www.wealthbucket.in/  FinTech              Online Investment                      New Delhi      Vinod Khatumal
4  5      02/01/2020       Fashor                        Fashion and Apparel  Embroiled Clothes For Women            Mumbai         Sprout Venture Partners
In [3]: data.shape
Out[3]: (3044, 10)
Cleaning Data
In [4]: data.isnull().sum()
Out[4]: Sr No 0
Date dd/mm/yyyy 0
Startup Name 0
Industry Vertical 171
SubVertical 936
City Location 180
Investors Name 24
InvestmentnType 4
Amount in USD 960
Remarks 2625
dtype: int64
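The raw counts are easier to compare as a share of all rows. The sketch below recomputes them as percentages from the figures reported above (3044 rows); the names `n_rows`, `missing` and `missing_pct` are introduced here for illustration only.

```python
import pandas as pd

n_rows = 3044  # row count reported by data.shape above

# missing-value counts copied from the isnull().sum() output above
missing = pd.Series({
    "Industry Vertical": 171,
    "SubVertical": 936,
    "City Location": 180,
    "Investors Name": 24,
    "InvestmentnType": 4,
    "Amount in USD": 960,
    "Remarks": 2625,
})

# share of rows with a missing value, in percent
missing_pct = (missing / n_rows * 100).round(2)
print(missing_pct.sort_values(ascending=False))
```

On the loaded DataFrame itself, `data.isnull().mean() * 100` should give the same shares in one step, since the mean of a boolean column is the fraction of True values.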
In [5]: # renaming the columns of the data
data.columns = ["SNo", "Date", "StartupName", "IndustryVertical", "SubVertical",
                "City", "InvestorsName", "InvestmentType", "AmountInUSD", "Remarks"]

# the year has to be extracted from the Date column, so check its dtype first
data.Date.dtype
Out[5]: dtype('O')
In [6]: # clean the strings: strip the escaped non-breaking-space debris (\xc2\xa0)
def clean_string(x):
    # note: str(x) also turns missing values (NaN) into the string "nan"
    return str(x).replace("\\xc2\\xa0","").replace("\\\\xc2\\\\xa0", "")

# apply the function to every text column
for col in ["StartupName", "IndustryVertical", "SubVertical", "City",
            "InvestorsName", "InvestmentType", "AmountInUSD", "Remarks"]:
    data[col] = data[col].apply(clean_string)
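Because clean_string calls str(x) on every value, a missing value becomes the text "nan" and is no longer detected by isnull() or notnull(). A minimal sketch of a variant that leaves missing values untouched is given below; `clean_string_keep_nan` and the toy Series are hypothetical and not part of the notebook.

```python
import numpy as np
import pandas as pd

def clean_string_keep_nan(x):
    """Strip the escaped debris but leave missing values missing."""
    if pd.isna(x):
        return x
    return str(x).replace("\\xc2\\xa0", "").replace("\\\\xc2\\\\xa0", "")

# hypothetical toy column, not taken from startup_funding.csv
toy = pd.Series(["Bengaluru", "Pune\\xc2\\xa0", np.nan])

as_str = toy.apply(lambda x: str(x))      # what a plain str(x) does to NaN
kept = toy.apply(clean_string_keep_nan)   # NaN survives the cleaning

print(as_str.isnull().sum())  # the missing value is hidden as the text "nan"
print(kept.isnull().sum())    # the missing value is still counted
print(kept.tolist())
```

With this variant a later notnull() filter on the column would still be able to drop the missing rows.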
Checking the trend of investments by plotting the number of fundings done in each year
In [9]: # list the unique dates to spot issues such as '.' or '//' used in place of '/'
unique_dates = data.Date.unique().tolist()
# unique_dates
In [12]: # fixing the separator issues in the Date column (literal match, not regex)
data.Date = data.Date.str.replace('.', '/', regex=False)
data.Date = data.Date.str.replace('//', '/', regex=False)
# extracting the year from the Date column
year = data.Date.str.split('/' , expand = True)[2]
# counting the fundings per year and sorting the years in chronological order
year = year.value_counts().sort_index()
x = year.index
y = year.values
# plotting a line plot
plt.plot(x,y)
plt.title('Trend of investments')
plt.xlabel("Year")
plt.ylabel("Number of Fundings")
plt.show()
# printing the first three year counts
for i in range(3):
    print('Year : ' , x[i],', No. of fundings : ' , y[i])

Year : 015 , No. of fundings : 1
Year : 2016 , No. of fundings : 993
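The year "015" in the output shows that at least one date is still malformed after the separators are fixed. One way to keep such entries out of the trend is to accept only four-digit years. The sketch below uses a hypothetical sample of dates, not rows from the dataset, and the names `fixed`, `valid` and `counts` are illustrative.

```python
import pandas as pd

# hypothetical sample showing the separator problems and one malformed year
dates = pd.Series([
    "09/01/2020", "05.07.2018", "13//04/2015",
    "12/05.2015", "22/01//2015", "01/07/015",
])

# same separator fixes as in the cell above, as literal replacements
fixed = (dates.str.replace(".", "/", regex=False)
              .str.replace("//", "/", regex=False))

# third field of dd/mm/yyyy is the year
year = fixed.str.split("/", expand=True)[2]

# keep only well-formed four-digit years
valid = year.str.fullmatch(r"\d{4}")
counts = year[valid].astype(int).value_counts().sort_index()

print(counts)
print("Rejected:", year[~valid].tolist())
```

Converting the years to integers also makes the x-axis of the line plot numeric instead of a list of strings.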

In [13]: # function to clean the AmountInUSD column
def clean_amount(x):
    # keep only the digits; this already drops ',', '+', '.', 'undisclosed' and 'n/a'
    x = ''.join([c for c in str(x) if c in ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']])
    x = str(x).replace(",","").replace("+","")
    x = str(x).lower().replace("undisclosed","")
    x = str(x).lower().replace("n/a","")
    # missing or undisclosed amounts become the sentinel value -999
    if x == '':
        x = '-999'
    return x

# apply the function to the column and convert it to float
data["AmountInUSD"] = data["AmountInUSD"].apply(lambda x: float(clean_amount(x)))

# plot the cleaned column
plt.rcParams['figure.figsize'] = (15, 3)
data['AmountInUSD'].plot(kind = 'line', color = 'black')
plt.title('Distribution of Amount', fontsize = 15)
plt.show()
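The sentinel -999 stays in the column as a real number, so every later sum or mean includes it. A possible alternative, sketched here under the assumption that undisclosed amounts should simply be skipped, is to return NaN instead; `parse_amount` and the sample values are hypothetical.

```python
import numpy as np
import pandas as pd

def parse_amount(x):
    """Return the amount as a float, or NaN when the value has no digits."""
    digits = "".join(c for c in str(x) if c in "0123456789")
    return float(digits) if digits else np.nan

# hypothetical raw values in the styles the cleaning function handles
raw = pd.Series(["20,00,00,000", "undisclosed", "N/A", "5000000+", np.nan])
amounts = raw.apply(parse_amount)

print(amounts.isna().sum())        # number of undisclosed or missing amounts
print(amounts.sum())               # NaN entries are skipped by default
print(amounts.fillna(-999).sum())  # the sentinel lowers the total
```

pandas aggregations skip NaN by default, so totals computed this way only reflect disclosed amounts.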
Top 10 Indian cities which have the most number of startups
In [14]: # dropping rows having NaN values in the City column
data_temp = data.copy()
data_temp = data_temp[data_temp['City'].notnull()]
# sorting out issues in city names: keep only the first city of entries like 'A / B'
def separateCity(city):
    return city.split('/')[0].strip()

data_temp['City'] = data_temp['City'].apply(separateCity)
data_temp['City'] = data_temp['City'].replace('Delhi', 'New Delhi')
data_temp['City'] = data_temp['City'].replace('bangalore', 'Bangalore')
In [15]: ## Counting startups in each city (top 10)
## note: the counts are taken from data, not from the cleaned data_temp
city_num = data.City.value_counts()[0:10]
city = city_num.index
num_city = city_num.values
## plotting a pie chart showing the percentage share of each city in the number of startups
plt.rcParams['figure.figsize'] = (15,9)
plt.pie(num_city , labels = city , autopct='%.2f%%' , startangle = 90 , wedgeprops

plt.show()
for i in range(len(city)):
    print('City : ' , city[i] ,' , Number of Startups :' , num_city[i])

City : Bangalore , Number of Startups : 701
City : Mumbai , Number of Startups : 568
City : New Delhi , Number of Startups : 424
City : Gurgaon , Number of Startups : 291
City : nan , Number of Startups : 180
City : Bengaluru , Number of Startups : 141
City : Pune , Number of Startups : 105
City : Hyderabad , Number of Startups : 99
City : Chennai , Number of Startups : 97
City : Noida , Number of Startups : 93
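The list above counts Bangalore and Bengaluru separately, and "nan" appears as a city. A minimal sketch of folding alternative spellings into one name before counting is shown below. The alias table `CITY_ALIASES`, the helper `normalize_city` and the sample values are assumptions for illustration; which spelling to treat as canonical is a choice, not something the notebook specifies.

```python
import pandas as pd

# assumed alias table: alternative spellings mapped to one canonical name
CITY_ALIASES = {
    "bangalore": "Bangalore",
    "Bengaluru": "Bangalore",
    "Delhi": "New Delhi",
    "Gurugram": "Gurgaon",
}

def normalize_city(city):
    """Keep the first city of 'A / B' entries and apply the alias table."""
    if pd.isna(city):
        return city
    name = str(city).split("/")[0].strip()
    return CITY_ALIASES.get(name, name)

# hypothetical sample, including a missing value
cities = pd.Series([
    "Bangalore", "Bengaluru", "bangalore / SFO", "Delhi",
    "New Delhi", "Gurugram", "Gurgaon", None,
])

counts = cities.apply(normalize_city).value_counts()
print(counts)
```

value_counts() drops missing values by default, so a real NaN does not show up as a city the way the string "nan" does.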
Calculating the percentage of funding each city has received

In [16]: data_temp['City'] = data_temp['City'].apply(separateCity)
data_temp['City'] = data_temp['City'].replace('Delhi', 'New Delhi')
data_temp['City'] = data_temp['City'].replace('bangalore', 'Bangalore')
# Removing ',' in the Amount column and converting it to float
data_temp.AmountInUSD = data_temp.AmountInUSD.apply(lambda x : float(str(x).replace(',', '')))
data_temp.AmountInUSD = pd.to_numeric(data_temp.AmountInUSD)
# Calculating the city-wise amount of funding received (top 10 cities)
city_amount = data_temp.groupby('City')['AmountInUSD'].sum().sort_values(ascending = False)[0:10]
city = city_amount.index
amountCity = city_amount.values
## calculating the percentage of the funding each city has received
perAmount = np.true_divide(amountCity , amountCity.sum())*100
for i in range(len(city)):
    print(city[i] , format(perAmount[i], '.2f'),'%')
plt.bar(city, perAmount, color = sns.color_palette("flare"))
Bangalore 31.10 %
Bengaluru 23.45 %
Mumbai 13.51 %
Gurgaon 9.52 %
New Delhi 9.18 %
Noida 3.50 %
nan 3.46 %
Gurugram 2.36 %
Chennai 1.96 %
Pune 1.95 %
Out[16]: <BarContainer object of 10 artists>
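The percentages above are each city's total divided by the combined total of the listed cities. The same calculation can be sketched on a small hypothetical table, here with undisclosed amounts stored as NaN so that they do not enter the sums; the DataFrame `toy` and its values are made up for illustration.

```python
import numpy as np
import pandas as pd

# hypothetical funding rounds; NaN marks an undisclosed amount
toy = pd.DataFrame({
    "City": ["Bangalore", "Bangalore", "Mumbai", "Pune", "Pune"],
    "AmountInUSD": [600.0, np.nan, 300.0, 100.0, np.nan],
})

# total funding per city, largest first (NaN is skipped by sum)
city_amount = (toy.groupby("City")["AmountInUSD"]
                  .sum()
                  .sort_values(ascending=False))

# each city's share of the combined total, in percent
share = city_amount / city_amount.sum() * 100

for name, pct in share.items():
    print(name, format(pct, ".2f"), "%")
```

Dividing the Series by its own sum keeps the city labels attached to the percentages, so no separate index and value arrays are needed.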
