
India is home of unspeakable crimes against women.

The report attempts to build a predictive model which explains the factors which influence the crime against women.

CRIME AGAINST WOMEN IN INDIA


THE INFLUENCING FACTORS
PREPARED BY: Shobhit Bhatnagar, Divya Verma, Noopur Gupta, Prashant Dabas

TABLE OF CONTENTS
1. INTRODUCTION
2. OBJECTIVES OF THE STUDY
3. METHODOLOGY
3.1. SOURCES OF DATA
3.2. RESOURCES USED
4. DATA SET
4.1. SIZE OF DATA
4.2. DATA CLEANING
4.3. DATA DESCRIPTION
5. EXPLORATION
5.1. HISTOGRAM
5.2. CORRELATIONS
5.3. BOXPLOTS
5.4. PARALLEL PLOTS
6. REGRESSION MODEL
7. CONCLUSION & RECOMMENDATIONS
7.1. CONCLUSIONS
7.2. RECOMMENDATIONS
8. LIMITATIONS
9. REFERENCES

1. INTRODUCTION
India is home to unspeakable crimes against women. Centuries have come and gone, but the plight of women in India seems unlikely to change. Time has helplessly watched women suffer discrimination, oppression, exploitation, degradation, aggression and humiliation. Through the centuries, Indian women have remained subjugated and oppressed because society clung to orthodox beliefs, and they have borne the brunt of violence: domestic as well as public, physical, emotional and mental.

Although women may be victims of any kind of crime, be it cheating, murder, robbery, etc., only the crimes in which women alone are victims and which are directed specifically against them are characterized as "crime against women". Broadly, crimes against women are classified under two categories:

Crimes under the Indian Penal Code (IPC), which include seven crimes: (i) rape, (ii) kidnapping and abduction, (iii) dowry deaths, (iv) physical and mental torture (including wife battering), (v) molestation, (vi) sexual harassment, and (vii) importation of girls.

Crimes under Special and Local Laws (SLL), which include seventeen crimes, of which the important ones are: (i) immoral traffic (1956 and 1978 Act), (ii) dowry prohibition (1961 Act), (iii) committing Sati (1987 Act), and (iv) indecent representation of women (1986 Act).

Today, crime against women in India is increasing at a very high rate. National Crime Records Bureau statistics show that crimes against women have increased by 7.1 percent nationwide since 2010. The number of recorded incidents of rape has risen too: in 2011, 24,206 incidents were recorded, a rise of 9 percent over the previous year. More than half of the victims were between 18 and 30 years of age. A total of 2,28,650 incidents of crimes against women were reported in the country during 2011. Facts like these press us to understand the factors that lead to such high crime rates in India.

This study was conducted to identify the most relevant factors influencing crime rates against women.

2. OBJECTIVES OF THE STUDY


The objectives of this study are:
i. To identify the factors that influence the crime rates against women in India.
ii. To better understand which factors matter most in leading to higher crime rates against women in India.
iii. To analyse all factors and their correlation with the increasing crime rate in various states.
iv. To build a predictive regression model using the factors that are highly significant in affecting the incidents of crime against women.

3. METHODOLOGY
3.1. SOURCES OF DATA

Data was collected from credible and verifiable sources, mainly government websites that keep records for these variables. Key sources include India's census data for 2011, Reserve Bank of India (RBI) data on banks in each state, and other government databases.

3.2. RESOURCES USED

The report used several resources for collecting the data and for processing it for the final analysis. These resources include:
i. IBM SPSS (v19): software comprising a comprehensive set of predictive analytic tools for business users, analysts and statistical programmers.
ii. R Statistical Computing (v3.0.1): a language and environment for statistical computing and graphics. It provides a wide variety of statistical and graphical techniques and is highly extensible.
iii. Microsoft Excel (2013): a spreadsheet with a grid of cells arranged in numbered rows and letter-named columns for organizing data manipulations such as arithmetic operations. It has a battery of supplied functions to answer statistical, engineering and financial needs.

4. DATA SET
4.1. SIZE OF DATA

The data was organized so that each row represented one state or union territory and each column represented a single variable, satisfying the prerequisites for Tidy Data [1]. The collected data contained details about each of the 35 states and union territories of India with regard to 17 different variables. This gives a total of 35 × 17 = 595 data cells.

4.2. DATA CLEANING

Before carrying out any analysis, the collected data was cleaned to avoid complications during the actual analysis. The following data cleaning steps were undertaken:
i. All variable names were changed to formats acceptable to data processing software such as SPSS, R Statistical Computing, etc. All variables were named in small (lowercase) letters.
ii. New variables were created in a more coherent form so that valid inferences could be made.
iii. An appropriate range was set for every variable; this was especially relevant as most of the variables had numeric values.
iv. Descriptive labels were used to describe each variable succinctly yet effectively, so as to produce easily understandable graphs, tables, etc.
v. Outliers were identified using boxplots and summary statistics, and any invalid entries were traced back to the source and corrected.
vi. All the data was converted to a more coherent form so that valid inferences could be made.
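The renaming step can be sketched in a few lines. This is only an illustration: the helper `clean_name` and the raw headers below are hypothetical and are not taken from the study's files.

```python
import re


def clean_name(raw: str) -> str:
    """Turn a raw column header into a lowercase, underscore-separated name."""
    name = raw.strip().lower()
    # Replace every run of characters other than a-z and 0-9 with one underscore.
    name = re.sub(r"[^a-z0-9]+", "_", name)
    return name.strip("_")


# Hypothetical raw headers, as they might appear in a spreadsheet export.
raw_headers = ["Population Growth (%)", "Sex Ratio", " Literacy rate "]
cleaned = [clean_name(h) for h in raw_headers]
print(cleaned)
```

Names of this shape avoid spaces and symbols, which is the main constraint when the same file has to be read by both SPSS and R.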

4.3. DATA DESCRIPTION

The following is the list of variables used for data collection and analysis:

NAME | DESCRIPTION | TYPE
incidence | Incidents of Crime Against Women (per 1,00,000 of population) | Numeric
congOrNot | Congress / Non-congress | Categorical
pop | Population of the State/UT (X1,00,000) | Numeric
pop_growth | Population (% growth rate) | Numeric
pop_density | Density of population | Numeric
poverty | Poverty ratio (% of Population below poverty line) | Numeric
literacy | Literacy ratio (in %) | Numeric
edu | Number of education centers (X1,000) | Numeric
power | Power consumption (kWh per capita) | Numeric
sex_r | Females per 1 male | Numeric
Feticide | Female feticide (0-6 years females per 1000 0-6 years males) | Numeric
gsdp | Per capita Gross Domestic Product of a state | Numeric
unemployment | Unemployment ratios [(per 1000) for persons of age 15 years & above] | Numeric
bank | No. of commercial bank branches (X1,000) | Numeric
villages | No. of villages (X1,000) | Numeric

[1] Wickham H., Tidy data, Journal of Statistical Software

The following table gives the number of entries, range, mean and standard deviation of the variables.

Variable | N | Minimum | Maximum | Mean | Std. Deviation
Incidents of crime against women (per 1,00,000 of population) (log) | 35 | .00 | 5.20 | 3.0951 | 1.19164
Total Population (X1,00,000) | 35 | .65 | 1998.12 | 345.8770 | 444.58358
Population Growth (%) | 35 | -.58 | 55.88 | 18.9434 | 11.15292
Population Density | 35 | 17.00 | 11320.00 | 1095.8000 | 2390.03028
Poverty Ratio (% of Population bpl) | 35 | 1.00 | 39.93 | 18.4820 | 11.58778
Literacy Rate (%) | 35 | 61.80 | 94.00 | 77.9171 | 8.59525
Power Consumption (kWh) | 35 | 122.11 | 11863.64 | 1379.6394 | 2168.11137
Sex Ratio (Female per 1 male) | 35 | .62 | 1.08 | .9312 | .07988
Female feticide (0-6 years females per 1000 0-6 years males) | 35 | 830.00 | 971.00 | 921.8000 | 38.56759
Per capita GDP of a state | 35 | 4.08 | 5.12 | 4.2234 | 1.33170
Unemployment ratios [(per 1000) for persons of age 15 years & above] | 35 | 6.00 | 209.00 | 55.0571 | 47.84838
Valid N (listwise) | 35 | | | |

5. EXPLORATION
Several figures, graphs and tables were generated to undertake exploration of the data and get a better sense of the data. These figures helped in recognizing any inconsistencies in the data and also enabled better analysis. Following are some of the initial graphs that were made to better understand the data.

5.1. HISTOGRAM

Because incidents of crime against women is a count variable, the histogram of total incidents is right skewed (most values sit at the low end, with a long tail to the right), as evident from the figure below.

This indicates that the data for the dependent variable is not normally distributed, which conflicts with the normality condition assumed when building a linear regression. To counter this, a logarithmic transformation can be applied to the dependent variable. Doing this results in approximately normally distributed data for incidents of crimes against women in India, as depicted by the histogram of the transformed variable.
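The effect of the transformation can be checked numerically with a skewness statistic. The sketch below uses made-up incident rates (not the study's data) and assumes a natural logarithm; the report does not state which base was used.

```python
import math


def skewness(values):
    """Moment-based skewness; a positive value means a long right tail."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical incident rates with a few very large values (illustration only).
rates = [1, 2, 2, 3, 4, 5, 8, 13, 40, 120]
# The logarithm is only defined for positive values.
logged = [math.log(r) for r in rates]

print(round(skewness(rates), 2), round(skewness(logged), 2))
```

For these illustrative values the skewness drops markedly after the transformation, which is the behaviour the histograms are meant to show.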

5.2. CORRELATIONS

Correlations between the independent variables were computed as an exploratory check, to verify that collinearity does not exist between the variables.
Variable | pop | pop_growth | pop_density | poverty | literacy | edu | power | sex_r | feticide | gsdp | unemployment | banks | villages | cong_not
pop | 1 | -.078 | -.129 | .237 | -.474 | .908 | -.223 | .123 | -.127 | .759 | -.276 | .934 | .900 | -.213
pop_growth | -.078 | 1 | .091 | .322 | -.192 | -.056 | .775 | -.617 | -.064 | -.139 | -.429 | -.168 | -.040 | .156
pop_density | -.129 | .091 | 1 | -.120 | .307 | -.204 | .088 | -.314 | -.346 | -.052 | -.032 | -.093 | -.204 | .008
poverty | .237 | .322 | -.120 | 1 | -.575 | .328 | .128 | .003 | .229 | -.012 | -.372 | .069 | .403 | -.172
literacy | -.474 | -.192 | .307 | -.575 | 1 | -.580 | .108 | -.020 | .068 | -.211 | .400 | -.346 | -.584 | .203
edu | .908 | -.056 | -.204 | .328 | -.580 | 1 | -.226 | .134 | -.126 | .666 | -.349 | .832 | .951 | -.194
power | -.223 | .775 | .088 | .128 | .108 | -.226 | 1 | -.633 | -.113 | -.149 | -.206 | -.213 | -.235 | .175
sex_r | .123 | -.617 | -.314 | .003 | -.020 | .134 | -.633 | 1 | .429 | .147 | .180 | .196 | .120 | -.077
feticide | -.127 | -.064 | -.346 | .229 | .068 | -.126 | -.113 | .429 | 1 | -.275 | .211 | -.184 | -.094 | .008
gsdp | .759 | -.139 | -.052 | -.012 | -.211 | .666 | -.149 | .147 | -.275 | 1 | -.355 | .881 | .558 | .037
unemployment | -.276 | -.429 | -.032 | -.372 | .400 | -.349 | -.206 | .180 | .211 | -.355 | 1 | -.326 | -.306 | -.065
banks | .934 | -.168 | -.093 | .069 | -.346 | .832 | -.213 | .196 | -.184 | .881 | -.326 | 1 | .770 | -.110
villages | .900 | -.040 | -.204 | .403 | -.584 | .951 | -.235 | .120 | -.094 | .558 | -.306 | .770 | 1 | -.250
cong_not | -.213 | .156 | .008 | -.172 | .203 | -.194 | .175 | -.077 | .008 | .037 | -.065 | -.110 | -.250 | 1

Barring a few statistically significant and high correlations (highlighted in yellow in the original table), most of the variables appear not to be correlated. To ensure that the data meets the collinearity condition of the regression model, variables that are highly correlated with others can be removed from the analysis. The following variables were removed:
i. Number of villages in a state (variable name villages)
ii. Number of banks in a state (variable name banks)
iii. Number of education centers in a state (variable name edu)

Each of these variables was highly correlated with at least one other variable. For the regression model only one variable of each such pair is required, because a high correlation indicates that the two variables carry similar information.
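The screening described in this section can be sketched as follows. The `pearson` helper shows how one coefficient is computed from raw data, and `high_pairs` filters a set of reported coefficients. The cut-off of 0.8 is an assumption made for this illustration; the report does not state the threshold it used. The coefficients are copied from the correlation table in this section.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def high_pairs(corr, threshold=0.8):
    """Return the variable pairs whose |r| reaches the (assumed) threshold."""
    return sorted(pair for pair, r in corr.items() if abs(r) >= threshold)


# A selection of coefficients from the correlation table.
reported = {
    ("pop", "edu"): 0.908, ("pop", "banks"): 0.934, ("pop", "villages"): 0.900,
    ("pop", "gsdp"): 0.759, ("edu", "banks"): 0.832, ("edu", "villages"): 0.951,
    ("gsdp", "banks"): 0.881, ("banks", "villages"): 0.770,
    ("pop_growth", "power"): 0.775, ("poverty", "literacy"): -0.575,
}
flagged = high_pairs(reported)
print(flagged)
```

With this assumed cut-off, every flagged pair contains at least one of edu, banks or villages, which is consistent with the three variables the study removed.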

5.3. BOXPLOTS

Boxplots were used to identify invalid entries, i.e. values that were implausibly high or low for a variable. They also helped in assessing the spread of the data.
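A boxplot conventionally flags points lying more than 1.5 times the interquartile range beyond the quartiles. The sketch below applies that convention; the multiplier, the quartile method and the sample values are assumptions for illustration, since the report does not describe how its boxplots were configured.

```python
import statistics


def boxplot_outliers(values, k=1.5):
    """Return the values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical values for one variable; 500 stands in for a data-entry error.
sample = [10, 12, 13, 15, 16, 18, 20, 500]
suspects = boxplot_outliers(sample)
print(suspects)
```

A flagged value is only a candidate for checking against the source; it is not automatically an invalid entry.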

5.4. PARALLEL PLOTS

The plot below shows the values for each of the states on all the variables simultaneously. The red lines indicate states with power consumption above the median of all the states (880 kWh per capita), and the blue lines indicate states with power consumption below 880 kWh per capita.

Figure 1: Parallel Plot of variables

From the figure above it is apparent that when the power consumption of a state is high (red), the unemployment ratio is low, suggesting that power consumption can serve as a rough measure of the prosperity of a state. The figure also indicates that most states with low power consumption had high incidents of crime against women, suggesting that less prosperous states had more crimes against women. These implications are limited to the set of variables identified for this study.


6. REGRESSION MODEL
Of the 14 variables, only two were statistically significant in a linear regression model; this chapter details the outcomes of that regression. The two variables are:
i. Literacy ratio (%)
ii. Per capita GDP of the state

Per capita GDP of a state is an indicator of prosperity and signifies how developed a state is in comparison to others, whereas literacy rate describes the percentage of the population in a state that is literate. The regression model for the data is described below:

Incidents of crime against women (predicted, log) = 5.776 - 0.058 × (Literacy rate) + 0.438 × (Per capita GSDP)

The model tells us that the predicted (log) incidence of crime against women is 5.776 when the literacy rate and per capita GDP of a state are both zero; this intercept lies far outside the observed range of the data and is not meaningful on its own. The predicted value decreases by 0.058 when the literacy rate of a state increases by one unit, provided all other variables are constant. On the other hand, it increases by 0.438 when per capita GDP goes up by one unit, holding all other variables constant. The model therefore suggests that, in order to curb crime against women, a state in India needs to improve its literacy rate. The relation with per capita GDP indicates that crime against women increases as a state becomes more developed; even though development is an important factor, states need to focus on decreasing crime against women as well.
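As a worked example, the coefficients reported in the regression output (constant 5.776, literacy rate -0.058, per capita GDP 0.438) can be evaluated at the sample means of the two predictors. The function name is hypothetical; the predictor values are the means reported in the descriptive statistics, in the report's own units.

```python
# Coefficients as reported in the regression output.
INTERCEPT = 5.776
B_LITERACY = -0.058
B_GDP = 0.438


def predict_log_incidence(literacy_rate, per_capita_gdp):
    """Predicted log incidence of crime against women for one state."""
    return INTERCEPT + B_LITERACY * literacy_rate + B_GDP * per_capita_gdp


# Evaluate at the reported sample means of the two predictors.
at_means = predict_log_incidence(77.9171, 4.2234)

# Effect of one extra percentage point of literacy, GDP held constant.
literacy_effect = predict_log_incidence(78.9171, 4.2234) - at_means

print(round(at_means, 3), round(literacy_effect, 3))
```

The prediction at the means is about 3.11, close to the reported mean of the dependent variable (3.0951); the small gap comes from the coefficients being rounded to three decimals. This agreement holds only when the constant is positive.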

Following are other relevant statistics for the regression model explained above:
Variable | Mean | Std. Deviation | N
Incidents of crime against women (per 1,00,000 of population) (log) | 3.0951 | 1.19164 | 35
Literacy Rate (%) | 77.9171 | 8.59525 | 35
Per capita GDP of a state | 4.2234 | 1.33170 | 35

The table above describes the data involved in the regression model. The table below gives the correlations between the variables in the model.
Variable | Incidents of crime against women (per 1,00,000 of population) (log) | Literacy Rate (%) | Per capita GDP of a state
Incidents of crime against women (per 1,00,000 of population) (log) | 1.000 | -.487 | .547
Literacy Rate (%) | -.487 | 1.000 | -.137
Per capita GDP of a state | .547 | -.137 | 1.000

There is a low correlation (-.137) between the literacy rate and the per capita GDP of a state. This indicates that there is little to no collinearity between the two independent variables.
Model | R | R Square | Adjusted R Square | Std. Error of the Estimate | Durbin-Watson
1 | .687 | .471 | .438 | .89305 | 2.284

The R square statistic measures how much of the variance in the dependent variable the model is able to explain. The higher the value of R square, the better the model fits; in this study R square is 0.471, indicating that the model explains 47.1% of the total variance. The Durbin-Watson statistic is used to check the independence of errors. The acceptable range for Durbin-Watson is 1.5 to 2.5. As evident from the table above, the errors in the model are independent, as the Durbin-Watson statistic is 2.284.
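Two of the summary figures can be reproduced with standard textbook formulas. The sketch below is an illustration rather than the SPSS implementation: adjusted R square is computed from the reported R square with n = 35 observations and k = 2 predictors, and the Durbin-Watson function is shown on made-up residuals because the study's residuals are not available.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R square for n observations and k predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


def durbin_watson(residuals):
    """Sum of squared successive differences over the sum of squared residuals."""
    num = sum((residuals[i] - residuals[i - 1]) ** 2
              for i in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den


adj = adjusted_r2(0.471, n=35, k=2)

# Hypothetical residuals that alternate in sign (negative autocorrelation).
dw_example = durbin_watson([1, -1, 1, -1])

print(round(adj, 3), dw_example)
```

The adjusted value rounds to 0.438, matching the model summary. A Durbin-Watson value near 2 indicates little first-order autocorrelation, which is why 2.284 is read as acceptable.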

Model 1 | B (Unstandardized) | Std. Error | Beta (Standardized) | t | Sig. | Tolerance | VIF
(Constant) | 5.776 | 1.555 | | 3.715 | .001 | |
Literacy Rate (%) | -.058 | .018 | -.419 | -3.231 | .003 | .981 | 1.019
Per capita GDP of a state | .438 | .116 | .489 | 3.769 | .001 | .981 | 1.019


The unstandardized coefficients (B) are the ones used in the regression model, and the significance values of both independent variables are below 0.05. The VIF for literacy rate and per capita GDP (1.019) is well below 3.5, which indicates that there is no collinearity between the independent variables. Collinearity is further examined in the table below.
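With only two predictors, the tolerance of each is one minus the squared correlation between them, and the VIF is the reciprocal of the tolerance. The sketch below checks the reported figures from the reported correlation of -.137; the function names are hypothetical.

```python
def tolerance_two_predictors(r):
    """Tolerance of either predictor when the model has exactly two predictors."""
    return 1 - r ** 2


def vif(tolerance):
    """Variance inflation factor is the reciprocal of the tolerance."""
    return 1 / tolerance


# Correlation between literacy rate and per capita GDP, as reported.
tol = tolerance_two_predictors(-0.137)
factor = vif(tol)

print(round(tol, 3), round(factor, 3))
```

Both values round to the figures in the coefficients table (.981 and 1.019).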
Model | Dimension | Eigenvalue | Condition Index | Variance Proportion: (Constant) | Variance Proportion: Literacy Rate (%) | Variance Proportion: Per capita GDP of a state
1 | 1 | 2.930 | 1.000 | .00 | .00 | .01
1 | 2 | .065 | 6.703 | .02 | .04 | .88
1 | 3 | .005 | 23.543 | .98 | .96 | .11

For the constant and the two independent variables, the collinearity diagnostics give three linear combinations, or dimensions. The three dimensions are listed in descending order of the variance that they explain, which is indicated by the eigenvalue.
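The condition index of a dimension is conventionally the square root of the largest eigenvalue divided by that dimension's eigenvalue. The sketch below applies this definition to the eigenvalues as printed; because those are rounded to three decimals, the results only approximate the printed condition indices.

```python
import math


def condition_indices(eigenvalues):
    """Square root of (largest eigenvalue / each eigenvalue)."""
    largest = max(eigenvalues)
    return [math.sqrt(largest / ev) for ev in eigenvalues]


# Eigenvalues as printed in the collinearity diagnostics (rounded).
eigenvalues = [2.930, 0.065, 0.005]
indices = condition_indices(eigenvalues)

print([round(ci, 2) for ci in indices])
```

The smallest eigenvalue is the most sensitive to rounding, so the third index shows the largest gap from the reported 23.543.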


The residual graph of the dependent variable indicates that the residuals are normally distributed. The study has already taken into account the count variable by transforming the dependent variable data to log form.

The plot above indicates that the observed values from the model fit the expected pattern well enough to support the conclusion that the residuals are normally distributed. Following are the partial plots for each of the variables in the regression model, viz. literacy rate and per capita GDP of a state. The partial plot for literacy rate indicates a clear negative linear correlation with incidents of crime against women, which further strengthens the model. The partial plot for per capita GDP of a state indicates a positive linear correlation, but it is not as apparent as in the case of literacy rate.


7. CONCLUSION & RECOMMENDATIONS


7.1. CONCLUSIONS

The following are the conclusions that were drawn from the analysis of the data:
i. The regression model considers a development indicator (per capita GDP of a state) and the literacy rate of a state as indicators of incidents of crimes against women.
ii. The model suggests that incidents of crime are likely to increase as the per capita GDP of a state increases and to decrease as the literacy rate increases; the constant (5.776) is the predicted log incidence when both variables are zero.
iii. The model is, however, weak on goodness of fit (R square = 0.471), indicating a need for more observations.

7.2. RECOMMENDATIONS

According to the model, the literacy rate of a state should be increased in order to control incidents of crime against women; this concurs with common belief as well. To control the spread of crime against women, a state therefore needs to focus on improving its literacy rate. This can be done by improving education and addressing the common reasons for which people drop out of school. The model also suggests that as the per capita GDP of a state increases, the crime rate also increases. This indicates a gap on the part of the state: although it is developing, it is not adequately providing for the safety of women. The states need to prioritize the safety of women so that development and the reduction of crimes against women can be pursued simultaneously.


8. LIMITATIONS
The study had the following limitations:
i. Because of the small number of observations (states and union territories in the country), the regression model was weak.
ii. The variables identified in the study were not adequate to explain the incidents of crime against women in the country.

To counter these limitations, data transformation can be used to alter the values of the data in an attempt to obtain a better linear regression model.


9. REFERENCES
Wickham H., Tidy Data, Journal of Statistical Software <http://vita.had.co.nz/papers/tidydata.pdf>
R Core Team (2013). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL <http://www.R-project.org/>
http://planningcommision.nic.on/
http://unidow.com/
http://www.mapsofindia.com/
http://censusindia.gov.in/
http://data.gov.in/
http://labourbureau.nic.in/
http://updateox.com/
http://www.census2011.co.in/
http://www.kseboa.org/
http://www.rbi.org.in/
