Sustainable Future?

Employees spend a large portion of each day at work. Providing a healthy and
sustainable work environment helps to promote productivity and creates a culture of employees
who value the health of their surroundings.

To sustain a safe, healthy and functional workplace, an organization must train its employees in
practical and efficient work processes that minimize the impact of employee production on the
office environment and on the employees themselves. Sustainable workplace practices go beyond
what is required by law and ensure the longevity and overall well-being of the workforce.

Problem Statement

The company has been conducting several sustainability workshops to promote a sense of
responsibility towards the environment among its employees. How often do you, as an employee,
involve yourself in sustainability practices? Quite often? Then we are interested in finding out
whether you will continue to practice sustainability methods in the future.

Data pre-processing

All the variables available to us can be divided into independent variables – Index, RowNumber,
EmployeeID, Surname, TrainingScore, Geography, Gender, Age, TrainingLevel, HasRewardCard,
IsActiveMember and the dependent variable – Exited.

Based on the independent variables, we will predict the outcome of the dependent
variable. To build an efficient model, we need to choose our variables with care. Since
RowNumber has no impact on whether the employee exits the training or not, it will not be
included in our ‘matrix of features’. The same goes for EmployeeID and Surname. The variables
that we include in our ‘matrix of features’ are TrainingScore, Geography, Gender, Age,
TrainingLevel, HasRewardCard and IsActiveMember.
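
The selection step can be sketched as follows. This is a minimal illustration, assuming the records are held as a list of dictionaries keyed by the column names above; the helper `split_features_target` and the sample values (including the region labels) are hypothetical and not taken from the actual dataset.

```python
# Columns kept in the matrix of features, as listed in the text.
FEATURE_COLUMNS = [
    "TrainingScore", "Geography", "Gender", "Age",
    "TrainingLevel", "HasRewardCard", "IsActiveMember",
]
TARGET_COLUMN = "Exited"


def split_features_target(rows):
    """Return (X, y): the selected feature values per row, and the target values.

    Identifier-like columns (RowNumber, EmployeeID, Surname) are left out
    simply because they are not listed in FEATURE_COLUMNS.
    """
    X = [[row[col] for col in FEATURE_COLUMNS] for row in rows]
    y = [row[TARGET_COLUMN] for row in rows]
    return X, y


# Hypothetical sample records; every value here is invented for illustration.
sample_rows = [
    {"RowNumber": 1, "EmployeeID": 1001, "Surname": "Smith",
     "TrainingScore": 619, "Geography": "RegionA", "Gender": "Female",
     "Age": 42, "TrainingLevel": 2, "HasRewardCard": 1,
     "IsActiveMember": 1, "Exited": 1},
    {"RowNumber": 2, "EmployeeID": 1002, "Surname": "Jones",
     "TrainingScore": 608, "Geography": "RegionB", "Gender": "Male",
     "Age": 41, "TrainingLevel": 1, "HasRewardCard": 0,
     "IsActiveMember": 1, "Exited": 0},
]

X, y = split_features_target(sample_rows)
print(X[0])  # feature values of the first record
print(y)     # target values
```

Keeping the column list in one constant makes it easy to see, and later change, which variables enter the model.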

Once the variables are selected, we need to check for categorical variables among them so that
they can be encoded. Two independent variables, Geography and Gender, hold their categories as
strings, so they must be encoded before being used. Since Geography takes three values, we need
to create dummy variables for it, and we must avoid the dummy variable trap by dropping one of
the dummy columns, because that column is fully determined by the others.
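
The encoding step can be sketched in plain Python. This is a minimal illustration of dummy encoding with one category dropped, not necessarily the tooling used for the actual model; the function `encode_categorical` is hypothetical, and the region labels are placeholders because the three Geography values are not named here.

```python
def encode_categorical(values):
    """Dummy-encode a list of category labels, dropping the first category.

    Returns (kept_categories, encoded_rows). The dropped category acts as
    the baseline and is represented by a row of all zeros.
    """
    categories = sorted(set(values))
    kept = categories[1:]  # drop one category to avoid the dummy variable trap
    encoded = [[1 if value == cat else 0 for cat in kept] for value in values]
    return kept, encoded


# Placeholder data: three Geography values and two Gender values.
geography = ["RegionA", "RegionC", "RegionB", "RegionA"]
gender = ["Female", "Male", "Male", "Female"]

geo_columns, geo_dummies = encode_categorical(geography)
gender_columns, gender_dummies = encode_categorical(gender)

print(geo_columns, geo_dummies)
print(gender_columns, gender_dummies)
```

With three Geography values only two dummy columns remain, and Gender collapses to a single 0/1 column. In practice a library routine is usually used instead, for example `pandas.get_dummies` with `drop_first=True`.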
