
Assignment 1: Case Discussion

This is a group assignment. Every group must solve all the cases and submit soft copies of its
presentation slides, along with the R code, by 17th January 2022 (11.00 a.m.). The cases are to
be presented on 19th January during class hours.

Group-wise allocation
Groups 2, 4, 5, 7, 9: Case 2, Case 3
Groups 1, 3, 6, 8: Case 1, Case 4

Case 1: XYZ Catalog marketing


The file Catalog Marketing.xlsx contains data on 1000 customers who purchased mail-order
products from XYZ Company in the current year. XYZ is a direct marketer of stereo equipment,
personal computers, and other electronic products. XYZ advertises entirely by mailing catalogs to
its customers, and all of its orders are taken over the telephone. The company spends a great deal
of money on its catalog mailings, and it wants to be sure that this is paying off in sales. For each
customer there are data on the following variables:

- Age: age of the customer at the end of the current year
- Gender: coded as 1 for males, 0 for females
- Own Home: coded as 1 if customer owns a home, 0 otherwise
- Married: coded as 1 if customer is currently married, 0 otherwise
- Close: coded as 1 if customer lives reasonably close to a shopping area that sells similar
merchandise, 0 otherwise
- Salary: combined annual salary of customer and spouse (if any)
- Children: number of children living with customer
- Previous Customer: coded as 1 if customer purchased from XYZ during the previous year, 0
otherwise
- Previous Spent: total amount of purchases made from XYZ during the previous year
- Catalogs: number of catalogs sent to the customer this year
- Amount Spent: total amount of purchases made from XYZ this year

Estimate and interpret a regression equation for Amount Spent using the given data.
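
One way to approach the estimation step is ordinary least squares. The sketch below is only an illustration, written in Python rather than the R the assignment asks for, and it runs on made-up numbers, not on the rows of Catalog Marketing.xlsx. The helper names `solve` and `ols`, the choice of just two predictors, and the salary unit (thousands) are all assumptions introduced here.

```python
# Minimal sketch: ordinary least squares via the normal equations, pure Python.
# The rows below are hypothetical illustration data, NOT values from the case file.

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def ols(rows, y):
    """Return [intercept, slope_1, ...] minimising the sum of squared residuals."""
    X = [[1.0] + list(r) for r in rows]  # prepend a column of ones for the intercept
    k = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)

# (catalogs, salary in thousands) for six hypothetical customers
predictors = [(6, 40), (12, 55), (18, 30), (24, 80), (6, 95), (18, 60)]
# Amounts generated exactly as 50 + 20*catalogs + 2*salary, so the fit is checkable
amount = [50 + 20 * c + 2 * s for c, s in predictors]
coefficients = ols(predictors, amount)
```

Because the illustration data were generated from a known equation, `coefficients` recovers the intercept 50 and the slopes 20 and 2 up to floating-point rounding. Each slope is read as the estimated change in Amount Spent for a one-unit increase in that predictor, holding the other predictors fixed.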

Case 2: Medical Expenses

In order for a health insurance company to make money, it needs to collect more in yearly
premiums than it spends on medical care for its beneficiaries. Consequently, insurers invest a
great deal of time and money to develop models that accurately forecast medical expenses for
the insured population. The medical_expenses.csv file includes 1,338 examples of beneficiaries
currently enrolled in the insurance plan, with features indicating characteristics of the patient
as well as the total medical expenses charged to the plan for the calendar year. The features
are:
• age: An integer indicating the age of the primary beneficiary (excluding those above 64 years,
as they are generally covered by the government).
• sex: The policy holder's gender: either male or female.
• bmi: The body mass index (BMI), which provides a sense of how over- or underweight a person
is relative to their height. BMI is equal to weight (in kilograms) divided by height (in meters)
squared. An ideal BMI is within the range of 18.5 to 24.9.
• children: An integer indicating the number of children/dependents covered by the insurance
plan.
• smoker: A yes or no categorical variable that indicates whether the insured regularly smokes
tobacco.
• region: The beneficiary’s residential area in the US: northeast, southeast, southwest, or
northwest.
• charges: Individual medical costs billed by health insurance.
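
The BMI definition in the feature list above can be checked with a short calculation. This is a minimal sketch; the function names `bmi` and `in_ideal_range` and the example weight and height are hypothetical, while the 18.5 to 24.9 range comes from the description of the bmi feature.

```python
def bmi(weight_kg, height_m):
    """Body mass index: weight in kilograms divided by height in meters squared."""
    return weight_kg / height_m ** 2

def in_ideal_range(value, low=18.5, high=24.9):
    """True when a BMI value falls inside the ideal range quoted in the case."""
    return low <= value <= high

# Hypothetical person: 70 kg and 1.75 m tall
example = bmi(70.0, 1.75)
example_is_ideal = in_ideal_range(example)
```

For this hypothetical person the BMI is 70 / 1.75², roughly 22.9, which lies inside the ideal range.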

Use patient data to predict the medical expenses.
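
Several of the features are categorical (sex, smoker, region), so a regression model needs them as 0/1 indicator variables. The sketch below shows one possible coding for a single record. The function name `dummy_code`, the choice of northeast as the baseline region, and the exact spellings of the category values are assumptions based on the feature descriptions, not on an inspection of medical_expenses.csv.

```python
# Region levels as listed in the case; the baseline level is an arbitrary choice.
REGIONS = ["northeast", "southeast", "southwest", "northwest"]

def dummy_code(record, baseline_region="northeast"):
    """Turn the categorical fields of one beneficiary record into 0/1 indicators."""
    coded = {
        "age": record["age"],
        "bmi": record["bmi"],
        "children": record["children"],
        "sex_male": 1 if record["sex"] == "male" else 0,
        "smoker_yes": 1 if record["smoker"] == "yes" else 0,
    }
    # One indicator per region except the baseline, to avoid perfect collinearity
    for region in REGIONS:
        if region != baseline_region:
            coded["region_" + region] = 1 if record["region"] == region else 0
    return coded

# Hypothetical beneficiary, not a row from the data file
example = dummy_code({
    "age": 45, "sex": "female", "bmi": 27.3,
    "children": 2, "smoker": "yes", "region": "southwest",
})
```

Leaving out one level per categorical variable keeps the indicators from summing to the intercept column; each remaining coefficient is then read relative to the omitted baseline.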

Case 3: House Sales

A real estate company wants to know what determines how quickly a house sells. It has collected
data from the last few years, including a Sale Status variable that indicates whether a house was
sold slow or fast. You are asked to analyze the data and help the company predict the Sale Status
of a house.

Dataset: HouseSell.xlsx

Variable Description

• MSSubClass: Identifies the type of dwelling involved in the sale.


o 20: 1-story 1946 & newer all styles
o 30: 1-story 1945 & older
o 40: 1-story w/finished attic all ages
o 45: 1-1/2 story - unfinished all ages
o 50: 1-1/2 story finished all ages
o 60: 2-story 1946 & newer
o 70: 2-story 1945 & older
o 75: 2-1/2 story all ages
o 80: Split or multi-level
o 85: Split foyer
o 90: Duplex - all styles and ages
o 120: 1-story pud (planned unit development) - 1946 & newer
o 150: 1-1/2 story pud - all ages
o 160: 2-story pud - 1946 & newer
o 180: pud - multilevel - incl split lev/foyer
o 190: 2 family conversion - all styles and ages

• MSZoning: Identifies the general zoning classification of the sale.


o Agriculture
o Commercial
o FV- Floating Village Residential
o Industrial
o RH- Residential High Density
o RL- Residential Low Density
o RP- Residential Low-Density Park
o RM- Residential Medium Density

• LotArea: Lot size in square feet

• Neighborhood: Physical locations within Ames city limits


o Blmngtn- Bloomington Heights
o Blueste- Bluestem
o BrDale- Briardale
o BrkSide- Brookside
o ClearCr- Clear Creek
o CollgCr- College Creek
o Crawfor- Crawford
o Edwards- Edwards
o Gilbert- Gilbert
o IDOTRR- Iowa DOT and Rail Road
o MeadowV- Meadow Village
o Mitchel- Mitchell
o NAmes- North Ames
o NoRidge- Northridge
o NPkVill- Northpark Villa
o NridgHt- Northridge Heights
o NWAmes- Northwest Ames
o OldTown- Old Town
o SWISU- South & West of Iowa State University
o Sawyer- Sawyer
o SawyerW- Sawyer West
o Somerst- Somerset
o StoneBr- Stone Brook
o Timber- Timberland
o Veenker- Veenker
• Condition1: Proximity to various conditions
o Artery- Adjacent to arterial street
o Feedr- Adjacent to feeder street
o Norm- Normal
o RRNn- Within 200' of North-South Railroad
o RRAn- Adjacent to North-South Railroad
o PosN- Near positive off-site feature--park, greenbelt, etc.
o PosA- Adjacent to positive off-site feature
o RRNe- Within 200' of East-West Railroad
o RRAe- Adjacent to East-West Railroad

• BldgType: Type of dwelling


o 1Fam- Single-family Detached
o 2FmCon- Two-family Conversion; originally built as one-family dwelling
o Duplx- Duplex
o TwnhsE- Townhouse End Unit
o TwnhsI- Townhouse Inside Unit

• HouseStyle: Style of dwelling


o 1Story- One story
o 1.5Fin- One and one-half story: 2nd level finished
o 1.5Unf- One and one-half story: 2nd level unfinished
o 2Story- Two story
o 2.5Fin- Two and one-half story: 2nd level finished
o 2.5Unf- Two and one-half story: 2nd level unfinished
o SFoyer- Split Foyer
o SLvl- Split Level

• OverallQual: Rates the overall material and finish of the house


o 10- Very Excellent
o 9- Excellent
o 8- Very Good
o 7- Good
o 6- Above Average
o 5- Average
o 4- Below Average
o 3- Fair
o 2- Poor
o 1- Very Poor

• OverallCond: Rates the overall condition of the house


o 10- Very Excellent
o 9- Excellent
o 8- Very Good
o 7- Good
o 6- Above Average
o 5- Average
o 4- Below Average
o 3- Fair
o 2- Poor
o 1- Very Poor

• YearBuilt: Original construction date

• YearRemodAdd: Remodel date (same as construction date if no remodeling or additions)

• RoofStyle: Type of roof


o Flat- Flat
o Gable- Gable
o Gambrel- Gambrel (Barn)
o Hip- Hip
o Mansard- Mansard
o Shed- Shed

• MasVnrType: Masonry veneer type


o BrkCmn- Brick Common
o BrkFace- Brick Face
o CBlock- Cinder Block
o None- None
o Stone- Stone

• MasVnrArea: Masonry veneer area in square feet

• GrLivArea: Above-ground living area in square feet

• BsmtFullBath: Basement full bathrooms

• BsmtHalfBath: Basement half bathrooms

• FullBath: Full bathrooms above ground

• HalfBath: Half baths above ground

• BedroomAbvGr: Bedrooms above ground

• KitchenAbvGr: Kitchens above ground

• GarageType: Garage location


o 2Types- More than one type of garage
o Attchd- Attached to home
o Basment- Basement Garage
o BuiltIn- Built-In (Garage part of house - typically has room above garage)
o CarPort- Car Port
o Detchd- Detached from home
o NA- No Garage

• GarageYrBlt: Year garage was built

• GarageFinish: Interior finish of the garage


o Fin- Finished
o RFn- Rough Finished
o Unf- Unfinished
o NA- No Garage

• GarageCars: Size of garage in car capacity

• GarageArea: Size of garage in square feet

• SaleCondition: Condition of sale


o Normal- Normal Sale
o Abnorml- Abnormal Sale- trade, foreclosure, short sale
o AdjLand- Adjoining Land Purchase
o Alloca- Allocation - two linked properties with separate deeds, typically condo with a garage
unit
o Family- Sale between family members
o Partial- Home was not completed when last assessed (associated with New Homes)

• SaleStatus: Whether the house was sold slow or fast (the variable to predict)
o SoldFast- House was sold within 6 months
o SoldSlow- House was sold after 6 months
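
Whichever classifier is chosen for SaleStatus, its predictions can be compared with the recorded labels. The following is a minimal sketch of a confusion count and accuracy calculation. The function names, the choice of SoldFast as the positive class, and the six label pairs are hypothetical and do not come from HouseSell.xlsx.

```python
def confusion_counts(actual, predicted, positive="SoldFast"):
    """Count true/false positives and negatives, treating `positive` as the target class."""
    counts = {"true_pos": 0, "false_pos": 0, "false_neg": 0, "true_neg": 0}
    for a, p in zip(actual, predicted):
        if p == positive:
            counts["true_pos" if a == positive else "false_pos"] += 1
        else:
            counts["false_neg" if a == positive else "true_neg"] += 1
    return counts

def accuracy(counts):
    """Share of houses whose predicted status matches the recorded status."""
    total = sum(counts.values())
    return (counts["true_pos"] + counts["true_neg"]) / total

# Hypothetical labels for six houses
actual = ["SoldFast", "SoldSlow", "SoldFast", "SoldSlow", "SoldFast", "SoldSlow"]
predicted = ["SoldFast", "SoldSlow", "SoldSlow", "SoldSlow", "SoldFast", "SoldFast"]
counts = confusion_counts(actual, predicted)
share_correct = accuracy(counts)
```

In this made-up example four of the six predictions match, so the accuracy is 4/6. Computing these counts on houses held out from model fitting gives a fairer picture than computing them on the data used to fit the model.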

Case 4: Yogurt Trier


The yogurt.xlsx file contains data on 856 people who have either tried or not tried a company’s
new frozen yogurt product. The dataset contains a categorical variable, Have Tried, and several
other variables that capture demographic information about the customers. Use the dataset to
build an appropriate model to predict whether someone will try the new frozen yogurt.
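
Because Have Tried has two outcomes, logistic regression is one candidate model. The sketch below fits a one-predictor logistic regression by gradient ascent on the log-likelihood. It is an illustration only: the predictor `score`, the 0/1 coding of Have Tried, the eight data points, and the function names are hypothetical and are not taken from yogurt.xlsx.

```python
import math

def sigmoid(z):
    """Logistic function mapping any real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(x, y, learning_rate=0.1, steps=5000):
    """Fit P(y=1) = sigmoid(b + w * (x - mean)) by gradient ascent on the log-likelihood."""
    mean_x = sum(x) / len(x)
    centered = [xi - mean_x for xi in x]  # centering keeps the updates well behaved
    w, b = 0.0, 0.0
    for _ in range(steps):
        errors = [yi - sigmoid(b + w * xi) for xi, yi in zip(centered, y)]
        w += learning_rate * sum(e * xi for e, xi in zip(errors, centered)) / len(x)
        b += learning_rate * sum(errors) / len(x)
    return w, b, mean_x

def predict_probability(model, value):
    """Estimated probability of having tried the product at a given predictor value."""
    w, b, mean_x = model
    return sigmoid(b + w * (value - mean_x))

# Hypothetical predictor and outcome (1 = tried, 0 = not tried)
score = [1, 2, 3, 4, 5, 6, 7, 8]
have_tried = [0, 0, 0, 1, 0, 1, 1, 1]
model = fit_logistic(score, have_tried)
low = predict_probability(model, 1)
high = predict_probability(model, 8)
```

With these made-up points the fitted slope is positive, so the estimated probability of trying the product rises with the predictor. A person would be classified as a likely trier when the estimated probability exceeds a chosen cutoff such as 0.5.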
