Description
Purpose
By responding to this Request for Proposal (RFP), the Proposer agrees that s/he has read and understood all documents within this RFP package.
Background
The Commercial Banking Corporation (hereafter the “Bank”), acting by and through its department of Customer Services and New Products, is seeking proposals for banking services. The Bank ultimately wants to predict which customers will buy a variable rate annuity product. Previously, the Bank sought consulting work on the same project with an additional focus on understanding the factors involved. Here the focus is more on predictive power.
A variable annuity is a contract between you and an insurance company / bank, under which the insurer agrees to make periodic payments to you, beginning either immediately or at some future date. You purchase a variable annuity contract by making either a single purchase payment or a series of purchase payments.
A variable annuity offers a range of investment options. The value of your investment as a variable annuity owner will vary depending on the performance of the investment options you choose. The investment options for a variable annuity are typically mutual funds that invest in stocks, bonds, money market instruments, or some combination of the three. If you are interested in more information, see: http://www.sec.gov/investor/pubs/varannty.htm
The project will be broken down into 3 phases:
- Phase 1 – MARS and GAMs
- Phase 2 – Tree-Based Models
- Phase 3 – Model Interpretation
Objective – Phase 1
The scope of services in this phase includes the following:
- For this phase use only the ins_t data set.
- Previous analysis has identified potential predictor variables related to the purchase of the insurance product, so no initial variable selection before model building is necessary.
- The data has missing values that need to be imputed.
  o Typically, the Bank has used median and mode imputation for continuous and categorical variables, respectively, but is open to other techniques if they are justified in the report.
- The Bank is interested in the value of the MARS algorithm.
  o Build a model using the MARS algorithm.
    § (HINT: You CANNOT just copy and paste the code from class. In class we built a model to predict a continuous variable. You will need to look up the documentation for the ‘glm =’ option.)
  o The Bank has not traditionally used CV for its model building. If you desire to, defend your choice in the report.
    § (HINT: You DO NOT need to do CV here if you don’t want to. For those interested in digging deeper, you can use the ‘trace =’, ‘nfold =’, and ‘pmethod =’ options to get a CV approach to model selection from the MARS algorithm.)
  o Report the variable importance for each of the variables in the model.
  o Report the area under the ROC curve as well as a plot of the ROC curve.
    § (HINT: Use the same approaches you used back in the logistic regression class.)
- The Bank is also interested in the value of the GAM approach to model building.
  o Build a GAM model using splines on the continuous variables.
    § (HINT: You CANNOT just copy and paste the code from class. In class we built a model to predict a continuous variable. You will need to look up the documentation for the ‘family =’ option.)
  o List the variables you chose to keep in your final GAM model and defend your reasoning.
  o Report the area under the ROC curve as well as a plot of the ROC curve.
    § (HINT: Use the same approaches you used back in the logistic regression class.)
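Both the MARS and GAM deliverables ask for the area under the ROC curve and a plot of the curve. As a rough illustration of what that metric measures, the sketch below computes ROC points and a trapezoidal AUC from 0/1 outcomes and predicted probabilities. It is deliberately library-free Python rather than the modeling software referenced in the hints, and the function names `roc_points` and `auc` and the toy values are assumptions made for this example only.

```python
def roc_points(y_true, scores):
    """Return (false positive rate, true positive rate) points of the ROC curve.

    y_true: list of 0/1 outcomes (e.g. INS); scores: predicted probabilities.
    """
    pos = sum(y_true)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        raise ValueError("ROC needs at least one positive and one negative case")

    # Sweep the classification threshold from the highest score downward.
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only once a group of tied scores has been fully counted.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC points using the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical toy predictions, not results from the Bank's data.
y = [1, 1, 0, 1, 0, 0]
p = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
curve = roc_points(y, p)
print(round(auc(curve), 3))  # 0.889
```

Plotting the second coordinate of `curve` against the first gives the ROC curve itself. Scoring the validation data rather than the training data gives a less optimistic estimate of predictive power.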
Data Provided
The following two sets of data are provided for the proposal:
- The training data set insurance_t contains 8,495 observations and selected variables.
  o All of these customers have been offered the product. The outcome is recorded in the variable INS, which takes a value of 1 if they bought and 0 if they did not buy.
  o There are selected variables describing the customer’s attributes before they were offered the new insurance product.
- The validation data set insurance_v contains 2,124 observations and selected variables.
- The table below describes the Roles and Description of the variables found in both data sets.
  o Except for Branch of Bank, consider anything with more than 10 distinct values as continuous.
Name      Model Role  Description
ACCTAGE   Input       Age of oldest account
DDA       Input       Indicator for checking account
DDABAL    Input       Checking account balance
DEP       Input       Checking deposits
DEPAMT    Input       Total amount deposited
CHECKS    Input       Number of checks written
DIRDEP    Input       Indicator for direct deposit
NSF       Input       Number of insufficient fund issues
NSFAMT    Input       Amount of NSF
PHONE     Input       Number of telephone banking interactions
TELLER    Input       Number of teller visit interactions
SAV       Input       Indicator for savings account
SAVBAL    Input       Savings account balance
ATM       Input       Indicator for ATM interaction
ATMAMT    Input       Total ATM withdrawal amount
POS       Input       Number of point of sale interactions
POSAMT    Input       Total amount for point of sale interactions
CD        Input       Indicator for certificate of deposit account
CDBAL     Input       CD balance
IRA       Input       Indicator for retirement account
IRABAL    Input       IRA balance
INV       Input       Indicator for investment account
INVBAL    Input       INV balance
MM        Input       Indicator for money market account
MMBAL     Input       MM balance
MMCRED    Input       Number of money market credits
CC        Input       Indicator for credit card
CCBAL     Input       CC balance
CCPURC    Input       Number of credit card purchases
SDB       Input       Indicator for safety deposit box
INCOME    Input       Income
LORES     Input       Length of residence in years
HMVAL     Input       Value of home
AGE       Input       Age
CRSCORE   Input       Credit score
INAREA    Input       Indicator for local address
INS       Target      Indicator for purchase of insurance product
BRANCH    Input       Branch of bank
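
The typing rule above (more than 10 distinct values means continuous, with Branch of Bank as the exception) decides which variables get median imputation and which get mode imputation. A minimal sketch of that logic follows, written in plain Python with only the standard library. The helper names `is_continuous` and `impute`, the use of `None` to mark a missing value, and the toy values are all assumptions for illustration and are not taken from the Bank's files.

```python
from statistics import median, mode


def is_continuous(name, values, threshold=10):
    """RFP rule: more than `threshold` distinct observed values means
    continuous, except BRANCH, which is always treated as categorical."""
    if name == "BRANCH":
        return False
    distinct = {v for v in values if v is not None}
    return len(distinct) > threshold


def impute(columns):
    """Return a copy of `columns` (name -> list of values) in which None is
    replaced by the median (continuous) or mode (categorical) of the
    observed values in that column."""
    filled = {}
    for name, values in columns.items():
        observed = [v for v in values if v is not None]
        if is_continuous(name, values):
            fill = median(observed)
        else:
            fill = mode(observed)
        filled[name] = [fill if v is None else v for v in values]
    return filled


# Hypothetical toy predictor columns; None marks a missing value.
toy = {
    "ACCTAGE": [0.5, 1.2, None, 3.3, 4.1, 5.0, 6.7, 7.2, 8.8, 9.9, 10.4, 12.0],
    "CC": [1, 0, 1, None, 1, 1, 0, 1, 0, 1, 1, 0],
    "BRANCH": ["B1", "B2", None, "B2", "B3", "B2", "B4", "B5", "B6", "B7", "B8", "B9"],
}
imputed = impute(toy)
print(imputed["ACCTAGE"][2], imputed["CC"][3], imputed["BRANCH"][2])  # 6.7 1 B2
```

Only predictor columns are passed in here. If the fill values are learned on the training data and then reused for the validation data, no validation information leaks into the model; this sketch does not do that step and would need a small extension to return the fill values.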





