Machine learning for classification risk

​

Machine learning techniques are often used for financial analysis and decision-making tasks such as forecasting, risk classification, estimating probabilities of default, and data mining. However, implementing and comparing different machine learning techniques to choose the best approach can be challenging. Machine learning is closely associated with non-parametric modeling techniques. The term non-parametric does not imply that such models lack parameters altogether, but rather that the number and nature of the parameters are flexible and determined from the data.

The data used in this example come from the direct marketing campaigns of a Portuguese banking institution. The campaigns were based on phone calls. Often, more than one contact with the same client was required to determine whether the product (a bank term deposit) would be subscribed to.

The classification goal is to predict whether the client will subscribe to a term deposit (variable y). The data set contains 45,211 observations on 16 attributes (features).

Attributes:

  1. age (numeric)

  2. job : type of job (categorical: "admin.","unknown","unemployed","management","housemaid","entrepreneur","student", "blue-collar","self-employed","retired","technician","services")

  3. marital : marital status (categorical: "married","divorced","single"; note: "divorced" means divorced or widowed)

  4. education (categorical: "unknown","secondary","primary","tertiary")

  5. default: has credit in default? (binary: "yes","no")

  6. balance: average yearly balance, in euros (numeric)

  7. housing: has housing loan? (binary: "yes","no")

  8. loan: has personal loan? (binary: "yes","no")

  9. contact: contact communication type (categorical: "unknown","telephone","cellular")

  10. day: last contact day of the month (numeric)

  11. month: last contact month of year (categorical: "jan", "feb", "mar", ..., "nov", "dec")

  12. duration: last contact duration, in seconds (numeric)

  13. campaign: number of contacts performed during this campaign and for this client (numeric, includes last contact)

  14. pdays: number of days that passed by after the client was last contacted from a previous campaign (numeric, -1 means client was not previously contacted)

  15. previous: number of contacts performed before this campaign and for this client (numeric)

  16. poutcome: outcome of the previous marketing campaign (categorical: "unknown","other","failure","success")

​

Output variable (desired target):

  1. y: has the client subscribed to a term deposit? (binary: "yes","no")
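
Most of the algorithms compared here expect numeric inputs, so the categorical and binary attributes listed above usually need to be encoded first. The following is a minimal sketch of one common approach (one-hot encoding), not necessarily the preprocessing used in the original analysis. The sample rows are hypothetical values invented for illustration, and only a few of the 16 attributes are shown.

```python
# Hypothetical sample rows using attribute names from the list above;
# the values are invented for illustration, not taken from the data set.
rows = [
    {"age": 58, "marital": "married", "default": "no", "balance": 2143, "y": "no"},
    {"age": 33, "marital": "single", "default": "no", "balance": 2, "y": "yes"},
    {"age": 41, "marital": "divorced", "default": "yes", "balance": -50, "y": "no"},
]

NUMERIC = ["age", "balance"]
BINARY = ["default"]  # "yes"/"no" becomes 1/0
CATEGORICAL = {"marital": ["married", "divorced", "single"]}


def encode_row(row):
    """Turn one record into a list of numbers plus a 0/1 label."""
    features = [float(row[name]) for name in NUMERIC]
    features += [1.0 if row[name] == "yes" else 0.0 for name in BINARY]
    for name, levels in CATEGORICAL.items():
        # One-hot encoding: one indicator column per category level.
        features += [1.0 if row[name] == level else 0.0 for level in levels]
    label = 1 if row["y"] == "yes" else 0
    return features, label


encoded = [encode_row(r) for r in rows]
X = [features for features, _ in encoded]
y = [label for _, label in encoded]
```

Each row of `X` then holds the numeric attributes, the binary flag, and three indicator columns for marital status, while `y` holds the target as 0 or 1.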

 

​

Cross-validation is an integral part of most machine learning workflows and can be used to compare the performance of different predictive modeling techniques. In this example, we use holdout validation: we partition the data into a training set and a test set. The training set is used to calibrate (train) the model parameters. The trained model is then used to make predictions on the test set, and the predicted values are compared with the actual data.
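
The holdout procedure described above can be sketched in a few lines of plain Python. This is an illustrative sketch only: the data are synthetic, the 70/30 split is an assumed proportion, and the "model" is a simple threshold rule standing in for a real classifier. The helper names `holdout_split` and `accuracy` are hypothetical.

```python
import random


def _take(data, idx):
    """Select the elements of data at the given positions."""
    return [data[i] for i in idx]


def holdout_split(X, y, test_fraction=0.3, seed=0):
    """Shuffle the row indices and split them into training and test sets."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    n_test = int(round(len(X) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return _take(X, train_idx), _take(X, test_idx), _take(y, train_idx), _take(y, test_idx)


def accuracy(actual, predicted):
    """Fraction of test observations whose predicted label matches the actual one."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


# Synthetic toy data (an assumption, not the bank data): label is 1 when x > 0.5.
rng = random.Random(1)
X = [[rng.random()] for _ in range(200)]
y = [1 if row[0] > 0.5 else 0 for row in X]

X_train, X_test, y_train, y_test = holdout_split(X, y)

# "Training" is a stand-in for a real model: place the threshold midway
# between the two class means observed in the training set only.
n_pos = sum(y_train)
mean1 = sum(r[0] for r, t in zip(X_train, y_train) if t == 1) / n_pos
mean0 = sum(r[0] for r, t in zip(X_train, y_train) if t == 0) / (len(y_train) - n_pos)
threshold = (mean0 + mean1) / 2

# Predict on the held-out test set and compare with the actual labels.
y_pred = [1 if row[0] > threshold else 0 for row in X_test]
test_accuracy = accuracy(y_test, y_pred)
```

The key point is that the threshold is estimated from the training rows alone, so `test_accuracy` reflects performance on data the model has not seen.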

​

The following algorithms are compared:

  1. Neural Networks

  2. Generalized Linear Model - Logistic Regression

  3. Discriminant Analysis

  4. Classification Using Nearest Neighbors

  5. Naive Bayes Classification

  6. Support Vector Machines

  7. Decision Trees

  8. Reduced TreeBagger
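
A comparison of this kind can be sketched as a loop over classifiers that are trained on the same training set and scored on the same test set. The sketch below uses scikit-learn, which is an assumption made for illustration; the name TreeBagger suggests the original analysis used MATLAB. The data are synthetic rather than the bank data, all hyperparameters are illustrative defaults, and bagged decision trees are used as a rough counterpart to the TreeBagger entry.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import BaggingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the encoded bank data (an assumption for illustration).
X, y = make_classification(n_samples=600, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

models = {
    "Neural network": MLPClassifier(hidden_layer_sizes=(10,), max_iter=2000, random_state=0),
    "Logistic regression": LogisticRegression(max_iter=1000),
    "Discriminant analysis": LinearDiscriminantAnalysis(),
    "Nearest neighbors": KNeighborsClassifier(n_neighbors=5),
    "Naive Bayes": GaussianNB(),
    "Support vector machine": SVC(),
    "Decision tree": DecisionTreeClassifier(random_state=0),
    # Bagged decision trees; BaggingClassifier uses a decision tree by default.
    "Bagged trees": BaggingClassifier(n_estimators=50, random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    results[name] = model.score(X_test, y_test)  # accuracy on the held-out test set

# Report the models from highest to lowest test accuracy.
for name, acc in sorted(results.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:<24s} {acc:.3f}")
```

Accuracy is used here only because it is the simplest score to compare. If subscribers are much rarer than non-subscribers, accuracy can look high for a model that rarely predicts "yes", so a confusion matrix or ROC curve may be more informative.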

 


CONTACT ME

© 2017 By Alessandro  Spelta. 

Alessandro Spelta

RESEARCHER & DATA ANALYST

​

Phone:

+39-3475602788

Email:

alessandro.spelta@unicatt.it

​
