ML Classification Project: Telecom Churn


Background

Many telecom companies face the prospect of customers switching over to other service providers.  This project builds a model to predict whether a customer will stay with the existing provider or is likely to move over to another one.

Step 1 : Data Sourcing and Wrangling

The data was sourced from here on Kaggle (you need to be a Kaggle member to download it).  The column headers and their descriptions are given below:

Customer ID
network_age
Customer tenure in month
Total Spend in Months 1 and 2 of 2017
Total SMS Spend
Total Data Spend
Total Data Consumption
Total Unique Calls
Total Onnet spend
Total Offnet spend
Total Call centre complaint calls
Network type subscription in Month 1
Network type subscription in Month 2
Most Loved Competitor network in in Month 1
Most Loved Competitor network in in Month 2
Churn Status

Transformations

Data transformations were applied to the Customer ID and Most Loved Competitor fields above.  Customer ID has a prefix of "ADF".  This prefix is stripped, and the rest of the value is numeric.

Most Loved Competitor holds the name of a service provider.  It is transformed as follows:

Mango 1
PQza 2
ToCall 3
Uxaa 4
Weematel 5
Zintel 6
0 7

Some rows have blanks; these are defaulted to 0, and rows that already contained 0 are recoded as 7 (hence the last mapping above).

After these two transformations, all columns are converted to a decimal (numeric) type.
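
In pandas, the two transformations plus the final cast might look roughly like this.  This is a sketch only; the file name and the exact column labels are assumptions based on the descriptions above:

import pandas as pd

# Assumed file name for the Kaggle download
df = pd.read_csv("telecom_churn.csv")

# Customer ID comes prefixed with 'ADF'; strip it to leave a number
df["Customer ID"] = df["Customer ID"].str.replace("ADF", "", regex=False)

# Encode the competitor names; blank cells default to 0 and rows that
# already held a literal 0 are recoded as 7
competitor_map = {"Mango": 1, "PQza": 2, "ToCall": 3,
                  "Uxaa": 4, "Weematel": 5, "Zintel": 6, "0": 7}

def encode_competitor(value):
    if pd.isna(value) or str(value).strip() == "":
        return 0
    return competitor_map.get(str(value).strip(), 0)

for col in ["Most Loved Competitor network in in Month 1",
            "Most Loved Competitor network in in Month 2"]:
    df[col] = df[col].apply(encode_competitor)

# After the two transformations, cast every column to a decimal type
df = df.astype(float)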

Step 2 : Data Analysis and Correlation

The correlation of every field with the "Churn Status" field was measured:

Customer ID.2 0.002494
network_age -0.12423
Customer tenure in month -0.12423
Total Spend in Months 1 and 2 of 2017 -0.02961
Total SMS Spend 0.099149
Total Data Spend 0.036429
Total Data Consumption -0.14214
Total Unique Calls -0.13405
Total Onnet spend -0.00479
Total Offnet spend 0.103442
Total Call centre complaint calls -0.07159
Network type subscription in Month 1 0.062901
Network type subscription in Month 2 0.040454
Most Loved Competitor network in in Month 1 -0.12587
Most Loved Competitor network in in Month 2 -0.20108
Churn Status 1
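
For reference, a table like the one above can be produced with a one-liner in pandas, assuming the wrangled DataFrame df from Step 1:

# Pearson correlation of every numeric field against the target
correlations = df.corr()["Churn Status"]
print(correlations)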

I ran a small experiment that included only the fields with a high positive correlation in the prediction model, to see how accurate it would be.  It didn't turn out well.  When I included all the other fields (even those with a negative correlation), the model's accuracy increased; a strong negative correlation is just as informative as a positive one, since it is the magnitude of the relationship that matters, not its sign.

Step 3 : Model Construction

I built the models both on Azure and in Python.

Azure

[Telechurn: Azure ML experiment diagram]

This time I restricted the algorithms to the Two-Class Neural Network, Two-Class Averaged Perceptron, and Two-Class Bayes Point Machine, because my earlier Titanic project showed that the other classification models on Azure were not as good.  The experiment is published in the Cortana Intelligence Gallery here.

Azure: Model Metrics

Metric            Two-Class Neural Network   Two-Class Averaged Perceptron   Two-Class Bayes Point Machine
AUC               0.988                      0.883                           0.885
Accuracy          0.94                       0.853                           0.817
Precision         0.956                      0.891                           0.865
Recall            0.935                      0.833                           0.79
True Positives    129                        115                             109
False Negatives   9                          23                              29
False Positives   6                          14                              17
True Negatives    108                        100                             97

As you can see, the Two-Class Neural Network had the best efficacy on Azure.
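
As a quick sanity check, the headline metrics can be recomputed directly from the Neural Network's confusion-matrix counts in the table above:

# Confusion-matrix counts for the Two-Class Neural Network
tp, fn, fp, tn = 129, 9, 6, 108

accuracy  = (tp + tn) / (tp + fn + fp + tn)   # 237/252 ~ 0.940
precision = tp / (tp + fp)                    # 129/135 ~ 0.956
recall    = tp / (tp + fn)                    # 129/138 ~ 0.935
print(accuracy, precision, recall)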

Python

The entire project was done in Python as well.  The algorithms used were the Decision Tree Classifier (two variants, with different hyper-parameters) and the Random Forest Classifier; a sketch of the setup follows.
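
In scikit-learn, the three models can be set up roughly as follows.  The second Decision Tree uses the hyper-parameters listed in Step 4; the first variant's settings and the train/test split parameters are assumptions:

from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# X and y come from the wrangled DataFrame built in Step 1
X = df.drop(columns=["Churn Status"])
y = df["Churn Status"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1)  # assumed split

models = {
    "Decision Tree 1": DecisionTreeClassifier(random_state=1),  # assumed defaults
    "Decision Tree 2": DecisionTreeClassifier(max_depth=10,
                                              min_samples_split=5,
                                              random_state=1),
    "Random Forest": RandomForestClassifier(random_state=1),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    print(name, model.score(X_test, y_test))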

Python: Model Metrics

Metric            Decision Tree Classifier 1   Decision Tree Classifier 2   Random Forest Classifier
Score             0.934156379                  0.991769547                  0.962962963
True Positives    531                          559                          559
False Negatives   28                           0                            0
False Positives   58                           0                            0
True Negatives    490                          548                          548

As can be seen, the second variant of the Decision Tree Classifier was found to be the best (the error-free confusion matrices for the second variant and the Random Forest suggest those counts were taken on the data the models were fit on, while Score is measured separately).  Hence this Python model was used for the final predictions.

Step 4: Making Actual Predictions

When the second variant of the Decision Tree Classifier was run on the validation dataset, the results were as below.  There was only one error (marked near the end) out of 49 records:

Actual Churn Status Predicted Churn
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
1 1
1 1
1 1
1 1
1 1
1 1
1 1
1 1
1 1
1 1
1 1
1 1
1 1
1 1
1 1
1 1
1 1
1 1
1 1
1 1
1 1
1 1
1 1
1 1
1 1
1 1
0 0
0 0
1 1
1 1
0 0
0 0
0 0
0 0
1 0   <- the single misclassification
1 1

The hyper-parameters of the second variant of the Decision Tree Classifier are as below:

max_depth = 10, min_samples_split = 5, random_state = 1
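
Scoring the validation set with that model might look like the following, continuing from the training sketch in Step 3.  The name validation_df is an assumption for the validation set, wrangled the same way as the training data:

import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# The second Decision Tree variant, with the hyper-parameters above
final_model = DecisionTreeClassifier(max_depth=10, min_samples_split=5,
                                     random_state=1)
final_model.fit(X_train, y_train)

# Predict on the validation records and compare against the actuals
predicted = final_model.predict(validation_df.drop(columns=["Churn Status"]))
comparison = pd.DataFrame({
    "Actual Churn Status": validation_df["Churn Status"].values,
    "Predicted Churn": predicted,
})
print((comparison["Actual Churn Status"] != comparison["Predicted Churn"]).sum())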

The entire project, along with the Python notebook, is here on GitHub.
