
Improving customer retention is a common request that data scientists receive. The typical approach involves four steps:
- Build a predictive machine learning model based on observed data about the customer base
- Use a post-hoc explainability method to estimate how much each feature contributes to each individual prediction
- Identify which features are most strongly associated with churn, and apply blanket policies aimed at preventing it
- Follow the required MLOps processes to ensure the model is deployed and maintained in production
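
The first three steps can be sketched in a few lines. This is a minimal illustration, not causaLens's method or any specific production pipeline. It assumes a logistic regression fitted with plain gradient descent in NumPy, and it uses that model's exact linear decomposition of the log-odds (coefficient times the feature's deviation from its mean) as a stand-in for a post-hoc explainer such as SHAP. The data and the feature names `tenure`, `support_calls` and `monthly_spend` are hypothetical. The MLOps step is out of scope.

```python
import numpy as np


def fit_logistic(X, y, lr=0.1, n_iter=2000):
    """Plain batch gradient descent on the mean log-loss (no regularisation)."""
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        grad = p - y
        w -= lr * (X.T @ grad) / n
        b -= lr * grad.mean()
    return w, b


def linear_contributions(X, w):
    """Per-prediction, per-feature contribution to the log-odds relative to the
    average customer: w_j * (x_ij - mean_j). For a linear model these sum to the
    prediction's log-odds minus the log-odds at the feature means."""
    return (X - X.mean(axis=0)) * w


# Hypothetical synthetic customer data; feature names are illustrative only.
rng = np.random.default_rng(0)
n = 5000
tenure = rng.normal(0, 1, n)
support_calls = rng.normal(0, 1, n)
monthly_spend = rng.normal(0, 1, n)  # has no effect on churn in this toy data
X = np.column_stack([tenure, support_calls, monthly_spend])
logit = -1.0 - 1.5 * tenure + 1.0 * support_calls
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(float)

# Step 1: fit a predictive model.
w, b = fit_logistic(X, y)
churn_prob = 1 / (1 + np.exp(-(X @ w + b)))

# Step 2: explain each individual prediction.
contrib = linear_contributions(X, w)

# Step 3: average the explanations over customers flagged as likely churners.
flagged = churn_prob > 0.5
names = ["tenure", "support_calls", "monthly_spend"]
summary = dict(zip(names, contrib[flagged].mean(axis=0)))
print(summary)
```

`summary` holds the average contribution of each feature among customers the model flags as likely churners; in this synthetic data, low tenure and frequent support calls should dominate. Nothing in these numbers says whether intervening on a feature would actually change churn.
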

This approach brings some statistical rigor to customer retention. However, it has some fundamental challenges. Most notably, it rests on historical, statistical associations, and it is very difficult to tell whether those associations are causal or merely spurious. For example, causaLens has seen datasets that show a positive correlation between discounts offered and churn rate. This is a counterintuitive result, and one that a domain expert is likely to question. A subsequent causal analysis of the same datasets revealed that dissatisfied customers are more likely to complain, and it is their complaints that lead to them being offered a discount. Dissatisfaction also makes those customers more likely to churn, so discounts and churn rise together even if the discount does nothing to cause churn. A correlational approach does not detect this kind of ‘confounded’ relationship.
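
The discount example can be reproduced with a toy simulation. This is a sketch under loudly stated assumptions: every probability below is made up, every complaint is assumed to trigger a discount, the discount is given a genuine protective effect of -0.10 on churn probability, and dissatisfaction is assumed to be directly observed (in practice it is often latent, which is what makes causal analysis hard). Comparing churn rates with and without a discount gives a positive difference; adjusting for dissatisfaction (a backdoor adjustment) recovers the negative effect.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200_000

# Assumed causal structure (all probabilities are made up for illustration):
#   dissatisfied -> complains -> discount
#   dissatisfied -> churn
#   discount -> churn (a genuine protective effect of -0.10)
dissatisfied = rng.random(n) < 0.30
complains = rng.random(n) < np.where(dissatisfied, 0.70, 0.05)
discount = complains  # assumption: every complaint triggers a discount offer
churn_prob = np.where(dissatisfied, 0.50, 0.15) - 0.10 * discount
churn = rng.random(n) < churn_prob


def churn_rate(mask):
    """Fraction of customers selected by the boolean mask who churned."""
    return churn[mask].mean()


# Correlational view: churn among customers with vs. without a discount.
naive_diff = churn_rate(discount) - churn_rate(~discount)

# Backdoor adjustment: compare within each dissatisfaction stratum,
# then weight each stratum by its share of the customer base.
adjusted_diff = 0.0
for stratum in (dissatisfied, ~dissatisfied):
    effect = churn_rate(discount & stratum) - churn_rate(~discount & stratum)
    adjusted_diff += stratum.mean() * effect

print(f"naive difference:    {naive_diff:+.3f}")
print(f"adjusted difference: {adjusted_diff:+.3f}")
```

With these assumed probabilities, the naive difference should come out at roughly +0.16 while the adjusted difference sits near the true -0.10: the same data supports opposite conclusions depending on whether the confounder is accounted for.
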