

Model Performance




Generalization is the ultimate goal in model performance. It refers to a model’s ability to make accurate predictions on new data, demonstrating that it has learned the underlying patterns rather than just memorizing the training examples.

Underfitting and overfitting are two fundamental problems in machine learning that describe how well a model generalizes from the data it was trained on to new, unseen data.
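A common way to quantify generalization is to compare a model's score on the data it was trained on against its score on held-out data. The sketch below is illustrative (the synthetic data and linear model are assumptions, not from the article): a small gap between the two scores suggests the model has learned the pattern rather than memorized the examples.

```python
# Illustrative sketch: measuring the generalization gap with a train/test split.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = 3.0 * X.ravel() + rng.normal(scale=2.0, size=200)  # noisy linear signal

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LinearRegression().fit(X_train, y_train)
train_score = model.score(X_train, y_train)  # R^2 on seen data
test_score = model.score(X_test, y_test)     # R^2 on unseen data

# A small gap between the two scores suggests the model generalizes well.
gap = train_score - test_score
```

Here the model matches the data-generating process, so both scores are high and close together; the sections below show what happens when the model is too simple or too flexible.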

Model performance displaying underfitting, generalization and overfitting. Source: DataInterview.com

1. Regression

Regression models predict a continuous output value (like a house’s price).

Overfitting

The model is too complex and fits the training data too closely, including the noise. This results in an excellent fit on the training data but poor performance on new data because the model has essentially “memorized” the training examples rather than learning the general trend.

For example, a sales-forecasting model that is too complex fits historical sales data too closely, including random fluctuations or one-time events like a holiday sale. It performs well on past data but fails to accurately predict future sales because it has memorized the noise, not the underlying trend. This could lead to a company overspending on advertising with the false expectation of high returns.

Generalization

The model finds the ideal balance. It is complex enough to capture the true underlying pattern in the data without being influenced by random noise. It performs well on both the training and test datasets.

The model captures the general relationship between advertising and sales. It correctly predicts that a certain increase in ad spend leads to a proportional increase in sales, but it isn't fooled by past anomalies. A generalized model gives a business a reliable tool for budget allocation and sales forecasting.

Underfitting

The model is too simple to capture the underlying patterns in the training data. This results in a poor fit on the training data and, consequently, on new data. For example, trying to fit a straight line to data that is clearly curved will lead to underfitting.

The model is too simple to capture the relationship between advertising and sales. For example, if the true relationship is non-linear (e.g., sales increase faster at higher ad spending levels), a simple linear model would fail to capture this. This could lead to a company setting an advertising budget that is too low because the model suggests diminishing returns when in reality, they are increasing.
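The three regression regimes above can be reproduced with polynomial models of varying degree. This is a hedged sketch on synthetic data (the quadratic signal and the specific degrees are assumptions for illustration): degree 1 is too simple for curved data, degree 2 matches the true pattern, and a high degree chases the noise.

```python
# Illustrative sketch: polynomial degree controls under-/over-fitting
# on curved data (quadratic signal plus noise).
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)
X = rng.uniform(-3, 3, size=(100, 1))
y = X.ravel() ** 2 + rng.normal(scale=0.5, size=100)  # curved signal + noise

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

scores = {}
for degree in (1, 2, 15):  # too simple, about right, too flexible
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_tr, y_tr)
    scores[degree] = (model.score(X_tr, y_tr), model.score(X_te, y_te))

# degree 1 underfits (poor R^2 on both sets); degree 2 scores well on
# both; degree 15 fits the training set at least as well as degree 2
# but is prone to memorizing noise.
```

The straight line (degree 1) cannot bend to follow the curve, matching the underfitting description above, while the degree-15 model has enough flexibility to trace individual noisy points.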

2. Classification

Classification models predict a discrete output class (like whether an email is spam or not).

Overfitting

The model creates a decision boundary that is overly complex and perfectly separates the training examples, even those that are noise or outliers. This leads to high accuracy on the training data but poor performance on unseen data because the model’s boundary is too specific.

A bank's fraud detection model is overfitted if it's trained on a specific set of past fraudulent transactions and becomes too rigid. It might be excellent at catching those exact types of fraud but fail to detect new, slightly different fraudulent activities. This could lead to a high number of false negatives, where new fraud goes undetected.

Generalization

The model creates a decision boundary that is smooth and captures the core separation between classes. It performs well on both the training and test sets, showing that it has learned a robust and generalizable pattern.

A well-generalized fraud detection model learns the key indicators of fraud (e.g., unusual spending patterns, location changes) rather than specific transaction details. This model is robust and can accurately identify both historical fraud and new, previously unseen fraudulent activities.

Underfitting

The model’s decision boundary is too simple and fails to separate the classes effectively, even on the training data. For instance, a linear model used to classify data that is not linearly separable will underfit.

A model is underfit if it's too simple to identify fraudulent transactions. For example, a model that only flags transactions over a certain dollar amount would be underfitted. It would miss a lot of small-dollar fraudulent transactions and falsely flag legitimate large transactions.
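For classification, decision-boundary complexity can be controlled directly, for instance via a decision tree's depth. The sketch below uses a synthetic two-class dataset (the dataset, model, and depths are illustrative assumptions): a depth-1 stump underfits, while an unlimited-depth tree carves a boundary around individual noisy points.

```python
# Illustrative sketch: decision boundary complexity via tree depth.
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=400, noise=0.3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

accs = {}
for depth in (1, 4, None):  # too shallow, balanced, unlimited
    clf = DecisionTreeClassifier(max_depth=depth, random_state=0)
    clf.fit(X_tr, y_tr)
    accs[depth] = (clf.score(X_tr, y_tr), clf.score(X_te, y_te))

# depth=1 underfits (mediocre accuracy on both sets); unlimited depth
# fits the training set almost perfectly but scores lower on the test
# set, mirroring the overfitting pattern described above.
```

The unlimited-depth tree is the classification analogue of the overly complex fraud model: it separates every training example, including the noisy ones, at the cost of performance on new data.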

3. Error Plots (Learning Curves)

An error plot, or learning curve, shows the model’s performance on the training set and a validation (or test) set as the amount of training data or the training time increases.

Overfitting

The training error is low and continues to decrease, while the validation error is high and may even start to increase after a certain point. The large gap between the two curves indicates that the model is performing well on the data it has seen but poorly on new data.

Imagine a company training a model to classify customer reviews as positive or negative. The learning curve for an overfitted model would look like this:
- Training Error Curve (Red Line): This curve would start high and quickly drop to a very low level. The training error continues to decrease, approaching zero. This indicates the model is becoming very good at fitting the data it has already seen, effectively memorizing the training examples.
- Test Error Curve (Blue Line): This curve also starts high but decreases for a while before bottoming out and then potentially starting to rise again. The key characteristic is the significant and growing gap between the training and test errors. This gap shows the model's poor performance on new data, a classic sign of overfitting. The model has learned the training data's noise and quirks, not the generalizable patterns.

Generalization

The training and validation error curves converge and both are low. This indicates that the model is learning the underlying patterns and performing well on both the training and unseen data, achieving a good balance.

A financial institution is training a model to detect fraudulent transactions. A learning curve that indicates good generalization would be:
- Training Error Curve (Red Line): This curve starts high and decreases as the model trains, eventually leveling off at a low, stable value.
- Test Error Curve (Blue Line): This curve starts high but decreases in tandem with the training error curve. The two curves converge and stay close together at a low error rate. This is the ideal scenario. The model has learned a robust, generalizable pattern from the training data and is performing consistently well on unseen data. This indicates the model is reliable for real-world application.

Underfitting

Both the training and validation errors are high and have a small gap between them. This shows that the model isn’t complex enough to learn from the training data, and its poor performance is consistent across both seen and unseen examples.

A hospital wants to build a simple model to predict patient readmission risk. The learning curve for an underfitted model would show:
- Training Error Curve (Red Line): This curve would stay high and flat, even as more data is added. The model simply isn't complex enough to learn the underlying relationships in the data.
- Test Error Curve (Blue Line): This curve would also be high and flat, very close to the training error curve. There is little to no gap between the two. The model is consistently performing poorly on both the training and validation data, indicating it lacks the capacity to capture the problem's complexity. A more sophisticated model or additional features are needed.
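The curves described above can be computed with scikit-learn's `learning_curve` helper. The sketch below is illustrative (the breast-cancer dataset and logistic regression model are assumptions, not from the article); it produces the mean training and validation error at several training-set sizes, which is exactly what the red and blue lines in these descriptions plot.

```python
# Illustrative sketch: computing a learning curve with scikit-learn.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve

X, y = load_breast_cancer(return_X_y=True)

sizes, train_scores, val_scores = learning_curve(
    LogisticRegression(max_iter=5000), X, y,
    train_sizes=np.linspace(0.1, 1.0, 5),
    cv=5, shuffle=True, random_state=0,
)

train_err = 1 - train_scores.mean(axis=1)  # the "red line"
val_err = 1 - val_scores.mean(axis=1)      # the "blue line"

# For a well-generalizing model, the two error curves converge at a
# low value; a large persistent gap signals overfitting, while two
# high flat curves signal underfitting.
gap = val_err[-1] - train_err[-1]
```

Plotting `train_err` and `val_err` against `sizes` reproduces the diagnostic pictures described above, and the final gap gives a single number to monitor.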