Machine learning models make predictions. Those predictions are never perfect. The question every business should ask is not "are we wrong?" — all models are wrong sometimes — but "how wrong are we, and in which direction does that matter?" RMSE gives you a direct, unit-native answer to the first part of that question.

This dispatch demystifies RMSE from the ground up: what it is, how it is computed, why squaring errors is a deliberate design choice, and how to interpret a number that your data team will produce without always explaining. By the end, you will know exactly what to ask.

// 01 What RMSE Actually Is

RMSE stands for Root Mean Square Error. Strip the acronym and you get a plain-language description: the average size of your model’s mistakes, expressed in the same units as whatever you are predicting.

This unit-preservation matters enormously. If your model predicts monthly revenue in Moroccan dirhams, RMSE is in dirhams. If it predicts delivery time in hours, RMSE is in hours. If it predicts product demand in units, RMSE is in units. You do not need to decode a dimensionless ratio or a percentage — the number speaks the same language as your operations.

KEY_INSIGHT_01

If your revenue prediction model has an RMSE of 12,000 MAD, it means the model’s predictions are off by an average of 12,000 MAD. That context is everything.

That single sentence — “the predictions are off by an average of 12,000 MAD” — is something a CFO can evaluate. Is 12,000 MAD a significant error in the context of your business? If you are predicting 2,000,000 MAD revenue months, probably not. If you are predicting 15,000 MAD orders, it is a fatal flaw. The number becomes a business decision once you anchor it to the scale of what you are measuring.

// 02 The Formula in Plain Language

RMSE is defined as the square root of the average of squared errors. Breaking that down into four concrete steps makes it mechanical:

  • STEP 1 — For each prediction, calculate the error: (actual value − predicted value)
  • STEP 2 — Square each of those errors to make them positive and amplify large ones
  • STEP 3 — Average all the squared errors across every prediction in your dataset
  • STEP 4 — Take the square root to bring the units back to the original scale
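The four steps translate line-for-line into plain Python (the revenue figures below are invented purely for illustration):

```python
# Hypothetical actual vs predicted monthly revenue (MAD)
actuals = [52000, 48000, 61000, 55000]
preds   = [50000, 51000, 58000, 54000]

# STEP 1: signed errors (actual - predicted)
errors = [a - p for a, p in zip(actuals, preds)]

# STEP 2: square each error (all positive, large ones amplified)
squared = [e ** 2 for e in errors]

# STEP 3: average the squared errors
mean_squared = sum(squared) / len(squared)

# STEP 4: square root brings the units back to MAD
rmse = mean_squared ** 0.5
print(round(rmse, 2))  # ≈ 2397.92 MAD
```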

In practice, no one writes this by hand. Every major ML library implements it in one line. In Python with scikit-learn, the implementation is as follows:

Metric // compute_rmse.py
from sklearn.metrics import mean_squared_error
import numpy as np

# y_true = array of actual values
# y_pred = array of model predictions
rmse = np.sqrt(mean_squared_error(y_true, y_pred))

# scikit-learn 0.22+ also accepts squared=False directly:
rmse = mean_squared_error(y_true, y_pred, squared=False)

# In scikit-learn 1.4+, squared=False is deprecated in favour of:
# from sklearn.metrics import root_mean_squared_error
# rmse = root_mean_squared_error(y_true, y_pred)

# Example output interpretation:
# rmse = 12000.0  -->  off by ~12,000 MAD on average

The mathematical expression your data team will write is: RMSE = √( (1/n) × Σ(yᵢ − ŷᵢ)² ) — where yᵢ is the actual value, ŷᵢ is the predicted value, and n is the number of observations. The four steps above are a direct translation of that formula into procedure.

// 03 Why Squaring Matters

The squaring step is not arbitrary. It is the defining design choice that gives RMSE its character — and its business relevance. Squaring errors has two effects: it makes all errors positive (so under-predictions and over-predictions do not cancel each other out), and more importantly, it penalises large errors disproportionately more than small ones.

Consider two scenarios for a demand forecasting model. In Scenario A, you make one prediction that is off by 5,000 units. In Scenario B, you make five predictions each off by 1,000 units. The total absolute error is the same: 5,000 units. But after squaring, Scenario A contributes 25,000,000 to the sum; Scenario B contributes 5 × 1,000,000 = 5,000,000. RMSE treats Scenario A as significantly worse.
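To make the arithmetic concrete, assume five predictions in each scenario (four of them perfect in Scenario A) so the comparison is like-for-like:

```python
import math

def rmse(errors):
    return math.sqrt(sum(e ** 2 for e in errors) / len(errors))

scenario_a = [5000, 0, 0, 0, 0]              # one catastrophic miss
scenario_b = [1000, 1000, 1000, 1000, 1000]  # five small misses

print(rmse(scenario_a))  # ≈ 2236.1 — the single big miss dominates
print(rmse(scenario_b))  # 1000.0
```

Same total absolute error, yet Scenario A's RMSE is more than double Scenario B's — exactly the outlier sensitivity described above.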

This is intentional, and it maps well onto business risk. A single catastrophic prediction — massively under-ordering for a peak season, wildly overestimating cash flow for a budget cycle — is operationally more damaging than several minor misses. RMSE’s sensitivity to outliers encodes that business reality directly into the metric.

“RMSE does not care that you got most predictions right. It punishes the outlier mistakes hardest. In a business context, that is exactly the right behaviour.”

The direct consequence for model selection: if you are optimising for a context where extreme errors are catastrophic — financial forecasting, supply chain planning, safety-critical scheduling — RMSE is the right metric to minimise. If your operational context is equally sensitive to all errors regardless of magnitude, there are alternatives worth considering.

// 04 RMSE vs Other Metrics

RMSE does not exist in isolation. It is one member of a small family of regression metrics, each measuring a different facet of model error. Understanding the differences allows you to have a more precise conversation with your data team about what, specifically, is being optimised and why.

Reference // Metrics Comparison Table
METRIC | WHAT IT TELLS YOU                         | WHEN TO USE IT
-------+-------------------------------------------+-------------------------------
MAE    | Average absolute error, same units.       | Equal-weight errors matter.
       | Simpler to interpret. Less                | Robust, median-like behaviour.
       | outlier-sensitive.                        |
-------+-------------------------------------------+-------------------------------
RMSE   | Average error, same units, outlier-heavy. | Large errors are catastrophic.
       | Penalises big mistakes harder than small. | Finance, supply chain, ops.
-------+-------------------------------------------+-------------------------------
R²     | % of variance in target explained by      | Benchmarking model quality
       | model. 1.0 = perfect. 0.0 = baseline avg. | vs. a naive baseline.

MAE (Mean Absolute Error) is conceptually simpler: it is the plain average of how far off predictions are, without squaring. A model with MAE = 8,000 MAD is, on average, 8,000 MAD from the truth — no amplification of outliers. If your errors are all roughly equal in severity, MAE is the right summary. If some errors are structurally more dangerous than others, RMSE gives a more honest picture.

R² answers a different question entirely. It tells you not how big the errors are, but how much of the variation in the real-world outcome your model explains. An R² of 0.85 means the model accounts for 85% of why revenue fluctuates across months. It is useful for benchmarking, but it does not tell you the error in business units. You need both RMSE and R² together for a complete evaluation picture.
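To see how the three metrics relate on the same data, here is a small side-by-side computation in NumPy (the revenue numbers are invented for illustration; note that RMSE is always at least as large as MAE):

```python
import numpy as np

# Hypothetical actual vs predicted monthly revenue (MAD)
y_true = np.array([52000., 48000., 61000., 55000., 47000.])
y_pred = np.array([50000., 51000., 58000., 54000., 49000.])

errors = y_true - y_pred

mae = np.mean(np.abs(errors))        # plain average miss: 2200 MAD
rmse = np.sqrt(np.mean(errors**2))   # outlier-weighted miss: ~2324 MAD

# R²: share of the target's variance the model explains
ss_res = np.sum(errors**2)
ss_tot = np.sum((y_true - y_true.mean())**2)
r2 = 1 - ss_res / ss_tot             # ~0.79 here

print(mae, rmse, r2)
```

RMSE and MAE answer "how big are the errors, in MAD?"; R² answers "how much better than guessing the average is this model?" — which is why the two are reported together.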

// 05 What a Good RMSE Looks Like

There is no universal threshold for a “good” RMSE. The number is only meaningful in relation to the scale of what you are predicting. This is where many business reviews go wrong — a data team reports an RMSE in isolation and management either panics or celebrates without the necessary reference point.

The correct framing is always relative. A 500 MAD RMSE on a model predicting average orders of 50,000 MAD represents a 1% error rate — a model that is performing excellently and likely better than any human estimate. The exact same 500 MAD RMSE on a model predicting average orders of 1,000 MAD represents a 50% error rate — a model that is effectively useless and should not be deployed.

  • RMSE = 500 MAD on 50,000 MAD average → 1% relative error → EXCELLENT
  • RMSE = 500 MAD on 5,000 MAD average → 10% relative error → ACCEPTABLE (context-dependent)
  • RMSE = 500 MAD on 1,000 MAD average → 50% relative error → MODEL IS UNRELIABLE

The practical ratio to calculate is RMSE divided by the mean of the target variable — sometimes called the Coefficient of Variation of RMSE (CV-RMSE). If this ratio is under 10%, your model is performing well for most business applications. Between 10% and 20%, it is adequate for some decisions and weak for others. Above 20%, and certainly above 30%, you likely need to revisit the model before relying on it for operational choices.
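The ratio check fits in a couple of lines (a sketch; the 500 MAD RMSE and the averages mirror the bullet list above):

```python
def cv_rmse(rmse, target_mean):
    """RMSE relative to the average target value (CV-RMSE)."""
    return rmse / target_mean

# The three scenarios from the examples above
for mean in (50000, 5000, 1000):
    print(f"RMSE 500 on avg {mean:>6}: {cv_rmse(500, mean):.0%}")
# → 1%, 10%, 50%
```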

BUSINESS_TAKEAWAY

When your data team presents RMSE, always ask for the average value of the target variable in the same breath. Never evaluate RMSE in isolation. The ratio is the metric that matters for business decisions.

The final practical instruction is simple: when your data team presents model performance, ask for RMSE and the average value it applies to. Ask for both the training RMSE and the test RMSE (a large gap between the two is a signal of overfitting). And if you are comparing two models, the one with the lower RMSE on held-out test data is the one making smaller mistakes on data it has never seen — which is the only kind of prediction that matters in production.
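A minimal sketch of that train/test comparison, using synthetic data and a NumPy polynomial fit in place of a real pipeline (all numbers here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 200)
y = 3 * x + 5 + rng.normal(0, 2, 200)   # true relationship plus noise (std = 2)

# Hold out the last 50 points as a test set the model never sees
x_train, x_test = x[:150], x[150:]
y_train, y_test = y[:150], y[150:]

def rmse(y_true, y_hat):
    return np.sqrt(np.mean((y_true - y_hat) ** 2))

coeffs = np.polyfit(x_train, y_train, deg=1)  # fit on training data only
train_rmse = rmse(y_train, np.polyval(coeffs, x_train))
test_rmse = rmse(y_test, np.polyval(coeffs, x_test))

# A test RMSE far above the training RMSE is the classic overfitting signal;
# this model is simple, so the two stay close (both near 2, the noise level)
print(train_rmse, test_rmse)
```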