What is a cost function in machine learning?
A cost function in machine learning is a mathematical formula that measures how wrong a model’s predictions are compared to the true values, usually as a single number that we try to minimize.
What is a cost function?
Think of it as a “badness score” for your model: higher cost means worse predictions, lower cost means better ones.
During training, the learning algorithm keeps adjusting model parameters (like weights) to reduce this score, step by step.
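Here is one way the "step by step" idea can look in practice. This is a minimal sketch of gradient descent on a hypothetical one-weight model ŷ = w·x, using MSE as the cost. The toy data, learning rate, and step count are assumptions chosen for illustration and do not come from any real model.

```python
# Minimal sketch: gradient descent on a one-weight linear model y_hat = w * x.
# The data, learning rate, and step count are hypothetical, chosen for illustration.

def mse(w, xs, ys):
    """Mean squared error of the model y_hat = w * x."""
    n = len(xs)
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / n

def mse_gradient(w, xs, ys):
    """Derivative of the MSE with respect to w: (2/n) * sum((w*x - y) * x)."""
    n = len(xs)
    return 2 * sum((w * x - y) * x for x, y in zip(xs, ys)) / n

def train(xs, ys, w=0.0, learning_rate=0.05, steps=20):
    """Repeatedly move w against the gradient and record the cost after each step."""
    history = [mse(w, xs, ys)]
    for _ in range(steps):
        w -= learning_rate * mse_gradient(w, xs, ys)
        history.append(mse(w, xs, ys))
    return w, history

if __name__ == "__main__":
    xs = [1.0, 2.0, 3.0, 4.0]
    ys = [2.0, 4.0, 6.0, 8.0]  # generated by y = 2x, so the ideal weight is 2
    w, history = train(xs, ys)
    print(f"learned w = {w:.4f}")
    for step, cost in enumerate(history[:5]):
        print(f"step {step}: cost = {cost:.4f}")
```

Each update moves w against the slope of the cost, so on this data the recorded cost shrinks at every step while w approaches 2.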
In supervised learning:
- For regression, it measures how far predicted numbers are from actual numbers.
- For classification, it measures how far predicted probabilities are from the true class labels.
Why is it important?
The cost function is the objective the training algorithm optimizes. If you change the cost function, you change what “good performance” means for the model.
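A small experiment shows how the choice of cost redefines "good." The sketch below fits the simplest possible model, a single constant prediction, to the same data under two costs: MSE and mean absolute error (MAE, which the text does not list). The data points are hypothetical, with 10.0 acting as an outlier, and the grid search is deliberately crude so that no calculus is needed.

```python
# Minimal sketch: the same data, two cost functions, two different "best" answers.
# The data points are hypothetical; 10.0 acts as an outlier.
from statistics import mean, median

def mse(c, ys):
    """Mean squared error of always predicting the constant c."""
    return sum((y - c) ** 2 for y in ys) / len(ys)

def mae(c, ys):
    """Mean absolute error of always predicting the constant c."""
    return sum(abs(y - c) for y in ys) / len(ys)

ys = [1.0, 2.0, 3.0, 10.0]
candidates = [i / 100 for i in range(0, 1101)]  # constants 0.00 .. 11.00

best_for_mse = min(candidates, key=lambda c: mse(c, ys))
best_for_mae = min(candidates, key=lambda c: mae(c, ys))

print("mean:", mean(ys), "| best constant under MSE:", best_for_mse)
print("median:", median(ys), "| best constant under MAE:", best_for_mae)
```

Under MSE the best constant is the mean (4.0), which the outlier pulls upward. Under MAE, every value from 2 to 3 (including the median, 2.5) gives the same minimal cost, so the grid search may return any point in that range. Same data, same model, different cost, different answer.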
It helps you:
- Quantify model error as a single scalar value.
- Compare different models or parameter settings objectively.
- Give optimization methods like gradient descent a direction to move: parameters are updated along the negative gradient of the cost.
Common examples
Some classic cost functions:
- Mean Squared Error (MSE) – used in regression, averages the squared difference between actual $y_i$ and predicted $\hat{y}_i$.
- Binary cross-entropy (log loss) – used in binary classification, penalizes confident wrong predictions heavily.
- Multiclass (categorical) cross-entropy – used when each example belongs to one of several classes (e.g., image classification with many labels).
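The three cost functions above fit in a few lines of plain Python. This is a minimal sketch, not any library's API: the function names and the `EPS` clipping constant are hypothetical choices, and the multiclass version assumes labels are given as class indices.

```python
import math

EPS = 1e-12  # keeps probabilities away from 0 and 1 so log() never sees 0

def mean_squared_error(y_true, y_pred):
    """MSE = (1/n) * sum((y_i - y_hat_i)^2)."""
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)

def binary_cross_entropy(y_true, p_pred):
    """Log loss for labels in {0, 1} and predicted probabilities of class 1."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, EPS), 1 - EPS)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

def categorical_cross_entropy(y_true, p_pred):
    """Multiclass cross-entropy: y_true holds class indices,
    p_pred holds one list of class probabilities per example."""
    total = 0.0
    for label, probs in zip(y_true, p_pred):
        total += -math.log(max(probs[label], EPS))  # only the true class's probability counts
    return total / len(y_true)

if __name__ == "__main__":
    # One positive example (label 1), scored by a fairly confident correct
    # prediction and by a confidently wrong one.
    print("confident and right:", binary_cross_entropy([1], [0.9]))   # about 0.105
    print("confident and wrong:", binary_cross_entropy([1], [0.01]))  # about 4.605
```

The two printed values illustrate the "penalizes confident wrong predictions heavily" point: assigning only 1% probability to the true class costs roughly 44 times more than assigning it 90%.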
A simple illustration:
If a house-price model predicts 300k but the true price is 320k, the cost function turns that error (20k) into a numeric penalty that contributes to the total cost over all training examples.
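Here is the same arithmetic worked out, assuming squared error as the penalty. The 300k/320k pair comes from the illustration above. The other two pairs are hypothetical and are included only to show how per-example penalties are averaged into one total cost.

```python
# Prices in thousands of dollars. The first pair is the example from the text;
# the other two pairs are hypothetical, added only to show the averaging step.
true_prices = [320.0, 250.0, 410.0]
predicted_prices = [300.0, 260.0, 400.0]

# Per-example penalty under squared error: (true - predicted)^2
squared_errors = [(t - p) ** 2 for t, p in zip(true_prices, predicted_prices)]

# Total cost as the mean of the per-example penalties (MSE)
mse = sum(squared_errors) / len(squared_errors)

print("per-example squared errors:", squared_errors)  # [400.0, 100.0, 100.0]
print("MSE over all examples:", mse)                  # 200.0
```

Because prices are in thousands, the squared errors are in (thousand dollars)², so the 400 for the 20k miss corresponds to 400,000,000 squared dollars. Taking the square root of the MSE (RMSE, about 14.1k here) brings the number back to the original units.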
Quick HTML table view
Here’s a compact HTML table summarizing key points:
<table>
<tr>
<th>Aspect</th>
<th>Explanation</th>
</tr>
<tr>
<td>Definition</td>
<td>Measures the error between predicted and actual values as a single numeric score to minimize.</td>
</tr>
<tr>
<td>Role in training</td>
<td>Guides optimization algorithms (e.g., gradient descent) to update model parameters.</td>
</tr>
<tr>
<td>Regression example</td>
<td>Mean Squared Error (MSE) averages squared differences: large errors get penalized more.</td>
</tr>
<tr>
<td>Classification example</td>
<td>Cross-entropy (log loss) penalizes wrong, overconfident predictions strongly.</td>
</tr>
<tr>
<td>Why it matters</td>
<td>Defines what “good performance” means; changing it changes the model’s behavior and priorities.</td>
</tr>
</table>
Mini story to remember it
Imagine training a rookie archer: every time they shoot an arrow, you measure how far it lands from the bullseye. That distance score is like the cost function for each shot. The archer keeps adjusting their stance to reduce that score over many attempts, just like a model adjusting its parameters to reduce cost over training data.
TL;DR: A cost function in machine learning is a numerical measure of how bad your model’s predictions are, and training is the process of adjusting parameters to make that number as small as possible.
Information gathered from public forums or data available on the internet and portrayed here.