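The upper bound of the (non‑log) likelihood in logistic classification is 1. The likelihood can only approach this value in the degenerate case where the model predicts every observed label with probability 1 (perfect classification). For finite parameters, the logistic function never outputs exactly 0 or 1. The bound is therefore a supremum rather than a value the model actually reaches.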

Intuition

In logistic regression, the model outputs a probability $p_i = P(y_i = 1 \mid x_i)$ for each data point.

  • The likelihood is the product, over all data points, of the probability the model assigns to the outcome that was actually observed:

$$L(\theta) = \prod_i p_i^{y_i} (1 - p_i)^{1 - y_i}$$

Each factor equals $p_i$ when $y_i = 1$ and $1 - p_i$ when $y_i = 0$. The likelihood is therefore a product of numbers between 0 and 1, so $0 \le L(\theta) \le 1$.

  • $L(\theta) = 1$ only if every factor in the product equals 1. That requires the model to assign probability exactly 1 to the observed class of every data point, and therefore exactly 0 to the other class.
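The following minimal Python sketch evaluates $L(\theta)$ directly from predicted probabilities to make the formula concrete. It assumes the standard logistic link $p_i = \sigma(b + w x_i)$ with one feature plus an intercept. The toy data, the parameter values, and the helper names `sigmoid` and `likelihood` are hypothetical and serve only as illustration.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic function: maps a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def likelihood(probs, labels):
    """L(theta) = prod_i p_i^{y_i} * (1 - p_i)^{1 - y_i}, for labels y_i in {0, 1}."""
    result = 1.0
    for p, y in zip(probs, labels):
        # The factor reduces to p when y == 1 and to (1 - p) when y == 0.
        result *= p ** y * (1.0 - p) ** (1 - y)
    return result

# Hypothetical toy data: one feature x_i plus an intercept, theta = (b, w).
xs = [-2.0, -1.0, 1.0, 2.0]
ys = [0, 0, 1, 1]
b, w = 0.0, 1.5

# p_i = P(y_i = 1 | x_i) under the logistic model.
probs = [sigmoid(b + w * x) for x in xs]
L = likelihood(probs, ys)
print(f"L(theta) = {L:.4f}")  # a product of factors in (0, 1), so 0 < L < 1
```

Probabilities that match every label exactly, such as `likelihood([0.0, 0.0, 1.0, 1.0], ys)`, give exactly 1.0. A single confident mistake, such as `likelihood([1.0, 0.0, 1.0, 1.0], ys)`, gives 0.0. These are the two ends of the bound, and the first one needs probabilities that the logistic function never produces for finite parameters.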

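In realistic datasets, even a very well‑fitting logistic regression model has a likelihood strictly less than 1. The logistic function never outputs exactly 0 or 1 for finite inputs. Also, when the classes overlap (noisy labels), no parameter setting can be both confident and correct on every point.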
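A deliberately extreme, hypothetical case shows the effect of noise: two observations with the same feature value but opposite labels. Every parameter setting gives both points the same $p$, so $L = p(1 - p) \le 1/4$ no matter how the model is fit. The sketch below assumes the same one-feature-plus-intercept logistic model and checks this bound with a coarse grid search over $(b, w)$. The grid range and step are arbitrary illustrative choices.

```python
import math

def sigmoid(z):
    """Logistic function: maps a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def likelihood(theta, xs, ys):
    """L(theta) = prod_i p_i^{y_i} (1 - p_i)^{1 - y_i}, with p_i = sigmoid(b + w * x_i)."""
    b, w = theta
    L = 1.0
    for x, y in zip(xs, ys):
        p = sigmoid(b + w * x)
        L *= p ** y * (1.0 - p) ** (1 - y)
    return L

# Hypothetical "noisy" data: the same feature value observed with both labels.
xs = [1.0, 1.0]
ys = [0, 1]

# Coarse grid over (bias, weight) in [-5, 5] with step 0.1. Every theta gives
# both points the same p, so L = p * (1 - p), which cannot exceed 1/4.
grid = [i / 10 for i in range(-50, 51)]
best = max(likelihood((b, w), xs, ys) for b in grid for w in grid)
print(best)
```

The search peaks at 0.25, reached when $p = 0.5$ (any $b + w = 0$). That value is the maximum likelihood for this dataset, and it is far below 1.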

Is that upper bound attainable in practice?

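  • Theoretical answer: 1 is the least upper bound (supremum). On perfectly separable data, letting the weights grow without bound pushes the likelihood arbitrarily close to 1. No finite parameter vector reaches it exactly.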
  • Practical answer: Almost never. Reaching the bound would mean a model that perfectly “explains” the data with absolute certainty. That is extremely unlikely outside of contrived examples or perfectly separable data. On separable data, maximum‑likelihood fitting drives the parameters toward infinity instead of converging.
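A minimal sketch can illustrate the separable case from the bullets above. It uses the same hypothetical one-feature logistic model with a fixed intercept of 0. On perfectly separable toy data, scaling up the weight increases $L(\theta)$ monotonically toward 1 without ever reaching it. This is why maximum-likelihood fitting on such data has no finite optimum.

```python
import math

def sigmoid(z):
    """Logistic function; strictly inside (0, 1) for every finite z in exact arithmetic."""
    return 1.0 / (1.0 + math.exp(-z))

def likelihood(theta, xs, ys):
    """L(theta) = prod_i p_i^{y_i} (1 - p_i)^{1 - y_i}, with p_i = sigmoid(b + w * x_i)."""
    b, w = theta
    L = 1.0
    for x, y in zip(xs, ys):
        p = sigmoid(b + w * x)
        L *= p ** y * (1.0 - p) ** (1 - y)
    return L

# Hypothetical, perfectly separable data: negatives at x < 0, positives at x > 0.
xs = [-2.0, -1.0, 1.0, 2.0]
ys = [0, 0, 1, 1]

# Scale up the weight along the separating direction; the intercept stays 0.
scales = [1.0, 2.0, 4.0, 8.0, 16.0]
values = [likelihood((0.0, c), xs, ys) for c in scales]
for c, L in zip(scales, values):
    # L rises toward 1 as c grows but stays strictly below it.
    print(f"w = {c:5.1f}  L = {L:.12f}")
```

Pushing the scale much further, for example `likelihood((0.0, 40.0), xs, ys)`, makes every factor round to exactly 1.0 in float64. A reported likelihood of 1.0 in such a run is therefore a floating-point rounding artifact, not the bound actually being attained.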

So for the usual “maximum likelihood” fit in logistic classification, you are always trying to push the likelihood up toward 1. In real applications, you only get as high as the data and model allow, not all the way to the bound.
