The upper bound of the (joint) likelihood in logistic classification with maximum likelihood is 1. This value can be approached arbitrarily closely only in special, usually unrealistic cases (perfectly separable data), and it is never exactly attained with finite parameters.

Intuition in plain language

In binary logistic regression, each data point is modeled as a Bernoulli random variable: for input $x_i$, the model outputs a probability $p_i = P(y_i = 1 \mid x_i, \theta)$, and the probability of the observed label $y_i \in \{0,1\}$ is

$$P(y_i \mid x_i, \theta) = \begin{cases} p_i & \text{if } y_i = 1, \\ 1 - p_i & \text{if } y_i = 0. \end{cases}$$

The likelihood of the whole dataset is the product of these probabilities over all samples, so it is always a number in $(0,1]$.

  • Because each factor is at most 1, and the likelihood is a product of such factors, the maximum possible value of the likelihood is 1.
  • To actually get likelihood $L = 1$, the model would have to assign probability exactly 1 to every observed label (and 0 to the opposite label) for all samples simultaneously.
  • With a logistic (sigmoid) model, that would require probabilities of exactly 0 or 1, which the sigmoid reaches only in the limit of infinitely large weights (magnitudes going to $\pm\infty$). Even that limit drives every factor to 1 only when the data are perfectly (linearly) separable.
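
As a rough illustration of the product structure described above, here is a minimal sketch that computes the per-sample factors and the joint likelihood for a tiny one-dimensional dataset. The dataset, the parameter values `w = 1.5` and `b = 0.0`, and the helper name `label_probabilities` are all made up for this example.

```python
import math

def sigmoid(z):
    # Logistic function; fine for the moderate values of z used here.
    return 1.0 / (1.0 + math.exp(-z))

def label_probabilities(w, b, xs, ys):
    # Probability the model assigns to each *observed* label:
    # p_i if y_i == 1, and 1 - p_i if y_i == 0.
    probs = []
    for x, y in zip(xs, ys):
        p = sigmoid(w * x + b)
        probs.append(p if y == 1 else 1.0 - p)
    return probs

# Hypothetical toy data (not from the discussion above).
xs = [-2.0, -1.0, 1.0, 2.0]
ys = [0, 0, 1, 1]

factors = label_probabilities(1.5, 0.0, xs, ys)
L = math.prod(factors)  # joint likelihood = product of the factors
print(factors, L)
```

Every factor lies strictly between 0 and 1, so the product does too, and it can never exceed the smallest factor.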

So:

  • Upper bound of the likelihood: 1.
  • Is it attainable?
    • In a strict finite-parameter logistic model, the sigmoid never outputs exactly 0 or 1 for finite weights, so $L = 1$ is not attained; you only approach it as parameters go to infinity.
    • In practice, with real data and regularization, the maximum likelihood will be **strictly less than 1**, and that is the value your optimization will converge to.
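
To make the "strictly less than 1" case concrete, the sketch below uses a made-up dataset with overlapping classes and scans a grid of weights for the one with the highest likelihood. It assumes a one-parameter model $p = \sigma(w x)$ with no intercept and no regularization, and uses a simple grid search rather than a real optimizer. It only illustrates the effect of non-separable data.

```python
import math

def softplus(t):
    # Numerically stable log(1 + exp(t)).
    return max(t, 0.0) + math.log1p(math.exp(-abs(t)))

def log_likelihood(w, xs, ys):
    # Assumed model for this sketch: p = sigmoid(w * x), no intercept.
    # log p(y | x) = -softplus(-s * w * x) with s = +1 for y = 1, -1 for y = 0.
    total = 0.0
    for x, y in zip(xs, ys):
        sign = 1.0 if y == 1 else -1.0
        total -= softplus(-sign * w * x)
    return total

# Hypothetical toy data: x = 1.0 appears with both labels, so no
# weight can classify every point correctly (not separable).
xs = [-2.0, -1.0, 1.0, 2.0, 1.0]
ys = [0, 0, 1, 1, 0]

# Crude grid search over w in [0, 10] with step 0.01.
grid = [i / 100.0 for i in range(0, 1001)]
best_w = max(grid, key=lambda w: log_likelihood(w, xs, ys))
best_L = math.exp(log_likelihood(best_w, xs, ys))
print(best_w, best_L)
```

Here the best weight sits in the interior of the grid and the corresponding likelihood is well below 1: pushing the weight higher makes the conflicting point increasingly costly.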

Quick numeric story

Imagine 3 data points, and your model predicts:

  • For a positive example, $p = 0.99$.
  • For another positive, $p = 0.98$.
  • For a negative example, $p = 0.03$, so the probability assigned to the observed (negative) label is $1 - p = 0.97$.

Then the likelihood is

$$L = 0.99 \times 0.98 \times 0.97 \approx 0.941.$$

This is less than 1 but still high, indicating that the model fits the labels well. To get $L = 1$, you would need every factor to be exactly 1, which logistic regression can only approach, not achieve, with finite parameters.
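
The arithmetic for this three-point story can be checked in a few lines. This sketch also computes the log-likelihood, which is the same quantity on a log scale; the variable names are illustrative only.

```python
import math

# Probabilities the model assigns to the three observed labels.
factors = [0.99, 0.98, 0.97]

L = math.prod(factors)                      # joint likelihood
log_L = sum(math.log(f) for f in factors)   # log-likelihood (negative)

print(round(L, 3))  # 0.941
print(log_L)
```

Because each factor is below 1, each log term is negative, and the log-likelihood is correspondingly below 0.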

Mini-FAQ

  1. What about the log-likelihood?
    • Log-likelihood is the sum of log probabilities, each $\le 0$, so the upper bound of the log-likelihood is 0 (which corresponds to likelihood 1).
  2. Does maximum likelihood always reach this upper bound?
    • No. Maximum likelihood finds the best parameters within the model class, but that best value is almost always strictly below 1.
  3. What happens on perfectly separable data?
    • For linearly separable data, the likelihood can be pushed arbitrarily close to 1 by making weights larger in magnitude, so the MLE technically does not exist as a finite parameter; optimization may diverge instead of converging to a finite maximum.
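
The separable case can be sketched numerically as follows. The toy dataset and the one-parameter model $p = \sigma(w x)$ are assumptions made for illustration; the code simply evaluates the log-likelihood at increasingly large weights instead of running an optimizer.

```python
import math

def softplus(t):
    # Numerically stable log(1 + exp(t)).
    return max(t, 0.0) + math.log1p(math.exp(-abs(t)))

def log_likelihood(w, xs, ys):
    # Assumed model for this sketch: p = sigmoid(w * x), no intercept.
    total = 0.0
    for x, y in zip(xs, ys):
        sign = 1.0 if y == 1 else -1.0
        total -= softplus(-sign * w * x)
    return total

# Hypothetical separable data: negatives at x < 0, positives at x > 0.
xs = [-2.0, -1.0, 1.0, 2.0]
ys = [0, 0, 1, 1]

scales = [1.0, 2.0, 4.0, 8.0, 16.0]
lls = [log_likelihood(w, xs, ys) for w in scales]
for w, ll in zip(scales, lls):
    print(w, ll, math.exp(ll))
```

Each doubling of the weight moves the log-likelihood closer to 0 (likelihood closer to 1) without reaching it, which is why an unregularized optimizer keeps increasing the weight on such data.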

Bottom line:
When classifying data with logistic classification using maximum likelihood, the theoretical upper bound of the likelihood over the dataset is 1, corresponding to perfect prediction of all labels, but in realistic logistic regression with finite parameters this bound is only approached, not actually attained.
