When classifying data with logistic classification, what is the upper bound of the likelihood in the maximum likelihood method? Is this value attainable?
In logistic classification using the maximum likelihood method, the likelihood function is bounded above by 1. This upper bound corresponds to perfect prediction of the data, where the model assigns probability 1 to every observed class label. The value is not attainable for any finite parameter vector, because the sigmoid outputs lie strictly between 0 and 1; at best the likelihood approaches 1 in the limit, and only when the classes are linearly separable.
Likelihood Basics
Logistic regression models binary outcomes via the sigmoid function, so the predictions $p_i=\sigma(\mathbf{w}^T\mathbf{x}_i)$ lie strictly between 0 and 1. The likelihood $L(\mathbf{w})=\prod_{i=1}^n p_i^{y_i}(1-p_i)^{1-y_i}$ (or its log form for optimization) measures data fit. Since each factor satisfies $p_i^{y_i}(1-p_i)^{1-y_i}\leq 1$, the product satisfies $0<L(\mathbf{w})\leq 1$, with equality only if $p_i=y_i$ for all $i$, which no finite $\mathbf{w}$ achieves.
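To make the definition concrete, here is a minimal sketch in plain Python that evaluates the likelihood and log-likelihood for a tiny dataset. The data, the weights, and the helper names (`sigmoid`, `predict_proba`, `likelihood`, `log_likelihood`) are illustrative assumptions, not part of the original answer.

```python
import math

def sigmoid(z):
    # Logistic function; its output is strictly between 0 and 1 for finite z.
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(w, x):
    # p_i = sigma(w^T x_i) for one feature vector x.
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)))

def likelihood(w, X, y):
    # Product of p_i for y_i = 1 and (1 - p_i) for y_i = 0.
    L = 1.0
    for xi, yi in zip(X, y):
        p = predict_proba(w, xi)
        L *= p if yi == 1 else (1.0 - p)
    return L

def log_likelihood(w, X, y):
    # Sum of log p_i for y_i = 1 and log(1 - p_i) for y_i = 0.
    total = 0.0
    for xi, yi in zip(X, y):
        p = predict_proba(w, xi)
        total += math.log(p) if yi == 1 else math.log(1.0 - p)
    return total

# Hypothetical toy data: the first column is a constant 1 for the intercept.
X = [[1.0, 0.5], [1.0, -1.2], [1.0, 2.0], [1.0, -0.3]]
y = [1, 0, 1, 0]
w = [0.1, 1.5]  # arbitrary weights, not a fitted estimate

L = likelihood(w, X, y)
ell = log_likelihood(w, X, y)
print(f"likelihood = {L:.4f}, log-likelihood = {ell:.4f}")
```

For these finite weights every factor is below 1, so the product is below 1 and the log-likelihood is negative.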
Why 1 Is the Bound
- Theoretical maximum: Reached only if the model outputs exactly match the labels ($p_i=1$ wherever $y_i=1$, and $p_i=0$ wherever $y_i=0$).
- Log-likelihood perspective: The log-likelihood $\ell(\mathbf{w})=\sum_{i=1}^n[y_i\log p_i+(1-y_i)\log(1-p_i)]$ satisfies $\ell\leq 0$, which corresponds to $L\leq 1$.
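One way to see the sign of the log-likelihood is to rewrite it with signed labels. The symbol $s_i = 2y_i - 1 \in \{-1, +1\}$ is notation introduced here for illustration; it does not appear in the original answer.

```latex
\begin{aligned}
\log p_i &= -\log\bigl(1 + e^{-\mathbf{w}^T\mathbf{x}_i}\bigr), \qquad
\log(1 - p_i) = -\log\bigl(1 + e^{\mathbf{w}^T\mathbf{x}_i}\bigr), \\
\ell(\mathbf{w}) &= -\sum_{i=1}^n \log\bigl(1 + e^{-s_i\,\mathbf{w}^T\mathbf{x}_i}\bigr) < 0
\quad\text{for every finite } \mathbf{w}.
\end{aligned}
```

Each summand is the logarithm of a number greater than 1, so every term is strictly positive before the minus sign. The sum can approach 0 only if every margin $s_i\,\mathbf{w}^T\mathbf{x}_i$ tends to $+\infty$.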
Attainability Challenges
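This perfect fit is never reached exactly:
- Linear separability edge case: If the data are perfectly separable, $\lVert\mathbf{w}\rVert$ can grow without bound, driving $L\to 1^-$ without ever reaching 1, because the sigmoid only approaches 0 and 1 asymptotically. No finite maximum likelihood estimate exists in this case.
- Noise and overlap: Real data has irreducible error; when the classes overlap, even the optimal $\mathbf{w}$ yields $L<1$.
- Practical note: Optimization (e.g., gradient descent) returns a value below 1. Because $L$ is a product of $n$ factors, it shrinks quickly as $n$ grows, so the average log-likelihood per observation is usually a more informative measure of fit than the raw value of $L$.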
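The separable case can be illustrated numerically. The sketch below uses a hypothetical one-dimensional dataset separated at 0 and a single weight `w` with no intercept. It evaluates the log-likelihood in the equivalent form $\ell(w) = -\sum_i \log(1 + e^{-s_i w x_i})$ with $s_i = 2y_i - 1$, which is numerically safer than multiplying probabilities. The function names and the data are assumptions made for illustration.

```python
import math

def softplus(t):
    # Numerically stable log(1 + exp(t)).
    if t > 0:
        return t + math.log1p(math.exp(-t))
    return math.log1p(math.exp(t))

def log_likelihood(w, xs, ys):
    # ell(w) = -sum_i log(1 + exp(-s_i * w * x_i)), with s_i = +1 or -1.
    total = 0.0
    for x, y in zip(xs, ys):
        s = 1.0 if y == 1 else -1.0
        total -= softplus(-s * w * x)
    return total

# Hypothetical separable data: negatives have x < 0, positives have x > 0.
xs = [-2.0, -1.0, 1.0, 2.0]
ys = [0, 0, 1, 1]

# Scaling the weight up keeps the same decision boundary but sharpens the sigmoid.
scales = [1.0, 5.0, 10.0, 20.0]
log_liks = [log_likelihood(c, xs, ys) for c in scales]

for c, ell in zip(scales, log_liks):
    print(f"w = {c:5.1f}  log-likelihood = {ell:.3e}  likelihood = {math.exp(ell):.10f}")
```

The log-likelihood rises toward 0 as the weight grows but stays negative at every finite weight, so the likelihood approaches 1 from below. For very large weights, floating-point rounding can make the computed likelihood print as exactly 1; that is a numerical artifact, not attainment of the bound.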
Scenario| Likelihood Bound| Attainable?| Example
---|---|---|---
Perfect prediction| 1| No (sigmoid limit)| Every $p_i$ equal to $y_i$
Noisy data| <1| Maximum below 1, attained at a finite $\mathbf{w}$| Typical ML datasets
Separable data| Approaches 1| Asymptotically| Infinite $\lVert\mathbf{w}\rVert$
TL;DR: The upper bound is 1, but it is unattainable in logistic regression; MLE finds the largest likelihood below that bound, and approaches 1 only on separable data.