Logistic regression
Date: Wednesday, February 11, 2026

Notes
In class, we:
Reviewed key terms from the course so far, including features, weights, covariance and correlation coefficients, data augmentation, dimensionality reduction, training/validation/testing datasets, thresholding, binary vs. multiclass classifiers, logits, softmax, gradient descent, and loss and cost functions.
Explored why a linear model is insufficient for estimating class probabilities:
$$\hat{y} = \mathbf{x}\mathbf{w} + b$$
The problem: there is nothing forcing $\hat{y}$ to stay in $[0, 1]$. We need a function that can map any real number to the range $[0, 1]$.
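A quick numeric sketch of the problem, with hypothetical feature and weight values chosen for illustration:

```python
import numpy as np

# Hypothetical values, just to show the issue.
x = np.array([3.0, -2.0])   # input features
w = np.array([1.5, 0.5])    # weights
b = 1.0                     # bias

y_hat = x @ w + b           # 3*1.5 + (-2)*0.5 + 1 = 4.5
print(y_hat)                # 4.5 -- outside [0, 1], so not a valid probability
```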
The remainder of the lecture focused on logistic regression, a classification method that uses the sigmoid function to model class probabilities. We:
Introduced the logistic (sigmoid) function:
$$\sigma(z) = \frac{1}{1 + e^{-z}}$$
This is the same function that appears inside softmax; in fact, sigmoid is the two-class special case of softmax. In logistic regression, we model the probability that $y = 1$ as:
$$P(y = 1 \mid \mathbf{x}) = \sigma(\mathbf{x}\mathbf{w} + b) = \frac{1}{1 + e^{-(\mathbf{x}\mathbf{w} + b)}}$$
where $\mathbf{x}$ are input features, $\mathbf{w}$ are learned weights, and $b$ is the bias term.
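The model above can be sketched in a few lines of NumPy. The weights and bias here are placeholder values (in practice they would be learned by gradient descent):

```python
import numpy as np

def sigmoid(z):
    """Logistic function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(X, w, b):
    """P(y = 1 | x) for each row of X."""
    return sigmoid(X @ w + b)

# Placeholder parameters for illustration only.
X = np.array([[0.5, 1.2],
              [-1.0, 0.3]])
w = np.array([2.0, -1.0])
b = 0.5
print(predict_proba(X, w, b))  # every entry lies strictly between 0 and 1
```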
Discussed a decision rule: once we have a probability $\hat{y} = P(y = 1 \mid \mathbf{x})$, we classify by applying a threshold:
$$\text{prediction} = \begin{cases} 1 & \text{if } \hat{y} \geq 0.5 \\ 0 & \text{if } \hat{y} < 0.5 \end{cases}$$
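As a minimal sketch, the decision rule is a comparison against the threshold:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(X, w, b, threshold=0.5):
    """Class labels: 1 where P(y = 1 | x) >= threshold, else 0."""
    probs = sigmoid(X @ w + b)
    return (probs >= threshold).astype(int)
```

Note that since $\sigma(z) \geq 0.5$ exactly when $z \geq 0$, thresholding at $0.5$ is equivalent to checking the sign of $\mathbf{x}\mathbf{w} + b$.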
Defined the cross-entropy loss function, which we minimize to train the model:
$$L = -\sum_{i=1}^{n} \left[ y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i) \right]$$
- When $y_i = 1$: loss $= -\log(\hat{y}_i)$ — penalizes low predicted probabilities.
- When $y_i = 0$: loss $= -\log(1 - \hat{y}_i)$ — penalizes high predicted probabilities.
- Correct predictions produce loss near 0; wrong predictions produce loss that grows toward $\infty$.
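The loss formula translates directly to code; the clipping of $\hat{y}$ is a standard numerical safeguard (an implementation detail, not part of the lecture) so that $\log(0)$ is never evaluated:

```python
import numpy as np

def cross_entropy(y, y_hat, eps=1e-12):
    """Binary cross-entropy loss, summed over all examples.

    y     : true labels (0 or 1)
    y_hat : predicted probabilities P(y = 1 | x)
    eps   : clipping constant to avoid log(0) when a prediction saturates
    """
    y_hat = np.clip(y_hat, eps, 1.0 - eps)
    return -np.sum(y * np.log(y_hat) + (1.0 - y) * np.log(1.0 - y_hat))

y = np.array([1, 0, 1])
y_hat = np.array([0.9, 0.1, 0.8])
print(cross_entropy(y, y_hat))  # confident, correct predictions give a small loss
```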