Logistic regression

Date

Wednesday, February 11, 2026

Notes

In class, we:

  1. Reviewed key terms from the course so far, including features, weights, covariance and correlation coefficients, data augmentation, dimensionality reduction, training/validation/testing datasets, thresholding, binary vs. multiclass classifiers, logits, softmax, gradient descent, and loss and cost functions.

  2. Explored why a linear model is insufficient for estimating class probabilities:

    $$\hat{y} = \mathbf{x}\mathbf{w} + b$$

    The problem: there is nothing forcing $\hat{y}$ to stay in $[0, 1]$. We need a function that can map any real number to the range $[0, 1]$.
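A quick numerical sketch of the problem, using made-up weights and inputs chosen only to illustrate the point:

```python
import numpy as np

# Hypothetical weights, bias, and inputs -- purely for illustration.
w = np.array([2.0, -1.0])
b = 0.5
X = np.array([[3.0, 0.0],    # pushes the output well above 1
              [-2.0, 4.0]])  # pushes the output well below 0

y_hat = X @ w + b
print(y_hat)  # [ 6.5 -7.5] -- neither value lies in [0, 1]
```

Nothing about the linear map constrains the output, so it cannot be read directly as a probability.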


The remainder of the lecture focused on logistic regression, a classification method that models class probabilities with the sigmoid function. We:

  1. Introduced the logistic (sigmoid) function:

    $$\sigma(z) = \frac{1}{1 + e^{-z}}$$

    The sigmoid is the two-class special case of softmax. In logistic regression, we model the probability that $y = 1$ as:

    $$P(y = 1 \mid \mathbf{x}) = \sigma(\mathbf{x}\mathbf{w} + b) = \frac{1}{1 + e^{-(\mathbf{x}\mathbf{w} + b)}}$$

    where $\mathbf{x}$ are input features, $\mathbf{w}$ are learned weights, and $b$ is the bias term.

  2. Discussed a decision rule: once we have a probability $\hat{y} = P(y = 1 \mid \mathbf{x})$, we classify by applying a threshold:

    $$\text{prediction} = \begin{cases} 1 & \text{if } \hat{y} \geq 0.5 \\ 0 & \text{if } \hat{y} < 0.5 \end{cases}$$

  3. Defined the cross-entropy loss function, which we minimize to train the model:

    $$L = -\sum_{i=1}^{n} \left[ y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i) \right]$$

    • When $y_i = 1$: loss $= -\log(\hat{y}_i)$ — penalizes low predicted probabilities.
    • When $y_i = 0$: loss $= -\log(1 - \hat{y}_i)$ — penalizes high predicted probabilities.
    • Correct predictions produce loss near 0; wrong predictions produce loss that grows toward $\infty$.
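The three pieces above (sigmoid, decision rule, cross-entropy loss) can be sketched together in a few lines of NumPy. The weights and toy data here are hypothetical, chosen only so the example runs end to end:

```python
import numpy as np

def sigmoid(z):
    """Logistic function: maps any real z into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(X, w, b):
    """Model P(y = 1 | x) for each row of X."""
    return sigmoid(X @ w + b)

def predict(X, w, b, threshold=0.5):
    """Apply the 0.5 decision rule to the probabilities."""
    return (predict_proba(X, w, b) >= threshold).astype(int)

def cross_entropy(y, y_hat):
    """Summed cross-entropy loss, as defined above."""
    return -np.sum(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

# Toy data: two examples, one per class (illustrative only).
X = np.array([[2.0, 1.0], [-1.0, -2.0]])
y = np.array([1, 0])
w, b = np.array([1.0, 1.0]), 0.0

probs = predict_proba(X, w, b)  # every value now lies in (0, 1)
print(predict(X, w, b))         # [1 0]
print(cross_entropy(y, probs))  # small, since both predictions are confident
```

Note how the loss behaves as described: both examples are classified correctly with high confidence, so the total loss stays close to 0; flipping a label would send the corresponding term toward $\infty$.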