Dimensionality Reduction
Date
Monday, January 26, 2026
Links of interest
Notes
In class, we:
- Reviewed your first team presentations. Some notes I provided:
  - Data characterizations should include both detailed metadata and visualizations, ideally ones that you make yourselves.
  - Skip intricate slide backgrounds (including photographs). Solid colors are best.
  - If you show something on a slide, you must explain it!
- Formally defined dimensions as “the number of parameters (variables) in a dataset”.
- Explored why we might want to reduce the number of dimensions in a dataset, including to:
  - Improve the ability to visualize data.
  - Remove parameters that are either highly correlated with other parameters or contribute very little to the variance in your dataset.
  - Reduce computational burden (i.e., fewer parameters means fewer calculations, less memory, etc. required for processing).
  - Contend with the curse of dimensionality (this week’s reading assignment!).
- Established two ways to reduce data dimensions:
  - Feature selection (keeping a subset of the original parameters), such as by filtering.
  - Feature extraction (combining the original parameters into a smaller set of new ones).
- Defined variance, covariance, and correlation, which are all important for understanding how our data are spread and related.
- Learned about Principal Component Analysis (PCA), which is a feature extraction method, and applied it to a published dataset.
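To make variance, covariance, and correlation concrete, here is a minimal sketch using NumPy. The array `data` is a made-up toy example (not the dataset from class), and using NumPy at all is an assumption; other tools would work just as well.

```python
import numpy as np

# Hypothetical toy data: 5 observations (rows) of 3 parameters (columns).
data = np.array([
    [1.0, 2.0, 5.0],
    [2.0, 4.1, 3.0],
    [3.0, 6.2, 4.0],
    [4.0, 7.9, 1.0],
    [5.0, 10.0, 2.0],
])

# Sample variance of each parameter (ddof=1 divides by n - 1).
variances = data.var(axis=0, ddof=1)

# Covariance matrix; rowvar=False tells NumPy that columns are the variables.
cov_matrix = np.cov(data, rowvar=False)

# Correlation matrix: covariance rescaled so every entry lies in [-1, 1].
corr_matrix = np.corrcoef(data, rowvar=False)

print("Variances:", variances)
print("Covariance matrix:\n", cov_matrix)
print("Correlation matrix:\n", corr_matrix)
```

In this toy example the first two columns are almost perfectly correlated, which is the kind of redundancy that motivates removing a parameter. The diagonal of the covariance matrix holds the variances of the individual parameters.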
Below are some details about describing data and implementing PCA:
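As a starting point, PCA can be sketched in a few lines of NumPy by taking the eigendecomposition of the covariance matrix. This is an illustration under stated assumptions, not necessarily the approach we used in class: the helper name `pca`, the synthetic `data`, and the choice of two components are all hypothetical.

```python
import numpy as np

def pca(data, n_components):
    """Illustrative PCA helper: project data onto its top principal components."""
    # Center each parameter (column) so it has zero mean.
    centered = data - data.mean(axis=0)
    # Covariance matrix of the parameters (columns are the variables).
    cov = np.cov(centered, rowvar=False)
    # eigh is meant for symmetric matrices and returns eigenvalues in ascending order.
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    # Reorder so the direction with the largest variance comes first.
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues = eigenvalues[order]
    eigenvectors = eigenvectors[:, order]
    # Keep the leading eigenvectors as the principal components.
    components = eigenvectors[:, :n_components]
    # Fraction of the total variance captured by each kept component.
    explained_ratio = eigenvalues[:n_components] / eigenvalues.sum()
    # Coordinates of each observation in the reduced space.
    scores = centered @ components
    return scores, components, explained_ratio

# Synthetic stand-in data: the second column is nearly a multiple of the first.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
data = np.column_stack([
    x,
    2 * x + rng.normal(scale=0.1, size=200),
    rng.normal(scale=0.5, size=200),
])

scores, components, explained_ratio = pca(data, n_components=2)
print("Reduced shape:", scores.shape)
print("Explained variance ratio:", explained_ratio)
```

Because the first two columns are strongly correlated, the first principal component should capture most of the variance, so three parameters reduce to two with little information lost. In practice you would usually standardize parameters that have different units before running PCA.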