Skip to content

Feature engineering Types

Feature Creation (Encoding, Binning)

Refers to the creation of new features from existing data to help with better predictions.

Examples:

  • one-hot-encoding: transform values in columns with boolean 1 or 0 (Ex: cities, product categories)
  • binning: continuous numerical values into groups (ex: age, salary, temperature)
  • splitting (Ex: date to year, month and day, and full name to first and last name)
  • calculated features (ex: Body Mass Index, total purchase value, account age from signup date)

Feature Transformation

  • Imputation (missing)
  • Missing indicators (1 or 0 for missing)
  • Outlier clipping
  • Feature interaction (Cartesian product. Ex: age_income, age * income)
  • Outlier clipping
  • Scaling/normalization
  • Invalid value replacement

Feature Extraction (PCA, LDA)

Involves reducing the amount of data to be processed using dimensionality reduction and noise removal techniques, reducing the amount of memory and computing power required while still maintaining the original data characteristics.

Examples:

  • PCA (Principal Component Analysis)
  • LDA (Linear Discriminant Analysis)
  • IDA (Independent Component Analysis)

Principal Component Analysis

PCA is a dimensionality reduction technique that converts many correlated features into a smaller number of new features (principal components) while keeping as much variance(information) as possible.

  • Unsupervised: does not use labels
  • Linear transformation

Imagine you have 2 features:

height weight
170 70
180 80
160 60

These features are high correlated, so instead of using both PCA uses a new axis that captures the largest variation in data

Use cases:

  • image compression
  • feature reduction before training

Feature Selection

Is the process of selecting a subset of extracted features.

Examples:

  • Feature importance score
  • Correlation matrix