3. Automatic Input Selection
General Idea of Input Selection
- Forward Selection (FS) (see the sketch after this list)
  - fast
  - recommended when there are many potential inputs but only few are selected
  - ignores correlations/interactions between potential inputs
- Backward Elimination (BE)
  - slower
  - recommended if most potential inputs are selected
  - takes correlations/interactions between potential inputs into account
- Stagewise Selection
  - combination of FS and BE
- Brute Force
  - evaluates all input combinations
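To make the greedy strategies concrete, here is a minimal sketch of a forward-selection wrapper, assuming an ordinary least-squares fit scored by AICc as a stand-in for the actual model training and evaluation; all function names are illustrative.

```python
import numpy as np

def aicc_linear(X, y):
    """AICc of an ordinary least-squares fit (stand-in for any model + criterion)."""
    Xb = np.column_stack([np.ones(len(y)), X])        # add intercept column
    beta, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    mse = np.mean((y - Xb @ beta) ** 2)
    n, k = len(y), Xb.shape[1]
    aic = n * np.log(mse) + 2 * k
    return aic + 2 * k * (k + 1) / (n - k - 1)        # small-sample correction

def forward_selection(X, y):
    """Greedily add the input that improves AICc most; stop when none does."""
    remaining, selected, best = set(range(X.shape[1])), [], np.inf
    while remaining:
        scores = {i: aicc_linear(X[:, selected + [i]], y) for i in remaining}
        i_best = min(scores, key=scores.get)
        if scores[i_best] >= best:                    # no improvement -> stop
            break
        best = scores[i_best]
        selected.append(i_best)
        remaining.remove(i_best)
    return selected, best
```

Backward elimination works the same way, starting from the full set and greedily removing inputs; stagewise selection alternates between the two moves.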
Input Selection - What for?
Input Selection in the Context of Machine Learning
To choose the optimal model complexity, input selection evaluates the merits of candidate input subsets
Motivation
- Optimal bias/variance trade-off
- Explore relevance of inputs
Aims
- Mitigating the curse of dimensionality
- Reducing the model’s variance error
- Increased process understanding
- More concise and transparent models
- May support future design of experiments (DoE) or active learning strategies
Local Model Networks Enhanced
Separate Inputs for Local Models and Validity Functions
- Whole problem split into smaller sub-problems
- Training algorithm necessary
- Here: Hierarchical Local Model Tree (HILOMOT)
Separation between Linear and Nonlinear Effects
- LMNs offer the possibility to separate linear from nonlinear effects
- Any physical input ui can be included in the rule premises zi and/or in the rule consequents xi
- This separation between linear and nonlinear effects is not possible for most other model types
- In the following, the rule premises are referred to as the z-input space and the rule consequents as the x-input space
Equivalence to Fuzzy Systems
Interpretation as Takagi-Sugeno Fuzzy System
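For reference, the usual Takagi-Sugeno form of an LMN with M local models, written in the common notation with normalized validity functions Φ_i; the exact symbols used here are an assumption:

```latex
\hat{y} = \sum_{i=1}^{M} \Phi_i(\underline{z})
          \left( w_{i,0} + w_{i,1}\,x_1 + \dots + w_{i,n_x}\,x_{n_x} \right),
\qquad
\sum_{i=1}^{M} \Phi_i(\underline{z}) = 1
```

Each rule reads: IF z lies in the region of Φ_i, THEN y follows the i-th affine model in x. The premises depend only on the z-inputs and the consequents only on the x-inputs, which is exactly where the separation of nonlinear and linear effects becomes visible.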
Input Selection with the HILOMOT Algorithm
Input Spaces
Overview: Different Selection Strategies for Input Selection Tasks
Depending on the selection strategy, either subsets of the physical inputs or all of them can be assigned to the x- and z-input spaces (see the sketch below).
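A small sketch of the candidate spaces the three variants search over, shown for an exhaustive search; greedy strategies such as FS and BE only traverse one path through these spaces. The string representation of inputs is purely illustrative.

```python
from itertools import chain, combinations

def subsets(inputs):
    """All non-empty subsets of the physical inputs."""
    return chain.from_iterable(
        combinations(inputs, r) for r in range(1, len(inputs) + 1))

inputs = ["u1", "u2", "u3", "u4"]

# Linked x-z-selection: the same subset is assigned to both input spaces.
linked = [(s, s) for s in subsets(inputs)]

# z-selection: all physical inputs stay in the x-space, only the z-space varies.
z_only = [(tuple(inputs), s) for s in subsets(inputs)]

# Separated x-z-selection: x- and z-subsets are chosen independently.
separated = [(sx, sz) for sx in subsets(inputs) for sz in subsets(inputs)]
```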
Demonstration Example: HILOMOT Wrapper Method
Artificial Process
- Superposition of:
  - Hyperbola f(u1)
  - Gaussian function f(u2)
  - Linear function f(u3)
  - Normally distributed noise, μ = 0, σ = 0.05 (input u4 has no influence on the output)
Setting
- Training sample sizes N = 625 and N = 1296
- Samples placed on a regular 4-dimensional grid (see the data-generation sketch below)
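A plausible data-generation sketch for this artificial process. Only the function types (hyperbola, Gaussian, linear, noise with σ = 0.05) are stated above; the concrete shapes, the input range [0, 1], and the grid resolution (5 points per axis give 5^4 = 625 samples, 6 give 1296) are assumptions.

```python
import numpy as np

def process(u1, u2, u3, u4, rng):
    y = 1.0 / (0.1 + u1)                        # hyperbola in u1 (assumed shape)
    y += np.exp(-((u2 - 0.5) ** 2) / 0.05)      # Gaussian bump in u2 (assumed shape)
    y += u3                                     # linear effect of u3
    # u4 does not enter: it is the useless input
    return y + rng.normal(0.0, 0.05, size=np.shape(u1))

# N = 5**4 = 625 samples on a regular 4-D grid in [0, 1]^4
axes = [np.linspace(0.0, 1.0, 5)] * 4
U = [g.ravel() for g in np.meshgrid(*axes, indexing="ij")]
y = process(*U, rng=np.random.default_rng(0))
```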
HILOMOT Wrapper Settings
- Evaluation criterion: Akaike’s information criterion (AICc)
- Search Strategy: Backward elimination (BE)
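For reference, the corrected AIC in its common regression form (N samples, n_θ effective parameters; Gaussian errors assumed, constant terms dropped):

```latex
\mathrm{AIC} = N \ln\!\left(\frac{\mathrm{SSE}}{N}\right) + 2\,n_\theta,
\qquad
\mathrm{AIC_c} = \mathrm{AIC} + \frac{2\,n_\theta\,(n_\theta + 1)}{N - n_\theta - 1}
```

The correction term penalizes parameter-heavy models more strongly at small N, which matters here because the wrapper compares models of different input dimensionality.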
4-D Demonstration Example: Linked x-z-Selection
Results for the Linked x-z-Selection
- Because of the BE search strategy, the results have to be read from right to left
- Each ‘o’ is labeled with the input that is discarded in the corresponding BE step
- A growing sample size N reduces the detrimental influence of the useless input u4
- For both sample sizes N, the HILOMOT wrapper method identifies the 4th input as not useful in the linked x-z-space
4-D Demonstration Example: z-Selection
Results for the z-Selection
- Inputs u3 and u4 are identified as useless in the z-space
- Because the slope in the u3 direction does not change, no partitioning is necessary in that direction
- As soon as an important input is removed from the z-space, the evaluation criterion gets significantly worse
4-D Demonstration Example: Separated x-z-Selection
Results for the Separated x-z-Selection
- Input u4 is identified as useless in the x- as well as in the z-space for both sample sizes N
- Input u3 is identified as useless in the z-space for both sample sizes N
- After the first useful input is discarded, the evaluation criterion gets slightly worse
Real-World Demonstration Examples
Auto Miles Per Gallon (MPG) Data Set
(From the UCI Machine Learning Repository, http://archive.ics.uci.edu/ml, 2010)
- N = 392 samples
- q = 1 physical output:
  - Auto miles per gallon
- p = 7 physical inputs:
  - Cylinders (u1)
  - Displacement (u2)
  - Horsepower (u3)
  - Weight (u4)
  - Acceleration (u5)
  - Model year (u6)
  - Country of origin (u7)
- Data Splitting
  - ¾ of data: training samples (used for network training, complexity selection, input selection)
  - ¼ of data: test samples (only for final testing)
  - Heuristic, deterministic data splitting strategy (one possible instantiation is sketched below)
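The heuristic itself is not spelled out here, so the modulo rule below (every 4th sample to the test set) is only an assumption that reproduces the ¾ : ¼ ratio deterministically.

```python
import numpy as np

def deterministic_split(X, y):
    """Assign every 4th sample to the test set, the rest to training."""
    idx = np.arange(len(y))
    test = idx % 4 == 3          # 1/4 of the data
    return X[~test], y[~test], X[test], y[test]
```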
Auto MPG Wrapper Input Selection Results
Results for the x-z-Input Selection
- Both search strategies yield similar results:
  - Best AICc value for 5 inputs
  - Same input combination
- Input selection path:
  - Indicates the input combination for a given number of inputs
  - In case of BE, read from right to left
Results on the Test Data
- Mean squared error (MSE) for the best AICc values:
  - MSE (5 inputs) = 6.0
  - MSE (all inputs) = 7.7
- Most important inputs: u4 (car weight) and u6 (model year)
- Improvement of model accuracy
- Information about very important inputs
Results for the z-Input Selection
- Best AICc value, exhaustive search (ES) result:
  - 4 inputs
- Best AICc value, BE result:
  - 4 inputs
  - But a different input combination (see input selection path)
Results on the Test Data
- MSE for the best AICc values:
  - MSE (best ES) = 5.2
  - MSE (best BE) = 5.3
  - MSE (all inputs) = 7.7
- Most important inputs: u2 (displacement) and u6 (model year)
- Improvement of model accuracy
- Information about nonlinear input influences
Climate Control
Process
- 2 controlled inputs (u1, u2):
  - valve positions
- 4 measured inputs (d1, d2, d3, d4):
  - temperature, rel. humidity, temperature, temperature
- 2 measured outputs (y1, y2):
  - temperature, rel. humidity
- Difficult to model with ARX: unstable models are possible
Selection
- Selection based on the simulated output on validation data.
- Backward selection gives significantly better results.
- ~15 inputs/regressors are sufficient for a good model.
- Allowed delays: (k-1) … (k-10) (see the regressor sketch after this list).
- Forward selection: Tries to model the process with multiple nonlinear influences.
- Backward selection: Nonlinearities only need to be considered in u1(k-1) and u2(k-2)!
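To make the allowed delays concrete, here is a sketch of building the delayed-regressor matrix for an external-dynamics (ARX/NARX-type) model; the dict-based interface and the stacking order are illustrative assumptions.

```python
import numpy as np

def build_regressors(signals, max_delay=10):
    """signals: dict of equally long 1-D arrays, e.g. {'u1': ..., 'd1': ...}.
    Returns the regressor matrix with delays (k-1) ... (k-max_delay) per signal."""
    T = len(next(iter(signals.values())))
    cols, names = [], []
    for name, s in signals.items():
        for d in range(1, max_delay + 1):
            cols.append(s[max_delay - d : T - d])   # value at time k-d
            names.append(f"{name}(k-{d})")
    return np.column_stack(cols), names
```

With the six exogenous signals plus a fed-back output (as in ARX), this yields 7 × 10 = 70 candidate regressors per output, which the selection reduces to roughly 15.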
Next Chapter: 4. Metamodeling