In this first example, we have a matrix, A, with 100 columns of data, but the result vector B depends only on the first 4 of those columns.
The computed permutation vector, p, shows that the first 4 entries are the relevant ones, and the coefficient vector, LSP, exactly matches the coefficients used to build B. All columns not referenced by p can be discarded.
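The LSP routine itself is not reproduced here, so the following is only a minimal NumPy sketch of the idea, using a greedy least-squares column selection (orthogonal-matching-pursuit style) as a stand-in. The helper greedy_select, the array sizes, and the coefficients are illustrative assumptions, not the actual implementation or its API.

```python
import numpy as np

def greedy_select(A, b, k):
    """Greedy forward selection (a stand-in for the LSP routine):
    repeatedly add the column most correlated with the current residual,
    refitting by least squares after each addition. Returns the chosen
    column indices (in selection order) and the matching coefficients."""
    p, coef = [], None
    residual = b.copy()
    for _ in range(k):
        scores = np.abs(A.T @ residual)
        if p:
            scores[p] = -np.inf        # never reselect a chosen column
        p.append(int(np.argmax(scores)))
        coef, *_ = np.linalg.lstsq(A[:, p], b, rcond=None)
        residual = b - A[:, p] @ coef
    return p, coef

rng = np.random.default_rng(0)

# 500 observations, 100 candidate columns; B is built from the first 4 only.
A = rng.normal(size=(500, 100))
true_coef = np.array([2.0, -1.0, 0.5, 3.0])
B = A[:, :4] @ true_coef

p, lsp = greedy_select(A, B, 4)
print(sorted(p))   # the four relevant columns: [0, 1, 2, 3]
print(lsp)         # matches true_coef, ordered by the selection order in p
```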
In this second example, we will create a result vector that depends on 10 variables, of which only 5 are measured in the matrix A (along with 95 other columns of irrelevant/random measurements).
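Continuing the same NumPy sketch, one way to set up such data might look like this; the sizes, the hidden/weights names, and the weight range are assumptions made for illustration.

```python
n = 200
hidden = rng.normal(size=(n, 10))            # the 10 true driver variables
weights = rng.uniform(1.5, 3.0, size=10)     # how strongly each driver contributes
B = hidden @ weights                         # the result depends on all 10 drivers

# Only 5 of the drivers are actually measured; the other 95 columns of A
# are unrelated random measurements.
A = np.hstack([hidden[:, :5], rng.normal(size=(n, 95))])
```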
The notation A[..,p] selects all the rows of A and only the columns whose indices appear in the list p. This is the reduced matrix. Note the correlation between B and the fitted values (A[..,p].LSP).
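In the NumPy sketch, A[..,p] corresponds to A[:, p]. Reusing the hypothetical greedy_select helper from the first example, the reduced fit and its training correlation could be computed roughly like this:

```python
# Select 5 columns greedily and fit the reduced model.
p, lsp = greedy_select(A, B, 5)
A_reduced = A[:, p]                  # all rows, only the selected columns
fit = A_reduced @ lsp
print(np.corrcoef(B, fit)[0, 1])     # training correlation of the sparse model
```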
Compare this with the standard least squares fit.
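As a sketch of that comparison, a standard least squares fit over all 100 columns can be done with numpy.linalg.lstsq; the variable names are again just placeholders.

```python
# Ordinary least squares using every column of A.
ols, *_ = np.linalg.lstsq(A, B, rcond=None)
print(np.corrcoef(B, A @ ols)[0, 1])   # training correlation of the full model
```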
The correlation with the training data is higher for standard least squares, but let's see what happens when we use both models to predict results on new data.
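Continuing the sketch, we can draw fresh data the same way and evaluate both models out of sample; the exact numbers will vary with the random seed.

```python
# Fresh observations: the same 10 hidden drivers, of which 5 are measured,
# plus 95 new irrelevant columns, and the corresponding true result vector.
hidden_new = rng.normal(size=(n, 10))
B_new = hidden_new @ weights
A_new = np.hstack([hidden_new[:, :5], rng.normal(size=(n, 95))])

print(np.corrcoef(B_new, A_new[:, p] @ lsp)[0, 1])   # sparse/predictive model
print(np.corrcoef(B_new, A_new @ ols)[0, 1])         # full least squares model
```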
Note how the correlation on the new data is much better with the predictive model; the standard least squares model suffers from overfitting.