In this first example, we have a matrix, A, with 100 columns of data, but the result vector B depends only on the first 4 of those columns.
The computed permutation vector, p, shows that the first 4 entries are the relevant ones, and the coefficient vector, LSP, exactly matches the coefficients used to build B. All columns not referenced by p can be discarded.
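The LSP routine itself is not reproduced here, so the following is only a minimal NumPy sketch of the idea, using a greedy least-squares column selection (orthogonal-matching-pursuit style) as a stand-in. The helper greedy_select, the array sizes, and the coefficients are illustrative assumptions, not the actual implementation or its API.

```python
import numpy as np

def greedy_select(A, b, k):
    """Greedy forward selection (a stand-in for the LSP routine):
    repeatedly add the column most correlated with the current residual,
    refitting by least squares after each addition. Returns the chosen
    column indices (in selection order) and the matching coefficients."""
    p, coef = [], None
    residual = b.copy()
    for _ in range(k):
        scores = np.abs(A.T @ residual)
        if p:
            scores[p] = -np.inf        # never reselect a chosen column
        p.append(int(np.argmax(scores)))
        coef, *_ = np.linalg.lstsq(A[:, p], b, rcond=None)
        residual = b - A[:, p] @ coef
    return p, coef

rng = np.random.default_rng(0)

# 500 observations, 100 candidate columns; B is built from the first 4 only.
A = rng.normal(size=(500, 100))
true_coef = np.array([2.0, -1.0, 0.5, 3.0])
B = A[:, :4] @ true_coef

p, lsp = greedy_select(A, B, 4)
print(sorted(p))   # the four relevant columns: [0, 1, 2, 3]
print(lsp)         # matches true_coef, ordered by the selection order in p
```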
In this second example, we will create a result vector that depends on 10 variables, of which only 5 are measured in the matrix A (along with 95 other columns of irrelevant/random measurements).
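Continuing the same NumPy sketch, one way to set up such data might look like this; the sizes, the hidden/weights names, and the weight range are assumptions made for illustration.

```python
n = 200
hidden = rng.normal(size=(n, 10))            # the 10 true driver variables
weights = rng.uniform(1.5, 3.0, size=10)     # how strongly each driver contributes
B = hidden @ weights                         # the result depends on all 10 drivers

# Only 5 of the drivers are actually measured; the other 95 columns of A
# are unrelated random measurements.
A = np.hstack([hidden[:, :5], rng.normal(size=(n, 95))])
```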
The notation A[..,p] selects all the rows of A and only the columns whose indices appear in the list p. This is the reduced matrix. Note the correlation between B and the fitted values (A[..,p].LSP).
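In the NumPy sketch, A[..,p] corresponds to A[:, p]. Reusing the hypothetical greedy_select helper from the first example, the reduced fit and its training correlation could be computed roughly like this:

```python
# Select 5 columns greedily and fit the reduced model.
p, lsp = greedy_select(A, B, 5)
A_reduced = A[:, p]                  # all rows, only the selected columns
fit = A_reduced @ lsp
print(np.corrcoef(B, fit)[0, 1])     # training correlation of the sparse model
```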
Compare this with the standard least squares fit.
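As a sketch of that comparison, a standard least squares fit over all 100 columns can be done with numpy.linalg.lstsq; the variable names are again just placeholders.

```python
# Ordinary least squares using every column of A.
ols, *_ = np.linalg.lstsq(A, B, rcond=None)
print(np.corrcoef(B, A @ ols)[0, 1])   # training correlation of the full model
```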
The correlation with the training data is higher for standard least squares, but let's see what happens when we use both models to predict results on new data.
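Continuing the sketch, we can draw fresh data the same way and evaluate both models out of sample; the exact numbers will vary with the random seed.

```python
# Fresh observations: the same 10 hidden drivers, of which 5 are measured,
# plus 95 new irrelevant columns, and the corresponding true result vector.
hidden_new = rng.normal(size=(n, 10))
B_new = hidden_new @ weights
A_new = np.hstack([hidden_new[:, :5], rng.normal(size=(n, 95))])

print(np.corrcoef(B_new, A_new[:, p] @ lsp)[0, 1])   # sparse/predictive model
print(np.corrcoef(B_new, A_new @ ols)[0, 1])         # full least squares model
```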
Note how the correlation on the new data is much better with the predictive model; the standard least squares model suffers from overfitting.