Excel: Calculating Linear RegressionLast reviewed: November 2, 1994Article ID: Q51026 |
SUMMARYMicrosoft Excel can be very useful when you need to make predictions of future data based on historical data. One of the most common techniques used for making such predictions is that of linear regression.
MORE INFORMATIONFor example, assume you have accumulated eight years' worth of sales figures for a company. These figures are as follows:
| A | B ------------------------ 1 | Year | Quantity Sold 2 | 1 | 1250 3 | 2 | 1540 4 | 3 | 1798 5 | 4 | 3300 6 | 5 | 4000 7 | 6 | 3700 8 | 7 | 3100 9 | 8 | 3484From these numbers, you want to predict the sales for the next years (years 9, 10, and 11). To accomplish this, use the Excel TREND() function. The TREND() function accepts arrays of known y's and known x's, and returns a linear trend of values along the best fit line for these x's and y's. (Note: The cells in column C are formatted as "0.00".)
| A | B | C
-------------------------------------
1 | Year | Quantity Sold | Predicted
2 | 1 | 1250 | 1528.00
3 | 2 | 1540 | 1883.29
4 | 3 | 1798 | 2238.57
5 | 4 | 3300 | 2593.86
6 | 5 | 4000 | 2949.14
7 | 6 | 3700 | 3304.43
8 | 7 | 3100 | 3659.71
9 | 8 | 3484 | 4015.00
10 | 9 | | 4370.29
11 | 10 | | 4725.57
12 | 11 | | 5080.86
In cells C2:C12, enter the formula =TREND(B2:B9,A2:A9,A2:A12) as an
array (by selecting C2:C12, typing the formula, and pressing
COMMAND+ENTER). In this formula, cells B2:B9 are the known y's, A2:A9
are the known x's, and A2:A12 are the new x's (the points on the line
for which we want to have y values returned).
You can now chart the data to view your results. If you're using Excel 2.20, do the following:
6000.00|
|
|
5000.00| Predicted ==> !
| ___ /
| / /\ % <== Actual
4000.00| / / \/
| |/
| /|
3000.00| / /
| / |
| / /
2000.00| / _/
|/ _/
! /
1000.00%
|
|
0.00|
----------------------
1 2 3 4 5 6 7 8 9 10 11
After predicting the values, you need to determine how accurately the
predicted line reflects the actual data. To do this, calculate the
coefficient of determination, or the "R squared" value, as follows:
r2=sum(Yest-Yavg)2/sum(Y-Yavg)2To begin with, calculate the average for the y's (Quantity Sold). In cell B15, enter the formula =AVERAGE(B2:B9) and receive the result of 2771.50. The worksheet now looks like the following:
| A | B | C
--------------------------------------
1 | Year | Quantity Sold | Predicted
2 | 1 | 1250 | 1528.00
3 | 2 | 1540 | 1883.29
4 | 3 | 1798 | 2238.57
5 | 4 | 3300 | 2593.86
6 | 5 | 4000 | 2949.14
7 | 6 | 3700 | 3304.43
8 | 7 | 3100 | 3659.71
9 | 8 | 3484 | 4015.00
10 | 9 | | 4370.29
11 | 10 | | 4725.57
12 | 11 | | 5080.86
13 | | |
14 | | |
15 | Yavg | | 2771.5
Next, establish two columns to represent the two parts of your
equation. In column D, enter the formula for (Yest-Yavg)2, and in
column E, enter the formula for (Y-Yavg)2. In cell D2, enter the
formula =(C2-$B$15)^2 and then fill the formula through cell D9. In
cell E2, enter the formula =(B2-$B$15)^2 and then fill this formula
down through cell E9. Now, calculate the coefficient of determination
in cell E15 by entering the formula =SUM(D2:D9)/SUM(E2:E9). (Note:
Columns C, D, and E are formatted as "0.00", and cell E15 is formatted
as "General".)
| | A | B | C | D | E | | 1 | Year | Quantity Sold | Predicted | (Yest-Yavg)^2 | (Y-Yavg)^2 | | 2 | 1 | 1250 | 1528.00 | 1546292.25 | 2314962.25 | | 3 | 2 | 1540 | 1883.29 | 788924.62 | 1516592.25 | | 4 | 3 | 1798 | 2238.57 | 284012.86 | 947702.25 | | 5 | 4 | 3300 | 2593.86 | 31556.98 | 279312.25 | | 6 | 5 | 4000 | 2949.14 | 31556.98 | 1509212.25 | | 7 | 6 | 3700 | 3304.43 | 284012.86 | 862112.25 | | 8 | 7 | 3100 | 3659.71 | 788924.62 | 107912.25 | | 9 | 8 | 3484 | 4015.00 | 1546292.25 | 507656.25 | |10 | 9 | | 4370.29 | | | |11 | 10 | | 4725.57 | | | |12 | 11 | | 5080.86 | | | |13 | | | | | | |14 | | | | | | |15 | Yavg | 2771.5 | | | 0.658952019| The closer the R squared value comes to 1, the better the fit of data points to the predicted values. In this case, the R squared value returned is .65895, which represents a fairly good fit, but also indicates that our predicted values are not very reliable.
|
KBCategory: kbother
© 1998 Microsoft Corporation. All rights reserved. Terms of Use. |