In linear regression analysis, the residuals must be calculated before the residual variance can be computed. In addition, ordinary least squares (OLS) regression is usually checked against the assumption that the residuals are normally distributed. However, before calculating the residuals, you must first calculate the predicted Y values. Therefore, this article discusses how to calculate the predicted Y value and the residual value.
First, let's look at how to obtain the predicted Y value in simple linear regression. This model consists of only one dependent variable and one independent variable. Last week's article explained how to calculate the regression coefficients, namely the intercept (bo) and the slope coefficient (b1). These two values are used to calculate the predicted Y value.
As we already know, the general equation for simple linear regression is:
Y = bo + b1X
Where
Y = dependent variable
X = independent variable
bo = intercept
b1 = regression coefficient
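Last week's article covered how the coefficients are estimated. As a quick reminder, the sketch below applies the standard OLS formulas for a single independent variable: b1 = Σ(X - X̄)(Y - Ȳ) / Σ(X - X̄)² and bo = Ȳ - b1·X̄ (the intercept bo is written `b0` in the code). The function name `ols_coefficients` and the data are hypothetical, chosen only for illustration.

```python
def ols_coefficients(xs, ys):
    """Return (b0, b1) for the simple linear regression Y = b0 + b1*X."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Slope: sum of cross-deviations divided by the sum of squared X deviations
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("X has no variation, so the slope is undefined")
    b1 = sxy / sxx
    # The OLS line passes through the point of means (x_mean, y_mean)
    b0 = y_mean - b1 * x_mean
    return b0, b1


# Hypothetical data for illustration only (not from the article)
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1 = ols_coefficients(xs, ys)
print(round(b0, 4), round(b1, 4))
```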
Using this equation, a predicted Y value can be calculated for each observation: multiply the estimated coefficient b1 by the observed value of the independent variable X, then add the intercept bo.
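In code, this is one multiplication and one addition per observation. A minimal sketch; the function names `predict` and `predict_all` and the coefficients in the usage line are hypothetical:

```python
def predict(b0, b1, x):
    """Predicted Y for one observation: Y_hat = b0 + b1 * x."""
    return b0 + b1 * x


def predict_all(b0, b1, xs):
    """Predicted Y for every observed value of the independent variable."""
    return [predict(b0, b1, x) for x in xs]


# Made-up coefficients (b0 = 1.0, b1 = 2.0) and X values, for illustration only
print(predict_all(1.0, 2.0, [0, 1, 2, 3]))  # [1.0, 3.0, 5.0, 7.0]
```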
For example, suppose the estimated intercept is 218.38, the estimated coefficient for the X variable is -0.0014, and the first observation of X is 6000. The predicted Y for this first observation is then:
Y = bo + b1X
Y = 218.38 + (-0.0014)*6000
Y ≈ 210.003
(The coefficients shown are rounded. Plugging in exactly 218.38 and -0.0014 gives 218.38 - 8.4 = 209.98; the value 210.003 presumably comes from the unrounded coefficients.)
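The same arithmetic in Python, using the rounded coefficients exactly as printed in the example. Because of that rounding, this reproduces 209.98 rather than 210.003:

```python
b0 = 218.38     # intercept, rounded as in the example
b1 = -0.0014    # coefficient for X, rounded as in the example
x_first = 6000  # first observation of X

y_pred = b0 + b1 * x_first
print(round(y_pred, 2))  # 209.98
```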
You can use Microsoft Excel to simplify and speed up the calculation of predicted Y values. In the same way, you can calculate the predicted Y value for every observation in your sample. Congratulations, you have successfully calculated the predicted Y value.
As mentioned at the beginning of this article, the predicted Y values are used to calculate the residuals in simple linear regression analysis. Before calculating them, you should understand what a residual is: the difference between the actual observed value of the dependent variable (Y) and its predicted value. The formula is:
Residual = Y Actual – Y Predicted
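Applied to a whole sample, the formula is an element-wise subtraction. A minimal sketch with a hypothetical function name and made-up values; a positive residual means the model underpredicted that observation, a negative one means it overpredicted:

```python
def residuals(y_actual, y_predicted):
    """Residual for each observation: Y actual - Y predicted."""
    if len(y_actual) != len(y_predicted):
        raise ValueError("actual and predicted sequences must have the same length")
    return [ya - yp for ya, yp in zip(y_actual, y_predicted)]


# Made-up values for illustration only
print(residuals([5.0, 7.0, 9.5], [4.0, 7.5, 9.0]))  # [1.0, -0.5, 0.5]
```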
For example, if the actual Y value for the first observation is 213, the residual is calculated as follows:
Residual = Y Actual – Y Predicted
Residual = 213 – 210.003
Residual = 2.997
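The same residual in Python, first with the predicted value reported for the first observation and then, for comparison, with the prediction recomputed from the rounded coefficients (218.38 and -0.0014):

```python
y_actual = 213
y_predicted = 210.003  # predicted Y reported for the first observation

residual = y_actual - y_predicted
print(round(residual, 3))  # 2.997

# Using the rounded coefficients instead shifts the residual slightly
residual_rounded = y_actual - (218.38 + (-0.0014) * 6000)
print(round(residual_rounded, 2))  # 3.02
```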
These calculations give the residual for the first observation. Next, calculate the residuals for all observations in your study in the same way.
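Putting the steps together, the sketch below estimates the coefficients, computes the predicted Y and the residual for every observation, and prints one row per observation, much like a worksheet. The data and the function name are hypothetical. The final check uses a known property of OLS with an intercept: the residuals sum to zero (up to floating-point error), which makes a handy sanity check on your own calculations.

```python
def fit_predict_residuals(xs, ys):
    """Estimate b0 and b1 by OLS, then return (b0, b1, predicted Y, residuals)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_mean - b1 * x_mean
    y_hat = [b0 + b1 * x for x in xs]                # predicted Y
    resid = [y - yh for y, yh in zip(ys, y_hat)]     # actual - predicted
    return b0, b1, y_hat, resid


# Hypothetical data for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1, y_hat, resid = fit_predict_residuals(xs, ys)

for i, (x, y, yh, e) in enumerate(zip(xs, ys, y_hat, resid), start=1):
    print(f"obs {i}: X={x}, Y={y}, Y_hat={yh:.3f}, residual={e:.3f}")

# With an intercept in the model, OLS residuals sum to (numerically) zero
assert abs(sum(resid)) < 1e-9
```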
This regression approach can be applied to both time series (historical) data and cross-sectional data. From the calculations, we can see the difference between the actual values of the dependent variable and the values predicted by the regression. With time series data, the estimated equation can also be used for forecasting: for example, based on two decades of annual sales data, simple linear regression can be used to forecast sales for the next few years.
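A minimal forecasting sketch, assuming the year itself is used as the independent variable X and sales as Y. The ten years of sales figures below are made up for illustration. Extrapolating this way assumes the linear trend continues beyond the observed period, which should be judged case by case.

```python
def ols_coefficients(xs, ys):
    """Return (b0, b1) for Y = b0 + b1*X using the OLS formulas."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    b1 = sxy / sxx
    return y_mean - b1 * x_mean, b1


# Hypothetical annual sales, for illustration only
years = list(range(2001, 2011))
sales = [120, 125, 131, 134, 140, 146, 149, 155, 161, 164]

b0, b1 = ols_coefficients(years, sales)

# Extrapolate the fitted line to the next three years
forecast = {year: b0 + b1 * year for year in range(2011, 2014)}
for year, value in forecast.items():
    print(year, round(value, 1))
```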
Based on the calculation for the first observation, we obtained a residual value of 2.997. After calculating the residuals for all observations, we can test them for normality. Normally distributed residuals are one of the assumptions commonly checked in simple linear regression. Strictly speaking, OLS estimates are already the Best Linear Unbiased Estimator (BLUE) under the Gauss-Markov assumptions, which do not include normality; normality of the residuals is what justifies the usual t-tests, F-tests, and confidence intervals, especially in small samples. For those who prefer learning through audio-visual material, Kanda Data has prepared a video tutorial. The video is in Indonesian, with English subtitles available.
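The article does not name a specific normality test. As one option, the sketch below implements the Jarque-Bera test from the sample skewness and kurtosis, using only the standard library; its p-value comes from a chi-square distribution with 2 degrees of freedom, whose upper tail is exactly exp(-x/2). The test is asymptotic, so with small samples a test such as Shapiro-Wilk (for example `scipy.stats.shapiro`) is often preferred. The residuals below are made up for illustration.

```python
import math


def jarque_bera(residuals):
    """Jarque-Bera normality test; returns (JB statistic, asymptotic p-value)."""
    n = len(residuals)
    mean = sum(residuals) / n
    m2 = sum((e - mean) ** 2 for e in residuals) / n
    if m2 == 0:
        raise ValueError("residuals have zero variance")
    m3 = sum((e - mean) ** 3 for e in residuals) / n
    m4 = sum((e - mean) ** 4 for e in residuals) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    jb = n / 6 * (skewness ** 2 + (kurtosis - 3) ** 2 / 4)
    # Chi-square with 2 degrees of freedom: P(X > x) = exp(-x / 2)
    p_value = math.exp(-jb / 2)
    return jb, p_value


# Hypothetical residuals for illustration only
resid = [0.06, -0.13, 0.18, -0.21, 0.10, 0.02, -0.05, 0.11, -0.09, 0.01]
jb, p = jarque_bera(resid)
print(f"JB = {jb:.3f}, p-value = {p:.3f}")
```

A p-value above the chosen significance level (commonly 0.05) means normality is not rejected; it does not prove the residuals are normal.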
Hopefully, the video is clear. If you still have questions, please leave them in the comments section below this article. Before we close, let me recap. In this article, we learned that the residual is the difference between the actual Y value and the predicted Y value, and that the predicted Y value is obtained by multiplying the estimated slope coefficient by the observed value of the independent variable and adding the intercept.
For more video tutorials on statistics, econometrics, and data, please visit the Kanda Data YouTube channel. See you in the next article!