In linear regression analysis, residual values play a crucial role. The residual value is the difference between the actual and predicted Y values. The actual Y value can be obtained from observations or samples of the dependent variable.
Based on the data that has been collected, we already have the actual Y values. We need to find the predicted Y values (Y predicted) to obtain the residual values. Where do we get the predicted values? We need to perform linear regression analysis first to obtain the predicted Y values.
From the results of linear regression analysis, we will obtain estimated coefficient values, including the intercept value and the estimated coefficient values of each independent variable used in the regression equation.
Because researchers need a solid understanding of residual values in linear regression analysis, this article discusses the definition of residual values and the purpose of determining them.
Definition of Residual Value
As I mentioned in the previous paragraph, a residual can be defined as the difference between the actual observation value of the dependent variable (Actual Y) and the predicted value of the dependent variable (Y predicted).
Residual values can be either positive or negative. The residual value is negative if the predicted value is greater than the actual value. Conversely, if the predicted value is smaller than the actual value, the residual value will be positive.
Purpose of Determining Residual Values
In linear regression analysis, residual values are used to check whether the assumptions required by the analysis are met. Several assumptions need to be satisfied in least squares linear regression analysis.
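The purpose of testing these assumptions is to ensure that the estimation results are unbiased and efficient. When the classical assumptions hold, the least squares estimator is the Best Linear Unbiased Estimator (BLUE), meaning that it has the smallest variance among linear unbiased estimators.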
Two of the assumption tests that use residual values are the normality test and the heteroskedasticity test. The normality test checks whether the residuals are normally distributed. The heteroskedasticity test checks whether the variance of the residuals is constant, which is the condition linear regression assumes (homoskedasticity).
Therefore, understanding residual values is essential for researchers who use least squares linear regression analysis.
Predicted Y Values in Linear Regression Analysis
Predicted Y values (Y predicted) can be obtained through calculations using the general linear regression equation. The general formula in the regression equation is as follows:
Y = b0 + b1X1 + b2X2 + b3X3 + … + bnXn + e
Where: Y = Dependent variable; X1, X2, X3, …, Xn = Independent variables; b0 = Intercept; b1, b2, b3, …, bn = Estimated coefficients of the independent variables; and e = Error term (disturbance)
The initial observed data already provide the actual Y value and the actual values of X1, X2, and X3 for each observation (assuming we are using multiple linear regression with three independent variables).
Therefore, to calculate the predicted Y value, we need to find the estimated values, including the intercept and the estimated coefficients of each independent variable. Estimates can be obtained through linear regression analysis using the data already collected by the researcher.
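To make the estimation step concrete, here is a minimal sketch of least squares for the simplest case of a single independent variable, using the standard closed-form formulas. The function name fit_simple_ols and the data are hypothetical and chosen only for illustration. For several independent variables, statistical software solves the same problem in matrix form.

```python
def fit_simple_ols(x, y):
    """Estimate intercept b0 and slope b1 for Y = b0 + b1*X by least squares."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-deviations and sum of squared deviations of X
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


# Hypothetical sample data, made up for this illustration
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.9, 5.2, 6.8, 9.1, 11.0]

b0, b1 = fit_simple_ols(x, y)
print("intercept:", round(b0, 2), "slope:", round(b1, 2))
```

Once b0 and b1 are known, the predicted Y for any observation is b0 + b1 * X.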
For example, suppose the estimated coefficients are as follows:
b0 = 34.5; b1 = 13.5; b2 = 14.7; b3 = -15.2
Then, the regression equation used to calculate the predicted Y value is as follows:
Y’ = 34.5 + 13.5X1 + 14.7X2 – 15.2X3
Using this formula, we can input the values of X1, X2, and X3 to obtain the predicted Y value for the first observation. We then repeat the same process to calculate the predicted Y value for the entire research sample.
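As a small illustration, the sketch below applies the example equation to a few observations. The coefficients come from the example above. The X values and the function name predict_y are hypothetical.

```python
# Estimated coefficients from the example equation
B0, B1, B2, B3 = 34.5, 13.5, 14.7, -15.2


def predict_y(x1, x2, x3):
    """Predicted Y = 34.5 + 13.5*X1 + 14.7*X2 - 15.2*X3."""
    return B0 + B1 * x1 + B2 * x2 + B3 * x3


# Hypothetical (X1, X2, X3) values for three observations
observations = [
    (2.0, 1.0, 3.0),
    (1.0, 4.0, 2.0),
    (3.0, 2.0, 1.0),
]

# Repeat the same calculation for every observation in the sample
predicted = [predict_y(x1, x2, x3) for x1, x2, x3 in observations]

for obs, y_hat in zip(observations, predicted):
    print(obs, "->", round(y_hat, 1))
```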
How to Determine Residual Values
Once we have successfully calculated the predicted Y values, all the components needed to calculate residual values are in place. The formula for residuals is as follows:
Residual = Actual Y – Predicted Y
Based on the formula above, we simply need to subtract the predicted value from the actual value for each observation or research sample. This gives one residual value per observation. When calculating residual values by hand, a spreadsheet such as Excel helps researchers avoid calculation errors.
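The residual formula can be sketched in a few lines of code. The actual and predicted values below are hypothetical and serve only to show the calculation and the meaning of the sign.

```python
# Hypothetical actual and predicted Y values for three observations
actual_y = [32.0, 75.0, 90.5]
predicted_y = [30.6, 76.4, 89.2]

# Residual = Actual Y - Predicted Y, computed observation by observation
residuals = [a - p for a, p in zip(actual_y, predicted_y)]

for a, p, r in zip(actual_y, predicted_y, residuals):
    if r > 0:
        note = "prediction below actual (positive residual)"
    elif r < 0:
        note = "prediction above actual (negative residual)"
    else:
        note = "prediction equals actual"
    print(a, p, round(r, 2), note)
```

A positive residual means the model under-predicted that observation, and a negative residual means it over-predicted.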
Statistical software can greatly assist researchers in quickly finding residual values, even for large datasets, which saves time and effort in data analysis.
Using statistical software, we can obtain residual values directly in a few steps. For the same data and the same model, different statistical software applications should produce the same residual values.
Conclusion
After reading this article, I hope you now better understand the concept of residual values in regression analysis. Residual values are needed before running assumption tests. For example, a researcher who wants to perform a normality test must first calculate or generate the residual values.
Let’s take a moment to recall the definition of residuals. Residuals are the differences between actual Y and predicted Y values. Researchers need to find estimates of the intercept and coefficients for each independent variable to obtain predicted Y values.
This concludes the article for this occasion. Hopefully, it provides value and adds to the knowledge of those who need an understanding of residuals in regression analysis. See you in the next educational article. Stay healthy and take care!