# Linear Regression Residual Calculation Formula

In linear regression analysis, examining the residuals is common practice. One important assumption of linear regression estimated with the least squares method is that the residuals are normally distributed.

To test this assumption, we first need to calculate the residuals. However, many people are still unsure how regression residuals are calculated.

In this article, I discuss how to obtain the residuals of a regression. Once we have the residuals, the next step is to run a normality test or any other required test on the residual values.

## Definition of Residual Value

The residual value is the difference between the actual observed value of the dependent variable and the predicted value of that variable. In other words, the residual value is the difference between the actual Y value and the predicted Y value.

The actual Y value or the true observation of the dependent variable is obtained through data collection, either from surveys (primary data) or from secondary data sources.

For example, when conducting a field survey, we might interview 150 consumers of product ABC. In this survey, we collect data on household income from these 150 respondents. This data represents the actual observed values or the actual Y values.

The next step is to understand how the predicted Y value or the estimated value of the dependent variable is obtained. To get this predicted value, we first need to estimate the regression equation.

For instance, suppose we want to estimate the influence of three independent variables on the household income of consumers of product ABC (the dependent variable). To do this, we run a multiple linear regression to obtain the estimated coefficient of each independent variable as well as the intercept. Suppose the estimation results in the following equation:

Y = 10.4 + 3.5X1 + 2.3X2 – 1.2X3

To calculate the predicted value of the dependent variable for the first respondent, we substitute that respondent's actual values of the independent variables X1, X2, and X3 into the equation. Repeating the same arithmetic for every respondent gives the predicted values for all 150 respondents.
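
The substitution step can be sketched in Python as follows. The coefficients come from the equation above; the X1, X2, and X3 values and the function name `predict_y` are hypothetical and serve only as an illustration.

```python
# Coefficients taken from the estimated equation in the text:
# Y = 10.4 + 3.5*X1 + 2.3*X2 - 1.2*X3
INTERCEPT = 10.4
B1, B2, B3 = 3.5, 2.3, -1.2


def predict_y(x1, x2, x3):
    """Return the predicted Y for one respondent."""
    return INTERCEPT + B1 * x1 + B2 * x2 + B3 * x3


# Hypothetical (X1, X2, X3) values for three respondents, not real survey data.
respondents = [
    (2.0, 1.0, 3.0),
    (4.0, 0.5, 1.0),
    (1.0, 2.0, 2.0),
]

# One predicted Y per respondent.
predicted = [predict_y(x1, x2, x3) for x1, x2, x3 in respondents]

for i, y_hat in enumerate(predicted, start=1):
    print(f"Respondent {i}: predicted Y = {y_hat:.2f}")
```

With real survey data, the `respondents` list would simply hold one tuple per respondent, 150 in the example above.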

Calculating the residual value is very important in regression analysis. The residuals tell us how well the regression model predicts the actual values in the data we have collected. A residual close to zero means the model predicted that observation accurately, whereas a residual that is large in absolute value means the prediction was far from the actual value.

## Residual Value Calculation Formula

The residual is the difference between the actual observed value of the dependent variable and its predicted or estimated value. Based on this definition, the formula for calculating the residual value is as follows:

Residual = Y actual – Y predicted

Where Y actual represents the actual observed value of the dependent variable, and Y predicted is the predicted or estimated value of the dependent variable.
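
As a minimal sketch, the formula can be applied to a whole column of observations at once. The numbers and the helper name `residuals` below are hypothetical and chosen only for illustration.

```python
def residuals(y_actual, y_predicted):
    """Return Y actual minus Y predicted for each observation."""
    if len(y_actual) != len(y_predicted):
        raise ValueError("y_actual and y_predicted must have the same length")
    return [ya - yp for ya, yp in zip(y_actual, y_predicted)]


# Hypothetical values, not real survey data.
y_actual = [18.0, 23.0, 16.1]
y_predicted = [16.1, 24.35, 16.1]

res = residuals(y_actual, y_predicted)

for ya, yp, r in zip(y_actual, y_predicted, res):
    print(f"actual = {ya:6.2f}  predicted = {yp:6.2f}  residual = {r:6.2f}")
```

A positive residual means the model predicted too low for that observation, and a negative residual means it predicted too high.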

To make the calculation easier, we can use software such as Excel. In Excel, we can enter formulas that calculate the predicted Y and then the residual value for each observation.

In addition to Excel, we can also utilize statistical data processing applications such as SPSS, R, or Python to calculate the residual value. These applications allow us to perform more complex and in-depth calculations. Once we obtain the residual value, we can proceed with further analysis such as a normality test.
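
As one possible illustration in Python, the sketch below fits a regression and derives the residuals from it. To stay within the standard library it assumes a single independent variable and uses the closed-form least squares solution; the three-predictor example in this article would normally be handled by a statistics library instead. The data and the function name `fit_simple_ols` are hypothetical.

```python
from statistics import mean


def fit_simple_ols(x, y):
    """Least squares fit of y = intercept + slope * x (one predictor)."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical data, not real survey data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

intercept, slope = fit_simple_ols(x, y)

# Predicted Y for each observation, then residual = Y actual - Y predicted.
fitted = [intercept + slope * xi for xi in x]
resid = [yi - fi for yi, fi in zip(y, fitted)]

print(f"intercept = {intercept:.4f}, slope = {slope:.4f}")
print("residuals:", [round(r, 4) for r in resid])
```

When the model includes an intercept, least squares residuals sum to zero up to rounding error, which is a quick sanity check on the calculation.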

Running a normality test on the residuals tells us whether they are normally distributed, which is one of the main assumptions in linear regression. The test result lets us evaluate whether this assumption is met or not.
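
This article does not prescribe a particular normality test, so the sketch below uses the Jarque-Bera test purely as one option, because it can be written with the standard library alone. It compares the skewness and kurtosis of the residuals with those of a normal distribution. The p-value relies on a large-sample approximation, so it is rough for small samples, and the residual values shown are hypothetical.

```python
import math


def jarque_bera(resid):
    """Return the Jarque-Bera statistic and its asymptotic p-value."""
    n = len(resid)
    m = sum(resid) / n
    centered = [r - m for r in resid]

    # Central moments of the residuals.
    m2 = sum(c ** 2 for c in centered) / n
    m3 = sum(c ** 3 for c in centered) / n
    m4 = sum(c ** 4 for c in centered) / n

    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2  # equals 3 for a normal distribution

    jb = n / 6 * (skewness ** 2 + (kurtosis - 3) ** 2 / 4)

    # Under normality JB is approximately chi-square with 2 degrees of
    # freedom, whose upper-tail probability is exp(-jb / 2).
    p_value = math.exp(-jb / 2)
    return jb, p_value


# Hypothetical residuals, not real survey data.
example_resid = [-1.2, 0.4, 0.9, -0.3, 0.1, -0.6, 1.1, -0.4, 0.2, -0.2]

jb, p_value = jarque_bera(example_resid)
print(f"JB = {jb:.4f}, p-value = {p_value:.4f}")
```

A p-value above the chosen significance level (0.05 is a common choice) means the test does not reject normality of the residuals.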

By understanding and calculating the residuals, we can judge how well our regression model fits the data and check whether its assumptions hold. This step is essential before relying on the model's predictions. I hope this explanation helps you understand the concept of residuals and how to calculate them. Thank you for reading this article, and stay tuned for future updates from Kanda Data.