Saturday, July 27, 2024

Tutorial on How to Calculate Residual Values in Excel

Researchers widely use regression analysis to explore the influence of independent variables on a dependent variable. Anyone who uses regression analysis needs a good understanding of residual values, because they are central to the method. Several of the assumption tests required for linear regression with the ordinary least squares (OLS) method are also based on residual values.

Residual values indicate the difference between the actual values of the data and the values predicted by the regression model. Given the importance of residual values in regression analysis, in this article, we will discuss how to calculate residual values using Excel.

Definition of Residual Values

Before we calculate residual values, let’s first define what is meant by residual values. Residual values are the difference between the actual values of the data (actual Y) and the values predicted by the model (predicted Y). In the context of regression, residual values reflect how far a specific data point is from the regression line.

Residual values are often denoted by “e.” Furthermore, we need to know that residuals can be positive or negative. The positivity or negativity of the residual value indicates the direction of prediction error.

A positive residual value indicates that the predicted value is too low compared to the actual value. A negative residual value, on the other hand, indicates that the predicted value is too high.

Residual Value Formula

The formula for calculating residual values follows directly from the definition of residuals: the residual is the actual value of the data (actual Y) minus the value predicted by the model (predicted Y). Written as a formula, it becomes:

Residual (e) = Y – Y Predicted

Based on this formula, the value of Y represents the actual value of the observed or measured dependent variable in the dataset. Meanwhile, the value of Y predicted is the value predicted by the regression model for the dependent variable at a specific observation. By using input data and regression model parameters, we can calculate the value predicted by the model (Y Predicted).
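To make the formula and the meaning of its sign concrete, here is a minimal Python sketch. It is only an illustration outside Excel: the function names `residual` and `direction` and the sample numbers are hypothetical and do not come from the article's dataset.

```python
def residual(actual_y, predicted_y):
    """Residual e = actual Y - predicted Y."""
    return actual_y - predicted_y


def direction(e):
    """Describe the direction of the prediction error from the sign of e."""
    if e > 0:
        return "prediction too low"
    if e < 0:
        return "prediction too high"
    return "exact prediction"


# Hypothetical values for illustration only (not from the article's data).
e_under = residual(120, 110)  # the model predicted less than was observed
e_over = residual(120, 135)   # the model predicted more than was observed

print(e_under, direction(e_under))
print(e_over, direction(e_over))
```

A positive result means the model underestimated the observation, and a negative result means it overestimated it.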

Data for Practicing Residual Value Calculation

The data for practicing residual value calculation in this article is annual time series data from 2013 to 2022. In this exercise, we will use an example dataset analyzed using simple linear regression, which has only one dependent variable and one independent variable.

The objective of this example is to determine the influence of product price on product sales. Product sales is the dependent variable, denoted as “Y,” and product price is the independent variable, denoted as “X.”

The variable Y is measured in unit quantities, while variable X is measured in USD. The data used for practicing residual value calculation in this article can be seen in detail in the table below:

Preparing Actual Observation Data (Actual Y)

The first step is to input actual observation data into an Excel sheet. For example, let’s input the data above into columns A and B, with column A containing the Year and column B containing the actual Y values.

These are the actual values of the dependent variable as observed or measured in the dataset. In this practice dataset, which models the influence of product price on product sales, the product sales data (Y) are the actual values.

Preparing Predicted Data (Predicted Y)

After entering the actual Y data, we add the predicted Y values in a separate column. In this example, with the actual X values (product price) in column C, we place predicted Y in column D.

These are the values predicted by the regression model for the dependent variable at specific observations. To obtain predicted values, we need to estimate the linear regression equation first.

We need to find the intercept and the estimated coefficient value for the price variable. For a tutorial on how to obtain these estimation values, please refer to my previous article titled: “Simple Linear Regression Analysis in Excel and How to Interpret the Results.”
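For readers who want to see what this estimation does, the following is a minimal Python sketch of the ordinary least squares formulas for one independent variable. It is not the procedure from the referenced article. The function name `fit_simple_ols` and the four data points are hypothetical, chosen so that the result is easy to verify by hand.

```python
def fit_simple_ols(x, y):
    """Return (intercept, slope) of the least squares line y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-deviations and sum of squared deviations of x.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data lying exactly on y = 11 - 2x, so the fit should recover it.
prices = [1.0, 2.0, 3.0, 4.0]
sales = [9.0, 7.0, 5.0, 3.0]

intercept, slope = fit_simple_ols(prices, sales)
print(intercept, slope)
```

In a worksheet, Excel's built-in INTERCEPT and SLOPE functions compute the same two quantities from a range of Y values and a range of X values.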

Based on the results of the simple linear regression analysis estimation, the obtained values can be seen in the figure below:

From the figure above, it is known that the intercept value is 436.7696 and the estimated coefficient value for the price variable (X) is -20.9553. Based on these estimation results, we create the predicted Y equation as follows:

Y Predicted = 436.7696 – 20.9553X

Next, we calculate each predicted Y value by inputting the actual X values into this equation. In detail, the calculation values for predicting product sales (Predicted Y) can be seen in the table below:
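As a cross-check of this step outside Excel, here is a minimal Python sketch that applies the estimated equation to a few prices. The intercept and coefficient are the values reported above. The prices in `hypothetical_prices` are made up for illustration and are not the article's data.

```python
# Estimates reported in the article's regression output.
INTERCEPT = 436.7696
SLOPE = -20.9553


def predict_sales(price):
    """Predicted Y = intercept + coefficient * X."""
    return INTERCEPT + SLOPE * price


# Hypothetical prices in USD, for illustration only.
hypothetical_prices = [10.0, 12.0, 15.0]
predicted = [predict_sales(p) for p in hypothetical_prices]

for price, y_hat in zip(hypothetical_prices, predicted):
    print(f"price={price:6.2f}  predicted sales={y_hat:10.4f}")
```

Because the coefficient is negative, each increase in price lowers the predicted sales.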

How to Calculate Residual Values in Excel

After preparing the actual Y and predicted Y values, now we can use the residual formula in Excel to calculate residual values for each observation. Suppose we want to calculate residual values in column E. We can use the following formula in cell E2:

=B2-D2

Then, we can copy this formula downwards to calculate residual values for all observations. Excel will automatically calculate the difference between actual Y and predicted Y for each observation. In detail, the results of calculating residual values using Excel can be seen in the table below:
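The fill-down step can also be sketched in Python, where a loop plays the role of copying the formula to every row. The intercept and coefficient are the article's estimates. The `observations` list and the name `add_residuals` are hypothetical and stand in for the worksheet columns.

```python
# Estimates reported in the article's regression output.
INTERCEPT = 436.7696
SLOPE = -20.9553


def add_residuals(observations):
    """For each (actual Y, X) pair, return (actual Y, X, predicted Y, residual)."""
    table = []
    for actual_y, x in observations:
        predicted_y = INTERCEPT + SLOPE * x
        # Same calculation as the worksheet formula: actual Y minus predicted Y.
        table.append((actual_y, x, predicted_y, actual_y - predicted_y))
    return table


# Hypothetical (actual sales, price) pairs, for illustration only.
observations = [(230.0, 10.0), (180.0, 12.0), (125.0, 15.0)]
table = add_residuals(observations)

for actual_y, x, predicted_y, e in table:
    print(f"Y={actual_y:7.2f}  X={x:5.2f}  Y predicted={predicted_y:9.4f}  e={e:8.4f}")
```

Each tuple corresponds to one worksheet row, and its last element corresponds to the residual cell.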

Residual Values in Regression Analysis

Residual values help us evaluate how well our regression model fits the actual data. Through residual values, we can determine how close the predicted values are to the actual values, allowing us to assess the accuracy and precision of the model.

Based on residual values, we can identify patterns in prediction errors. This can help us understand whether the regression model may need improvement or if additional variables need to be added.
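One simple way to turn a column of residuals into an overall picture of fit is to summarize them with a few numbers. The sketch below is an illustration under stated assumptions, not a method from the article: the name `summarize_residuals`, the choice of mean absolute error and root mean squared error as summaries, and the residual values themselves are all hypothetical.

```python
import math


def summarize_residuals(residuals):
    """Return the sum, mean absolute error, and root mean squared error."""
    n = len(residuals)
    total = sum(residuals)
    mae = sum(abs(e) for e in residuals) / n
    rmse = math.sqrt(sum(e * e for e in residuals) / n)
    return {"sum": total, "mae": mae, "rmse": rmse}


# Hypothetical residuals, for illustration only.
residuals = [3.0, -4.0, 1.0, 0.0]
summary = summarize_residuals(residuals)
print(summary)
```

For a least squares fit that includes an intercept, the residuals sum to zero apart from rounding, so a sum far from zero usually points to a calculation error. Smaller error values indicate predictions that lie closer to the actual data.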

Conclusion

Calculating residual values is a crucial step in linear regression analysis. With Excel, we can easily calculate the residual value for each observation. Hopefully, this tutorial helps you understand residual values and calculate them on your own in Excel. Stay tuned for more educational updates from Kanda Data next week.
