Simple linear regression can be used to analyze the effect of one variable on another variable. The regression analysis consists of the dependent variable and the independent variable. Simple linear regression only consists of one dependent variable and one independent variable. If in the regression equation the number of independent variables is two or more than two variables, it is no longer called simple linear regression.
A simple linear regression equation describes the relationship of one independent variable with one dependent variable, usually depicted by a straight line. Therefore, one of the assumptions of simple linear regression is the assumption of linearity. In simple linear regression analysis using the ordinary least square method, the assumptions required are met to obtain the best linear unbiased estimator. Therefore, researchers need to test assumptions, including heteroscedasticity, linearity, normality, and autocorrelation (if using time series data).
Simple linear regression estimation
Estimates are conducted to obtain estimated coefficients that can be used to describe the phenomenon under study. When the researcher chooses to use simple linear regression analysis, statistical estimates will be conducted based on the data obtained. In simple linear regression analysis, because it only consists of one independent variable, there is only one independent variable’s estimation coefficient (b1). In addition, there is an estimated coefficient of intercept (bo). The estimated regression coefficient can also be used to determine the predicted value of the dependent variable.
The estimated coefficient can be obtained by manual calculation or statistical software analysis results. Because of the importance of this estimated coefficient value, a researcher must understand where the estimated coefficient value is obtained. On this occasion, Kanda Data will write a tutorial on manually calculating simple linear regression (part 1). Kanda Data will calculate the value of the intercept coefficient (b0), the estimated coefficient (b1) and the coefficient of determination (R squared).
Simple linear regression analysis mini-research example
As an exercise material, I have prepared a mini-research example of a simple linear regression analysis. Data was collected over ten years at a company testing new technology on the cars it makes. A researcher conducts observations to determine the influence of the selling price on car sales.
Every year there is an introduction of technology, and the car’s selling price will increase yearly. Assuming the same marketing technique, the company has recorded the number of car unit sales for the last ten years. Based on this background, the specifications of the regression equation created by the researcher are as follows:
Y = b0 + b1X + e
Description:
Y = car sales (units)
X = price (thousands of USD)
b0, b1 = regression estimation coefficient
e = disturbance error
Data has been collected from 2010-2019. In this case, the data used is annual time series data from car sales and prices. To test the assumptions that need to be done, namely normality, heteroscedasticity, autocorrelation, and linearity tests. However, this article only focuses on how to manually calculate the value of the intercept coefficient (b0), b1 and the coefficient of determination (R squared) using Excel. The data that researchers have collected can be seen in the table below:
How to Calculate the Intercept Estimation Coefficient (bo) and b1
Following what I have written in the previous paragraph, to avoid errors in calculating manually, I am here using Excel. Using Excel will avoid mistakes in calculations because researchers can check the results of their calculation operations, and if there are errors, they can be corrected directly in Excel.
There are two ways to calculate the estimated coefficients b0 and b1: using the original sample observation and the deviation of the variables from their means. Because I will be calculating the coefficient of determination (R squared), I use the second method, namely, the variable’s deviation from their means. The formula used to calculate b0 and b1 based on the book Koutsoyiannis (1977) can be seen as follows:
Calculating the values of b0 and b1 using this formula cannot be done simultaneously. Researchers have to calculate the value of b1 first and then calculate the bo. There must be a value of b1 first to calculate bo.
Furthermore, to calculate the value of b1, it is necessary to calculate the difference between actual X and average X and actual Y and average Y. Researchers can create a table in excel that contains the components of calculating values that can be arranged as shown in the image below:
After creating a formula template in Excel, you calculate the average value of the Car Sales variable (Y) and the price variable (X). After calculating the average, find the difference between the actual Y and the average Y and between the actual X and the average X. In the next step, multiply xy and square x. In detail, the calculation stages can be seen in the image below:
Next, copy and paste the Excel formula from the 2nd year’s data to the last year’s data. To copy and paste formulas in Excel, you must pay attention to the absolute values of the average Y, and the average X. Absolute values can be applied by pressing F4 on the keyboard until a dollar sign appears. Next, please copy and paste the formula until you get the results as shown in the image below:
The next step is to calculate the value of the coefficient b1. To calculate the value of b1, use the formula I have written in the previous paragraph. The calculation results can be seen below:
The bo value is calculated using the formula I have written in the previous paragraph. After obtaining the value of b1, it can be used to calculate the value of b0. The results of the calculation of the coefficient of bo can be seen as follows:
How to Calculate the Coefficient of Determination (R Squared)
The coefficient of determination (R Squared) is important in simple linear regression analysis. The coefficient of determination in simple linear regression shows the goodness of fit of a model. The higher the value of the coefficient of determination indicates that the independent variable’s variance can explain the variance of the dependent variable well.
The value of the coefficient of determination is 0 to 1. The closer to 1, the value of the coefficient of determination is, the better the model can be. To manually calculate the coefficient of determination, you can use the formula that I cited from Koutsoyiannis (1977) as follows:
Based on the Excel table calculated in the previous paragraph, there is 1 component that still needs to be calculated according to the coefficient of determination formula. One column needs to be added in Excel, namely y squared, as can be seen in the image below:
The last step is calculating the coefficient of determination using the formula I wrote in the previous paragraph. Based on the calculation results, the coefficient of determination value is 0.8450. In detail, it can be seen as follows:
Based on what has been calculated in the previous paragraphs, we have manually calculated the coefficients of bo, b1 and the coefficient of determination using Excel. It can also be compared with the analysis results using statistical software to do a crosscheck. If the calculation in Excel is correct according to the tutorial in this article, then the results of manual calculations with statistical software will be similar.
That’s what I can write for all of you. Hopefully, it will be helpful for those who study how to calculate simple linear regression manually. Thank you, see you in next week’s article! Wait for the next manual calculation update (part 2).