Thursday, July 25, 2024

# Finding the Best Regression Model Based on R Square

Specifying the regression model is an important step that researchers must consider. A correct specification produces a regression model that represents the real phenomenon well.

As I have written in several previous articles, linear regression is a common choice among researchers. It is used to estimate the effect of one or more variables on another variable.

For a linear regression model to be a good model, the researcher must show that the regression equation fulfils the required assumptions. When those assumptions hold, the ordinary least squares estimator is the best linear unbiased estimator (BLUE).

## Assumptions of Linear Regression Using the Ordinary Least Squares (OLS) Method

Researchers widely use linear regression with the ordinary least squares method to determine the effect of independent variables on a dependent variable. Linear regression can be simple linear regression, with one independent variable, or multiple linear regression, with two or more independent variables.

In linear regression using the ordinary least squares method, several assumptions need to be tested by researchers. The assumption tests for cross-sectional and time series data are almost identical.

The difference is that the non-autocorrelation assumption must be tested for time series data, while this test is not necessary for cross-sectional data. For more detailed information, I’ve written several articles related to assumption testing.

Those articles explain the basic theory, the test methods, and how to interpret the output. In short, to obtain a good regression model/equation, the required assumptions must be met.

## Understanding R Square

As the title suggests, this article links the best regression model to R Square. The value of R Square lies between 0 and 1. The closer R Square is to 1, the larger the share of the variation in the dependent variable that the regression model/equation explains.

R Square is also known as the coefficient of determination. The value is easier to use once we know how to interpret it.

For example, suppose the R Square of a regression equation is 0.87. This means that the independent variables explain 87% of the variance in the dependent variable. The remaining 13% is explained by other variables that are not included in the regression equation.
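To make the interpretation concrete, here is a minimal sketch of how R Square is computed from its standard definition, 1 minus the residual sum of squares divided by the total sum of squares. The function name and the `observed` and `predicted` values are made up for illustration and do not come from the article's data.

```python
def r_squared(y_actual, y_predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_y = sum(y_actual) / len(y_actual)
    # Total variation of the dependent variable around its mean.
    ss_total = sum((y - mean_y) ** 2 for y in y_actual)
    # Variation left unexplained by the model's predictions.
    ss_residual = sum((y - p) ** 2 for y, p in zip(y_actual, y_predicted))
    return 1 - ss_residual / ss_total


# Hypothetical observed values and model predictions (illustration only).
observed = [10.0, 12.0, 15.0, 19.0, 24.0]
predicted = [9.0, 13.0, 16.0, 19.0, 23.0]

r2 = r_squared(observed, predicted)
print(f"R Square = {r2:.3f}")  # R Square = 0.968
```

In this made-up case the model would explain about 96.8% of the variance, and the remaining 3.2% would be left to factors outside the equation.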

## Best Regression Model with R Square Approach

Once we understand what makes a good regression model and what R Square measures, we can choose the best regression model using the R Square value.

Selecting the best regression model means that the researcher composes several candidate regression equations and then tests each one to determine which is the best.

With the R Square approach, the researcher chooses the regression equation with the highest R Square value. However, there is an important point that researchers need to pay attention to.

The researcher must have a justification for how the regression equation/regression model is composed. Ideally, the model specification is based on theoretical references and the results of previous studies (empirical evidence). From there, researchers can develop new ideas (novelty) or refine the results of previous research.

## Selection of the Best Regression Equation by Stepwise Method

The best regression equation under the R Square approach can be found with the stepwise method. SPSS statistical software can be used to perform stepwise regression analysis.

Researchers do not need to run a separate regression analysis for each combination of independent variables. They only need to input the data in SPSS and run a stepwise linear regression analysis.

The output then lists the models that SPSS builds as it enters and removes independent variables step by step. Researchers can compare these models directly by their R Square values.

## Exercise: Linear Regression Analysis with the Stepwise Method Using SPSS

I have created a mini-research example to make it easier to understand how to choose the best regression model with the R Square approach. In this study, three variables are expected to affect car sales volume.

The three variables are advertising costs, marketing personnel, and operational costs. Researchers can form several candidate equations by combining these independent variables and then look for the best one.

Examples of combinations of independent variables that can be created are:

1. Car Sales Volume = Advertising Cost

2. Car Sales Volume = Marketing Personnel

3. Car Sales Volume = Operational Cost

4. Car Sales Volume = Advertising Cost + Marketing Personnel

5. Car Sales Volume = Advertising Cost + Operational Cost

6. Car Sales Volume = Marketing Personnel + Operational Cost

7. Car Sales Volume = Advertising Cost + Marketing Personnel + Operational Cost
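The seven combinations above can also be enumerated and fitted programmatically. The sketch below is only an illustration in plain Python: the data values are invented (they are not the 15-store data used in this article), and the helper names `solve_linear_system` and `fit_r_squared` are hypothetical. Each equation is fitted by ordinary least squares with an intercept, and the combinations are ranked by R Square.

```python
from itertools import combinations


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_r_squared(columns, y):
    """Fit OLS with an intercept via the normal equations; return R Square."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(k)] for a in range(k)]
    xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(k)]
    beta = solve_linear_system(xtx, xty)
    fitted = [sum(bj * xj for bj, xj in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    ss_residual = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return 1 - ss_residual / ss_total


# Invented data for illustration only (NOT the article's data set).
data = {
    "Advertising Cost": [12, 15, 11, 18, 20, 14, 17, 22, 13, 19],
    "Marketing Personnel": [4, 5, 3, 6, 7, 4, 6, 8, 5, 6],
    "Operational Cost": [30, 28, 35, 32, 40, 31, 29, 38, 33, 36],
}
sales = [52, 60, 47, 71, 80, 55, 68, 90, 57, 74]

# Fit every non-empty combination of the three independent variables.
results = {}
for size in range(1, len(data) + 1):
    for combo in combinations(data, size):
        results[combo] = fit_r_squared([data[name] for name in combo], sales)

# Rank the candidate equations from highest to lowest R Square.
for combo, r2 in sorted(results.items(), key=lambda item: -item[1]):
    print(f"{' + '.join(combo):60s} R Square = {r2:.3f}")
```

The fitting is written out by hand only to keep the example self-contained; in practice a statistics library would normally do this step. Note that the equation containing all three variables can never have a lower R Square than any of its sub-equations, which is one reason R Square alone should not settle the choice.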

Based on the combinations above, the researcher can identify the best equation by its R Square value. With stepwise regression analysis in SPSS, the researcher only needs to input the data and run the regression analysis once.

For example, car sales volume data has been collected from 15 stores in ABC city. The collected data is then entered into SPSS. The results of the data tabulation can be seen in the image below:

Furthermore, the “Variable View” setting in SPSS can be seen in the image below:

The next step is to perform a stepwise linear regression analysis (the assumptions of linear regression using the OLS method are taken to be fulfilled). In the SPSS menu, choose: Analyze -> Regression -> Linear.

Next, move the car sales volume variable into the Dependent box and move the advertising costs, marketing personnel, and operational costs variables into the Independent(s) box. In the Method drop-down list, select Stepwise. For more details, see the image below:

Then click OK, and the output of the analysis will appear.
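For readers who want to see the idea behind step-by-step variable entry outside SPSS, here is a simplified sketch of forward selection. It is not SPSS's algorithm: SPSS decides entry and removal with F-test probability thresholds and can also remove variables again, while this sketch only adds the variable that raises R Square the most and stops when the gain falls below a hypothetical `min_gain` threshold. The data values are invented for illustration.

```python
def center(values):
    """Subtract the mean, which accounts for the intercept."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def residualize(target, basis):
    """Remove from target the part that is linearly explained by basis."""
    scale = dot(target, basis) / dot(basis, basis)
    return [t - scale * b for t, b in zip(target, basis)]


def forward_selection(predictors, y, min_gain=0.01):
    """Add predictors one at a time by largest gain in R Square.

    Simplified forward selection (assumption: gain threshold instead of
    an F-test, and no removal step), so it is NOT SPSS's stepwise method.
    """
    resid_y = center(y)
    ss_total = dot(resid_y, resid_y)
    remaining = {name: center(col) for name, col in predictors.items()}
    steps, r2 = [], 0.0
    while remaining:
        # Extra share of variance each remaining candidate would explain.
        gains = {
            name: dot(resid_y, col) ** 2 / dot(col, col) / ss_total
            for name, col in remaining.items()
            if dot(col, col) > 1e-12  # skip candidates with no new information
        }
        if not gains:
            break
        best = max(gains, key=gains.get)
        if gains[best] < min_gain:
            break
        r2 += gains[best]
        chosen = remaining.pop(best)
        # Strip the chosen variable's contribution from y and the candidates.
        resid_y = residualize(resid_y, chosen)
        remaining = {n: residualize(c, chosen) for n, c in remaining.items()}
        steps.append((best, r2))
    return steps


# Invented data for illustration only (NOT the article's data set).
data = {
    "Advertising Cost": [12, 15, 11, 18, 20, 14, 17, 22, 13, 19],
    "Marketing Personnel": [4, 5, 3, 6, 7, 4, 6, 8, 5, 6],
    "Operational Cost": [30, 28, 35, 32, 40, 31, 29, 38, 33, 36],
}
sales = [52, 60, 47, 71, 80, 55, 68, 90, 57, 74]

steps = forward_selection(data, sales)
for number, (name, r2) in enumerate(steps, start=1):
    print(f"Model {number}: entered {name}, R Square = {r2:.3f}")
```

Each printed line plays the role of one row in a model summary: the model number, the variable entered at that step, and the cumulative R Square.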

## How to Interpret SPSS Output

The SPSS output contains several tables: Variables Entered/Removed, Model Summary, ANOVA, Coefficients, and Excluded Variables. In this output, all four models have a high R Square value, and the ANOVA F test for each has a p-value < 0.05.

The Model Summary output can be seen in the figure below:

Based on the Model Summary, the 1st to 4th models can all be considered good models because their R Square values range from 0.890 to 0.976. The 3rd regression model has the highest R Square value, 0.976.

The 3rd regression model uses all independent variables: Advertising Cost + Marketing Personnel + Operational Cost. However, the regression t-test shows that one variable in this combination does not have a significant partial (individual) effect on the dependent variable. In more detail, the output of the regression t-test can be seen in the image below:

Based on the SPSS output, the other three models are also good. The researcher can decide which regression equation to use. Thus, in this article, we have learned how to select the best regression model based on the R Square value using the stepwise method in SPSS. Hopefully, this article is useful for all of you. See you in the next article update!
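One caveat when ranking models by R Square is that it never decreases when another independent variable is added. The Model Summary table in SPSS therefore also reports Adjusted R Square, which penalizes each extra variable. As a small sketch, the standard formula can be applied to the 3rd model using the figures stated in this article (R Square 0.976, 15 stores, 3 independent variables). The function name is hypothetical, and the result is a hand calculation that may differ slightly from the SPSS table because 0.976 is already rounded.

```python
def adjusted_r_squared(r_squared, n_observations, n_predictors):
    """Adjusted R Square: 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    penalty = (n_observations - 1) / (n_observations - n_predictors - 1)
    return 1 - (1 - r_squared) * penalty


# Figures taken from the article's 3rd model: R Square 0.976, n = 15, k = 3.
adj = adjusted_r_squared(0.976, n_observations=15, n_predictors=3)
print(f"Adjusted R Square = {adj:.3f}")  # Adjusted R Square = 0.969
```

Comparing models on this adjusted value is a common complement to the raw R Square when the candidate equations contain different numbers of variables.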
