Sunday, July 21, 2024

# How to Analyze Multicollinearity in Linear Regression and its Interpretation in R (Part 2)

Non-multicollinearity is one of the assumptions required by the ordinary least squares (OLS) method of linear regression analysis. The assumption states that there is no strong correlation among the independent variables in the equation.

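If there is a strong correlation among the independent variables in the linear regression equation, the estimated coefficients remain unbiased, but their standard errors are inflated, so the estimates become imprecise and sensitive to small changes in the data. As long as the correlation is not perfect, OLS is still the best linear unbiased estimator (BLUE), yet individual coefficients can be hard to interpret reliably. A multicollinearity test is therefore a standard step in assessing the model.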

Multicollinearity can be checked in two ways: by examining the pairwise correlations among the independent variables, and by examining the Variance Inflation Factor (VIF). As a rule of thumb, a high correlation between two independent variables (absolute value above 0.70) may indicate a multicollinearity problem.
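The correlation check can be sketched in a few lines of base R. The data below are simulated stand-ins for illustration only (they are not the dataset used in this article), and the object names `predictors`, `cor_matrix`, and `high_pairs` are hypothetical.

```r
# Simulated stand-in data (NOT the article's dataset): 15 hypothetical outlets.
set.seed(42)
cost <- runif(15, min = 50, max = 100)
marketing <- 0.5 * cost + rnorm(15, mean = 0, sd = 5)
predictors <- data.frame(cost = cost, marketing = marketing)

# Pairwise Pearson correlations among the independent variables only.
cor_matrix <- cor(predictors)
print(round(cor_matrix, 3))

# Flag predictor pairs whose absolute correlation exceeds the 0.70 rule of thumb.
# upper.tri() keeps each pair once and skips the diagonal.
high_pairs <- which(abs(cor_matrix) > 0.70 & upper.tri(cor_matrix), arr.ind = TRUE)
print(high_pairs)
```

The dependent variable is deliberately left out of `predictors`, because the assumption concerns only the independent variables.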

When the test is based on the VIF, a larger VIF value indicates a greater potential for multicollinearity problems. A commonly used rule of thumb treats a VIF below 10 as acceptable, so a small value is the desired result.

## Why is the multicollinearity test only performed on multiple linear regression?

As mentioned above, the purpose of the multicollinearity test is to determine whether there is a strong correlation among the independent variables. The test therefore involves only the independent variables, not the dependent variable.

For this reason, the multicollinearity test applies only to multiple linear regression. A simple linear regression has a single independent variable, so there is nothing for it to be correlated with and no multicollinearity test is needed.

## Mini Research Using Multiple Linear Regression Analysis

This is Part 2 of the series on multiple linear regression analysis and assumption tests in R. The exercise uses the same data as the previous article, “Multiple linear regression analysis and interpretation in R”.

The purpose of the mini-research example is to analyze the influence of cost and marketing on the sales of a product. The data was collected by researchers from 15 sales outlets owned by a company in X region. The data collected by the researcher can be seen in the table below:

## How to Import a Dataset from Excel to R

Importing a dataset in R can be done by clicking “file” and then selecting “import dataset” from the various options available. As the data has been saved in Excel, select “from Excel”.

Next, browse to the location where the Excel file is saved. A preview of the data entered in Excel will appear. Click “Import” to proceed. If these steps are followed correctly, the imported data will be displayed in RStudio.
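The same import can also be done from the console. The sketch below assumes the `readxl` package, which is what RStudio's import dialog uses for Excel files. The file name is hypothetical and should be replaced with the actual location of the workbook.

```r
# Hypothetical file name; replace with the actual path to the saved workbook.
excel_path <- "Multiple_Linear_Regression.xlsx"

if (requireNamespace("readxl", quietly = TRUE) && file.exists(excel_path)) {
  # read_excel() reads the first sheet by default and returns a tibble.
  Multiple_Linear_Regression <- readxl::read_excel(excel_path)
  # Inspect column names and types before fitting the model.
  str(Multiple_Linear_Regression)
} else {
  message("Install 'readxl' and point excel_path at the saved workbook.")
}
```

The object name on the left of `<-` is the name that later goes into the `data =` argument of the regression call.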

## The syntax for Multicollinearity Test in R

In order to perform a multicollinearity test in multiple linear regression analysis in R, researchers must first conduct the analysis itself. The following is a detailed syntax for conducting the multicollinearity test:

Based on the syntax above, the first step is to type the formula for the multiple linear regression analysis, such as sales ~ cost + marketing, adjusted to the variables in your own data. Note that variable names are case-sensitive and must be typed exactly as they appear in the data preview in RStudio.

Next, in the syntax “data = Multiple_Linear_Regression”, researchers should indicate the data source used. Please type the name of the file exactly as it appears when importing data.

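After pressing “enter,” the next step is to type “library(car)” to load the car package. Once this is done, type “vif <- vif(model)” to compute the variance inflation factor for each independent variable and store the result. The VIF is obtained by dividing one by the tolerance value, where the tolerance of a variable is one minus the R-squared from regressing that variable on the other independent variables.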
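The relationship between tolerance and VIF can be checked by hand with an auxiliary regression. This is a minimal sketch in base R on simulated stand-in data (not the article's dataset); the names `aux`, `tolerance`, and `vif_cost` are hypothetical.

```r
# Simulated stand-in data (NOT the article's dataset).
set.seed(42)
cost <- runif(15, min = 50, max = 100)
marketing <- 0.5 * cost + rnorm(15, mean = 0, sd = 5)

# Auxiliary regression: one independent variable regressed on the other(s).
aux <- lm(cost ~ marketing)
r_squared <- summary(aux)$r.squared

# Tolerance is the share of variance in cost NOT explained by the other predictors.
tolerance <- 1 - r_squared

# VIF is the reciprocal of the tolerance.
vif_cost <- 1 / tolerance
print(c(r_squared = r_squared, tolerance = tolerance, vif = vif_cost))
```

With more than two independent variables, the auxiliary regression would include all the remaining predictors on the right-hand side.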

To display the value of VIF, type “vif” and press “enter.” Then, the output of the multicollinearity test for multiple linear regression analysis will appear on the screen.
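The steps described above can be sketched as a single script. The data frame here is simulated so that the example runs on its own; it only borrows the name `Multiple_Linear_Regression` and the column names from the article, and its values are not the article's data. The result is stored as `vif_values` (a hypothetical name) rather than `vif`, which avoids reusing the function's name, and the fallback branch is an addition for readers without the car package.

```r
# Simulated stand-in for the imported data (NOT the article's dataset).
set.seed(42)
cost <- runif(15, min = 50, max = 100)
marketing <- 0.5 * cost + rnorm(15, mean = 0, sd = 5)
sales <- 20 + 1.5 * cost + 2 * marketing + rnorm(15, mean = 0, sd = 10)
Multiple_Linear_Regression <- data.frame(sales, cost, marketing)

# Step 1: fit the multiple linear regression.
model <- lm(sales ~ cost + marketing, data = Multiple_Linear_Regression)

# Step 2: compute the VIF for each independent variable.
if (requireNamespace("car", quietly = TRUE)) {
  library(car)
  vif_values <- vif(model)
} else {
  # Fallback without car: the VIFs equal the diagonal of the inverse
  # of the correlation matrix of the independent variables.
  x <- Multiple_Linear_Regression[, c("cost", "marketing")]
  vif_values <- diag(solve(cor(x)))
}

# Step 3: display the result.
print(vif_values)
```

Typing the object name alone at the console, as the article does, prints it in the same way as `print()`.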

## Interpretation of Multicollinearity Test Output in R

The output of the multicollinearity test in R is similar to that of other analysis tools. The results of the multicollinearity test in R can be seen in the figure below:

Based on the above figure, it can be seen that the variance inflation factor (VIF) value is 3.61358. From the output, it is known that the VIF value for the “cost” variable and the “marketing” variable is the same.

This is because the regression equation has only two independent variables. With two independent variables there is only one pairwise correlation, between “cost” and “marketing”, and the VIF of each variable is computed from that same correlation.
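For the two-variable case this can be made concrete: both VIFs equal 1 / (1 - r²), where r is the correlation between the two independent variables. The sketch below takes the VIF reported in the output and works backwards to the correlation it implies, which comes out at roughly 0.85. Only the value 3.61358 comes from the article; the variable names are hypothetical.

```r
# VIF reported in the article's output for both cost and marketing.
reported_vif <- 3.61358

# With exactly two independent variables: VIF = 1 / (1 - r^2),
# so r^2 = 1 - 1 / VIF.
implied_r_squared <- 1 - 1 / reported_vif
implied_abs_r <- sqrt(implied_r_squared)
print(round(implied_abs_r, 3))

# Round trip: plugging the implied correlation back in recovers the VIF.
vif_from_r <- 1 / (1 - implied_abs_r^2)
print(vif_from_r)
```

Comparing the implied correlation with the 0.70 rule of thumb mentioned earlier shows that the correlation check and the VIF threshold of 10 are not equally strict.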

The VIF value of 3.61358 in the above figure is below the threshold of 10. By this criterion, there is no multicollinearity problem in the linear regression equation.

Therefore, the regression equation satisfies one of the assumptions of the OLS linear regression method, namely that there is no strong correlation between the independent variables. This concludes Part 2 of the series. Stay tuned for the next part.
