How to Analyze Multicollinearity in Linear Regression Using R Studio

In linear regression analysis using the Ordinary Least Squares (OLS) method, the independent variables should not be strongly correlated with one another. This is one of the assumptions that must hold for OLS to produce the best linear unbiased estimator.

If the results of a linear regression equation show a strong correlation between the independent variables, this is referred to as multicollinearity. Therefore, it is important for researchers to test for multicollinearity in linear regression equations.

The most commonly used method for detecting multicollinearity is by using the Variance Inflation Factor (VIF). This multicollinearity test can easily be analyzed using R Studio.

In this article, Kanda Data will explain how to analyze multicollinearity in R Studio along with the interpretation of the results. The case study example used in this article still utilizes the same data as in a previously written article.

The purpose of the case study is to analyze the effect of advertising costs and the number of marketing staff on product sales volume. We can construct a linear regression equation as follows:

π‘Œ=𝛽0+𝛽1𝑋1+𝛽2𝑋2+…+𝛽𝑛𝑋𝑛+πœ–

Where:

π‘Œ is product sales (in thousands of units),

𝑋1 is advertising cost (in hundreds of US dollars),

𝑋2 is the number of marketing staff (employees),

𝛽0 is the intercept (constant),

𝛽1 and 𝛽2 are regression coefficients,

πœ– is the error or residual.

Next, based on the obtained data, proceed with data input and tabulation. The input results for both the dependent and independent variables can be seen in detail in the table below.
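For readers following along without the original spreadsheet, a data frame with the same structure can be created directly in R. The values below are hypothetical placeholders, not the case-study data; substitute the actual observations before analyzing:

```r
# Hypothetical placeholder values (NOT the case-study data) illustrating
# the required structure: one row per observation, one column per variable.
data <- data.frame(
  Sales            = c(12, 15, 14, 18, 16, 21, 19, 24, 20, 22),  # thousands of units
  Advertising_Cost = c(2, 5, 3, 7, 4, 8, 6, 9, 5, 7),            # hundreds of USD
  Marketing_Staff  = c(4, 3, 6, 5, 8, 4, 7, 6, 9, 8)             # employees
)
str(data)  # confirm 10 observations of 3 numeric variables
```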

Steps for Multicollinearity Testing in R Studio

Before conducting a multicollinearity test, the first essential step is to perform a multiple linear regression analysis. A tutorial for this was written in a previous article, but for convenience, the necessary R Studio syntax is provided again here.

For steps on importing data from Excel, refer to a previously written article. Once the data to be analyzed is properly imported, write the following command in R Studio:

model <- lm(Sales ~ Advertising_Cost + Marketing_Staff, data = data)  # fit the multiple regression model

summary(model)  # display coefficients, standard errors, and R-squared

After entering this command and pressing enter, the output of the analysis results in R Studio will appear as shown below.

Once the linear regression analysis is performed, proceed with the multicollinearity analysis. The purpose of the multicollinearity test is to detect the strength of correlation between the independent variables in the regression model. One method to perform this test is by using the Variance Inflation Factor (VIF), which we will demonstrate in this article.
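For context, the VIF of an independent variable is defined as VIF = 1 / (1 − R²), where R² comes from the auxiliary regression of that variable on all the other independent variables. The sketch below verifies this definition on small hypothetical data (the variable names and values are illustrative, not the case-study data):

```r
# Hypothetical predictors for illustration only.
d <- data.frame(
  x1 = c(2, 5, 3, 7, 4, 8, 6, 9, 5, 7),
  x2 = c(4, 3, 6, 5, 8, 4, 7, 6, 9, 8)
)

# Auxiliary regression: x1 explained by the remaining predictor(s).
aux <- lm(x1 ~ x2, data = d)
r2  <- summary(aux)$r.squared  # R-squared of the auxiliary regression

vif_manual <- 1 / (1 - r2)     # definition of the Variance Inflation Factor
vif_manual                     # close to 1 here, since x1 and x2 are nearly uncorrelated
```

With only two independent variables, the R² of each auxiliary regression is simply the squared correlation between the two predictors, which is why the VIF values reported for both variables in such a model are identical.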

To obtain the Variance Inflation Factor (VIF) value in R Studio, the ‘car’ package needs to be installed first. For those who are conducting a multicollinearity test in R Studio for the first time, follow the command below to install the ‘car’ package:

install.packages("car")

Next, if the ‘car’ package is successfully installed in R Studio, to conduct the multicollinearity test, write the following command:

library(car)  # load the package that provides vif()

vif(model)  # compute the Variance Inflation Factor for each independent variable

After entering the command and pressing enter, the output of the multicollinearity test using the VIF value will appear as follows:

Advertising_Cost  Marketing_Staff
         3.61358          3.61358

The analysis results show a VIF of 3.61358 for both advertising cost and marketing staff; with only two independent variables, the two VIF values are always identical. Since this value is below the commonly used rule-of-thumb threshold of 10, there is no indication of multicollinearity between the independent variables in the tested regression equation. Therefore, it can be concluded that there is no strong correlation between the independent variables in the case study example in this article.
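The rule-of-thumb comparison against 10 can also be scripted, so that any predictor exceeding the cutoff is flagged automatically. The sketch below uses hypothetical data in place of the case-study table (some references apply a stricter cutoff of 5):

```r
library(car)  # provides vif()

# Hypothetical stand-in for the case-study data.
d <- data.frame(
  Sales            = c(12, 15, 14, 18, 16, 21, 19, 24, 20, 22),
  Advertising_Cost = c(2, 5, 3, 7, 4, 8, 6, 9, 5, 7),
  Marketing_Staff  = c(4, 3, 6, 5, 8, 4, 7, 6, 9, 8)
)
m <- lm(Sales ~ Advertising_Cost + Marketing_Staff, data = d)

v <- vif(m)                  # named vector of VIF values
flagged <- names(v)[v > 10]  # predictors exceeding the rule-of-thumb cutoff
flagged                      # empty (character(0)) when no multicollinearity is flagged
```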

That concludes this article shared by Kanda Data. Hopefully, its content proves beneficial and insightful for those who need it. Stay tuned for future articles.