How to Calculate the Variance Inflation Factor (VIF) in a Multicollinearity Test for Regression

Post Views: 179

Reading Time: 5 minutes

In linear regression analysis, to obtain the best linear unbiased estimator, you need to perform a series of assumption tests. One of the assumption tests required in linear regression is the multicollinearity test.

The multicollinearity test in a linear regression equation is intended to ensure that there is no strong correlation between independent variables. Since we want to test whether there is a strong correlation between independent variables, the multicollinearity assumption test is only conducted in multiple linear regression equations. In multiple linear regression, there must be at least two independent variables.

One of the most commonly used methods by researchers to detect multicollinearity is the Variance Inflation Factor (VIF). The VIF value indicates whether the tested linear regression model has a multicollinearity problem. If the VIF value exceeds 10, the linear regression equation is generally considered to have a multicollinearity issue.

Due to the importance of understanding the multicollinearity test in linear regression, it is crucial to grasp how the VIF value is obtained. In this article, I will discuss and provide a tutorial on how to manually calculate the VIF value in a multicollinearity test for regression. To facilitate the calculation process, I will use Excel to assist with the mathematical computations.

Formula for Variance Inflation Factor (VIF)

To calculate the Variance Inflation Factor (VIF) value, you need to know the formula used. The key component in the formula for calculating VIF is the coefficient of determination (R Square).

However, it is important to note that the R Square value used is obtained from regressing an independent variable against the other independent variables. The detailed formula for calculating VIF can be seen in the image below:

Based on the formula above, the first step is to calculate the coefficient of determination. The coefficient of determination can be calculated manually or by using the “Data Analysis” menu in Excel.

If you want to learn how to manually calculate the coefficient of determination, please refer to my previous article on this website. In this article, I will demonstrate how to obtain the coefficient of determination using the “Data Analysis” menu in Excel.

Case Study Example

To better understand how to calculate the VIF value, this article provides a case study of multiple linear regression with two independent variables. In this case study, a researcher collects time-series data, including inflation rate data, unemployment rate data, and economic growth data for a specific region.

Based on the collected data, the detailed input data can be seen in the table below:

From the table above, we can see that there are three variables: inflation rate, unemployment rate, and economic growth. The researcher aims to observe the impact of inflation and unemployment rates on economic growth.

Therefore, the inflation rate and unemployment rate are treated as independent variables, denoted as X1 for the inflation rate and X2 for the unemployment rate. Meanwhile, the dependent variable, which is influenced by these independent variables, is economic growth, denoted as Y.

The next step is to calculate the coefficient of determination, which will be used to compute the VIF value. Let’s calculate the coefficient of determination based on the case study presented in this article.

Calculation of the Coefficient of Determination for VIF Calculation

As I mentioned in the previous paragraph, we use a slightly different coefficient of determination value compared to the usual approach. When calculating the VIF for a multicollinearity test in regression, we need to determine the coefficient of determination for the influence of one independent variable on another independent variable.

It is important to note that we do not use the coefficient of determination for the influence of X1 and X2 on Y. Instead, we will find the coefficient of determination for the influence of variable X2 on variable X1, or in other words, we need to determine the coefficient of determination for the impact of the unemployment rate on the inflation rate. This coefficient of determination will then be used to calculate the VIF value.

The first step is to open the “Data” ribbon in Excel. Then, look at the top right corner for the “Data Analysis” menu. If you do not see the “Data Analysis” menu in the top right corner, you need to activate the Data Analysis Toolpak in Excel first. You can find an article on how to do this on my website by searching for the keyword “Data Analysis Toolpak.”

Next, after clicking the “Data Analysis” menu, a window will appear with several analysis options provided by Excel. Select “Regression” and click “OK.” This will open the “Regression” window. Then, input all the data along with its labels for the inflation rate variable (X1) into the “Input Y Range” box.

Similarly, input all the data and its labels for the unemployment rate variable (X2) into the “Input X Range” box. Next, enable or check the “Labels” option and select a 95% confidence level. You can choose to save the output analysis results in the same Excel sheet. The detailed steps can be seen in the image below:

Based on the image, after clicking “OK,” the regression analysis results will appear, including the coefficient of determination value. The detailed output containing the coefficient of determination (R Square) can be seen in the image below:

From the image above, we can see that the R Square value is 0.36808. Next, we will use this R Square value to calculate the VIF.

Calculation of the VIF Value

Now, let’s refer back to the formula I mentioned in the previous paragraph for calculating the VIF value. In detail, by applying mathematical principles and substituting the coefficient of determination value obtained earlier, the calculation steps are as follows:

Based on the calculation results, the VIF value is 1.58. Since the VIF value of 1.58 is less than 5, we can conclude that the regression equation being tested does not exhibit multicollinearity issues.

Thus, there is no strong correlation between the inflation rate variable (X1) and the unemployment rate variable (X2). Based on this analysis, it is concluded that the regression equation meets the non-multicollinearity assumption.

That concludes the article I have written and shared with all of you. I hope this is useful and enhances our knowledge. Stay tuned for the next article update from Kanda Data next week. Thank you.

Tags: Coefficient of determination, Data analysis in excel, Kanda data, multicollinearity test, multiple linear regression, r square, Regression Analysis, Regression Assumptions, research methodology, Statistical Analysis, statistics, variance inflation factor, VIF Calculation

Categories: Assumptions of Linear Regression

M	T	W	T	F	S	S
		1	2	3	4	5
6	7	8	9	10	11	12
13	14	15	16	17	18	19
20	21	22	23	24	25	26
27	28	29	30	31

How to Calculate the Variance Inflation Factor (VIF) in a Multicollinearity Test for Regression

Formula for Variance Inflation Factor (VIF)

Case Study Example

Calculation of the Coefficient of Determination for VIF Calculation

Calculation of the VIF Value

Related Posts:-

Can Outliers Make Your Data Look Non-Normal? Here’s a Simulation and How to Handle It

Assumption Tests in Linear Regression Using Survey Data

Normality Test in Regression: Should We Test the Raw Data or the Residuals?

Popular Post