KANDA DATA


How to Analyze Multicollinearity in Linear Regression and its Interpretation in R (Part 2)

By Kanda Data / Date Apr 17.2023
Data Analysis in R

Non-multicollinearity is one of the assumptions required by the ordinary least squares (OLS) method of linear regression analysis. This assumption means that there is no strong correlation among the independent variables in the equation.

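If there is a strong correlation among the independent variables in the linear regression equation, the OLS coefficient estimates remain unbiased, but their standard errors are inflated. The estimates then become unstable, and it is hard to separate the effect of one independent variable from another. If the correlation is perfect, the coefficients cannot be estimated at all. Therefore, a multicollinearity test is recommended before the estimated coefficients are interpreted.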

A multicollinearity test can be done in two ways: by examining the correlation among the independent variables, and by examining the Variance Inflation Factor (VIF). As a common rule of thumb, a high correlation between independent variables (an absolute value above 0.70) may indicate a multicollinearity problem.
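
As a rough illustration of the correlation check, the sketch below computes the correlation between two independent variables with the base R function cor(). The data are simulated with a random seed and are not the data used in this article; the names outlets, cost and marketing are assumptions chosen for the example.

```r
# Simulated stand-in data: NOT the article's data set
set.seed(123)
n <- 15
cost <- runif(n, min = 50, max = 150)
marketing <- 0.5 * cost + rnorm(n, sd = 10)
outlets <- data.frame(cost = cost, marketing = marketing)

# Pearson correlation matrix of the independent variables only
cor_matrix <- cor(outlets[, c("cost", "marketing")])
print(round(cor_matrix, 3))

# Flag the pair if |r| exceeds the 0.70 rule of thumb
r_cost_marketing <- cor_matrix["cost", "marketing"]
high_correlation <- abs(r_cost_marketing) > 0.70
print(high_correlation)
```

With more than two independent variables, the same call returns every pairwise correlation, but pairwise values can miss collinearity that involves three or more variables at once, which is one reason the VIF is also examined.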

If the multicollinearity test is done using the VIF, the larger the VIF value, the greater the potential for multicollinearity problems. A small VIF value is therefore desired; a commonly used rule of thumb is a value below 10.

Why is the multicollinearity test only performed on multiple linear regression?

As mentioned above, the purpose of the multicollinearity test is to determine whether there is a strong correlation among the independent variables. The test therefore involves only the independent variables, not the dependent variable.

For this reason, the multicollinearity test only applies to multiple linear regression analysis. Simple linear regression has just one independent variable, so there is no other independent variable to correlate it with and no multicollinearity test is needed.

Mini Research Using Multiple Linear Regression Analysis

This is Part 2 of the series on multiple linear regression analysis and assumption tests in R. The exercise uses the same data as the previous article, entitled “Multiple linear regression analysis and interpretation in R”.

The purpose of the mini-research example is to analyze the influence of cost and marketing on the sales of a product. The data were collected from 15 sales outlets owned by a company in region X, and can be seen in the table below:

How to Import a Dataset from Excel to R

To import a dataset in RStudio, click “File” and then select “Import Dataset” from the options available. As the data has been saved in Excel, select “From Excel”.

Next, browse to the location where the Excel file is saved. A preview of the data entered in Excel will appear. Click “Import” to proceed. If these steps are followed correctly, the imported data will be displayed in RStudio.
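
The menu steps above also have a code equivalent. The “From Excel” import in RStudio relies on the readxl package, so a minimal sketch of the same import could look like the block below. The file name Multiple_Linear_Regression.xlsx is an assumption for illustration only; replace it with the actual path to your file. The block reads the file only if it exists and readxl is installed.

```r
# Hypothetical file name: adjust to where your Excel file is saved
excel_path <- "Multiple_Linear_Regression.xlsx"

if (requireNamespace("readxl", quietly = TRUE) && file.exists(excel_path)) {
  # read_excel() reads the first sheet by default and returns a tibble
  Multiple_Linear_Regression <- readxl::read_excel(excel_path)
  str(Multiple_Linear_Regression)  # check variable names and types
} else {
  message("File not found or readxl not installed: ", excel_path)
}
```

Importing by code rather than by menu makes the analysis easier to repeat, because the import step is saved together with the rest of the script.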

Syntax for the Multicollinearity Test in R

To perform a multicollinearity test for multiple linear regression in R, researchers must first run the regression analysis itself. The following is the detailed syntax for conducting the multicollinearity test:
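
The original listing is not reproduced in this text. A minimal reconstruction from the steps described in this section might look like the sketch below. Because the data table is not reproduced either, the data frame here is simulated (made-up values, so the VIF will differ from the output discussed in this article); the name Multiple_Linear_Regression and the variable names sales, cost and marketing follow the text. The car package must be installed for vif() to be available.

```r
# Simulated stand-in for the imported data (NOT the article's values)
set.seed(123)
n <- 15
cost <- runif(n, min = 50, max = 150)
marketing <- 0.5 * cost + rnorm(n, sd = 10)
sales <- 20 + 1.5 * cost + 2 * marketing + rnorm(n, sd = 15)
Multiple_Linear_Regression <- data.frame(sales, cost, marketing)

# Step 1: fit the multiple linear regression
model <- lm(sales ~ cost + marketing, data = Multiple_Linear_Regression)

# Step 2: load car and compute the VIF of each independent variable
if (requireNamespace("car", quietly = TRUE)) {
  library(car)
  vif <- vif(model)  # same object name as used in the text
  print(vif)         # Step 3: display the VIF values
} else {
  message("Install the car package first: install.packages(\"car\")")
}
```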

Based on the syntax above, the first step is to type the formula for the multiple linear regression analysis, such as sales ~ cost + marketing, adjusted to the variables used. Please note that variable names are case-sensitive and must be typed exactly as they appear in the data preview in RStudio.

Next, in the syntax “data = Multiple_Linear_Regression”, researchers indicate the data source used. Please type the name of the dataset exactly as it appears in RStudio after the import.

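After pressing “enter,” the next step is to type “library(car)” to load the car package, which must already be installed. Once this is done, type “vif <- vif(model)” to compute the variance inflation factor and store it in an object named vif. The variance inflation factor is obtained by dividing one by the tolerance value, where the tolerance of an independent variable is one minus the R-squared from regressing that variable on the other independent variables.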

To display the value of VIF, type “vif” and press “enter.” Then, the output of the multicollinearity test for multiple linear regression analysis will appear on the screen.
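
To make the link between VIF and tolerance concrete, the sketch below computes both by hand in base R, without the car package. Each independent variable is regressed on the other, the tolerance is 1 minus the R-squared of that auxiliary regression, and the VIF is 1 divided by the tolerance. The data are simulated and are not the data used in this article; the object names are assumptions for the example.

```r
# Simulated stand-in data: NOT the article's data set
set.seed(123)
n <- 15
cost <- runif(n, min = 50, max = 150)
marketing <- 0.5 * cost + rnorm(n, sd = 10)
predictors <- data.frame(cost, marketing)

# Auxiliary regression: cost explained by the other independent variable
aux_cost <- lm(cost ~ marketing, data = predictors)
tolerance_cost <- 1 - summary(aux_cost)$r.squared
vif_cost <- 1 / tolerance_cost

# Auxiliary regression: marketing explained by cost
aux_marketing <- lm(marketing ~ cost, data = predictors)
tolerance_marketing <- 1 - summary(aux_marketing)$r.squared
vif_marketing <- 1 / tolerance_marketing

print(c(cost = vif_cost, marketing = vif_marketing))
```

For a model with only numeric independent variables like this one, these hand-computed values should match what vif() from the car package reports for the same data.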

Interpretation of Multicollinearity Test Output in R

The output of the multicollinearity test in R is similar to that of other statistical software. The results of the multicollinearity test in R can be seen in the figure below:

Based on the above figure, the variance inflation factor (VIF) value is 3.61358. The output shows the same VIF value for the “cost” variable and the “marketing” variable.

This is because the regression equation has only two independent variables. There is then a single correlation between “cost” and “marketing”, so regressing either variable on the other gives the same R-squared and therefore the same VIF.
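
As a worked check of this reasoning, the short sketch below starts from the reported VIF of 3.61358 and backs out the tolerance, the R-squared of the auxiliary regression, and the implied absolute correlation between the two independent variables. It relies only on the identity VIF = 1 / (1 - R-squared); the object names are assumptions for the example, and the sign of the correlation cannot be recovered from the VIF alone.

```r
# VIF reported in the article's output
vif_reported <- 3.61358

# Tolerance is the reciprocal of the VIF
tolerance <- 1 / vif_reported

# With two independent variables, the auxiliary R-squared equals r^2
r_squared <- 1 - tolerance
abs_r <- sqrt(r_squared)

print(round(c(tolerance = tolerance, r_squared = r_squared, abs_r = abs_r), 3))
```

The implied absolute correlation is about 0.85. This is above the 0.70 correlation rule of thumb mentioned earlier, while the VIF is well below 10, so the two rules of thumb do not always point the same way and it can be useful to report both.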

Because the VIF value of 3.61358 in the above figure is below 10, it can be concluded, by this rule of thumb, that there is no serious multicollinearity problem in the linear regression equation.

Therefore, the regression equation satisfies one of the assumptions of the OLS linear regression method, namely that there is no strong correlation between the independent variables. This concludes Part 2 of the series. Stay tuned for the next part.

Copyright KANDA DATA 2025. All Rights Reserved