How to Analyze Multicollinearity in Linear Regression Using R Studio

By Kanda Data / November 25, 2024 / Category: Data Analysis in R

In linear regression analysis using the Ordinary Least Squares (OLS) method, the independent variables should not be strongly correlated with one another. Avoiding strong correlation between the independent variables is one of the conditions for obtaining the best linear unbiased estimator (BLUE).

When the independent variables in a regression equation are strongly correlated, this condition is referred to as multicollinearity. Therefore, it is important for researchers to test for multicollinearity in their linear regression equations.

The most commonly used method for detecting multicollinearity is the Variance Inflation Factor (VIF). This multicollinearity test can easily be carried out in R Studio.

In this article, Kanda Data will explain how to analyze multicollinearity in R Studio along with the interpretation of the results. The case study example used in this article still utilizes the same data as in a previously written article.

The purpose of the case study is to analyze the effect of advertising costs and the number of marketing staff on product sales volume. We can construct a linear regression equation as follows:

π‘Œ=𝛽0+𝛽1𝑋1+𝛽2𝑋2+…+𝛽𝑛𝑋𝑛+πœ–

Where:

π‘Œ is product sales (in thousands of units),

𝑋1 is advertising cost (in hundreds of US dollars),

𝑋2 is the number of marketing staff (employees),

𝛽0 is the intercept (constant),

𝛽1 and 𝛽2 are regression coefficients,

πœ– is the error or residual.

Next, based on the obtained data, proceed with data input and tabulation. The input results for both the dependent and independent variables can be seen in detail in the table below.

Steps for Multicollinearity Testing in R Studio

Before conducting a multicollinearity test, the first essential step is to perform multiple linear regression analysis. The tutorial for this has been written in a previous article, but for deeper understanding, the necessary syntax in R Studio will be provided again here.
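Assuming the tabulated data is saved in an Excel file, a minimal import sketch using the readxl package is shown below. The file name sales_data.xlsx and the sheet number are assumptions for illustration only; the column names match those used in the regression command further below.

# Load the readxl package (install it first with install.packages("readxl") if needed)
library(readxl)

# Read the Excel file into a data frame named 'data'
# The file name and sheet number are hypothetical; replace them with your own
data <- read_excel("sales_data.xlsx", sheet = 1)

# Check that Sales, Advertising_Cost, and Marketing_Staff were read correctly
head(data)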

For steps on importing data from Excel, refer to a previously written article. Once the data to be analyzed is properly imported, write the following command in R Studio:

model <- lm(Sales ~ Advertising_Cost + Marketing_Staff, data = data)

summary(model)

After entering these commands and pressing Enter, the regression output will appear in the R Studio console.

Once the linear regression analysis is performed, proceed with the multicollinearity analysis. The purpose of the multicollinearity test is to detect the strength of correlation between the independent variables in the regression model. One method to perform this test is by using the Variance Inflation Factor (VIF), which we will demonstrate in this article.
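Before computing the VIF, a quick look at the pairwise correlation between the independent variables can already give an indication of multicollinearity. A minimal sketch, using the same column names as in the regression command above:

# Pairwise correlation between the two independent variables
# A correlation close to +1 or -1 would suggest potential multicollinearity
cor(data$Advertising_Cost, data$Marketing_Staff)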

To obtain the Variance Inflation Factor (VIF) value in R Studio, the ‘car’ package needs to be installed first. For those who are conducting a multicollinearity test in R Studio for the first time, follow the command below to install the ‘car’ package:

install.packages("car")

Next, once the ‘car’ package has been successfully installed in R Studio, write the following commands to conduct the multicollinearity test:

library(car)

vif(model)

After entering the command and pressing enter, the output of the multicollinearity test using the VIF value will appear as follows:

Advertising_Cost  Marketing_Staff

         3.61358          3.61358

The analysis results show that the VIF value for both advertising cost and marketing staff is 3.61358 (with only two independent variables in the model, the two VIF values are always identical). Since this value is well below the commonly used threshold of 10, there is no indication of multicollinearity between the independent variables in the tested linear regression equation. Therefore, it can be concluded that there is no strong correlation between the independent variables in the regression equation of the case study example in this article.
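As a cross-check, the VIF can also be computed manually from an auxiliary regression of one independent variable on the other(s), using the relationship VIF = 1 / (1 - R²). A minimal sketch, reusing the variable names above:

# Auxiliary regression: regress one independent variable on the other
aux_model <- lm(Advertising_Cost ~ Marketing_Staff, data = data)

# R-squared of the auxiliary regression
r_squared <- summary(aux_model)$r.squared

# VIF = 1 / (1 - R^2); this should match the value reported by vif()
manual_vif <- 1 / (1 - r_squared)
manual_vif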

That concludes this article from Kanda Data. Hopefully, its content is beneficial and provides insight for those who need it. Stay tuned for updates in future articles.

