How to Analyze Heteroskedasticity in Linear Regression Using R Studio

Heteroskedasticity testing is one of the assumption tests for linear regression estimated with the ordinary least squares (OLS) method. It checks whether the variance of the residuals is constant. A constant residual variance is referred to as homoskedasticity.

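If the test shows that the residual variance is not constant, the regression model is said to have a heteroskedasticity problem. Constant error variance is one of the conditions under which OLS is the best linear unbiased estimator (BLUE), so the model should be checked to ensure it is free from heteroskedasticity. When heteroskedasticity is present, the OLS coefficient estimates remain unbiased but are no longer efficient, and the usual standard errors, t-tests and F-tests become unreliable.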
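To make the difference between constant and non-constant residual variance concrete, the short R sketch below simulates two hypothetical regressions: one whose errors have the same spread everywhere, and one whose error spread grows with the independent variable. The intercept, slope, sample size and error standard deviations are arbitrary assumptions chosen for illustration, not values from this article's case study.

```r
# Hypothetical simulation contrasting constant and non-constant error variance.
# All numbers below are arbitrary illustration values, not case-study data.
set.seed(123)
n <- 200
x <- runif(n, min = 1, max = 10)

# Homoskedastic errors: the same standard deviation for every observation
e_homo <- rnorm(n, mean = 0, sd = 2)

# Heteroskedastic errors: the standard deviation grows with x
e_hetero <- rnorm(n, mean = 0, sd = 0.5 * x)

y_homo   <- 5 + 3 * x + e_homo
y_hetero <- 5 + 3 * x + e_hetero

res_homo   <- residuals(lm(y_homo ~ x))
res_hetero <- residuals(lm(y_hetero ~ x))

# Ratio of residual variance for large x versus small x
low <- x < median(x)
var_ratio <- function(r) var(r[!low]) / var(r[low])

var_ratio(res_homo)    # close to 1 when the variance is constant
var_ratio(res_hetero)  # well above 1 when the variance increases with x
```

Comparing residual spread across the range of x is only an informal check; a formal test such as Breusch-Pagan, used later in this article, turns the same idea into a test statistic and p-value.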

In this article, Kanda Data shares a tutorial on testing for heteroskedasticity in linear regression using R Studio. Let's use the following case study as practice material.

This case study continues from an article Kanda Data published last week. It examines the effect of advertising costs and the number of marketing staff on product sales volume in the ABC region. Based on this example, the linear regression model can be formulated as follows:

๐‘Œ=๐›ฝ0+๐›ฝ1๐‘‹1+๐›ฝ2๐‘‹2

Where:

$Y$ is product sales (thousand units),

$X_1$ is advertising cost (hundred US dollars),

$X_2$ is the number of marketing staff (employees),

$\beta_0$ is the intercept (constant),

$\beta_1$ and $\beta_2$ are regression coefficients,

$\epsilon$ is the error or residual.

Data consisting of 15 observations for the variables Sales, Advertising Cost, and Marketing Staff is available in detail in the table below:

Steps for Heteroskedasticity Testing in Linear Regression Analysis

Once you have the practice data, download and install R and then R Studio on your computer (R Studio runs on top of an existing R installation). Once R Studio is installed, run the multiple linear regression analysis first, because the heteroskedasticity test is performed on the residuals of the fitted model.

After opening R Studio, enter the data to be analyzed. There are two options: import the data directly from Excel, or type it in manually in the R Studio console.

For steps and tutorials on how to import data from Excel into R Studio and how to manually input data into R Studio, you can refer to previous articles posted by Kanda Data.
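As a minimal sketch of the manual-entry option, the code below builds a data frame named `data` with the column names used in this article's `lm()` command (Sales, Advertising_Cost, Marketing_Staff). The 15 values per variable are simulated placeholders, not the case-study observations, so any results computed from them will differ from the output shown in this article; replace them with the values from the data table. The commented-out Excel line assumes the readxl package and a hypothetical file name.

```r
# Manual data entry sketch. The values are simulated placeholders, NOT the
# case-study observations; replace them with the 15 rows from the data table.
set.seed(42)
n <- 15
Advertising_Cost <- round(runif(n, min = 10, max = 50), 1)  # hundred US dollars
Marketing_Staff  <- sample(2:10, n, replace = TRUE)         # employees
Sales <- round(20 + 1.5 * Advertising_Cost + 3 * Marketing_Staff +
                 rnorm(n, sd = 5), 1)                        # thousand units

data <- data.frame(Sales, Advertising_Cost, Marketing_Staff)

# Alternative: import from Excel with the readxl package (hypothetical file name)
# data <- readxl::read_excel("sales_data.xlsx")

str(data)  # should report 15 observations of 3 variables
```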

To perform multiple linear regression analysis, use the following command:

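# Fit the multiple linear regression; assumes the data frame is named `data`
model <- lm(Sales ~ Advertising_Cost + Marketing_Staff, data = data)

# Show the coefficients, their t-tests, R-squared and the overall F-test
summary(model)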

After pressing Enter or clicking ‘Run,’ the analysis output will appear as shown in the image below:

With the regression model estimated, the next step is to test the regression equation for heteroskedasticity. As mentioned at the beginning of this article, the aim is to check that the variance of the residuals from the multiple linear regression in the case study above is constant.
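The residuals whose variance is being examined are stored in the fitted model object and can be extracted with the base R accessors `coef()`, `residuals()` and `fitted()`. To stay self-contained, the sketch below regenerates simulated placeholder data (not the case-study observations) before fitting the same `lm()` formula, so its numbers will not match the article's output.

```r
# Simulated placeholder data (not the case-study observations)
set.seed(42)
n <- 15
data <- data.frame(
  Advertising_Cost = round(runif(n, 10, 50), 1),
  Marketing_Staff  = sample(2:10, n, replace = TRUE)
)
data$Sales <- round(20 + 1.5 * data$Advertising_Cost +
                      3 * data$Marketing_Staff + rnorm(n, sd = 5), 1)

model <- lm(Sales ~ Advertising_Cost + Marketing_Staff, data = data)

b <- coef(model)       # estimated beta_0, beta_1, beta_2
e <- residuals(model)  # residuals whose variance the heteroskedasticity test examines

# Print the estimated equation Y = b0 + b1*X1 + b2*X2
cat(sprintf("Sales = %.3f + %.3f * Advertising_Cost + %.3f * Marketing_Staff\n",
            b[1], b[2], b[3]))

# With an intercept in the model, OLS residuals sum to (numerically) zero
sum(e)
```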

In R Studio, heteroskedasticity can be tested with the Breusch-Pagan test, which is provided by the bptest() function in the 'lmtest' package. First, install the package (this only needs to be done once):

install.packages("lmtest")

Then load the package and run the Breusch-Pagan test on the fitted model:

# Load lmtest (needed in every new R session) and test the fitted model
library(lmtest)

bptest(model)

The Breusch-Pagan test output is as follows:

studentized Breusch-Pagan test

data:  model

BP = 1.3924, df = 2, p-value = 0.4985
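To show where a statistic like BP comes from, the sketch below reproduces the studentized (Koenker) form of the Breusch-Pagan test, which is what the "studentized Breusch-Pagan test" heading indicates: regress the squared OLS residuals on the same independent variables, then compute BP = n × R² of that auxiliary regression and compare it with a chi-squared distribution whose degrees of freedom equal the number of independent variables (hence df = 2 here). It runs on simulated placeholder data, so its numbers will not match the output above, and the cross-check with `lmtest::bptest()` only runs if lmtest is installed.

```r
# Simulated placeholder data (not the case-study observations)
set.seed(42)
n <- 15
data <- data.frame(
  Advertising_Cost = round(runif(n, 10, 50), 1),
  Marketing_Staff  = sample(2:10, n, replace = TRUE)
)
data$Sales <- 20 + 1.5 * data$Advertising_Cost +
  3 * data$Marketing_Staff + rnorm(n, sd = 5)

model <- lm(Sales ~ Advertising_Cost + Marketing_Staff, data = data)

# Auxiliary regression: squared residuals on the original regressors
data$e2 <- residuals(model)^2
aux <- lm(e2 ~ Advertising_Cost + Marketing_Staff, data = data)

# Studentized Breusch-Pagan statistic: n times R-squared of the auxiliary fit
bp_manual <- n * summary(aux)$r.squared
df_bp <- 2  # number of independent variables in the auxiliary regression
p_manual <- pchisq(bp_manual, df = df_bp, lower.tail = FALSE)

c(BP = bp_manual, p_value = p_manual)

# Cross-check against lmtest, if the package is available
if (requireNamespace("lmtest", quietly = TRUE)) {
  print(lmtest::bptest(model))
}
```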

The analysis results show a Breusch-Pagan statistic of 1.3924 with a p-value of 0.4985. Because the p-value is greater than the 0.05 significance level, we fail to reject the null hypothesis of homoskedasticity, so there is no evidence of a heteroskedasticity problem. The residual variance can therefore be treated as constant across the range of the independent variable values.
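The reported p-value can be reproduced from the statistic alone: under the null hypothesis of homoskedasticity, the BP statistic approximately follows a chi-squared distribution with df = 2, so the p-value is its upper-tail probability. The sketch below uses only the numbers printed in the output above and the 0.05 significance level from the text; the explicit decision rule is written out for illustration.

```r
bp_stat <- 1.3924  # BP statistic from the bptest() output above
df_bp   <- 2       # degrees of freedom from the same output
alpha   <- 0.05    # significance level used in the interpretation

# Upper-tail probability of the chi-squared distribution
p_value <- pchisq(bp_stat, df = df_bp, lower.tail = FALSE)
round(p_value, 4)  # 0.4985, matching the bptest() output

if (p_value > alpha) {
  cat("Fail to reject H0: no evidence of heteroskedasticity\n")
} else {
  cat("Reject H0: residual variance is not constant\n")
}
```

For df = 2 the chi-squared upper tail has the closed form exp(-BP/2), and exp(-1.3924/2) gives the same 0.4985.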

This article by Kanda Data concludes this week's topic. We hope this short article provides valuable insights for readers. Stay tuned for next week's article update from Kanda Data. Happy learning!