Differences in Assumptions of Normality, Heteroscedasticity, and Multicollinearity in Linear Regression Analysis

If you analyze research data using linear regression, it is crucial to understand the assumptions behind the method. Testing these assumptions helps ensure that the analysis produces consistent and unbiased results.

If you have studied econometrics, one of your lecturers probably emphasized the importance of fulfilling the assumptions of linear regression estimated with Ordinary Least Squares (OLS). A series of assumption tests is needed to ensure the estimator is the Best Linear Unbiased Estimator (BLUE).

Because of the importance of understanding these assumption tests, I am interested in writing an article discussing the core assumption tests in OLS linear regression. Specifically, I will discuss the differences in the assumptions of Normality, Heteroscedasticity, and Multicollinearity.

Normality Assumption Test in Regression

The first assumption test I will discuss is the normality test. In linear regression analysis, the normality test is a critical stage. Make sure you don’t skip it.

One key point to understand is that the normality test in regression is often misapplied: it is not carried out on each variable individually. In linear regression, the normality test examines the residuals.

Again, I emphasize: in linear regression analysis, the normality test focuses on the residuals. Why?

According to theoretical references, one of the assumptions of the classical linear regression model is that the residuals are normally distributed. When the residuals are not normally distributed, this assumption is violated, and in small samples the t-tests, F-tests, and confidence intervals built on the estimates may be unreliable.

Another important point to understand is the definition of residuals. Residuals are the differences between the actual observed values and the predicted values.

In formula terms, a residual is e = Y − Ŷ, the actual value minus the predicted value. Residual values can be calculated manually or obtained using statistical software.

If you calculate residuals manually, you need to first calculate the intercept and regression estimation coefficients. These values are required to compute the predicted Y values.

Once the predicted Y values are calculated, residuals can be determined by subtracting the predicted Y values from the actual Y values. The next step is to perform a normality test on the calculated residuals.
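
To make the calculation concrete, here is a minimal sketch in Python using hypothetical data. It estimates the intercept and slope with NumPy's polyfit, computes the predicted Y values, and then takes the residuals as actual Y minus predicted Y.

```python
import numpy as np

# Hypothetical example data: one predictor X and a response Y
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 4.3, 5.9, 8.2, 9.8])

# Estimate the slope and intercept with ordinary least squares
slope, intercept = np.polyfit(X, Y, 1)

# Predicted Y values from the fitted equation
Y_pred = intercept + slope * X

# Residuals: actual Y minus predicted Y
residuals = Y - Y_pred
print(residuals)
```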

Several tests can be used to check residual normality. The most popular tests among researchers are the Kolmogorov-Smirnov test and the Shapiro-Wilk test.

Both tests generally lead to the same conclusion regarding residual normality. Their test statistics and p-values differ slightly, but the decision to accept or reject the null hypothesis is usually the same.

In the residual normality test, the null hypothesis can be stated as “residuals are normally distributed.” The alternative hypothesis (H1) is “residuals are not normally distributed.”

Using the Kolmogorov-Smirnov or Shapiro-Wilk test, a p-value is obtained as the basis for drawing conclusions. The decision criterion is as follows: if the p-value is less than or equal to 0.05, the null hypothesis is rejected.

Conversely, if the p-value is greater than 0.05, the null hypothesis is accepted. For example, if you perform a residual normality test and obtain a Kolmogorov-Smirnov p-value of 0.220, this value is greater than 0.05, so the null hypothesis is accepted.

Since the null hypothesis is accepted, it can be concluded that the residuals are normally distributed. This means that the assumption required for linear regression is satisfied in the given example.
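
As an illustration, the sketch below runs both tests on hypothetical residuals using SciPy. Note that the Kolmogorov-Smirnov version shown here plugs in the sample mean and standard deviation, which is a common simplification; a Lilliefors-corrected test is strictly more appropriate when the parameters are estimated from the data.

```python
import numpy as np
from scipy import stats

# Hypothetical residuals; in practice, use the residuals from your fitted regression
rng = np.random.default_rng(42)
residuals = rng.normal(loc=0.0, scale=1.0, size=50)

# Shapiro-Wilk test: H0 = residuals are normally distributed
sw_stat, sw_p = stats.shapiro(residuals)

# Kolmogorov-Smirnov test against a normal distribution with the residuals'
# own mean and standard deviation (a common simplification)
ks_stat, ks_p = stats.kstest(residuals, "norm",
                             args=(residuals.mean(), residuals.std(ddof=1)))

print(f"Shapiro-Wilk p-value:       {sw_p:.3f}")
print(f"Kolmogorov-Smirnov p-value: {ks_p:.3f}")
# If the p-value exceeds 0.05, the null hypothesis of normality is not rejected
```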

Heteroscedasticity Assumption Test in Regression

The heteroscedasticity test is another assumption test you need to perform to obtain the Best Linear Unbiased Estimator. In this test, you must ensure that the residual variance is constant.

One assumption in OLS linear regression requires constant residual variance, also known as homoscedasticity.

Therefore, understanding how to detect heteroscedasticity in a regression equation is essential. If the heteroscedasticity detection shows that residual variance is not constant, the regression equation exhibits heteroscedasticity symptoms.

In a regression equation with heteroscedasticity, that is, non-constant residual variance, the OLS estimator is no longer efficient and the standard errors are biased, which makes hypothesis tests unreliable. So, how do you detect heteroscedasticity?

Several tests can be used to detect heteroscedasticity, one of which is the Breusch-Pagan test.

Similar to the normality test, the heteroscedasticity test also requires a null hypothesis and an alternative hypothesis:

H0: Residual variance is constant (homoscedasticity)

H1: Residual variance is not constant (heteroscedasticity)

Suppose you perform a heteroscedasticity test using the Breusch-Pagan method and obtain a test statistic of 8.57 and a p-value of 0.159. Since the p-value is greater than 0.05, the null hypothesis is accepted.

Accepting the null hypothesis indicates constant residual variance or homoscedasticity. Hence, based on this test, the assumption required for OLS linear regression is satisfied.
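
Below is a minimal sketch of the Breusch-Pagan test on hypothetical data, using the het_breuschpagan function from the statsmodels library. The function returns the LM statistic and its p-value, which you would compare against 0.05 as described above.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

# Hypothetical data: two predictors and a response
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 1.0 + X @ np.array([2.0, -1.0]) + rng.normal(size=100)

# Fit OLS and run the Breusch-Pagan test on its residuals
X_const = sm.add_constant(X)
model = sm.OLS(y, X_const).fit()
lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(model.resid, X_const)

print(f"Breusch-Pagan statistic: {lm_stat:.2f}, p-value: {lm_pvalue:.3f}")
# A p-value above 0.05 means H0 (constant variance) is not rejected
```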

Multicollinearity Assumption Test in Linear Regression

The multicollinearity test is another critical assumption test in linear regression analysis. It ensures no strong correlation exists between independent variables.

If the independent variables are strongly correlated, the regression equation suffers from multicollinearity, which inflates the variance of the coefficient estimates and makes them unstable and hard to interpret. Therefore, conducting a multicollinearity test is vital to obtaining reliable results.

This test applies only to multiple linear regression, not simple linear regression, as its purpose is to assess correlations among independent variables.

You might wonder how to test for multicollinearity in linear regression. One way is to compute the correlations between independent variables. However, the most popular method among researchers is to use the Variance Inflation Factor (VIF).

In multiple linear regression, the VIF can be calculated manually or analyzed using statistical software.

Typically, a VIF value below 10 indicates no serious multicollinearity. Some sources apply a stricter cutoff and require a VIF below 5. Either way, the multicollinearity test is crucial to ensure there is no strong correlation among the independent variables.
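
For illustration, the sketch below computes VIF values with statsmodels on hypothetical predictors, one of which is deliberately constructed to correlate with another; the VIF for that correlated pair should come out noticeably inflated, while the uncorrelated predictor stays close to 1.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Hypothetical independent variables; x3 is partly built from x1 to induce correlation
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "x1": rng.normal(size=100),
    "x2": rng.normal(size=100),
})
df["x3"] = 0.8 * df["x1"] + 0.2 * rng.normal(size=100)

# Add a constant term, then compute the VIF for each predictor
X = sm.add_constant(df)
for i, name in enumerate(X.columns):
    if name == "const":
        continue
    print(f"VIF({name}) = {variance_inflation_factor(X.values, i):.2f}")
# Values below 10 (or below 5, under stricter guidance) suggest no serious multicollinearity
```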

This article has explained the differences between normality, heteroscedasticity, and multicollinearity tests in linear regression analysis. I hope it benefits those who have yet to understand these assumption tests thoroughly. Thank you for reading, and stay tuned for the next article update!