How to Test the Normality Assumption in Linear Regression and Interpreting the Output

May 13, 2022

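The normality test is one of the assumption tests in linear regression estimated with the ordinary least squares (OLS) method. It is used to determine whether the residuals are normally distributed.

Normality of the residuals is not needed for the OLS estimator to be the best linear unbiased estimator (BLUE), because the Gauss-Markov theorem does not rely on it. Normality matters for hypothesis testing. When it holds together with the other classical assumptions, the t-tests and F-test on the regression coefficients are exact, which is especially important in small samples. In large samples, these tests remain approximately valid even without normality.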

When testing the normality assumption in linear regression, you test the residuals, not the data of the individual variables. The OLS assumption concerns the distribution of the residuals, not the distribution of the dependent or independent variables themselves.

Before running the normality test, it is recommended that you formulate the hypotheses first: a null hypothesis and an alternative hypothesis.

Statistical software is then used to test the null hypothesis. For the normality test, the hypotheses can be written as follows:

Ho: Residuals are normally distributed

H1: Residuals are not normally distributed

Do you still remember what a residual is? A residual is the difference between the actual value of Y and the value of Y predicted by the regression model. Next, how do we test the hypothesis?
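In symbols, for observation i of the mini research used later in this article (rice consumption Y, income X1, population X2), the residual can be written as below. The hat marks a predicted value, and b0, b1, b2 denote the OLS coefficient estimates (notation introduced here for illustration):

```latex
e_i = Y_i - \hat{Y}_i, \qquad \hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i}
```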

Several normality tests are available, and you can choose the one you find easiest to run. In this article, I test for normality using the Shapiro-Wilk test.

Shapiro and Wilk proposed this test in 1965, and it performs well with small samples.

The test criterion compares the p-value with a significance level (alpha) set in advance. For example, with an alpha of 0.05 (5%), the criteria for testing the hypothesis are:

P-value > 0.05: Ho is not rejected (no evidence against normality)

P-value <= 0.05: Ho is rejected (H1 is accepted)
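The decision rule above is simple enough to express as a small Python function. This is only a sketch: the function name normality_decision and the default alpha of 0.05 are choices made here for illustration, not part of STATA.

```python
def normality_decision(p_value: float, alpha: float = 0.05) -> str:
    """Apply the normality-test decision rule.

    H0: residuals are normally distributed.
    H1: residuals are not normally distributed.
    """
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must be between 0 and 1")
    if p_value > alpha:
        # Not enough evidence against normality at this alpha.
        return "H0 not rejected: no evidence that the residuals are non-normal"
    # p-value <= alpha: the sample contradicts normality.
    return "H0 rejected: the residuals are not normally distributed"


if __name__ == "__main__":
    for p in (0.30, 0.05, 0.01):
        print(p, "->", normality_decision(p))
```

Note that a p-value exactly equal to alpha falls on the "rejected" side, matching the "P-value <= 0.05" criterion.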

Normality Test Using Mini Research

To provide a more in-depth understanding, I suggest you practice with the data presented below. The example used here is a mini research study that aims to determine the effect of income and population on rice consumption.

In this mini research, income and population are the independent variables, and rice consumption is the dependent variable. The data used for this exercise can be seen in the table below:

How to Run the Shapiro-Wilk Normality Test in STATA

First, open the STATA application. Among the icons below the menu bar, select the table icon with a pencil drawing (Data Editor).

The “Data Editor (Edit)” window will then open. Next, enter all the data from the table above.

Enter the data for the rice consumption variable (Y) in the first column, and the data for the income (X1) and population (X2) variables in the second and third columns. Next, set the name and label of each variable in the panel on the right side of the window, as shown below:

At this stage, the data has been entered into STATA and is ready to be analyzed. Because we are testing the normality of the residuals, we must first obtain the residual values.
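The two STATA steps that follow (regress, then predict with the r option) can be sketched in plain Python to show what is being computed. This is an illustrative sketch, not STATA's implementation: the function name ols_residuals is hypothetical, it assumes exactly two independent variables (as in this mini research), and it solves the OLS normal equations with simple Gauss-Jordan elimination rather than whatever numerically careful routine a statistical package uses internally.

```python
from typing import List, Sequence, Tuple


def ols_residuals(y: Sequence[float], x1: Sequence[float],
                  x2: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Fit Y = b0 + b1*X1 + b2*X2 by OLS; return ([b0, b1, b2], residuals)."""
    n = len(y)
    if len(x1) != n or len(x2) != n or n < 4:
        raise ValueError("need equal-length inputs with at least 4 observations")
    X = [[1.0, float(a), float(b)] for a, b in zip(x1, x2)]  # intercept column first

    # Normal equations (X'X) b = X'y, stored as a 3 x 4 augmented matrix.
    aug = []
    for i in range(3):
        row = [sum(r[i] * r[j] for r in X) for j in range(3)]
        row.append(sum(r[i] * float(yi) for r, yi in zip(X, y)))
        aug.append(row)

    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda k: abs(aug[k][col]))
        if abs(aug[pivot][col]) < 1e-12:  # crude absolute tolerance for a sketch
            raise ValueError("X'X is singular (perfect multicollinearity?)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for k in range(3):
            if k != col:
                factor = aug[k][col]
                aug[k] = [vk - factor * vc for vk, vc in zip(aug[k], aug[col])]

    coef = [aug[i][3] for i in range(3)]
    b0, b1, b2 = coef
    # Residual = actual Y minus predicted Y (what "predict res, r" stores).
    res = [float(yi) - (b0 + b1 * a + b2 * b) for yi, a, b in zip(y, x1, x2)]
    return coef, res


if __name__ == "__main__":
    # Hypothetical numbers, NOT the article's data: y = 2 + 0.5*x1 + 3*x2 exactly.
    x1 = [1, 2, 3, 4, 5]
    x2 = [2, 1, 4, 3, 5]
    y = [8.5, 6, 15.5, 13, 19.5]
    coef, res = ols_residuals(y, x1, x2)
    print(coef)  # recovers b0 = 2, b1 = 0.5, b2 = 3 up to rounding
    print(res)   # all residuals are (numerically) zero for an exact fit
```

Because the model includes an intercept, OLS residuals always sum to (numerically) zero, which is a quick sanity check on residuals exported from any software.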

To obtain the residuals, you first need to run a regression analysis. Type the following command in STATA:

regress Y X1 X2

Press Enter, and the results of the linear regression of Y on X1 and X2 will appear. To obtain the residuals, type the following command:

predict res,r

Press Enter, and STATA will store the residuals in a new variable named res (the r option is short for residuals). To check the residual values, open the Data Editor again. The residual values can be seen in the image below:

To test the normality of the residuals using the Shapiro-Wilk test, type the following command:

swilk res

Press Enter, and the Shapiro-Wilk normality test results will appear.
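STATA's swilk does this computation for you. For readers who want to see what the W statistic and its p-value involve, here is a standard-library-only Python sketch. W is the ratio of the squared weighted sum of the sorted residuals, using coefficients derived from expected normal order statistics, to the residual sum of squares. This sketch follows Royston's approximation for the coefficients and the p-value (the approach used by R's shapiro.test and scipy.stats.shapiro). It is not STATA's source code, the function name shapiro_wilk is hypothetical, it only handles 4 to 5000 observations, and its p-value may not match STATA's Prob>z digit for digit.

```python
import math
from statistics import NormalDist
from typing import Sequence, Tuple

_STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def _poly(coefs: Sequence[float], x: float) -> float:
    """Evaluate c0 + c1*x + c2*x**2 + ... for coefs = [c0, c1, c2, ...]."""
    return sum(c * x ** k for k, c in enumerate(coefs))


def shapiro_wilk(data: Sequence[float]) -> Tuple[float, float]:
    """Return (W, p_value) using Royston's approximation, for 4 <= n <= 5000."""
    x = sorted(float(v) for v in data)
    n = len(x)
    if not 4 <= n <= 5000:
        raise ValueError("this sketch only handles 4 <= n <= 5000")
    mean = sum(x) / n
    ss = sum((v - mean) ** 2 for v in x)
    if ss == 0.0:
        raise ValueError("all values are identical")

    # Approximate expected normal order statistics m_1 < ... < m_n.
    m = [_STD_NORMAL.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
    msq = sum(v * v for v in m)
    u = 1.0 / math.sqrt(n)

    # Royston's polynomial corrections for the largest coefficient(s).
    a = [0.0] * n
    a_n = _poly([m[-1] / math.sqrt(msq), 0.221157, -0.147981,
                 -2.071190, 4.434685, -2.706056], u)
    a[0], a[-1] = -a_n, a_n
    if n > 5:
        a_n1 = _poly([m[-2] / math.sqrt(msq), 0.042981, -0.293762,
                      -1.752461, 5.682633, -3.582633], u)
        a[1], a[-2] = -a_n1, a_n1
        phi = (msq - 2 * m[-1] ** 2 - 2 * m[-2] ** 2) / (1 - 2 * a_n ** 2 - 2 * a_n1 ** 2)
        middle = range(2, n - 2)
    else:
        phi = (msq - 2 * m[-1] ** 2) / (1 - 2 * a_n ** 2)
        middle = range(1, n - 1)
    for i in middle:
        a[i] = m[i] / math.sqrt(phi)  # remaining coefficients, scaled so sum(a^2) = 1

    w = sum(ai * xi for ai, xi in zip(a, x)) ** 2 / ss
    if w >= 1.0:  # guard against rounding pushing W to 1
        return 1.0, 1.0

    # Normalizing transformation of W, then an upper-tail normal p-value.
    y = math.log(1.0 - w)
    if n <= 11:
        gamma = 0.459 * n - 2.273
        if y >= gamma:
            return w, 0.0
        y = -math.log(gamma - y)
        mu = _poly([0.5440, -0.39978, 0.025054, -0.0006714], n)
        sigma = math.exp(_poly([1.3822, -0.77857, 0.062767, -0.0020322], n))
    else:
        ln_n = math.log(n)
        mu = _poly([-1.5861, -0.31082, -0.083751, 0.0038915], ln_n)
        sigma = math.exp(_poly([-0.4803, -0.082676, 0.0030302], ln_n))
    p_value = 1.0 - _STD_NORMAL.cdf((y - mu) / sigma)
    return w, p_value
```

Values of W close to 1 mean the sorted residuals line up well with the expected normal order statistics, giving a large p-value; a strongly skewed sample pushes W down and the p-value toward zero.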

Normality Test Output and Interpreting the Output

The output of the Shapiro-Wilk normality test from STATA can be seen in the table below:

Based on the normality test results in the table above, the Prob>z value is 0.68364. Because this p-value is greater than 0.05, the null hypothesis is not rejected.

Referring to the hypotheses formulated earlier, this means there is no evidence that the residuals deviate from a normal distribution, so it is reasonable to treat the residuals as normally distributed.

Because the residuals can be treated as normally distributed, the regression model meets the normality assumption. Next, we need to test the other assumptions, such as the absence of multicollinearity and the absence of heteroscedasticity.

Well, that’s the article that Kanda Data can share on this occasion. I hope it will be useful for all of us. See you in the next article!

3 COMMENTS

Mark Dwyer October 5, 2022 At 9:31 pm

Thank you for this guide to testing for normality and for the detailed example.

In R, the Shapiro-Wilk test is limited to no more than 5K observations. You might add the Anderson-Darling test as well.

“The normality test is one of the assumption tests in linear regression using the ordinary least square (OLS) method. The normality test is intended to determine whether the residuals are normally distributed or not. The normality assumption must be fulfilled to obtain the best linear unbiased estimator. Regression models that fulfill the required assumptions have a chance to get the correct hypothesis testing results. ”
These statements are not true. Proving that OLS is BLUE does not depend on normality. Most moderately large data sets are sufficiently stable that central limit theorems imply conventional test statistics effectively follow asymptotic (e.g., chi-squared) distributions without assuming the underlying data are normally distributed.
See, for example, “Introductory Econometrics, A Modern Approach,” by Jeffrey M. Wooldridge.

- Kanda Data October 8, 2022 At 6:50 am
  
  Thank you for the valuable insight. Normality tests can be conducted with several approaches, one of which is the Shapiro-Wilk test. In addition to the normality test, other assumptions need to be tested to obtain BLUE, such as non-heteroscedasticity, linearity, non-multicollinearity, etc.
  
How to Analyze Paired Sample t-Test and Independent Sample t-Test - KANDA DATA December 4, 2022 At 10:46 am

[…] If researchers observe variables measured by interval or ratio data scales, they can use the t-test. However, before using the t-test, researchers need to conduct an additional test, namely the normality test. The normality test ensures that the data group tested for mean differences is normally distributed. Normality test tutorials can read my previous article entitled: “How to Test the Normality Assumption in Linear Regression and Interpreting the Output“ […]
