Understanding the Normality Test in Ordinary Least Squares Linear Regression

Linear regression analysis examines the influence of one or more independent variables on a dependent variable. It can take the form of simple linear regression (one independent variable) or multiple linear regression (two or more independent variables). Most linear regression analyses are estimated with the Ordinary Least Squares (OLS) method.

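Several assumptions must be met for OLS to give reliable estimates and valid hypothesis tests. One of the assumptions of Ordinary Least Squares linear regression is that the residuals are normally distributed. This assumption is not needed for the coefficient estimates to be unbiased, but it underpins the validity of the t-tests and F-test, particularly in small samples. The test used to check it is commonly called the normality test.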

It is important to understand that it is the residuals, not the observed data, that are tested for normality. The residuals must therefore be calculated before the normality test can be performed.

Given the importance of understanding the normality test in linear regression, I want to discuss this topic in more detail.

Definition of Residual

A residual is the difference between the actual Y value and the predicted Y value of an observation. Here, Y refers to the dependent variable in the regression equation.
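In symbols (the notation here is ours, not the article's), the residual for observation $i$ can be written as follows, where $Y_i$ is the actual value and $\hat{Y}_i$ is the predicted value of the dependent variable:

$$e_i = Y_i - \hat{Y}_i$$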

The data collected in the research, whether from a sample or from the whole population, provide the observed values of the dependent variable. These observed values are referred to as the actual Y values.

To obtain the predicted Y values, we first need to estimate the linear regression equation. This can be done by manual calculation or with statistical analysis applications.
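For readers doing the calculation by hand, OLS has a standard closed form in matrix notation (the symbols are ours, chosen for illustration). Let $\mathbf{y}$ be the vector of actual Y values and $X$ the matrix of independent variables with a leading column of ones for the intercept. The coefficient vector that minimizes the sum of squared residuals is

$$\mathbf{b} = (X^\top X)^{-1} X^\top \mathbf{y}, \qquad \mathbf{b} = (b_0, b_1, \ldots, b_k)^\top$$

provided $X^\top X$ is invertible, which holds when there is no perfect multicollinearity among the independent variables.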

The estimation results give the estimated intercept and the estimated coefficient of each independent variable, from which we can write the estimated regression equation.
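Written out for a model with $k$ independent variables, the estimated equation takes the form below. Here $b_0$ is the estimated intercept and $b_1, \ldots, b_k$ are the estimated coefficients; the notation is assumed for illustration.

$$\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + \cdots + b_k X_{ki}$$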

Next, for each observation, we substitute the actual values of the independent variables into the estimated equation. The values obtained are the predicted Y values.

The difference between the actual Y value and the predicted Y value of each observation is its residual. We then use these residuals in the normality test.
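The steps above (estimate the equation, compute predicted Y, subtract it from actual Y) can be sketched in Python with NumPy. NumPy is our choice of tool, not the article's. The data are simulated and hypothetical, and `ols_residuals` is a hypothetical helper name.

```python
import numpy as np


def ols_residuals(X, y):
    """Fit OLS with an intercept; return (coefficients, predicted Y, residuals)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    if X.ndim == 1:
        X = X.reshape(-1, 1)  # a single independent variable becomes one column
    # Prepend a column of ones so the first coefficient is the intercept b0
    design = np.column_stack([np.ones(len(y)), X])
    # Least-squares solution: coefficients minimizing the sum of squared residuals
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    y_hat = design @ coef      # predicted Y for every observation
    residuals = y - y_hat      # actual Y minus predicted Y
    return coef, y_hat, residuals


# Hypothetical simulated data (not from the article): Y depends on two X variables
rng = np.random.default_rng(42)
X = rng.normal(size=(30, 2))
y = 3.0 + 1.5 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=30)

coef, y_hat, residuals = ols_residuals(X, y)
print("Estimated coefficients (b0, b1, b2):", np.round(coef, 3))
print("First five residuals:", np.round(residuals[:5], 3))
```

Because the model includes an intercept, OLS residuals sum to zero (up to rounding), which gives a quick sanity check on the calculation.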

Statistical applications make normality tests easy to run, and each application has its own strengths in data processing.

For basic statistical processing, however, the necessary tools are available in most statistical applications, such as SPSS, Stata, SAS, Minitab, and R (for example, through RStudio).

Among the analysis options these applications offer, two tests are commonly used by researchers to test for normality: the Shapiro-Wilk test and the Kolmogorov-Smirnov test.

These two tests usually lead to the same conclusion, but not always, because they differ in statistical power; the Shapiro-Wilk test is generally the more powerful of the two. You can use either test, or both for cross-checking.
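As a minimal sketch of running both tests, here is a Python version using SciPy. This is an assumption on our part: the article does not say which software or library it uses. The residuals in the demo are simulated placeholders, not real data; in practice you would pass the residuals from your fitted regression. The helper name `normality_tests` is hypothetical.

```python
import numpy as np
from scipy import stats


def normality_tests(residuals):
    """Return the p-values of the Shapiro-Wilk and Kolmogorov-Smirnov tests."""
    r = np.asarray(residuals, dtype=float)

    # Shapiro-Wilk test; H0: the residuals come from a normal distribution.
    sw = stats.shapiro(r)

    # One-sample Kolmogorov-Smirnov test against a normal distribution whose
    # mean and standard deviation are estimated from the residuals. Because the
    # parameters are estimated, this plain KS p-value is conservative (too large);
    # a Lilliefors-corrected version would generally give a smaller p-value.
    ks = stats.kstest(r, "norm", args=(r.mean(), r.std(ddof=1)))

    return {"shapiro_wilk_p": sw.pvalue, "kolmogorov_smirnov_p": ks.pvalue}


# Hypothetical residuals for illustration only (simulated, not from the article).
rng = np.random.default_rng(0)
normal_like = rng.normal(loc=0.0, scale=1.0, size=50)
skewed = rng.exponential(scale=1.0, size=50) - 1.0  # clearly non-normal

print("Roughly normal residuals:", normality_tests(normal_like))
print("Skewed residuals:", normality_tests(skewed))
```

Some statistical packages report the Kolmogorov-Smirnov test with a Lilliefors correction, so its p-value may differ from the plain SciPy version shown here.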

Formulating Hypotheses for Normality Test

To test whether the residuals are normally distributed, we first formulate the statistical hypotheses for the normality test.

As mentioned earlier, it is the residuals that are tested for normality. The statistical hypotheses can therefore be formulated as follows:

H0: The residuals are normally distributed

H1: The residuals are not normally distributed

After formulating the statistical hypotheses, the next step is to establish the decision criteria.

Research generally uses a 95% confidence level, which corresponds to a significance level (alpha) of 0.05. The decision criteria can therefore be stated as follows:

If the p-value > 0.05, the null hypothesis is accepted (strictly speaking, it is not rejected)

If the p-value ≤ 0.05, the null hypothesis is rejected, and the alternative hypothesis is accepted
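The decision rule can also be written as a small Python function. `normality_decision` is a hypothetical helper, not part of any library, and alpha defaults to 0.05.

```python
def normality_decision(p_value: float, alpha: float = 0.05) -> str:
    """Apply the decision rule for a normality test at significance level alpha."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie between 0 and 1")
    if p_value > alpha:
        # H0 is not rejected: no significant evidence against normality.
        return "Accept H0: the residuals are normally distributed"
    # p_value <= alpha: reject H0 in favour of H1.
    return "Reject H0 (accept H1): the residuals are not normally distributed"


print(normality_decision(0.245))
print(normality_decision(0.03))
```

A p-value of exactly 0.05 is classed as a rejection, matching the ≤ in the second criterion.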

Example of Interpreting Normality Test Output

To make this clearer, let’s walk through a simple example. Suppose the Shapiro-Wilk test gives a p-value of 0.245. What is the conclusion?

Comparing this p-value with the decision criteria, 0.245 is greater than 0.05. Since the p-value > 0.05, we accept the null hypothesis.

Therefore, we can conclude that the residuals are normally distributed; strictly speaking, the test found no significant evidence against normality. The equation being tested thus satisfies the normality assumption of linear regression using the Ordinary Least Squares method.

In addition to the normality test, we also need to test the other assumptions, such as heteroskedasticity and multicollinearity, and, for time series data, autocorrelation.

That concludes this article. I hope it proves useful and adds to the knowledge of anyone who needs it. Stay tuned for more educational content, and look out for the next article from Kanda Data in the coming week. Thank you.

