Residual normality testing is a key assumption check in linear regression analysis using the Ordinary Least Squares (OLS) method. The t and F tests reported for an OLS model rely on the residuals following a normal distribution, so this assumption should be verified before interpreting the results. In this article, Kanda Data shares a tutorial on how to test residual normality in linear regression using RStudio, along with steps to interpret the results.
Let's use the following case study as an example to practice residual normality testing in RStudio. The study aims to examine the effect of advertising costs and the number of marketing staff on product sales volume. Based on this objective, the equation specification can be written as follows:
Y = β0 + β1X1 + β2X2 + e
Where:
Y is product sales (in thousand units) as the dependent variable,
X1 is advertising cost (in hundreds of US dollars) as the first independent variable,
X2 is the number of marketing staff (in number of employees) as the second independent variable,
β0 is the intercept,
β1 and β2 are regression coefficients,
e represents the error or residual.
For this exercise, data from 15 observations were collected for the variables Sales, Advertising Cost, and Marketing Staff. Details of the data are presented in the table below:
Steps to Perform Residual Normality Testing in Linear Regression Analysis
First, download and install R and RStudio on your laptop. After setting up RStudio, you'll need to run a multiple linear regression analysis before proceeding with residual normality testing.
After opening RStudio, import the data for analysis. There are two methods to do this: (a) import the data directly from Excel; or (b) enter the data manually using commands in RStudio.
For detailed instructions on importing data from Excel or manual input methods, refer to our previous articles.
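As a rough sketch of option (b), a data frame holding the three variables can be typed in directly. The values below are made-up placeholders, not the 15 observations of this case study. Only the column names Sales, Advertising_Cost, and Marketing_Staff are taken from the regression command used in this tutorial.

```r
# Manual data entry: a minimal sketch with PLACEHOLDER values.
# These numbers are invented for illustration only; replace them
# with your own 15 observations before running the analysis.
data <- data.frame(
  Sales            = c(12, 15, 14, 18, 20),  # product sales (thousand units)
  Advertising_Cost = c(5, 6, 6, 8, 9),       # advertising cost (hundreds of US dollars)
  Marketing_Staff  = c(2, 3, 2, 4, 4)        # number of marketing staff
)

# Quick checks that the variables were entered as numeric columns
str(data)
head(data)
```

Keeping the column names identical to those in the lm() formula avoids "object not found" errors when the model is fitted.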
Run the following command to perform the regression analysis:
model <- lm(Sales ~ Advertising_Cost + Marketing_Staff, data = data)
summary(model)
Press Enter or click Run to generate the regression analysis output, which should look similar to the following:
As mentioned earlier, residual normality testing checks whether the residuals from the regression equation in this case study follow a normal distribution.
You can use the Shapiro-Wilk test or a QQ plot for this purpose. In this tutorial, we'll use the Shapiro-Wilk test. Run the following command in RStudio:
shapiro.test(residuals(model))
The output of the Shapiro-Wilk test will look like this:
Shapiro-Wilk normality test
data:  residuals(model)
W = 0.94428, p-value = 0.4393
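The analysis shows a W value of 0.94428 and a p-value of 0.4393. The null hypothesis of the Shapiro-Wilk test is that the residuals are normally distributed. Since the p-value is greater than 0.05, we fail to reject the null hypothesis, so there is no evidence that the residuals deviate from normality and the assumption is considered met.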
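The decision rule above can also be applied in code, and the QQ plot mentioned earlier can serve as a visual cross-check. The sketch below is only an illustration: so that it runs on its own, it fits a stand-in model on the mtcars data set that ships with R, and the names demo_model, sw, and alpha are assumptions of this example. In practice, pass the model object fitted in this tutorial instead.

```r
# Stand-in model so the sketch is self-contained (mtcars ships with R).
# In the tutorial, use the `model` object fitted earlier instead.
demo_model <- lm(mpg ~ wt + hp, data = mtcars)

# Shapiro-Wilk test on the residuals; the result stores W and the p-value
sw <- shapiro.test(residuals(demo_model))
print(sw)

alpha <- 0.05  # significance level used in this tutorial
if (sw$p.value > alpha) {
  message("Fail to reject H0: no evidence against normally distributed residuals")
} else {
  message("Reject H0: the residuals deviate from a normal distribution")
}

# Visual check: points lying close to the reference line suggest normality
qqnorm(residuals(demo_model))
qqline(residuals(demo_model))
```

With only 15 observations the Shapiro-Wilk test has limited power, so looking at the QQ plot alongside the p-value is a reasonable habit.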
This concise tutorial is brought to you by Kanda Data. We hope this guide proves valuable to our readers. Stay tuned for more updates and articles from us!