How to Perform Residual Normality Analysis in Linear Regression Using R Studio and Interpret the Results

Residual normality testing is a key assumption check in linear regression estimated with the Ordinary Least Squares (OLS) method. The classical linear regression model assumes that the residuals follow a normal distribution, which supports valid t-tests and F-tests, particularly in small samples. In this article, Kanda Data shares a tutorial on how to perform residual normality analysis in linear regression using R Studio and how to interpret the results.

Let’s use the following case study as an example to practice residual normality testing in R Studio. The study aims to examine the effect of advertising costs and the number of marketing staff on product sales volume. Based on this objective, the equation specification can be written as follows:

π‘Œ=𝛽0+𝛽1𝑋1+𝛽2𝑋2

Where:

π‘Œ is product sales (in thousand units) as the dependent variable,

𝑋1 is advertising cost (in hundreds of US dollars) as the first independent variable,

𝑋2 is the number of marketing staff (in number of employees) as the second independent variable,

𝛽0 is the intercept, and 𝛽1 and 𝛽2 are the regression coefficients of 𝑋1 and 𝑋2,

πœ– represents the error or residual. For this exercise, data from 15 observations were collected for the variables Sales, Advertising Cost, and Marketing Staff. Details of the data are presented in the table below:

Steps to Perform Residual Normality Testing in Linear Regression Analysis

First, download and install R and R Studio on your computer. Once R Studio is set up, you need to run the multiple linear regression analysis before you can test the residuals for normality.

After opening R Studio, import the data for analysis. There are two ways to do this: (a) import the data directly from Excel, or (b) enter the data manually using commands in R Studio.

For detailed instructions on importing data from Excel or manual input methods, refer to our previous articles.
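If you want to try the commands before entering the case-study data, the sketch below builds a data frame named data with the same column names used in this tutorial (Sales, Advertising_Cost, Marketing_Staff). The values are simulated and hypothetical; they are NOT the 15 observations from the table, so any results you get from them will differ from the output shown in this article.

```r
# Hypothetical example: simulated values, NOT the case-study data
set.seed(123)                      # make the simulation reproducible
n <- 15                            # same number of observations as the case study

Advertising_Cost <- round(runif(n, min = 5, max = 20), 1)  # hundreds of US dollars
Marketing_Staff  <- sample(2:8, n, replace = TRUE)         # number of employees

# Sales generated from an assumed linear relationship plus normal noise
Sales <- round(10 + 1.5 * Advertising_Cost + 2 * Marketing_Staff + rnorm(n, sd = 2), 1)

# Column names must match those used in the lm() formula
data <- data.frame(Sales, Advertising_Cost, Marketing_Staff)
str(data)   # check column names and types
```

When you enter the real data manually, the same data.frame() pattern applies, with each vector holding the observed values from the table.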

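Run the following commands to estimate the regression model. They assume the data are stored in a data frame named data with the columns Sales, Advertising_Cost, and Marketing_Staff: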

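# Estimate the multiple linear regression by OLS
model <- lm(Sales ~ Advertising_Cost + Marketing_Staff, data = data)

# Show coefficients, standard errors, t-tests, R-squared and the F-test
summary(model)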

Press Enter or click Run to generate the regression analysis output, which should look similar to the following:

As mentioned earlier, residual normality testing checks whether the residuals of the estimated regression equation in this case study follow a normal distribution.
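To make clear what is being tested, the sketch below shows that a residual is simply the observed value minus the value fitted by the regression. It uses the same hypothetical simulated data as an illustration (not the case-study data) so that it runs on its own.

```r
# Hypothetical simulated data, so the block runs on its own
set.seed(123)
n <- 15
data <- data.frame(Advertising_Cost = runif(n, 5, 20),
                   Marketing_Staff  = sample(2:8, n, replace = TRUE))
data$Sales <- 10 + 1.5 * data$Advertising_Cost + 2 * data$Marketing_Staff +
  rnorm(n, sd = 2)

model <- lm(Sales ~ Advertising_Cost + Marketing_Staff, data = data)

# Residual = observed Sales - fitted Sales
e        <- residuals(model)
manual_e <- data$Sales - fitted(model)
all.equal(e, manual_e)   # TRUE: both ways give the same residuals

# Because the model includes an intercept, OLS residuals sum to (numerically) zero
sum(e)
```

These residuals are the values passed to the normality test.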

You can check residual normality with the Shapiro-Wilk test or a Q-Q plot. In this tutorial, we use the Shapiro-Wilk test, whose null hypothesis is that the residuals are normally distributed. Run the following command in R Studio:

shapiro.test(residuals(model))

The output of the Shapiro-Wilk test will look like this:

Shapiro-Wilk normality test

data:  residuals(model)

W = 0.94428, p-value = 0.4393

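The output shows W = 0.94428 and a p-value of 0.4393. Because the p-value is greater than the 0.05 significance level, we fail to reject the null hypothesis of normality. In other words, there is no significant evidence that the residuals deviate from a normal distribution, so the normality assumption is considered satisfied.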
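The decision rule above can also be applied in code, and the Q-Q plot mentioned earlier can serve as a visual complement. The sketch below reuses hypothetical simulated data (not the case-study data), reads W and the p-value from the object returned by shapiro.test(), and compares the p-value with the 0.05 level used in this article. Note that shapiro.test() accepts between 3 and 5000 observations.

```r
# Hypothetical simulated data, so the block runs on its own
set.seed(123)
n <- 15
data <- data.frame(Advertising_Cost = runif(n, 5, 20),
                   Marketing_Staff  = sample(2:8, n, replace = TRUE))
data$Sales <- 10 + 1.5 * data$Advertising_Cost + 2 * data$Marketing_Staff +
  rnorm(n, sd = 2)
model <- lm(Sales ~ Advertising_Cost + Marketing_Staff, data = data)

sw    <- shapiro.test(residuals(model))  # object of class "htest"
alpha <- 0.05                            # significance level used in the article
W     <- unname(sw$statistic)
p     <- sw$p.value

if (p > alpha) {
  message(sprintf("W = %.5f, p = %.4f > %.2f: fail to reject H0 (no evidence against normality)",
                  W, p, alpha))
} else {
  message(sprintf("W = %.5f, p = %.4f <= %.2f: reject H0 (residuals deviate from normality)",
                  W, p, alpha))
}

# Visual check: points close to the line suggest approximately normal residuals
qqnorm(residuals(model), main = "Normal Q-Q plot of residuals")
qqline(residuals(model))   # reference line through the first and third quartiles
```

With the case-study model, the same code would report the values shown in the output above.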

This concise tutorial is brought to you by Kanda Data. We hope this guide proves valuable to our readers. Stay tuned for more updates and articles from us!