Tutorial on R Studio: Testing Residual Normality in Multiple Linear Regression for Time Series Data

The normality test in multiple linear regression analysis checks whether the residuals are normally distributed. Research using time series data also needs this test to confirm that the required assumptions are met.

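In multiple linear regression estimated with the ordinary least squares (OLS) method, the usual hypothesis tests on the coefficients (t and F tests) rely on normally distributed residuals, particularly in small samples. Strictly speaking, normality is not what makes OLS the Best Linear Unbiased Estimator (that follows from the Gauss-Markov conditions), but it is needed for valid inference on the estimated regression equation when the sample is small.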

In this article, I provide a step-by-step tutorial on testing residual normality for time series data using R Studio.

Before diving into the step-by-step residual normality test in R Studio, let’s first understand what residuals are. Residuals are the differences between the actual observed values (Y actual) and the predicted values of those observations (Y predicted).
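As a minimal sketch of this definition, the R snippet below fits a simple regression and checks that "actual minus predicted" matches what the built-in residuals() function returns. The variables x and y are simulated purely for illustration and are not the data used in this tutorial.

```r
# Simulated data for illustration only (not the case-study data)
set.seed(123)
x <- rnorm(30)
y <- 2 + 0.5 * x + rnorm(30)

fit <- lm(y ~ x)

# Residual = actual observed value minus predicted (fitted) value
manual_residuals <- y - fitted(fit)

# Compare with R's built-in extractor; names are dropped before comparing
same <- all.equal(unname(manual_residuals), unname(residuals(fit)))
print(same)
```

The comparison uses all.equal() rather than == because the two vectors can differ by tiny floating-point rounding.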

Example of a Time Series Data Case Study

As the title indicates, this tutorial works with time series data. Time series data are observations on a single subject recorded at successive time intervals, such as food consumption data for a country collected from 2000 to 2024.

In this article, we use a time series data example from research aimed at determining the impact of inflation rates and unemployment rates on economic growth.

For this study, data were collected for 30 quarterly periods, including variables such as inflation rate, unemployment rate, and economic growth in country ABC. The specification of the multiple linear regression equation for this time series data can be written as follows:

π‘Œ=𝛽0+𝛽1𝑋1+𝛽2𝑋2+…+𝛽𝑛𝑋𝑛+πœ–

Where:

π‘Œ is economic growth (%),

𝑋1 is the inflation rate (%),

𝑋2 is the unemployment rate (%),

𝛽0 is the intercept (constant),

𝛽1 and 𝛽2 are regression coefficients,

πœ– is the error or residual.

To estimate this regression equation and follow along with the practice in this article, please refer to the data collected as shown in the table below:

Tutorial: Residual Normality Test in Multiple Linear Regression for Time Series Data in R Studio

Before performing the residual normality test, we need to first write the command for multiple linear regression analysis. Ensure that the data is correctly input into R Studio before starting the analysis.

Data input can be done manually or by importing data from other file formats, such as Excel. I have previously written a tutorial on how to input data into R Studio.
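For readers who want to see the structure the regression command expects, here is a rough sketch of a suitable data frame. The column names follow the command used in this tutorial; the values are randomly generated placeholders, and the relationship used to generate Economic_Growth is hypothetical. They are not the figures from the case study.

```r
# Placeholder values generated at random; NOT the case-study data
set.seed(42)
n <- 30  # 30 quarterly observations, as in the case study

data <- data.frame(
  Inflation_Rate    = round(runif(n, min = 1, max = 6), 2),
  Unemployment_Rate = round(runif(n, min = 3, max = 9), 2)
)

# Hypothetical relationship, used only to create a response column
data$Economic_Growth <- round(
  5 - 0.3 * data$Inflation_Rate - 0.2 * data$Unemployment_Rate +
    rnorm(n, sd = 0.5),
  2
)

str(data)   # check that all three columns are numeric
head(data)  # inspect the first six rows
```

With real data, the same three numeric columns would come from your own file instead, for example through base R's read.csv() for a CSV export.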

The command for multiple linear regression analysis in R Studio can be written as follows:

model <- lm(Economic_Growth ~ Inflation_Rate + Unemployment_Rate, data = data)

summary(model)

After entering the command correctly and pressing Enter or clicking ‘Run’, an output will appear as shown below:

The next step is to perform a residual normality test. In this article, I will demonstrate the Shapiro-Wilk test for residual normality.

To perform the Shapiro-Wilk test in R Studio, use the following command:

shapiro.test(residuals(model))

An output will appear containing the W value and p-value from the Shapiro-Wilk test. The detailed output is as follows:

Shapiro-Wilk normality test

data:  residuals(model)
W = 0.96792, p-value = 0.4839

Based on the analysis results, the Shapiro-Wilk statistic (W) is 0.96792 with a p-value of 0.4839. The null hypothesis of this test is that the residuals are normally distributed. Since the p-value is greater than 0.05, we fail to reject the null hypothesis; in other words, the test finds no evidence that the residuals depart from normality.
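The same decision rule can be sketched in code. The example below is only an illustration under assumed inputs: it simulates a small data set (the names sim, x1, x2, and y are made up and are not the case-study variables), fits a regression, and reads the W statistic and p-value from the object returned by shapiro.test().

```r
# Simulated data for illustration only (not the case-study data)
set.seed(2024)
n <- 30
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
sim$y <- 1 + 0.5 * sim$x1 - 0.3 * sim$x2 + rnorm(n)

fit <- lm(y ~ x1 + x2, data = sim)
sw  <- shapiro.test(residuals(fit))

alpha  <- 0.05                  # significance level used in this tutorial
w_stat <- unname(sw$statistic)  # the W statistic
p_val  <- sw$p.value            # the p-value

# H0: the residuals are normally distributed
decision <- if (p_val > alpha) {
  "Fail to reject H0: no evidence against normality"
} else {
  "Reject H0: residuals deviate from normality"
}
print(decision)
```

Storing the test result in an object makes it easy to report W and the p-value alongside the regression output.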

Thus, based on the analysis results, the linear regression equation we created meets the assumption required for OLS linear regression, namely that the residuals are normally distributed.
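As an optional visual complement to the formal test (not part of the steps above), a normal Q-Q plot of the residuals can be drawn with base R. The sketch below again uses simulated data with made-up variable names; with your own model, pass residuals(model) instead.

```r
# Simulated data for illustration only (not the case-study data)
set.seed(7)
x <- rnorm(30)
y <- 1 + 0.8 * x + rnorm(30)
fit <- lm(y ~ x)

res <- residuals(fit)

# Points close to the reference line suggest approximately normal residuals
qq <- qqnorm(res, main = "Normal Q-Q Plot of Residuals")
qqline(res)
```

If the points bend away from the line at the ends, that pattern hints at heavy tails or skewness that the p-value alone may not reveal.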

This concludes the article I can provide at this time. I hope it is helpful for those using R Studio for linear regression analysis. Stay tuned for updates from Kanda Data next week!