KANDA DATA


Tutorial on R Studio: Testing Residual Normality in Multiple Linear Regression for Time Series Data

By Kanda Data / Date Dec 09.2024 / Category Data Analysis in R

The normality test in multiple linear regression analysis is aimed at detecting whether the residuals are normally distributed. In research using time series data, it is also necessary to perform a normality test to ensure that the required assumptions are met.

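Multiple linear regression estimated with the ordinary least squares (OLS) method assumes that the residuals are normally distributed. Strictly speaking, normality is not required for the OLS estimator to be the Best Linear Unbiased Estimator (BLUE); that property follows from the Gauss-Markov assumptions. Normality matters because the t-tests, the F-test, and the confidence intervals reported for the regression rely on it, especially in small samples.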
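The coefficient estimates themselves come from a purely algebraic formula: OLS chooses the coefficients that minimize the sum of squared residuals, which gives the closed form $\hat{\beta} = (X^\top X)^{-1} X^\top y$. The minimal sketch below uses simulated data (not the article's data) and the hypothetical variable names `x1`, `x2`, and `y` to check that this formula reproduces the estimates from `lm()`.

```r
# Minimal sketch: OLS coefficients via the normal equations, compared with lm().
# The data are simulated for illustration only; they are NOT the article's data.
set.seed(123)
n  <- 30
x1 <- rnorm(n, mean = 3, sd = 1)                      # hypothetical regressor 1
x2 <- rnorm(n, mean = 6, sd = 1)                      # hypothetical regressor 2
y  <- 5 - 0.4 * x1 - 0.3 * x2 + rnorm(n, sd = 0.5)    # hypothetical response

# Design matrix with a column of ones for the intercept
X <- cbind(1, x1, x2)

# beta_hat = (X'X)^{-1} X'y, solved without forming the inverse explicitly
beta_hat <- solve(crossprod(X), crossprod(X, y))

# lm() should give the same estimates, up to floating-point error
fit <- lm(y ~ x1 + x2)
print(cbind(normal_equations = drop(beta_hat), lm = coef(fit)))
```

Using `solve()` on the cross-product matrix avoids computing an explicit inverse; `lm()` itself uses a QR decomposition, which is numerically more stable.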

In this article, I will show step by step how to test residual normality for time series data in R Studio, a widely used environment for data analysis with R.

Before diving into the step-by-step residual normality test in R Studio, let’s first understand what residuals are. Residuals are the differences between the actual observed values (Y actual) and the predicted values of those observations (Y predicted).
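In R, this definition can be checked directly. The minimal sketch below uses simulated data with the article's column names (the values are NOT the article's data) and confirms that `residuals()` returns exactly Y actual minus Y predicted (`fitted()`).

```r
# Minimal sketch: residuals = actual values - fitted (predicted) values.
# Simulated data with the article's column names; NOT the article's data.
set.seed(1)
n <- 30
data <- data.frame(
  Inflation_Rate    = rnorm(n, mean = 3, sd = 1),
  Unemployment_Rate = rnorm(n, mean = 6, sd = 1)
)
data$Economic_Growth <- 5 - 0.4 * data$Inflation_Rate -
  0.3 * data$Unemployment_Rate + rnorm(n, sd = 0.5)

model <- lm(Economic_Growth ~ Inflation_Rate + Unemployment_Rate, data = data)

e_manual <- data$Economic_Growth - fitted(model)  # Y actual - Y predicted
e_lm     <- residuals(model)                      # residuals stored by lm()

all.equal(unname(e_manual), unname(e_lm))  # TRUE, up to floating-point error
sum(e_lm)  # approximately 0, because the model includes an intercept
```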

Example of a Time Series Data Case Study

As the title indicates, this tutorial uses time series data. First, let's clarify what time series data are. Time series data are observations of a single subject collected at regular time intervals, such as a country's food consumption recorded every year from 2000 to 2024.

In this article, we use a time series data example from research aimed at determining the impact of inflation rates and unemployment rates on economic growth.

For this study, data were collected for 30 quarterly periods on three variables in country ABC: the inflation rate, the unemployment rate, and economic growth. The multiple linear regression equation for this time series data can be specified as follows:
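Quarterly data have four observations per year, which R can record with a `ts` object. The minimal sketch below builds a quarterly index for 30 periods; the start date of 2017 Q1 is a hypothetical choice, since the article does not state which quarters were observed.

```r
# Minimal sketch: a quarterly time index for 30 periods.
# The start date (2017 Q1) is HYPOTHETICAL; the article does not specify it.
quarters <- ts(1:30, start = c(2017, 1), frequency = 4)

frequency(quarters)  # 4 observations per year
start(quarters)      # c(2017, 1), i.e. 2017 Q1
end(quarters)        # c(2024, 2), i.e. 2024 Q2 (30 quarters later)
```

The regression in this tutorial does not need a `ts` object, since `lm()` works on an ordinary data frame, but storing the time index this way keeps the quarterly order explicit.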

π‘Œ=𝛽0+𝛽1𝑋1+𝛽2𝑋2+…+𝛽𝑛𝑋𝑛+πœ–

Where:

π‘Œ is economic growth (%),

𝑋1 is the inflation rate (%),

𝑋2 is the unemployment rate (%),

𝛽0 is the intercept (constant),

𝛽1 and 𝛽2 are regression coefficients,

πœ– is the error or residual.

To estimate this regression equation and practice along with this article, use the data shown in the table below:

Tutorial: Residual Normality Test in Multiple Linear Regression for Time Series Data in R Studio

Before performing the residual normality test, we first need to fit the multiple linear regression model. Make sure the data have been correctly loaded into R Studio before starting the analysis.

Data input can be done manually or by importing data from other file formats, such as Excel. I have previously written a tutorial on how to input data into R Studio.
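As a minimal sketch of the import step, the code below reads CSV-formatted text into a data frame named `data` with the column names used later in this tutorial. The three rows are made-up placeholder values, NOT the article's data, and the file name in the comment is hypothetical.

```r
# Minimal sketch of importing data into R.
# The three rows below are made-up placeholder values, NOT the article's data.
# With a real file you would write, e.g.:
#   data <- read.csv("growth_data.csv")   # hypothetical file name
# (Reading Excel files requires an add-on package such as readxl.)
csv_text <- "Economic_Growth,Inflation_Rate,Unemployment_Rate
5.1,2.8,5.9
4.9,3.1,6.0
5.3,2.6,5.7"

data <- read.csv(text = csv_text)

str(data)                # check column names and that all columns are numeric
stopifnot(!anyNA(data))  # by default, lm() drops rows with missing values
```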

The command for multiple linear regression analysis in R Studio can be written as follows:

model <- lm(Economic_Growth ~ Inflation_Rate + Unemployment_Rate, data = data)

summary(model)

After entering the command correctly and pressing Enter or clicking 'Run', the output shown below will appear:
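Besides reading the printed output, the key numbers can be extracted from the `summary()` object. The minimal sketch below uses simulated data with the article's column names (NOT the article's data), so the estimates will differ from the output above.

```r
# Minimal sketch: extracting results from summary(model) programmatically.
# Simulated data with the article's column names; NOT the article's data.
set.seed(42)
n <- 30
data <- data.frame(
  Inflation_Rate    = rnorm(n, mean = 3, sd = 1),
  Unemployment_Rate = rnorm(n, mean = 6, sd = 1)
)
data$Economic_Growth <- 5 - 0.4 * data$Inflation_Rate -
  0.3 * data$Unemployment_Rate + rnorm(n, sd = 0.5)

model <- lm(Economic_Growth ~ Inflation_Rate + Unemployment_Rate, data = data)
s <- summary(model)

coef(s)          # estimates, standard errors, t values, and p-values
s$r.squared      # R-squared
s$adj.r.squared  # adjusted R-squared
```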

The next step is to perform a residual normality test. In this article, I will demonstrate the Shapiro-Wilk test for residual normality.
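For reference, the Shapiro-Wilk test compares the following hypotheses, and its statistic is computed from the ordered residuals. Here $e_{(i)}$ denotes the $i$-th smallest residual, $\bar{e}$ is the mean of the residuals, and the constants $a_i$ are derived from the expected values and covariances of order statistics of a standard normal sample. $W$ lies between 0 and 1, and values close to 1 are consistent with normality.

```latex
H_0:\ \text{the residuals are normally distributed}
H_1:\ \text{the residuals are not normally distributed}

W = \frac{\left( \sum_{i=1}^{n} a_i \, e_{(i)} \right)^2}{\sum_{i=1}^{n} \left( e_i - \bar{e} \right)^2}
```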

To perform the Shapiro-Wilk test in R Studio, use the following command:

shapiro.test(residuals(model))

An output will appear containing the W value and p-value from the Shapiro-Wilk test. The detailed output is as follows:

Shapiro-Wilk normality test

data:  residuals(model)
W = 0.96792, p-value = 0.4839

Based on these results, the Shapiro-Wilk statistic (W) is 0.96792 with a p-value of 0.4839. Because the p-value is greater than 0.05, we fail to reject the null hypothesis that the residuals are normally distributed. In other words, the test finds no significant evidence that the residuals depart from normality.
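The decision rule can also be applied in code. The minimal sketch below assumes a 5% significance level; `check_residual_normality()` is a hypothetical helper name, not a base R function, and the usage example runs on simulated data (NOT the article's data), so W and the p-value will differ from the output above.

```r
# Minimal sketch of the decision rule, assuming a 5% significance level.
# check_residual_normality() is a HYPOTHETICAL helper, not part of base R.
check_residual_normality <- function(model, alpha = 0.05) {
  sw <- shapiro.test(residuals(model))  # base R; requires 3 <= n <= 5000
  list(
    W       = unname(sw$statistic),
    p_value = sw$p.value,
    # p > alpha: fail to reject H0 (no significant departure from normality)
    decision = if (sw$p.value > alpha) "fail to reject H0" else "reject H0"
  )
}

# Usage with simulated data (NOT the article's data)
set.seed(2024)
data <- data.frame(
  Inflation_Rate    = rnorm(30, mean = 3, sd = 1),
  Unemployment_Rate = rnorm(30, mean = 6, sd = 1)
)
data$Economic_Growth <- 5 - 0.4 * data$Inflation_Rate -
  0.3 * data$Unemployment_Rate + rnorm(30, sd = 0.5)
model <- lm(Economic_Growth ~ Inflation_Rate + Unemployment_Rate, data = data)

check_residual_normality(model)
```

Note that "fail to reject H0" is the correct wording: a large p-value means the data give no evidence against normality, not that normality has been proven.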

Thus, the residuals of our regression model show no significant departure from normality, so the normality assumption of OLS linear regression can be considered satisfied.

This concludes the article I can provide at this time. I hope it is helpful for those using R Studio for linear regression analysis. Stay tuned for updates from Kanda Data next week!


Copyright KANDA DATA 2025. All Rights Reserved