Multiple linear regression is a statistical technique used to model the relationship between one dependent variable and two or more independent variables. The fitted model can then be used to predict the value of the dependent variable from the observed values of the independent variables and the estimated regression coefficients.
The general equation of multiple linear regression is:
Y = b0 + b1X1 + b2X2 + … + bnXn + e
Where:
Y is the dependent variable
X1, X2, …, Xn are the independent variables
b0 is the intercept
b1, b2, …,bn are the regression coefficients
e is the error term
To obtain the Best Linear Unbiased Estimator (BLUE), it is necessary to ensure that certain assumptions are met. In this article, Kanda Data will discuss the tests for these assumptions in multiple linear regression on time series data.
Time series data is data collected or observed at regular time intervals. Examples include daily stock prices, monthly sales data, or daily temperature data. Time series data have specific characteristics such as trends, seasonality, and cycles that must be considered in the analysis. Let’s delve deeper into the required assumption tests.
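Before running the assumption tests, it helps to have a fitted model and its residuals. The sketch below is a minimal illustration in Python with statsmodels; the article does not prescribe any particular software, and the DataFrame df with columns Y, X1, and X2 is a hypothetical example, not part of the original discussion.

```python
import pandas as pd
import statsmodels.api as sm

# df is an assumed pandas DataFrame holding the time series data,
# with dependent variable "Y" and independent variables "X1" and "X2"
X = sm.add_constant(df[["X1", "X2"]])   # adds the intercept term b0
model = sm.OLS(df["Y"], X).fit()        # estimates b0, b1, b2 by ordinary least squares
residuals = model.resid                 # residuals e, reused in the tests below
print(model.summary())
```

The residuals and the design matrix X from this sketch are reused in the assumption-test sketches that follow.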
Assumption of Normally Distributed Residuals (Data Normality)
The normality assumption states that the distribution of residuals in the regression model should follow a normal distribution. This normality is important for the validity of statistical inference, such as hypothesis testing and confidence interval construction.
One way to test the normality of residuals is with statistical tests such as the Kolmogorov-Smirnov test or the Shapiro-Wilk test. If the test result shows a p-value greater than the significance level (e.g., 0.05), we fail to reject the null hypothesis, which indicates that the residuals can be considered normally distributed.
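As an illustration, the sketch below applies both tests to the residuals from the fitted model above using scipy; the variable names carry over from the earlier hypothetical example and are assumptions, not part of the article.

```python
from scipy import stats

# Shapiro-Wilk test on the residuals of the fitted model
sw_stat, sw_p = stats.shapiro(residuals)

# Kolmogorov-Smirnov test against a normal distribution
# parameterized by the residuals' own mean and standard deviation
ks_stat, ks_p = stats.kstest(residuals, "norm", args=(residuals.mean(), residuals.std()))

print(f"Shapiro-Wilk p-value: {sw_p:.4f}")
print(f"Kolmogorov-Smirnov p-value: {ks_p:.4f}")
# p-value > 0.05: fail to reject the null hypothesis of normality
```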
Assumption of Constant Variance of Residuals (Homoscedasticity)
Homoscedasticity means that the variance of the residuals is constant across all levels of the predicted values. If the variance of the residuals is not constant, the condition is called heteroscedasticity. A regression model with heteroscedasticity produces inefficient estimates of the regression coefficients.
The Breusch-Pagan test can be used to detect heteroscedasticity. Plotting the residuals against the predicted values is also useful: a non-random pattern in the plot indicates heteroscedasticity.
If the Breusch-Pagan test yields a p-value greater than 0.05, we fail to reject the null hypothesis, suggesting that the model satisfies homoscedasticity. A random, patternless residual plot likewise indicates homoscedasticity.
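A minimal sketch of both checks, assuming the model, X, and residuals defined earlier; the Breusch-Pagan function here is from statsmodels, which the article does not explicitly reference.

```python
import matplotlib.pyplot as plt
from statsmodels.stats.diagnostic import het_breuschpagan

# Breusch-Pagan test: p-value > 0.05 suggests homoscedasticity
bp_stat, bp_p, f_stat, f_p = het_breuschpagan(residuals, X)
print(f"Breusch-Pagan p-value: {bp_p:.4f}")

# Residuals vs. predicted values: a random, patternless cloud suggests constant variance
plt.scatter(model.fittedvalues, residuals)
plt.axhline(0, linestyle="--")
plt.xlabel("Predicted values")
plt.ylabel("Residuals")
plt.show()
```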
Assumption of No Strong Correlation Among Independent Variables (No Multicollinearity)
Multicollinearity occurs when there is a high correlation between two or more independent variables. This can cause difficulties in determining the influence of each independent variable on the dependent variable.
A common way to detect multicollinearity is by looking at the Variance Inflation Factor (VIF). A VIF value greater than 10 indicates serious multicollinearity.
If the VIF values for all independent variables are less than 10, it indicates that there is no serious multicollinearity. This means we can be confident that the regression coefficients provide reliable estimates of the influence of each independent variable.
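The sketch below computes the VIF for each column of the design matrix defined earlier; the function used is from statsmodels, and the setup follows the hypothetical example above.

```python
from statsmodels.stats.outliers_influence import variance_inflation_factor

# VIF for each column of the design matrix X (the row for "const" can be ignored)
vif = pd.DataFrame({
    "variable": X.columns,
    "VIF": [variance_inflation_factor(X.values, i) for i in range(X.shape[1])],
})
print(vif)  # VIF < 10 for every independent variable suggests no serious multicollinearity
```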
Assumption of No Autocorrelation
Autocorrelation is the correlation between residuals at different times in time series data. Autocorrelation can lead to inefficient regression coefficient estimates and inaccurate residual variances.
The Durbin-Watson test is a common method for detecting autocorrelation. The Durbin-Watson value ranges from 0 to 4, with a value around 2 indicating no autocorrelation.
Generally, a Durbin-Watson value close to 2 indicates no autocorrelation in the residuals. Values closer to 0 suggest positive autocorrelation, while values closer to 4 suggest negative autocorrelation.
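A minimal sketch, again assuming the residuals obtained earlier and using the Durbin-Watson statistic available in statsmodels:

```python
from statsmodels.stats.stattools import durbin_watson

# Durbin-Watson statistic: around 2 means no autocorrelation,
# toward 0 suggests positive and toward 4 negative autocorrelation
dw = durbin_watson(residuals)
print(f"Durbin-Watson statistic: {dw:.3f}")
```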
Conclusion
Using multiple linear regression on time series data requires meeting several assumptions to ensure that the resulting model is valid and reliable. Tests for normality of residuals, homoscedasticity, no multicollinearity, and no autocorrelation are some of the necessary assumption tests.
By conducting these tests, we can ensure that the results of the regression analysis provide accurate and useful insights for decision-making. That is all Kanda Data can share in this article. Stay tuned for more updates from Kanda Data in the next opportunity.