Researchers often use linear regression analysis to analyze associative relationships between variables. Multiple linear regression is an analysis that researchers often use because it can analyze the effect of more than two independent variables on the dependent variable.
Various data analysis tools can be used to analyze multiple linear regression. On this occasion, I will explore how to analyze and interpret multiple linear regression results using Excel.
As we already know, Excel is very familiar to use to support daily work. Excel can be used to process and present data so that it can be more informative.
But did you know that Excel can also analyze inferential statistics, such as multiple linear regression? Therefore, on this occasion, I will write about how to analyze multiple linear regression using Excel and how to interpret the results.
How to use excel functions for data analysis
Excel has multiple functions to analyze data descriptively or use inferential statistics. Previously, researchers used Excel more often to analyze research results using descriptive statistics.
Excel can also be used to analyze inferential statistics. Several analysis options are provided by Excel, for example, regression analysis, correlation analysis, different tests, and others. To use the Excel data analysis menu, you must first activate the “data analysis toolpak” menu.
If you have activated the data analysis menu in Excel, you will be able to find it in the upper right corner of the Excel worksheet. When you click on the menu, an analysis data window containing various analysis tools provided by Excel will appear.
For those who haven’t activated the data analysis toolpak menu, you can activate it following the tutorial I wrote with the title: “How to Activate and Load the Data Analysis Toolpak in Excel“. A detailed data analysis menu in Excel can be seen in the image below:
Examples of mini research for exercise materials
Multiple linear regression analysis using Excel and how to interpret the results can be more easily understood if accompanied by exercise. I have prepared material for multiple linear regression analysis exercises with two independent variables.
I use quarterly time series data from the first quarter of 2017 to the third quarter of 2020. This mini-research aims to determine the effect of operational costs and ad frequency on product sales. The data that has been collected by researchers can be seen in the table below:
Before carrying out multiple linear regression analysis, it is necessary to specify the equation or model specification first. For the specification of equations, researchers can use theoretical references and the results of previous empirical studies.
Based on the theory and empirical experience, researchers can decide whether to make equations that previously existed, retest existing theories, or even get new theories.
Based on Mini research as an exercise material, on this occasion, the specifications of the equation used are:
Y = b0 + b1X1 + b2X2 + e
Description:
Y = Product sales (units)
X1 = Advertising cost (USD)
X2 = Ad frequency (times)
b0 = Intercepts
b1, b2 = Estimated coefficients
e = Error terms
Stages of analysis using Data Analysis in Excel
The stages of multiple linear regression analysis using Excel are as easy as using other data analysis tools such as SPSS, SAS, STATA, and others. To use data analysis in Excel, you need to click on the data analysis menu in the upper right corner of your Excel worksheet.
After you click on the data analysis menu, the analysis options provided by Excel will appear. Next, you need to select “regression” from the various analysis tools available.
After you click regression, then the regression window will appear. Next, you need to input all the data to be analyzed. You can input all data in variable y at once with its label.
In the same way, you can also input all the data in variable X1 and variable X2 at the same time with their labels. Because you have entered it along with the label, you need to activate the label. Furthermore, for the confidence level, I set it at 95%. I assume the maximum alpha limit is 5% in the mini-research above.
Furthermore, the analysis results can be displayed in three options: on the same sheet, on a different sheet or in a new Excel worksheet file. Here I will save the results of the analysis on the same sheet. In more detail, the stages that I have described above can be seen in the image below:
Output Interpretation from Data Analysis Excel
After you have completed all stages of the analysis and you click ok, the output data analysis using Excel will appear. The output of Excel analysis will be exactly the same as the output using other data analysis tools in Excel.
For the multiple linear regression analysis I have done above, I aim only to present the primary data analysis results to test the statistical research hypothesis.
There are three points that I will discuss in the interpretation of the results of the analysis using multiple linear regression in Excel, namely the coefficient of determination. F-statistics and t-statistics. I will discuss each in more detail below.
Coefficient of determination (R squared)
The coefficient of determination can be used to show the goodness of fit of a model. The value of the coefficient of determination is in the numbers 0 to 1. The value of the coefficient of determination that is close to 1 indicates that the model is getting better.
Conversely, the closer to 0 the coefficient of determination is, the less good the ability of the independent variable to explain the dependent variable is. The results of the analysis of the coefficient of determination using data analysis in Excel can be seen in the image below:
Based on the picture above, the value of the coefficient of determination, or R Squared, was 0.9284954. The coefficient of determination shows that the value is close to 1.
The value of the coefficient of determination can be interpreted that the variance of the operational cost and ad frequency variables of 92.85% can explain the variance of the sales product variable. The rest is explained by other variables not included in the equation.
The magnitude of the coefficient of determination in this equation is because the data used is time series. As for the cross-section data, the coefficient of determination is generally smaller than the time series data.
Interpretation of F-statistics
The results of the analysis using Excel, which I interpret next, are F statistics. This value is often associated with simultaneous hypothesis testing of a regression equation.
As an example, a statistical hypothesis is compiled, namely:
Ho = Operational cost and ad frequency simultaneously have no significant effect on sales production.
Ha = Operational cost and ad frequency simultaneously significantly affect sales production.
Based on these statistical hypotheses, F statistics can be used for hypothesis testing. Statistical F values based on the results of analysis using data analysis in Excel can be seen in the image below:
Based on the picture above, the statistical F value was 77.90498271. To test the hypothesis, we can compare it with the F table and look at the P value.
Based on the P value, it can be seen that the value is less than 5%. Therefore based on the analysis results, it can be concluded that the null hypothesis is rejected.
Because it rejects the null hypothesis, it accepts the alternative hypothesis. Thus it can be concluded that operational cost and ad frequency simultaneously significantly affect product sales.
Interpretation of t-statistics
The third interpretation of the analysis results using Excel is to look at the statistical T value. Based on the specifications of the multiple linear regression equation using two independent variables, it means that here we need to look at the t-statistic values of the operational cost and ad frequency variables.
To make it easier to test the statistical hypothesis, we need to develop the hypothesis first as follows:
Ho = Operational cost partially has no significant effect on sales product sales
Ha = Operational cost partially has a significant effect on product sales.
Ho = Ad frequency partially has no significant effect on product sales
Ha = Ad frequency partially has a significant effect on product sales.
Based on the hypothesis that we have compiled, here I will use the P value to test the statistical hypothesis. The results of t-statistics can be seen in the image below:
It can be seen that the t statistics value is 4.261243 with a p-value less than 5% to test the null hypothesis for the operational cost variable. It can be concluded that because the p-value is less than 5%, the null hypothesis is rejected.
Because it rejects the null hypothesis, so we accept the alternative hypothesis. Therefore, it can be concluded that operational costs partially have a significant effect on product sales.
Furthermore, for the second variable, ad frequency, it can be seen that the t-statistic value is 2.547969 with a p-value smaller than 5%. Therefore, it can be concluded that the null hypothesis is rejected or the alternative hypothesis is accepted. Thus it can be concluded that ad frequency partially has a significant effect on product sales.
It is an article that I can write on this occasion. Hopefully, this article can be helpful for all of you. Please wait for the next article update the following week. Thank you!