Linear regression analysis using the Ordinary Least Squares (OLS) method is one of the most commonly used techniques for examining the influence of one variable on another. Linear regression comes with assumptions that must be met, and testing these assumptions is necessary to obtain consistent and unbiased estimation results.
In linear regression analysis, variables are typically measured on interval or ratio scales (parametric measurement). Can we also include non-parametric (categorical) variables in the regression equation?
This is the topic we will discuss in more depth in this article. We will also explore how to interpret the estimation results for non-parametric variables: is their interpretation the same as for the other independent variables? Let’s look at this in more detail in the following sections.
Definition of dummy variables
A dummy variable is a non-parametric variable that can be included in a linear regression equation estimated with the ordinary least squares (OLS) method. In linear regression analysis, dummy variables can be used as independent variables.
Dummy variables are classified as non-parametric variables because they represent categories rather than measured quantities. Most commonly they distinguish two categories and are referred to as binary dummies. Categorical variables with more than two categories can also be included, by using more than one dummy variable.
This article focuses on interpreting dummy variables with two categories, known as binary dummies. Dummy variables are measured on a nominal scale.
Kanda Data has discussed measurement scales in several previous articles. In statistics, measurement scales are divided into four types: nominal, ordinal, interval, and ratio.
On a nominal scale, observations are grouped into categories that have no order or ranking. Examples include rural and urban areas, male and female, and the periods before and after a policy.
Scoring method for dummy variables
As mentioned above, this article focuses on binary dummies, so the scoring technique involves only two scores.
Dummy variables are scored with 1 and 0. How are these scores assigned? Which category receives a score of 1, and which receives a score of 0?
To illustrate, consider a researcher examining the impact of the COVID-19 pandemic on the profitability of a frozen chicken business. The pandemic can be represented as a dummy variable that captures the difference in profit between the pre-pandemic and post-pandemic periods.
When scoring, a score of 1 is assigned to the condition hypothesized to have an effect. In this study, for instance, the researcher hypothesizes that the COVID-19 pandemic increased sales revenue for frozen chicken and therefore increased the business's profitability.
Therefore, the period after the onset of the COVID-19 pandemic is assigned a score of 1, and the period before the pandemic is assigned a score of 0. Once the scores are assigned, the dummy variable is analyzed together with the other independent variables.
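The scoring rule above can be sketched in a few lines of Python. This is a minimal illustration, not the researcher's actual procedure: the period labels `"pre-pandemic"` and `"post-pandemic"` and the function name `dummy_score` are hypothetical, and a real data set would use its own labels.

```python
# Hypothetical period labels for observations of a frozen chicken business.
periods = ["pre-pandemic", "pre-pandemic", "post-pandemic", "post-pandemic"]


def dummy_score(period: str) -> int:
    """Score a period label as a binary dummy.

    The category hypothesized to have an effect (post-pandemic) gets 1;
    the reference category (pre-pandemic) gets 0. Unknown labels raise an
    error instead of being silently scored as 0.
    """
    if period == "post-pandemic":
        return 1
    if period == "pre-pandemic":
        return 0
    raise ValueError(f"unknown period label: {period!r}")


covid = [dummy_score(p) for p in periods]
print(covid)  # [0, 0, 1, 1]
```

Raising an error on an unexpected label is a deliberate choice: a misspelled category would otherwise be counted silently as the reference group.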
What about the assumptions of OLS linear regression when using dummy variables?
When a nominal-scale dummy variable is used as the only independent variable, the regression may struggle to meet the assumptions required for OLS linear regression. For this reason, dummy variables are usually included in the equation together with other independent variables.
For example, suppose a researcher specifies an equation with 1 dependent variable and 3 independent variables. After a dummy variable is added, the equation has 1 dependent variable and 4 independent variables (3 independent variables plus 1 dummy variable).
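The equation in this example can be written in general form as below. The symbols are generic placeholders rather than the notation of any particular study: $Y_i$ is the dependent variable (profit, in the running example), $X_{1i}$ to $X_{3i}$ are the three other independent variables (collected as $X_i$), and $D_i$ is the COVID-19 dummy, equal to 1 after the onset of the pandemic and 0 before.

```latex
\begin{aligned}
Y_i &= \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + \beta_4 D_i + \varepsilon_i,
  \qquad D_i \in \{0, 1\} \\
E[Y_i \mid X_i, D_i = 0] &= \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} \\
E[Y_i \mid X_i, D_i = 1] &= (\beta_0 + \beta_4) + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i}
\end{aligned}
```

Subtracting the last two lines shows that $\beta_4$ is the difference between the two categories at the same values of the other independent variables: the dummy shifts the intercept rather than multiplying the dependent variable.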
Even when dummy variables are included in the regression equation, we still need to test whether the required assumptions are met. One of these assumptions is that the residuals of the regression equation are normally distributed.
Another assumption is the absence of strong correlation among the independent variables (non-multicollinearity). The residual variance must also be constant (homoscedasticity). These assumption tests help ensure that the estimates are the best linear unbiased estimator (BLUE).
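The non-multicollinearity check is often carried out with variance inflation factors (VIF), one common diagnostic. The sketch below computes VIF with plain NumPy, assuming the design matrix holds the independent variables (the dummy included) as columns, without an intercept column. The function name `vif` and all data values are hypothetical.

```python
import numpy as np


def vif(X: np.ndarray) -> np.ndarray:
    """Variance inflation factor for each column of X.

    X holds the independent variables (dummy included) as columns, with no
    intercept column. VIF_j = 1 / (1 - R_j^2), where R_j^2 is the R-squared
    from regressing column j on the other columns plus an intercept.
    """
    n, k = X.shape
    factors = np.empty(k)
    for j in range(k):
        target = X[:, j]
        # Auxiliary regression: column j on an intercept and the other columns.
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, target, rcond=None)
        resid = target - A @ coef
        ss_res = resid @ resid
        ss_tot = ((target - target.mean()) ** 2).sum()
        r2 = 1.0 - ss_res / ss_tot
        factors[j] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return factors


# Hypothetical data: two continuous regressors and a binary dummy, constructed
# to be mutually uncorrelated, so every VIF equals 1.
x1 = np.array([1, -1, 1, -1, 1, -1, 1, -1], dtype=float)
x2 = np.array([1, 1, -1, -1, 1, 1, -1, -1], dtype=float)
d = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)
print(vif(np.column_stack([x1, x2, d])))

# Adding a column that nearly duplicates x1 inflates the VIFs of both.
x3 = x1 + np.array([0.1, 0, 0, 0, 0, 0, 0, 0])
print(vif(np.column_stack([x1, x2, d, x3])))
```

A VIF above 10 is a commonly cited rule of thumb for problematic multicollinearity; in the second call, the VIFs of `x1` and `x3` exceed that threshold by a wide margin.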
Interpreting the estimation coefficient of a dummy variable
Is the estimated coefficient of a dummy variable interpreted the same way as the coefficients of other independent variables? Its interpretation differs slightly, particularly when interpreting the partial effect of each independent variable on the dependent variable.
For example, suppose the estimated coefficient of the dummy variable is 4.7. Because the model is linear and the dummy changes only from 0 to 1, this coefficient means that, holding the other independent variables constant, the average profit of the frozen chicken business after the pandemic is 4.7 units higher than before the pandemic, in whatever units profit is measured. It does not mean that profit is 4.7 times higher.
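The difference between "4.7 units higher" and "4.7 times higher" can be checked numerically. The sketch below uses hypothetical, noise-free data generated from profit = 10 + 2·x + 4.7·d, where `x` stands in for any continuous independent variable and `d` is the pandemic dummy; only the 4.7 figure comes from the example above. OLS is fitted with NumPy's least-squares solver, `np.linalg.lstsq`.

```python
import numpy as np

# Hypothetical data built from a known equation (no noise) so the
# recovered coefficients can be checked: profit = 10 + 2*x + 4.7*d.
# x stands in for any continuous regressor; d is the pandemic dummy.
x = np.array([1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0])
d = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)
profit = 10.0 + 2.0 * x + 4.7 * d

# Design matrix with an intercept column, then OLS via least squares.
A = np.column_stack([np.ones_like(x), x, d])
beta, *_ = np.linalg.lstsq(A, profit, rcond=None)
b0, b_x, b_d = beta

# Predicted profit at the same x, before (d=0) and after (d=1) the pandemic.
x_fixed = 3.0
before = b0 + b_x * x_fixed
after = b0 + b_x * x_fixed + b_d

print(f"dummy coefficient: {b_d:.2f}")              # 4.70
print(f"after - before:    {after - before:.2f}")   # 4.70
print(f"after / before:    {after / before:.2f}")   # 1.29, not 4.7
```

At the same value of `x`, post-pandemic profit is 4.7 units above pre-pandemic profit, while the ratio of the two depends on the overall level of profit. That is why the coefficient of a dummy in a linear model is read as a difference, not a multiplier.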
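This example shows how the interpretation of a dummy variable differs from that of other independent variables. For a continuous independent variable, the coefficient describes the change in the dependent variable for a one-unit increase in that variable; for a dummy variable, it describes the difference in the dependent variable between the two categories. For more on interpreting other independent variables, see previous articles by Kanda Data.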
Conclusion
Dummy variables are non-parametric variables measured on a nominal scale, and binary dummies are commonly used in linear regression analysis. Even when dummy variables are included, we must still ensure that all the required assumptions of linear regression are met.
The interpretation of the estimated coefficient of a dummy variable differs slightly from that of other independent variables: it is the difference in the dependent variable between the two categories, holding the other variables constant. That concludes this article. We hope it has been useful in expanding your knowledge. Stay tuned for the next article from Kanda Data next week.