Multiple linear regression analysis is a well-known technique that researchers frequently use to analyze the influence of independent variables on a dependent variable. Ordinary least squares (OLS) is one of the most commonly used estimation methods in this analysis.
In OLS regression, several assumptions must be met for the estimates to be unbiased and efficient. An estimator with these properties is referred to as the Best Linear Unbiased Estimator (BLUE).
Judging from the comments on my YouTube channel, many viewers ask about using non-parametric variables in OLS regression. The most common question is whether such variables can be used in this analysis at all.
In this article, I will discuss how to include a non-parametric variable measured on a nominal scale in a multiple linear regression equation.
Understanding Dummy Variables
In multiple linear regression analysis, we can use non-parametric variables as independent variables by converting them into dummy variables. A dummy variable is a binary variable that records a nominal category using the values 1 and 0.
The categories are usually split into two groups, such as yes/no, adopter/non-adopter, rural/urban, or before policy/after policy. Each observation is placed in one of the two groups. The categories only distinguish the groups from each other; they do not imply any level or ranking.
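As a minimal sketch, the Python snippet below converts a two-category nominal variable into a 0/1 dummy. The helper name `to_dummy`, the `location` data, and the choice of "urban" as the category coded 1 are hypothetical illustrations. Which category receives the 1 is the analyst's decision.

```python
def to_dummy(values, one_label):
    """Code observations equal to `one_label` as 1 and the other category as 0.

    Hypothetical helper: it expects exactly two distinct categories.
    """
    categories = set(values)
    if len(categories) != 2 or one_label not in categories:
        raise ValueError("expected exactly two categories, including one_label")
    return [1 if v == one_label else 0 for v in values]


# Hypothetical rural/urban responses from five observations.
location = ["urban", "rural", "rural", "urban", "rural"]
d_urban = to_dummy(location, "urban")
print(d_urban)  # [1, 0, 0, 1, 0]
```

Coding "rural" as 1 instead would only flip the sign of the dummy's coefficient. The substance of the result would not change.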
Dummy Variables as Independent Variables
As mentioned earlier, there are several assumptions that must be met in multiple linear regression analysis. Typically, researchers use parametric data (interval/ratio scale) in OLS regression. However, dummy variables measured on a nominal scale can also be included as independent variables in the multiple linear regression equation.
For instance, suppose we have a multiple linear regression equation in which five independent variables affect the domestic production of commodity ABC (the dependent variable). We want to know the impact of an import tariff increase on that production. We can represent the import tariff policy as a dummy variable and add it to the equation alongside the five other independent variables.
The equation then takes the following form:
$$Y = b_0 + b_1X_1 + b_2X_2 + b_3X_3 + b_4X_4 + b_5X_5 + b_6D + e$$
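where Y is the domestic production of commodity ABC, X1 to X5 are the five independent variables, and D is the dummy variable representing the import tariff policy. The term b0 is the intercept, b1 to b6 are the regression coefficients, and e is the error term.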
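The sketch below illustrates how this equation can be estimated with OLS in plain Python, by solving the normal equations (X'X)b = X'y. The data are simulated, not real observations of commodity ABC. The coefficient values in `true_b`, the 40 periods, and the assumption that the tariff rises from period 20 onward are all hypothetical. In practice, a statistics package would normally be used instead of a hand-written solver.

```python
import random


def ols(X, y):
    """Estimate OLS coefficients by solving (X'X) b = X'y with Gauss-Jordan elimination."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    a = [xtx[i] + [xty[i]] for i in range(k)]  # augmented matrix [X'X | X'y]
    for col in range(k):
        # Partial pivoting: bring the row with the largest entry in this column up.
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        if abs(a[col][col]) < 1e-12:
            raise ValueError("X'X is singular; check for perfect multicollinearity")
        # Eliminate this column from every other row.
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * p for v, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


# Simulated data (hypothetical): b0, then b1..b5 for X1..X5, then b6 for the dummy D.
random.seed(1)
true_b = [10.0, 0.5, -0.3, 0.8, 0.2, 1.1, 4.0]
rows, y = [], []
for t in range(40):
    xs = [random.uniform(0, 10) for _ in range(5)]
    d = 1 if t >= 20 else 0  # hypothetical: tariff raised from period 20 onward
    row = [1.0] + xs + [d]   # leading 1.0 is the intercept column for b0
    rows.append(row)
    y.append(sum(b * x for b, x in zip(true_b, row)) + random.gauss(0, 1))

b_hat = ols(rows, y)
print([round(b, 2) for b in b_hat])  # estimates of b0..b6
```

The data were simulated with a known b6 of 4.0, so the last estimated coefficient should fall close to that value. With real data, the true coefficient is of course unknown, and the estimate must be judged by its standard error and significance test.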
How to Score Dummy Variables
To score a dummy variable, we use the values 1 and 0. For example, suppose our hypothesis states that an increase in import tariffs will increase domestic production of commodity ABC. We then give a score of 1 to the periods after the tariff increase and a score of 0 to the periods before it. With this coding, the coefficient b6 measures the average difference in domestic production between the two periods, holding the other independent variables constant. A positive and statistically significant b6 supports the hypothesis that the tariff increase is associated with higher domestic production.
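The interpretation of the dummy coefficient is easiest to see in a simplified case. The sketch below uses a model whose only regressor is the dummy, with the five X variables dropped purely for illustration. The production figures are hypothetical. In this case, the OLS intercept equals the mean before the tariff increase, and the dummy coefficient equals the difference between the two period means.

```python
from statistics import mean

# Hypothetical production figures, not real data.
before = [100.0, 104.0, 98.0, 102.0]  # periods scored D = 0
after = [110.0, 113.0, 109.0, 112.0]  # periods scored D = 1

d = [0] * len(before) + [1] * len(after)
y = before + after

# Simple OLS for Y = b0 + b6*D + e: slope = cov(D, Y) / var(D), intercept from the means.
d_bar, y_bar = mean(d), mean(y)
b6 = sum((di - d_bar) * (yi - y_bar) for di, yi in zip(d, y)) / sum(
    (di - d_bar) ** 2 for di in d
)
b0 = y_bar - b6 * d_bar

print(b0, b6)  # 101.0 10.0 -> mean before, and mean after minus mean before
```

In the full equation with X1 to X5, b6 keeps this "difference between groups" meaning, but it is measured after adjusting for the other independent variables.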
Using dummy variables allows researchers to incorporate categorical data into multiple linear regression analysis without violating the basic assumptions of the OLS method. By understanding and applying this technique, we can test the effect of categorical factors, such as policy changes, alongside the other variables in our models.
I hope this article is useful and adds to the knowledge of those who need it. Stay tuned for the next article update from Kanda Data. Thank you.