Thursday, July 25, 2024

# How to use data transformation to address issues with non-normally distributed data

Some parametric statistical analyses assume that the data are normally distributed. For example, before testing the difference between paired samples with a t-test, researchers need to test the normality of the data.

Likewise, when a researcher carries out a linear regression analysis with the Ordinary Least Squares (OLS) method, one of the assumptions is that the residuals are normally distributed. It is therefore essential for researchers to understand normality testing.

In practice, research data are often not normally distributed, even though the analysis called for by the research objectives assumes normally distributed data.

One method researchers can use to deal with data that are not normally distributed is to transform the data. The transformation is applied to all of the data used in the study.

In this article, Kanda Data presents a tutorial on how to use data transformation to address issues with non-normally distributed data. For a tutorial on how to transform data into natural logarithm (Ln) form, you can read the previous article, “How to transform data into natural logarithm (Ln) in Excel”.

## Case Study Example

The case study used in this article is the difference in rice production per 1000 m² in an area. The assumption required for the paired sample t-test is that the data are normally distributed.

For example, researchers have collected data on rice production from 17 farmers after introducing a new technology for six months. The data that researchers have collected can be seen in the table below:

Before carrying out a paired sample t-test on the data collected above, the researcher must carry out a normality test. There are several ways to test normality. Here I will use the Shapiro-Wilk test in SPSS.

## Normality test tutorial using SPSS

To carry out the Shapiro-Wilk normality test in SPSS, researchers first need to enter the data in the Data View of the SPSS application. The data can be typed in manually, one value at a time, or copied and pasted from Excel. Researchers can then define the variables in the Variable View window in SPSS.

To run the normality test in SPSS, first click “Analyze”, then click “Descriptive Statistics”. From the Descriptive Statistics options, select and click “Explore”, as shown in the image below:

Next, an “Explore” window will appear in SPSS. In this window, move the variable to be tested, paddy production, to the Dependent List box on the right. Then, to include the normality test in the output, click “Plots”, as shown in the image below:

The next step is to make sure that the “Normality plots with tests” option is activated. If it is not, tick the option to enable it. Then click “Continue” to proceed to the SPSS results, as shown in the image below:

The SPSS output for the normality test will then appear. The results are reported in the “Tests of Normality” table, which can be seen in the image below:

Based on the picture above, the Shapiro-Wilk test gives a Sig. value of 0.040. Because this p-value is less than 0.05, the null hypothesis of normality is rejected, and it can be concluded that the data are not normally distributed.
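For readers who prefer a script to the SPSS menus, the same kind of test can be sketched in Python. This is a minimal sketch, assuming the SciPy library is installed; `scipy.stats.shapiro` returns the W statistic and the p-value. The `production` values are hypothetical placeholders, not the data from the table in this article, so the output will not match the SPSS result above.

```python
from scipy import stats

# Hypothetical rice production values (kg per 1000 m²) for 17 farmers.
# These are placeholders only, NOT the data from the article's table.
production = [512, 498, 530, 475, 610, 545, 502, 489, 720,
              515, 560, 495, 640, 508, 525, 580, 850]

# shapiro() returns the W statistic and the p-value of the test.
w_stat, p_value = stats.shapiro(production)

print(f"Shapiro-Wilk W = {w_stat:.3f}, p-value = {p_value:.3f}")
```

The p-value printed here plays the same role as the Sig. column in the SPSS output.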

Following the research objectives, which will test the differences using the paired sample t-test, the researcher needs to fulfill the assumption that the data is normally distributed. I will give an example of transforming data using natural logarithms and then doing a normality test again.

## How to transform data using the natural logarithm (Ln)

Researchers need to know that several transformations can be used according to the characteristics of the existing data. Following the characteristics of the data on the variable rice production, I will give an example using the natural logarithm transformation.

Natural logarithmic transformations can easily be done using SPSS. However, transformation using Excel can also be used by researchers. On this occasion, I will provide a data transformation tutorial using Excel.

The method used by researchers for natural logarithmic transformations only requires typing the formula =LN(…). In detail, how to transform natural logarithms using Excel can be seen in the image below:
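The same transformation can also be sketched in Python, where `math.log` with a single argument computes the natural logarithm, just as `=LN()` does in Excel. The values below are hypothetical placeholders rather than the article's data, and `ln_transform` is an illustrative helper name, not part of any library.

```python
import math

# Hypothetical rice production values; placeholders, not the article's data.
production = [512.0, 498.0, 530.0, 475.0, 610.0]


def ln_transform(values):
    """Return the natural logarithm of each value.

    The natural logarithm is only defined for positive numbers,
    so zero or negative values raise an error instead of being skipped.
    """
    if any(v <= 0 for v in values):
        raise ValueError("natural logarithm is undefined for zero or negative values")
    return [math.log(v) for v in values]


ln_production = ln_transform(production)

# Print each original value next to its transformed value.
for raw, ln_value in zip(production, ln_production):
    print(f"{raw:8.1f} -> {ln_value:.4f}")
```

Checking for non-positive values first is a design choice in this sketch: it makes a data-entry problem visible before the transformed column is passed on to the normality test.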

Based on the picture above, it is clear that the natural logarithmic transformation using Excel is quite simple and can be done easily. The next step is that the researcher can copy and paste the results of the natural logarithmic transformation in the SPSS application to do the Shapiro-Wilk normality test again.

## Normality test using data that has been transformed

To carry out the normality test on the transformed data, the method is the same as described in the previous section. The only difference is that the variable moved into the Dependent List box is the transformed data. These stages can be seen in detail in the image below:

Furthermore, following the same steps in the paragraph above, the normality test SPSS output will appear. The results of the Shapiro-Wilk normality test using the transformed data can be seen in the image below:

Based on the picture above, the Shapiro-Wilk test on the transformed data gives a Sig. value of 0.057. Because this p-value is greater than 0.05, the null hypothesis of normality is not rejected. Based on these results, the transformed data can be treated as normally distributed.
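The decision rule used in this tutorial can be written as a short Python sketch. It compares the two Sig. values reported in this article (0.040 before and 0.057 after the transformation) with a 0.05 significance level. The function name `is_consistent_with_normality` is a hypothetical helper chosen for illustration.

```python
ALPHA = 0.05  # significance level used in this tutorial


def is_consistent_with_normality(p_value, alpha=ALPHA):
    """Return True when the normality hypothesis is NOT rejected (p > alpha)."""
    return p_value > alpha


# Sig. values reported in the article's SPSS output.
reported = {"original data": 0.040, "ln-transformed data": 0.057}

for label, p in reported.items():
    if is_consistent_with_normality(p):
        verdict = "normality not rejected"
    else:
        verdict = "normality rejected"
    print(f"{label}: p = {p:.3f} -> {verdict}")
```

Note that a p-value above 0.05 means the test found no significant departure from normality; it does not prove that the data are exactly normal.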

As this tutorial shows, in certain cases data transformation can be an alternative for dealing with data that are not normally distributed. That concludes this article from Kanda Data. We hope it is helpful and provides new insight.
