How to Perform an Independent Sample t-Test and Interpret the Results in R Studio

The independent sample t-test in R Studio is used to compare two independent groups. Through this t-test, we can determine whether there is a significant difference between the means of the two groups being compared.

The independent sample t-test is a commonly used method for analyzing differences between two independent and unpaired groups. Kanda Data wrote this article to explain how to perform an independent sample t-test in R Studio along with the interpretation of its results. This article is also accompanied by a case study comparing rice production between two villages to give a better understanding to loyal readers of the Kanda Data website.

Definition and Purpose of the Independent Sample t-Test

The independent sample t-test is used to compare two groups that are independent of each other. The main purpose of using the independent sample t-test is to determine if there is a significant difference between the means of the two groups being compared.

In this article, the example of applying the independent sample t-test is based on research observing rice production in two different villages. Before conducting the independent sample t-test, we need to understand that there are several prerequisite assumptions for using this test.

The first assumption that needs to be met when using the independent sample t-test is that the data from both groups should be normally distributed. Furthermore, as indicated by its name, the two groups of data must be independent of each other.

Additionally, another assumption is that the variances of the data from both groups should be homogeneous. Therefore, we need to perform a test for variance homogeneity. Lastly, the final assumption is that the measurement scale used should be either interval or ratio (parametric).

Case Study and Data Example

To better understand how to perform the independent sample t-test using R Studio, Kanda Data provides a case study as an example. Suppose an agricultural institution wants to determine if there is a significant difference in rice production between two villages, namely Village X and Village Y.

Coincidentally, Village X and Village Y are target areas managed by the agricultural institution. They want to know if the rice production between the two villages is relatively the same or if there is a significant difference.

Next, the agricultural institution randomly selects a sample of 30 farmers from each of Village X and Village Y. This ensures that every member of the population has an equal chance of being selected as a sample.

The samples taken from Village X and Village Y are independent of each other. They took samples from 30 farmers in each village and recorded the rice production of each farmer in tons. The rice production data in Village X and Village Y are as follows:

Obs   Rice Production in Village X (X)   Rice Production in Village Y (Y)
1     4.0                                5.2
2     2.5                                5.8
3     3.7                                5.0
4     4.3                                5.5
5     4.1                                5.3
6     3.4                                5.6
7     4.2                                5.4
8     3.8                                5.1
9     4.1                                5.3
10    2.9                                6.0
11    4.3                                5.5
12    4.0                                5.2
13    3.5                                5.8
14    3.7                                5.0
15    4.3                                5.5
16    4.1                                5.3
17    4.4                                5.6
18    4.2                                5.4
19    3.8                                5.1
20    4.1                                5.3
21    3.3                                6.0
22    4.3                                5.5
23    4.0                                5.2
24    3.2                                5.8
25    3.7                                5.0
26    4.3                                5.5
27    4.1                                5.3
28    4.4                                5.6
29    4.2                                5.4
30    3.8                                5.1

How to Perform a Normality Test in R Studio and Interpret the Results

Before conducting an independent sample t-test, we need to test the prerequisite assumptions, especially ensuring that the data is normally distributed. Therefore, we should test whether the data in both Village X and Village Y are normally distributed. The normality test can be performed using the Shapiro-Wilk test in R Studio.

First, open your R Studio application. The Shapiro-Wilk test itself (shapiro.test) is part of base R, so no extra package is required to run it. If you also want to visualize the distribution, for example with a Q-Q plot, you can install and load the 'ggpubr' library by entering the following commands:

install.packages("ggpubr")

library(ggpubr)

Next, input the data into R. You can import data from Excel into R or input it directly into R using the following commands:

production_x <- c(4.0, 2.5, 3.7, 4.3, 4.1, 3.4, 4.2, 3.8, 4.1, 2.9, 4.3, 4.0, 3.5, 3.7, 4.3, 4.1, 4.4, 4.2, 3.8, 4.1, 3.3, 4.3, 4.0, 3.2, 3.7, 4.3, 4.1, 4.4, 4.2, 3.8)

production_y <- c(5.2, 5.8, 5.0, 5.5, 5.3, 5.6, 5.4, 5.1, 5.3, 6.0, 5.5, 5.2, 5.8, 5.0, 5.5, 5.3, 5.6, 5.4, 5.1, 5.3, 6.0, 5.5, 5.2, 5.8, 5.0, 5.5, 5.3, 5.6, 5.4, 5.1)

shapiro.test(production_x)

shapiro.test(production_y)

After you press enter, the output of the Shapiro-Wilk test in R Studio will appear as follows:

Shapiro-Wilk normality test

data:  production_x

W = 0.86825, p-value = 0.001532

data:  production_y

W = 0.94889, p-value = 0.1578

Based on the results of the normality test, it is known that the rice production variable in Village X has a W value of 0.86825 and a p-value of 0.001532 (p-value < 0.05). Because the p-value from the Shapiro-Wilk test is less than 0.05, the normality assumption is not met.

Meanwhile, for the rice production variable in Village Y, the W value is 0.94889 with a p-value of 0.1578 (p-value > 0.05). Since the p-value from the Shapiro-Wilk test is greater than 0.05, the normality assumption is met.
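The decision rule used here can also be applied programmatically. The following is a minimal sketch, assuming the two vectors from this case study; the helper name is_normal and the variable alpha are illustrative choices and not part of the original analysis.

```r
# Rice production data from the case study (tons)
production_x <- c(4.0, 2.5, 3.7, 4.3, 4.1, 3.4, 4.2, 3.8, 4.1, 2.9,
                  4.3, 4.0, 3.5, 3.7, 4.3, 4.1, 4.4, 4.2, 3.8, 4.1,
                  3.3, 4.3, 4.0, 3.2, 3.7, 4.3, 4.1, 4.4, 4.2, 3.8)
production_y <- c(5.2, 5.8, 5.0, 5.5, 5.3, 5.6, 5.4, 5.1, 5.3, 6.0,
                  5.5, 5.2, 5.8, 5.0, 5.5, 5.3, 5.6, 5.4, 5.1, 5.3,
                  6.0, 5.5, 5.2, 5.8, 5.0, 5.5, 5.3, 5.6, 5.4, 5.1)

alpha <- 0.05  # significance level used in this article

# Hypothetical helper: TRUE when the Shapiro-Wilk test does not reject normality
is_normal <- function(x, alpha = 0.05) {
  shapiro.test(x)$p.value > alpha
}

# shapiro.test() returns a list, so W and the p-value can be extracted by name
sw_x <- shapiro.test(production_x)
sw_y <- shapiro.test(production_y)

cat("Village X: W =", unname(sw_x$statistic), "p-value =", sw_x$p.value, "\n")
cat("Village Y: W =", unname(sw_y$statistic), "p-value =", sw_y$p.value, "\n")
cat("Normality not rejected for X:", is_normal(production_x, alpha), "\n")
cat("Normality not rejected for Y:", is_normal(production_y, alpha), "\n")
```

Extracting the p-value this way avoids reading it off the printed output by eye, which helps when the same check has to be repeated for several groups.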

Thus, one of the groups does not have a normal distribution, and we may consider using a non-parametric test (Mann-Whitney). However, to provide a learning experience, I will continue using the above case study as a practice example.
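For readers who want to try the non-parametric alternative mentioned above, here is a minimal sketch. In R, the Mann-Whitney test is run with wilcox.test(), which names it the Wilcoxon rank sum test. The setting exact = FALSE is an assumption made for this sketch: the data contain tied values, so the normal approximation is requested explicitly.

```r
# Rice production data from the case study (tons)
production_x <- c(4.0, 2.5, 3.7, 4.3, 4.1, 3.4, 4.2, 3.8, 4.1, 2.9,
                  4.3, 4.0, 3.5, 3.7, 4.3, 4.1, 4.4, 4.2, 3.8, 4.1,
                  3.3, 4.3, 4.0, 3.2, 3.7, 4.3, 4.1, 4.4, 4.2, 3.8)
production_y <- c(5.2, 5.8, 5.0, 5.5, 5.3, 5.6, 5.4, 5.1, 5.3, 6.0,
                  5.5, 5.2, 5.8, 5.0, 5.5, 5.3, 5.6, 5.4, 5.1, 5.3,
                  6.0, 5.5, 5.2, 5.8, 5.0, 5.5, 5.3, 5.6, 5.4, 5.1)

# Mann-Whitney U test (Wilcoxon rank sum test in R).
# exact = FALSE uses the normal approximation, since the data contain ties.
mw <- wilcox.test(production_x, production_y, exact = FALSE)
print(mw)

# Same decision rule as for the t-test: reject H0 when p-value < 0.05
cat("Significant difference at the 5% level:", mw$p.value < 0.05, "\n")
```

Because every observation in Village X is lower than every observation in Village Y, the rank-based test and the t-test are expected to point in the same direction for this data set.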

How to Perform a Homogeneity Test in R Studio and Interpret the Results

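The next step is to check whether the variances of the two groups in Village X and Village Y are homogeneous. In base R, this can be done with the F test for equality of two variances, var.test(). Note that this is not Levene's test; Levene's test is a separate procedure, available for example as leveneTest() in the 'car' package.

To conduct the F test in R Studio, enter the following command, passing the two different groups as arguments:

var.test(production_x, production_y)

After pressing enter, R Studio prints the F statistic, its degrees of freedom, the p-value, and a 95 percent confidence interval for the ratio of variances. For the data above, the sample variances are about 0.2127 for Village X and 0.0789 for Village Y, so the F statistic (the ratio of variances) is about 2.70 with 29 numerator and 29 denominator degrees of freedom. This exceeds the upper 2.5% critical value of about 2.10, so the p-value is below 0.05 and the 95 percent confidence interval for the ratio does not contain 1.

Because the p-value is less than 0.05, the homogeneity of variance assumption is not met for this data set, and the t-test setting for unequal variances (Welch's t-test) is the appropriate choice. If the p-value had been greater than 0.05, we could proceed with the independent sample t-test assuming equal variances. Since both groups have 30 observations, the t statistic itself is the same under either setting; only the degrees of freedom, and with them the p-value and confidence interval, change slightly.

Be careful not to pass the same vector twice, as in var.test(production_x, production_x). Comparing a group with itself always returns F = 1 and a p-value of 1, which says nothing about whether the two villages have equal variances. Once again, the data used in this article is for exercise purposes, so that readers can practice the analysis and learn how to interpret the results.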
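The F statistic reported by var.test() is simply the ratio of the two sample variances, so it can be cross-checked by hand. The following is a minimal sketch, assuming the two vectors from this case study; the variable names var_x, var_y, f_ratio, and equal_var are illustrative.

```r
# Rice production data from the case study (tons)
production_x <- c(4.0, 2.5, 3.7, 4.3, 4.1, 3.4, 4.2, 3.8, 4.1, 2.9,
                  4.3, 4.0, 3.5, 3.7, 4.3, 4.1, 4.4, 4.2, 3.8, 4.1,
                  3.3, 4.3, 4.0, 3.2, 3.7, 4.3, 4.1, 4.4, 4.2, 3.8)
production_y <- c(5.2, 5.8, 5.0, 5.5, 5.3, 5.6, 5.4, 5.1, 5.3, 6.0,
                  5.5, 5.2, 5.8, 5.0, 5.5, 5.3, 5.6, 5.4, 5.1, 5.3,
                  6.0, 5.5, 5.2, 5.8, 5.0, 5.5, 5.3, 5.6, 5.4, 5.1)

# Sample variances of each village and their ratio
var_x <- var(production_x)
var_y <- var(production_y)
f_ratio <- var_x / var_y

# F test comparing the two different groups
f_test <- var.test(production_x, production_y)

cat("Variance of X:", round(var_x, 4), "\n")
cat("Variance of Y:", round(var_y, 4), "\n")
cat("Ratio computed by hand:", round(f_ratio, 4), "\n")
cat("F from var.test():", round(unname(f_test$statistic), 4), "\n")
cat("p-value:", f_test$p.value, "\n")

# TRUE means the equal-variance t-test setting is defensible at the 5% level
equal_var <- f_test$p.value > 0.05
cat("Use var.equal = TRUE:", equal_var, "\n")
```

Storing the decision in a variable such as equal_var makes it possible to pass it directly to t.test() through the var.equal argument instead of choosing the setting manually.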

How to Perform an Independent Sample t-Test and Interpret the Results

After checking the normality and homogeneity of variance assumptions, we can continue with the independent sample t-test in R Studio. When the variances are homogeneous, use the following command:

t.test(production_x, production_y, var.equal = TRUE)

If your homogeneity test shows different variances, use the command below:

t.test(production_x, production_y, var.equal = FALSE)

The equal-variance command gives the following R output:

Two Sample t-test

data:  production_x and production_y

t = -15.42, df = 58, p-value < 2.2e-16

alternative hypothesis: true difference in means is not equal to 0

95 percent confidence interval:

 -1.717321 -1.322679

sample estimates:

mean of x mean of y     

3.89      5.41

Based on the independent sample t-test results above, the p-value < 2.2e-16 indicates a significant difference in rice production between Village X and Village Y. This means we have strong evidence to reject the null hypothesis (H₀). In other words, the average production in Village X and Village Y is significantly different.

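The t-value of -15.42 means that the observed difference in means (3.89 - 5.41 = -1.52 tons) is about 15 standard errors away from zero, which is a very large difference relative to the sampling variability. The negative sign shows that the average production in Village X is lower than in Village Y, and the 95 percent confidence interval indicates that the true difference is likely between about 1.32 and 1.72 tons.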
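To see where the t-value comes from, the statistic can be reproduced by hand from the group means and the pooled variance. The following is a minimal sketch, assuming the two vectors from this case study; the variable names are illustrative, and the Welch version is included only for comparison.

```r
# Rice production data from the case study (tons)
production_x <- c(4.0, 2.5, 3.7, 4.3, 4.1, 3.4, 4.2, 3.8, 4.1, 2.9,
                  4.3, 4.0, 3.5, 3.7, 4.3, 4.1, 4.4, 4.2, 3.8, 4.1,
                  3.3, 4.3, 4.0, 3.2, 3.7, 4.3, 4.1, 4.4, 4.2, 3.8)
production_y <- c(5.2, 5.8, 5.0, 5.5, 5.3, 5.6, 5.4, 5.1, 5.3, 6.0,
                  5.5, 5.2, 5.8, 5.0, 5.5, 5.3, 5.6, 5.4, 5.1, 5.3,
                  6.0, 5.5, 5.2, 5.8, 5.0, 5.5, 5.3, 5.6, 5.4, 5.1)

n_x <- length(production_x)
n_y <- length(production_y)

# Difference in means (Village X minus Village Y)
mean_diff <- mean(production_x) - mean(production_y)

# Pooled variance and standard error of the difference (equal-variance t-test)
pooled_var <- ((n_x - 1) * var(production_x) + (n_y - 1) * var(production_y)) /
  (n_x + n_y - 2)
se_diff <- sqrt(pooled_var * (1 / n_x + 1 / n_y))
t_manual <- mean_diff / se_diff

# Built-in tests: pooled (equal variances) and Welch (unequal variances)
pooled <- t.test(production_x, production_y, var.equal = TRUE)
welch <- t.test(production_x, production_y, var.equal = FALSE)

cat("t by hand:", round(t_manual, 2), "\n")
cat("t from pooled t.test():", round(unname(pooled$statistic), 2),
    "with df =", unname(pooled$parameter), "\n")
cat("t from Welch t.test():", round(unname(welch$statistic), 2),
    "with df =", round(unname(welch$parameter), 2), "\n")
```

With equal group sizes, the pooled and Welch versions give the same t statistic and differ only in their degrees of freedom, so the conclusion for this data set does not depend on which setting is chosen.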

Conclusion

The independent sample t-test can be used to compare two independent groups. In this example, the test results show that there is a significant difference in rice production between Village X and Village Y.

That concludes Kanda Data’s article that we can share this time. I hope it is useful for all of you. Thank you for reading this article and stay tuned for more updates from Kanda Data next week!