Paired sample t-tests, which aim to identify differences between two paired data sets, can be analyzed using R Studio. Through paired sample t-tests, we can determine whether there are significant changes after a certain treatment or program carried out during the research activity.
The paired sample t-test is a commonly used method to compare data before and after a treatment or experiment. In this article, Kanda Data explains how to perform a paired sample t-test in R Studio, using a case study on rice production before and after a training program, and how to interpret the results.
Required Assumptions for Paired Sample t-Tests
Paired sample t-tests are used to test whether there is a significant difference between the means of two paired data sets. An example would be measuring a variable before and after an intervention on the same subjects.
The main purpose is to determine whether the intervention conducted in the study has a significant impact on the change in the measured variable. This test can be performed if measurements are taken on the same group or subjects but at two different times or under different conditions.
Before performing a paired sample t-test, we need to understand the assumptions it requires. The first is normality: strictly speaking, it is the differences between the paired observations (after minus before) that should be normally distributed. To check this, we can perform a normality test on those differences.
Another assumption is that the measurement scale should be interval or ratio. Additionally, each pair of observations (before and after) should be independent of the other pairs. The data are considered paired because the same subjects or respondents are observed before and after the intervention.
Case Study and Data Example
An agricultural organization in Village X conducted a training program for 6 months to increase rice production. They want to determine if the program was effective by comparing rice production before and after the program, measured in tons. Data were collected from 30 farmers. Below is the rice production data before and after the program:
Observations | Rice Production Before the Program (tons) – Pretest | Rice Production After the Program (tons) – Posttest |
1 | 4.5 | 5.2 |
2 | 5.0 | 5.8 |
3 | 4.2 | 5.0 |
4 | 4.8 | 5.5 |
5 | 4.6 | 5.3 |
6 | 4.9 | 5.6 |
7 | 4.7 | 5.4 |
8 | 4.3 | 5.1 |
9 | 4.6 | 5.3 |
10 | 5.2 | 6.0 |
11 | 4.8 | 5.5 |
12 | 4.5 | 5.2 |
13 | 5.0 | 5.8 |
14 | 4.2 | 5.0 |
15 | 4.8 | 5.5 |
16 | 4.6 | 5.3 |
17 | 4.9 | 5.6 |
18 | 4.7 | 5.4 |
19 | 4.3 | 5.1 |
20 | 4.6 | 5.3 |
21 | 5.2 | 6.0 |
22 | 4.8 | 5.5 |
23 | 4.5 | 5.2 |
24 | 5.0 | 5.8 |
25 | 4.2 | 5.0 |
26 | 4.8 | 5.5 |
27 | 4.6 | 5.3 |
28 | 4.9 | 5.6 |
29 | 4.7 | 5.4 |
30 | 4.3 | 5.1 |
How to Perform Normality Tests in R Studio and Interpret the Results
To perform a paired sample t-test, we must check the normality assumption. Because the test is computed on the differences within each pair, the most direct check is whether the difference between rice production before and after the program is normally distributed. It is also possible to test the distribution of each of the two variables separately, but the difference is what the t-test relies on.
In this article, Kanda Data will provide a tutorial on how to perform a normality test on the difference between rice production before and after the program. The normality test can be done using the Shapiro-Wilk test in R Studio.
The first step is to open your R Studio application. The Shapiro-Wilk test itself (shapiro.test) is part of base R, so no extra package is needed to run it. If you also want visual checks such as Q-Q plots or density plots, you can install and load the ‘ggpubr’ library by typing the following commands (note that R requires straight quotation marks):
install.packages("ggpubr")
library(ggpubr)
Next, input the data into R. You can import data from Excel to R or input it directly into R as shown in the following commands:
pretest <- c(4.5, 5.0, 4.2, 4.8, 4.6, 4.9, 4.7, 4.3, 4.6, 5.2, 4.8, 4.5, 5.0, 4.2, 4.8, 4.6, 4.9, 4.7, 4.3, 4.6, 5.2, 4.8, 4.5, 5.0, 4.2, 4.8, 4.6, 4.9, 4.7, 4.3)
posttest <- c(5.2, 5.8, 5.0, 5.5, 5.3, 5.6, 5.4, 5.1, 5.3, 6.0, 5.5, 5.2, 5.8, 5.0, 5.5, 5.3, 5.6, 5.4, 5.1, 5.3, 6.0, 5.5, 5.2, 5.8, 5.0, 5.5, 5.3, 5.6, 5.4, 5.1)
difference <- posttest - pretest
After pressing ‘Enter’ or clicking ‘Run’, the next step is to perform the Shapiro-Wilk test by typing the following command:
shapiro.test(difference)
R will output the following information:
Shapiro-Wilk normality test
data: difference
W = 0.61191, p-value = 1.023e-07
The results above show that the Shapiro-Wilk test gives a W value of 0.61191 with a p-value of 1.023e-07 (less than 0.05), which means that the normality assumption is not met. In this data set, every difference is either 0.7 or 0.8 tons, and a variable with only two distinct values cannot look normal, which is why the test rejects normality so strongly. When the normality assumption is not fulfilled, we can either perform a data transformation or use a non-parametric test to analyze the differences.
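To see why the Shapiro-Wilk test rejects normality here, it can help to tabulate the differences. The following is a minimal sketch using only base R; the variable name counts is introduced for illustration and is not part of the original tutorial, and the rounding step is an assumption added to avoid floating-point noise.

```r
# Same data as in the article
pretest <- c(4.5, 5.0, 4.2, 4.8, 4.6, 4.9, 4.7, 4.3, 4.6, 5.2,
             4.8, 4.5, 5.0, 4.2, 4.8, 4.6, 4.9, 4.7, 4.3, 4.6,
             5.2, 4.8, 4.5, 5.0, 4.2, 4.8, 4.6, 4.9, 4.7, 4.3)
posttest <- c(5.2, 5.8, 5.0, 5.5, 5.3, 5.6, 5.4, 5.1, 5.3, 6.0,
              5.5, 5.2, 5.8, 5.0, 5.5, 5.3, 5.6, 5.4, 5.1, 5.3,
              6.0, 5.5, 5.2, 5.8, 5.0, 5.5, 5.3, 5.6, 5.4, 5.1)

# Round to one decimal so that floating-point noise does not
# split values that are equal on paper (illustrative choice)
difference <- round(posttest - pretest, 1)

# Count how often each distinct difference occurs
counts <- table(difference)
print(counts)
```

With this data, the table contains only two values: 0.7 for 19 farmers and 0.8 for 11 farmers.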
However, for learning purposes, this article will still proceed with the paired sample t-test. The focus is on understanding the analysis process and its interpretation.
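For readers who want to try the non-parametric route mentioned above, one common choice is the Wilcoxon signed-rank test, available in base R as wilcox.test. This is only a sketch of one possible alternative, not the method used in the rest of this article; the name wilcox_result and the choice of exact = FALSE are assumptions made for illustration.

```r
# Same data as in the article
pretest <- c(4.5, 5.0, 4.2, 4.8, 4.6, 4.9, 4.7, 4.3, 4.6, 5.2,
             4.8, 4.5, 5.0, 4.2, 4.8, 4.6, 4.9, 4.7, 4.3, 4.6,
             5.2, 4.8, 4.5, 5.0, 4.2, 4.8, 4.6, 4.9, 4.7, 4.3)
posttest <- c(5.2, 5.8, 5.0, 5.5, 5.3, 5.6, 5.4, 5.1, 5.3, 6.0,
              5.5, 5.2, 5.8, 5.0, 5.5, 5.3, 5.6, 5.4, 5.1, 5.3,
              6.0, 5.5, 5.2, 5.8, 5.0, 5.5, 5.3, 5.6, 5.4, 5.1)

# Wilcoxon signed-rank test on the paired observations.
# exact = FALSE requests the normal approximation, because an exact
# p-value cannot be computed when the differences contain ties.
wilcox_result <- wilcox.test(posttest, pretest, paired = TRUE, exact = FALSE)
print(wilcox_result)
```

If the p-value reported here is also below 0.05, the non-parametric test points to the same conclusion as the t-test without relying on the normality assumption.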
How to Perform Paired Sample t-Tests and Interpret the Results
After performing the normality test, we can proceed with the paired sample t-test in R Studio. To perform the paired sample t-test, type the following command in R Studio:
t.test(pretest, posttest, paired = TRUE)
R will output the following information:
Paired t-test
data: pretest and posttest
t = -82.322, df = 29, p-value < 2.2e-16
alternative hypothesis: true mean difference is not equal to 0
95 percent confidence interval:
-0.7549685 -0.7183648
sample estimates:
mean difference
-0.7366667
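From the t-test results above, we obtained a t-value of -82.322 with 29 degrees of freedom and a p-value smaller than 2.2e-16 (p-value < 0.05). This indicates that the null hypothesis (H0) is rejected in favor of the alternative hypothesis, which means that there is a statistically significant difference between rice production before and after the training program. Because the command subtracts posttest from pretest, the negative mean difference of -0.7366667 shows that production was on average about 0.74 tons higher after the program, with a 95 percent confidence interval of roughly 0.72 to 0.75 tons. Thus, we can conclude that the program was effective in increasing rice production, keeping in mind that the normality assumption was not met for this data.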
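To make the origin of the t-value more concrete, the statistic can be reproduced by hand: the mean of the differences divided by its standard error. The sketch below uses base R only; the names d, t_manual, res, and one_sided are introduced for illustration. It also shows a one-sided variant of the test, which is an optional addition and assumes that the research question is specifically whether production increased.

```r
# Same data as in the article
pretest <- c(4.5, 5.0, 4.2, 4.8, 4.6, 4.9, 4.7, 4.3, 4.6, 5.2,
             4.8, 4.5, 5.0, 4.2, 4.8, 4.6, 4.9, 4.7, 4.3, 4.6,
             5.2, 4.8, 4.5, 5.0, 4.2, 4.8, 4.6, 4.9, 4.7, 4.3)
posttest <- c(5.2, 5.8, 5.0, 5.5, 5.3, 5.6, 5.4, 5.1, 5.3, 6.0,
              5.5, 5.2, 5.8, 5.0, 5.5, 5.3, 5.6, 5.4, 5.1, 5.3,
              6.0, 5.5, 5.2, 5.8, 5.0, 5.5, 5.3, 5.6, 5.4, 5.1)

# Differences in the same order used by t.test(pretest, posttest, ...)
d <- pretest - posttest
n <- length(d)

# t statistic: mean difference divided by its standard error
t_manual <- mean(d) / (sd(d) / sqrt(n))

# Compare with the value computed by t.test
res <- t.test(pretest, posttest, paired = TRUE)
print(c(manual = t_manual, t_test = unname(res$statistic)))

# Optional one-sided version: is posttest greater than pretest on average?
one_sided <- t.test(posttest, pretest, paired = TRUE, alternative = "greater")
print(one_sided$p.value)
```

The manual value should match the t = -82.322 shown in the output above, with 29 degrees of freedom (n - 1).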
Conclusion
The paired sample t-test is a useful method for comparing two paired data sets, such as rice production before and after a training program. In this example, the test results show that the training program in Village X was effective in increasing rice production: production rose by about 0.74 tons on average, and this difference was statistically significant. Because the differences did not pass the normality test, a real analysis should confirm the result with a data transformation or a non-parametric test.
This is the article Kanda Data has written for this occasion. Hopefully, it is useful and provides new insights for those in need. Thank you for reading this article. Stay tuned for more updates from Kanda Data in future opportunities!