Simple Linear Regression Analysis Using R Studio and How to Interpret It

In the real world, accurate decisions need to be based on a sound understanding of data. One tool for analyzing data is simple linear regression, which lets us read the pattern in a set of scattered data points. A correct understanding of regression analysis helps us make more accurate decisions and reduce uncertainty.

By analyzing the relationship between dependent and independent variables, we can predict future trends and identify factors that most significantly influence outcomes. This article aims to provide a step-by-step tutorial on simple linear regression analysis using R Studio.

By the end of this article, you should understand the basic concepts of simple linear regression analysis, be able to carry out the steps of the analysis yourself in R Studio, and be able to interpret the results correctly in accordance with statistical principles.

Basic Concepts of Simple Linear Regression Analysis

Simple linear regression analysis is used to analyze the linear relationship between two variables: one dependent variable (the one we want to predict) and one independent variable (the one used for prediction). For example, we can examine whether there is a relationship between the price of a product (X) and the number of product sales (Y). The data used for the case study in this article can be seen in the table below:

For this data, the simple linear regression model can be written as:

Y = b0 + b1X + e

where:

Y = Dependent variable (number of product sales), measured in units

b0 = Intercept

b1 = Coefficient estimate of variable X

X = Independent variable (product price), measured in USD

e = Error term

The coefficients b0 and b1 are estimated by ordinary least squares, which chooses the regression line that minimizes the sum of the squares of the differences between the actual values and the predicted values of Y.
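To make the least-squares idea concrete, here is a minimal sketch that computes b0 and b1 from the standard closed-form formulas and checks them against lm(). The price and sales values are hypothetical illustration data, not the dataset from this article, and the variable names are my own.

```r
# Hypothetical illustration data (NOT the article's dataset)
price <- c(5, 6, 7, 8, 9, 10, 11, 12, 13, 14)                   # X, in USD
sales <- c(330, 320, 285, 270, 260, 220, 210, 190, 160, 150)    # Y, in units

# Closed-form ordinary least squares estimates for one predictor:
# slope = sum of cross-deviations / sum of squared deviations of X
b1 <- sum((price - mean(price)) * (sales - mean(sales))) /
  sum((price - mean(price))^2)
# intercept = mean of Y minus slope times mean of X
b0 <- mean(sales) - b1 * mean(price)

# Sum of squared residuals, the quantity that least squares minimizes
rss <- sum((sales - (b0 + b1 * price))^2)

# lm() should return the same two coefficients
fit <- lm(sales ~ price)
print(c(b0 = b0, b1 = b1))
print(coef(fit))
```

Any other choice of intercept and slope would give a larger sum of squared residuals than rss for the same data.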

Tutorial on Simple Linear Regression Analysis in R Studio

In implementing simple linear regression analysis using R Studio, understanding the steps of installation, use of statistical packages, coding, and interpretation of output is crucial for researchers.

Before starting the analysis, make sure that both R and R Studio are installed on your computer, since R Studio is an interface that runs on top of R. If you are using R Studio for the first time, you may also need to install some additional packages.

Install the necessary statistical packages using the command install.packages("package_name"). The lm() function used in this tutorial is part of base R, so no extra package is needed to fit the model itself; packages such as lmtest and car are useful for follow-up diagnostic tests. After installing a package, activate it using the command library(package_name).
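The installation and activation steps can be sketched as follows. This is one possible setup, not the only one: the helper names needed, missing_pkgs, and loaded are my own, and the installation only runs in an interactive session so that a script does not try to download packages unattended.

```r
# Packages mentioned in this tutorial for follow-up diagnostics
needed <- c("lmtest", "car")

# Find which of them are not yet installed
missing_pkgs <- needed[!vapply(needed, requireNamespace, logical(1), quietly = TRUE)]

# Install only what is missing, and only when working interactively
if (length(missing_pkgs) > 0 && interactive()) {
  install.packages(missing_pkgs)
}

# Activate each package; the result is TRUE for every package that loaded
loaded <- vapply(
  needed,
  function(p) suppressWarnings(require(p, character.only = TRUE, quietly = TRUE)),
  logical(1)
)
print(loaded)
```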

Next, we need to prepare the dataset used for simple linear regression analysis in R Studio. If you have previously entered the data in Microsoft Excel, you can import it into R Studio. Once the dataset has been imported, the data from the case study example above is displayed as shown in the image below:
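As a rough sketch of the import step, the example below reads a CSV file with read.csv(), which is available in base R. To keep it self-contained, it first writes hypothetical data (not the article's dataset) to a temporary file; in practice you would point read.csv() at your own file. The column names price and sales are assumptions made for illustration.

```r
# Write hypothetical data to a temporary CSV so the example runs on its own
csv_path <- tempfile(fileext = ".csv")
write.csv(
  data.frame(
    price = c(5, 6, 7, 8, 9, 10, 11, 12, 13, 14),                  # USD
    sales = c(330, 320, 285, 270, 260, 220, 210, 190, 160, 150)    # units
  ),
  csv_path,
  row.names = FALSE
)

# Import the file into a data frame
dataset <- read.csv(csv_path)

# Check the structure and the first rows before running any analysis
str(dataset)
head(dataset)

# For an .xlsx workbook, the readxl package offers read_excel(), e.g.
# dataset <- readxl::read_excel("your_file.xlsx")   # hypothetical file name
```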

The next step is to write the command in R Studio. Use the lm() function to create a linear regression model. For example, model <- lm(y ~ x, data = dataset), where y is the name of the dependent variable column, x is the name of the independent variable column, and dataset is the data frame we use.
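A minimal sketch of this step is shown here, using hypothetical data and the assumed column names price and sales in place of x and y. The numbers are for illustration only and are not the article's dataset.

```r
# Hypothetical illustration data (NOT the article's dataset)
dataset <- data.frame(
  price = c(5, 6, 7, 8, 9, 10, 11, 12, 13, 14),                  # X, in USD
  sales = c(330, 320, 285, 270, 260, 220, 210, 190, 160, 150)    # Y, in units
)

# Fit the model: dependent variable on the left of ~, independent on the right
model <- lm(sales ~ price, data = dataset)

# Printing the model shows the call and the two estimated coefficients
print(model)

# coef() returns the intercept (b0) and the slope (b1) as a named vector
coef(model)
```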

As the final step, print a summary of the model using the command summary(model) to see the output, which includes the regression coefficients, their p-values, the R-squared, and several other statistics.
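Besides printing the summary, individual values can be pulled out of it for reporting. The sketch below shows one way to do that; the data is hypothetical (not the article's dataset) and the object names model_summary, coef_table, r2, and adj_r2 are my own.

```r
# Hypothetical illustration data (NOT the article's dataset)
dataset <- data.frame(
  price = c(5, 6, 7, 8, 9, 10, 11, 12, 13, 14),
  sales = c(330, 320, 285, 270, 260, 220, 210, 190, 160, 150)
)
model <- lm(sales ~ price, data = dataset)

# Full printed output: coefficients, standard errors, p-values, R-squared
model_summary <- summary(model)
print(model_summary)

# Coefficient table as a matrix with one row per term
coef_table <- coef(model_summary)
print(coef_table)

# R-squared and adjusted R-squared as plain numbers
r2 <- model_summary$r.squared
adj_r2 <- model_summary$adj.r.squared
print(c(r.squared = r2, adj.r.squared = adj_r2))
```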

Interpreting Linear Regression Output

Based on the results of simple linear regression analysis using R Studio, the command and analysis output can be seen in the image below:

Based on the image above, the estimated linear regression equation is Y = 436.770 - 20.955X. The multiple R-squared of 0.845 means that about 84.5% of the variation in the dependent variable (Y) is explained by the independent variable (X). The adjusted R-squared is the R-squared corrected for the number of predictors in the model relative to the number of observations.
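To show where these two numbers come from, the following sketch recomputes R-squared and adjusted R-squared by hand and compares them with the values that summary() reports. It uses hypothetical data rather than the article's dataset, so its results will differ from the 0.845 above.

```r
# Hypothetical illustration data (NOT the article's dataset)
dataset <- data.frame(
  price = c(5, 6, 7, 8, 9, 10, 11, 12, 13, 14),
  sales = c(330, 320, 285, 270, 260, 220, 210, 190, 160, 150)
)
model <- lm(sales ~ price, data = dataset)

# Residual sum of squares: variation in Y left unexplained by the model
rss <- sum(residuals(model)^2)
# Total sum of squares: total variation in Y around its mean
tss <- sum((dataset$sales - mean(dataset$sales))^2)

n <- nrow(dataset)  # number of observations
p <- 1              # number of predictors in simple linear regression

# R-squared: share of the variation in Y explained by X
r2_manual <- 1 - rss / tss
# Adjusted R-squared: penalizes R-squared for the number of predictors
adj_r2_manual <- 1 - (1 - r2_manual) * (n - 1) / (n - p - 1)

print(c(manual = r2_manual, from_summary = summary(model)$r.squared))
print(c(manual = adj_r2_manual, from_summary = summary(model)$adj.r.squared))
```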

The estimated coefficient of the product price variable (X), -20.955, is the estimated change in Y for each one-unit change in X: each 1 USD increase in price is associated with an estimated decrease of about 21 units sold. The p-value for X is 0.000169, which is smaller than 0.05, so the null hypothesis that the coefficient equals zero is rejected in favor of the alternative hypothesis. It can therefore be concluded that product price (X) has a statistically significant effect on the number of product sales (Y).
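The significance test for the slope can be reproduced step by step, as in this sketch. It assumes a 5% significance level, as in the article, and uses hypothetical data (not the article's dataset); the 95% confidence interval from confint() is an extra that the article's output does not show.

```r
# Hypothetical illustration data (NOT the article's dataset)
dataset <- data.frame(
  price = c(5, 6, 7, 8, 9, 10, 11, 12, 13, 14),
  sales = c(330, 320, 285, 270, 260, 220, 210, 190, 160, 150)
)
model <- lm(sales ~ price, data = dataset)

# Row of the coefficient table that belongs to the price variable
slope <- coef(summary(model))["price", ]

# t statistic: the estimate divided by its standard error
t_stat <- slope[["Estimate"]] / slope[["Std. Error"]]

# Two-sided p-value from the t distribution with n - 2 degrees of freedom
p_value <- 2 * pt(abs(t_stat), df = df.residual(model), lower.tail = FALSE)

# Reject the null hypothesis (slope = 0) when the p-value is below 0.05
significant <- p_value < 0.05

# 95% confidence interval for the slope
ci <- confint(model, "price", level = 0.95)

print(c(t = t_stat, p = p_value))
print(significant)
print(ci)
```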

The negative sign of the coefficient on X indicates that product price has a negative effect on the number of product sales. In other words, an increase in product price is estimated to decrease the number of product sales, and a decrease in product price is estimated to increase it.
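The estimated equation reported above, Y = 436.770 - 20.955X, can also be used to compute predicted sales at a given price. In the sketch below, the function name predict_sales and the prices of 10 and 11 USD are hypothetical choices for illustration; predictions are only trustworthy for prices within the range covered by the original data.

```r
# Coefficients as reported in the article's output
b0 <- 436.770   # intercept
b1 <- -20.955   # slope for product price (USD)

# Predicted number of product sales (units) at a given price
predict_sales <- function(price_usd) {
  b0 + b1 * price_usd
}

# Predicted sales at two hypothetical prices
at_10 <- predict_sales(10)   # 227.22 units
at_11 <- predict_sales(11)   # 206.265 units
print(c(at_10 = at_10, at_11 = at_11))

# Raising the price by 1 USD changes predicted sales by exactly the slope
print(at_11 - at_10)
```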

That is all from Kanda Data for this article. Hopefully it is useful and adds to your understanding of how to run a regression analysis and interpret the results in R Studio. Stay tuned for the next article from Kanda Data.
