2  Basics of R

2.1 Operators and data types

2.1.1 Basic operators

In this section, we will learn about some basic R operators that are used to perform operations on variables. Some most commonly used operators are shown in the table below.

R follows the conventional order (sequence) to solve mathematical operations, abbreviated as BODMAS: Brackets, Orders (exponents), Division, Multiplication, Addition, and Subtraction

2+4+7 # Sum
[1] 13
4-5   # Subtraction
[1] -1
2*3   # Multiplication
[1] 6
1/2   # Division
[1] 0.5
# Order of operation
1/2*3+4-5
[1] 0.5
1/2*(3+4-5)
[1] 1
1/(2*(3+4-5))
[1] 0.25
1/(2*3+4-5) 
[1] 0.2
# Notice how output changes with the placement of operators

# Other operators:
2^3
[1] 8
log(10)
[1] 2.302585
sqrt(4)
[1] 2
pi
[1] 3.141593
# Clear the Environment
rm(list=ls()) # rm is for remove,ls is short for list. The empty parenthesis i.e. () signifies all content. 

2.1.2 Basic data operations

In this section, we will create some vector data and apply built-in operations to examine the properties of a dataset.

# The "is equal to" or "assignment operator in R is "<-" or "=" 

# Generate sample data. Remember "c" comes from for "concatenate". 
data<-c(1,4,2,3,9)    # Try data = c(1,4,2,3,9). Is there any difference in data in both cases?

# rbind combines data by rows, and hence "r"bind
# cbind combines data by columns, and hence "c"bind

# Checking the properties of a dataset. Note: the na.rm argument ignores NA values in the dataset.
data=rbind(1,4,2,3,9) 
dim(data)           # [5,1]: 5 rows, 1 column
[1] 5 1
data[2,1]           # Show the value in row 2, column 1
[1] 4
data[c(2:5),1]      # Show a range of values in column 1
[1] 4 2 3 9
mean(data, na.rm=T) # Mean
[1] 3.8
max(data)           # Maximum
[1] 9
min(data)           # Minimum
[1] 1
sd(data)            # Standard deviation
[1] 3.114482
var(data)           # Variance
     [,1]
[1,]  9.7
summary(data) 
       V1     
 Min.   :1.0  
 1st Qu.:2.0  
 Median :3.0  
 Mean   :3.8  
 3rd Qu.:4.0  
 Max.   :9.0  
str(data)        # Prints structure of data
 num [1:5, 1] 1 4 2 3 9
head(data)       # Returns the 1st 6 items in the object
     [,1]
[1,]    1
[2,]    4
[3,]    2
[4,]    3
[5,]    9
head(data, 2)    # Print first 2
     [,1]
[1,]    1
[2,]    4
tail(data, 2)    # Print last 2
     [,1]
[4,]    3
[5,]    9
# Do the same, but with "c()" instead of "rbind"
data=c(1,4,2,3,9) 
dim(data)        # Note: dim is NULL
NULL
length(data)     # Length of a dataset is the number of variables (columns)
[1] 5
data[2]          # This should give you 4 
[1] 4
# Other operators work in the same way
mean(data)       # Mean
[1] 3.8
max(data)        # Maximum
[1] 9
min(data)        # Minimum
[1] 1
sd(data)         # Standard deviation
[1] 3.114482
var(data)        # Variance
[1] 9.7
# Text data
data=c("LSU","SPESS","AgCenter","Tigers") 
data             # View
[1] "LSU"      "SPESS"    "AgCenter" "Tigers"  
data[1]
[1] "LSU"
# Mixed data
data=c(1,"LSU",10,"AgCenter") # All data is treated as text if one value is text
data[3]                       # Note how output is in quotes i.e. "10"
[1] "10"

For help with a function in R, just type ? followed by the function to display information in the help menu. Try pasting ?sd in the console.

2.1.3 Data types

In R, data is stored as an “array”, which can be 1-dimensional or 2-dimensional. A 1-D array is called a “vector” and a 2-D array is a “matrix”. A table in R is called a “data frame” and a “list” is a container to hold a variety of data types. In this section, we will learn how to create matrices, lists and data frames in R.

# Lets make a random matrix
test_mat = matrix( c(2, 4, 3, 1, 5, 7), # The data elements 
  nrow=2,         # Number of rows 
  ncol=3,         # Number of columns 
  byrow = TRUE)   # Fill matrix by rows 

test_mat = matrix( c(2, 4, 3, 1, 5, 7),nrow=2,ncol=3,byrow = TRUE) # Same result 
test_mat
     [,1] [,2] [,3]
[1,]    2    4    3
[2,]    1    5    7
test_mat[,2]      # Display all rows, and second column
[1] 4 5
test_mat[2,]      # Display second row, all columns
[1] 1 5 7
# Types of datasets
out = as.matrix(test_mat)
out               # This is a matrix
     [,1] [,2] [,3]
[1,]    2    4    3
[2,]    1    5    7
out = as.array(test_mat)
out               # This is also a matrix
     [,1] [,2] [,3]
[1,]    2    4    3
[2,]    1    5    7
out = as.vector(test_mat)
out               # This is just a vector
[1] 2 1 4 5 3 7
# Data frame and list
data1=runif(50,20,30) # Create 50 random numbers between 20 and 30  
data2=runif(50,0,10)  # Create 50 random numbers between 0 and 10  

# Lists
out = list()        # Create and empty list
out[[1]] = data1    # Notice the brackets "[[ ]]" instead of "[ ]"
out[[2]] = data2
out[[1]]          # Contains data1 at this location
 [1] 24.12522 21.71949 27.04566 25.90448 25.81391 23.45744 27.70215 21.61938
 [9] 21.59878 21.77457 27.05394 27.58472 23.01064 26.08787 29.75952 22.14624
[17] 20.43209 22.90049 26.15541 22.39551 21.34183 22.70940 29.62503 25.85931
[25] 20.04430 22.10472 22.34179 24.61477 23.66253 24.25012 29.58889 20.66558
[33] 20.13019 28.02615 20.25990 24.53998 21.71251 20.22447 22.28056 22.09891
[41] 22.03546 27.93162 24.46315 24.84276 20.14157 25.83789 20.34895 20.18546
[49] 21.05511 24.62577
# Data frame
out=data.frame(x=data1, y=data2)

# Let's see how it looks!
plot(out$x, out$y)

plot(out[,1])

For a data frame, the dollar “$” sign invokes the variable selection. Imagine how one would receive merchandise in a store if you give $ to the cashier. Data frame will list out the variable names for you of you when you show it some $.

2.2 Plotting with base R

If you need to quickly visualize your data, base R has some functions that will help you do this in a pinch. In this section we’ll look at some basics of visualizing univariate and multivariate data.

2.2.1 Overview

# Create 50 random numbers between 0 and 100  
data=runif(50, 0, 100)   #runif stands for random numbers from a uniform distribution

# Let's plot the data
plot(data)            # The "plot" function initializes the plot.

plot(data, type="l")  # The "type" argument changes the plot type. "l" calls up a line plot

plot(data, type="b")  # Buffered points joined by lines

# Try options type = "o" and type = "c" as well.

# We can also quickly visualize boxplots, histograms, and density plots using the same procedure
boxplot(data)        # Box-and-whisker plot

hist(data)           # Histogram points

plot(density(data))  # Plot with density distribution 

2.2.2 Plotting univariate data

Let’s dig deeper into the plot function. Here, we will look at how to adjust the colors, shapes, and sizes for markers, axis labels and titles, and the plot title.

# Line plots
plot(data,type="o", col="red",
     xlab="x-axis title",ylab ="y-axis title", 
     main="My plot", # Name of axis labels and title
     cex.axis=2, cex.main=2,cex.lab=2,            # Size of axes, title and label
     pch=23,       # Change marker style
     bg="red",     # Change color of markers
     lty=5,        # Change line style
     lwd=2         # Selecting line width
) 
# Adding legend
legend(1, 100, legend=c("Data 1"),
       col=c("red"), lty=2, cex=1.2)

# Histograms
hist(data,col="red",
     xlab="Number",ylab ="Value", main="My plot", # Name of axis labels and title
     border="blue"
) 

# Try adjusting the parameters:
# hist(data,col="red",
#      xlab="Number",ylab ="Value", main="My plot", # Name of axis labels and title
#      cex.axis=2, cex.main=2,cex.lab=2,            # Size of axes, title and label
#      border="blue", 
#      xlim=c(0,100), # Control the limits of the x-axis
#      las=0,      # Try different values of las: 0,1,2,3 to rotate labels
#      breaks=5    # Try using 5,20,50, 100
# ) # Using more options and controls

2.2.3 Plotting multivariate data

Here, we introduce you to data frames: equivalent of tables in R. A data frame is a table with a two-dimensional array-like structure in which each column contains values of one variable and each row contains one set of values from each column.

plot_data=data.frame(x=runif(50,0,10), 
                     y=runif(50,20,30), 
                     z=runif(50,30,40)) 

plot(plot_data$x, plot_data$y) # Scatter plot of x and y data

# Mandatory beautification
plot(plot_data$x,plot_data$y, xlab="Data X", ylab="Data Y", main="X vs Y plot",
     col="darkred",pch=20,cex=1.5) # Scatter plot of x and y data

# Multiple lines on one axis
matplot(plot_data, type = c("b"),pch=16,col = 1:4) 

matplot(plot_data, type = c("b","l","o"),pch=16,col = 1:4) # Try this now. Any difference? 
legend("topleft", legend = 1:4, col=1:4, pch=1)            # Add legend to a top left
legend("top", legend = 1:4, col=1:4, pch=1)                # Add legend to at top center
legend("bottomright", legend = 1:4, col=1:4, pch=1)        # Add legend at the bottom right

2.2.4 Time series data

Working with time series data can be tricky at first, but here’s a quick look at how to quickly generate a time series using the as.Date function.

date=seq(as.Date('2011-01-01'),as.Date('2011-01-31'),by = 1) # Generate a sequence 31 days
data=runif(31,0,10)                 # Generate 31 random values between 0 and 10
df=data.frame(Date=date,Value=data) # Combine the data in a data frame
plot(df,type="o")

2.2.5 Combining plots

You can built plots that contain subplots. Using base R, we call start by using the “par” function and then plot as we saw before.

par(mfrow=c(2,2)) # Call a plot with 4 quadrants

# Plot 1
matplot(plot_data, type = c("b"),pch=16,col = 1:4) 

# Plot 2
plot(plot_data$x,plot_data$y) 

# Plot 3
hist(data,col="red",
     xlab="Number",ylab ="Value", main="My plot", 
     border="blue") 

# Plot4
plot(data,type="o", col="red",
     xlab="Number",ylab ="Value", main="My plot",
     cex.axis=2, cex.main=2,cex.lab=2, 
     pch=23,   
     bg="red", 
     lty=5, 
     lwd=2 
) 

# Alternatively, we can call up a plot using a matrix
matrix(c(1,1,2,3), 2, 2, byrow = TRUE) # Plot 1 is plotted for first two spots, followed by plot 2 and 3 
     [,1] [,2]
[1,]    1    1
[2,]    2    3
layout(matrix(c(1,1,2,3), 2, 2, byrow = TRUE)) # Fixes a layout of the plots we want to make

 # Plot 1
matplot(plot_data, type = c("b"),pch=16,col = 1:4)

# Plot2
plot(plot_data$x,plot_data$y) 

# Plot 3
hist(data,col="red",
     xlab="Number",ylab ="Value", main="My plot",
     border="blue"
)

2.2.6 Saving figures to disk

Plots can be saved as image files or a PDF. This is done by specifying the output file type, its size and resolution, then calling the plot.

png("awesome_plot.png", width=4, height=4, units="in", res=400) 
#Tells R we will plot image in png of given specification

matplot(plot_data, type = c("b","l","o"),pch=16,col = 1:4)  
legend("topleft", legend = 1:4, col=1:4, pch=1)

dev.off() # Very important: this sends the image to disc
png 
  2 
# Keep pressing till you get the following: 
# Error in dev.off() : cannot shut down device 1 (the null device) 
# This ensures that we are no longer plotting.

# It looks like what everything we just plotted was squeezed together to tightly. Let's change the size.
png("awesome_plot.png", width=6, height=4, units="in", res=400)  #note change in dimension
#Tells R we will plot image in png of given specification

matplot(plot_data, type = c("b","l","o"),pch=16,col = 1:3)  
legend("topleft", legend = 1:3, col=1:3, pch=16)

dev.off() 
png 
  2 


Some useful resources

If you want to plot something a certain way and don’t know how to do it, the chances are that someone has asked that question before. Try a Google search for what your are trying to do and check out some of the forums. There is TONS of material online. Here are some additional resources:

  1. The R Graph Gallery: https://www.r-graph-gallery.com/

  2. Graphical parameters: https://www.statmethods.net/advgraphs/parameters.html

  3. Plotting in R: https://www.harding.edu/fmccown/r/

  4. Histogram: https://www.r-bloggers.com/how-to-make-a-histogram-with-basic-r/

2.3 Plotting with ggplot2

2.3.1 Import libraries and create sample dataset

For this section, we will use the ggplot2, gridExtra, utils, and tidyr packages. gridExtra and cowplot are used to combine ggplot objects into one plot and utils and tidyr are useful for manipulating and reshaping the data. We will also install some packages here that will be required for the later sections. You will find more information in the sections to follow.

###############################################################
#~~~ Load required libraries
lib_names=c("ggplot2","gridExtra","utils","tidyr","cowplot", "RColorBrewer")

# If you see a prompt: Do you want to restart R prior to installing: Select **No**. 

# Install all necessary packages (Run once)
# invisible(suppressMessages
#           (suppressWarnings
#             (lapply
#               (lib_names,install.packages,repos="http://cran.r-project.org",
#                 character.only = T))))

# Load necessary packages
invisible(suppressMessages
          (suppressWarnings
            (lapply
              (lib_names,library, character.only = T))))

In more day-to-day use, you will see yourself using a simpler version of these commands, such as, if you were to install the “ggplot2”,“gridExtra” libraries, you will type:

# To install the package. Install only once

install.packages("ggplot2")

# To initialize the package. Invoke every time a new session begins.

library(ggplot2)

Similarly, again for gridExtra ,

install.packages("gridExtra")

library(gridExtra)

For this exercise, let us generate a sample dataset.

###############################################################
#~~~ Generate a dataset containing random numbers within specified ranges
Year = seq(1913,2001,1)
Jan = runif(89, -18.4, -3.2)
Feb = runif(89, -19.4, -1.2)
Mar = runif(89, -14, -1.8)
January = runif(89, 1, 86)
dat = data.frame(Year, Jan, Feb, Mar, January)

2.3.2 Basics of ggplot

Whereas base R has an “ink on paper” plotting paradigm, ggplot has a “grammar of graphics” paradigm that packages together a variety plotting functions. With ggplot, you assign the result of a function to an object name and then modify it by adding additional functions. Think of it as adding layers using pre-designed functions rather than having to build those functions yourself, as you would have to do with base R.

l1 = ggplot(data=dat, aes(x = Year, y = Jan, color = "blue")) + # Tell which data to plot
  geom_line() +      # Add a line
  geom_point() +     # Add a points
  xlab("Year") +     # Add labels to the axes
  ylab("Value")

# Or, they can be specified for any individual geometry
l1 + geom_line(linetype = "solid", color="Blue")  # Add a solid line

l1 + geom_line(aes(x = Year, y = January)) # Add a different data set

# There are tons of other built-in color scales and themes, such as scale_color_grey(), scale_color_brewer(), theme_classic(), theme_minimal(), and theme_dark()

# OR, CREATE YOUR OWN THEME! You can group themes together in one list
theme1 = theme(
  legend.position = "none",
  panel.background = element_blank(),
  plot.title = element_text(hjust = 0.5),
  axis.line = element_line(color = "black"),
  axis.text.y   = element_text(size = 11),
  axis.text.x   = element_text(size = 11),
  axis.title.y  = element_text(size = 11),
  axis.title.x  = element_text(size = 11),
  panel.border = element_rect(
    colour = "black",
    fill = NA,
    size = 0.5
  )
)

2.3.3 Multivariate plots

For multivariate data, ggplot takes the data in the form of groups. This means that each data row should be identifiable to a group. To get the most out of ggplot, we will need to reshape our dataset.

library(tidyr)

# There are two generally data formats: wide (horizontal) and long (vertical). In the horizontal format, every column represents a category of the data. In the vertical format, every row represents an observation for a particular category (think of each row as a data point). Both formats have their comparative advantages. We will now convert the data frame we randomly generated in the previous section to the long format. Here are several ways to do this:

# Using the gather function (the operator %>% is called pipe operator)
dat2 = dat %>% gather(Month, Value, -Year)

# This is equivalent to: 
dat2 = gather(data=dat, Month, Value, -Year)

# Using pivot_longer and selecting all of the columns we want. This function is the best!
dat2 = dat %>% pivot_longer(cols = c(Jan, Feb, Mar), names_to = "Month", values_to = "Value") 

# Or we can choose to exclude the columns we don't want
dat2 = dat %>% pivot_longer(cols = -c(Year,January), names_to = "Month", values_to = "Value") 

head(dat2) # The data is now shaped in the long format
# A tibble: 6 × 4
   Year January Month  Value
  <dbl>   <dbl> <chr>  <dbl>
1  1913    62.7 Jan    -6.57
2  1913    62.7 Feb    -9.99
3  1913    62.7 Mar   -13.3 
4  1914    59.3 Jan    -4.06
5  1914    59.3 Feb   -16.3 
6  1914    59.3 Mar   -10.2 

Line plot

# LINE PLOT
l = ggplot(dat2, aes(x = Year, y = Value, group = Month)) +
  geom_line(aes(color = Month)) +
  geom_point(aes(color = Month))
l

Density plot

# DENSITY PLOT
d = ggplot(dat2, aes(x = Value))
d = d + geom_density(aes(color = Month, fill = Month), alpha=0.4) # Alpha specifies transparency
d

Histogram

# HISTOGRAM
h = ggplot(dat2, aes(x = Value))
h = h + geom_histogram(aes(color = Month, fill = Month), alpha=0.4,
                 fill = "white",
                 position = "dodge")
h

Grid plotting and saving files to disk

There are multiple ways to arrange multiple plots and save images. One method is using grid.arrange() which is found in the gridExtra package. You can then save the file using ggsave, which comes with the ggplot2 library.

# The plots can be displayed together on one image using 
# grid.arrange from the gridExtra package
img = grid.arrange(l, d, h, nrow=3)

# Finally, plots created using ggplot can be saved using ggsave
ggsave("grid_plot_1.png", 
       plot = img, 
       device = "png", 
       width = 6, 
       height = 4, 
       units = c("in"), 
       dpi = 600)

Another approach is to use the plot_grid function, which is in the cowplot library. Notice how the axes are now beautifally aligned.

img2=cowplot::plot_grid(l, d, h, nrow = 3, align = "v") # "v" aligns vertical axes and "h" aligns horizontal axes

ggsave("grid_plot_2.png", 
       plot = img2, 
       device = "png", 
       width = 6, 
       height = 4, 
       units = c("in"), 
       dpi = 600)

2.3.4 Using patchwork for combining ggplots

Patchwork works with simple operators to combine plots. The operator | arranges plots in a row. The plus sign + does the same but it will try to wrap the plots symmetrically as a square whenever possible. The division i.e. /operator layers a plot on top of another.

#install.packages("patchwork")
library(patchwork)

l+d

l/ (h+d)

# Try: l/d/h or (l+d)/h 

# Make your own design for arranging plots (the # sign means empty space): 
design <- "
  111
  2#3
"
l + d + h + plot_layout(design = design)


Some useful resources

The links below offer a treasure trove of examples and sample code to get you started.

2.4 Exercise #1

The U.S. Climate Reference Network (USCRN) is a systematic and sustained network of climate monitoring stations. USCRN has sites across Contiguous U.S. along with some in Alaska, and Hawaii. These stations are instrumented to measure meteorological information such as temperature, precipitation, wind speed, along with other relevant hydrologic variables such as soil moisture at uniform depths (5, 10, 20, 50, 100 cm) at sub-hourly, daily and monthly time scales. Users can access daily data set from all station suing the following link: Index of /pub/data/uscrn/products/daily01 (noaa.gov)

Let us extract sample data from a USCRN site in Lafayette, LA, USA for 2021.

# Yearly data from the sample station
CRNdat = read.csv(url("https://www.ncei.noaa.gov/pub/data/uscrn/products/daily01/2021/CRND0103-2021-LA_Lafayette_13_SE.txt"), header=FALSE,sep="")

# Data headers
headers=read.csv(url("https://www.ncei.noaa.gov/pub/data/uscrn/products/daily01/headers.txt"), header=FALSE,sep="")

# Column names as headers from the text file
colnames(CRNdat)=headers[2,1:ncol(CRNdat)]

# Replace fill values with NA
CRNdat[CRNdat == -9999]=NA
CRNdat[CRNdat == -99]=NA
CRNdat[CRNdat == 999]=NA

# View data sample
library(kableExtra)
dataTable = kbl(head(CRNdat,6),full_width = F)
kable_styling(dataTable,bootstrap_options = c("striped", "hover", "condensed", "responsive"))
WBANNO LST_DATE CRX_VN LONGITUDE LATITUDE T_DAILY_MAX T_DAILY_MIN T_DAILY_MEAN T_DAILY_AVG P_DAILY_CALC SOLARAD_DAILY SUR_TEMP_DAILY_TYPE SUR_TEMP_DAILY_MAX SUR_TEMP_DAILY_MIN SUR_TEMP_DAILY_AVG RH_DAILY_MAX RH_DAILY_MIN RH_DAILY_AVG SOIL_MOISTURE_5_DAILY SOIL_MOISTURE_10_DAILY SOIL_MOISTURE_20_DAILY SOIL_MOISTURE_50_DAILY SOIL_MOISTURE_100_DAILY SOIL_TEMP_5_DAILY SOIL_TEMP_10_DAILY SOIL_TEMP_20_DAILY SOIL_TEMP_50_DAILY SOIL_TEMP_100_DAILY
53960 20210101 2.622 -91.87 30.09 14.0 5.2 9.6 11.0 0.0 12.16 C 18.7 4.6 11.6 92.8 49.3 72.0 0.401 0.372 0.380 0.405 0.381 16.2 15.3 15.5 15.7 15.5
53960 20210102 2.622 -91.87 30.09 10.4 1.9 6.1 6.5 0.0 8.95 C 15.3 0.4 7.3 98.6 61.6 78.4 0.396 0.370 0.377 0.406 0.376 14.4 13.3 14.1 15.2 15.0
53960 20210103 2.622 -91.87 30.09 16.3 -0.1 8.1 7.9 0.0 13.93 C 24.3 -0.9 9.5 100.0 42.1 76.3 0.392 0.368 0.374 0.404 0.373 12.8 11.8 12.8 14.4 14.2
53960 20210104 2.622 -91.87 30.09 22.2 3.7 12.9 12.5 0.0 11.56 C 26.4 2.6 13.2 98.9 47.7 80.2 0.389 0.366 0.370 0.400 0.372 13.0 12.2 12.7 14.0 14.0
53960 20210105 2.622 -91.87 30.09 20.7 4.5 12.6 11.4 0.0 14.37 C 28.9 3.1 13.3 100.0 27.7 71.0 0.388 0.364 0.368 0.399 0.369 13.0 12.1 12.7 13.9 14.0
53960 20210106 2.622 -91.87 30.09 19.4 4.9 12.2 12.6 20.7 9.79 C 23.1 3.5 12.8 98.5 54.7 78.9 0.390 0.363 0.369 0.399 0.370 12.8 12.1 12.5 13.7 13.7

Notice the variables provided in the dataset. As an example, we can plots soil moisture data from a depth of 20 cm for this station for our reference:

# Sample plot for soil moisture
x=CRNdat$SOIL_MOISTURE_20_DAILY

# Plot time series and density distribution 
plot(x, type="l", ylab="Soil moisture (v/v)", 
     col="cyan4", lwd=3)
plot(density(na.omit(x)), main=" ", xlab="", 
     col="cyan4", lwd=3)
(a) Time series of SM
(b) SM kernel density
Figure 2.1: Soil moisture values at the selected USCRN station

Exercise:

  1. Taking examples of any two USCRN stations across contrasting hydroclimates, compare and contrast any two recorded variables using time series plots, probability density distribution histograms and scatter plots. Select any year of your liking for the analysis.

  2. Select two seasons for each elected variable and demonstrate the seasonal variability in the records for summer (MAMJJA) and winter (SONDJF) seasons using any two types of multivariate plots.

  3. [EXTRA]: For any chosen station, plot a time-series of soil moisture from all available layers with precipitation added as an inverted secondary axis. For inspiration, see Figure 4 in Cheng, et al. 2021. On change of soil moisture distribution with vegetation reconstruction in Mu Us sandy land of China, with newly designed lysimeter. Frontiers in Plant Science, 12, p.60952 at https://www.frontiersin.org/articles/10.3389/fpls.2021.609529/full