Vectors in `R`

Vectors are a specific type of object (in the first lab we discussed the object types integer, numeric, character and logical) with dimension $n\times1$, i.e. a place to store $n$ observations of a single variable. Vector elements are indexed by positive integers, starting at 1. Many mathematical functions in R are “vectorized”, i.e. they perform an operation on every element of a vector at once (e.g. add 3 to each element of a vector). Other functions aggregate the elements of a vector into a single value (e.g. the mean of the vector).
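A short illustration of the difference between vectorized and aggregating functions. The vector v and its values are made up for this sketch.

```r
# A small numeric vector (made-up values for illustration)
v <- c(2, 4, 6, 8)

# Vectorized arithmetic: the operation is applied to every element
v + 3      # 5 7 9 11
v * 2      # 4 8 12 16

# Aggregating functions collapse the whole vector into one number
mean(v)    # 5
sum(v)     # 20
length(v)  # 4
```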

`rep()`

We can create a vector with repeated numbers.

help(rep)

z <- rep(1,5)
z

## [1] 1 1 1 1 1

q <- rep(c(1, 2), 5)
q

##  [1] 1 2 1 2 1 2 1 2 1 2

t <- rep(c(1,2), c(5, 5))
t

##  [1] 1 1 1 1 1 2 2 2 2 2
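rep() also has the named arguments times and each (see help(rep)), which make the calls above easier to read. The character example is only illustrative.

```r
# times = repeat the whole vector; each = repeat every element in turn
rep(c(1, 2), times = 5)             # 1 2 1 2 1 2 1 2 1 2 (like q above)
rep(c(1, 2), each = 5)              # 1 1 1 1 1 2 2 2 2 2 (like t above)

# times can also be a vector giving a separate count for each element
rep(c("a", "b"), times = c(2, 3))   # "a" "a" "b" "b" "b"
```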

`seq()`

We can create a vector with a sequence of numbers.

help(seq)

a <- 1:5
a

## [1] 1 2 3 4 5

b <- seq(1,5)
b

## [1] 1 2 3 4 5

c <- seq(5,1)
c

## [1] 5 4 3 2 1

d <- seq(from=0, to=15, by=3)
d

## [1]  0  3  6  9 12 15

e <- seq(0, 15, 3)
e

## [1]  0  3  6  9 12 15
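seq() can also be controlled by the number of values you want rather than the step size, and seq_len() is a safe way to count from 1 to n. A short sketch of both:

```r
# length.out: ask for a fixed number of equally spaced values
seq(0, 1, length.out = 5)    # 0.00 0.25 0.50 0.75 1.00

# A negative step counts down
seq(10, 0, by = -2.5)        # 10.0 7.5 5.0 2.5 0.0

# seq_len(n) gives 1, 2, ..., n and is empty when n is 0,
# whereas 1:0 counts down and gives c(1, 0)
seq_len(4)                   # 1 2 3 4
seq_len(0)                   # integer(0)
```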

`c()`

To create a vector with specific numbers or objects inside, we need to use the c() function:

f <- (0, 5, 10)  #This doesn't work

f <- c(0, 5, 10) #This works!!!
f

## [1]  0  5 10
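c() can also combine existing vectors. Because a vector holds values of only one type, c() coerces mixed inputs to a common type (logical to numeric, anything to character). The name f.long is just for this example.

```r
f <- c(0, 5, 10)
f.long <- c(f, 15, 20)        # combine a vector with new values: 0 5 10 15 20

c(1, TRUE, FALSE)             # logicals become numbers: 1 1 0
c(1, "two", TRUE)             # everything becomes text: "1" "two" "TRUE"
class(c(1, "two", TRUE))      # "character"
```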

`rnorm()`

We can also create a vector of randomly generated numbers. Let’s draw 10 random numbers from the standard normal distribution (mean of 0, standard deviation of 1). Note: you can draw from any normal distribution of your choosing by changing the `mean=` and `sd=` arguments.

g <- rnorm(10, mean=0, sd=1)
g

##  [1] -1.3271844 -1.4665512  0.4433269 -0.6511719 -2.6024570  1.3078938
##  [7]  1.5704528 -2.1911601 -0.5491848  0.6541069
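Because rnorm() draws new numbers every time, your values will differ from those printed above. Calling set.seed() first makes the draws reproducible. The seed 2020 and the object names are arbitrary choices for this sketch.

```r
set.seed(2020)
draw1 <- rnorm(5, mean = 0, sd = 1)
set.seed(2020)
draw2 <- rnorm(5, mean = 0, sd = 1)
identical(draw1, draw2)       # TRUE: same seed, same numbers

# A different normal distribution: mean 100, standard deviation 15
h <- rnorm(1000, mean = 100, sd = 15)
mean(h)                       # close to 100, but not exactly
sd(h)                         # close to 15, but not exactly
```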

Sometimes you will want to clear every object you’ve stored in your environment. The rm() function removes objects and ls() lists the names of all of them, so rm(list=ls()) clears everything.

rm(list=ls())

Subsetting Vectors

Sometimes we will need to access only certain elements of a vector. Let’s create a new vector x that we will work with to subset:

x <- seq(60, 70, 1)
x

##  [1] 60 61 62 63 64 65 66 67 68 69 70

Often subsetting in R involves using brackets [ ].

x[1]

## [1] 60

x[2]

## [1] 61

QUESTION: What will the following code do?

x[c(1, 3)]
x[c(2, 3, 6)]
x[c(6, 3, 2)]
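Indices do not have to be positive: a minus sign drops the listed positions instead of keeping them. A short sketch using the same x:

```r
x <- seq(60, 70, 1)
x[-1]             # everything except the first element: 61 62 ... 70
x[-c(1, 2)]       # drop the first two elements: 62 63 ... 70
x[length(x)]      # the last element: 70
```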

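Often we will use logical (Boolean) operators in subsetting.

Some operators:

== : equal to
!= : not equal to
<, <= : less than, less than or equal to
>, >= : greater than, greater than or equal to
&, && : and
|, || : or
! : not

The single forms & and | compare vectors element by element; the doubled forms && and || expect a single TRUE or FALSE on each side (in R 4.3.0 and later, giving them a longer vector is an error).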
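To see the difference between the single and doubled operators: & and | return one TRUE/FALSE per element, while && and || return a single value and are meant for if() conditions. The sketch also uses != (not equal to) and ! (not).

```r
x <- seq(60, 70, 1)

# Element-wise: one result for each of the 11 elements
x > 60 & x < 65

# Single value: && combines two length-one conditions, typical inside if()
if (length(x) > 0 && min(x) >= 60) print("every value is at least 60")

# != means "not equal to"; ! flips TRUE and FALSE
x[x != 65]        # all values except 65
x[!(x < 65)]      # 65 66 67 68 69 70
```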

x<60

##  [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE

x<65

##  [1]  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE

Note: without using [ ], R simply returns a vector of class logical containing elements of either TRUE or FALSE.

x[x < 65]

## [1] 60 61 62 63 64

x > 60 & x < 65

##  [1] FALSE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE

x[x > 60 & x < 65]

## [1] 61 62 63 64

x[x < 60 | x > 65]

## [1] 66 67 68 69 70

x[x == 60]

## [1] 60
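Two related helpers: since TRUE counts as 1 and FALSE as 0, sum() and mean() of a logical vector give the number and the proportion of matches, and which() returns the positions of the TRUE elements.

```r
x <- seq(60, 70, 1)
sum(x < 65)        # 5 elements are below 65
mean(x < 65)       # 5/11, the proportion below 65
which(x < 65)      # 1 2 3 4 5: the positions, not the values
x[which(x < 65)]   # 60 61 62 63 64, the same as x[x < 65]
```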

Some more helpful vector functions

Finding the length of a vector.

n = length(x)
n

## [1] 11

What would the following code do?

x[1:length(x)]
x[3:length(x)]

x[1:n]
x[3:n]

Is an object a vector?

is.vector(x)

## [1] TRUE

Let’s convert an object to a vector. Note: most basic classes that have an “is” function (e.g. is.vector(), is.numeric(), is.character()) also have a matching “as” function (as.vector(), as.numeric(), as.character()).

vec.x2<-as.vector(x)
is.vector(vec.x2)

## [1] TRUE
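A few more "is"/"as" pairs in action, including what as.vector() does to a matrix. The object m is made up for this sketch.

```r
is.numeric("3.5")               # FALSE: a character string, not a number
as.numeric("3.5")               # 3.5
as.integer(3.9)                 # 3: the decimal part is dropped
as.character(c(1, 2))           # "1" "2"
suppressWarnings(as.numeric("three"))  # NA (R warns when text cannot be converted)

m <- matrix(1:6, nrow = 2)
as.vector(m)                    # 1 2 3 4 5 6: flattened column by column
is.vector(m)                    # FALSE: m has a dim attribute
```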

Many of R’s statistics functions take vectors as arguments.

mean(x)

## [1] 65

sd(x)

## [1] 3.316625
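To connect these functions to their formulas, here are the mean $\bar{x} = \frac{1}{n}\sum_i x_i$ and the sample standard deviation $s = \sqrt{\frac{1}{n-1}\sum_i (x_i - \bar{x})^2}$ computed by hand. Note that sd() divides by $n-1$, not $n$. The names x.bar and s are ours.

```r
x <- seq(60, 70, 1)
n <- length(x)
x.bar <- sum(x) / n                        # 715 / 11 = 65
s <- sqrt(sum((x - x.bar)^2) / (n - 1))    # sqrt(110 / 10) = sqrt(11)
all.equal(x.bar, mean(x))                  # TRUE
all.equal(s, sd(x))                        # TRUE (3.316625)
```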

Matrices in `R`

There are many ways to use the `matrix()` function to create a matrix.

?matrix
x <- matrix(1:12, nrow=3)
x

##      [,1] [,2] [,3] [,4]
## [1,]    1    4    7   10
## [2,]    2    5    8   11
## [3,]    3    6    9   12

Alternatively:

y <- matrix(1:12, ncol=4)
y

##      [,1] [,2] [,3] [,4]
## [1,]    1    4    7   10
## [2,]    2    5    8   11
## [3,]    3    6    9   12

Alternatively, we can coerce a vector into a matrix:

z <- 1:10
matrix.z <- matrix(z, ncol=5)
matrix.z

##      [,1] [,2] [,3] [,4] [,5]
## [1,]    1    3    5    7    9
## [2,]    2    4    6    8   10

Let’s create a matrix of zeroes and add 1’s to the diagonal.

matrix.zero <- matrix(0, nrow=5, ncol=5)
matrix.zero

##      [,1] [,2] [,3] [,4] [,5]
## [1,]    0    0    0    0    0
## [2,]    0    0    0    0    0
## [3,]    0    0    0    0    0
## [4,]    0    0    0    0    0
## [5,]    0    0    0    0    0

diag(matrix.zero) = 1
matrix.zero

##      [,1] [,2] [,3] [,4] [,5]
## [1,]    1    0    0    0    0
## [2,]    0    1    0    0    0
## [3,]    0    0    1    0    0
## [4,]    0    0    0    1    0
## [5,]    0    0    0    0    1
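diag() does more than this: given a single number n it returns the n x n identity matrix directly, given a vector it builds a diagonal matrix, and given a matrix it extracts the diagonal. The names identity.5 and built.by.hand are ours.

```r
identity.5 <- diag(5)                  # 5 x 5 identity matrix in one call

built.by.hand <- matrix(0, nrow = 5, ncol = 5)
diag(built.by.hand) <- 1
all.equal(identity.5, built.by.hand)   # TRUE

diag(c(2, 4, 6))                       # 3 x 3 matrix with 2, 4, 6 on the diagonal
diag(identity.5)                       # 1 1 1 1 1: extracts the diagonal
```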

Similarly to finding the length of a vector, we often want to find the dimensions of a matrix (number of rows, then number of columns):

dim(x)

## [1] 3 4

Subsetting a matrix

To isolate or look at parts of a matrix we will still use brackets [ ], but now we have two dimensions.

z <- 1:30
matrix.z <- matrix(z, ncol=5)
matrix.z

##      [,1] [,2] [,3] [,4] [,5]
## [1,]    1    7   13   19   25
## [2,]    2    8   14   20   26
## [3,]    3    9   15   21   27
## [4,]    4   10   16   22   28
## [5,]    5   11   17   23   29
## [6,]    6   12   18   24   30

#Display only the fifth row of matrix.z
matrix.z[5, ]

## [1]  5 11 17 23 29

#Display only the third column
matrix.z[ , 3]

## [1] 13 14 15 16 17 18

#Display the third and fourth columns
matrix.z[ , 3:4]

##      [,1] [,2]
## [1,]   13   19
## [2,]   14   20
## [3,]   15   21
## [4,]   16   22
## [5,]   17   23
## [6,]   18   24

#Display the second and fourth columns
matrix.z[ , c(2,4)]

##      [,1] [,2]
## [1,]    7   19
## [2,]    8   20
## [3,]    9   21
## [4,]   10   22
## [5,]   11   23
## [6,]   12   24

#Display the first and fifth rows
matrix.z[c(1,5), ]

##      [,1] [,2] [,3] [,4] [,5]
## [1,]    1    7   13   19   25
## [2,]    5   11   17   23   29

#Change the value/s of an element or elements in the matrix
#Change all of column 1 to zeros
matrix.z[ , 1] = 0
matrix.z

##      [,1] [,2] [,3] [,4] [,5]
## [1,]    0    7   13   19   25
## [2,]    0    8   14   20   26
## [3,]    0    9   15   21   27
## [4,]    0   10   16   22   28
## [5,]    0   11   17   23   29
## [6,]    0   12   18   24   30

#Change all of row 3 to 50
matrix.z[3, ] = 50
matrix.z

##      [,1] [,2] [,3] [,4] [,5]
## [1,]    0    7   13   19   25
## [2,]    0    8   14   20   26
## [3,]   50   50   50   50   50
## [4,]    0   10   16   22   28
## [5,]    0   11   17   23   29
## [6,]    0   12   18   24   30

#Change row 1, column 4 to 999
matrix.z[1, 4] = 999
matrix.z

##      [,1] [,2] [,3] [,4] [,5]
## [1,]    0    7   13  999   25
## [2,]    0    8   14   20   26
## [3,]   50   50   50   50   50
## [4,]    0   10   16   22   28
## [5,]    0   11   17   23   29
## [6,]    0   12   18   24   30

We can also create a new matrix by combining the columns or rows of existing matrices. The command cbind combines by columns (placing the inputs side by side), and the command rbind combines by rows (stacking them on top of each other).

Note: cbind needs all inputs to have the same number of rows, and rbind needs them to have the same number of columns.

matrix.a <- matrix(1:25, nrow=5)
matrix.a

##      [,1] [,2] [,3] [,4] [,5]
## [1,]    1    6   11   16   21
## [2,]    2    7   12   17   22
## [3,]    3    8   13   18   23
## [4,]    4    9   14   19   24
## [5,]    5   10   15   20   25

matrix.b <- matrix(50:74, nrow=5)
matrix.b

##      [,1] [,2] [,3] [,4] [,5]
## [1,]   50   55   60   65   70
## [2,]   51   56   61   66   71
## [3,]   52   57   62   67   72
## [4,]   53   58   63   68   73
## [5,]   54   59   64   69   74

#Combine matrix a and b by column.
matrix.c <- cbind(matrix.a, matrix.b)
matrix.c

##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## [1,]    1    6   11   16   21   50   55   60   65    70
## [2,]    2    7   12   17   22   51   56   61   66    71
## [3,]    3    8   13   18   23   52   57   62   67    72
## [4,]    4    9   14   19   24   53   58   63   68    73
## [5,]    5   10   15   20   25   54   59   64   69    74

#Combine matrix a and b by row.
matrix.d <- rbind(matrix.a, matrix.b)
matrix.d

##       [,1] [,2] [,3] [,4] [,5]
##  [1,]    1    6   11   16   21
##  [2,]    2    7   12   17   22
##  [3,]    3    8   13   18   23
##  [4,]    4    9   14   19   24
##  [5,]    5   10   15   20   25
##  [6,]   50   55   60   65   70
##  [7,]   51   56   61   66   71
##  [8,]   52   57   62   67   72
##  [9,]   53   58   63   68   73
## [10,]   54   59   64   69   74

#Combine column 1 in matrix a with column 1 of matrix b.
matrix.col1 <- cbind(matrix.a[,c(1)],
                     matrix.b[,c(1)])
matrix.col1

##      [,1] [,2]
## [1,]    1   50
## [2,]    2   51
## [3,]    3   52
## [4,]    4   53
## [5,]    5   54

#Combine row 5 in matrix a with row 3 in matrix b.
matrix.row <- rbind(matrix.a[c(5),], 
                    matrix.b[c(3),])
matrix.row

##      [,1] [,2] [,3] [,4] [,5]
## [1,]    5   10   15   20   25
## [2,]   52   57   62   67   72
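What happens when the dimensions do not line up? A sketch with two small made-up matrices, m1 (2 x 3) and m2 (2 x 2): they share a row count, so cbind works, but their column counts differ, so rbind stops with an error. tryCatch() is used here only so the error message is captured instead of halting the script.

```r
m1 <- matrix(1:6, nrow = 2)   # 2 x 3
m2 <- matrix(1:4, nrow = 2)   # 2 x 2

dim(cbind(m1, m2))            # 2 5: both have 2 rows, so the columns sit side by side

# rbind needs equal numbers of columns (3 vs 2 here), so R raises an error
rbind.result <- tryCatch(rbind(m1, m2),
                         error = function(e) conditionMessage(e))
rbind.result                  # the error message as a character string
```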

Mathematical operations using matrices

First we will go over addition and subtraction. Recall that we can only add and subtract matrices with the same dimensions (the same number of rows and the same number of columns); R then adds or subtracts element by element.

matrix.a <- matrix(1, ncol=5, nrow=5)
matrix.a

##      [,1] [,2] [,3] [,4] [,5]
## [1,]    1    1    1    1    1
## [2,]    1    1    1    1    1
## [3,]    1    1    1    1    1
## [4,]    1    1    1    1    1
## [5,]    1    1    1    1    1

matrix.b <- matrix(5, ncol=5, nrow=5)
matrix.b

##      [,1] [,2] [,3] [,4] [,5]
## [1,]    5    5    5    5    5
## [2,]    5    5    5    5    5
## [3,]    5    5    5    5    5
## [4,]    5    5    5    5    5
## [5,]    5    5    5    5    5

matrix.a - matrix.b

##      [,1] [,2] [,3] [,4] [,5]
## [1,]   -4   -4   -4   -4   -4
## [2,]   -4   -4   -4   -4   -4
## [3,]   -4   -4   -4   -4   -4
## [4,]   -4   -4   -4   -4   -4
## [5,]   -4   -4   -4   -4   -4

matrix.b - matrix.a

##      [,1] [,2] [,3] [,4] [,5]
## [1,]    4    4    4    4    4
## [2,]    4    4    4    4    4
## [3,]    4    4    4    4    4
## [4,]    4    4    4    4    4
## [5,]    4    4    4    4    4

matrix.a + matrix.b

##      [,1] [,2] [,3] [,4] [,5]
## [1,]    6    6    6    6    6
## [2,]    6    6    6    6    6
## [3,]    6    6    6    6    6
## [4,]    6    6    6    6    6
## [5,]    6    6    6    6    6
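The other arithmetic operators also work element by element: * and / multiply and divide matching entries, and a single number is applied to every entry. A sketch with the same kind of matrices, named A and B here:

```r
A <- matrix(1, ncol = 5, nrow = 5)
B <- matrix(5, ncol = 5, nrow = 5)
A * B      # element-wise product: every entry is 1 * 5 = 5
B / A      # element-wise division: every entry is 5 / 1 = 5
A + 10     # a single number is added to every entry: 11
```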

To multiply matrices we need the left matrix to have the same number of columns as the number of rows in the right matrix. In R, * multiplies two matrices element by element, so for matrix multiplication we use %*% instead.
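A sketch of the dimension rule with two made-up matrices: A is 2 x 3 and B is 3 x 4, so the inner dimensions (3 and 3) match and the product has the outer dimensions, 2 x 4. Each entry of the product is a row of A multiplied by a column of B and summed.

```r
A <- matrix(1, nrow = 2, ncol = 3)   # 2 x 3, all ones
B <- matrix(2, nrow = 3, ncol = 4)   # 3 x 4, all twos

ncol(A) == nrow(B)                   # TRUE, so A %*% B is defined
AB <- A %*% B
dim(AB)                              # 2 4: rows of A, columns of B
AB[1, 1]                             # 6 = 1*2 + 1*2 + 1*2
```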

Take note of what happens when we try to do the following operations:

$CD$ : matrix.c %*% matrix.d
$DC$ : matrix.d %*% matrix.c

matrix.c <- matrix(3, ncol=4, nrow=5)
matrix.c

##      [,1] [,2] [,3] [,4]
## [1,]    3    3    3    3
## [2,]    3    3    3    3
## [3,]    3    3    3    3
## [4,]    3    3    3    3
## [5,]    3    3    3    3

dim(matrix.c)

## [1] 5 4

matrix.d <- matrix(7, ncol=5, nrow=3)
matrix.d

##      [,1] [,2] [,3] [,4] [,5]
## [1,]    7    7    7    7    7
## [2,]    7    7    7    7    7
## [3,]    7    7    7    7    7

dim(matrix.d)

## [1] 3 5

matrix.c %*% matrix.d 

matrix.d %*% matrix.c

Simple Linear Regression Example
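A minimal sketch of how the matrix tools above give simple linear regression. The example is our choice, not necessarily the one used in the lab: regress mpg on wt in R's built-in mtcars data by computing the least-squares estimate $\hat\beta = (X^\top X)^{-1} X^\top y$, where the design matrix $X$ has a column of 1's for the intercept, and compare the result with lm().

```r
data(mtcars)
y <- mtcars$mpg                          # response
X <- cbind(1, mtcars$wt)                 # design matrix: intercept column and wt

# Least-squares estimate (X'X)^{-1} X'y; t() transposes, solve() inverts
beta.hat <- solve(t(X) %*% X) %*% t(X) %*% y
beta.hat                                 # intercept, then slope

# The same coefficients from R's linear model function
coef(lm(mpg ~ wt, data = mtcars))
```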

Practice 1

1. Create a vector with the numbers from 0 to 20 in steps of 5. What is the length of the vector?
2. Create a vector of 20 random numbers drawn from a normal distribution with a mean of 0 and a standard deviation of 1. Is the mean of this vector zero?
3. a) Create a matrix that includes the numbers 1-6, with the following dimensions: $2 \times 3$. What number is in the first row, second column?
   b) Create another matrix (give it a different name) that includes the numbers 1-6 with the following dimensions: $3 \times 2$.
   c) Before multiplying these matrices in R, multiply them by hand on paper. What will be the dimensions of the resulting matrix?
   d) Use R to multiply the matrices that you created in a) and b) and check your results.

Reading and Writing Files

The Working Directory

To open files in R we need to tell R which directory our data files are stored in, by setting the working directory. There are two ways to do this: using code or via the dropdown menus (these vary between Windows and Mac).

setwd("~/Dropbox/MathCamp/2020/Lecture2/Lab2/")
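getwd() shows the current working directory, which makes it easy to check where R is looking and to switch back afterwards. This sketch temporarily moves to R's temporary directory (tempdir()) purely for illustration.

```r
old.wd <- getwd()                           # where R currently looks for files
setwd(tempdir())                            # move somewhere else
getwd()
file.exists("Seattle_Pet_Licenses.csv")     # FALSE unless the file is in this folder
setwd(old.wd)                               # change back
```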

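To set the working directory for an .Rmd document, include the following line of code, typically in the setup chunk. knitr otherwise evaluates each chunk relative to the folder containing the .Rmd file, so a setwd() call in one chunk generally does not carry over to the next: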

knitr::opts_knit$set(root.dir = '~/Dropbox/MathCamp/2020/Lecture2/Lab2')

There are many ways to read files into R, depending on the file type.

Read and Write .csv

?read.csv
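# stringsAsFactors = TRUE reads text columns as factors, as in the summary() output below
# (the default changed to FALSE in R 4.0.0)
data <- read.csv("Seattle_Pet_Licenses.csv", stringsAsFactors = TRUE)
write.csv(data, file = 'Seattle_Pets_copy.csv',
          row.names = FALSE)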
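A self-contained check of the round trip, using a made-up three-row data frame and a temporary file in place of the Seattle data. It also shows why row.names = FALSE is worth setting: without it, the row names are written as an extra unnamed column, which read.csv() then calls X.

```r
pets <- data.frame(id = 1:3, species = c("Cat", "Dog", "Cat"),
                   stringsAsFactors = FALSE)           # toy data
f <- tempfile(fileext = ".csv")

write.csv(pets, file = f, row.names = FALSE)
pets2 <- read.csv(f, stringsAsFactors = FALSE)
all.equal(pets, pets2)        # TRUE: nothing was added or lost

write.csv(pets, file = f)     # row names are written too (the default)
names(read.csv(f))            # "X" "id" "species"
```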

Read and write R data files, .rda

data_copy <- data
save(data_copy, file = 'SeattlePets.rda') 
rm(data_copy)

What is the name of the data set that is loaded by the line of code below?

load('SeattlePets.rda')
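If you are not sure what a .rda file contains, load() invisibly returns the names of the objects it restored. A sketch with a made-up object obj.a and a temporary file:

```r
obj.a <- 1:3
f <- tempfile(fileext = ".rda")
save(obj.a, file = f)
rm(obj.a)

loaded.names <- load(f)   # restores obj.a and returns its name
loaded.names              # "obj.a"
obj.a                     # 1 2 3
```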

Stata files, .dta

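To load Stata data files we need either the foreign package or the haven package. Note that foreign's read.dta() only reads files saved in Stata version 12 format or earlier; haven's read_dta() also handles newer versions.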

library(foreign)
write.dta(data_copy, file = 'SeattlePets.dta') #Save as a stata data frame
rm(data_copy)

data_copy <- read.dta('SeattlePets.dta')

Recall that if you have not installed the foreign package, you can do so using the following line of code (foreign is a recommended package that usually comes with R, so this is rarely needed).

install.packages('foreign', dependencies = TRUE)

Other file types

Try Googling! Chances are there’s a package for the file type of your choice. I’ve loaded ASCII, .txt, .xls and .xlsx files, among others.

`data.frame`

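A data.frame is a type of R object used for storing data as a table: each column is a vector (one variable) and all columns have the same length. Unlike a matrix, whose entries must all share one type, different columns of a data frame can hold different types, so it can store non-numeric data (character, logical, factor) alongside numbers.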
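A small made-up data frame shows columns of different types side by side. str() lists each column's type, and as.matrix() shows what a matrix does with the same data (everything becomes character).

```r
pets <- data.frame(
  name     = c("Milo", "Kiwi", "Ziggy"),
  species  = c("Cat", "Dog", "Cat"),
  age      = c(3, 7, 2),
  licensed = c(TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)
str(pets)            # one line per column, with its type
pets$age             # each column is an ordinary vector
as.matrix(pets)      # a matrix has one type, so all entries become character
```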

Let’s go through some commands for exploring and viewing data frames.

Test whether an object is a data.frame object.

is.data.frame(data)

## [1] TRUE

is.data.frame(data_copy)

## [1] TRUE

rm(data_copy)

View variable names.

# VARIABLE NAMES
names(data)

## [1] "License.Issue.Date" "License.Number"     "Animal.s.Name"     
## [4] "Species"            "Primary.Breed"      "Secondary.Breed"   
## [7] "ZIP.Code"

colnames(data)

## [1] "License.Issue.Date" "License.Number"     "Animal.s.Name"     
## [4] "Species"            "Primary.Breed"      "Secondary.Breed"   
## [7] "ZIP.Code"

rownames(data)

Find dimensions.

dim(data) # this gives rows and then columns (n X p)

## [1] 51754     7

nrow(data)

## [1] 51754

ncol(data)

## [1] 7

length(data) # the number of columns for a data frame; NOT ADVISED TO USE WITH MATRICES OR DATA FRAMES

## [1] 7
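Why length() is risky here: for a matrix it counts every entry, while for a data frame it counts columns, so the same function answers two different questions. nrow() and ncol() are unambiguous. The objects below are made up.

```r
m <- matrix(1:12, nrow = 3)
length(m)                          # 12: the number of entries
dim(m)                             # 3 4

df.small <- data.frame(a = 1:3, b = 4:6)
length(df.small)                   # 2: the number of columns
nrow(df.small)                     # 3
ncol(df.small)                     # 2
```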

Data Manipulation

Like we did with vectors and matrices, we may want to select or view partial data frames.

Whether you go down the base R or tidyr/dplyr path is up to you, but I want you to have some familiarity with both.

Vignette base R vs tidyverse:
tidyr, dplyr cheat sheet: https://rstudio.com/wp-content/uploads/2015/02/data-wrangling-cheatsheet.pdf

Selecting variables

In base R, we select a column in one of two ways:

# Load dplyr and tidyr for the tidyverse examples that follow
library(dplyr)
library(tidyr)
data$Species
data[ , c("Species")]

In the Hadleyverse we use the select function:

select(data, Species)
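The two base R ways above return the column itself as a vector, while dplyr's select() returns a data frame. Base R can also keep the data frame structure, with drop = FALSE or single-bracket indexing. A sketch using the built-in mtcars data so it runs without the Seattle file:

```r
data(mtcars)
class(mtcars$mpg)                      # "numeric": a plain vector
class(mtcars[, "mpg"])                 # "numeric": one column is dropped to a vector
class(mtcars[, "mpg", drop = FALSE])   # "data.frame"
class(mtcars["mpg"])                   # "data.frame"
```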

Subsetting data

In base R, we subset data using Boolean logic tests. Here is a new data.frame of all the observations whose species is "Cat".

head(data)

##   License.Issue.Date License.Number Animal.s.Name Species      Primary.Breed
## 1      April 19 2003         200097   Tinkerdelle     Cat Domestic Shorthair
## 2   February 07 2006          75432        Pepper     Cat               Manx
## 3        May 21 2014         727943        Ashley     Cat Domestic Shorthair
## 4        May 08 2015         833836          Lulu     Cat             LaPerm
## 5        May 13 2015         361031        My Boy     Cat       Russian Blue
## 6       July 21 2015         203480        Rocket     Cat Domestic Shorthair
##   Secondary.Breed ZIP.Code
## 1                    98116
## 2             Mix    98103
## 3                    98115
## 4                    98136
## 5                    98121
## 6                    98144

table(data$Species)

## 
##   Cat   Dog  Goat   Pig 
## 16829 34882    38     5

cat.base <- data[data$Species == "Cat", ]
dim(cat.base)

## [1] 16829     7

head(cat.base)

##   License.Issue.Date License.Number Animal.s.Name Species      Primary.Breed
## 1      April 19 2003         200097   Tinkerdelle     Cat Domestic Shorthair
## 2   February 07 2006          75432        Pepper     Cat               Manx
## 3        May 21 2014         727943        Ashley     Cat Domestic Shorthair
## 4        May 08 2015         833836          Lulu     Cat             LaPerm
## 5        May 13 2015         361031        My Boy     Cat       Russian Blue
## 6       July 21 2015         203480        Rocket     Cat Domestic Shorthair
##   Secondary.Breed ZIP.Code
## 1                    98116
## 2             Mix    98103
## 3                    98115
## 4                    98136
## 5                    98121
## 6                    98144

In the Hadleyverse one would use the filter function.

cat.tidy <- filter(data, Species == "Cat")
dim(cat.tidy)
head(cat.tidy)

We can also use what is called the pipe, %>% (from the magrittr package, made available when dplyr is loaded), to do the same operation:

cat.tidy2 <- data %>% filter(Species == "Cat" )
dim(cat.tidy2)
head(cat.tidy2)

The pipe can also chain multiple sequential operations.

Note: you must link the sequential functions with %>%. To keep your code clean you will probably want to spread a pipeline over several lines, but each %>% must come at the end of a line; otherwise R treats the line as a complete command and ends your operation there. What happens if you run this chunk of code?

# %>% at the start of the second line
data %>% filter(Species == "Cat" )
  %>% select(Species)

Compare it with this chunk, where %>% ends the first line:

data %>% filter(Species == "Cat" ) %>%
  select(Species)
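For comparison, a chain of filter() and select() corresponds to a single bracket call in base R: the row condition goes before the comma and the column names after it. A sketch on the built-in mtcars data (our example, not the lab's):

```r
data(mtcars)
# Base R version of: mtcars %>% filter(cyl == 4) %>% select(mpg, wt)
four.cyl <- mtcars[mtcars$cyl == 4, c("mpg", "wt")]
dim(four.cyl)        # 11 2: eleven 4-cylinder cars, two columns
head(four.cyl)
```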

Making a new variable

caf.data <- read.csv('caffeine.csv', header = TRUE)
head(caf.data)

caf.data$CaffKg <- caf.data$Caffeine/1000

# dplyr
caf.dplyr <- mutate(caf.data,
                     CaffKg = Caffeine/1000)
head(caf.dplyr)

caf.dplyr <- caf.dplyr %>% 
  mutate(CaffKg = Caffeine/1000)
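New variables do not have to be simple rescalings. A sketch that creates a categorical variable from a numeric one with ifelse(), using mtcars (wt is weight in 1000 lbs); the 3.5 cut-off and the name heavy are arbitrary choices for illustration. With dplyr the same line would go inside mutate().

```r
data(mtcars)
# ifelse() is vectorized: it checks the condition for every row at once
mtcars$heavy <- ifelse(mtcars$wt > 3.5, "heavy", "light")
table(mtcars$heavy)
```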

Summary statistics

caf.sum.base <- data.frame(CaffMean = mean(caf.data$Caffeine),
                           CaffSd = sd(caf.data$Caffeine))
head(caf.sum.base)

##   CaffMean   CaffSd
## 1 39.32504 5.517254

# Use the "aggregate" function
## Column names might have to be changed afterwards
caf.sum.base <- aggregate(formula = 
                            Caffeine ~ 1, 
                           data = caf.data,
                           FUN = function(x) c(mean = mean(x), sd = sd(x)))
head(caf.sum.base)

##   Caffeine.mean Caffeine.sd
## 1     39.325042    5.517254

caf.sum.dplyr <- summarise(caf.data, 
                            CaffMean = mean(Caffeine),
                            CaffSD  = sd(Caffeine))
head(caf.sum.dplyr)

caf.sum.dplyr <- summarise_at(caf.data, 
                               .vars = c("Caffeine"), 
                               .funs = c("mean", "sd"))
names(caf.sum.dplyr)
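To get the same statistics for several columns at once in base R, sapply() can apply a small function to each column of a data frame. A sketch with two mtcars columns (our choice); the result is a matrix with one column per variable and one row per statistic.

```r
data(mtcars)
caf.like.sum <- sapply(mtcars[, c("mpg", "wt")],
                       function(v) c(mean = mean(v), sd = sd(v)))
caf.like.sum
```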

Summary statistics by group

In many cases it’s inconsequential whether you use base R or the tidyverse. tidyr and dplyr functions are often a bit faster than their base R equivalents, but I find the summarise function in the tidyverse to be MUCH, MUCH slower than aggregate in base R.

data(mtcars)
mtcars.sum.by <- aggregate(formula = cbind(mpg, wt) ~ cyl + gear, 
          data = mtcars, 
          FUN = function(x){
            c(mean = mean(x), sd = sd(x))
          },
          drop = T)
mtcars.sum.by

##   cyl gear   mpg.mean     mpg.sd   wt.mean     wt.sd
## 1   4    3 21.5000000         NA 2.4650000        NA
## 2   6    3 19.7500000  2.3334524 3.3375000 0.1732412
## 3   8    3 15.0500000  2.7743959 4.1040833 0.7683069
## 4   4    4 26.9250000  4.8073604 2.3781250 0.6006243
## 5   6    4 19.7500000  1.5524175 3.0937500 0.4131460
## 6   4    5 28.2000000  3.1112698 1.8265000 0.4433560
## 7   6    5 19.7000000         NA 2.7700000        NA
## 8   8    5 15.4000000  0.5656854 3.3700000 0.2828427

mtcars$total <- 1
mtcars.sum.by2 <- aggregate(formula = total ~ cyl + gear, 
          data = mtcars, 
          FUN = sum, drop = T)
mtcars.sum.by2

##   cyl gear total
## 1   4    3     1
## 2   6    3     2
## 3   8    3    12
## 4   4    4     8
## 5   6    4     4
## 6   4    5     2
## 7   6    5     1
## 8   8    5     2

table(mtcars$cyl, mtcars$gear)

##    
##      3  4  5
##   4  1  8  2
##   6  2  4  1
##   8 12  0  2

mtcars.sum.dplyr <- mtcars %>% 
  group_by(cyl, gear) %>% 
  summarise(mpg.mean = mean(mpg),
            mpg.sd = sd(mpg),
            wt.mean = mean(wt),
            wt.sd = sd(wt),
            total = n()) %>% 
  ungroup()
mtcars.sum.dplyr
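For a single variable and a single grouping variable, base R's tapply() is a compact alternative. This sketch checks that it matches aggregate() for mean mpg by number of cylinders.

```r
data(mtcars)
mpg.by.cyl <- tapply(mtcars$mpg, mtcars$cyl, mean)
mpg.by.cyl        # named vector: one mean each for 4, 6 and 8 cylinders

agg <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)
all.equal(as.numeric(mpg.by.cyl), agg$mpg)   # TRUE: same values, same order
```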

`summary()`

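The summary() function summarizes each variable in a data set according to its class: for numeric variables it reports the minimum, quartiles, mean and maximum, and for factors it reports the counts of the most common levels.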

summary(data)

##         License.Issue.Date License.Number  Animal.s.Name   Species     
##  July 24 2018    :  346    21091  :    2   Lucy   :  434   Cat :16829  
##  November 07 2017:  291    S100636:    2   Luna   :  395   Dog :34882  
##  January 16 2018 :  286    S102467:    2   Charlie:  376   Goat:   38  
##  August 07 2018  :  276    S104231:    2   Bella  :  327   Pig :    5  
##  December 05 2017:  239    S104449:    2          :  294               
##  March 20 2018   :  237    S104953:    2   Daisy  :  264               
##  (Other)         :50079    (Other):51742   (Other):49664               
##                Primary.Breed                Secondary.Breed     ZIP.Code    
##  Domestic Shorthair   : 9819                        :26842   98115  : 4537  
##  Retriever, Labrador  : 4636   Mix                  :13511   98103  : 4394  
##  Domestic Medium Hair : 2030   Poodle, Standard     : 1149   98117  : 3804  
##  Retriever, Golden    : 1872   Poodle, Miniature    :  909   98125  : 2798  
##  Chihuahua, Short Coat: 1859   Retriever, Labrador  :  885   98122  : 2480  
##  Domestic Longhair    : 1317   Chihuahua, Short Coat:  423   98107  : 2426  
##  (Other)              :30221   (Other)              : 8035   (Other):31315

min(caf.data$Caffeine)

## [1] 28.43

max(caf.data$Caffeine)

## [1] 52.54

mean(caf.data$Caffeine)

## [1] 39.32504

sd(caf.data$Caffeine)

## [1] 5.517254

var(caf.data$Caffeine)

## [1] 30.44009

sqrt(var(caf.data$Caffeine)) # same as the sd

## [1] 5.517254

median(caf.data$Caffeine)

## [1] 38.78

quantile(caf.data$Caffeine,0.5)

##   50% 
## 38.78

quantile(caf.data$Caffeine,0.25)

##    25% 
## 34.865

quantile(caf.data$Caffeine,0.75)

##    75% 
## 43.815

quantile(caf.data$Caffeine,c(0.25,0.5,0.75))

##    25%    50%    75% 
## 34.865 38.780 43.815
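The interquartile range is the 75% quantile minus the 25% quantile, and IQR() computes it directly. A sketch on a small made-up vector, small enough to check by hand with R's default quantile rule (type 7): the p-th quantile sits at position 1 + (n - 1)p in the sorted data, interpolating between neighbours.

```r
x.demo <- c(2, 4, 4, 5, 7, 9)
quantile(x.demo, c(0.25, 0.5, 0.75))   # 4.0 4.5 6.5
median(x.demo)                          # 4.5, the same as the 50% quantile
IQR(x.demo)                             # 2.5 = 6.5 - 4.0
```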

Math Camp 2020: R Lab 2

Jessica Godwin

September 22, 2020

