Learning Objectives

  • Know the distinctions between how R handles different data types (numbers, strings, and logicals).
  • Describe what a vector is.
  • Create vectors of different data types.
  • Use indexing to subset and modify specific portions of vectors.

1 Data types

Every programming language has the ability to store data of different types. R recognizes several important basic data types (there are others, but these cover most cases):

Type      | Description                               | Example
double    | Number with a decimal place (aka “float”) | 3.14, 1.61803398875
integer   | Number without a decimal place            | 1L, 42L
character | Text in quotes (aka “string”)             | "this is some text", "3.14"
logical   | True or False (for comparing things)      | TRUE, FALSE

If you want to check which type a value is, you can use the function typeof(). For example:

typeof("hello")
## [1] "character"

1.1 Numeric types

Numbers in R have the numeric data type, which is also the default computational type. There are two types of numbers:

  • Integers
  • Non-integers (aka “double” or “float”)

The difference is that integers don’t have decimal values. A non-integer in R has the type “double”:

typeof(3.14)
## [1] "double"

By default, R assumes all numbers have a decimal place, even if it looks like an integer:

typeof(3)
## [1] "double"

In this case, R assumes that 3 is really 3.0. To make sure R knows you really do mean to create an integer, you have to add an L to the end of the number (see footnote 1):

typeof(3L)
## [1] "integer"
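
The distinction starts to matter once you do arithmetic. As a small sketch (the variable names are only illustrative), a result stays an integer only when both operands are integers, and division always produces a double:

```r
int_sum   <- 3L + 4L   # both operands are integers
mixed_sum <- 3L + 4    # 4 is a double, so the result is a double
int_ratio <- 6L / 3L   # the / operator always returns a double

typeof(int_sum)
## [1] "integer"
typeof(mixed_sum)
## [1] "double"
typeof(int_ratio)
## [1] "double"
```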

1.2 Character types

A character value is used to represent string values in R. Anything put between single quotes ('') or double quotes ("") will be stored as a character. For example:

typeof('3')
## [1] "character"

Notice that even though the value looks like a number, because it is inside quotes R interprets it as a character. If you mistakenly thought it was a number, R will gladly return an error when you try to do a numerical operation with it:

'3' + 7
## Error in "3" + 7: non-numeric argument to binary operator

It doesn’t matter if you use single or double quotes to create a character. The only time it does matter is if the character contains a quote symbol itself. For example, if you wanted to type the word "don't", you should use double quotes so that R knows the single quote is part of the character:

typeof("don't")
## [1] "character"

If you use single quotes, you’ll get an error because R reads 'don' as a complete character value and does not know what to do with the t that follows it:

typeof('don't')
## Error: <text>:1:13: unexpected symbol
## 1: typeof('don't
##                 ^

We will go into much more detail about working with character values later on in Week 7.

1.3 Logical types

Logical data only have two values: TRUE or FALSE. Note that these are not in quotes and are in all caps.

typeof(TRUE)
## [1] "logical"
typeof(FALSE)
## [1] "logical"

R uses these two special values to help answer questions about logical statements. For example, let’s compare whether 1 is greater than 2:

1 > 2
## [1] FALSE

R returns the value FALSE because 1 is not greater than 2. If I flip the question to whether 1 is less than 2, I’ll get TRUE:

1 < 2
## [1] TRUE

1.4 Special values

In addition to the four main data types mentioned, there are a few additional “special” values: Inf, NaN, NA and NULL.

Infinity: Inf corresponds to a value that is infinitely large (or infinitely large in the negative direction with -Inf). The easiest way to get Inf is to divide a positive number by 0:

1/0
## [1] Inf

Not a Number: NaN is short for “not a number”, and it’s basically a reserved keyword that means “there isn’t a mathematically defined number for this.” For example:

0/0
## [1] NaN

Not available: NA indicates that the value that is “supposed” to be stored here is missing. We’ll see these much more when we start getting into data structures like vectors and data frames.

No value: NULL asserts that the variable genuinely has no value whatsoever, or does not even exist.
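
Each of these special values has a matching base R function for detecting it. The sketch below shows them side by side, along with one common pitfall: comparing against NA with == does not return TRUE or FALSE.

```r
is.infinite(1/0)
## [1] TRUE
is.nan(0/0)
## [1] TRUE
is.na(NA)
## [1] TRUE
is.null(NULL)
## [1] TRUE

# Comparing with == does not work for NA: the result is itself missing
NA == NA
## [1] NA
```

This is why is.na() is the usual way to check for missing values.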

1.5 Converting data types

You can convert an object from one type to another using as.______(), replacing “______” with a data type:

  • character
  • logical
  • numeric / double / integer

Convert numeric types:

as.numeric("3.1415")
## [1] 3.1415
as.double("3.1415")
## [1] 3.1415
as.integer("3.1415")
## [1] 3

Convert non-numeric types:

as.character(3.1415)
## [1] "3.1415"
as.logical(3.1415)
## [1] TRUE

A few notes to keep in mind:

  1. When converting from a numeric to a logical, as.logical() returns FALSE for 0 and TRUE for any other number (a missing value stays NA).

    as.logical(7)
    ## [1] TRUE
    as.logical(0)
    ## [1] FALSE

    The reverse conversion also works: TRUE becomes 1 and FALSE becomes 0:

    as.numeric(TRUE)
    ## [1] 1
    as.numeric(FALSE)
    ## [1] 0
  2. Not everything can be converted. For example, if you try to coerce a character that contains letters into a number, R will return NA, because it doesn’t know what number to choose:

    as.numeric('foo')
    ## Warning: NAs introduced by coercion
    ## [1] NA
  3. The as.integer() function drops the decimal part (it truncates toward zero), so for positive numbers it behaves the same as floor():

    as.integer(3.14)
    ## [1] 3
    as.integer(3.99)
    ## [1] 3
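
One caveat is worth checking with negative numbers, where as.integer() and floor() give different answers. The comparison below is a small sketch using base R rounding functions:

```r
as.integer(-3.99)  # truncates toward zero
## [1] -3
floor(-3.99)       # rounds down, toward negative infinity
## [1] -4
trunc(-3.99)       # same direction as as.integer(), but returns a double
## [1] -3
round(3.99)        # rounds to the nearest whole number
## [1] 4
```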

1.6 Checking data types

Similar to the as.______() format, you can check if an object is a specific data type using is.______(), replacing “______” with a data type.

Checking numeric types:

is.numeric(3.1415)
## [1] TRUE
is.double(3.1415)
## [1] TRUE
is.integer(3.1415)
## [1] FALSE

Checking non-numeric types:

is.character(3.1415)
## [1] FALSE
is.logical(3.1415)
## [1] FALSE

One thing you’ll notice is that is.integer() often gives you a surprising result. For example, is.integer(7) returns FALSE. This is because numbers are doubles by default in R, so even though 7 looks like an integer, R thinks it’s a double.

The safer way to check if a number is an integer in value is to compare it against itself converted into an integer:

7 == as.integer(7)
## [1] TRUE
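
If you need this check often, you could wrap it in a small function. The helper below is hypothetical (is_whole_number is not part of base R), and it compares against round() instead of as.integer(), which is an assumption made here so that the check also works for numbers too large to store as an integer:

```r
# Hypothetical helper, not a base R function
is_whole_number <- function(x) {
  x == round(x)
}

is_whole_number(7)
## [1] TRUE
is_whole_number(7.5)
## [1] FALSE
is_whole_number(3e10)  # too large for as.integer(), but still works here
## [1] TRUE
```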

2 Vectors

A vector is the most common and basic data structure in R, and is pretty much the workhorse of R. It’s basically just an ordered collection of values of the same type, mainly either numbers or characters.


2.1 Creating vectors

The most basic way of creating a vector is to use the c() function (“c” is for “concatenate”):

x <- c(1, 2, 3)
length(x)
## [1] 3

You can also create vectors by making a sequence of numbers using the : operator or the seq() function:

1:5
## [1] 1 2 3 4 5
seq(1, 10)
##  [1]  1  2  3  4  5  6  7  8  9 10
seq(1, 10, by = 2)
## [1] 1 3 5 7 9
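
Two other variations come up often and are sketched here: asking seq() for a fixed number of evenly spaced values with the length.out argument, and using seq_len() to count from 1 up to a given number. Note also that the : operator produces integers, not doubles:

```r
seq(0, 1, length.out = 5)  # five evenly spaced values from 0 to 1
## [1] 0.00 0.25 0.50 0.75 1.00
seq_len(4)                 # the integers 1 through 4
## [1] 1 2 3 4
typeof(1:5)                # the : operator creates integers
## [1] "integer"
```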

You can also create a vector by using the rep() function, which replicates the same value n times:

y <- rep(5, 10) # The number 5 ten times
y
##  [1] 5 5 5 5 5 5 5 5 5 5
z <- rep("foo", 5) # The character "foo" five times
z
## [1] "foo" "foo" "foo" "foo" "foo"

In fact, you can use the rep() function to create longer vectors made up of repeated vectors:

rep(c(1, 2), 3) # Repeat the vector c(1, 2) three times
## [1] 1 2 1 2 1 2

If you add the each argument, rep() will repeat each element in the vector:

rep(c(1, 2), each = 3) # Repeat each element of the vector c(1, 2) three times
## [1] 1 1 1 2 2 2

You can see how long a vector is using the length() function:

length(y)
## [1] 10
length(z)
## [1] 5

2.2 Vector coercion

Each element in a vector must have the same type. If you mix types in a vector, R will coerce all the elements to the most flexible type present, in the order logical, integer, double, character.

If a vector has a single character element, R makes everything a character:

c(1, 2, "3")
## [1] "1" "2" "3"
c(TRUE, FALSE, "TRUE")
## [1] "TRUE"  "FALSE" "TRUE"

If a vector has numeric and logical elements, R makes everything a number:

c(1, 2, TRUE, FALSE)
## [1] 1 2 1 0

If a vector has integers and floats, R makes everything a float:

c(1L, 2, pi)
## [1] 1.000000 2.000000 3.141593
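
You can confirm which type R settled on by wrapping the vector in typeof(). This short sketch checks one step of the coercion order at a time:

```r
typeof(c(TRUE, 1L))   # logical + integer
## [1] "integer"
typeof(c(1L, 2.5))    # integer + double
## [1] "double"
typeof(c(2.5, "a"))   # double + character
## [1] "character"
```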

2.3 Deleting vectors

You can clear a vector by assigning NULL to it. The variable name still exists afterwards, but it no longer holds any values:

x <- seq(1, 10)
x
##  [1]  1  2  3  4  5  6  7  8  9 10
x <- NULL
x
## NULL
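
Assigning NULL empties the variable but leaves its name defined. If you want to remove the name itself, base R provides rm(). The sketch below uses a throwaway variable called scratch (a name chosen only for this example) and exists() to show the difference:

```r
scratch <- seq(1, 10)
scratch <- NULL
exists("scratch")  # the name is still defined; it just holds NULL
## [1] TRUE
rm(scratch)
exists("scratch")  # now the name itself is gone
## [1] FALSE
```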

2.4 Numeric vectors

Numeric vectors are vectors of numbers (either integers or doubles):

v <- c(pi, 7, 42, 365)
v
## [1]   3.141593   7.000000  42.000000 365.000000
typeof(v)
## [1] "double"

R has many built-in functions that are designed to give summary information about numeric vectors. Note that these functions take a vector of numbers and return a single value. Here are some common ones:

Function  | Description                  | Example
mean(x)   | Mean of values in x          | mean(c(1,2,3,4,5)) returns 3
median(x) | Median of values in x        | median(c(1,2,2,4,5)) returns 2
max(x)    | Max element in x             | max(c(1,2,3,4,5)) returns 5
min(x)    | Min element in x             | min(c(1,2,3,4,5)) returns 1
sum(x)    | Sums the elements in x       | sum(c(1,2,3,4,5)) returns 15
prod(x)   | Product of the elements in x | prod(c(1,2,3,4,5)) returns 120
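
One thing to watch for with these summary functions is missing data: if the vector contains an NA, the result is NA as well. The sketch below uses a made-up vector to show this, and the na.rm argument that tells the function to drop missing values first:

```r
values <- c(1, NA, 3)  # made-up data with one missing value
mean(values)
## [1] NA
mean(values, na.rm = TRUE)
## [1] 2
sum(values, na.rm = TRUE)
## [1] 4
```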

2.5 Character vectors

Character vectors are vectors where each element is a string:

stringVector <- c('oh', 'what', 'a', 'beautiful', 'morning')
stringVector
## [1] "oh"        "what"      "a"         "beautiful" "morning"
typeof(stringVector)
## [1] "character"

As we’ll see in the next lesson on strings, you can “collapse” a character vector into a single string using the str_c() function from the stringr package:

library(stringr)
str_c(stringVector, collapse = ' ')
## [1] "oh what a beautiful morning"
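
If you do not have stringr installed, base R’s paste() function does the same job through its own collapse argument. A minimal sketch, using a new variable name (words) chosen only for this example:

```r
words <- c('oh', 'what', 'a', 'beautiful', 'morning')
paste(words, collapse = ' ')
## [1] "oh what a beautiful morning"
```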

2.6 Logical vectors

Logical vectors contain only TRUE or FALSE elements:

logicalVector <- c(rep(TRUE, 3), rep(FALSE, 3))
logicalVector
## [1]  TRUE  TRUE  TRUE FALSE FALSE FALSE

If you add a numeric type to a logical vector, the logical elements will be converted to either a 1 for TRUE or 0 for FALSE:

c(logicalVector, 42)
## [1]  1  1  1  0  0  0 42
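
This conversion is handy in practice: because TRUE counts as 1 and FALSE as 0, you can use sum() to count the TRUE values in a logical vector and mean() to get their proportion. A small sketch with an illustrative vector named passed:

```r
passed <- c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE)
sum(passed)   # number of TRUE values
## [1] 3
mean(passed)  # proportion of TRUE values
## [1] 0.5
```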

Warning: If you add a character type to a logical vector, the logical elements will be converted to strings of "TRUE" and "FALSE". So even though they may still look like logical types, they aren’t:

y <- c(logicalVector, 'string')
y
## [1] "TRUE"   "TRUE"   "TRUE"   "FALSE"  "FALSE"  "FALSE"  "string"
typeof(y)
## [1] "character"

2.7 Comparing vectors

If you want to check if two vectors are identical (in that they contain all the same elements), you can’t use the typical == operator by itself. The reason is that the == operator is applied element-wise, so it will return a logical vector:

x <- c(1,2,3)
y <- c(1,2,3)
x == y
## [1] TRUE TRUE TRUE

Instead of getting one TRUE, you get a vector of TRUEs, because the individual elements are indeed equal. To compare if all the elements in the two vectors are identical, wrap the comparison inside the all() function:

all(x == y)
## [1] TRUE

Keep in mind that there are really two steps going on here: 1) x == y creates a logical vector of TRUEs and FALSEs based on element-wise comparisons, and 2) the all() function checks whether all of the values in the logical vector are TRUE.

You can also use the all() function to compare if other types of conditions are all TRUE for all elements in two vectors:

a <- c(1,2,3)
b <- -1*c(1,2,3)
all(a > b)
## [1] TRUE

In contrast to the all() function, the any() function will return TRUE if any of the elements in a vector are TRUE:

a <- c(1,2,3)
b <- c(-1,2,-3)
a == b
## [1] FALSE  TRUE FALSE
any(a == b)
## [1] TRUE

For most situations, the all() function works just fine for comparing vectors, but it only compares the elements in the vectors, not their attributes. In some situations, you might also want to check whether the attributes of the vectors, such as their names and data types, are the same. In this case, you should use the identical() function.

names(x) <- c('a', 'b', 'c')
names(y) <- c('one', 'two', 'three')
all(x == y) # Only compares the elements
## [1] TRUE
identical(x, y) # Also compares the names of the elements
## [1] FALSE

Notice that for the identical() function, you don’t need to add a conditional statement - you just provide it the two vectors you want to compare. This is because identical() by definition is comparing if two things are the same.
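
Two related cases are sketched below with illustrative variable names. First, identical() is strict about data types, so a double vector and an integer vector with the same values are not identical. Second, exact comparison of decimal numbers can fail because of floating-point rounding; base R’s all.equal() compares within a small tolerance instead:

```r
doubles  <- c(1, 2, 3)
integers <- 1:3
all(doubles == integers)      # the values match
## [1] TRUE
identical(doubles, integers)  # but the types differ (double vs. integer)
## [1] FALSE

0.1 + 0.2 == 0.3                   # floating-point rounding
## [1] FALSE
isTRUE(all.equal(0.1 + 0.2, 0.3))  # equal within a small tolerance
## [1] TRUE
```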

3 Accessing elements in a vector

You can access elements from a vector using brackets [] and indices inside the brackets. You can use integer indices (probably the most common way), character indices (by naming each element), and logical indices.

3.1 Using integer indices

Vector indices start from 1 (this is important - most programming languages start from 0):

x <- seq(1, 10)
x[1] # Returns the first element
## [1] 1
x[3] # Returns the third element
## [1] 3
x[length(x)] # Returns the last element
## [1] 10

You can access multiple elements by using a vector of indices inside the brackets:

x[1:3]     # Returns the first three elements
## [1] 1 2 3
x[c(2, 7)] # Returns the 2nd and 7th elements
## [1] 2 7

You can also use negative integers to remove elements, which returns all elements except those specified:

x[-1] # Returns everything except the first element
## [1]  2  3  4  5  6  7  8  9 10
x[-c(2, 7)] # Returns everything except the 2nd and 7th elements
## [1]  1  3  4  5  6  8  9 10

But you cannot mix positive and negative integers while indexing:

x[c(-2, 7)]
## Error in x[c(-2, 7)]: only 0's may be mixed with negative subscripts

If you try to use a float as an index, the decimal part is dropped (the value is truncated toward zero to a whole number):

x[3.1415] # Returns the 3rd element
## [1] 3
x[3.9999] # Still returns the 3rd element
## [1] 3
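
It also helps to know what happens at the edges. In the sketch below, asking for an index past the end of the vector returns NA, not an error, and an index of 0 is silently ignored:

```r
x <- seq(1, 10)
x[11]         # there is no 11th element, so R returns NA
## [1] NA
x[c(1, 0)]    # the 0 is dropped, leaving only the first element
## [1] 1
length(x[0])  # indexing with 0 alone gives an empty vector
## [1] 0
```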

3.2 Using character indices

You can name the elements in a vector and then use those names to access elements. To create a named vector, use the names() function:

x <- seq(5)
names(x) <- c('a', 'b', 'c', 'd', 'e')
x
## a b c d e 
## 1 2 3 4 5

You can also create a named vector by putting the names directly in the c() function:

x <- c('a' = 1, 'b' = 2, 'c' = 3, 'd' = 4, 'e' = 5)
x
## a b c d e 
## 1 2 3 4 5

Once your vector has names, you can then use those names as indices:

x['a'] # Returns the first element
## a 
## 1
x[c('a', 'c')] # Returns the 1st and 3rd elements
## a c 
## 1 3
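
Notice that the names travel along with the selected values. If you only want the value itself, the sketch below shows two base R options: double brackets for a single element, and unname() for the whole vector:

```r
x <- c('a' = 1, 'b' = 2, 'c' = 3)
names(x)   # the names as a character vector
## [1] "a" "b" "c"
x[['b']]   # double brackets return the value without its name
## [1] 2
unname(x)  # all values with the names removed
## [1] 1 2 3
```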

3.3 Using logical indices

When using a logical vector for indexing, the elements at the positions where the logical vector is TRUE are returned. This is helpful for filtering vectors based on conditions:

x <- seq(1, 10)
x > 5 # Create logical vector
##  [1] FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE
x[x > 5] # Put logical vector in brackets to keep only the elements where it is TRUE
## [1]  6  7  8  9 10

You can also use the which() function to find the numeric indices for which a condition is TRUE, and then use those indices to select elements:

which(x < 5) # Returns indices of TRUE elements
## [1] 1 2 3 4
x[which(x < 5)] # Use which to select elements based on a condition
## [1] 1 2 3 4
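
All three kinds of indices can also be used to modify a vector: put the brackets on the left side of the assignment arrow. A minimal sketch using integer and logical indices:

```r
x <- seq(1, 10)
x[1] <- 100              # replace the first element
x[c(2, 3)] <- c(20, 30)  # replace several elements at once
x[x < 6] <- 0            # replace every element that meets a condition
x
##  [1] 100  20  30   0   0   6   7   8   9  10
```

In the last assignment, only the remaining values 4 and 5 are below 6, so those are the two elements set to 0.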

4 Vectorized operations

Most base functions in R are “vectorized”, meaning that when you give them a vector, they perform the operation on each element in the vector.

4.1 Arithmetic operations

When you perform arithmetic operations on vectors, they are executed on an element-by-element basis:

x1 <- c(1, 2, 3)
x2 <- c(4, 5, 6)
# Addition
x1 + x2 # Returns (1+4, 2+5, 3+6)
## [1] 5 7 9
# Subtraction
x1 - x2 # Returns (1-4, 2-5, 3-6)
## [1] -3 -3 -3
# Multiplication
x1 * x2 # Returns (1*4, 2*5, 3*6)
## [1]  4 10 18
# Division
x1 / x2 # Returns (1/4, 2/5, 3/6)
## [1] 0.25 0.40 0.50

When performing vectorized operations, the vectors need to have the same length, or one of the vectors needs to be a single-value vector:

# Careful! Mismatched lengths will only give you a warning, but will still return a value:
x1 <- c(1, 2, 3)
x2 <- c(4, 5)
x1 + x2
## Warning in x1 + x2: longer object length is not a multiple of shorter object
## length
## [1] 5 7 7

What R does in these cases is repeat the shorter vector, so in the above case the last value is 3 + 4.

If you have a single-value vector, R will reuse it for every element of the other vector:

x1 <- c(1, 2, 3)
x2 <- c(4)
x1 + x2
## [1] 5 6 7
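
The same repeating behavior applies whenever the longer length is an exact multiple of the shorter one, and in that case R gives no warning at all. A short sketch with illustrative variable names:

```r
long_vec  <- c(1, 2, 3, 4)
short_vec <- c(10, 20)
long_vec + short_vec  # short_vec is reused as (10, 20, 10, 20)
## [1] 11 22 13 24
```

Because this happens silently, it is worth checking lengths with length() when two vectors are supposed to line up element by element.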

4.2 Sorting

You can reorder the arrangement of elements in a vector by using the sort() function:

a <- c(2, 4, 6, 3, 1, 5)
sort(a)
## [1] 1 2 3 4 5 6
sort(a, decreasing = TRUE)
## [1] 6 5 4 3 2 1

To get the index values of the sorted order, use the order() function:

order(a)
## [1] 5 1 4 2 6 3

These indices tell us that the first value in the sorted arrangement of vector a is element number 5 (which is a 1), the second value is element number 1 (which is a 2), and so on. If you use order() as the indices to the vector, you’ll get the sorted vector:

a[order(a)] # Same as sort(a)
## [1] 1 2 3 4 5 6
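
Where order() becomes more useful than sort() is when you want to rearrange one vector based on the values in another. The sketch below uses made-up scores and names to list players from the highest score to the lowest:

```r
scores  <- c(88, 95, 72)            # made-up data for illustration
players <- c('Ann', 'Bob', 'Cat')
ranking <- order(scores, decreasing = TRUE)
ranking
## [1] 2 1 3
players[ranking]  # players arranged from highest to lowest score
## [1] "Bob" "Ann" "Cat"
```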

Page sources:

Some content on this page has been modified from other courses, including:


  1. Why L? Well, it’s a bit complicated, but R supports complex numbers, which are denoted by i, so i was already taken. A quick answer is that R uses 32-bit long integers, so L stands for “long”.


LICENSE: CC-BY-SA