Learning Objectives
- Know how R handles different data types (numbers, strings, and logicals).
- Describe what a vector is.
- Create vectors of different data types.
- Use indexing to subset and modify specific portions of vectors.
Every programming language has the ability to store data of different types. R recognizes several important basic data types (there are others, but these cover most cases):
Type | Description | Example |
---|---|---|
double | Number with a decimal place (aka “float”) | 3.14, 1.61803398875 |
integer | Number without a decimal place | 1L, 42L |
character | Text in quotes (aka “string”) | "this is some text", "3.14" |
logical | True or False (for comparing things) | TRUE, FALSE |
If you want to check what type a value is, you can use the function typeof(). For example:
typeof("hello")
## [1] "character"
Numbers in R have the numeric data type, which is also the default computational type. There are two types of numbers:
- integer
- double
The difference is that integers don’t have decimal values. A non-integer in R has the type “double”:
typeof(3.14)
## [1] "double"
By default, R assumes all numbers have a decimal place, even if it looks like an integer:
typeof(3)
## [1] "double"
In this case, R assumes that 3 is really 3.0. To make sure R knows you really do mean to create an integer, you have to add an L to the end of the number (see footnote 1):
typeof(3L)
## [1] "integer"
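As a small illustration of how the two numeric types interact, the sketch below checks the type of a few arithmetic results. It uses only base R; the specific numbers are arbitrary.

```r
# Integer arithmetic stays integer only when both operands are integers
typeof(1L + 1L)    # "integer"
typeof(1L + 1.5)   # "double": the integer is promoted to a double
typeof(5L / 2L)    # "double": division with / always returns a double
typeof(5L %/% 2L)  # "integer": integer division keeps the integer type
```

In practice this means you rarely need to worry about the L suffix unless a function specifically requires an integer.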
A character value is used to represent string values in R. Anything put between single quotes ('') or double quotes ("") will be stored as a character. For example:
typeof('3')
## [1] "character"
Notice that even though the value looks like a number, because it is inside quotes R interprets it as a character. If you mistakenly treat it as a number, R will return an error when you try to do a numerical operation with it:
'3' + 7
## Error in "3" + 7: non-numeric argument to binary operator
It doesn’t matter if you use single or double quotes to create a character. The only time it does matter is if the character contains a quote symbol itself. For example, if you wanted to type the word "don't", you should use double quotes so that R knows the single quote is part of the character:
typeof("don't")
## [1] "character"
If you use single quotes, you’ll get an error because R reads 'don' as a complete character and doesn’t know what to do with the text that follows:
typeof('don't')
## Error: <text>:1:13: unexpected symbol
## 1: typeof('don't
##                 ^
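If you do need a quote symbol inside a string that uses the same kind of quote, another option is to escape it with a backslash. The sketch below uses only base R; the variable names s1, s2, and s3 are made up for illustration.

```r
# Escape an inner single quote with a backslash
s1 <- 'don\'t'
s2 <- "don't"
identical(s1, s2)  # TRUE: both hold the same five characters
nchar(s1)          # 5: the backslash is not part of the string

# Double quotes inside a double-quoted string are escaped the same way
s3 <- "She said \"hi\""
writeLines(s3)     # prints: She said "hi"
```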
We will go into much more detail about working with character values later on in Week 7.
Logical data only have two values: TRUE or FALSE. Note that these are not in quotes and are in all caps.
typeof(TRUE)
## [1] "logical"
typeof(FALSE)
## [1] "logical"
R uses these two special values to help answer questions about logical statements. For example, let’s compare whether 1 is greater than 2:
1 > 2
## [1] FALSE
R returns the value FALSE because 1 is not greater than 2. If I flip the question to whether 1 is less than 2, I’ll get TRUE:
1 < 2
## [1] TRUE
In addition to the four main data types mentioned, there are a few additional “special” types: Inf, NaN, NA and NULL.
Infinity: Inf corresponds to a value that is infinitely large (or infinitely negative with -Inf). The easiest way to get Inf is to divide a positive number by 0:
1/0
## [1] Inf
Not a Number: NaN is short for “not a number”, and it’s basically a reserved keyword that means “there isn’t a mathematically defined number for this.” For example:
0/0
## [1] NaN
Not available: NA indicates that the value that is “supposed” to be stored here is missing. We’ll see these much more when we start getting into data structures like vectors and data frames.
No value: NULL asserts that the variable genuinely has no value whatsoever, or does not even exist.
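Because these special values don’t behave like ordinary numbers in comparisons, base R provides dedicated functions for detecting them. A short sketch:

```r
is.infinite(1/0)  # TRUE
is.nan(0/0)       # TRUE
is.na(NA)         # TRUE
is.na(0/0)        # TRUE: NaN also counts as missing for is.na()
is.null(NULL)     # TRUE
NA == NA          # NA, not TRUE: comparing with a missing value is itself missing
```

The last line is the reason to prefer is.na() over == when checking for missing values.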
You can convert an object from one type to another using as.______(), replacing “______” with a data type:
- character
- logical
- numeric / double / integer
Convert numeric types:
as.numeric("3.1415")
## [1] 3.1415
as.double("3.1415")
## [1] 3.1415
as.integer("3.1415")
## [1] 3
Convert non-numeric types:
as.character(3.1415)
## [1] "3.1415"
as.logical(3.1415)
## [1] TRUE
A few notes to keep in mind:
When converting from a numeric to a logical, as.logical() will always return TRUE for any numeric value other than 0, for which it returns FALSE.
as.logical(7)
## [1] TRUE
as.logical(0)
## [1] FALSE
The reverse also holds: TRUE converts to 1 and FALSE converts to 0:
as.numeric(TRUE)
## [1] 1
as.numeric(FALSE)
## [1] 0
Not everything can be converted. For example, if you try to coerce a character that contains letters into a number, R will return NA, because it doesn’t know what number to choose:
as.numeric('foo')
## Warning: NAs introduced by coercion
## [1] NA
The as.integer() function drops the decimal part (it truncates toward zero), so for positive numbers it behaves the same as floor():
as.integer(3.14)
## [1] 3
as.integer(3.99)
## [1] 3
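One caveat worth checking yourself: as.integer() and floor() agree for positive numbers but differ for negative ones, because as.integer() drops the decimal part while floor() always rounds down. The sketch below compares them with base R’s trunc(); the vector name vals is made up.

```r
vals <- c(3.99, -3.99)
as.integer(vals)  # 3 -3  (decimal part dropped)
floor(vals)       # 3 -4  (always rounds down)
trunc(vals)       # 3 -3  (same values as as.integer, but stays a double)
```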
Similar to the as.______() format, you can check if an object is a specific data type using is.______(), replacing “______” with a data type.
Checking numeric types:
is.numeric(3.1415)
## [1] TRUE
is.double(3.1415)
## [1] TRUE
is.integer(3.1415)
## [1] FALSE
Checking non-numeric types:
is.character(3.1415)
## [1] FALSE
is.logical(3.1415)
## [1] FALSE
One thing you’ll notice is that is.integer() often gives you a surprising result. For example, is.integer(7) returns FALSE. This is because numbers are doubles by default in R, so even though 7 looks like an integer, R thinks it’s a double.
The safer way to check if a number is an integer in value is to compare it against itself converted into an integer:
7 == as.integer(7)
## [1] TRUE
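If you need this check often, you could wrap it in a small helper. The sketch below is one possible version; is_whole() is a hypothetical name, not a base R function, and it compares against trunc() rather than as.integer() so that it also works for numbers too large to store as an integer.

```r
# Hypothetical helper: TRUE if x is numeric and has no decimal part
is_whole <- function(x) {
  is.numeric(x) && all(x == trunc(x))
}

is_whole(7)     # TRUE
is_whole(7L)    # TRUE
is_whole(7.5)   # FALSE
is_whole(1e10)  # TRUE, whereas as.integer(1e10) returns NA with a warning
is_whole("7")   # FALSE: a character is not numeric
```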
A vector is the most common and basic data structure in R, and is pretty much the workhorse of R. It’s basically just an ordered sequence of values of the same type, mainly either numbers or characters.
The most basic way of creating a vector is to use the c() function (“c” is for “concatenate”):
x <- c(1, 2, 3)
length(x)
## [1] 3
You can also create vectors by making a sequence of numbers using the : operator or the seq() function:
1:5
## [1] 1 2 3 4 5
seq(1, 10)
## [1] 1 2 3 4 5 6 7 8 9 10
seq(1, 10, by = 2)
## [1] 1 3 5 7 9
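A few related base R tools are worth knowing alongside seq(). The sketch below shows the length.out argument and the helpers seq_len() and seq_along(); the variable n is made up for illustration.

```r
seq(0, 1, length.out = 5)  # 0.00 0.25 0.50 0.75 1.00
seq_len(3)                 # 1 2 3
seq_along(c("a", "b"))     # 1 2

# seq_len() is safer than 1:n when n might be zero
n <- 0
1:n         # 1 0  (counts down, usually not what you want)
seq_len(n)  # integer(0), an empty vector
```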
You can also create a vector by using the rep() function, which replicates the same value n times:
y <- rep(5, 10) # The number 5 ten times
y
## [1] 5 5 5 5 5 5 5 5 5 5
z <- rep("foo", 5) # The character "foo" five times
z
## [1] "foo" "foo" "foo" "foo" "foo"
In fact, you can use the rep() function to create longer vectors made up of repeated vectors:
rep(c(1, 2), 3) # Repeat the vector c(1, 2) three times
## [1] 1 2 1 2 1 2
If you add the each argument, rep() will repeat each element in the vector:
rep(c(1, 2), each = 3) # Repeat each element of the vector c(1, 2) three times
## [1] 1 1 1 2 2 2
You can see how long a vector is using the length() function:
length(y)
## [1] 10
length(z)
## [1] 5
Each element in a vector must have the same type. If you mix types in a vector, R will coerce all the elements to either a numeric or character type.
If a vector has a single character element, R makes everything a character:
c(1, 2, "3")
## [1] "1" "2" "3"
c(TRUE, FALSE, "TRUE")
## [1] "TRUE" "FALSE" "TRUE"
If a vector has numeric and logical elements, R makes everything a number:
c(1, 2, TRUE, FALSE)
## [1] 1 2 1 0
If a vector has integers and floats, R makes everything a float:
c(1L, 2, pi)
## [1] 1.000000 2.000000 3.141593
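The three cases above follow a single ordering: logical, then integer, then double, then character, with R always converting to the type furthest along that chain. A quick way to see this is to check typeof() on some mixed vectors:

```r
typeof(c(TRUE, 1L))             # "integer"
typeof(c(1L, 2.5))              # "double"
typeof(c(2.5, "a"))             # "character"
typeof(c(TRUE, 1L, 2.5, "a"))   # "character"
```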
You can clear a vector by assigning NULL to it. The name x still exists afterward but holds no values (use rm(x) to remove the variable entirely):
x <- seq(1, 10)
x
## [1] 1 2 3 4 5 6 7 8 9 10
x <- NULL
x
## NULL
Numeric vectors are vectors of numbers (either integers or doubles):
v <- c(pi, 7, 42, 365)
v
## [1] 3.141593 7.000000 42.000000 365.000000
typeof(v)
## [1] "double"
R has many built-in functions that are designed to give summary information about numeric vectors. Note that these functions take a vector of numbers and return a single value. Here are some common ones:
Function | Description | Example |
---|---|---|
mean(x) | Mean of values in x | mean(c(1,2,3,4,5)) returns 3 |
median(x) | Median of values in x | median(c(1,2,2,4,5)) returns 2 |
max(x) | Max element in x | max(c(1,2,3,4,5)) returns 5 |
min(x) | Min element in x | min(c(1,2,3,4,5)) returns 1 |
sum(x) | Sums the elements in x | sum(c(1,2,3,4,5)) returns 15 |
prod(x) | Product of the elements in x | prod(c(1,2,3,4,5)) returns 120 |
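One thing to watch for with these summary functions is missing data: a single NA makes the result NA unless you ask R to skip it with the na.rm argument. A short sketch with made-up values:

```r
scores <- c(1, NA, 3)        # made-up values with one missing entry
mean(scores)                 # NA
mean(scores, na.rm = TRUE)   # 2
sum(scores, na.rm = TRUE)    # 4
max(scores, na.rm = TRUE)    # 3
```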
Character vectors are vectors where each element is a string:
stringVector <- c('oh', 'what', 'a', 'beautiful', 'morning')
stringVector
## [1] "oh" "what" "a" "beautiful" "morning"
typeof(stringVector)
## [1] "character"
As we’ll see in the next lesson on strings, you can “collapse” a character vector into a single string using the str_c() function from the stringr library:
library(stringr)
str_c(stringVector, collapse = ' ')
## [1] "oh what a beautiful morning"
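If stringr isn’t installed, base R’s paste() function can do the same collapse. This is an alternative to the stringr approach, not a replacement for it; the vector name words is made up.

```r
words <- c("oh", "what", "a", "beautiful", "morning")
paste(words, collapse = " ")  # "oh what a beautiful morning"
```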
Logical vectors contain only TRUE or FALSE elements:
logicalVector <- c(rep(TRUE, 3), rep(FALSE, 3))
logicalVector
## [1] TRUE TRUE TRUE FALSE FALSE FALSE
If you add a numeric type to a logical vector, the logical elements will be converted to either a 1 for TRUE or 0 for FALSE:
c(logicalVector, 42)
## [1] 1 1 1 0 0 0 42
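This conversion to 1 and 0 is handy for counting: applying sum() or mean() to a logical vector gives the number or the proportion of TRUE values. A small sketch with made-up numbers:

```r
temps <- c(12, 25, 31, 18, 27)   # made-up values for illustration
hot <- temps > 20                # FALSE TRUE TRUE FALSE TRUE
sum(hot)    # 3: the number of TRUE values
mean(hot)   # 0.6: the proportion of TRUE values
```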
Warning: If you add a character type to a logical vector, the logical elements will be converted to strings of "TRUE" and "FALSE". So even though they may still look like logical types, they aren’t:
y <- c(logicalVector, 'string')
y
## [1] "TRUE" "TRUE" "TRUE" "FALSE" "FALSE" "FALSE" "string"
typeof(y)
## [1] "character"
If you want to check if two vectors are identical (in that they contain all the same elements), you can’t use the typical == operator by itself. The reason is that the == operator is performed element-wise, so it will return a logical vector:
x <- c(1,2,3)
y <- c(1,2,3)
x == y
## [1] TRUE TRUE TRUE
Instead of getting one TRUE, you get a vector of TRUEs, because the individual elements are indeed equal. To check whether all the elements in the two vectors are identical, wrap the comparison inside the all() function:
all(x == y)
## [1] TRUE
Keep in mind that there are really two steps going on here: 1) x == y creates a logical vector of TRUEs and FALSEs based on element-wise comparisons, and 2) the all() function checks whether all of the values in the logical vector are TRUE.
You can also use the all() function to check whether other types of conditions are TRUE for all elements in two vectors:
a <- c(1,2,3)
b <- -1*c(1,2,3)
all(a > b)
## [1] TRUE
In contrast to the all() function, the any() function will return TRUE if any of the elements in a vector are TRUE:
a <- c(1,2,3)
b <- c(-1,2,-3)
a == b
## [1] FALSE TRUE FALSE
any(a == b)
## [1] TRUE
For most situations, the all() function works just fine for comparing vectors, but it only compares the elements in the vectors, not their attributes. In some situations, you might also want to check if the attributes of the vectors, such as their names and data types, are also the same. In this case, you should use the identical() function.
names(x) <- c('a', 'b', 'c')
names(y) <- c('one', 'two', 'three')
all(x == y) # Only compares the elements
## [1] TRUE
identical(x, y) # Also compares the names of the elements
## [1] FALSE
Notice that for the identical() function, you don’t need to add a conditional statement - you just provide it the two vectors you want to compare. This is because identical() by definition is comparing if two things are the same.
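Because identical() is strict, it can return FALSE for vectors that look the same when printed. The sketch below shows two common cases, a type mismatch and floating-point rounding, along with base R’s all.equal(), which compares numbers within a small tolerance.

```r
identical(c(1L, 2L), c(1, 2))       # FALSE: integer vs. double
all(c(1L, 2L) == c(1, 2))           # TRUE: == compares values only
0.1 + 0.2 == 0.3                    # FALSE: floating-point rounding
isTRUE(all.equal(0.1 + 0.2, 0.3))   # TRUE: equal within a small tolerance
```

Wrapping all.equal() in isTRUE() matters because all.equal() returns a description of the difference, rather than FALSE, when the values differ.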
You can access elements from a vector using brackets [] and indices inside the brackets. You can use integer indices (probably the most common way), character indices (by naming each element), and logical indices.
Vector indices start from 1 (this is important, since many programming languages start from 0):
x <- seq(1, 10)
x[1] # Returns the first element
## [1] 1
x[3] # Returns the third element
## [1] 3
x[length(x)] # Returns the last element
## [1] 10
You can access multiple elements by using a vector of indices inside the brackets:
x[c(1:3)] # Returns the first three elements
## [1] 1 2 3
x[c(2, 7)] # Returns the 2nd and 7th elements
## [1] 2 7
You can also use negative integers to remove elements, which returns all elements except those specified:
x[-1] # Returns everything except the first element
## [1] 2 3 4 5 6 7 8 9 10
x[-c(2, 7)] # Returns everything except the 2nd and 7th elements
## [1] 1 3 4 5 6 8 9 10
But you cannot mix positive and negative integers while indexing:
x[c(-2, 7)]
## Error in x[c(-2, 7)]: only 0's may be mixed with negative subscripts
If you try to use a float as an index, R drops the decimal part (truncating toward zero), so a positive index gets rounded down to the nearest integer:
x[3.1415] # Returns the 3rd element
## [1] 3
x[3.9999] # Still returns the 3rd element
## [1] 3
You can name the elements in a vector and then use those names to access elements. To create a named vector, use the names() function:
x <- seq(5)
names(x) <- c('a', 'b', 'c', 'd', 'e')
x
## a b c d e
## 1 2 3 4 5
You can also create a named vector by putting the names directly in the c() function:
x <- c('a' = 1, 'b' = 2, 'c' = 3, 'd' = 4, 'e' = 5)
x
## a b c d e
## 1 2 3 4 5
Once your vector has names, you can then use those names as indices:
x['a'] # Returns the first element
## a
## 1
x[c('a', 'c')] # Returns the 1st and 3rd elements
## a c
## 1 3
When using a logical vector for indexing, the elements at the positions where the logical vector is TRUE are returned. This is helpful for filtering vectors based on conditions:
x <- seq(1, 10)
x > 5 # Create logical vector
## [1] FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE TRUE
x[x > 5] # Put logical vector in brackets to keep only the elements where it is TRUE
## [1] 6 7 8 9 10
You can also use the which() function to find the numeric indices for which a condition is TRUE, and then use those indices to select elements:
which(x < 5) # Returns indices of TRUE elements
## [1] 1 2 3 4
x[which(x < 5)] # Use which to select elements based on a condition
## [1] 1 2 3 4
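The same three kinds of indices can also be used to modify a vector: put the index on the left side of the assignment arrow. The sketch below uses a made-up named vector called scores.

```r
scores <- c(a = 10, b = 20, c = 30, d = 40)  # made-up values
scores[2] <- 25            # replace by position
scores["d"] <- 45          # replace by name
scores[scores > 26] <- 0   # replace every element that meets a condition
scores
## a  b  c  d
## 10 25  0  0
```

Only the selected elements change; the rest of the vector, including its names, is left as it was.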
Most base functions in R are “vectorized”, meaning that when you give them a vector, they perform the operation on each element in the vector.
When you perform arithmetic operations on vectors, they are executed on an element-by-element basis:
x1 <- c(1, 2, 3)
x2 <- c(4, 5, 6)
# Addition
x1 + x2 # Returns (1+4, 2+5, 3+6)
## [1] 5 7 9
# Subtraction
x1 - x2 # Returns (1-4, 2-5, 3-6)
## [1] -3 -3 -3
# Multiplication
x1 * x2 # Returns (1*4, 2*5, 3*6)
## [1] 4 10 18
# Division
x1 / x2 # Returns (1/4, 2/5, 3/6)
## [1] 0.25 0.40 0.50
When performing vectorized operations, the vectors need to have the same length, or one of the vectors needs to be a single-value vector:
# Careful! Mismatched lengths will only give you a warning, but will still return a value:
x1 <- c(1, 2, 3)
x2 <- c(4, 5)
x1 + x2
## Warning in x1 + x2: longer object length is not a multiple of shorter object
## length
## [1] 5 7 7
What R does in these cases is repeat (“recycle”) the shorter vector, so in the above case the last value is 3 + 4.
If you have a single-value vector, R will add it to each element:
x1 <- c(1, 2, 3)
x2 <- c(4)
x1 + x2
## [1] 5 6 7
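A subtle point about this repeating behavior: the warning only appears when the longer length is not a multiple of the shorter one. When it is a multiple, R repeats the shorter vector silently. The vector names below are made up for illustration.

```r
long  <- c(1, 2, 3, 4)
short <- c(10, 20)
long + short                     # 11 22 13 24, with no warning
length(long) == length(short)    # FALSE: a check you can run beforehand
```

If you expect two vectors to line up element by element, comparing their lengths first is a cheap way to catch this kind of mistake.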
You can reorder the arrangement of elements in a vector by using the sort() function:
a <- c(2, 4, 6, 3, 1, 5)
sort(a)
## [1] 1 2 3 4 5 6
sort(a, decreasing = TRUE)
## [1] 6 5 4 3 2 1
To get the index values of the sorted order, use the order() function:
order(a)
## [1] 5 1 4 2 6 3
These indices tell us that the first value in the sorted arrangement of vector a is element number 5 (which is a 1), the second value is element number 1 (which is a 2), and so on. If you use order() as the indices to the vector, you’ll get the sorted vector:
a[order(a)] # Same as sort(a)
## [1] 1 2 3 4 5 6
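Where order() becomes more useful than sort() is when you want to rearrange one vector according to the values in another. A small sketch with made-up data:

```r
people <- c("Ana", "Ben", "Cy")   # made-up data
ages   <- c(35, 22, 41)
people[order(ages)]                     # "Ben" "Ana" "Cy"
people[order(ages, decreasing = TRUE)]  # "Cy" "Ana" "Ben"
```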
Page sources:
Some content on this page has been modified from other courses, including:
Footnote 1: Why L? Well, it’s a bit complicated, but R supports complex numbers which are denoted by i, so i was already taken. A quick answer is that R uses 32-bit long integers, so L for “long”.