This post demonstrates beginner operations with R data frames. Below I’ll cover some simple techniques for subsetting data frames and for adding and removing columns. I’ll also explain how to perform vector operations on a factor column.
Selecting Data Frame Columns with the $ Operator
The simplest of beginner operations with R data frames is selecting a column of data. I’ll use the ChickWeight data set that comes built into the R runtime environment.
To select, for example, just the weight column, use the $ operator as follows:
> ChickWeight$weight
[1] 42 51 59 64 76 93 106 125 149 171 199 205 40 49
[15] 58 72 84 103 122 138 162 187 209 215 43 39 55 67
[29] 84 99 115 138 163 187 198 202 42 49 56 67 74 87
The above output is a vector, which can be verified by running is.vector(ChickWeight$weight).
Selecting Data Frame Columns with Brackets
In a previous post, I demonstrated that subsetting matrices can be tricky. The same is true for data frames, to a degree. A clear difference between data frames and matrices is that you can select a single row of a data frame, and it will stay a data frame:
> ChickWeight[1,]
weight Time Chick Diet
1 42 0 1 1
> is.data.frame(ChickWeight[1,])
[1] TRUE
However, just as with matrices, R will convert the selection of a single data frame column to a vector.
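As a quick sketch of that conversion (the variable name w is only illustrative, not something from the original example):

```r
# Selecting one column with [ , j] drops the data frame structure by default
w <- ChickWeight[, "weight"]

is.data.frame(w)   # the selection is no longer a data frame
is.numeric(w)      # it is a plain numeric vector instead
```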
Again, as with matrices, you can use the drop parameter to prevent this default behavior and keep the selection as a data frame.
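A minimal sketch of the drop parameter in use, with w_df as an illustrative name:

```r
# drop = FALSE keeps the single-column selection as a data frame
w_df <- ChickWeight[, "weight", drop = FALSE]

is.data.frame(w_df)   # still a data frame
dim(w_df)             # same number of rows, one column
```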
Add and Remove Data Frame Columns with the $ Operator
Adding and removing data frame columns is straightforward with the $ operator.
To add a column, assign a value to a new column name.
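For instance, the sketch below adds a column to a copy of the data set. The names cw and weight_kg are hypothetical and chosen only for illustration; the conversion assumes the weights are recorded in grams, as the data set's documentation states.

```r
# Work on a plain data frame copy so the built-in data set is left untouched
cw <- as.data.frame(ChickWeight)

# Assigning to a column name that does not exist yet creates the column
cw$weight_kg <- cw$weight / 1000

names(cw)
```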
In the same manner, set an existing column to NULL and it will be removed.
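A small sketch of the removal, again using the hypothetical names cw and weight_kg:

```r
cw <- as.data.frame(ChickWeight)
cw$weight_kg <- cw$weight / 1000   # hypothetical helper column

# Setting the column to NULL deletes it from the data frame
cw$weight_kg <- NULL

names(cw)   # back to the original four columns
```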
Performing Vector Operations on Factorized Columns
Recall that R treats columns that represent groups as factors. In the ChickWeight data set, this applies to the Chick and Diet columns, and Chick is stored as an ordered factor.
I had a similar structure in a production data set, and analogously, I wanted to multiply weight by Chick. If you try that, the result is all NA and R warns that '*' is not meaningful for ordered factors.
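The failure can be reproduced on ChickWeight itself. This is only a sketch: suppressWarnings() is my addition to keep the output quiet, and the variable name result is illustrative.

```r
# Chick is an ordered factor, so arithmetic on it is not defined
is.ordered(ChickWeight$Chick)

# Without suppressWarnings(), R reports that '*' is not meaningful
# for ordered factors; the result contains only NA values
result <- suppressWarnings(ChickWeight$weight * ChickWeight$Chick)
all(is.na(result))
```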
Thank you Stack Overflow for the save. From that post, you can see that the ordered factor must be converted to numeric:
as.numeric(as.character(new[,2])) * 5
When I first tried this update to my code, I was lazy and left out the as.character portion of the cast. Can you guess what happened? Each value was converted to the integer code of its level within the ordered factor, not to the number shown in its label.
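To see what as.numeric alone returns, here is a rough sketch on the Chick column (chick and codes are illustrative names):

```r
chick <- ChickWeight$Chick

levels(chick)[1:5]           # level labels, in their stored order
codes <- as.numeric(chick)   # position of each value within levels(chick)

# Mapping the codes back through the levels recovers the original labels,
# which shows the codes are positions rather than the labelled numbers
identical(levels(chick)[codes], as.character(chick))
```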
Actually, in its own right this was a useful discovery, but I needed the literal value in my production data. To get the correct conversion, I used the method covered in the Stack Overflow post.
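Applied to ChickWeight, that conversion might look like the sketch below. The names chick_id and product are hypothetical, and multiplying weight by the chick number has no real meaning here beyond mirroring the production case.

```r
# Convert the labels to character first, then to numeric,
# so that chick "1" becomes the number 1 rather than its level code
chick_id <- as.numeric(as.character(ChickWeight$Chick))

# The multiplication now works element by element without a warning
product <- ChickWeight$weight * chick_id
head(product)
```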
That’s a summary of some beginner operations with R data frames.