Beginner Operations with R Data Frames

This post demonstrates beginner operations with R data frames. Below I’ll cover some simple techniques for subsetting data frames and for adding and removing columns. I’ll also explain how to perform vector operations on a factorized column.

Selecting Data Frame Columns with the $ Operator

The simplest beginner operation on an R data frame is selecting a single column of data. I’ll use the ChickWeight data set, which comes built into R as part of the datasets package.

R ChickWeight sample data.
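To look at the same data yourself, the short sketch below prints its size, its column names, and its first few rows. It uses only standard base R functions. ChickWeight is in the built-in datasets package, so nothing needs to be installed or loaded first.

```r
# ChickWeight is available in every R session via the datasets package
dim(ChickWeight)      # number of rows and columns
names(ChickWeight)    # column names: weight, Time, Chick, Diet
head(ChickWeight, 3)  # first three rows
```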

To select just the weight column, for example, use the $ operator as follows:

> ChickWeight$weight
[1] 42 51 59 64 76 93 106 125 149 171 199 205 40 49
[15] 58 72 84 103 122 138 162 187 209 215 43 39 55 67
[29] 84 99 115 138 163 187 198 202 42 49 56 67 74 87

The output is a vector, which you can verify by running is.vector(ChickWeight$weight).
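A few more checks, sketched below, make the same point. The result of $ is a plain numeric vector with one element per row of the data frame. The [[ ]] operator with a quoted column name appears only for comparison; it returns the same vector.

```r
w <- ChickWeight$weight

is.vector(w)                     # the check mentioned above
is.numeric(w)                    # body weights are stored as numbers
length(w) == nrow(ChickWeight)   # one value per row of the data frame

# Selecting the same column with [[ ]] and a quoted name gives an identical vector
identical(w, ChickWeight[["weight"]])
```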

Selecting Data Frame Columns with Brackets

In a previous post, I demonstrated that subsetting matrices can be tricky. The same is true for data frames, to a degree. One clear difference between data frames and matrices is that selecting a single row of a data frame returns a data frame, not a vector:

> ChickWeight[1,]
weight Time Chick Diet
1 42 0 1 1
> is.data.frame(ChickWeight[1,])
[1] TRUE

However, just as with matrices, selecting a single data frame column with brackets returns a vector:

R single level select of data frame.
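Here is a minimal sketch of this behavior. Selecting the first column with [ , ] returns the bare vector, not a one-column data frame, whether you select by position or by name.

```r
by_position <- ChickWeight[, 1]         # first column, selected by position
by_name     <- ChickWeight[, "weight"]  # same column, selected by name

is.data.frame(by_name)                  # the data frame structure is dropped
identical(by_position, by_name)         # both forms return the same vector
identical(by_name, ChickWeight$weight)  # and it matches the $ result
```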

Again, as with matrices, you can pass drop = FALSE to override this default behavior and keep the result a data frame:

R data frame subset with brackets using drop.
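A minimal sketch of the drop = FALSE form. The selection is the same as before, but the result keeps its data frame structure with all rows and a single column.

```r
# drop = FALSE keeps the one-column result as a data frame
weight_df <- ChickWeight[, "weight", drop = FALSE]

is.data.frame(weight_df)  # the result is still a data frame
dim(weight_df)            # all rows, one column
head(weight_df, 3)
```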

Add and Remove Data Frame Columns with the $ Operator

Adding and removing data frame columns is straightforward with the $ operator.

Here is an example of adding a column. Just assign a vector to a new column name and that’s it:

R using $ to add column.
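The sketch below adds columns with $. It works on a copy named cw so the built-in data set stays untouched. The column names weight_kg and source are hypothetical examples. The assigned vector should have one value per row, and a single value is recycled down every row. ChickWeight’s weight column is body weight in grams, so dividing by 1000 gives kilograms.

```r
cw <- ChickWeight  # work on a copy of the built-in data

# Hypothetical column: weight converted from grams to kilograms
cw$weight_kg <- cw$weight / 1000

# Hypothetical column: a single value is recycled to every row
cw$source <- "ChickWeight"

names(cw)
head(cw, 3)
```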

In the same manner, assigning NULL to an existing column removes it:

R data frame removing columns.
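A minimal sketch of removing a column. It again works on a copy, cw, so the built-in ChickWeight keeps all four of its columns.

```r
cw <- ChickWeight  # copy, so the original stays intact

cw$Diet <- NULL    # assigning NULL removes the Diet column
names(cw)
```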

Performing Vector Operations on Factorized Columns

Recall that factors are R’s data type for categorical (grouping) variables. In the ChickWeight data set, the Chick and Diet columns are stored as factors, and Chick is an ordered factor.

R factorized column example.
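The sketch below checks which columns are factors and whether each factor is ordered. It uses the base R functions is.factor(), is.ordered(), and levels().

```r
# Which columns of ChickWeight are stored as factors?
sapply(ChickWeight, is.factor)

is.ordered(ChickWeight$Chick)    # Chick is an ordered factor
is.ordered(ChickWeight$Diet)     # Diet is a plain (unordered) factor

levels(ChickWeight$Diet)         # the four diet groups
head(levels(ChickWeight$Chick))  # chick IDs are stored as level labels
```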

I had a similar structure in a production data set and, analogously, wanted to multiply weight by Chick. If you try that, R does not stop with an error. Instead, it returns a vector of NA values and issues this warning:

'*' is not meaningful for ordered factors.
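The sketch below reproduces this with ChickWeight. It uses base R’s withCallingHandlers() to capture the warning text instead of printing it, then checks that every element of the result is NA.

```r
warning_text <- NULL

result <- withCallingHandlers(
  ChickWeight$weight * ChickWeight$Chick,  # numeric times ordered factor
  warning = function(w) {
    warning_text <<- conditionMessage(w)   # keep the message for inspection
    invokeRestart("muffleWarning")         # and suppress the console output
  }
)

warning_text
all(is.na(result))  # arithmetic with an ordered factor yields only NA values
```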

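Thank you, Stack Overflow, for the save. As that post shows, the ordered factor must first be converted to character and then to numeric:

# 'new' is the data frame from the Stack Overflow post; new[,2] is its factor column
as.numeric(as.character(new[,2])) * 5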

When I first tried this update to my code, I was lazy and left out the as.character() step. Can you guess what happened? Calling as.numeric() directly on a factor returns each value’s integer level code (its position among the factor’s levels), not the number shown in its label:

R ordered factor converted to grouping value.
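The sketch below shows that pitfall with a hypothetical factor f whose labels look like numbers. By default, factor levels are sorted as character strings, so the codes that as.numeric() returns do not match the printed labels.

```r
# Hypothetical factor whose labels look like numbers
f <- factor(c("10", "5", "20"))

levels(f)                    # levels sorted as text: "10" "20" "5"
as.numeric(f)                # level codes: 1 3 2
as.numeric(as.character(f))  # literal values: 10 5 20

# The same distinction applies to ChickWeight's ordered Chick column
head(as.numeric(ChickWeight$Chick))                # level codes
head(as.numeric(as.character(ChickWeight$Chick)))  # chick IDs from the labels
```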

In its own right this was a useful discovery, but my production data needed the literal values. To get the correct conversion, I used the as.numeric(as.character()) approach from the Stack Overflow post:

R ordered factor correctly cast.
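Applied to ChickWeight, the conversion might look like the sketch below. R’s own ?factor help page also recommends as.numeric(levels(f))[f] as a slightly more efficient equivalent, so both forms appear here. The variable names chick_id, chick_id_alt, and weight_by_chick are hypothetical.

```r
# Convert the ordered Chick factor to the numbers shown in its labels
chick_id <- as.numeric(as.character(ChickWeight$Chick))

# Equivalent form from ?factor: index the numeric levels by the factor's codes
chick_id_alt <- as.numeric(levels(ChickWeight$Chick))[ChickWeight$Chick]
identical(chick_id, chick_id_alt)

# The multiplication now works element by element, with no NA results
weight_by_chick <- ChickWeight$weight * chick_id
head(weight_by_chick)
```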

That’s a summary of some beginner operations with R data frames.
