# Beginner Operations with R Data Frames

This post demonstrates beginner operations with R data frames. Below I’ll cover some simple techniques for subsetting data frames and for adding and removing columns. I’ll also explain how to perform vector operations on a factorized column.

#### Selecting Data Frame Columns with the \$ Operator

The simplest beginner operation with R data frames is selecting a column of data. I’ll use the `ChickWeight` data set that comes built into R.

To select, for example, just the `weight` column, use the `$` operator as follows:

> ChickWeight\$weight
[1] 42 51 59 64 76 93 106 125 149 171 199 205 40 49
[15] 58 72 84 103 122 138 162 187 209 215 43 39 55 67
[29] 84 99 115 138 163 187 198 202 42 49 56 67 74 87

The above output is clearly a vector, which can be verified by running `is.vector(ChickWeight$weight)`.

#### Selecting Data Frame Columns with Brackets

In a previous post, I demonstrated that subsetting matrices can be tricky. The same is true for data frames, to a degree. A clear difference between data frames and matrices is that you can select a single row of a data frame, and it will stay a data frame:

> ChickWeight[1,]
weight Time Chick Diet
1 42 0 1 1
> is.data.frame(ChickWeight[1,])
[1] TRUE

However, just as with matrices, R converts a single-column selection from a data frame to a vector by default.
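A minimal sketch of that default, using the built-in `ChickWeight` data (the variable name `w` is only illustrative):

```r
# Select the first column by position; by default the result is dropped to a vector.
w <- ChickWeight[, 1]

is.data.frame(w)  # FALSE
is.numeric(w)     # TRUE
length(w)         # 578, one value per row of ChickWeight
```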

Again, as with matrices, you can pass `drop = FALSE` to override the default behavior and keep the result a data frame.
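As a short sketch, the same single-column selection with `drop = FALSE` should come back as a one-column data frame (the name `w_df` is just for illustration):

```r
# Same selection as before, but ask R not to drop the data frame structure.
w_df <- ChickWeight[, 1, drop = FALSE]

is.data.frame(w_df)  # TRUE
names(w_df)          # "weight"
```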

#### Add and Remove Data Frame Columns with the \$ Operator

Adding and removing data frame columns is straightforward with the `$` operator.

To add a column, just assign a vector to a new column name and that’s it.
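Here is one way that could look. The copy `cw` and the column name `weight_kg` are hypothetical names chosen for this sketch, and the conversion assumes `weight` is recorded in grams, as the data set's help page states:

```r
# Work on a plain data frame copy so the built-in data set is left alone.
cw <- as.data.frame(ChickWeight)

# Assigning to a name that does not exist yet creates the column.
cw$weight_kg <- cw$weight / 1000

names(cw)  # "weight" "Time" "Chick" "Diet" "weight_kg"
```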

In the same manner, assign `NULL` to an existing column and it will be removed.
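A small self-contained sketch of removal, again using the hypothetical names `cw` and `weight_kg`:

```r
# Start from a copy and add a throwaway column to remove again.
cw <- as.data.frame(ChickWeight)
cw$weight_kg <- cw$weight / 1000

# Assigning NULL deletes the column.
cw$weight_kg <- NULL

names(cw)  # "weight" "Time" "Chick" "Diet"
```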

#### Performing Vector Operations on Factorized Columns

Recall that R stores columns that represent groups as factors. In the `ChickWeight` data set, this is the case for the `Chick` column (an ordered factor) and the `Diet` column (a plain factor).
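You can check how these two columns are stored with a few base R calls. This is only a quick inspection sketch:

```r
# Chick is an ordered factor; Diet is an unordered factor.
class(ChickWeight$Chick)       # "ordered" "factor"
class(ChickWeight$Diet)        # "factor"

is.ordered(ChickWeight$Chick)  # TRUE
nlevels(ChickWeight$Diet)      # 4
```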

I had a similar structure in a production data set, and analogously, I wanted to multiply `weight` by `Chick`. If you try to do that, R returns `NA` values along with the warning `'*' is not meaningful for ordered factors`.
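The sketch below reproduces the problem on `ChickWeight`. In the versions of R I am aware of, this surfaces as a warning plus a vector of `NA`s rather than a hard stop; the names `msg` and `res` are illustrative:

```r
# Multiplying a numeric column by an ordered factor does not do arithmetic.
# tryCatch() captures the warning text instead of printing it.
msg <- tryCatch(
  ChickWeight$weight * ChickWeight$Chick,
  warning = function(w) conditionMessage(w)
)
msg

# With the warning suppressed, every element of the result is NA.
res <- suppressWarnings(ChickWeight$weight * ChickWeight$Chick)
all(is.na(res))  # TRUE
```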

Thank you, Stack Overflow, for the save. As that post explains, the ordered factor must be converted to numeric:

`as.numeric(as.character(new[,2])) * 5`

When I first tried this update to my code, I was lazy and left out the `as.character` part of the conversion. Can you guess what happened? Each value was converted to the integer code of its ordered factor level, not to the number its label represents.
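To make the difference concrete, here is a toy ordered factor (not my production data; `f` and its levels are made up for this sketch) whose level order deliberately differs from numeric order:

```r
# Labels are "10", "20", "30", but the level order is 30 < 10 < 20.
f <- factor(c("10", "20", "30", "20"),
            levels = c("30", "10", "20"),
            ordered = TRUE)

as.numeric(f)                # 2 3 1 3     (positions in the level order)
as.numeric(as.character(f))  # 10 20 30 20 (the labels themselves)
```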

In its own right this was a useful discovery, but I needed the literal value in my production data. To get the correct conversion, I used the method covered in the Stack Overflow post.
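Applied to `ChickWeight`, the same approach might look like the sketch below. The names `chick_id` and `weighted` are illustrative, and the multiplication stands in for whatever arithmetic your own data needs:

```r
# Convert the factor labels to text first, then to numbers.
chick_id <- as.numeric(as.character(ChickWeight$Chick))

# Now ordinary vector arithmetic works.
weighted <- ChickWeight$weight * chick_id

head(weighted, 3)  # 42 51 59, because the first rows belong to chick 1
```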

That’s a summary of some beginner operations with R data frames.
