
R tip: Learn dplyr’s case_when() function

InfoWorld | May 22, 2018

In this second episode of Do More with R, Sharon Machlis, director of Editorial Data & Analytics at IDG Communications, shows how dplyr's case_when() function helps avoid a lot of nested ifelse statements.

Copyright © 2018 IDG Communications, Inc.

Hi, this is Sharon Machlis, Director of Editorial Data & Analytics at IDG Communications.

In this second episode of Do More with R, we’ll see how dplyr’s case_when() function helps avoid a lot of nested ifelse statements.

For data, I have a list of US states and their estimated populations, which you can see here.

I also set up R variables showing which states are in each region.


First, let me load the state population data and the USRegions file, and also load the dplyr package.

And look at the structure of the data.
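The files used in the video aren't included in this transcript, so here is a minimal sketch of the setup using stand-in data. It assumes base R's built-in `state.name`, `state.region`, and `state.x77` datasets (older Census Bureau figures, not the estimates shown in the video) can play the role of the state population file, and that the region variables are plain character vectors of state names. The names `state_pops`, `Northeast`, `South`, `Midwest`, and `West` are assumptions for illustration.

```r
library(dplyr)

# Stand-in for the state population file: base R's built-in state data
# (state.x77 holds 1975 population estimates, not current ones)
state_pops <- data.frame(
  State = state.name,
  Population = unname(state.x77[, "Population"]),
  stringsAsFactors = FALSE
)

# Stand-in for the USRegions file: one character vector of state names per region
# (the built-in data calls the Midwest "North Central")
Northeast <- state.name[state.region == "Northeast"]
South     <- state.name[state.region == "South"]
Midwest   <- state.name[state.region == "North Central"]
West      <- state.name[state.region == "West"]

# Look at the structure of the data
str(state_pops)
```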


The task here is to assign each state to its proper division.

There are a couple of different ways to do this. One common way is to use R’s ifelse function. In R, if you want to run an if statement across an entire vector at once, you typically use the special ifelse function. That’s ifelse, all one word.

In this case, it might look something like this:
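The code shown on screen isn't part of this transcript. A rough sketch of the nested ifelse approach, assuming the region variables are character vectors of state names (built here from base R's `state.name` and `state.region` as stand-in data, with hypothetical variable names), might be:

```r
# Stand-in data from base R's built-in state datasets
state_pops <- data.frame(State = state.name, stringsAsFactors = FALSE)
Northeast <- state.name[state.region == "Northeast"]
South     <- state.name[state.region == "South"]
Midwest   <- state.name[state.region == "North Central"]
West      <- state.name[state.region == "West"]

# Each ifelse() is nested inside the "no" branch of the one before it
state_pops$Region <- ifelse(state_pops$State %in% Northeast, "Northeast",
                     ifelse(state_pops$State %in% South, "South",
                     ifelse(state_pops$State %in% Midwest, "Midwest",
                     ifelse(state_pops$State %in% West, "West", "Other"))))

# Count how many states landed in each region
table(state_pops$Region)
```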

That does work if you’ve only got a few alternatives. But I find that format difficult to read with more than a couple of options. And, it’s easy for me to make mistakes with closing parentheses or commas in the wrong place.

And, what if I wanted to assign states by Division, instead of Region?


That’s 9 levels of nested if-elses!

Dplyr’s case_when() has an easier format. Here’s the syntax:


Each if-then statement has its own line. The condition, if test, is on the left. Then there’s a tilde, and then the value is on the right. Each line needs a comma, except the last line. If you want to have a catch-all value for everything you haven’t defined, put the last condition as TRUE (I’m not sure why it’s TRUE, but it is), and then your catch-all value on the right. Done.
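As a small illustration of that layout (the vector `x` and the labels are made up for this sketch), here is one condition-tilde-value pair per line with a TRUE catch-all at the end. The example also hints at why TRUE works: case_when() checks the conditions in order and each element takes the value of the first one that is true, so a final TRUE matches whatever is left.

```r
library(dplyr)

x <- c(5, 15, 25)

# condition ~ value, one pair per line, with a comma after every line but the last
size <- case_when(
  x < 10 ~ "low",
  x < 20 ~ "medium",   # only reached by elements that failed x < 10
  TRUE   ~ "high"      # catch-all: TRUE matches everything not yet assigned
)

size
```

More recent dplyr releases also accept a `.default` argument for the catch-all value, but the TRUE form described here still works.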

Let me show a simple example testing whether a few numbers are even, odd, or – if they’re not whole integers – neither.

I’ll create a vector of numbers 1, 2, 3, 4, and 5.7. Let’s run a case_when to see if they’re even or odd. If the remainder when dividing by two is 0, it’s even. If the remainder is 1, it’s odd. Otherwise, it’s neither. Results should be odd, even, odd, even, neither. Let me run this code block:
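The code block itself isn't in the transcript; a sketch matching the description, with `nums` and `parity` as assumed variable names, could look like this. It uses R's `%%` operator for the remainder.

```r
library(dplyr)

nums <- c(1, 2, 3, 4, 5.7)

parity <- case_when(
  nums %% 2 == 0 ~ "even",     # remainder 0 when divided by two
  nums %% 2 == 1 ~ "odd",      # remainder 1 when divided by two
  TRUE           ~ "neither"   # anything else, such as 5.7
)

parity
```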

And we’ve got odd, even, odd, even, neither!

Now let’s see what that looks like for the state region examples.

Here I’m importing my state population file into R, then adding a column called Division with dplyr’s mutate function. The values of Division are based on the case_when statement. If the State is in my vector of Northeast state names, I’ll assign the value Northeast. And so on.
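Since the actual file and code aren't reproduced here, the following is only a sketch of that step. It swaps the imported file for base R's built-in state datasets as stand-in data, and the object and vector names are assumptions. Membership in a vector of state names is tested with `%in%`.

```r
library(dplyr)

# Stand-in for the imported population file and the region vectors
state_pops <- data.frame(
  State = state.name,
  Population = unname(state.x77[, "Population"]),
  stringsAsFactors = FALSE
)
Northeast <- state.name[state.region == "Northeast"]
South     <- state.name[state.region == "South"]
Midwest   <- state.name[state.region == "North Central"]
West      <- state.name[state.region == "West"]

# Add a Division column whose value depends on which vector the state is in
state_pops <- state_pops %>%
  mutate(
    Division = case_when(
      State %in% Northeast ~ "Northeast",
      State %in% South     ~ "South",
      State %in% Midwest   ~ "Midwest",
      State %in% West      ~ "West",
      TRUE                 ~ "Other"
    )
  )

# Count states per value; an "Other" row here would flag an unassigned state
count(state_pops, Division)
```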

Let me run that.


And then look at the results.


Looks good. And, no “Other” value, so all the states were assigned.

That’s it for this episode, thanks for watching! For more R tips, head to the More With R video page at bit.ly/morewithR. That’s https B I T period L Y slash more with R, all lowercase except for the R. So long!