
R tip: Access nested list items with purrr

InfoWorld | Aug 31, 2018

In this ninth episode of Do More with R, learn how to easily access and modify nested list items with the purrr package’s modify_depth function.

Copyright © 2018 IDG Communications, Inc.

Hi, I’m Sharon Machlis, Director of Editorial Data & Analytics at IDG Communications. I’m here with Episode 9 of Do More With R: Access nested list items with the purrr package.
Lists can be one of the harder things to wrap your head around in R, even if you’ve been working in the language for a while. “List columns” within data frames can be even more challenging if the structure isn’t the same for each value. Let me show you an example – and how easy it is to deal with using purrr.
First, the data set. I use Bob Rudis’s rgeocodio package to geocode addresses. Results come back with latitude and longitude nested in a list column.
This is the code I ran to get the data. First, I loaded the packages I need. Next I created a tibble – that’s a special type of tidyverse data frame – with names and addresses of 5 tourist attractions in Boston. Finally, I used rgeocodio’s batch geocoding function to get latitudes and longitudes for the address column.
Let’s take a look at the results with glimpse.
Do you see that second column, response_results? It’s a list. Let’s take a look at that:
It’s a list of 5 data frames. Now let’s investigate the first data frame – reminder that we need double brackets to see the contents of a list item.
A few things here. Latitude is in the location.lat column. Longitude is in the location.lng column. And there are TWO entries for the address, not one, using 2 different sources: Commonwealth of Massachusetts and City of Boston. That adds an extra challenge, since I only want one geolocation for each address, not two.
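The on-screen output is not part of this transcript. As a rough sketch of the structure described above, the code below builds a small stand-in by hand: two data frames instead of five, with made-up coordinates and addresses rather than real geocoding results. The object name geocoded and the query column name are assumptions for illustration only.

```r
library(tibble)
library(dplyr)

# Made-up stand-in values, NOT real geocoding output
first_result <- data.frame(
  formatted_address = c("1 Example St, Boston, MA", "1 Example St, Boston, MA"),
  location.lat = c(42.1, 42.2),
  location.lng = c(-71.1, -71.2),
  source = c("Commonwealth of Massachusetts", "City of Boston"),
  stringsAsFactors = FALSE
)
second_result <- data.frame(
  formatted_address = "2 Sample Ave, Boston, MA",
  location.lat = 42.3,
  location.lng = -71.3,
  source = "City of Boston",
  stringsAsFactors = FALSE
)

# A tibble whose second column is a list column of data frames
# ("geocoded" and "query" are hypothetical names)
geocoded <- tibble(
  query = c("1 Example St, Boston, MA", "2 Sample Ave, Boston, MA"),
  response_results = list(first_result, second_result)
)

glimpse(geocoded)

# Double brackets return the data frame itself, not a one-item list
geocoded$response_results[[1]]

# Both latitude entries for the first address
geocoded$response_results[[1]]$location.lat
```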
So let’s think about what we’d want to do.
We want to work with the response_results column in the data, which happens to be a list.
And each item in that list happens to be a data frame.
For each data frame, we want to get the value of the first row of the location.lat column, and then also the first row of the location.lng column.
Separately, each of those things is basic.
The question is, how do we do that for each data frame in the list column?
Like pretty much everything in R, there are multiple ways to do this. I’d like to show an elegant answer, using purrr’s modify_depth function.
I’m first going to save just the response_results list column into a variable called mylist, just to make the rest of the code easier to read.
First argument is the list. Second argument is a number for how deep you want to go in the list, in case you’ve got a list of lists with lists. Third argument is a function or formula you want to use to do the modification.
The things we want from mylist are only 1 level deep. How do I know that? Each item in the list is a data frame, and that’s what we want to operate on.
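To make the depth argument concrete, here is a small sketch on a toy list of lists (not the geocoding data). The list named nested and its values are made up for illustration. Depth 2 reaches the individual numbers, while depth 1 would hand each inner list to the function.

```r
library(purrr)

# Toy list of lists, made up for illustration
nested <- list(a = list(1, 2), b = list(3))

# Depth 2: the formula is applied to each number inside each inner list
deeper <- modify_depth(nested, 2, ~ . * 10)
str(deeper)

# Depth 1: the formula is applied to each inner list as a whole
inner_lengths <- modify_depth(nested, 1, ~ length(.))
str(inner_lengths)
```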
So say we wanted to change every value in the formatted_address column in each data frame to be all lower case. This first line is the code:
I’m modifying mylist, at depth one, and I want to change the formatted_address column to lower case. The tilde after the “1,” says “what follows is a formula.” The dot represents each item being modified – remember, that’s each data frame 1 level into my list.
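The code shown in the video is not captured in this transcript. One way to write the call just described is sketched below, using dplyr’s mutate inside the formula; that choice, the result name mylist_lower, and the made-up stand-in data are all assumptions.

```r
library(purrr)
library(dplyr)

# Made-up stand-in for the response_results list column
mylist <- list(
  data.frame(formatted_address = c("1 Example St, Boston, MA", "1 Example St, Boston, MA"),
             location.lat = c(42.1, 42.2), location.lng = c(-71.1, -71.2),
             stringsAsFactors = FALSE),
  data.frame(formatted_address = "2 Sample Ave, Boston, MA",
             location.lat = 42.3, location.lng = -71.3,
             stringsAsFactors = FALSE)
)

# The list, then the depth, then a formula; the dot is each data frame
mylist_lower <- modify_depth(
  mylist, 1,
  ~ mutate(., formatted_address = tolower(formatted_address))
)

mylist_lower[[1]]$formatted_address
```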
You can see the results. That’s the equivalent of a for loop like the code below.
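The loop from the video is not in the transcript either. A base R loop that does the same lower-casing might look like this sketch; the stand-in data is made up and the name mylist_loop is hypothetical.

```r
# Made-up stand-in for the response_results list column
mylist <- list(
  data.frame(formatted_address = c("1 Example St, Boston, MA", "1 Example St, Boston, MA"),
             location.lat = c(42.1, 42.2), stringsAsFactors = FALSE),
  data.frame(formatted_address = "2 Sample Ave, Boston, MA",
             location.lat = 42.3, stringsAsFactors = FALSE)
)

# Work on a copy so the original list is left untouched
mylist_loop <- mylist

# Visit each data frame in turn and overwrite its formatted_address column
for (i in seq_along(mylist_loop)) {
  mylist_loop[[i]]$formatted_address <- tolower(mylist_loop[[i]]$formatted_address)
}

mylist_loop[[2]]$formatted_address
```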
OK, but what if we just want to get the value of a column, and not modify it? Just use the column as the value.
The first two lines show formulas for extracting the value of each data frame’s location.lat column – the tilde, the dot for data frame and then the usual way of representing a data frame column, either with the dollar sign or double brackets.
It actually also works to use the column name in quotation marks as a function – so no tilde first, as in a formula – as you can see in the third line.
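The three lines referred to here are not in the transcript. Based on the description, they could look like the following sketch, run against made-up stand-in data; the result names lat_dollar, lat_brackets and lat_string are hypothetical.

```r
library(purrr)

# Made-up stand-in for the response_results list column
mylist <- list(
  data.frame(location.lat = c(42.1, 42.2), location.lng = c(-71.1, -71.2)),
  data.frame(location.lat = 42.3, location.lng = -71.3)
)

# Formula with the dollar sign
lat_dollar <- modify_depth(mylist, 1, ~ .$location.lat)

# Formula with double brackets
lat_brackets <- modify_depth(mylist, 1, ~ .[["location.lat"]])

# Column name as a quoted string, no tilde
lat_string <- modify_depth(mylist, 1, "location.lat")

# Each result is a list with one numeric vector per data frame
str(lat_dollar)
```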
We’re almost done, but remember – we want only the first value in each data frame, not both of them if there are two. Each value in our results is a vector – remember, in R, even one value is a vector of length 1. So we can use bracket 1 to get the first value from each data frame’s column, whether that data frame has 1 row or 2.
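As a sketch of that bracket-1 step, again on made-up stand-in data and with the hypothetical result name lat_first:

```r
library(purrr)

# Made-up stand-in: the first data frame has two rows, the second has one
mylist <- list(
  data.frame(location.lat = c(42.1, 42.2), location.lng = c(-71.1, -71.2)),
  data.frame(location.lat = 42.3, location.lng = -71.3)
)

# [1] keeps only the first latitude from each data frame
lat_first <- modify_depth(mylist, 1, ~ .$location.lat[1])

# Still a list, but every item now has length 1
str(lat_first)
```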
Now we’ve got exactly what we want … but in a list. To turn that into a vector, pipe the results into purrr’s as_vector function.
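A sketch of that pipe, assuming the same made-up stand-in data and a hypothetical result name lat:

```r
library(purrr)

# Made-up stand-in for the response_results list column
mylist <- list(
  data.frame(location.lat = c(42.1, 42.2), location.lng = c(-71.1, -71.2)),
  data.frame(location.lat = 42.3, location.lng = -71.3)
)

# Extract the first latitude from each data frame, then flatten the list
lat <- mylist %>%
  modify_depth(1, ~ .$location.lat[1]) %>%
  as_vector()

# A plain numeric vector with one value per data frame
lat
```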
That’s it! Now it’s a simple task to add vectors to the original data frame with mutate.
Create the latitude vector, create the longitude vector, and then add them to the original data.
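Putting the last steps together, a sketch could look like the code below. The tibble geocoded, its query column, and the new column names latitude and longitude are assumptions, and the coordinates are made up.

```r
library(tibble)
library(dplyr)
library(purrr)

# Made-up stand-in for the geocoded results
geocoded <- tibble(
  query = c("1 Example St, Boston, MA", "2 Sample Ave, Boston, MA"),
  response_results = list(
    data.frame(location.lat = c(42.1, 42.2), location.lng = c(-71.1, -71.2)),
    data.frame(location.lat = 42.3, location.lng = -71.3)
  )
)

# Save the list column on its own to keep the code readable
mylist <- geocoded$response_results

# Create the latitude vector and the longitude vector
lat <- mylist %>% modify_depth(1, ~ .$location.lat[1]) %>% as_vector()
lng <- mylist %>% modify_depth(1, ~ .$location.lng[1]) %>% as_vector()

# Add both vectors to the original data
geocoded <- mutate(geocoded, latitude = lat, longitude = lng)

select(geocoded, query, latitude, longitude)
```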
That’s it for this episode, thanks for watching! For more R tips, head to the More With R video page at go.infoworld.com/morewithR. That’s https go dot infoworld dot com slash more with R, all lowercase except for the R. Or, you can subscribe to “Do More With R” on YouTube. So long, and hope to see you next episode!