I am pleased to introduce requestsR, an R interface for Python’s Requests module.

Background

R is a great language for dealing with web data. It has a number of fantastic packages for handling web data; the ones I recommend most are curl, httr, xml2, jsonlite and rvest.

This package is not meant to replace these packages, and with the exception of httr, this package is actually meant to be used in concert with them.

That said, performing complex web requests can be extremely difficult in R. It turns out our brethren in the Python community developed a module that elegantly handles even the most complex web requests.

I built this package to unleash the power of Requests inside of R.

I like to think of Requests as the Bo Jackson of web interaction tools.

Now that it has an R package, it plays two languages, but more importantly its API is clean, agile and elegant while also packing the force of a freight train.

However, like Bo Jackson, there are times where using its powers may be overkill.

For really basic web-scraping tasks, requestsR isn’t any better than httr or rvest; it will work just the same. But if you are facing a daunting task and cannot figure out a way to get it into the end zone using httr or curl, call on requestsR. If you can replicate the inputs to the request call, this package will get the job done.


Quickstart

Now let’s demonstrate how to use the package.


Installation

Right now the package is only available on GitHub.

You can install the package as follows:

devtools::install_github("abresler/requestsR")

Load packages

First thing we need to do is load the packages we are going to use.

library(dplyr)
library(reticulate)
library(jsonlite)
library(requestsR)
library(rvest)
library(listviewer)

Basic GET request

This example demonstrates how to perform the most basic GET request.

resp <-
  Get(url = 'https://api.github.com/events')

In addition to mimicking the Python API, the package contains some additional functions that assist in parsing the response’s JSON and HTML.

Let’s demonstrate how to take the response and return parsed JSON.

json <- 
  resp %>%
  parse_response_json(is_data_frame = FALSE)

We can now view the parsed JSON data.

json %>% 
  jsonedit()
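
For readers curious what the parsing step amounts to on the Python side, here is a minimal Python sketch using only the standard library. The sample body is made up for illustration (it is not real output from the GitHub API), and `json.loads` stands in for whatever parse_response_json does internally, which this post does not show.

```python
import json

# Hypothetical response body, shaped loosely like a list of events.
body = (
    '[{"id": "1", "type": "PushEvent", "public": true}, '
    '{"id": "2", "type": "WatchEvent", "public": true}]'
)

# Parse the JSON text into nested Python lists and dicts,
# comparable to the nested R list returned when is_data_frame = FALSE.
events = json.loads(body)

# Pull one field out of every record.
event_types = [event["type"] for event in events]
print(event_types)
```

Each JSON object becomes a dictionary and the JSON array becomes a list, which is why the result can be browsed as a tree in a viewer such as jsonedit.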

Basic POST request

This example demonstrates how to make a [POST](https://en.wikipedia.org/wiki/POST_(HTTP)) request.

In this example we will pass along additional parameters to the request, in this case data.

To pass along this parameter one must use either a named list or a reticulate::dict which mimics Python’s dictionary structure.

There are other cases where you may want to pass along reticulate::tuple parameters, something I will demonstrate in the next example.

post_resp <- 
  Post(url = 'http://httpbin.org/post', data = list(key = "value"))

As we did in the prior example we can parse the response and explore the JSON.

post_resp %>%
  parse_response_json(is_data_frame = FALSE) %>% 
  jsonedit()

Working with Tuples

There are times when you need to pass along tuples as inputs.

A tuple is roughly the Python version of an unnamed list: an ordered, fixed sequence of values. Using the reticulate package we can easily create a tuple object in R, which makes it easy to generate requests that require tuple inputs.

payload <-
  tuple(tuple('key1', 'value1'),
        tuple('key1', 'value2'))
resp_tuple <- Post('http://httpbin.org/post', data = payload)
resp_tuple %>% 
  parse_response_json()
$args
named list()

$data
[1] ""

$files
named list()

$form
$form$key1
[1] "value1" "value2"


$headers
$headers$Accept
[1] "*/*"

$headers$`Accept-Encoding`
[1] "gzip, deflate"

$headers$Connection
[1] "close"

$headers$`Content-Length`
[1] "23"

$headers$`Content-Type`
[1] "application/x-www-form-urlencoded"

$headers$Host
[1] "httpbin.org"

$headers$`User-Agent`
[1] "python-requests/2.18.4"


$json
NULL

$origin
[1] "67.254.199.56"

$url
[1] "http://httpbin.org/post"
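
The `$form` and `Content-Length` entries above can be reproduced by hand. The sketch below is Python using only the standard library; `urllib.parse.urlencode` is used here as a stand-in for the form-encoding step, on the assumption that Requests encodes a sequence of pairs the same way.

```python
from urllib.parse import urlencode, parse_qs

# Same pairs as the R tuple payload: the key 'key1' appears twice.
payload = (("key1", "value1"), ("key1", "value2"))

# Form-encode the pairs. A sequence of pairs keeps repeated keys,
# which a dictionary (or a named list) could not represent.
body = urlencode(payload)
print(body)
print(len(body))

# Decoding groups the repeated key into one entry with two values,
# mirroring what httpbin reports under $form$key1.
form = parse_qs(body)
print(form)
```

The encoded body is 23 characters long, which lines up with the `Content-Length` of "23" in the output, and it shows why tuples are needed here: they are the only way to send the same key more than once.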

Authenticating

Requests also makes it very easy to authenticate.

This package contains a special input parameter called auth.

If you need to authenticate just include a named list with the user information and it should work.

Let me demonstrate with an example using GitHub’s API. Please note that in order to recreate this you must substitute my GitHub information with your own.

# `pwd` must already hold your GitHub password; it is not defined here
'https://api.github.com/user' %>% 
  Get(auth = list(user = "abresler",
            password = pwd)) %>% 
  parse_response_json(is_data_frame = TRUE) %>% 
  select(1:5)
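
To make the mechanics less magical, here is a small Python sketch (standard library only) of HTTP Basic authentication, which is what Requests sends when given a user and password pair. That the auth named list in this package maps onto exactly this scheme is my assumption, and the credentials are placeholders.

```python
import base64

def basic_auth_header(user, password):
    """Build the value of an HTTP Basic Authorization header."""
    # Join the credentials with a colon, then base64-encode the bytes.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"

# Placeholder credentials, not a real account.
header = basic_auth_header("user", "passwd")
print(header)
```

Base64 is an encoding, not encryption, so credentials sent this way should only ever travel over HTTPS.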

Complex Requests

As stated in the introduction this is the primary reason why I built this package.

If you are trying to pull data from a source that requires headers, cookies, data and/or a payload, that can be extremely difficult to do in R but really easy to do in Python via Requests.

Let’s demonstrate using an example that includes headers, cookies and a payload parameter.

In this example I will also use a package function that converts a Python dictionary object into an R named list. This makes converting cURL parameters, via a tool like this, quick and easy for R use.

payload <- 
  "{'key1': 'value1', 'key2': 'value2'}" %>% 
  convert_dictionary_to_list()

headers <-
  "{'user-agent': 'my-app/0.0.1'}" %>%
  convert_dictionary_to_list()
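
The strings passed in above are ordinary Python dictionary literals. As a rough illustration in Python (standard library only), this is one safe way such a string can be turned into a real dictionary. How convert_dictionary_to_list works internally is not shown in this post, so treat `ast.literal_eval` as a stand-in rather than the actual implementation.

```python
import ast

# The same text that is piped into convert_dictionary_to_list() above.
raw = "{'key1': 'value1', 'key2': 'value2'}"

# literal_eval only accepts Python literals (strings, numbers, dicts, ...),
# so it will not execute arbitrary code hidden in the string.
payload = ast.literal_eval(raw)

print(payload["key1"])
print(list(payload.items()))
```

Each key and value pair in the dictionary corresponds to one named element of the R list.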

Next, let’s use a named list to create the custom cookies.

cookies_list <-
  list(cookies_are = 'working')

Now we can put them all together and issue a POST request to showcase how simply you can go about making a complex request.

resp_complex <-
  Post(
    "http://httpbin.org/post",
    data = payload,
    headers = headers,
    cookies = cookies_list
  )

Now we can explore the results.

resp_complex %>% 
  parse_response_json() %>% 
  jsonedit()
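
It can help to see how the three inputs end up in a single HTTP request. The following Python sketch uses only the standard library's `urllib`, not Requests itself, so it is an analogue under that assumption rather than what the package runs. It builds the request object without sending anything over the network.

```python
from urllib.parse import urlencode
from urllib.request import Request

payload = {"key1": "value1", "key2": "value2"}
headers = {"user-agent": "my-app/0.0.1"}
cookies = {"cookies_are": "working"}

# Cookies travel as a single Cookie header of name=value pairs.
cookie_header = "; ".join(f"{name}={value}" for name, value in cookies.items())

# Build the request object; nothing is sent here.
req = Request(
    "http://httpbin.org/post",
    data=urlencode(payload).encode("ascii"),  # form-encoded body
    headers={**headers, "Cookie": cookie_header},
)

print(req.get_method())    # a request with a body defaults to POST
print(req.data)
print(req.header_items())
```

The payload becomes the request body, while the custom headers and cookies both end up as header lines. These are the same pieces httpbin echoes back under `form` and `headers`.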

Working with HTML and XML

One thing to always remember when using requestsR is that you can use it the same as you would use httr or any other R web interaction package.

All you need to do is ensure that the response’s content is parsed to HTML, and then it is ready to use with xml2, rvest or any package you use to work with HTML/XML content.

Let’s demonstrate how to do this with an example that pulls in the top web-story URLs on the website Drudge Report.

resp_drudge <- 
  "http://www.drudgereport.com/" %>% 
  Get()

page <- 
  resp_drudge %>% 
  parse_response_html()

page %>% 
  html_nodes(css = "#app_topstories a") %>% 
  html_attr('href')
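
For comparison, here is roughly what the selector `#app_topstories a` does, sketched in Python with only the standard library. The HTML snippet and the class name `ContainerLinks` are invented for illustration. The parser is deliberately simple: it assumes every opening tag inside the container has a matching closing tag.

```python
from html.parser import HTMLParser

class ContainerLinks(HTMLParser):
    """Collect href values of <a> tags nested inside the element with a given id."""

    def __init__(self, container_id):
        super().__init__()
        self.container_id = container_id
        self.depth = 0      # 0 means outside the container
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self.depth > 0:
            # Already inside the container: track nesting and collect links.
            self.depth += 1
            if tag == "a" and "href" in attrs:
                self.links.append(attrs["href"])
        elif attrs.get("id") == self.container_id:
            # Entering the container element itself.
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth > 0:
            self.depth -= 1

# Hypothetical page fragment; the last link sits outside the container.
html = (
    '<div id="app_topstories">'
    '<a href="http://example.com/a">A</a>'
    '<p><a href="http://example.com/b">B</a></p>'
    '</div>'
    '<a href="http://example.com/c">C</a>'
)

parser = ContainerLinks("app_topstories")
parser.feed(html)
print(parser.links)
```

Only the two links inside the container are collected. In practice rvest's html_nodes and html_attr handle real-world markup far more robustly than this sketch.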

Other Response Object Features

One of the benefits of the Requests API is that it returns an object with tons of pertinent information. This includes not just the content of a successfully executed request but also a number of other potentially useful bits of information.

Let’s use the object generated from the first request to take a look at a few of these useful bits of information.

Encoding

resp$apparent_encoding

Status Code

resp$status_code

Content

resp$content

Headers

resp$headers

There are a number of other things you can do with a response object. To better understand what they are you should spend some time working through the Python documentation.

That’s all for this introduction. I hope you find this package useful and that it also demonstrates the power of linking Python and R.

