Getting data from pdfs using the pdftools package - Econometrics and Free Software

Scraping pdf files into R

Data is often trapped inside pdfs, but thankfully there are ways to extract it. A very nice package for this task is pdftools (Github link), and this blog post describes some of its basic functionality.
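To give a first taste of the package, here is a minimal sketch, assuming pdftools is installed. The pdf used here is a stand-in generated with base R's `pdf()` device, not one of the documents discussed in this post, and the names `path`, `pages` and `lines` are only there for illustration.

```r
library(pdftools)

# Stand-in document (an assumption for this sketch): a one-page pdf written
# with base R's pdf() device, so nothing has to be downloaded.
path <- tempfile(fileext = ".pdf")
pdf(path)
plot.new()
text(0.5, 0.5, "Hello from a pdf")
dev.off()

# pdf_text() returns a character vector with one element per page.
pages <- pdf_text(path)
length(pages)

# Split the first page into lines and trim the layout whitespace.
lines <- trimws(strsplit(pages[1], "\n")[[1]])
lines <- lines[nzchar(lines)]
lines
```

Since each page comes back as a single string, most of the real work with an actual document is splitting that string into lines and picking out the ones that hold the data.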

First, let's find some pdfs that contain interesting data. For this post, I am using the diabetes country profiles from the World Health Organization. You can find them here. If you open one of these pdfs, you are going to see this:
