For HR professionals, LinkedIn has been a popular recruitment tool over the last few years.
When it comes to HR analytics, the big question is how we can get useful information from LinkedIn. You may have heard rumours that you shouldn’t, or aren’t allowed to, scrape data from LinkedIn, but that is not the case (as we’ll get into shortly).
LinkedIn is packed with data and if you know how to use it, you can gain some valuable insights for your company. Let’s take a look:
The data engine of LinkedIn
To understand why LinkedIn is such a valuable source of data, you have to see the size of it and the number of data points it gathers. Here are just a few key statistics:
- There are around 610 million users of the platform
- 40% of its 303 million active monthly users visit the site daily
- 90 million users are senior-level influencers
- There are 40 million students and recent graduates on LinkedIn
- There are 50,000 skills listed on LinkedIn
- There are around 30 million companies and 20 million job listings
- There are 100 million job applications via LinkedIn each month
- There have been 11 billion endorsements made on LinkedIn.
That’s just the tip of the iceberg. There are millions of pieces of data on the platform, which is a big reason why it has attracted the attention of data scientists with the ability to scrape it.
Yes, you can scrape LinkedIn
You may have heard rumours that scraping LinkedIn data is prohibited because of a recent court case on the matter. The thrust of the case was LinkedIn’s allegation that scraping data violated the privacy of its users.
A San Francisco startup, hiQ Labs, harvests data from public LinkedIn profiles and uses it to perform analysis, answering questions such as when people might be likely to leave their job or where skills shortages are likely to happen.
LinkedIn took steps to block hiQ from scraping the data. In response, hiQ won an injunction a couple of years ago that required LinkedIn to remove the block. That decision was recently upheld by the 9th U.S. Circuit Court of Appeals. The underlying opinion is that people who make data publicly available on their profiles do not have a reasonable expectation of privacy for that data.
The bottom line? Yes, HR Analysts and other data scientists can scrape LinkedIn.
LinkedIn’s own insights platform
LinkedIn launched its own analytics platform, Talent Insights, in 2018. Built on the premise that “every update and every action is a realtime look at the global workforce,” the purpose of the platform is to deliver key insights into talent and markets, while delivering data points in an intuitive way. They even state that data should be easily interpreted by someone who isn’t a data scientist.
Thus far, they’ve had positive reviews from some big players, including Intel, which used Talent Insights to discover the best way to target an employer branding exercise. They also give examples such as companies analysing where talent pools are concentrated so that they know where to set up a new office.
The platform allows analysts to run two different reports: the Talent Pool report and the Company report. The former helps to answer questions about talent, such as where it is and which schools are producing it, while the latter gives insights into a specific company.
The sorts of questions that can be answered include:
- How does this talent engage with my brand?
- Where is the talent pool located?
- Where is my company winning and losing talent?
- What workforce segments are most at risk?
- What skills does this company have?
- Where are our competitors recruiting from?
- Where should we open our next office?
Talent Insights is robust enough to help companies with workforce planning, sourcing strategy, employer branding, competitive intelligence and geolocation decisions. It makes sense to harness the millions of daily data points for key HR analytical purposes.
With Talent Insights providing a lot of valuable information, you might wonder why other companies (or you) would want to scrape data themselves. From what I can see, they do so because they have their own sets of questions to answer. Talent Insights is also a service that LinkedIn charges for, so companies will put their own data scientists onto the problem if they can, and companies like hiQ can profit.
The fact that LinkedIn is investing in an HR Analytics platform says a lot in itself about the growth of the discipline. Businesses are slowly but surely catching on to the value that insights from people analytics can bring and LinkedIn is there to capitalise early on the need for usable data.
Scraping LinkedIn for data insights
As established, if you know how, you can scrape data from LinkedIn yourself. The information is considered to be public. It’s not the same as your company HR data collected directly from employees, where they can expect privacy.
So how do you go about scraping LinkedIn? If you’ve seen other posts from me, you’ll know that I’m a fan of R as a tool for data scraping and analysis. You might also choose a language like Python or a browser automation tool like Selenium, if that is your preference.
One thing to know is that the data you can scrape is limited to that which is already publicly available. LinkedIn was previously blocking scraping tools in an effort to keep exclusive use of the data for itself, but in the hiQ case it was ordered to stop blocking access to public profiles.
The rvest package in R can help you to scrape LinkedIn for useful information. For example, you can run a program to capture a person’s name, location, number of connections, summary, skills and endorsements from their profile. The web scraper you set up will need to log in to LinkedIn, as you can’t get this information without being logged in. Below is scraper code using rvest, taken from GitHub.
    library(rvest)

    # Log in to LinkedIn, open one profile and return its visible fields.
    # username and password are your own LinkedIn credentials.
    # The CSS selectors reflect the page layout when this script was written
    # and may need updating if LinkedIn has changed its markup.
    # Note: rvest 1.0.0 and later rename html_session(), set_values(),
    # submit_form() and jump_to() to session(), html_form_set(),
    # session_submit() and session_jump_to().
    scrape_linkedin <- function(user_url, username, password) {
      linkedin_url <- "https://linkedin.com/"

      # Start a session and fill in the login form on the home page
      pgsession <- html_session(linkedin_url)
      pgform <- html_form(pgsession)[[1]]
      filled_form <- set_values(pgform,
                                session_key = username,
                                session_password = password)
      pgsession <- submit_form(pgsession, filled_form)

      # Navigate to the profile and parse its HTML
      pgsession <- jump_to(pgsession, user_url)
      page_html <- read_html(pgsession)

      name <-
        page_html %>% html_nodes("#name") %>% html_text()
      location <-
        page_html %>% html_nodes("#location .locality") %>% html_text()
      num_connections <-
        page_html %>% html_nodes(".member-connections strong") %>% html_text()
      summary <-
        page_html %>% html_nodes("#summary-item-view") %>% html_text()

      # One row per skill: the skill name and its endorsement count
      skills_nodes <-
        page_html %>% html_nodes("#profile-skills .skill-pill")
      skills <-
        lapply(skills_nodes, function(node) {
          num <- node %>% html_nodes(".num-endorsements") %>% html_text()
          skill_name <- node %>% html_nodes(".endorse-item-name-text") %>% html_text()
          data.frame(name = skill_name, num = num)
        })
      skills <- do.call(rbind, skills)

      list(
        name = name,
        location = location,
        num_connections = num_connections,
        summary = summary,
        skills = skills
      )
    }
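Before pointing a scraper at live pages, it can help to check the selector logic offline. The sketch below runs the same kind of html_nodes() and html_text() calls against a small hand-written HTML string. The markup and the person in it are invented for illustration, and the ids and classes simply mirror the selectors used in the scraper, so treat it as an assumption about page structure rather than a copy of real LinkedIn HTML.

```r
library(rvest)

# Hypothetical, hand-written HTML that mimics the ids and classes the
# scraper looks for. It is not real LinkedIn markup.
sample_profile <- '
<html><body>
  <span id="name">Jane Doe</span>
  <div id="location"><span class="locality">Wellington, New Zealand</span></div>
  <div class="member-connections"><strong>500+</strong></div>
  <ul id="profile-skills">
    <li class="skill-pill">
      <span class="num-endorsements">12</span>
      <span class="endorse-item-name-text">Recruiting</span>
    </li>
    <li class="skill-pill">
      <span class="num-endorsements">7</span>
      <span class="endorse-item-name-text">R</span>
    </li>
  </ul>
</body></html>'

# read_html() accepts a literal HTML string as well as a URL or session
page_html <- read_html(sample_profile)

# Single-value fields
profile_name <- page_html %>% html_nodes("#name") %>% html_text()
location <- page_html %>% html_nodes("#location .locality") %>% html_text()

# Repeated fields: one entry per skill pill
skill_names <- page_html %>%
  html_nodes("#profile-skills .skill-pill .endorse-item-name-text") %>%
  html_text()
endorsements <- page_html %>%
  html_nodes("#profile-skills .skill-pill .num-endorsements") %>%
  html_text() %>%
  as.integer()

skills <- data.frame(name = skill_names, num = endorsements,
                     stringsAsFactors = FALSE)
print(skills)
```

If a selector returns an empty result here, it will return an empty result on a real page with the same structure, which makes this a cheap way to catch typos in selectors.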
As you can see, if you can get your hands on public LinkedIn data at scale, you can answer the questions that LinkedIn Talent Insights can answer, and more.
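As a rough illustration of what "at scale" might look like, here is a minimal sketch in base R that takes several scraped profiles and answers two of the Talent Insights style questions: where the talent pool is located, and where a given skill is concentrated. The profiles list is invented sample data, assumed to have the same shape as the scraper's return value (a location plus a skills data frame), and all names and numbers in it are hypothetical.

```r
# Hypothetical scraped results. Every person, location and count is invented.
make_profile <- function(name, location, skill_names) {
  list(
    name = name,
    location = location,
    skills = data.frame(name = skill_names, stringsAsFactors = FALSE)
  )
}

profiles <- list(
  make_profile("Person A", "Auckland",   c("R", "Recruiting")),
  make_profile("Person B", "Auckland",   c("R", "SQL")),
  make_profile("Person C", "Wellington", c("R")),
  make_profile("Person D", "Sydney",     c("Recruiting", "SQL"))
)

# Where is the talent pool located? Count profiles per location.
locations <- vapply(profiles, function(p) p$location, character(1))
talent_by_location <- sort(table(locations), decreasing = TRUE)

# Stack every profile's skills into one long table of (location, skill) rows.
all_skills <- do.call(rbind, lapply(profiles, function(p) {
  data.frame(location = p$location, skill = p$skills$name,
             stringsAsFactors = FALSE)
}))

# Which skills appear most often across the pool?
skill_counts <- sort(table(all_skills$skill), decreasing = TRUE)

# Where is one particular skill (here, R) concentrated?
r_by_location <- table(all_skills$location[all_skills$skill == "R"])

print(talent_by_location)
print(skill_counts)
print(r_by_location)
```

The same long table can be filtered by company, school or job title if you capture those fields too, which is how you start answering questions specific to your own business.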
Final thoughts
LinkedIn can be a valuable source of data that your company can use for decision-making. One of the factors that makes it unique is that it is already a tool devoted to workforce talent.
The whole purpose of the platform is for people to connect and further or promote their careers, or find team members for their business. The data there all relates to HR in some way. It’s just a matter of accessing it and interpreting it at scale.
Have you used the Talent Insights platform, or do you choose to scrape LinkedIn data yourself? I’d be interested to hear of your experiences.