A couple of my good friends also recently started a sports analytics blog. We’ve decided to collaborate on a couple of studies revolving around NBA data found at www.basketball-reference.com. This will be the first part of that project!
Data scientists need data. The internet has lots of data. How can I get that data into R? Scrape it!
People have been scraping websites for as long as there have been websites. It’s gotten pretty easy using R/Python/whatever other tool you want to use. This post shows how to use R to scrape the demographic information for all NBA and ABA players listed at www.basketball-reference.com.
Here’s the code:
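###### Settings
library(XML)  # provides readHTMLTable()
###### URLs
# One index page per letter of the alphabet: .../players/a/, .../players/b/, ...
url <- paste0("http://www.basketball-reference.com/players/", letters, "/")
len <- length(url)
###### Reading data
# Take the first table on each page and stack the pages row by row
tbl <- readHTMLTable(url[1])[[1]]
for (i in 2:len) {
  tbl <- rbind(tbl, readHTMLTable(url[i])[[1]])
}
###### Formatting data
colnames(tbl) <- c("Name", "StartYear", "EndYear", "Position", "Height", "Weight", "BirthDate", "College")
# Convert the whole column; indexing with [1] would copy the first player's birth date into every row
tbl$BirthDate <- as.Date(as.character(tbl$BirthDate), format = "%B %d, %Y")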
And here’s the result: [image: Result]
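One possible variation on the read-and-stack step: instead of growing the table with rbind inside a loop, you can read every page into a list and bind it once at the end. The sketch below is only an illustration and makes a few assumptions. `read_players` is a hypothetical stand-in for `readHTMLTable(u)[[1]]` so the example runs without a network connection, the rows it returns are placeholders rather than real player data, and only the first three letters are used to keep it short.

```r
# Hypothetical stand-in for readHTMLTable(u)[[1]]: returns a one-row data frame
# so the sketch runs offline. The values are placeholders, not real player data.
read_players <- function(u) {
  data.frame(
    Name = paste("Placeholder player from", u),
    BirthDate = "January 1, 1970",
    stringsAsFactors = FALSE
  )
}

# Same URL pattern as in the post, limited to three letters for brevity
url <- paste0("http://www.basketball-reference.com/players/", letters[1:3], "/")

# Read every page into a list, then bind all the data frames in a single call
pages <- lapply(url, read_players)
tbl <- do.call(rbind, pages)

# Convert the whole column at once; %B parses month names in the current locale
tbl$BirthDate <- as.Date(tbl$BirthDate, format = "%B %d, %Y")

str(tbl)
```

To run this against the live site, you would swap `read_players` for a function that calls `readHTMLTable()` on each URL, assuming each page still exposes the player table as its first table.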
Source: http://www.r-bloggers.com/scraping-xml-tables-with-r/