Can't import this Excel file into R
I'm having trouble importing a file into R. The file was obtained from this website: https://report.nih.gov/award/index.cfm, where I clicked "import table" and downloaded a .xls file for the year 1992.
This image might help describe how I retrieved the data.
Here's what I've tried typing into the console, along with the results:
Input:
    > library('readxl')
    > data1992 <- read_excel("1992.xls")
Output:
    Not an excel file
    Error in eval(substitute(expr), envir, enclos) :
      Failed to open /home/chrx/documents/nih funding awards, 1992 - 2016/1992.xls
Input:
    > data1992 <- read.csv("1992.xls", sep = "\t")
Output:
    Error in read.table(file = file, header = header, sep = sep, quote = quote, :
      more columns than column names
I'm not sure whether or not this is relevant, but I'm using GalliumOS (Linux). Because I'm using Linux, Excel isn't installed on my computer. LibreOffice is.
Why bother getting the data in and out of a .csv if it's right there on the web page for you to scrape?
    # Note the query parameters in the URL when you apply a filter, e.g. fy=
    url <- 'http://report.nih.gov/award/index.cfm?fy=1992'

    library('rvest')
    library('magrittr')
    library('dplyr')

    df <- url %>%
      read_html() %>%
      html_nodes(xpath='//*[@id="orgtable"]') %>%
      html_table() %>%
      extract2(1) %>%
      mutate(funding = as.numeric(gsub('[^0-9.]','',funding)))

    head(df)
Returns:
                                  organization          city state       country awards funding
    1 a.t. still university of health sciences    kirksville    mo united states      3  356221
    2                     aac associates, inc.        vienna    va united states     10 1097158
    3       aaron diamond aids research center      new york    ny united states      3  629946
    4                      abbott laboratories north chicago    il united states      4 1757241
    5                            abiomed, inc.       danvers    ma united states      6 2161146
    6                     abratech corporation     sausalito    ca united states      1  450411
If you need to loop through the years 1992 to present, or similar, this programmatic approach will save you a lot of time versus handling a bunch of flat files.
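A rough sketch of what that loop might look like, reusing the same URL pattern and XPath as the single-year example. The helper names `build_url`, `clean_funding`, and `fetch_year` are made up for illustration, and the code assumes every year's page has the same table id and the same `funding` column; I have not checked that for each year.

```r
# Assumed base URL, taken from the single-year example
base_url <- "http://report.nih.gov/award/index.cfm"

# Build the filtered URL for one fiscal year (fy is the query parameter)
build_url <- function(year) paste0(base_url, "?fy=", year)

# Strip everything except digits and the decimal point, then convert to numeric
clean_funding <- function(x) as.numeric(gsub("[^0-9.]", "", x))

# Hypothetical helper: scrape one year's table and tag each row with the year.
# Assumes the table id and the funding column are the same for every year.
fetch_year <- function(year) {
  page <- rvest::read_html(build_url(year))
  node <- rvest::html_nodes(page, xpath = '//*[@id="orgtable"]')
  df <- rvest::html_table(node)[[1]]
  df$funding <- clean_funding(df$funding)
  df$year <- year
  df
}

# Usage (not run here; it makes one request per year):
# years <- 1992:2016
# all_years <- do.call(rbind, lapply(years, function(y) {
#   Sys.sleep(1)  # small pause between requests
#   fetch_year(y)
# }))
```

The result would be one long data frame with a `year` column, which is usually easier to filter and summarise than a separate file per year.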