Package gcite: Google Citation Parser


Package gcite

February 2, 2018

Type Package
Title Google Citation Parser
Version 0.9.2
Date 2018-02-01
Author John Muschelli
Maintainer John Muschelli <muschellij2@gmail.com>
Description Scrapes Google Citation pages and creates data frames of citations over time.
License GPL-3
LazyData true
LazyLoad yes
Imports xml2, httr, rvest, stats, pbapply, data.table, wordcloud, tm, graphics
RoxygenNote 6.0.1.9000
Encoding UTF-8
Suggests covr, testthat
NeedsCompilation no
Repository CRAN
Date/Publication 2018-02-02 18:05:27 UTC

R topics documented:

    author_cloud
    gcite
    gcite_author_info
    gcite_citation_index
    gcite_citation_page
    gcite_cite_over_time
    gcite_graph
    gcite_main_graph

    gcite_papers
    gcite_paper_df
    gcite_stopwords
    gcite_url
    gcite_username
    gcite_user_info
    gcite_wordcloud
    gcite_wordcloud_spec
    is_travis
    set_cookies_txt
    title_cloud
    Index


author_cloud            Make Wordcloud of authors from Papers

Description

    Takes a vector of authors and then creates a frequency table of those words and plots a wordcloud

Usage

    author_cloud(authors, addstopwords = gcite_stopwords(),
      author_pattern = NULL, split = ",", verbose = TRUE,
      colors = c("#66c2a4", "#41AE76", "#238B45", "#006D2C", "#00441B"), ...)

    author_frequency(authors, author_pattern = NULL, split = ",",
      addstopwords = gcite_stopwords(), verbose = TRUE)

Arguments

    authors          Vector of authors of papers
    addstopwords     Additional words to remove from wordcloud
    author_pattern   regular expression for patterns to exclude from individual authors
    split            split author names (default ","), passed to strsplit
    verbose          Print diagnostic messages
    colors           color words from least to most frequent. Passed to gcite_wordcloud_spec
    ...              additional options passed to gcite_wordcloud_spec

Value

    A data.frame of the words and the frequencies of the authors

Examples

    ## Not run:
    L = gcite_author_info("John Muschelli")
    paper_df = L$paper_df
    authors = paper_df$authors
    author_cloud(authors)
    ## End(Not run)


gcite                   Google Citations Information

Description

    Wraps getting the information from Google Citations and plotting the wordcloud

Usage

    gcite(author, user, plot_wordcloud = TRUE, author_args = list(),
      title_args = list(), warn = FALSE, force = FALSE, sleeptime = 0, ...)

Arguments

    author           author name separated by spaces
    user             user ID for google Citations
    plot_wordcloud   should the wordcloud be plotted
    author_args      to pass to author_cloud
    title_args       to pass to title_cloud
    warn             should warnings be printed from wordcloud?
    force            If passing a URL and there is a failure, should the program return NULL, passed to gcite_citation_page
    sleeptime        time in seconds between http requests, to avoid Google Scholar rate limit
    ...              additional options passed to gcite_user_info and therefore GET

Value

    List from either gcite_user_info or gcite_author_info

gcite_author_info       Getting User Information from name

Description

    Calls gcite_user_info after getting the user identifier

Usage

    gcite_author_info(author, ask = TRUE, pagesize = 100, verbose = TRUE,
      secure = TRUE, force = FALSE, read_citations = TRUE, sleeptime = 0, ...)

Arguments

    author           author name separated by spaces
    ask              If multiple authors are found, should a menu be given
    pagesize         Size of pages, max 100, passed to gcite_url
    verbose          Print diagnostic messages
    secure           use https vs. http
    force            If passing a URL and there is a failure, should the program return NULL, passed to gcite_citation_page
    read_citations   Should all citation pages be read?
    sleeptime        time in seconds between http requests, to avoid Google Scholar rate limit
    ...              Additional arguments passed to GET

Value

    A list of citations, citation indices, and a data.frame of authors, journal, and citations, and a data.frame of the links to all paper URLs.

Examples

    ## Not run:
    df = gcite_author_info(author = "John Muschelli", secure = FALSE)
    ## End(Not run)
    df = gcite_author_info(author = "Jiawei Bai", secure = FALSE)

gcite_citation_index    Parse Google Citation Index

Description

    Parses Google citation indices (h-index, etc.) from the main page

Usage

    gcite_citation_index(doc, ...)

    ## S3 method for class 'xml_node'
    gcite_citation_index(doc, ...)

    ## S3 method for class 'xml_document'
    gcite_citation_index(doc, ...)

    ## S3 method for class 'character'
    gcite_citation_index(doc, ...)

Arguments

    doc              An xml_document or the url for the main page
    ...              Additional arguments passed to GET if doc is a URL

Value

    A matrix of indices

Examples

    library(httr)
    library(rvest)
    library(gcite)
    url = "https://scholar.google.com/citations?user=t9eqzgmaaaaj"
    url = gcite_url(url = url, pagesize = 10, cstart = 0)
    ind = gcite_citation_index(url)
    doc = content(httr::GET(url))
    ind = gcite_citation_index(doc)
    ind_nodes = rvest::html_nodes(doc, "#gsc_rsb_st")[[1]]
    ind = gcite_citation_index(ind_nodes)

gcite_citation_page     Parse Google Citation Index

Description

    Parses Google citation indices (h-index, etc.) from the main page

Usage

    gcite_citation_page(doc, title = NULL, force = FALSE, ...)

    ## S3 method for class 'xml_nodeset'
    gcite_citation_page(doc, title = NULL, force = FALSE, ...)

    ## S3 method for class 'xml_document'
    gcite_citation_page(doc, title = NULL, force = FALSE, ...)

    ## S3 method for class 'character'
    gcite_citation_page(doc, title = NULL, force = FALSE, ...)

    ## S3 method for class 'list'
    gcite_citation_page(doc, title = NULL, force = FALSE, ...)

    ## Default S3 method:
    gcite_citation_page(doc, title = NULL, force = FALSE, ...)

Arguments

    doc              An xml_document or the url for the main page
    title            title of the article
    force            If passing a URL and there is a failure, should the program return NULL?
    ...              arguments passed to GET

Value

    A matrix of indices

Examples

    library(httr)
    library(rvest)
    url = paste0("https://scholar.google.com/citations?view_op=view_citation&",
      "hl=en&oe=ascii&user=t9eqzgmaaaaj&pagesize=100&",
      "citation_for_view=t9eqzgmaaaaj:w7oemfmy1hyc")
    url = gcite_url(url = url, pagesize = 10, cstart = 0)
    ind = gcite_citation_page(url)
    doc = content(httr::GET(url))
    ind = gcite_citation_page(doc)
    ind_nodes = html_nodes(doc, "#gsc_vcd_table div")
    ind_nodes = html_nodes(ind_nodes, xpath = '//div[@class = "gs_scl"]')
    ind = gcite_citation_page(ind_nodes)


gcite_cite_over_time    Parse Google Citations Over Time

Description

    Parses Google citations over time from the main Citation page

Usage

    gcite_cite_over_time(doc, ...)

    ## S3 method for class 'xml_node'
    gcite_cite_over_time(doc, ...)

    ## S3 method for class 'xml_document'
    gcite_cite_over_time(doc, ...)

    ## S3 method for class 'character'
    gcite_cite_over_time(doc, ...)

    ## Default S3 method:
    gcite_cite_over_time(doc, ...)

Arguments

    doc              An xml_document or the url for the main page
    ...              arguments passed to GET

Value

    A matrix of citations

Examples

    library(httr)
    library(rvest)
    url = "https://scholar.google.com/citations?user=t9eqzgmaaaaj"

    url = gcite_url(url = url, pagesize = 10, cstart = 0)
    ind = gcite_cite_over_time(url)
    doc = content(httr::GET(url))
    ind = gcite_cite_over_time(doc)
    ind_nodes = rvest::html_nodes(doc, ".gsc_md_hist_b")
    ind = gcite_cite_over_time(ind_nodes)


gcite_graph             Parse Google Citation Graph

Description

    Parses a Google citation bar graph from html

Usage

    gcite_graph(citations, ...)

    ## S3 method for class 'xml_node'
    gcite_graph(citations, ...)

    ## S3 method for class 'xml_document'
    gcite_graph(citations, ...)

    ## S3 method for class 'character'
    gcite_graph(citations, ...)

    ## Default S3 method:
    gcite_graph(citations, ...)

Arguments

    citations        A list of nodes or xml_node
    ...              arguments passed to GET

Value

    A matrix of citations and years

gcite_main_graph        Parse Google Citation Graph

Description

    Parses a google citation bar graph from html

Usage

    gcite_main_graph(citations, ...)

    ## S3 method for class 'xml_document'
    gcite_main_graph(citations, ...)

    ## S3 method for class 'character'
    gcite_main_graph(citations, ...)

    ## Default S3 method:
    gcite_main_graph(citations, ...)

Arguments

    citations        A list of nodes or xml_node
    ...              arguments passed to GET

Value

    A matrix of citations and years


gcite_papers            Parse Google Citation Index

Description

    Parses a google citation indices (h-index, etc.) from main page

Usage

    gcite_papers(doc, ...)

    ## S3 method for class 'xml_nodeset'
    gcite_papers(doc, ...)

    ## S3 method for class 'xml_document'
    gcite_papers(doc, ...)

    ## S3 method for class 'character'
    gcite_papers(doc, ...)

    ## Default S3 method:
    gcite_papers(doc, ...)

Arguments

    doc              An xml_document or the url for the main page
    ...              Additional arguments passed to GET if doc is a URL

Value

    A matrix of indices

Examples

    library(httr)
    library(rvest)
    url = "https://scholar.google.com/citations?user=t9eqzgmaaaaj"
    url = gcite_url(url = url, pagesize = 10, cstart = 0)
    ind = gcite_papers(url)
    doc = content(httr::GET(url))
    ind = gcite_papers(doc)
    ind_nodes = rvest::html_nodes(doc, "#gsc_a_b")
    ind = gcite_papers(ind_nodes)


gcite_paper_df          Get Paper Data Frame from Title URLs

Description

    Get Paper Data Frame from Title URLs

Usage

    gcite_paper_df(urls, verbose = TRUE, force = FALSE, sleeptime = 0, ...)

Arguments

    urls             A character vector of urls, from all_papers$title_link
    verbose          Print diagnostic messages
    force            If passing a URL and there is a failure, should the program return NULL, passed to gcite_citation_page
    sleeptime        time in seconds between http requests, to avoid Google Scholar rate limit
    ...              Additional arguments passed to GET

Value

    A data.frame of authors, journal, and citations

Examples

    L = gcite_user_info(user = "uervkpyaaaaj", read_citations = FALSE)
    urls = L$all_papers$title_link
    paper_df = gcite_paper_df(urls = urls, force = TRUE)


gcite_stopwords         Google Cite Stopwords

Description

    Additional stopwords to remove from Google Cite results

Usage

    gcite_stopwords()

Value

    Character Vector

Examples

    gcite_stopwords()


gcite_url               Google Citations URL

Description

    Simple wrapper for adding in pagesize and start values for the page

Usage

    gcite_url(url, cstart = 0, pagesize = 100)

    gcite_base_url(secure = TRUE)

    gcite_user_url(user, secure = TRUE)

Arguments

    url              URL of the google citations page
    cstart           Starting value for the citation page
    pagesize         number of citations to return, max is 100
    secure           should https be used (default), instead of http
    user             Username/user ID for Google Scholar Citations

Value

    A character string

Examples

    url = "https://scholar.google.com/citations?user=t9eqzgmaaaaj"
    gcite_url(url = url, pagesize = 100, cstart = 5)


gcite_username          Google Citation Username Searcher

Description

    Search Google Citation for an author username

Usage

    gcite_username(author, verbose = TRUE, ask = TRUE, secure = TRUE, ...)

Arguments

    author           author name separated by spaces
    verbose          Verbose diagnostic printing
    ask              If multiple authors are found, should a menu be given
    secure           use https vs. http
    ...              arguments passed to GET

Value

    A character vector of the username of the author

Examples

    gcite_username("John Muschelli")
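As a rough illustration of what adding paging values to a citations URL involves (the job gcite_url is described as doing above), the base-R sketch below appends cstart and pagesize as query parameters. The helper name add_paging is hypothetical and this is not the package's implementation; it only assumes that the page accepts cstart and pagesize in the query string, as the argument descriptions suggest.

```r
# Hypothetical helper (not part of gcite): append paging parameters to a URL.
add_paging <- function(url, cstart = 0, pagesize = 100) {
  # The argument descriptions above cap pagesize at 100
  pagesize <- min(pagesize, 100)
  # Use "&" if the URL already carries a query string, "?" otherwise
  sep <- if (grepl("?", url, fixed = TRUE)) "&" else "?"
  paste0(url, sep, "cstart=", cstart, "&pagesize=", pagesize)
}

url <- "https://scholar.google.com/citations?user=t9eqzgmaaaaj"
paged <- add_paging(url, cstart = 5, pagesize = 100)
print(paged)
```

Capping pagesize inside the helper is a design choice for this sketch only; whether gcite_url clamps, warns, or passes the value through is not stated in this manual.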

gcite_user_info         Getting User Information of papers

Description

    Loops through pages for all information on Google Citations

Usage

    gcite_user_info(user, pagesize = 100, verbose = TRUE, secure = TRUE,
      force = FALSE, read_citations = TRUE, sleeptime = 0, ...)

Arguments

    user             user ID for google Citations
    pagesize         Size of pages, max 100, passed to gcite_url
    verbose          Print diagnostic messages
    secure           use https vs. http
    force            If passing a URL and there is a failure, should the program return NULL, passed to gcite_citation_page
    read_citations   Should all citation pages be read?
    sleeptime        time in seconds between http requests, to avoid Google Scholar rate limit
    ...              Additional arguments passed to GET

Value

    A list of citations, citation indices, and a data.frame of authors, journal, and citations, and a data.frame of the links to all paper URLs and the character string of the user name.

Examples

    ## Not run:
    df = gcite_user_info(user = "uervkpyaaaaj")
    ## End(Not run)
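The page loop and the sleeptime pause described above can be pictured with a small offline sketch. Everything here is an assumption made for illustration: collect_pages and fake_fetch are hypothetical names, fake_fetch stands in for an HTTP request plus parsing, and the stopping rule (a page shorter than pagesize means no more results) is one plausible choice rather than the documented behaviour of gcite_user_info.

```r
# Hypothetical paging loop; fetch_page stands in for an HTTP request + parse.
collect_pages <- function(fetch_page, pagesize = 100, sleeptime = 0) {
  pages <- list()
  cstart <- 0
  repeat {
    page <- fetch_page(cstart, pagesize)
    pages[[length(pages) + 1]] <- page
    # A short page means there is nothing further to request
    if (nrow(page) < pagesize) break
    cstart <- cstart + pagesize
    Sys.sleep(sleeptime)  # pause between requests to stay under rate limits
  }
  do.call(rbind, pages)
}

# Offline stand-in: pretend the profile lists 250 papers
fake_fetch <- function(cstart, pagesize) {
  ids <- seq_len(250)
  keep <- ids[ids > cstart & ids <= cstart + pagesize]
  data.frame(paper_id = keep)
}

all_papers <- collect_pages(fake_fetch, pagesize = 100)
print(nrow(all_papers))
```

Passing the fetch step in as a function keeps the loop testable without network access; a real fetch would build each page URL from cstart and pagesize before requesting it.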

gcite_wordcloud         Wordcloud of Google Citations Information

Description

    Simple wrapper for author_cloud and title_cloud

Usage

    gcite_wordcloud(paper_df, author_args = list(), title_args = list(),
      warn = FALSE)

Arguments

    paper_df         A data.frame with columns of authors and titles
    author_args      to pass to author_cloud
    title_args       to pass to title_cloud
    warn             should warnings be printed from wordcloud?


gcite_wordcloud_spec    gcite Wordcloud default

Description

    Simple wrapper for wordcloud with different defaults

Usage

    gcite_wordcloud_spec(words, freq, min.freq = 1, max.words = Inf,
      random.order = FALSE,
      colors = c("#f768a1", "#DD3497", "#AE017E", "#7A0177", "#49006A"),
      vfont = c("sans serif", "plain"), ...)

Arguments

    words            words to be plotted
    freq             the frequency of those words
    min.freq         words with frequency below min.freq will not be plotted
    max.words        Maximum number of words to be plotted. least frequent terms dropped
    random.order     plot words in random order. If false, they will be plotted in decreasing frequency
    colors           color words from least to most frequent
    vfont            passed to text for the font
    ...              additional options passed to wordcloud

Value

    Nothing

is_travis               Check if on Travis CI

Description

    Simple check for Travis CI for examples

Usage

    is_travis()

Value

    Logical if user is named travis

Examples

    is_travis()


set_cookies_txt         Set Cookies from Text file

Description

    Set Cookies from Text file

Usage

    set_cookies_txt(file)

Arguments

    file             tab-delimited text file of cookies, to be read in using readLines. Comments should start the line with the pound symbol

Value

    Either NULL if no domains contain the word "scholar", or an object of class request from set_cookies

Note

    This function searches for domains that contain the word "scholar"
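To make the file format for set_cookies_txt more concrete, here is a base-R sketch of reading such a file: it drops lines starting with the pound symbol, splits on tabs, and keeps only rows whose domain contains "scholar". The function name read_scholar_cookies is hypothetical, and the seven-column layout (domain, flag, path, secure, expiration, name, value) is an assumption borrowed from the common Netscape cookies.txt format; the manual itself only says the file is tab-delimited.

```r
# Hypothetical parser (not gcite's code). Assumes the Netscape cookies.txt
# layout: domain, flag, path, secure, expiration, name, value.
read_scholar_cookies <- function(file) {
  lines <- readLines(file, warn = FALSE)
  # Drop comment lines (starting with "#") and blank lines
  lines <- lines[!grepl("^#", lines) & nzchar(trimws(lines))]
  if (length(lines) == 0) return(NULL)
  fields <- strsplit(lines, "\t", fixed = TRUE)
  fields <- fields[lengths(fields) >= 7]
  # Keep only cookies whose domain mentions "scholar"
  domain <- vapply(fields, `[`, character(1), 1)
  keep <- grepl("scholar", domain, fixed = TRUE)
  if (!any(keep)) return(NULL)
  fields <- fields[keep]
  values <- vapply(fields, `[`, character(1), 7)
  names(values) <- vapply(fields, `[`, character(1), 6)
  values
}

# Small made-up cookie file for demonstration
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  "# Netscape HTTP Cookie File",
  "scholar.google.com\tFALSE\t/\tFALSE\t0\tGSP\tabc123",
  "example.org\tFALSE\t/\tFALSE\t0\tsid\tzzz"
), tmp)
cookies <- read_scholar_cookies(tmp)
print(cookies)
```

A named character vector like this could then be handed to httr::set_cookies(.cookies = cookies) to build the request object mentioned in the Value section, though how set_cookies_txt does this internally is not shown in the manual.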

title_cloud             Make Wordcloud of Titles from Papers

Description

    Takes a vector of titles and then creates a frequency table of those words and plots a wordcloud

Usage

    title_cloud(titles, addstopwords = gcite_stopwords(), ...)

    paper_cloud(...)

    title_word_frequency(titles, addstopwords = NULL)

Arguments

    titles           Vector of titles of papers
    addstopwords     Additional words to remove from wordcloud
    ...              additional options passed to gcite_wordcloud_spec

Value

    A data.frame of the words and the frequencies of the title words

Examples

    ## Not run:
    L = gcite_author_info("John Muschelli")
    paper_df = L$paper_df
    titles = paper_df$title
    title_cloud(titles)
    ## End(Not run)
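The frequency table that title_cloud builds before plotting can be approximated in base R. The sketch below is an assumption-laden stand-in, not the package's title_word_frequency: the helper name word_frequency is hypothetical, tokenisation is a plain split on non-letters, and the package imports tm, so its actual text cleaning may well differ.

```r
# Hypothetical helper (not gcite's code): word frequencies from paper titles.
word_frequency <- function(titles, stopwords = character(0)) {
  # Lower-case, then split on anything that is not a letter
  words <- unlist(strsplit(tolower(titles), "[^a-z]+"))
  # Drop empty tokens and stopwords
  words <- words[nzchar(words) & !(words %in% stopwords)]
  tab <- sort(table(words), decreasing = TRUE)
  data.frame(word = names(tab), freq = as.integer(tab),
             stringsAsFactors = FALSE)
}

# Made-up titles for demonstration
titles <- c("Imaging of brain lesions",
            "Brain segmentation in CT imaging",
            "Segmentation of brain tumors")
freq <- word_frequency(titles, stopwords = c("of", "in"))
print(freq)
```

The resulting word and freq columns have the shape that the words and freq arguments of gcite_wordcloud_spec expect.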

Index

    author_cloud
    author_frequency (author_cloud)
    gcite
    gcite_author_info
    gcite_base_url (gcite_url)
    gcite_citation_index
    gcite_citation_page
    gcite_cite_over_time
    gcite_graph
    gcite_main_graph
    gcite_paper_df
    gcite_papers
    gcite_stopwords
    gcite_url
    gcite_user_info
    gcite_user_url (gcite_url)
    gcite_username
    gcite_wordcloud
    gcite_wordcloud_spec
    GET
    is_travis
    paper_cloud (title_cloud)
    readLines
    set_cookies
    set_cookies_txt
    strsplit
    title_cloud
    title_word_frequency (title_cloud)
    wordcloud