Package corenlp. June 3, 2015

Similar documents
Package datasets.load

Package phrasemachine

Package scraep. July 3, Index 6

Package rpst. June 6, 2017

Final Project Discussion. Adam Meyers Montclair State University

Package QCAtools. January 3, 2017

Package cleannlp. January 22, Type Package

Package kirby21.base

Package scraep. November 15, Index 6

Package Rdrools. July 22, 2017

Package MicroStrategyR

Package jstree. October 24, 2017

Package gtrendsr. October 19, 2017

NLP Chain. Giuseppe Castellucci Web Mining & Retrieval a.a. 2013/2014

Package tm.plugin.lexisnexis

Machine Learning in GATE

Package gtrendsr. August 4, 2018

Package syuzhet. December 14, 2017

Package available. November 17, 2017

Package sankey. R topics documented: October 22, 2017

Package cattonum. R topics documented: May 2, Type Package Version Title Encode Categorical Features

Package robotstxt. November 12, 2017

Package RYoudaoTranslate

Package gcite. R topics documented: February 2, Type Package Title Google Citation Parser Version Date Author John Muschelli

Package sessioninfo. June 21, 2017

Package clipr. June 23, 2018

Package fastqcr. April 11, 2017

Package gridgraphics

Package CorporaCoCo. R topics documented: November 23, 2017

Package textrank. December 18, 2017

Package qualmap. R topics documented: September 12, Type Package

Let s get parsing! Each component processes the Doc object, then passes it on. doc.is_parsed attribute checks whether a Doc object has been parsed

Package repec. August 31, 2018

Package DataClean. August 29, 2016

Package WriteXLS. March 14, 2013

Package shiny.router

Package WordR. September 7, 2017

TectoMT: Modular NLP Framework

UIMA-based Annotation Type System for a Text Mining Architecture

Package KoNLP. December 15, 2016

Package gifti. February 1, 2018

A tool for Cross-Language Pair Annotations: CLPA

Package mdftracks. February 6, 2017

Package causalmgm. September 14, 2017

Package sylcount. April 7, 2017

Package geojsonsf. R topics documented: January 11, Type Package Title GeoJSON to Simple Feature Converter Version 1.3.

Transition-Based Dependency Parsing with Stack Long Short-Term Memory

Package mwa. R topics documented: February 24, Type Package

Package queuecomputer

Package IATScore. January 10, 2018

Package spelling. December 18, 2017

Package fastdummies. January 8, 2018

Package autoshiny. June 25, 2018

Package humanize. R topics documented: April 4, Version Title Create Values for Human Consumption

Package redux. May 31, 2018

Package ctv. October 8, 2017

Text Corpus Format (Version 0.4) Helmut Schmid SfS Tübingen / IMS StuDgart

Package calpassapi. August 25, 2018

Package gsalib. R topics documented: February 20, Type Package. Title Utility Functions For GATK. Version 2.1.

An UIMA based Tool Suite for Semantic Text Processing

Package APSIM. July 24, 2017

Package tm.plugin.factiva

Package tm.plugin.factiva

Package styler. December 11, Title Non-Invasive Pretty Printing of R Code Version 1.0.0

Package CatEncoders. March 8, 2017

Package tm.plugin.factiva

Package io. January 15, 2018

Package weco. May 4, 2018

Package spark. July 21, 2017

Package jdx. R topics documented: January 9, Type Package Title 'Java' Data Exchange for 'R' and 'rjava'

Package opencage. January 16, 2018

Package SFtools. June 28, 2017

Package CuCubes. December 9, 2016

Package shinyhelper. June 21, 2018

Natural Language Processing Pipelines to Annotate BioC Collections with an Application to the NCBI Disease Corpus

Package imgur. R topics documented: December 20, Type Package. Title Share plots using the imgur.com image hosting service. Version 0.1.

Package GetoptLong. June 10, 2018

ANC2Go: A Web Application for Customized Corpus Creation

Package bisect. April 16, 2018

Package lazydata. December 4, 2016

Package manet. September 19, 2017

Package corclass. R topics documented: January 20, 2016

Information Retrieval

Meaning Banking and Beyond

Package politeness. R topics documented: February 15, Type Package Title Detecting Politeness Features in Text Version 0.2.2

Package gmt. February 20, 2015

Morpho-syntactic Analysis with the Stanford CoreNLP

Package darksky. September 20, 2017

Package ssh. June 4, 2018

Package stringb. November 1, 2016

Package rtandem. June 30, 2018

Package poio. R topics documented: January 29, 2017

Package Rserve. R topics documented: February 19, Version Title Binary R server

Package fst. June 7, 2018

Package OsteoBioR. November 15, 2018

Package arulesnbminer

Package OLScurve. August 29, 2016

Package hypercube. December 15, 2017

Tokenization and Sentence Segmentation. Yan Shao Department of Linguistics and Philology, Uppsala University 29 March 2017

Package kdetrees. February 20, 2015

Transcription:

Type Package Title Wrappers Around Stanford CoreNLP Tools Version 0.4-1 Author Taylor Arnold, Lauren Tilton Package corenlp June 3, 2015 Maintainer Taylor Arnold <taylor.arnold@acm.org> Provides a minimal interface for applying annotators from the 'Stanford CoreNLP' java library. Methods are provided for tasks such as tokenisation, part of speech tagging, lemmatisation, named entity recognition, coreference detection and sentiment analysis. Imports rjava, XML, plotrix SystemRequirements Java (>= 7.0); Stanford CoreNLP <http://nlp.stanford.edu/software/corenlp.shtml> (>= 3.5.2) License GPL-2 LazyData true NeedsCompilation no Repository CRAN Date/Publication 2015-06-03 18:49:11 R topics documented: annoetranger.rda...................................... 2 annohp.rda......................................... 2 annotatefile......................................... 3 annotatestring........................................ 3 downloadcorenlp..................................... 4 getcoreference....................................... 5 getdependency....................................... 5 getparse........................................... 6 getsentiment........................................ 6 gettoken........................................... 7 initcorenlp......................................... 7 1

2 annohp.rda loadxmlannotation.................................... 8 parseannoxml....................................... 8 plot.annotation....................................... 9 print.annotation....................................... 9 universaltagset....................................... 10 Index 11 annoetranger.rda Annotation of first two lines of Albert Camus L Etranger Parsed via the Stanford CoreNLP Java Library annoetranger Format a annotation object Author(s) Taylor Arnold, 2015-06-03 annohp.rda Annotation of first line of JK Rowling s The Philosopher s Stone Parsed via the Stanford CoreNLP Java Library annohp Format a annotation object Author(s) Taylor Arnold, 2015-06-03

annotatefile 3 annotatefile Annotate a text file Runs the CoreNLP annotators for the text contained in a given file. The details for which annotators to run and how to run them are specified in the properties file loaded in via the initcorenlp function (which must be run prior to any annotation). annotatefile(file, format = c("obj", "xml", "text"), outputfile = NA, includexsl = FALSE) file format outputfile includexsl a string giving the location of the file to be loaded. the desired output format. Option obj, the default, returns an R object of class annotation and will likely be the desired choice for users loading the output into R. The xml and text exist primarily for saving the files on the disk. character string indicating where to put the output. If set to NA, the output will be returned by the function. boolean. Whether the xml style sheet should be included in the output. Only used if format is xml and outputfile is not NA. annotatestring Annotate a string of text Runs the CoreNLP annotators over a given string of text. The details for which annotators to run and how to run them are specified in the properties file loaded in via the initcorenlp function (which must be run prior to any annotation). annotatestring(text, format = c("obj", "xml", "text"), outputfile = NA, includexsl = FALSE)

4 downloadcorenlp text format outputfile includexsl a vector of strings for which an annotation is desired. Will be collapsed to length 1 using new line characters prior to the annotation. the desired output format. Option obj, the default, returns an R object of class annotation and will likely be the desired choice for most users. The xml and text exist primarily for subsequently saving to disk. character string indicating where to put the output. If set to NA, the output will be returned by the function. boolean. Whether the xml style sheet should be included in the output. Only used if format is xml and outputfile is not NA. ## Not run: initcorenlp() sin <- "Mother died today. Or, maybe, yesterday; I can t be sure." annoobj <- annotatestring(sin) ## End(Not run) downloadcorenlp Download java files needed for corenlp The corenlp package does not supply the raw java files provided by the Stanford NLP Group as they are quite large. This function downloads the libraries for you, by default into the directory where the package was installed. downloadcorenlp(outputloc, type = c("base", "chinese", "spanish", "sr")) outputloc type a string showing where the files are to be downloaded. If missing, will try to download files into the directory where the package was original installed. type of files to download. The base backage, installed by default is required. Other jars include chinese, spanish, and "shift reduce parser models". These will be installed in addition to the base package. ## Not run: downloadcorenlp() downloadcorenlp(type="spanish") ## End(Not run)

getcoreference 5 getcoreference Get Coreference Returns a dataframe containing all coreferences detected in the text. getcoreference(annotation) annotation getcoreference(annohp) getdependency Get Dependencies Returns a data frame of the coreferences of an annotation getdependency(annotation, type = c("ccprocessed", "basic", "collapsed")) annotation type the class of coreference desired getdependency(annoetranger) getdependency(annohp)

6 getsentiment getparse Get parse tree as character vector Returns a character vector of the parse trees. Mostly use for visualization; the output of gettoken will generally be more conveniant for manipulating in R. getparse(annotation) annotation getparse(annoetranger) getsentiment Get Sentiment scores Returns a data frame of the sentiment scores from an annotation getsentiment(annotation) annotation getsentiment(annoetranger) getsentiment(annohp)

gettoken 7 gettoken Get tokens as data frame Returns a data frame of the tokens from. gettoken(annotation) annotation gettoken(annoetranger) initcorenlp Initialize the CoreNLP java object This must be run prior to calling any other CoreNLP functions. It may be called multiple times in order to specify a different parameter set, but note that if you use a different configuration during the same R session it must have a unique name. initcorenlp(libloc, parameterfile, mem = "4g", annotators) libloc parameterfile mem a string giving the location of the CoreNLP java files. This should point to a directory which contains, for example the file "stanford-corenlp-*.jar", where "*" is the version number. If missing, the function will try to find the library in the environment variable CORENLP_HOME, and otherwise will fail. the path to a parameter file. See the CoreNLP documentation for an extensive list of options. If missing, the package will simply specify a list of standard annotators and otherwise only use default values. a string giving the amount of memory to be assigned to the rjava engine. For example, "6g" assigned 6 gigabytes of memory. At least 2 gigabytes are recommended at a minimum for running the CoreNLP package. On a 32bit machine, where this is not possible, setting "1800m" may also work. This option will only have an effect the first time initcorenlp is called, and also will not have an effect if the java engine is already started by a seperate process.

8 parseannoxml annotators optional character string. When parameterfile is missing, this is taken to be the list of annotators that you want to run. Either a length one character vector with annotator names seperated by commas, or a character vector with one annotator per element. ## Not run: initcorenlp() sin <- "Mother died today. Or, maybe, yesterday; I can t be sure." annoobj <- annotatestring(sin) ## End(Not run) loadxmlannotation Load CoreNLP XML file Loads a properly formated XML file output by the CoreNLP library into in R. loadxmlannotation(file, encoding = "unknown") file encoding connection or character string giving the file name to load encoding to be assumed for input strings. It is used to mark character strings as known to be in Latin-1 or UTF-8: it is not used to re-encode the input. Passed to readlines. parseannoxml Parse annotation xml Returns from a character vector containing the xml. Not exported; use loadxmlannotation instead. parseannoxml(xml) xml character vector containing the xml file from an annotation

plot.annotation 9 plot.annotation Print Dependencies and POS of Annotation Print Dependencies and POS of Annotation ## S3 method for class annotation plot(x, y = 1L,...) x y sentence id... other arguments passed to plot print.annotation Print a summary of Print a summary of ## S3 method for class annotation print(x,...) x... other arguments. Currently unused. print(annoetranger)

10 universaltagset universaltagset Convert Penn TreeBank POS to Universal Tagset Maps a character string of English Penn TreeBank part of speech tags into the universal tagset codes. This provides a reduced set of tags (12), and a better cross-linguist model of speech. universaltagset(pennpos) pennpos a character vector of penn tags to match tok <- gettoken(annoetranger) cbind(tok$pos,universaltagset(tok$pos))

Index Topic datasets annoetranger.rda, 2 annohp.rda, 2 annoetranger (annoetranger.rda), 2 annoetranger.rda, 2 annohp (annohp.rda), 2 annohp.rda, 2 annotatefile, 3 annotatestring, 3 downloadcorenlp, 4 getcoreference, 5 getdependency, 5 getparse, 6 getsentiment, 6 gettoken, 6, 7 initcorenlp, 7 loadxmlannotation, 8 parseannoxml, 8 plot.annotation, 9 print.annotation, 9 universaltagset, 10 11