Package HGNChelper August 29, 2017 Maintainer Levi Waldron <lwaldron.research@gmail.com> Depends R (>= 2.10), methods, utils Author Version 0.3.6 Date 2017-08-29 License GPL (>=2.0) Title Handy Functions for Working with HGNC Gene Symbols and Affymetri Probeset Identifiers Contains functions for identifying and correcting HGNC gene symbols which have been converted to date format by Ecel, for reversibly converting between HGNC symbols and valid R names, identifying invalid HGNC symbols and correcting synonyms and outdated symbols which can be mapped to an official symbol. URL https://bitbucket.org/lwaldron/hgnchelper BugReports https://bitbucket.org/lwaldron/hgnchelper/issues R topics documented: HGNChelper-package.................................... 2 affytor........................................... 2 checkgenesymbols..................................... 3 findecelgenesymbols................................... 4 hgnc.table.......................................... 5 rtoaffy........................................... 5 rtosymbol......................................... 6 symboltor......................................... 7 Inde 8 1
2 affytor HGNChelper-package Handy functions for working with HGNC gene symbols and Affymetri probeset identifiers. Details Contains functions for identifying and correcting HGNC gene symbols which have been converted to date format by Ecel, for reversibly converting between HGNC symbols and valid R names, identifying invalid HGNC symbols and correcting synonyms and outdated symbols which can be mapped to an official symbol. Package: HGNChelper Maintainer: Levi Waldron <lwaldron@hsph.harvard.edu> Depends: R (>= 2.10) Author: Version: 0.3.6 Date: 2017-08-29 License: GPL (>2.0) Title: Handy functions for working with HGNC gene symbols and Affymetri probeset identifiers. URL: https://bitbucket.org/lwaldron/hgnchelper BugReports: https://bitbucket.org/lwaldron/hgnchelper affytor function to convert Affymetri probeset identifiers to valid R names This function simply prepends "affy." to the probeset IDs to create valid R names. Reverse operation is done by the rtoaffy function. affytor() vector of Affymetri probeset identifiers, or any identifier which may with a digit.
checkgenesymbols 3 a character vector that is simply with "affy." prepended to each value. checkgenesymbols function to identify outdated or Ecel-mogrified gene symbols This function identifies gene symbols which are outdated or may have been mogrified by Ecel or other spreadsheet programs. If output is assigned to a variable, it returns a data.frame of the same number of rows as the input, with a second column indicating whether the symbols are valid and a third column with a corrected gene list. checkgenesymbols(, unmapped.as.na=true, hgnc.table=null) Vector of gene symbols to check for mogrified or outdated values unmapped.as.na If TRUE, unmapped symbols will appear as NA in the Suggested.Symbol column. If FALSE, the original unmapped symbol will be kept when no correction can be found. hgnc.table If hgnc.table is a data.frame with colnames(hgnc.table) identical to c("symbol", "Approved.Symbol"), it is used to correct gene symbols in. Otherwise, the default table data("hgnc.table", package="hgnchelper") is used. The function used for creating this the hgnc.table dataframe can be found in the inst/hgnclookup.r file, in the source of this package. By default the hgnc.table dataframe shipped with this package is used (see?hgnc.table) The function will return a data.frame of the same number of rows as the input, with corrections possible from hgnc.table. See Also hgnc.table
4 findecelgenesymbols Eamples library(hgnchelper) = c("fn1", "TP53", "UNKNOWNGENE","7-Sep", "9/7", "1-Mar", "Oct4", "4-Oct", "OCT4-PG4", "C19ORF71", "C19orf71") res <- checkgenesymbols() res if (interactive()){ ##Run checkgenesymbols with a brand-new map downloaded from HGNC: source(system.file("hgnclookup.r", package = "HGNChelper")) ##You should save this if you are going to use it multiple times, ##then load it from file rather than burdening HGNC's servers. ## save(hgnc.table, file="hgnc.table.rda", compress="bzip2") ## load("hgnc.table.rda") res <- checkgenesymbols(, hgnc.table=hgnc.table) } findecelgenesymbols function to identify Ecel-mogrified gene symbols This function identifies gene symbols which may have been mogrified by Ecel or other spreadsheet programs. If output is assigned to a variable, it returns a vector of the same length where symbols which could be mapped have been mapped. Note that this function is superceded by checkgenesymbols, which corrects Ecel-mogrified gene symbols as well as aliases and outdated symbols. findecelgenesymbols(, mog.map = read.csv(system.file("etdata/mog_map.csv", package = "HGNChelper"), as.is = TRUE), rege = "[0-9]\\-(JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC) [0-9]\\.[0-9] [0-9]E\\+[[0-9][0-9]") mog.map rege Vector of gene symbols to check for mogrified values Map of known mogrifications. A default map is available with this package by data(mog.map), but any map may be used. This should be a dataframe with two columns: original and mogrified, containing the correct and incorrect symbols, respectively. Regular epression, recognized by the base::grep function which is called with ignore.case=true, to identify mogrified symbols. It is not necessary for all matches to have a corresponding entry in mog.map$mogrified; values in which are matched by this rege but are not found in mog.map$mogrified simply will not be corrected. This rege is based that provided by Zeeberg et al., Mistaken Identifiers: Gene name errors can be introduced inadvertently when using Ecel in bioinformatics. BMC Bioinformatics 2004, 5:80.
hgnc.table 5 if the return value of the function is assigned to a variable, the function will return a vector of the same length as the input, with corrections possible from mog.map made. See Also checkgenesymbols hgnc.table All current and withdrawn HGNC symbols. All current and withdrawn symbols from genenames.org. Format A dataframe with the first column providing a gene symbol or known alias (including withdrawn symbols), second column providing the approved HGNC symbol. Source Etracted from genenames.org prior to package build. Eamples data("hgnc.table", package="hgnchelper", envir=environment()) head(hgnc.table) rtoaffy function to convert the output of affytor back to the original Affymetri probeset identifiers. This function simply strips the "affy." added by the affytor function. rtoaffy() the character vector returned by the affytor function.
6 rtosymbol a character vector of Affymetri probeset identifiers. rtosymbol function to reverse the conversion made by symboltor This function reverses the actions of the symboltor function. rtosymbol() the character vector returned by the symboltor function. a character vector of HGNC gene symbols, which are not in general valid R names. Eamples library(hgnchelper) data("hgnc.table", envir=environment()) hgnc.symbols <- as.character(na.omit(unique(hgnc.table[,2]))) if(!identical(all.equal(hgnc.symbols, rtosymbol(make.names(symboltor(hgnc.symbols)))), TRUE)) stop("hgnc mapping was not reversible.")
symboltor 7 symboltor function to *reversibly* convert HGNC gene symbols to valid R names. This function reversibly converts HGNC gene symbols to valid R names by prepending "symbol.", and making the following substitutions: "-" to "hyphen", "@" to "ampersand", and "/" to "forwardslash". symboltor() vector of HGNC symbols a vector of valid R names, of the same length as, which can be converted to the same HGNC symbols using the rtosymbol function. Eamples library(hgnchelper) data("hgnc.table", envir=environment()) hgnc.symbols <- as.character(na.omit(unique(hgnc.table[,2]))) if(!identical(all.equal(hgnc.symbols, rtosymbol(make.names(symboltor(hgnc.symbols)))), TRUE)) stop("hgnc mapping was not reversible.")
Inde Topic datasets hgnc.table, 5 Topic package HGNChelper-package, 2 affytor, 2, 5 checkgenesymbols, 3 findecelgenesymbols, 4 hgnc.table, 5 HGNChelper-package, 2 rtoaffy, 2, 5 rtosymbol, 6 symboltor, 7 8