Sarah Cohen-Boulakia. Laboratoire de Recherche en Informatique CNRS UMR 8623 Université Paris-Sud, Université Paris-Saclay
|
|
- Della Tyler
- 6 years ago
- Views:
Transcription
1 Sarah Cohen-Boulakia Laboratoire de Recherche en Informatique CNRS UMR 8623 Université Paris-Sud, Université Paris-Saclay
2 Understanding Life Sciences Progress in multiple domains: biology, chemistry, maths, computer sciences Emergence of new technologies: Next generation sequencing, Increasing volumes of raw data All stored in Web data sources Raw data are not sufficient Data Annotated by experts Bioinformatics analysis of data New data sources Concrete example: Querying NCBI Entrez («Gquery NCBI» on google ) Sarah Cohen-Boulakia, Université Paris-Sud 2
3 National Center for Biotechnology Information (NCBI) provides access to biomedical and genomic information through its Web portal 33 databases queries Breast cancer genes Sarah Cohen-Boulakia, Université Paris-Sud 3
4 Content of a biological database = a set of pages (a book!) Each database is focused on one biological entity (here, Gene) Each page describes one instance (ie, one gene) Sarah Cohen-Boulakia, Université Paris-Sud 4
5 Various quality features Last update (freshness) Status of the sequence (reliability) «volume of information» (completeness) + Reputation of the data source, number of incoming links (paths) to reach the data Sarah Cohen-Boulakia, Université Paris-Sud 5
6 to determine the importance of results (obtained by several ways) and get complete sets of answers Mammilian carcinoma Mamamilian carcinoma 771 genes not all included in the previous result! 771 Sarah Cohen-Boulakia, Université Paris-Sud 6
7 NCBI ranks answers for each keyword using the relevance Number of occurrences of the keyword (breast cancer) in the files Other approaches have been tested Combining various criteria (weighted functions) Freshness first then reputation of the source then Following PageRank or ObjectRank-like solutions (Biozon, Biorank ) How to deal with several sets of answers? How to aggregate several rankings? Answers obtained by several keywords should have higher priority Wanted: Ranking solution exploiting many criteria Problem : How to combine all such criteria? Alternative: Consensus rankings? Sarah Cohen-Boulakia, Université Paris-Sud 7
8 Need for consensus rankings to order biological data Definition of the problem Experimental Study of existing approaches ConQuR-Bio Conclusion Sarah Cohen-Boulakia, Université Paris-Sud 8
9 Input: a set of rankings (A to D are four genes) Ranking 1 = [A, D, C, B] Ranking 2 = [B, A, D, C] Ranking 3 = [D, A, B, C] Output: one consensus ranking Consensus = [A, D, B, C] A consensus ranking makes the most of several rankings, each obtained by given ranking methods Put emphasis on their common points Does not put too much importance on data classified good by only one or a few ranking methods The optimal consensus is one ranking which is the closest to the input rankings Distance Sarah Cohen-Boulakia, Université Paris-Sud 9
10 The Kendall τ D(π,σ) distance counts the number of pairs of elements inversed (ie in the opposite order) between two rankings Sarah Cohen-Boulakia, Université Paris-Sud 10
11 The Kendall τ D(π,σ) distance counts the number of pairs of elements inversed (ie in the opposite order) between two rankings Kemeny Score Optimal Consensus (median) Sarah Cohen-Boulakia, Université Paris-Sud 11
12 The Kendall τ D(π,σ) distance counts the number of pairs of elements inversed (ie in the opposite order) between two rankings Kemeny Score Optimal Consensus (median) Complexity [Dwork et al 2001, Biedl et al. 2009] NP-Difficult for an odd number of permutations 4 Approximation and heuristics Sarah Cohen-Boulakia, Université Paris-Sud 12
13 Real data sets Have equalities: several items ex-aequo May be incomplete not all on the same sets of elements One unifying bucket at the end of each ranking with the missing elements Unification process Generalized Kendall τ G(r,s) counts the number of pairs of elements inversed between two rankings r et s tied in only one of the two rankings R1=[A, {B,C},D,{E}] R2=[{A,D},E,{B,C}] R3=[{C,E},A,B,{D}] Elements in the unifying bucket are equally unimportant Complexity: Finding an optimal consensus with ties is at least as difficult as in the permutation case [VLDB15] Optimal consensus implemented [VLDB15] Provides exact solutions until n=40 elements Approximation and heuristics Sarah Cohen-Boulakia, Université Paris-Sud 13
14 BioConsert [Cohen-Boulakia et al. 2011] FaginDyn [Fagin et al. 2004] Generalized Kendall-τ distance Ailon3/2 [Fagin et al. 2004] RepeatChoice [Ailon et al. 2010] Pick-A-Perm [Ailon et al. 2008] Kendall-τ distance PNE (exact) [Conitzer, et al. 2006] permutations KwikSort [Ailon et al. 2008] B&B [Ali et al. 2012] Chanas [Chanas et al. 1996] MEDRank [Fagin et al. 2003] BordaCount [Borda 1781] CopelandMethod [Copeland et al. 1951] Positionnal approaches MC4 [Dwork et al. 2001] ChanasBoth [Coleman et al. 2009] Sarah Cohen-Boulakia, Université Paris-Sud 14
15 Compatibility with ties Ability to consider the cost of (un)tying [VLDB15] Sarah Cohen-Boulakia, Université Paris-Sud 15
16 Optimal solution can be found until n~40 Comparison of 12 algorithms Re-implemented and adapted to ties Evaluated on Quality: gap (optimal consensus c*) Time Real datasets Extracted from previous publications 7 datasets: EachMovie, F1, GiantSlalom, SkiCross/Jumping, WebCommunities, WebSearch, BioMedical [VLDB15] Synthetic datasets 1. Uniformly generated with MuPad Combinat 2. With increasing levels of similarity 3. Similarity and unification process Precise understanding of the impact of the similarity and unification process Sarah Cohen-Boulakia, Université Paris-Sud 16
17 BioConsert can be used in a very large majority of the cases For very large data sets (> elements) KwikSort can be preferred If there is a need to seed up then In case of few equalities use BordaCount Otherwise use MEDRank Alternativeley: use both algorithms and pick the best High quality low quality Time (gap) [VLDB15] Sarah Cohen-Boulakia, Université Paris-Sud 17
18 Need for consensus rankings to order biological data Definition of the problem Experimental Study of existing approaches ConQuR-Bio Conclusion Sarah Cohen-Boulakia, Université Paris-Sud 18
19 Synonyms: Breast cancer vs mammalian carcinoma ( vs 771 genes, not all included) Abbreviations: Attention deficit hyperactivity disorders vs ADHD (109 vs 144 genes, 74 common) Linguistics variations: tumour vs tumor (& breast cancer) : 681 vs 291 genes More precise reformulations: colorectal cancer vs Lynch syndrom (+6 new genes) [DILS14] Which are the genes associated to a given disease? Finding all synonyms is time-consuming Querying using all synonyms provide huge amounts of data sets which have to be ranked. Sarah Cohen-Boulakia, Université Paris-Sud 19
20 I) Reformulations (synonyms) can be automatically generated Input: the user s Keyword Automatic search of synonyms in major biomedical terminologies Output : set of Synonyms of the input keyword [DILS14] Sarah Cohen-Boulakia, Université Paris-Sud 20
21 I) Reformulations (synonyms) can be automatically generated Input: the user s Keyword Automatic search of synonyms in major biomedical terminologies Output : set of Synonyms of the input keyword II) DB querying is automated (based on keywords) Input : set of synonyms obtained in I) Querying gene databases with each synonym (#queries = # synonyms) Output : sets of rankings (ranked genes), one ranking per synonym [DILS14] Sarah Cohen-Boulakia, Université Paris-Sud 21
22 I) Reformulations (synonyms) can be automatically generated Input: the user s Keyword Automatic search of synonyms in major biomedical terminologies Output : set of Synonyms of the input keyword II) DB querying is automated (based on keywords) Input : set of synonyms obtained in I) Querying gene databases with each synonym (#queries = # synonyms) Output : sets of rankings (ranked genes), one ranking per synonym III) Aggregating using a series of consensus algorithms with a variant of the Generalized Kendall τ distance [DILS14] Sarah Cohen-Boulakia, Université Paris-Sud 22
23 Highly accessed by members of APHP (Hospitals), Institut Curie, Institut Pasteur Sarah Cohen-Boulakia, Université Paris-Sud 23
24 Ranking Biological data is a complex task Many quality criteria, difficult to combine Consensus ranking approaches Classification of consensus ranking algorithms Rankings with ties, incomplete data sets Study of distances and normalization techniques Complexity results, new distances considered, new heuristics Guide in the choice of distances and algorithms Rank-n-ties platform available Concrete application to real biological data Consensus of query reformulations Currently under evaluation based on real datasets (Leukemia) Increasing number of reformulations + increasing number of answers ranking big data sets Sarah Cohen-Boulakia, Université Paris-Sud 24
25 Bryan Brancotte Mastodon Project QualibioConsensus: LRI (Paris-Sud), LIGM (Marne-lavallée), IFB (Institut Français de Bioinformatique) Other partners: Univ. Montréal, Institut Curie, APHP Georges Pompidou, APHP Paul Brousse Sarah Cohen-Boulakia, Université Paris-Sud 25
26 Rank aggregation with ties: Experiments and Analysis B Brancotte, B Yang, G Blin, S Cohen-Boulakia, A Denise, S Hamel. Proceedings of the VLDB Endowment 8 (11), Interrogation de bases de données biologiques publiques par reformulation de requêtes et classement des résultats avec ConQuR-Bio B Brancotte, B Rance, A Denise, S Cohen-Boulakia. JOBIM (Journées Ouvertes Biologie Informatique Mathématiques) 2015 Conqur-bio: Consensus ranking with query reformulation for biological data B Brancotte, B Rance, A Denise, S Cohen-Boulakia. International Conference on Data Integration in the Life Sciences (DILS), Using medians to generate consensus rankings for biological data S Cohen-Boulakia, A Denise, S Hamel. International Conference on Scientific and Statistical Database Management (SSDBM) Sarah Cohen-Boulakia, Université Paris-Sud 26
Sarah Cohen-Boulakia. Université Paris Sud, LRI CNRS UMR On leave at INRIA Virtual Plants & Zenith, Inst. of Comput. Biology, Montpellier
Sarah Cohen-Boulakia Université Paris Sud, LRI CNRS UMR 8623 On leave at INRIA Virtual Plants & Zenith, Inst. of Comput. Biology, Montpellier Repositories queried (IR-style) with workflow docum. Open question:
More informationHumboldt-University of Berlin
Humboldt-University of Berlin Exploiting Link Structure to Discover Meaningful Associations between Controlled Vocabulary Terms exposé of diploma thesis of Andrej Masula 13th October 2008 supervisor: Louiqa
More informationSarah Cohen-Boulakia. Université Paris Sud, LRI CNRS UMR
Sarah Cohen-Boulakia Université Paris Sud, LRI CNRS UMR 8623 cohen@lri.fr 01 69 15 32 16 https://www.lri.fr/~cohen/bigdata/biodata-ami2b.html Understanding Life Sciences Progress in multiple domains: biology,
More informationPath-based systems to guide scientists in the maze of biological data sources
University of Pennsylvania ScholarlyCommons Departmental Papers (CIS) Department of Computer & Information Science August 2006 Path-based systems to guide scientists in the maze of biological data sources
More informationLiterature Databases
Literature Databases Introduction to Bioinformatics Dortmund, 16.-20.07.2007 Lectures: Sven Rahmann Exercises: Udo Feldkamp, Michael Wurst 1 Overview 1. Databases 2. Publications in Science 3. PubMed and
More informationInformation Retrieval Rank aggregation. Luca Bondi
Rank aggregation Luca Bondi Motivations 2 Metasearch For a given query, combine the results from different search engines Combining ranking functions Text, links, anchor text, page title, etc. Comparing
More informationConsensus Answers for Queries over Probabilistic Databases. Jian Li and Amol Deshpande University of Maryland, College Park, USA
Consensus Answers for Queries over Probabilistic Databases Jian Li and Amol Deshpande University of Maryland, College Park, USA Probabilistic Databases Motivation: Increasing amounts of uncertain data
More informationSEEK User Manual. Introduction
SEEK User Manual Introduction SEEK is a computational gene co-expression search engine. It utilizes a vast human gene expression compendium to deliver fast, integrative, cross-platform co-expression analyses.
More informationExploring and Exploiting the Biological Maze. Presented By Vidyadhari Edupuganti Advisor Dr. Zoe Lacroix
Exploring and Exploiting the Biological Maze Presented By Vidyadhari Edupuganti Advisor Dr. Zoe Lacroix Motivation An abundance of biological data sources contain data about scientific entities, such as
More informationarxiv: v5 [cs.ds] 4 May 2014
An Analysis of Rank Aggregation Algorithms Gattaca Lv No Institute Given arxiv:1402.5259v5 [cs.ds] 4 May 2014 Abstract. Rank aggregation is an essential approach for aggregating the preferences of multiple
More informationUsing a Medical Thesaurus to Predict Query Difficulty
Using a Medical Thesaurus to Predict Query Difficulty Florian Boudin, Jian-Yun Nie, Martin Dawes To cite this version: Florian Boudin, Jian-Yun Nie, Martin Dawes. Using a Medical Thesaurus to Predict Query
More informatione-scider: A tool to retrieve, prioritize and analyze the articles from PubMed database Sujit R. Tangadpalliwar 1, Rakesh Nimbalkar 2, Prabha Garg* 3
e-scider: A tool to retrieve, prioritize and analyze the articles from PubMed database Sujit R. Tangadpalliwar 1, Rakesh Nimbalkar 2, Prabha Garg* 3 1 National Institute of Pharmaceutical Education and
More informationSarah Cohen-Boulakia. Université Paris Sud, LRI CNRS UMR On leave at INRIA Virtual Plants & Zenith, Inst. of Comput. Biology, Montpellier
Sarah Cohen-Boulakia Université Paris Sud, LRI CNRS UMR 8623 On leave at INRIA Virtual Plants & Zenith, Inst. of Comput. Biology, Montpellier Data Integration in the Life Science (DILS) is more important
More information2. Take a few minutes to look around the site. The goal is to familiarize yourself with a few key components of the NCBI.
2 Navigating the NCBI Instructions Aim: To become familiar with the resources available at the National Center for Bioinformatics (NCBI) and the search engine Entrez. Instructions: Write the answers to
More informationRetrieving factual data and documents using IMGT-ML in the IMGT information system
Retrieving factual data and documents using IMGT-ML in the IMGT information system Authors : Chaume D. *, Combres K. *, Giudicelli V. *, Lefranc M.-P. * * Laboratoire d'immunogénétique Moléculaire, LIGM,
More informationReduce and Aggregate: Similarity Ranking in Multi-Categorical Bipartite Graphs
Reduce and Aggregate: Similarity Ranking in Multi-Categorical Bipartite Graphs Alessandro Epasto J. Feldman*, S. Lattanzi*, S. Leonardi, V. Mirrokni*. *Google Research Sapienza U. Rome Motivation Recommendation
More informationMachine Learning. Computational biology: Sequence alignment and profile HMMs
10-601 Machine Learning Computational biology: Sequence alignment and profile HMMs Central dogma DNA CCTGAGCCAACTATTGATGAA transcription mrna CCUGAGCCAACUAUUGAUGAA translation Protein PEPTIDE 2 Growth
More informationReview of feature selection techniques in bioinformatics by Yvan Saeys, Iñaki Inza and Pedro Larrañaga.
Americo Pereira, Jan Otto Review of feature selection techniques in bioinformatics by Yvan Saeys, Iñaki Inza and Pedro Larrañaga. ABSTRACT In this paper we want to explain what feature selection is and
More information/ Computational Genomics. Normalization
10-810 /02-710 Computational Genomics Normalization Genes and Gene Expression Technology Display of Expression Information Yeast cell cycle expression Experiments (over time) baseline expression program
More informationSELF-SERVICE SEMANTIC DATA FEDERATION
SELF-SERVICE SEMANTIC DATA FEDERATION WE LL MAKE YOU A DATA SCIENTIST Contact: IPSNP Computing Inc. Chris Baker, CEO Chris.Baker@ipsnp.com (506) 721 8241 BIG VISION: SELF-SERVICE DATA FEDERATION Biomedical
More informationA Semantic Web-Based Approach for Harvesting Multilingual Textual. definitions from Wikipedia to support ICD-11 revision
A Semantic Web-Based Approach for Harvesting Multilingual Textual Definitions from Wikipedia to Support ICD-11 Revision Guoqian Jiang 1,* Harold R. Solbrig 1 and Christopher G. Chute 1 1 Department of
More informationA System for Ontology-Based Annotation of Biomedical Data
A System for Ontology-Based Annotation of Biomedical Data Clement Jonquet, Mark A. Musen, and Nigam Shah Stanford Center for Biomedical Informatics Research Stanford University School of Medicine Medical
More informationINTRODUCTION TO BIOINFORMATICS
Molecular Biology-2017 1 INTRODUCTION TO BIOINFORMATICS In this section, we want to provide a simple introduction to using the web site of the National Center for Biotechnology Information NCBI) to obtain
More informationBiomedical literature mining for knowledge discovery
Biomedical literature mining for knowledge discovery REZARTA ISLAMAJ DOĞAN National Center for Biotechnology Information National Library of Medicine Outline Biomedical Literature Access Challenges in
More informationMeSH : A Thesaurus for PubMed
Scuola di dottorato di ricerca in Scienze Molecolari Resources and tools for bibliographic research MeSH : A Thesaurus for PubMed What is MeSH? Who uses MeSH? Why use MeSH? Searching by using the MeSH
More informationA Framework for BioCuration (part II)
A Framework for BioCuration (part II) Text Mining for the BioCuration Workflow Workshop, 3rd International Biocuration Conference Friday, April 17, 2009 (Berlin) Martin Krallinger Spanish National Cancer
More information9/29/13. Outline Data mining tasks. Clustering algorithms. Applications of clustering in biology
9/9/ I9 Introduction to Bioinformatics, Clustering algorithms Yuzhen Ye (yye@indiana.edu) School of Informatics & Computing, IUB Outline Data mining tasks Predictive tasks vs descriptive tasks Example
More informationInformation Retrieval, Information Extraction, and Text Mining Applications for Biology. Slides by Suleyman Cetintas & Luo Si
Information Retrieval, Information Extraction, and Text Mining Applications for Biology Slides by Suleyman Cetintas & Luo Si 1 Outline Introduction Overview of Literature Data Sources PubMed, HighWire
More informationData Mining in Bioinformatics: Study & Survey
Data Mining in Bioinformatics: Study & Survey Saliha V S St. Joseph s college Irinjalakuda Abstract--Large amounts of data are generated in medical research. A biological database consists of a collection
More informationOn Demand Phenotype Ranking through Subspace Clustering
On Demand Phenotype Ranking through Subspace Clustering Xiang Zhang, Wei Wang Department of Computer Science University of North Carolina at Chapel Hill Chapel Hill, NC 27599, USA {xiang, weiwang}@cs.unc.edu
More informationBioqueries: A Social Community Sharing Experiences while Querying Biological Linked Data (
Bioqueries: A Social Community Sharing Experiences while Querying Biological Linked Data (http://bioqueries.uma.es) María Jesús García-Godoy, Ismael Navas-Delgado, José Francisco Aldana Montes Computing
More informationEBSCO Discovery Service
EBSCO Discovery Service Presented By: Shaji John Discovery Innovations and FOLIO Africa, Southeast Asia, South Asia & South Korea 1 AGENDA What is Discovery Service Features & Functionalities Client Lists
More informationRank Aggregation for Similar Items
Rank Aggregation for Similar Items D. Sculley Department of Computer Science Tufts University, Medford, MA, USA dsculley@cs.tufts.edu Abstract The problem of combining the ranked preferences of many experts
More informationMeasuring inter-annotator agreement in GO annotations
Measuring inter-annotator agreement in GO annotations Camon EB, Barrell DG, Dimmer EC, Lee V, Magrane M, Maslen J, Binns ns D, Apweiler R. An evaluation of GO annotation retrieval for BioCreAtIvE and GOA.
More informationINTRODUCTION TO BIOINFORMATICS
Molecular Biology-2019 1 INTRODUCTION TO BIOINFORMATICS In this section, we want to provide a simple introduction to using the web site of the National Center for Biotechnology Information NCBI) to obtain
More informationAutomatically Constructing a Directory of Molecular Biology Databases
Automatically Constructing a Directory of Molecular Biology Databases Luciano Barbosa, Sumit Tandon, and Juliana Freire School of Computing, University of Utah Abstract. There has been an explosion in
More informationBandit-based Search for Constraint Programming
Bandit-based Search for Constraint Programming Manuel Loth 1,2,4, Michèle Sebag 2,4,1, Youssef Hamadi 3,1, Marc Schoenauer 4,2,1, Christian Schulte 5 1 Microsoft-INRIA joint centre 2 LRI, Univ. Paris-Sud
More informationEffective and Efficient Similarity Search in Scientific Workflow Repositories
Effective and Efficient Similarity Search in Scientific Workflow Repositories Johannes Starlinger, Sarah Cohen-Boulakia, Sanjeev Khanna, Susan Davidson, Ulf Leser To cite this version: Johannes Starlinger,
More informationSheffield University and the TREC 2004 Genomics Track: Query Expansion Using Synonymous Terms
Sheffield University and the TREC 2004 Genomics Track: Query Expansion Using Synonymous Terms Yikun Guo, Henk Harkema, Rob Gaizauskas University of Sheffield, UK {guo, harkema, gaizauskas}@dcs.shef.ac.uk
More informationBioNav: An Ontology-Based Framework to Discover Semantic Links in the Cloud of Linked Data
BioNav: An Ontology-Based Framework to Discover Semantic Links in the Cloud of Linked Data María-Esther Vidal 1, Louiqa Raschid 2, Natalia Márquez 1, Jean Carlo Rivera 1, and Edna Ruckhaus 1 1 Universidad
More informationText mining tools for semantically enriching the scientific literature
Text mining tools for semantically enriching the scientific literature Sophia Ananiadou Director National Centre for Text Mining School of Computer Science University of Manchester Need for enriching the
More informationRelational Retrieval Using a Combination of Path-Constrained Random Walks
Relational Retrieval Using a Combination of Path-Constrained Random Walks Ni Lao, William W. Cohen University 2010.9.22 Outline Relational Retrieval Problems Path-constrained random walks The need for
More informationPresent and Future of the IPOL Journal Machine Learning Applications
Present and Future of the IPOL Journal Machine Learning Applications Miguel Colom http://mcolom.info CMLA, ENS Paris-Saclay 1 What is Reproducible Research? It redefines the result of the research. It
More informationSimilarity Search for Scientific Workflows
Similarity Search for Scientific Worflows Johannes Starlinger, Bryan Brancotte, Sarah Cohen-Boulaia, Ulf Leser To cite this version: Johannes Starlinger, Bryan Brancotte, Sarah Cohen-Boulaia, Ulf Leser.
More informationDrug versus Disease (DrugVsDisease) package
1 Introduction Drug versus Disease (DrugVsDisease) package The Drug versus Disease (DrugVsDisease) package provides a pipeline for the comparison of drug and disease gene expression profiles where negatively
More informationSVM Classification in -Arrays
SVM Classification in -Arrays SVM classification and validation of cancer tissue samples using microarray expression data Furey et al, 2000 Special Topics in Bioinformatics, SS10 A. Regl, 7055213 What
More informationSupporting Bioinformatic Experiments with A Service Query Engine
Supporting Bioinformatic Experiments with A Service Query Engine Xuan Zhou Shiping Chen Athman Bouguettaya Kai Xu CSIRO ICT Centre, Australia {xuan.zhou,shiping.chen,athman.bouguettaya,kai.xu}@csiro.au
More informationAnalysis of Basic Data Reordering Techniques
Analysis of Basic Data Reordering Techniques Tan Apaydin 1, Ali Şaman Tosun 2, and Hakan Ferhatosmanoglu 1 1 The Ohio State University, Computer Science and Engineering apaydin,hakan@cse.ohio-state.edu
More information8/19/13. Computational problems. Introduction to Algorithm
I519, Introduction to Introduction to Algorithm Yuzhen Ye (yye@indiana.edu) School of Informatics and Computing, IUB Computational problems A computational problem specifies an input-output relationship
More informationDocument Retrieval using Predication Similarity
Document Retrieval using Predication Similarity Kalpa Gunaratna 1 Kno.e.sis Center, Wright State University, Dayton, OH 45435 USA kalpa@knoesis.org Abstract. Document retrieval has been an important research
More informationFast Searching in Wikipedia, P2P, and Biological Sequences. Ela Hunt, Marie Curie Fellow
Fast Searching in Wikipedia, P2P, and Biological Sequences Ela Hunt, Marie Curie Fellow Ela Hunt, Department of Computer Science, GlobIS, hunt@inf.ethz.ch Edinburgh, May 1 st 2008 My background Krakow,
More informationMin Wang. April, 2003
Development of a co-regulated gene expression analysis tool (CREAT) By Min Wang April, 2003 Project Documentation Description of CREAT CREAT (coordinated regulatory element analysis tool) are developed
More informationMeSH: A Thesaurus for PubMed
Resources and tools for bibliographic research MeSH: A Thesaurus for PubMed October 24, 2012 What is MeSH? Who uses MeSH? Why use MeSH? Searching by using the MeSH Database What is MeSH? Acronym for Medical
More informationUMLS-Query: A Perl Module for Querying the UMLS
UMLS-Query: A Perl Module for Querying the UMLS Nigam H. Shah, MBBS, PhD, Mark A. Musen, MD, PhD Center for Biomedical Informatics Research, Stanford University, Stanford, CA Abstract The Metathesaurus
More informationQuery Reformulation for Clinical Decision Support Search
Query Reformulation for Clinical Decision Support Search Luca Soldaini, Arman Cohan, Andrew Yates, Nazli Goharian, Ophir Frieder Information Retrieval Lab Computer Science Department Georgetown University
More informationCOMPARATIVE ANALYSIS OF POWER METHOD AND GAUSS-SEIDEL METHOD IN PAGERANK COMPUTATION
International Journal of Computer Engineering and Applications, Volume IX, Issue VIII, Sep. 15 www.ijcea.com ISSN 2321-3469 COMPARATIVE ANALYSIS OF POWER METHOD AND GAUSS-SEIDEL METHOD IN PAGERANK COMPUTATION
More informationASAP.V2 and ASAP.V3: Sequential optimization of an Algorithm Selector and a Scheduler
ASAP.V2 and ASAP.V3: Sequential optimization of an Algorithm Selector and a Scheduler François Gonard, Marc Schoenauer, Michele Sebag To cite this version: François Gonard, Marc Schoenauer, Michele Sebag.
More informationAcquiring Experience with Ontology and Vocabularies
Acquiring Experience with Ontology and Vocabularies Walt Melo Risa Mayan Jean Stanford The author's affiliation with The MITRE Corporation is provided for identification purposes only, and is not intended
More informationQQ normality plots Harvey Motulsky, GraphPad Software Inc. July 2013
QQ normality plots Harvey Motulsky, GraphPad Software Inc. July 213 Introduction Many statistical tests assume that data (or residuals) are sampled from a Gaussian distribution. Normality tests are often
More informationInformation Retrieval
Multimedia Computing: Algorithms, Systems, and Applications: Information Retrieval and Search Engine By Dr. Yu Cao Department of Computer Science The University of Massachusetts Lowell Lowell, MA 01854,
More informationLaHC at CLEF 2015 SBS Lab
LaHC at CLEF 2015 SBS Lab Nawal Ould-Amer, Mathias Géry To cite this version: Nawal Ould-Amer, Mathias Géry. LaHC at CLEF 2015 SBS Lab. Conference and Labs of the Evaluation Forum, Sep 2015, Toulouse,
More informationNCI Thesaurus, managing towards an ontology
NCI Thesaurus, managing towards an ontology CENDI/NKOS Workshop October 22, 2009 Gilberto Fragoso Outline Background on EVS The NCI Thesaurus BiomedGT Editing Plug-in for Protege Semantic Media Wiki supports
More informationSearching for Literature Using HDAS (Healthcare Databases Advanced Search)
Searching for Literature Using HDAS (Healthcare Databases Advanced Search) 1. What is HDAS?... page 2 2. How do I access HDAS?... page 3 3. Questions and concepts (PICO) page 4 4. Selecting a database.
More informationARES: Advanced Networking for Distributing Genomic Data. Gianluca Reali University of Perugia
ARES: Advanced Networking for Distributing Genomic Data Gianluca Reali University of Perugia VUB, Brussels, May 13, 2014 Outline Description of ARES ARES research and implementation purposes Technologies
More informationBioinformatics explained: BLAST. March 8, 2007
Bioinformatics Explained Bioinformatics explained: BLAST March 8, 2007 CLC bio Gustav Wieds Vej 10 8000 Aarhus C Denmark Telephone: +45 70 22 55 09 Fax: +45 70 22 55 19 www.clcbio.com info@clcbio.com Bioinformatics
More informationCHAPTER THREE INFORMATION RETRIEVAL SYSTEM
CHAPTER THREE INFORMATION RETRIEVAL SYSTEM 3.1 INTRODUCTION Search engine is one of the most effective and prominent method to find information online. It has become an essential part of life for almost
More informationPrognosis of Lung Cancer Using Data Mining Techniques
Prognosis of Lung Cancer Using Data Mining Techniques 1 C. Saranya, M.Phil, Research Scholar, Dr.M.G.R.Chockalingam Arts College, Arni 2 K. R. Dillirani, Associate Professor, Department of Computer Science,
More informationScientific workflows for computational reproducibility in the life sciences: Status, challenges and opportunities
Scientific workflows for computational reproducibility in the life sciences: Status, challenges and opportunities Sarah Cohen-Boulakia, Khalid Belhajjame, Olivier Collin, Jérôme Chopard, Christine Froidevaux,
More informationBioMinT: Biological Text Mining EU FP5 Quality of Life Project
BioMinT: Biological Text Mining EU FP5 Quality of Life Project Dr. Dipl.-Ing. Österreichisches Forschungsinstitut für Artificial Intelligence Motivation Economic and business pressures are forcing drug
More informationPredicting Disease-related Genes using Integrated Biomedical Networks
Predicting Disease-related Genes using Integrated Biomedical Networks Jiajie Peng (jiajiepeng@nwpu.edu.cn) HanshengXue(xhs1892@gmail.com) Jin Chen* (chen.jin@uky.edu) Yadong Wang* (ydwang@hit.edu.cn) 1
More informationCHAPTER 4 FEATURE SELECTION USING GENETIC ALGORITHM
CHAPTER 4 FEATURE SELECTION USING GENETIC ALGORITHM In this research work, Genetic Algorithm method is used for feature selection. The following section explains how Genetic Algorithm is used for feature
More information10/10/13. Traditional database system. Information Retrieval. Information Retrieval. Information retrieval system? Information Retrieval Issues
COS 597A: Principles of Database and Information Systems Information Retrieval Traditional database system Large integrated collection of data Uniform access/modifcation mechanisms Model of data organization
More informationCD 485 Computer Applications in Communication Disorders and Sciences MODULE 3
CD 485 Computer Applications in Communication Disorders and Sciences MODULE 3 SECTION VII IDENTIFYING THE APPROPRIATE DATABASES JOURNAL ARTICLES THROUGH PUBMED, MEDLINE AND COMMUNICATION DISORDERS MULTISEARCH
More informationFormalizing Mappings to Optimize Automated Schema Alignment: Application to Rare Diseases
e-health For Continuity of Care C. Lovis et al. (Eds.) 2014 European Federation for Medical Informatics and IOS Press. This article is published online with Open Access by IOS Press and distributed under
More informationYAM++ : A multi-strategy based approach for Ontology matching task
YAM++ : A multi-strategy based approach for Ontology matching task Duy Hoa Ngo, Zohra Bellahsene To cite this version: Duy Hoa Ngo, Zohra Bellahsene. YAM++ : A multi-strategy based approach for Ontology
More informationLaurent D. COHEN.
for biomedical image segmentation Laurent D. COHEN Directeur de Recherche CNRS CEREMADE, UMR CNRS 7534 Université Paris-9 Dauphine Place du Maréchal de Lattre de Tassigny 75016 Paris, France Cohen@ceremade.dauphine.fr
More informationBioinformatics Hubs on the Web
Bioinformatics Hubs on the Web Take a class The Galter Library teaches a related class called Bioinformatics Hubs on the Web. See our Classes schedule for the next available offering. If this class is
More informationHeuristic methods for pairwise alignment:
Bi03c_1 Unit 03c: Heuristic methods for pairwise alignment: k-tuple-methods k-tuple-methods for alignment of pairs of sequences Bi03c_2 dynamic programming is too slow for large databases Use heuristic
More informationLarge scale corporate Web Analysis for Business Intelligence
Industrial Clusters in England Large scale corporate Web Analysis for Business Intelligence Michele Barbera, Andrey Bratus, Nicola Sambin {barbera,bratus,sambin}@spaziodati.eu 29 April, 2016 25 Software
More informationWeb of Science Including BIOSIS Citation Index
Web of Science Including BIOSIS Citation Index UCL Library Services, Gower St., London WC1E 6BT 020 7679 7792 E-mail: library@ucl.ac.uk http://www.ucl.ac.uk/library/ 1. What is Web of Science? Web of Science
More informationUSING AN EXTENDED SUFFIX TREE TO SPEED-UP SEQUENCE ALIGNMENT
IADIS International Conference Applied Computing 2006 USING AN EXTENDED SUFFIX TREE TO SPEED-UP SEQUENCE ALIGNMENT Divya R. Singh Software Engineer Microsoft Corporation, Redmond, WA 98052, USA Abdullah
More informationFrom Smith-Waterman to BLAST
From Smith-Waterman to BLAST Jeremy Buhler July 23, 2015 Smith-Waterman is the fundamental tool that we use to decide how similar two sequences are. Isn t that all that BLAST does? In principle, it is
More informationBioethics Thesaurus Database Search Tips
Bioethics Thesaurus Database Search Tips Writers, Internet surfers, bloggers, indexers, journalists, health care professionals, librarians and students alike will find the Bioethics Thesaurus Database
More informationPOMap results for OAEI 2017
POMap results for OAEI 2017 Amir Laadhar 1, Faiza Ghozzi 2, Imen Megdiche 1, Franck Ravat 1, Olivier Teste 1, and Faiez Gargouri 2 1 Paul Sabatier University, IRIT (CNRS/UMR 5505) 118 Route de Narbonne
More informationIntroduction to Systems Biology II: Lab
Introduction to Systems Biology II: Lab Amin Emad NIH BD2K KnowEnG Center of Excellence in Big Data Computing Carl R. Woese Institute for Genomic Biology Department of Computer Science University of Illinois
More informationSimilarity Ranking in Large- Scale Bipartite Graphs
Similarity Ranking in Large- Scale Bipartite Graphs Alessandro Epasto Brown University - 20 th March 2014 1 Joint work with J. Feldman, S. Lattanzi, S. Leonardi, V. Mirrokni [WWW, 2014] 2 AdWords Ads Ads
More informationSelecting biomedical data sources according to user preferences
University of Pennsylvania ScholarlyCommons Departmental Papers (CIS) Department of Computer & Information Science July 2005 Selecting biomedical data sources according to user preferences Sarah Cohen-Boulakia
More informationKraken: ultrafast metagenomic sequence classification using exact alignments
Kraken: ultrafast metagenomic sequence classification using exact alignments Derrick E. Wood and Steven L. Salzberg Bioinformatics journal club October 8, 2014 Märt Roosaare Need for speed Metagenomic
More informationWeighted Heuristic Ensemble of Filters
Weighted Heuristic Ensemble of Filters Ghadah Aldehim School of Computing Sciences University of East Anglia Norwich, NR4 7TJ, UK G.Aldehim@uea.ac.uk Wenjia Wang School of Computing Sciences University
More informationA Relevance Feedback-Based System For Quickly Narrowing Biomedical Literature Search Result
Wayne State University Wayne State University Dissertations 1-1-2013 A Relevance Feedback-Based System For Quickly Narrowing Biomedical Literature Search Result Massuod Hassan Alatrash Wayne State University,
More informationInformation mining and information retrieval : methods and applications
Information mining and information retrieval : methods and applications J. Mothe, C. Chrisment Institut de Recherche en Informatique de Toulouse Université Paul Sabatier, 118 Route de Narbonne, 31062 Toulouse
More informationDatabases and Information Retrieval Integration TIETS42. Kostas Stefanidis Autumn 2016
+ Databases and Information Retrieval Integration TIETS42 Autumn 2016 Kostas Stefanidis kostas.stefanidis@uta.fi http://www.uta.fi/sis/tie/dbir/index.html http://people.uta.fi/~kostas.stefanidis/dbir16/dbir16-main.html
More informationA Semantic Map of RSS Feeds to Support Discovery
A Semantic Map of RSS Feeds to Support Discovery Gaïane Hochard, 1 Zoé Lacroix, 1,2, Jordi Creus, 3 and Bernd Amann, 3 1 Arizona State University, Tempe, Arizona, USA 2 Translational Genomics Research
More informationBiomedical Genomics Workbench APPLICATION BASED MANUAL
Biomedical Genomics Workbench APPLICATION BASED MANUAL Manual for Biomedical Genomics Workbench 4.0 Windows, Mac OS X and Linux January 23, 2017 This software is for research purposes only. QIAGEN Aarhus
More informationLecture Overview. Sequence search & alignment. Searching sequence databases. Sequence Alignment & Search. Goals: Motivations:
Lecture Overview Sequence Alignment & Search Karin Verspoor, Ph.D. Faculty, Computational Bioscience Program University of Colorado School of Medicine With credit and thanks to Larry Hunter for creating
More informationThe Green Computing Observatory
The Green Computing Observatory Michel Jouvin (LAL) Cécile Germain-Renaud (LRI), Thibaut Jacob (LRI), Gilles Kassel (MIS), Julien Nauroy (LRI), Guillaume Philippon (LAL) Outline Contexts Acquisition Status
More informationWorld Academy of Science, Engineering and Technology International Journal of Bioengineering and Life Sciences Vol:11, No:6, 2017
BeamGA Median: A Hybrid Heuristic Search Approach Ghada Badr, Manar Hosny, Nuha Bintayyash, Eman Albilali, Souad Larabi Marie-Sainte Abstract The median problem is significantly applied to derive the most
More informationInformativeness for Adhoc IR Evaluation:
Informativeness for Adhoc IR Evaluation: A measure that prevents assessing individual documents Romain Deveaud 1, Véronique Moriceau 2, Josiane Mothe 3, and Eric SanJuan 1 1 LIA, Univ. Avignon, France,
More informationSequence Alignment & Search
Sequence Alignment & Search Karin Verspoor, Ph.D. Faculty, Computational Bioscience Program University of Colorado School of Medicine With credit and thanks to Larry Hunter for creating the first version
More informationAnalysis on the technology improvement of the library network information retrieval efficiency
Available online www.jocpr.com Journal of Chemical and Pharmaceutical Research, 2014, 6(6):2198-2202 Research Article ISSN : 0975-7384 CODEN(USA) : JCPRC5 Analysis on the technology improvement of the
More information