Sarah Cohen-Boulakia. Laboratoire de Recherche en Informatique CNRS UMR 8623 Université Paris-Sud, Université Paris-Saclay

2 Understanding Life Sciences
Progress in multiple domains: biology, chemistry, maths, computer science
Emergence of new technologies: next-generation sequencing
Increasing volumes of raw data, all stored in Web data sources
Raw data are not sufficient: data annotated by experts; bioinformatics analysis of data; new data sources
Concrete example: querying NCBI Entrez (search for "Gquery NCBI" on Google)

3 The National Center for Biotechnology Information (NCBI) provides access to biomedical and genomic information through its Web portal
33 databases queries
Breast cancer genes

4 Content of a biological database = a set of pages (a book!)
Each database is focused on one biological entity (here, Gene)
Each page describes one instance (i.e., one gene)

5 Various quality features
Last update (freshness)
Status of the sequence (reliability)
"Volume of information" (completeness)
+ Reputation of the data source, number of incoming links (paths) to reach the data

6 To determine the importance of results (obtained in several ways) and get complete sets of answers
Mammalian carcinoma: 771 genes, not all included in the previous result!

7 NCBI ranks answers for each keyword using relevance: the number of occurrences of the keyword (breast cancer) in the files
Other approaches have been tested:
Combining various criteria (weighted functions)
Freshness first, then reputation of the source, then ...
Following PageRank or ObjectRank-like solutions (Biozon, Biorank)
How to deal with several sets of answers? How to aggregate several rankings?
Answers obtained by several keywords should have higher priority
Wanted: a ranking solution exploiting many criteria
Problem: how to combine all such criteria?
Alternative: consensus rankings?

8 Outline
Need for consensus rankings to order biological data
Definition of the problem
Experimental study of existing approaches
ConQuR-Bio
Conclusion

9 Input: a set of rankings (A to D are four genes)
Ranking 1 = [A, D, C, B]
Ranking 2 = [B, A, D, C]
Ranking 3 = [D, A, B, C]
Output: one consensus ranking
Consensus = [A, D, B, C]
A consensus ranking makes the most of several rankings, each obtained by a given ranking method
It puts emphasis on their common points
It does not give too much importance to data ranked highly by only one or a few ranking methods
The optimal consensus is the ranking that is closest to the input rankings, according to a distance

10 The Kendall-τ distance D(π,σ) counts the number of pairs of elements that are inverted (i.e., in the opposite order) between two rankings π and σ
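The pair-counting definition above can be sketched in a few lines of Python. This is a minimal illustration, assuming both rankings are permutations of the same elements; the function name `kendall_tau_distance` is chosen here and does not come from the slides.

```python
from itertools import combinations


def kendall_tau_distance(pi, sigma):
    """Count the pairs of elements that appear in opposite order in pi and sigma.

    Assumption: pi and sigma are permutations of the same elements (no ties).
    """
    pos_pi = {e: i for i, e in enumerate(pi)}
    pos_sigma = {e: i for i, e in enumerate(sigma)}
    return sum(
        1
        for a, b in combinations(pi, 2)
        # A negative product means the pair is ordered differently in each ranking.
        if (pos_pi[a] - pos_pi[b]) * (pos_sigma[a] - pos_sigma[b]) < 0
    )


# Ranking 1 and Ranking 2 from slide 9 disagree on the pairs (A,B), (B,C) and (B,D).
d = kendall_tau_distance(["A", "D", "C", "B"], ["B", "A", "D", "C"])
print(d)  # 3
```

The quadratic pair enumeration is kept for readability; it is fine for small examples such as the four-gene rankings of slide 9.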

12 Kemeny score: the sum of the Kendall-τ distances between a candidate ranking and each input ranking
Optimal consensus (median): a ranking that minimizes the Kemeny score
Complexity [Dwork et al. 2001, Biedl et al. 2009]: NP-hard for an even number of permutations ≥ 4
Approximation and heuristics
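As an illustration of the Kemeny score and the optimal consensus, the sketch below tries every permutation of the elements and keeps the one with the lowest score. It is a brute-force toy under the assumption of complete rankings without ties, and it is only practical for a handful of elements; the function names are chosen for this example.

```python
from itertools import combinations, permutations


def kendall_tau_distance(pi, sigma):
    """Number of pairs ordered differently in the two permutations."""
    pos_pi = {e: i for i, e in enumerate(pi)}
    pos_sigma = {e: i for i, e in enumerate(sigma)}
    return sum(
        1
        for a, b in combinations(pi, 2)
        if (pos_pi[a] - pos_pi[b]) * (pos_sigma[a] - pos_sigma[b]) < 0
    )


def kemeny_score(candidate, rankings):
    """Sum of the Kendall-tau distances from the candidate to every input ranking."""
    return sum(kendall_tau_distance(candidate, r) for r in rankings)


def optimal_consensus(rankings):
    """Exhaustive search over all permutations (exponential, toy sizes only)."""
    elements = sorted(rankings[0])
    return min(permutations(elements), key=lambda c: kemeny_score(c, rankings))


# The three rankings of slide 9.
rankings = [list("ADCB"), list("BADC"), list("DABC")]
best = optimal_consensus(rankings)
print(list(best), kemeny_score(best, rankings))  # ['A', 'D', 'B', 'C'] 4
```

On the slide 9 input the search returns [A, D, B, C], the consensus shown on that slide, with a Kemeny score of 4.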

13 Real data sets
Have equalities (ties): several items ex aequo
May be incomplete: not all rankings are over the same set of elements
Unification process: one unifying bucket at the end of each ranking with the missing elements
Generalized Kendall-τ distance G(r,s): counts the number of pairs of elements that are inverted between two rankings r and s, or tied in only one of the two rankings
R1 = [A, {B,C}, D, {E}]
R2 = [{A,D}, E, {B,C}]
R3 = [{C,E}, A, B, {D}]
Elements in the unifying bucket are equally unimportant
Complexity: finding an optimal consensus with ties is at least as difficult as in the permutation case [VLDB15]
Optimal consensus implemented [VLDB15]: provides exact solutions up to n = 40 elements
Approximation and heuristics
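The unification process and the generalized Kendall-τ distance can be sketched as follows. Rankings with ties are represented as lists of buckets. Two assumptions are made for this illustration: an inverted pair and a pair tied in only one ranking both cost 1, and the raw rankings below (before unification) are guesses chosen so that unification reproduces R1 and R2 from the slide.

```python
from itertools import combinations


def unify(ranking, universe):
    """Append one final bucket with the elements of `universe` missing from `ranking`."""
    present = {e for bucket in ranking for e in bucket}
    missing = sorted(set(universe) - present)
    return ranking + [missing] if missing else list(ranking)


def bucket_index(ranking):
    """Map each element to the index of the bucket that contains it."""
    return {e: i for i, bucket in enumerate(ranking) for e in bucket}


def sign(x):
    return (x > 0) - (x < 0)


def generalized_kendall_tau(r, s):
    """Count pairs that are inverted, or tied in exactly one of the two rankings.

    Assumption: both situations cost 1 (unit cost for tying and untying).
    """
    pos_r, pos_s = bucket_index(r), bucket_index(s)
    return sum(
        1
        for a, b in combinations(sorted(pos_r), 2)
        if sign(pos_r[a] - pos_r[b]) != sign(pos_s[a] - pos_s[b])
    )


universe = "ABCDE"
r1 = unify([["A"], ["B", "C"], ["D"]], universe)  # E is missing, so it goes last
r2 = unify([["A", "D"], ["E"]], universe)         # B and C are missing
print(r1)  # [['A'], ['B', 'C'], ['D'], ['E']]
print(r2)  # [['A', 'D'], ['E'], ['B', 'C']]
print(generalized_kendall_tau(r1, r2))  # 5
```

The five disagreeing pairs are (A,D), which is tied only in R2, and (B,D), (B,E), (C,D), (C,E), which are inverted.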

14 Classification of consensus ranking algorithms
BioConsert [Cohen-Boulakia et al. 2011]
FaginDyn [Fagin et al. 2004]
Ailon3/2 [Fagin et al. 2004]
RepeatChoice [Ailon et al. 2010]
Pick-A-Perm [Ailon et al. 2008]
PNE (exact) [Conitzer et al. 2006]
KwikSort [Ailon et al. 2008]
B&B [Ali et al. 2012]
Chanas [Chanas et al. 1996]
MEDRank [Fagin et al. 2003]
BordaCount [Borda 1781]
CopelandMethod [Copeland et al. 1951]
MC4 [Dwork et al. 2001]
ChanasBoth [Coleman et al. 2009]
[Figure: the algorithms are grouped by generalized Kendall-τ distance, Kendall-τ distance (permutations), and positional approaches]
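As one concrete instance of the positional approaches listed above, here is a minimal BordaCount sketch for permutations. Each element is scored by the sum of its positions across the input rankings and the elements are sorted by that score. Breaking equal scores alphabetically is an assumption made here to keep the output a permutation; a ties-aware variant could instead place equal-score elements in one bucket.

```python
def borda_count(rankings):
    """Order elements by the sum of their positions over all input permutations."""
    scores = {}
    for ranking in rankings:
        for position, element in enumerate(ranking):
            scores[element] = scores.get(element, 0) + position
    # Assumption: equal scores are broken alphabetically.
    return sorted(scores, key=lambda e: (scores[e], e))


# The three rankings of slide 9.
rankings = [list("ADCB"), list("BADC"), list("DABC")]
consensus = borda_count(rankings)
print(consensus)  # ['A', 'D', 'B', 'C']
```

On this small input the position sums are A=2, D=3, B=5, C=8, so the heuristic happens to return the same ranking as the consensus of slide 9. In general a positional method gives no such guarantee.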

15 Compatibility with ties
Ability to consider the cost of (un)tying [VLDB15]

16 Optimal solution can be found up to n ≈ 40
Comparison of 12 algorithms, re-implemented and adapted to ties
Evaluated on quality (gap to the optimal consensus c*) and time
Real datasets, extracted from previous publications
7 datasets: EachMovie, F1, GiantSlalom, SkiCross/Jumping, WebCommunities, WebSearch, BioMedical [VLDB15]
Synthetic datasets
1. Uniformly generated with MuPad Combinat
2. With increasing levels of similarity
3. Similarity and unification process
Precise understanding of the impact of similarity and of the unification process

17 BioConsert can be used in a very large majority of cases
For very large data sets (> elements), KwikSort can be preferred
If there is a need to speed up:
In case of few equalities, use BordaCount
Otherwise, use MEDRank
Alternatively, use both algorithms and pick the best
[Figure: algorithms positioned by quality (gap, from high quality to low quality) and time] [VLDB15]

18 Outline
Need for consensus rankings to order biological data
Definition of the problem
Experimental study of existing approaches
ConQuR-Bio
Conclusion

19 Synonyms: breast cancer vs mammalian carcinoma ( vs 771 genes, not all included)
Abbreviations: attention deficit hyperactivity disorders vs ADHD (109 vs 144 genes, 74 common)
Linguistic variations: tumour vs tumor (& breast cancer): 681 vs 291 genes
More precise reformulations: colorectal cancer vs Lynch syndrome (+6 new genes) [DILS14]
Which genes are associated with a given disease?
Finding all synonyms is time-consuming
Querying with all synonyms provides huge amounts of data, which have to be ranked

22 I) Reformulations (synonyms) can be automatically generated
Input: the user's keyword
Automatic search of synonyms in major biomedical terminologies
Output: set of synonyms of the input keyword
II) Database querying is automated (based on keywords)
Input: set of synonyms obtained in I)
Querying gene databases with each synonym (#queries = #synonyms)
Output: set of rankings (ranked genes), one ranking per synonym
III) Aggregation using a series of consensus algorithms with a variant of the generalized Kendall-τ distance [DILS14]
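The three steps can be sketched end to end on toy data. Everything in this example is hypothetical: the in-memory `terminology` and `database` dictionaries stand in for real biomedical terminologies and gene databases, the gene identifiers `g1` to `g4` are placeholders, and step III uses a simple positional aggregation with ties as a stand-in, not the consensus algorithms actually used by ConQuR-Bio.

```python
def to_buckets(genes):
    """Turn a ranked gene list into a ranking made of singleton buckets."""
    return [[g] for g in genes]


def unify_all(rankings):
    """Give every ranking a final bucket holding the genes it does not mention."""
    universe = {g for r in rankings for bucket in r for g in bucket}
    unified = []
    for r in rankings:
        present = {g for bucket in r for g in bucket}
        missing = sorted(universe - present)
        unified.append(r + [missing] if missing else r)
    return unified


def positional_consensus(rankings):
    """Score each gene by the number of genes ranked strictly before it, summed
    over all rankings; genes with equal scores share a bucket."""
    scores = {}
    for r in rankings:
        seen = 0
        for bucket in r:
            for g in bucket:
                scores[g] = scores.get(g, 0) + seen
            seen += len(bucket)
    return [
        sorted(g for g, s in scores.items() if s == score)
        for score in sorted(set(scores.values()))
    ]


def reformulate_and_rank(keyword, terminology, database):
    synonyms = [keyword] + terminology.get(keyword, [])            # step I
    rankings = [to_buckets(database.get(s, [])) for s in synonyms]  # step II
    rankings = [r for r in rankings if r]                           # drop empty answers
    return positional_consensus(unify_all(rankings))                # step III


# Hypothetical toy inputs, not real query results.
terminology = {"breast cancer": ["mammary carcinoma"]}
database = {
    "breast cancer": ["g1", "g2", "g3"],
    "mammary carcinoma": ["g2", "g4"],
}
result = reformulate_and_rank("breast cancer", terminology, database)
print(result)  # [['g2'], ['g1'], ['g3', 'g4']]
```

Here g2 comes first because both reformulations return it near the top, while g3 and g4 end up tied: each is ranked low by one query and missing from the other.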

23 Highly accessed by members of APHP (Hospitals), Institut Curie, Institut Pasteur

24 Ranking biological data is a complex task
Many quality criteria, difficult to combine
Consensus ranking approaches
Classification of consensus ranking algorithms
Rankings with ties, incomplete data sets
Study of distances and normalization techniques
Complexity results, new distances considered, new heuristics
Guide in the choice of distances and algorithms
Rank-n-ties platform available
Concrete application to real biological data
Consensus of query reformulations
Currently under evaluation based on real datasets (Leukemia)
Increasing number of reformulations + increasing number of answers: ranking big data sets

25 Bryan Brancotte
Mastodon Project QualibioConsensus: LRI (Paris-Sud), LIGM (Marne-la-Vallée), IFB (Institut Français de Bioinformatique)
Other partners: Univ. Montréal, Institut Curie, APHP Georges Pompidou, APHP Paul Brousse

26 References
Rank aggregation with ties: Experiments and Analysis. B Brancotte, B Yang, G Blin, S Cohen-Boulakia, A Denise, S Hamel. Proceedings of the VLDB Endowment 8 (11)
Interrogation de bases de données biologiques publiques par reformulation de requêtes et classement des résultats avec ConQuR-Bio. B Brancotte, B Rance, A Denise, S Cohen-Boulakia. JOBIM (Journées Ouvertes Biologie Informatique Mathématiques) 2015
ConQuR-Bio: Consensus ranking with query reformulation for biological data. B Brancotte, B Rance, A Denise, S Cohen-Boulakia. International Conference on Data Integration in the Life Sciences (DILS)
Using medians to generate consensus rankings for biological data. S Cohen-Boulakia, A Denise, S Hamel. International Conference on Scientific and Statistical Database Management (SSDBM)


More information

BioMinT: Biological Text Mining EU FP5 Quality of Life Project

BioMinT: Biological Text Mining EU FP5 Quality of Life Project BioMinT: Biological Text Mining EU FP5 Quality of Life Project Dr. Dipl.-Ing. Österreichisches Forschungsinstitut für Artificial Intelligence Motivation Economic and business pressures are forcing drug

More information

Predicting Disease-related Genes using Integrated Biomedical Networks

Predicting Disease-related Genes using Integrated Biomedical Networks Predicting Disease-related Genes using Integrated Biomedical Networks Jiajie Peng (jiajiepeng@nwpu.edu.cn) HanshengXue(xhs1892@gmail.com) Jin Chen* (chen.jin@uky.edu) Yadong Wang* (ydwang@hit.edu.cn) 1

More information

CHAPTER 4 FEATURE SELECTION USING GENETIC ALGORITHM

CHAPTER 4 FEATURE SELECTION USING GENETIC ALGORITHM CHAPTER 4 FEATURE SELECTION USING GENETIC ALGORITHM In this research work, Genetic Algorithm method is used for feature selection. The following section explains how Genetic Algorithm is used for feature

More information

10/10/13. Traditional database system. Information Retrieval. Information Retrieval. Information retrieval system? Information Retrieval Issues

10/10/13. Traditional database system. Information Retrieval. Information Retrieval. Information retrieval system? Information Retrieval Issues COS 597A: Principles of Database and Information Systems Information Retrieval Traditional database system Large integrated collection of data Uniform access/modifcation mechanisms Model of data organization

More information

CD 485 Computer Applications in Communication Disorders and Sciences MODULE 3

CD 485 Computer Applications in Communication Disorders and Sciences MODULE 3 CD 485 Computer Applications in Communication Disorders and Sciences MODULE 3 SECTION VII IDENTIFYING THE APPROPRIATE DATABASES JOURNAL ARTICLES THROUGH PUBMED, MEDLINE AND COMMUNICATION DISORDERS MULTISEARCH

More information

Formalizing Mappings to Optimize Automated Schema Alignment: Application to Rare Diseases

Formalizing Mappings to Optimize Automated Schema Alignment: Application to Rare Diseases e-health For Continuity of Care C. Lovis et al. (Eds.) 2014 European Federation for Medical Informatics and IOS Press. This article is published online with Open Access by IOS Press and distributed under

More information

YAM++ : A multi-strategy based approach for Ontology matching task

YAM++ : A multi-strategy based approach for Ontology matching task YAM++ : A multi-strategy based approach for Ontology matching task Duy Hoa Ngo, Zohra Bellahsene To cite this version: Duy Hoa Ngo, Zohra Bellahsene. YAM++ : A multi-strategy based approach for Ontology

More information

Laurent D. COHEN.

Laurent D. COHEN. for biomedical image segmentation Laurent D. COHEN Directeur de Recherche CNRS CEREMADE, UMR CNRS 7534 Université Paris-9 Dauphine Place du Maréchal de Lattre de Tassigny 75016 Paris, France Cohen@ceremade.dauphine.fr

More information

Bioinformatics Hubs on the Web

Bioinformatics Hubs on the Web Bioinformatics Hubs on the Web Take a class The Galter Library teaches a related class called Bioinformatics Hubs on the Web. See our Classes schedule for the next available offering. If this class is

More information

Heuristic methods for pairwise alignment:

Heuristic methods for pairwise alignment: Bi03c_1 Unit 03c: Heuristic methods for pairwise alignment: k-tuple-methods k-tuple-methods for alignment of pairs of sequences Bi03c_2 dynamic programming is too slow for large databases Use heuristic

More information

Large scale corporate Web Analysis for Business Intelligence

Large scale corporate Web Analysis for Business Intelligence Industrial Clusters in England Large scale corporate Web Analysis for Business Intelligence Michele Barbera, Andrey Bratus, Nicola Sambin {barbera,bratus,sambin}@spaziodati.eu 29 April, 2016 25 Software

More information

Web of Science Including BIOSIS Citation Index

Web of Science Including BIOSIS Citation Index Web of Science Including BIOSIS Citation Index UCL Library Services, Gower St., London WC1E 6BT 020 7679 7792 E-mail: library@ucl.ac.uk http://www.ucl.ac.uk/library/ 1. What is Web of Science? Web of Science

More information

USING AN EXTENDED SUFFIX TREE TO SPEED-UP SEQUENCE ALIGNMENT

USING AN EXTENDED SUFFIX TREE TO SPEED-UP SEQUENCE ALIGNMENT IADIS International Conference Applied Computing 2006 USING AN EXTENDED SUFFIX TREE TO SPEED-UP SEQUENCE ALIGNMENT Divya R. Singh Software Engineer Microsoft Corporation, Redmond, WA 98052, USA Abdullah

More information

From Smith-Waterman to BLAST

From Smith-Waterman to BLAST From Smith-Waterman to BLAST Jeremy Buhler July 23, 2015 Smith-Waterman is the fundamental tool that we use to decide how similar two sequences are. Isn t that all that BLAST does? In principle, it is

More information

Bioethics Thesaurus Database Search Tips

Bioethics Thesaurus Database Search Tips Bioethics Thesaurus Database Search Tips Writers, Internet surfers, bloggers, indexers, journalists, health care professionals, librarians and students alike will find the Bioethics Thesaurus Database

More information

POMap results for OAEI 2017

POMap results for OAEI 2017 POMap results for OAEI 2017 Amir Laadhar 1, Faiza Ghozzi 2, Imen Megdiche 1, Franck Ravat 1, Olivier Teste 1, and Faiez Gargouri 2 1 Paul Sabatier University, IRIT (CNRS/UMR 5505) 118 Route de Narbonne

More information

Introduction to Systems Biology II: Lab

Introduction to Systems Biology II: Lab Introduction to Systems Biology II: Lab Amin Emad NIH BD2K KnowEnG Center of Excellence in Big Data Computing Carl R. Woese Institute for Genomic Biology Department of Computer Science University of Illinois

More information

Similarity Ranking in Large- Scale Bipartite Graphs

Similarity Ranking in Large- Scale Bipartite Graphs Similarity Ranking in Large- Scale Bipartite Graphs Alessandro Epasto Brown University - 20 th March 2014 1 Joint work with J. Feldman, S. Lattanzi, S. Leonardi, V. Mirrokni [WWW, 2014] 2 AdWords Ads Ads

More information

Selecting biomedical data sources according to user preferences

Selecting biomedical data sources according to user preferences University of Pennsylvania ScholarlyCommons Departmental Papers (CIS) Department of Computer & Information Science July 2005 Selecting biomedical data sources according to user preferences Sarah Cohen-Boulakia

More information

Kraken: ultrafast metagenomic sequence classification using exact alignments

Kraken: ultrafast metagenomic sequence classification using exact alignments Kraken: ultrafast metagenomic sequence classification using exact alignments Derrick E. Wood and Steven L. Salzberg Bioinformatics journal club October 8, 2014 Märt Roosaare Need for speed Metagenomic

More information

Weighted Heuristic Ensemble of Filters

Weighted Heuristic Ensemble of Filters Weighted Heuristic Ensemble of Filters Ghadah Aldehim School of Computing Sciences University of East Anglia Norwich, NR4 7TJ, UK G.Aldehim@uea.ac.uk Wenjia Wang School of Computing Sciences University

More information

A Relevance Feedback-Based System For Quickly Narrowing Biomedical Literature Search Result

A Relevance Feedback-Based System For Quickly Narrowing Biomedical Literature Search Result Wayne State University Wayne State University Dissertations 1-1-2013 A Relevance Feedback-Based System For Quickly Narrowing Biomedical Literature Search Result Massuod Hassan Alatrash Wayne State University,

More information

Information mining and information retrieval : methods and applications

Information mining and information retrieval : methods and applications Information mining and information retrieval : methods and applications J. Mothe, C. Chrisment Institut de Recherche en Informatique de Toulouse Université Paul Sabatier, 118 Route de Narbonne, 31062 Toulouse

More information

Databases and Information Retrieval Integration TIETS42. Kostas Stefanidis Autumn 2016

Databases and Information Retrieval Integration TIETS42. Kostas Stefanidis Autumn 2016 + Databases and Information Retrieval Integration TIETS42 Autumn 2016 Kostas Stefanidis kostas.stefanidis@uta.fi http://www.uta.fi/sis/tie/dbir/index.html http://people.uta.fi/~kostas.stefanidis/dbir16/dbir16-main.html

More information

A Semantic Map of RSS Feeds to Support Discovery

A Semantic Map of RSS Feeds to Support Discovery A Semantic Map of RSS Feeds to Support Discovery Gaïane Hochard, 1 Zoé Lacroix, 1,2, Jordi Creus, 3 and Bernd Amann, 3 1 Arizona State University, Tempe, Arizona, USA 2 Translational Genomics Research

More information

Biomedical Genomics Workbench APPLICATION BASED MANUAL

Biomedical Genomics Workbench APPLICATION BASED MANUAL Biomedical Genomics Workbench APPLICATION BASED MANUAL Manual for Biomedical Genomics Workbench 4.0 Windows, Mac OS X and Linux January 23, 2017 This software is for research purposes only. QIAGEN Aarhus

More information

Lecture Overview. Sequence search & alignment. Searching sequence databases. Sequence Alignment & Search. Goals: Motivations:

Lecture Overview. Sequence search & alignment. Searching sequence databases. Sequence Alignment & Search. Goals: Motivations: Lecture Overview Sequence Alignment & Search Karin Verspoor, Ph.D. Faculty, Computational Bioscience Program University of Colorado School of Medicine With credit and thanks to Larry Hunter for creating

More information

The Green Computing Observatory

The Green Computing Observatory The Green Computing Observatory Michel Jouvin (LAL) Cécile Germain-Renaud (LRI), Thibaut Jacob (LRI), Gilles Kassel (MIS), Julien Nauroy (LRI), Guillaume Philippon (LAL) Outline Contexts Acquisition Status

More information

World Academy of Science, Engineering and Technology International Journal of Bioengineering and Life Sciences Vol:11, No:6, 2017

World Academy of Science, Engineering and Technology International Journal of Bioengineering and Life Sciences Vol:11, No:6, 2017 BeamGA Median: A Hybrid Heuristic Search Approach Ghada Badr, Manar Hosny, Nuha Bintayyash, Eman Albilali, Souad Larabi Marie-Sainte Abstract The median problem is significantly applied to derive the most

More information

Informativeness for Adhoc IR Evaluation:

Informativeness for Adhoc IR Evaluation: Informativeness for Adhoc IR Evaluation: A measure that prevents assessing individual documents Romain Deveaud 1, Véronique Moriceau 2, Josiane Mothe 3, and Eric SanJuan 1 1 LIA, Univ. Avignon, France,

More information

Sequence Alignment & Search

Sequence Alignment & Search Sequence Alignment & Search Karin Verspoor, Ph.D. Faculty, Computational Bioscience Program University of Colorado School of Medicine With credit and thanks to Larry Hunter for creating the first version

More information

Analysis on the technology improvement of the library network information retrieval efficiency

Analysis on the technology improvement of the library network information retrieval efficiency Available online www.jocpr.com Journal of Chemical and Pharmaceutical Research, 2014, 6(6):2198-2202 Research Article ISSN : 0975-7384 CODEN(USA) : JCPRC5 Analysis on the technology improvement of the

More information