LSHTC: A Benchmark for Large-Scale Classification
|
|
- Christopher Reynolds
- 6 years ago
- Views:
Transcription
1 LSHTC: A Benchmark for Large-Scale Classification I. Partalas, A. Kosmopoulos,, G. Paliouras, E. Gaussier, I. Androutsopoulos, T. Artières, P. Gallinari, Massih-Reza Amini, Nicolas Baskiotis Lab. d Informatique de Grenoble & Grenoble University, France National Center for Scientific Research Demokritos, Greece Athens University of Economics and Business, Greece Lab. d informatique de Paris 6, France ICML 2014, Beijing 25th June, 2014 ICML The LSHTC Challenge series 1 / 19
2 The LSHTC Challenge Introduction A series of competitions on large-scale hierarchical classification First edition started in editions were organized till 2014 Participation of around 200 teams in total Goal: provide a standard benchmark for large-scale classification ICML The LSHTC Challenge series 2 / 19
3 Motivation Introduction Many systems use a structured organization of the data Taxonomy #doc. #catg. #editors DMOZ 5 M 1 M 100 k Wikipedia 32 M 600 k 129 k PubMed/MeSH 20 M 27 k - IPC 32 M 70 k - Root Arts Sports Movies Video Tennis Soccer Players Fun ICML The LSHTC Challenge series 3 / 19
4 Introduction The Problems in Large Scale Classification (1/2) Large number of parameters #cat #stems #docs cat/doc max path DMOZ 27, , ,992 1,02 5 Wiki Small 36, , ,148 1,86 10 Wiki Large 325,056 1,617,899 2,817, For DMOZ: 27, ,158 = 16,508,680,039 parameters should be learned 123Gb of memory ICML The LSHTC Challenge series 4 / 19
5 Introduction The Problems in Large Scale Classification (2/2) A lot of rare classes Distribution of documents in DMOZ nb of categories with n i >n α = category size n ICML The LSHTC Challenge series 5 / 19
6 Data format The Competitions Text from Web pages Pre-processing Stemming/lemmatization Stop-word removal Extraction of term-frequencies LibSVM format : : : : : ,45, : : : : : ICML The LSHTC Challenge series 6 / 19
7 LSHTC1 ( ) Data source: ODP Web directory Tracks: Basic, Cheap, Expensive, Full Single-label data Hierarchy type: tree Max number of categories: 12,000 Training/Test docs: 93,805/34,880 ICML The LSHTC Challenge series 7 / 19
8 LSHTC2 ( ) Data sources: ODP Web directory and Wikipedia Multi-label data Hierarchy type: tree and DAG Tracks (classes/documents/features) DMOZ: 27K/394K/594K Wikipedia small: 32K/456K/346K Wikipedia large: 325K/2.3M/1.6M (hierarchy is a graph) ICML The LSHTC Challenge series 8 / 19
9 Statistics on the Datasets #cat #stems #docs cat/doc max path DMOZ 27, , ,992 1,02 5 Wiki Small 36, , ,148 1,86 10 Wiki Large 325,056 1,617,899 2,817, ICML The LSHTC Challenge series 9 / 19
10 LSHTC3 ( ) Data sources: ODP Web directory and Wikipedia Large Scale Hierarchical Classification: Wikipedia small and large from LSHTC2 Multi-task Learning: DMOZ and Wikipedia datasets Refinement Learning: Expand hierarchy by splitting the nodes (based on DMOZ) ICML The LSHTC Challenge series 10 / 19
11 Track 1: Large Scale Hierarchical Classification 2 versions of Wikipedia dataset Task 1: medium-size (36,500 classes) Original text data Pre-processed data Task 2: large Wikipedia (325,000 classes) Multi-label and hierarchy is DAG ICML The LSHTC Challenge series 11 / 19
12 Track 2: Multi-task learning Wikipedia and DMOZ datasets Common feature space 12,000 classes for each dataset Single-label, hierarchy: DAG for Wikipedia and tree for DMOZ Goal: Leverage shared information in order to improve performance on each task Test sets are provided for both datasets. DMOZ Wikipedia shared info ICML The LSHTC Challenge series 12 / 19
13 Track 3: Refinement Learning Task 1: semi-supervised A reduced (12,000 classes) and an expanded (14,000 classes) hierarchy is available Two documents are given for each expanded class Goal: to reassign test documents to the new classes Root Books Music Comics Poetry Rock Jazz Funky Fusion ICML The LSHTC Challenge series 13 / 19
14 Track 3: Refinement Learning Task 1: semi-supervised A reduced (12,000 classes) and an expanded (14,000 classes) hierarchy is available Two documents are given for each expanded class Goal: to reassign test documents to the new classes Root Books Music Comics Poetry Rock Jazz Funky Root Books Music Fusion Task 2: unsupervised Only the reduced hierarchy is given Goal: expand the hierarchy Rock Jazz ICML The LSHTC Challenge series 13 / 19
15 Track 3: Refinement Learning Task 1: semi-supervised A reduced (12,000 classes) and an expanded (14,000 classes) hierarchy is available Two documents are given for each expanded class Goal: to reassign test documents to the new classes Root Books Music Comics Poetry Rock Jazz Funky Root Books Music Fusion Task 2: unsupervised Only the reduced hierarchy is given Goal: expand the hierarchy Rock Jazz ICML The LSHTC Challenge series 13 / 19
16 LSHTC The Competitions LSHTC4 organized within Kaggle One track: Very Large Scale Supervised Learning Based on the Wikipedia large dataset (325K classes) ICML The LSHTC Challenge series 14 / 19
17 More Info The Competitions Challenge site: You can download data and use the oracles for testing your systems In the near future all datasets will be released with ground truth answers Goal Establish standard benchmark datasets for large-scale classification ICML The LSHTC Challenge series 15 / 19
18 BioASQ ( A challenge on large-scale semantic indexing and question answering European FP7 project: Special Support Action Two tracks: 1 Large-Scale Semantic Indexing of Biomedical Documents 2 Question Answering for Biomedicine ICML The LSHTC Challenge series 16 / 19
19 Large-Scale Semantic Indexing (1/3) Classify articles from PubMed (biomedical journals) Test is performed in batches 5,000 articles per test set ICML The LSHTC Challenge series 17 / 19
20 Large-Scale Semantic Indexing (2/3) Field name Description JSON type pmid PubMed identifier string year Year of publication string journal Journal of publication string abstracttext Full text of the abstract string title Title of the article string meshmajor Mesh labels of the article array of strings Table : Description of the properties of the data for Task1a. ICML The LSHTC Challenge series 18 / 19
21 Large-Scale Semantic Indexing (3/3) Task 1a Task 2a Task 2a (sample) Articles 10,876,004 12,628,968 4,458,300 Unique labels 26,563 26,831 26,631 Labels per article Size in GB Table : Basic statistics about the training data for Task 1a and Task 2a. ICML The LSHTC Challenge series 19 / 19
Classification in a large number of categories
Classification in a large number of categories T. Artières, Joint work with the MLIA team at LIP6, the AMA team at LIG and Demokritos Lab. d informatique de Paris 6, France Lab. d Informatique de Grenoble
More informationUsing Centroids of Word Embeddings and Word Mover s Distance for Biomedical Document Retrieval in Question Answering
Using Centroids of Word Embeddings and Word Mover s Distance for Biomedical Document Retrieval in Question Answering Georgios-Ioannis Brokos 1, Prodromos Malakasiotis 1,2 and Ion Androutsopoulos 1,2 1
More informationFeature LDA: a Supervised Topic Model for Automatic Detection of Web API Documentations from the Web
Feature LDA: a Supervised Topic Model for Automatic Detection of Web API Documentations from the Web Chenghua Lin, Yulan He, Carlos Pedrinaci, and John Domingue Knowledge Media Institute, The Open University
More informationSubset Labeled LDA: A Topic Model for Extreme Multi-Label Classification
Subset Labeled LDA: A Topic Model for Extreme Multi-Label Classification Yannis Papanikolaou and Grigorios Tsoumakas School of Informatics, Aristotle University of Thessaloniki, Greece ypapanik, greg@csd.auth.gr
More informationAnomaly Detection. You Chen
Anomaly Detection You Chen 1 Two questions: (1) What is Anomaly Detection? (2) What are Anomalies? Anomaly detection refers to the problem of finding patterns in data that do not conform to expected behavior
More informationIntroduction to Web Clustering
Introduction to Web Clustering D. De Cao R. Basili Corso di Web Mining e Retrieval a.a. 2008-9 June 26, 2009 Outline Introduction to Web Clustering Some Web Clustering engines The KeySRC approach Some
More informationScalable Learnability Measure for Hierarchical Learning in Large Scale Multi-Class Classification
Scalable Learnability Measure for Hierarchical Learning in Large Scale Multi-Class Classification Raphael Puget, Nicolas Baskiotis, Patrick Gallinari To cite this version: Raphael Puget, Nicolas Baskiotis,
More informationMulti-Stage Rocchio Classification for Large-scale Multilabeled
Multi-Stage Rocchio Classification for Large-scale Multilabeled Text data Dong-Hyun Lee Nangman Computing, 117D Garden five Tools, Munjeong-dong Songpa-gu, Seoul, Korea dhlee347@gmail.com Abstract. Large-scale
More informationClassification and retrieval of biomedical literatures: SNUMedinfo at CLEF QA track BioASQ 2014
Classification and retrieval of biomedical literatures: SNUMedinfo at CLEF QA track BioASQ 2014 Sungbin Choi, Jinwook Choi Medical Informatics Laboratory, Seoul National University, Seoul, Republic of
More informationPTE : Predictive Text Embedding through Large-scale Heterogeneous Text Networks
PTE : Predictive Text Embedding through Large-scale Heterogeneous Text Networks Pramod Srinivasan CS591txt - Text Mining Seminar University of Illinois, Urbana-Champaign April 8, 2016 Pramod Srinivasan
More informationThe Functional Extension Parser (FEP) A Document Understanding Platform
The Functional Extension Parser (FEP) A Document Understanding Platform Günter Mühlberger University of Innsbruck Department for German Language and Literature Studies Introduction A book is more than
More informationQuery Phrase Expansion using Wikipedia for Patent Class Search
Query Phrase Expansion using Wikipedia for Patent Class Search 1 Bashar Al-Shboul, Sung-Hyon Myaeng Korea Advanced Institute of Science and Technology (KAIST) December 19 th, 2011 AIRS 11, Dubai, UAE OUTLINE
More informationExploitation and dissemination strategy. Authors: Michael Alvers, George Tsatsaronis, Matthias Zschunke and Axel-Cyrille Ngonga Ngomo
Project Deliverable D2.3 Distribution Intelligent Information Management Targeted Competition Framework ICT-2011.4.4(d) FP7-318652 / BioASQ Public http://www.bioasq.org Exploitation and dissemination strategy
More informationStudying the Impact of Text Summarization on Contextual Advertising
Studying the Impact of Text Summarization on Contextual Advertising G. Armano, A. Giuliani, and E. Vargiu Intelligent Agents and Soft-Computing Group Dept. of Electrical and Electronic Engineering University
More informationQuery- vs. Crawling-based Classification of Searchable Web Databases
Query- vs. Crawling-based Classification of Searchable Web Databases Luis Gravano Panagiotis G. Ipeirotis Mehran Sahami gravano@cs.columbia.edu pirot@cs.columbia.edu sahami@epiphany.com Columbia University
More informationarxiv: v2 [cs.lg] 12 Jan 2018
HINET: HIERARCHICAL CLASSIFICATION WITH NEU- RAL NETWORK Zhezhou Wu SAP Iovatio Ceter, Sigapore Sigapore 119958 zhezhou.wu@sap.com Sea Saito Yale-NUS College Sigapore, 138533 sea.saito@u.yale-us.edu.sg
More informationInformation Retrieval
Introduction to Information Retrieval SCCS414: Information Storage and Retrieval Christopher Manning and Prabhakar Raghavan Lecture 10: Text Classification; Vector Space Classification (Rocchio) Relevance
More information10/14/2017. Dejan Sarka. Anomaly Detection. Sponsors
Dejan Sarka Anomaly Detection Sponsors About me SQL Server MVP (17 years) and MCT (20 years) 25 years working with SQL Server Authoring 16 th book Authoring many courses, articles Agenda Introduction Simple
More informationCERMINE automatic extraction of metadata and references from scientific literature
CERMINE automatic of metadata and references from scientific literature Dominika Tkaczyk, Pawel Szostek, Piotr Jan Dendek, Mateusz Fedoryszak and Lukasz Bolikowski Interdisciplinary Centre for Mathematical
More informationINTRODUCTION TO MACHINE LEARNING. Measuring model performance or error
INTRODUCTION TO MACHINE LEARNING Measuring model performance or error Is our model any good? Context of task Accuracy Computation time Interpretability 3 types of tasks Classification Regression Clustering
More informationSelf-tuning ongoing terminology extraction retrained on terminology validation decisions
Self-tuning ongoing terminology extraction retrained on terminology validation decisions Alfredo Maldonado and David Lewis ADAPT Centre, School of Computer Science and Statistics, Trinity College Dublin
More informationUtilizing Probase in Open Directory Project-based Text Classification
Utilizing Probase in Open Directory Project-based Text Classification So-Young Jun, Dinara Aliyeva, Ji-Min Lee, SangKeun Lee Korea University, Seoul, Republic of Korea {syzzang, dinara aliyeva, jm94318,
More informationFeaStNet: Feature-Steered Graph Convolutions for 3D Shape Analysis Supplementary Material
FeaStNet: Feature-Steered Graph Convolutions for 3D Shape Analysis Supplementary Material Nitika Verma Edmond Boyer Jakob Verbeek Univ. Grenoble Alpes, Inria, CNRS, Grenoble INP, LJK, 38000 Grenoble, France
More informationIPL at ImageCLEF 2010
IPL at ImageCLEF 2010 Alexandros Stougiannis, Anestis Gkanogiannis, and Theodore Kalamboukis Information Processing Laboratory Department of Informatics Athens University of Economics and Business 76 Patission
More informationarxiv: v1 [cs.lg] 5 May 2015
Reinforced Decision Trees Reinforced Decision Trees arxiv:1505.00908v1 [cs.lg] 5 May 2015 Aurélia Léon aurelia.leon@lip6.fr Sorbonne Universités, UPMC Univ Paris 06, UMR 7606, LIP6, F-75005, Paris, France
More informationMeSHProbeNet: A Self-attentive Probe Net for MeSH Indexing
Text Mining MeSHProbeNet: A Self-attentive Probe Net for MeSH Indexing Guangxu Xun 1,, Kishlay Jha 1, Ye Yuan 2, Yaqing Wang 3 and Aidong Zhang 1 1 Department of Computer Science, University of Virginia,
More informationIntroduction to Text Mining. Hongning Wang
Introduction to Text Mining Hongning Wang CS@UVa Who Am I? Hongning Wang Assistant professor in CS@UVa since August 2014 Research areas Information retrieval Data mining Machine learning CS@UVa CS6501:
More informationsummarization Ivan Ivanov, Peter Vajda, Jong-Seok Lee, Touradj Ebrahimi
1 Epitome A social game for photo album summarization Ivan Ivanov, Peter Vajda, Jong-Seok Lee, Touradj Ebrahimi Ecole Polytechnique Federale de Lausanne, Switzerland Motivation 2 Number of photos taken
More informationClassification of Daily Life Activities by Decision Level Fusion of Inertial Sensor Data
Classification of Daily Life Activities by Decision Level Fusion of Inertial Sensor Data Dominik Schuldhaus, Heike Leutheuser, Bjoern M. Eskofier October 2, 2013 Digital Sports Group Pattern Recognition
More informationCATEGORIZATION OF THE DOCUMENTS BY USING MACHINE LEARNING
CATEGORIZATION OF THE DOCUMENTS BY USING MACHINE LEARNING Amol Jagtap ME Computer Engineering, AISSMS COE Pune, India Email: 1 amol.jagtap55@gmail.com Abstract Machine learning is a scientific discipline
More informationLarge-Scale Semantic Indexing and Question Answering in Biomedicine
Large-Scale Semantic Indexing and Question Answering in Biomedicine E. Papagiannopoulou *, Y. Papanikolaou *, D. Dimitriadis *, S. Lagopoulos *, G. Tsoumakas *, M. Laliotis **, N. Markantonatos ** and
More informationWord Embeddings in Search Engines, Quality Evaluation. Eneko Pinzolas
Word Embeddings in Search Engines, Quality Evaluation Eneko Pinzolas Neural Networks are widely used with high rate of success. But can we reproduce those results in IR? Motivation State of the art for
More informationLabel Filters for Large Scale Multilabel Classification
Alexandru Niculescu-Mizil NEC Laboratories America Princeton, NJ, USA alex@nec-labs.com Ehsan Abbasnejad Australian National University eabbasnejad@gmail.com Abstract When assigning labels to a test instance,
More informationThe Wikipedia XML Corpus
INEX REPORT The Wikipedia XML Corpus Ludovic Denoyer, Patrick Gallinari Laboratoire d Informatique de Paris 6 8 rue du capitaine Scott 75015 Paris http://www-connex.lip6.fr/denoyer/wikipediaxml {ludovic.denoyer,
More informationShrey Patel B.E. Computer Engineering, Gujarat Technological University, Ahmedabad, Gujarat, India
International Journal of Scientific Research in Computer Science, Engineering and Information Technology 2018 IJSRCSEIT Volume 3 Issue 3 ISSN : 2456-3307 Some Issues in Application of NLP to Intelligent
More informationA novel supervised learning algorithm and its use for Spam Detection in Social Bookmarking Systems
A novel supervised learning algorithm and its use for Spam Detection in Social Bookmarking Systems Anestis Gkanogiannis and Theodore Kalamboukis Department of Informatics Athens University of Economics
More informationA Personalized Approach for Re-ranking Search Results Using User Preferences
Journal of Universal Computer Science, vol. 20, no. 9 (2014), 1232-1258 submitted: 17/2/14, accepted: 29/8/14, appeared: 1/9/14 J.UCS A Personalized Approach for Re-ranking Search Results Using User Preferences
More informationInformation Retrieval and Organisation
Information Retrieval and Organisation Chapter 16 Flat Clustering Dell Zhang Birkbeck, University of London What Is Text Clustering? Text Clustering = Grouping a set of documents into classes of similar
More informationSemi-supervised learning and active learning
Semi-supervised learning and active learning Le Song Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012 Combining classifiers Ensemble learning: a machine learning paradigm where multiple learners
More informationTri-modal Human Body Segmentation
Tri-modal Human Body Segmentation Master of Science Thesis Cristina Palmero Cantariño Advisor: Sergio Escalera Guerrero February 6, 2014 Outline 1 Introduction 2 Tri-modal dataset 3 Proposed baseline 4
More informationThe Kinect Sensor. Luís Carriço FCUL 2014/15
Advanced Interaction Techniques The Kinect Sensor Luís Carriço FCUL 2014/15 Sources: MS Kinect for Xbox 360 John C. Tang. Using Kinect to explore NUI, Ms Research, From Stanford CS247 Shotton et al. Real-Time
More informationCS490W. Text Clustering. Luo Si. Department of Computer Science Purdue University
CS490W Text Clustering Luo Si Department of Computer Science Purdue University [Borrows slides from Chris Manning, Ray Mooney and Soumen Chakrabarti] Clustering Document clustering Motivations Document
More informationLiterature Databases
Literature Databases Introduction to Bioinformatics Dortmund, 16.-20.07.2007 Lectures: Sven Rahmann Exercises: Udo Feldkamp, Michael Wurst 1 Overview 1. Databases 2. Publications in Science 3. PubMed and
More informationMedical image analysis and retrieval. Henning Müller
Medical image analysis and retrieval Henning Müller Overview My background Our laboratory Current projects Khresmoi, MANY, Promise, Chorus+, NinaPro Challenges Demonstration Conclusions 2 Personal background
More informationAn Ontology-Based Information Retrieval Model
An Ontology-Based Information Retrieval Model M. Fernández, D. Vallet, P. Castells Universidad Autónoma de Madrid Escuela Politécnica Superior Ontology Workshop Strasbourg, France, 25 October 2005 Contents
More informationCitation extraction and modeling. Meen Chul Kim, Andrea Forte, Aaron Halfaker
Citation extraction and modeling Meen Chul Kim, Andrea Forte, Aaron Halfaker History 2005 - Rebuilt Mediawiki with references as first class objects in the system. - it had a summary page and discussion
More informationFinal report - CS224W: Using Wikipedia pages traffic data to infer pertinence of links
Final report - CS224W: Using Wikipedia pages traffic data to infer pertinence of links Marc Thibault; SUNetID number: 06227968 Mael Trean; SUNetIDnumber: 06228438 I - Introduction/Motivation/Problem Definition
More informationA modified and fast Perceptron learning rule and its use for Tag Recommendations in Social Bookmarking Systems
A modified and fast Perceptron learning rule and its use for Tag Recommendations in Social Bookmarking Systems Anestis Gkanogiannis and Theodore Kalamboukis Department of Informatics Athens University
More informationInterpreting Document Collections with Topic Models. Nikolaos Aletras University College London
Interpreting Document Collections with Topic Models Nikolaos Aletras University College London Acknowledgements Mark Stevenson, Sheffield Tim Baldwin, Melbourne Jey Han Lau, IBM Research Talk Outline Introduction
More informationDialog System & Technology Challenge 6 Overview of Track 1 - End-to-End Goal-Oriented Dialog learning
Dialog System & Technology Challenge 6 Overview of Track 1 - End-to-End Goal-Oriented Dialog learning Julien Perez 1 and Y-Lan Boureau 2 and Antoine Bordes 2 1 Naver Labs Europe, Grenoble, France 2 Facebook
More informationJKernelMachines: A Simple Framework for Kernel Machines
Journal of Machine Learning Research 14 (2013) 1417-1421 Submitted 2/12; Revised 11/12; Published 5/13 JKernelMachines: A Simple Framework for Kernel Machines David Picard ETIS - ENSEA/CNRS/Université
More informationSemi-supervised Document Classification with a Mislabeling Error Model
Semi-supervised Document Classification with a Mislabeling Error Model Anastasia Krithara 1, Massih R. Amini 2, Jean-Michel Renders 1, Cyril Goutte 3 1 Xerox Research Centre Europe, chemin de Maupertuis,
More informationLogistic Regression: Probabilistic Interpretation
Logistic Regression: Probabilistic Interpretation Approximate 0/1 Loss Logistic Regression Adaboost (z) SVM Solution: Approximate 0/1 loss with convex loss ( surrogate loss) 0-1 z = y w x SVM (hinge),
More informationExploiting Internal and External Semantics for the Clustering of Short Texts Using World Knowledge
Exploiting Internal and External Semantics for the Using World Knowledge, 1,2 Nan Sun, 1 Chao Zhang, 1 Tat-Seng Chua 1 1 School of Computing National University of Singapore 2 School of Computer Science
More informationSemi-Supervised Abstraction-Augmented String Kernel for bio-relationship Extraction
Semi-Supervised Abstraction-Augmented String Kernel for bio-relationship Extraction Pavel P. Kuksa, Rutgers University Yanjun Qi, Bing Bai, Ronan Collobert, NEC Labs Jason Weston, Google Research NY Vladimir
More informationGenTax: A Generic Methodology for Deriving OWL and RDF-S Ontologies from Hierarchical Classifications, Thesauri, and Inconsistent Taxonomies
Leopold Franzens Universität Innsbruck GenTax: A Generic Methodology for Deriving OWL and RDF-S Ontologies from Hierarchical Classifications, Thesauri, and Inconsistent Taxonomies Martin HEPP DERI Innsbruck
More informationDocument Retrieval using Predication Similarity
Document Retrieval using Predication Similarity Kalpa Gunaratna 1 Kno.e.sis Center, Wright State University, Dayton, OH 45435 USA kalpa@knoesis.org Abstract. Document retrieval has been an important research
More informationKnowledge Discovery and Data Mining
Knowledge Discovery and Data Mining Lecture 10 - Classification trees Tom Kelsey School of Computer Science University of St Andrews http://tom.home.cs.st-andrews.ac.uk twk@st-andrews.ac.uk Tom Kelsey
More informationVISION FOR AUTOMOTIVE DRIVING
VISION FOR AUTOMOTIVE DRIVING French Japanese Workshop on Deep Learning & AI, Paris, October 25th, 2017 Quoc Cuong PHAM, PhD Vision and Content Engineering Lab AI & MACHINE LEARNING FOR ADAS AND SELF-DRIVING
More information2. Design Methodology
Content-aware Email Multiclass Classification Categorize Emails According to Senders Liwei Wang, Li Du s Abstract People nowadays are overwhelmed by tons of coming emails everyday at work or in their daily
More information3DNSITE: A networked interactive 3D visualization system to simplify location awareness in crisis management
www.crs4.it/vic/ 3DNSITE: A networked interactive 3D visualization system to simplify location awareness in crisis management Giovanni Pintore 1, Enrico Gobbetti 1, Fabio Ganovelli 2 and Paolo Brivio 2
More informationSemi-Supervised Learning with Explicit Misclassification Modeling
Semi- with Explicit Misclassification Modeling Massih-Reza Amini, Patrick Gallinari University of Pierre and Marie Curie Computer Science Laboratory of Paris 6 8, rue du capitaine Scott, F-015 Paris, France
More informationFlow-Based Video Recognition
Flow-Based Video Recognition Jifeng Dai Visual Computing Group, Microsoft Research Asia Joint work with Xizhou Zhu*, Yuwen Xiong*, Yujie Wang*, Lu Yuan and Yichen Wei (* interns) Talk pipeline Introduction
More informationInconsistent Node Flattening for Improving Top-down Hierarchical Classification
2016 IEEE International Conference on Data Science and Advanced Analytics Inconsistent Node Flattening for Improving Top-down Hierarchical Classification Azad Naik and Huzefa Rangwala George Mason University
More informationCAP 6412 Advanced Computer Vision
CAP 6412 Advanced Computer Vision http://www.cs.ucf.edu/~bgong/cap6412.html Boqing Gong April 21st, 2016 Today Administrivia Free parameters in an approach, model, or algorithm? Egocentric videos by Aisha
More informationUnsupervised Learning. Presenter: Anil Sharma, PhD Scholar, IIIT-Delhi
Unsupervised Learning Presenter: Anil Sharma, PhD Scholar, IIIT-Delhi Content Motivation Introduction Applications Types of clustering Clustering criterion functions Distance functions Normalization Which
More informationText Mining. Representation of Text Documents
Data Mining is typically concerned with the detection of patterns in numeric data, but very often important (e.g., critical to business) information is stored in the form of text. Unlike numeric data,
More informationYOLO9000: Better, Faster, Stronger
YOLO9000: Better, Faster, Stronger Date: January 24, 2018 Prepared by Haris Khan (University of Toronto) Haris Khan CSC2548: Machine Learning in Computer Vision 1 Overview 1. Motivation for one-shot object
More information9. Conclusions. 9.1 Definition KDD
9. Conclusions Contents of this Chapter 9.1 Course review 9.2 State-of-the-art in KDD 9.3 KDD challenges SFU, CMPT 740, 03-3, Martin Ester 419 9.1 Definition KDD [Fayyad, Piatetsky-Shapiro & Smyth 96]
More informationDesign and Realization of Agricultural Information Intelligent Processing and Application Platform
Design and Realization of Agricultural Information Intelligent Processing and Application Platform Dan Wang 1,2 1 Institute of Agricultural Information, Chinese Academy of Agricultural Sciences, Beijing
More informationarxiv: v1 [cs.lg] 5 Mar 2013
GURLS: a Least Squares Library for Supervised Learning Andrea Tacchetti, Pavan K. Mallapragada, Matteo Santoro, Lorenzo Rosasco Center for Biological and Computational Learning, Massachusetts Institute
More informationNearest Clustering Algorithm for Satellite Image Classification in Remote Sensing Applications
Nearest Clustering Algorithm for Satellite Image Classification in Remote Sensing Applications Anil K Goswami 1, Swati Sharma 2, Praveen Kumar 3 1 DRDO, New Delhi, India 2 PDM College of Engineering for
More informationImproving Difficult Queries by Leveraging Clusters in Term Graph
Improving Difficult Queries by Leveraging Clusters in Term Graph Rajul Anand and Alexander Kotov Department of Computer Science, Wayne State University, Detroit MI 48226, USA {rajulanand,kotov}@wayne.edu
More informationDocument Clustering: Comparison of Similarity Measures
Document Clustering: Comparison of Similarity Measures Shouvik Sachdeva Bhupendra Kastore Indian Institute of Technology, Kanpur CS365 Project, 2014 Outline 1 Introduction The Problem and the Motivation
More informationFuzzy Multilevel Graph Embedding for Recognition, Indexing and Retrieval of Graphic Document Images
Cotutelle PhD thesis for Recognition, Indexing and Retrieval of Graphic Document Images presented by Muhammad Muzzamil LUQMAN mluqman@{univ-tours.fr, cvc.uab.es} Friday, 2 nd of March 2012 Directors of
More informationThe Linked Data Value Chain Model: A Methodology for Information Integration and Orchestration
Na/onal Research University Higher School of Economics The Linked Data Value Chain Model: A Methodology for Information Integration and Orchestration Daniel Hladky Semantic Web Lab at HSE/W3C 28 November
More informationAROMA results for OAEI 2009
AROMA results for OAEI 2009 Jérôme David 1 Université Pierre-Mendès-France, Grenoble Laboratoire d Informatique de Grenoble INRIA Rhône-Alpes, Montbonnot Saint-Martin, France Jerome.David-at-inrialpes.fr
More informationMining Social Media Users Interest
Mining Social Media Users Interest Presenters: Heng Wang,Man Yuan April, 4 th, 2016 Agenda Introduction to Text Mining Tool & Dataset Data Pre-processing Text Mining on Twitter Summary & Future Improvement
More informationCollective Intelligence in Action
Collective Intelligence in Action SATNAM ALAG II MANNING Greenwich (74 w. long.) contents foreword xv preface xvii acknowledgments xix about this book xxi PART 1 GATHERING DATA FOR INTELLIGENCE 1 "1 Understanding
More informationSpecialist ICT Learning
Specialist ICT Learning APPLIED DATA SCIENCE AND BIG DATA ANALYTICS GTBD7 Course Description This intensive training course provides theoretical and technical aspects of Data Science and Business Analytics.
More informationWikipedia Retrieval Task ImageCLEF 2011
Wikipedia Retrieval Task ImageCLEF 2011 Theodora Tsikrika University of Applied Sciences Western Switzerland, Switzerland Jana Kludas University of Geneva, Switzerland Adrian Popescu CEA LIST, France Outline
More informationPouya Kousha Fall 2018 CSE 5194 Prof. DK Panda
Pouya Kousha Fall 2018 CSE 5194 Prof. DK Panda 1 Observe novel applicability of DL techniques in Big Data Analytics. Applications of DL techniques for common Big Data Analytics problems. Semantic indexing
More informationCollaborative Ontology Construction using Template-based Wiki for Semantic Web Applications
2009 International Conference on Computer Engineering and Technology Collaborative Ontology Construction using Template-based Wiki for Semantic Web Applications Sung-Kooc Lim Information and Communications
More informationDBpedia Extracting structured data from Wikipedia
DBpedia Extracting structured data from Wikipedia Anja Jentzsch, Freie Universität Berlin Köln. 24. November 2009 DBpedia DBpedia is a community effort to extract structured information from Wikipedia
More informationClustering Results. Result List Example. Clustering Results. Information Retrieval
Information Retrieval INFO 4300 / CS 4300! Presenting Results Clustering Clustering Results! Result lists often contain documents related to different aspects of the query topic! Clustering is used to
More informationList of Exercises: Data Mining 1 December 12th, 2015
List of Exercises: Data Mining 1 December 12th, 2015 1. We trained a model on a two-class balanced dataset using five-fold cross validation. One person calculated the performance of the classifier by measuring
More informationHierFlat: flattened hierarchies for improving top-down hierarchical classification
Int J Data Sci Anal (2017) 4:191 208 DOI 10.1007/s41060-017-0070-1 REGULAR PAPER HierFlat: flattened hierarchies for improving top-down hierarchical classification Azad Naik 1 Huzefa Rangwala 1 Received:
More informationSemantic Overlay Networks
Semantic Overlay Networks Arturo Crespo and Hector Garcia-Molina Write-up by Pavel Serdyukov Saarland University, Department of Computer Science Saarbrücken, December 2003 Content 1 Motivation... 3 2 Introduction
More informationUsing Multi-Task Learning For Large-Scale Document Classification. Azad Naik Bachelor of Technology Indian School of Mines, 2009
Using Multi-Task Learning For Large-Scale Document Classification A thesis submitted in partial fulfillment of the requirements for the degree of Master of Science at George Mason University By Azad Naik
More informationCombine the PA Algorithm with a Proximal Classifier
Combine the Passive and Aggressive Algorithm with a Proximal Classifier Yuh-Jye Lee Joint work with Y.-C. Tseng Dept. of Computer Science & Information Engineering TaiwanTech. Dept. of Statistics@NCKU
More informationF. Aiolli - Sistemi Informativi 2006/2007
Text Categorization Text categorization (TC - aka text classification) is the task of buiding text classifiers, i.e. sofware systems that classify documents from a domain D into a given, fixed set C =
More informationMulti-Aspect Tagging for Collaborative Structuring
Multi-Aspect Tagging for Collaborative Structuring Katharina Morik and Michael Wurst University of Dortmund, Department of Computer Science Baroperstr. 301, 44221 Dortmund, Germany morik@ls8.cs.uni-dortmund
More informationLecture 5: Object Detection
Object Detection CSED703R: Deep Learning for Visual Recognition (2017F) Lecture 5: Object Detection Bohyung Han Computer Vision Lab. bhhan@postech.ac.kr 2 Traditional Object Detection Algorithms Region-based
More informationAn Objective Evaluation Methodology for Handwritten Image Document Binarization Techniques
An Objective Evaluation Methodology for Handwritten Image Document Binarization Techniques K. Ntirogiannis, B. Gatos and I. Pratikakis Computational Intelligence Laboratory, Institute of Informatics and
More informationWEB LIP6 STÉPHANE GANÇARSKI Z. PEHLIVAN, M. CORD, M. BEN-SAAD, A. SANOJA,N. THOME, M. LAW
WEB ARCHIVING @ LIP6 STÉPHANE GANÇARSKI Z. PEHLIVAN, M. CORD, M. BEN-SAAD, A. SANOJA,N. THOME, M. LAW French ANR project Cartec (ended) European project Scape (ends 9/2014) Web Archives Web is ephemeral
More informationText Extraction in Video
International Journal of Computational Engineering Research Vol, 03 Issue, 5 Text Extraction in Video 1, Ankur Srivastava, 2, Dhananjay Kumar, 3, Om Prakash Gupta, 4, Amit Maurya, 5, Mr.sanjay kumar Srivastava
More informationParis-Lille-3D: A Point Cloud Dataset for Urban Scene Segmentation and Classification
Paris-Lille-3D: A Point Cloud Dataset for Urban Scene Segmentation and Classification Xavier Roynard, Jean-Emmanuel Deschaud, François Goulette To cite this version: Xavier Roynard, Jean-Emmanuel Deschaud,
More informationInformation Retrieval
Information Retrieval WS 2016 / 2017 Lecture 2, Tuesday October 25 th, 2016 (Ranking, Evaluation) Prof. Dr. Hannah Bast Chair of Algorithms and Data Structures Department of Computer Science University
More informationClustering. Partition unlabeled examples into disjoint subsets of clusters, such that:
Text Clustering 1 Clustering Partition unlabeled examples into disjoint subsets of clusters, such that: Examples within a cluster are very similar Examples in different clusters are very different Discover
More informationUnsupervised Learning
Unsupervised Learning Unsupervised learning Until now, we have assumed our training samples are labeled by their category membership. Methods that use labeled samples are said to be supervised. However,
More information