Riding the Wave: Move Beyond Text. TIB's strategy in the context of non-textual materials. Uwe Rosemann, Irina Sens. IATUL Conference, Singapore
Outline: TIB (role and functions); Requirements (politicians, funders, users); Examples of solutions (DataCite, AV-Portal, chemocr, visual search, long-term preservation) 2
TIB Hannover: Some facts. TIB = German National Library of Science and Technology. Subjects: engineering, architecture, chemistry, information technology, mathematics and physics. Founded in 1959. Financed by the Federal Government and all Federal States. 3
Main Building 4
Marstall Building 5
Marstall Building a former horse stable 6
Castle 7
Main Stacks 8
TIB Hannover: Some facts. Annual acquisition budget: 11 million; 18,500 journal subscriptions; 7 million items; staff: ca. 175 FTE 9
Global Network TechLib 10
Customers: Europe 10%, Germany 71%, USA 14%, World 5% 11
Main Services: provision of scientific content (full texts, document delivery, interlibrary loan); scientific retrieval portal GetInfo; long-term preservation; DOI services for research data; research and development 12
Changes in the scientific process. Jim Gray, eScience Group, Microsoft Research 13
A Gap: A widening gap in the scientific record between published research in a text document and the data that underlie it. As a result, datasets are difficult to discover and difficult to access, and scientific information gets lost. 14
Requirements - Politics Knowledge is power; Europe must manage the digital assets its researchers generate. 15
Riding the wave: How Europe can gain from the rising tide of scientific data. Final report of the High Level Expert Group on Scientific Data. 16
Strategy: Move beyond text. Scientific films, 3D objects, software, simulation, research data, text 17
Move beyond text: Consequences for TIB. Research communities produce many types of scientific and technical information. Each has its own unique characteristics and life cycle. TIB must become capable of accepting and managing new media formats. 18
GetInfo: Portal for Science and Technology. 45 million metadata records in the index; 150 million metadata records in external sources; 1.8 million documents; AV media; GetInfo mobile 19
Move beyond text Consequences for TIB We have to open our portal to this non-textual information 20
Joint Science Conference statement: An increasingly important user need addressed by TIB is the systematic collection, registration, archiving, indexing, and optimized provision of audio-visual materials using the latest technical possibilities. TIB is the appropriate institution to build up expertise in the area of non-textual materials: systematic acquisition of scientific objects; object-specific search and presentation; long-term archiving; development of standards (in collaboration); applied research (visual search, automatic content analysis, etc.) 21
How have we been preparing? Infrastructure for research data: DataCite. Visual search tools for AV media and 3D objects (architecture). chemocr. Visual access to research data 22
Collaboration: Research Data. In 2005, TIB became a non-commercial DOI registration agency for research data. In 2010, TIB became co-founder of the international DataCite consortium to establish easier access to scientific research data on the Internet. Mission: citability of research data; high visibility of the data; easy re-use and verification of the data sets; increasing quality of published papers 23
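The citability goal can be made concrete with a small sketch. It assumes only that a DOI resolves through the public resolver at https://doi.org/ and that a dataset is cited in the pattern "Creator (Year): Title. Publisher. Identifier". The function names, the sample record and the DOI 10.5072/EXAMPLE-1 are hypothetical placeholders, not real TIB or DataCite data.

```python
from urllib.parse import quote

RESOLVER = "https://doi.org/"  # public DOI resolver


def normalize_doi(raw: str) -> str:
    """Strip common prefixes and check the basic '10.<registrant>/<suffix>' shape."""
    doi = raw.strip()
    for prefix in ("https://doi.org/", "http://dx.doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    head, sep, suffix = doi.partition("/")
    if not (head.startswith("10.") and sep and suffix):
        raise ValueError(f"not a DOI: {raw!r}")
    return doi


def doi_url(raw: str) -> str:
    """Return a resolvable link for a DOI given in any of the accepted spellings."""
    return RESOLVER + quote(normalize_doi(raw), safe="/")


def cite_dataset(creator: str, year: int, title: str, publisher: str, doi: str) -> str:
    """Format a dataset citation: Creator (Year): Title. Publisher. Identifier."""
    return f"{creator} ({year}): {title}. {publisher}. {doi_url(doi)}"


# Hypothetical record, for illustration only.
EXAMPLE = cite_dataset(
    "Doe, J.", 2011, "Example measurement series",
    "Example Data Centre", "doi:10.5072/EXAMPLE-1",
)
print(EXAMPLE)
```

Because the identifier in the citation is a resolvable link, a reader of a paper can reach the underlying dataset directly, which is what closes the gap between the text and its data.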
DataCite Members: Australia, Canada, Denmark, France, Germany, Italy, The Netherlands, Sweden, Switzerland, UK, USA, Korea (affiliated) 24
DataCite Structure (diagram): International DOI Foundation; DataCite as member, with TIB as Managing Agent; member institutions; associate stakeholder; data centres that the member institutions work with 25
Example: EHEC virus 26
Example: EHEC virus 27
How have we been preparing? Infrastructure for research data: DataCite. Visual search tools for AV media and 3D objects (architecture). chemocr. Visual access to research data 28
AV media: Analysis (visual, structural, auditive). Source: Scorupka, Sascha, Experiment der Woche, 2011. Object detection & clustering; genre analysis; Intelligent Character Recognition (ICR); scene/shot detection; speaker detection; Automatic Speech Recognition (ASR). Semantic and content-based indexing on extracted features and on extracted text 29
Indexing: keyframes; annotation; machine learning using visual features. Textual metadata; ICR-derived text; audio transcript. Genre classes: Graphical: Animation; Graphical: Drawing; Graphical: Diagram; Real: Outdoor; Real: Indoor; Real: Lecture / Conference; Real: Interview; Real: Buildings... Named Entity Recognition (e.g. person xy, location xy, subject xy, domain xy...). Mapping: ontologies, taxonomies, thesauri (e.g. bibliographic, geographical, encyclopedic data) 30
AV media: Display. Faceted search; info; ASR; explorative/semantic search; keyframe navigation; navigation on audio text. Sample text: "Laser technology has become an indispensable part of everyday life...." "Variable beam expansion systems" "in Berlin Adlershof, the city of science, business and..." 31
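Navigation on audio text can be pictured as a word index over time-stamped speech-recognition segments. The following is a minimal sketch under stated assumptions, not the AV-Portal implementation: the segment list, its timings and the function names are hypothetical, and a real system would add stemming, ranking and ASR confidence scores.

```python
import re
from collections import defaultdict

# Hypothetical ASR output: (start time in seconds, recognized text).
SEGMENTS = [
    (0.0, "Laser technology is part of everyday life"),
    (12.5, "Variable beam expansion systems"),
    (31.0, "The laser beam is widened by two lenses"),
]


def build_index(segments):
    """Map each lower-cased word to the start times of the segments containing it."""
    index = defaultdict(list)
    for start, text in segments:
        for word in set(re.findall(r"\w+", text.lower())):
            index[word].append(start)
    for times in index.values():
        times.sort()
    return index


def jump_points(index, query):
    """Return start times of segments that contain every word of the query."""
    words = re.findall(r"\w+", query.lower())
    if not words:
        return []
    hits = set(index.get(words[0], []))
    for word in words[1:]:
        hits &= set(index.get(word, []))
    return sorted(hits)


INDEX = build_index(SEGMENTS)
print(jump_points(INDEX, "laser beam"))
```

A player can then offer each returned start time as a jump target, so that a text query leads straight to the matching position in the film.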
How have we been preparing? Infrastructure for research data: DataCite. Visual search tools for AV media and 3D objects (architecture). chemocr. Visual access to research data 32
3D objects: an excursion to architecture 33
Visual search tools: visual search; content-based indexing 34
Content-based indexing: segmentation with form primitives; extraction of room connectivity graphs 35
Visual search: attributed graph; 3D sketch; result visualization 36
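To illustrate how an attributed room connectivity graph can be searched, here is a minimal sketch. It assumes that rooms are nodes carrying a room type, that an edge means a door or passage, and that a query sketch is matched by brute-force subgraph matching (every query edge must exist in the building; extra building edges are allowed). The data and function names are hypothetical, and the method shown is a generic technique, not necessarily the one used in the project.

```python
from itertools import permutations

# Hypothetical building: room id -> room type; an edge means a door or passage.
BUILDING_ROOMS = {"r1": "hall", "r2": "kitchen", "r3": "living",
                  "r4": "bath", "r5": "bedroom"}
BUILDING_EDGES = {("r1", "r2"), ("r1", "r3"), ("r1", "r4"), ("r3", "r5")}

# Hypothetical query sketch: a hall leading to a living room leading to a bedroom.
QUERY_ROOMS = {"a": "hall", "b": "living", "c": "bedroom"}
QUERY_EDGES = {("a", "b"), ("b", "c")}


def find_matches(q_rooms, q_edges, b_rooms, b_edges):
    """Return all mappings of query rooms to building rooms that keep
    room types and every query connection (brute force, small graphs only)."""
    q_nodes = list(q_rooms)
    b_adjacent = {frozenset(edge) for edge in b_edges}  # undirected edges
    matches = []
    for candidate in permutations(b_rooms, len(q_nodes)):
        mapping = dict(zip(q_nodes, candidate))
        # Attribute check: room types must agree.
        if any(q_rooms[q] != b_rooms[b] for q, b in mapping.items()):
            continue
        # Structure check: each query connection must exist in the building.
        if all(frozenset((mapping[u], mapping[v])) in b_adjacent
               for u, v in q_edges):
            matches.append(mapping)
    return matches


MATCHES = find_matches(QUERY_ROOMS, QUERY_EDGES, BUILDING_ROOMS, BUILDING_EDGES)
print(MATCHES)
```

Trying all permutations grows factorially with the number of rooms, so this only shows the idea; a collection-scale index would need pruning or a dedicated graph-matching algorithm.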
How have we been preparing? Infrastructure for research data: DataCite. Visual search tools for AV media and 3D objects (architecture). chemocr. Visual access to research data 37
Information Retrieval in Chemistry: Search for chemical structures, but how? Chemists are used to drawing. 38
Textual and non-textual chemical information: table with reaction scheme; chemical names; 2a-i: derivatives from the reaction; linked entities from the table; chemical structure; reaction scheme 39
Non-textual data processing: from image data to chemical structure data with CLiDE and chemocr 40
Information retrieval in chemistry: Text AND Formulas 41
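A combined "text AND formulas" query can be sketched as follows. The sketch assumes, purely for illustration, that each document already carries molecular formulas derived from its recognized structures; the documents, field names and functions are hypothetical. A molecular formula cannot distinguish isomers, so a real structure search would compare the connection tables produced by the recognition step instead.

```python
import re
from collections import Counter


def parse_formula(formula: str) -> Counter:
    """Count atoms in a simple formula such as 'C2H6O'
    (no brackets, charges or hydrates)."""
    tokens = re.findall(r"([A-Z][a-z]?)(\d*)", formula)
    if not tokens or "".join(sym + num for sym, num in tokens) != formula:
        raise ValueError(f"unsupported formula: {formula!r}")
    counts = Counter()
    for symbol, number in tokens:
        counts[symbol] += int(number) if number else 1
    return counts


# Hypothetical documents with extracted formulas.
DOCS = [
    {"id": "doc1", "text": "Synthesis of ethanol derivatives", "formulas": ["C2H6O"]},
    {"id": "doc2", "text": "Ethanol as a solvent", "formulas": ["CH4O"]},
    {"id": "doc3", "text": "Benzene ring substitution", "formulas": ["C6H6", "OC2H6"]},
]


def search(docs, keyword: str, formula: str):
    """Return ids of documents matching the keyword AND the atom composition."""
    target = parse_formula(formula)
    return [
        doc["id"]
        for doc in docs
        if keyword.lower() in doc["text"].lower()
        and any(parse_formula(f) == target for f in doc["formulas"])
    ]


print(search(DOCS, "ethanol", "C2H6O"))
```

Comparing element counts rather than strings means that "C2H6O" and "OC2H6" are treated as the same composition regardless of how the formula was written.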
How have we been preparing? Infrastructure for research data: DataCite. Visual search tools for AV media and 3D objects (architecture). chemocr. Visual access to research data 42
Numerical data 43
Time [h]   T [°C]
 1         12
 2         13
 3         12
 4         12
 5         13
 6         35
 7         17
 8         11
 9         10
10         12
11         13
12         13
13         12
14         12
15         12
16         11
17         11
18         10
19         10
20         11
21         11
22         10
23         12
24         12
Visual access to research data 44
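The point of visual access shows up even in a plain-text rendering of the hourly temperature series from the numerical data slide. The sketch below draws one bar per hour and reports the peak; the function names and the bar width are arbitrary choices for illustration, and the values are assumed to be positive.

```python
# Hourly temperatures in degrees Celsius, hours 1 to 24, from the numerical data slide.
TEMPS = [12, 13, 12, 12, 13, 35, 17, 11, 10, 12, 13, 13,
         12, 12, 12, 11, 11, 10, 10, 11, 11, 10, 12, 12]


def bar_chart(values, width=35):
    """Render one text bar per hour, scaled so the largest value fills `width`."""
    top = max(values)
    lines = []
    for hour, value in enumerate(values, start=1):
        bar = "#" * round(value / top * width)
        lines.append(f"{hour:>2} h | {bar} {value}")
    return "\n".join(lines)


def peak(values):
    """Return (hour, value) of the first maximum; hours are counted from 1."""
    value = max(values)
    return values.index(value) + 1, value


print(bar_chart(TEMPS))
print("peak:", peak(TEMPS))
```

In the table the reading at hour 6 is one number among 24, whereas in the chart it stands out immediately.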
Last but not least: Long-Term Preservation. Digital texts, AV media, 3D objects, etc. 45
Conclusion Dissemination of Scientific and Technical Information has been a foundational mission. The methods have completely changed, but the mission remains the same. 46
Conclusion Ultimate Goal: Interlinking and Search Across All Types of Digital Assets. 47
Questions? 48