Science 2.0 VU: Big Science, e-Science and e-Infrastructures + Bibliometric Network Analysis

Transcription:

WISSEN · TECHNIK · LEIDENSCHAFT. Science 2.0 VU: Big Science, e-Science and e-Infrastructures + Bibliometric Network Analysis. Elisabeth Lex, KTI, TU Graz, WS 2015/16, www.tugraz.at

Agenda: repetition from last time (altmetrics, altmetrics in practice); Big Data and Science; e-Science; e-Infrastructures; Bibliometric Network Analysis; your assignment.

Altmetrics (repetition): "Altmetrics is the creation and study of new metrics based on the Social Web for analyzing and informing scholarship" (Altmetrics Manifesto, http://altmetrics.org/about). Altmetrics are aggregated from many sources (e.g. Twitter, Mendeley, GitHub, SlideShare). Article-Level Metrics (ALM): a multidimensional suite of transparent and established metrics at the article level.

Examples of altmetrics sources (repetition). Usage: views, downloads. Captures: bookmarks, readers. Mentions: blog posts, news stories, Wikipedia articles, comments, reviews. Social media: tweets, Google+, Facebook likes, shares, ratings. Citations: Web of Science, Scopus, Google Scholar.

Examples: Altmetric.com. Source: http://www.altmetric.com/details.php?domain=www.altmetric.com&citation_id=843656

Lessons learned (repetition): Altmetrics offer alternative ways to assess the impact of various scientific outputs. There is no common understanding of altmetrics yet: what do they really express, are they useful, and for which part of the research process? They are not necessarily better metrics (e.g. gamification). They can help to get an overview of a research field, for instance through visualizations based on altmetrics.

Modern Science: What has changed? 150 years later: searching for new particles such as the Higgs boson with the Large Hadron Collider, built in collaboration with over 10,000 scientists and engineers from over 100 countries and hundreds of universities and laboratories. It sits in a tunnel 27 km in circumference, up to 175 m deep, near Geneva.

Motivation: The Internet and science disciplines (e.g. physical sciences, biological sciences, medicine, and engineering) generate large and complex datasets (Big Data) that require more advanced database and architectural support. A new kind of research methodology has emerged: the fourth paradigm of scientific exploration (Hey, 2007), based on statistical exploration of large amounts of data. Source: http://www.ksi.mff.cuni.cz/astropara/

Data-intensive scientific discovery. Source: http://research.microsoft.com/en-us/collaboration/fourthparadigm/4th_paradigm_book_complete_lr.pdf

Example: Big Data in Science, European Exascale Projects. Exascale computing: computers capable of at least one exaflops (10^18 floating point operations per second). Not yet achieved; currently 10^15 (petaflops). Source: http://exascale-projects.eu

Publications as Big Data: cross-journal recommendation based on click streams [Bollen et al., 2009].

e-Science: Large-scale science (since 1999) and data-driven discovery. The focus is on computationally intensive science and how to tackle it collaboratively in highly distributed environments. Powerful computers: supercomputers, High Performance Computing (HPC), grid and distributed computing. Powerful research infrastructures: e-infrastructures, grids, clouds. Source: http://www.anandtech.com/show/6421/inside-the-titan-supercomputer-299k-amd-x86-cores-and-186k-nvidia-gpu-cores/3

Supercomputers: large, expensive systems, usually housed in a single room, in which multiple processors are connected by a fast local network. They are suited for highly complex, real-time applications and simulation. Pros: data can move between processors rapidly, so all processors can work together on the same tasks. Cons: expensive to build and maintain; they do not scale well, e.g. adding more processors is challenging. Sources: http://www.top500.org/lists/2014/06/ and http://www.wikihow.com/build-a-supercomputer

Distributed Computing: systems in which processors are not necessarily located close to one another, and can even be housed on different continents, but are connected via the Internet or other networks. Pros: much less expensive than supercomputers. Cons: lower speed than supercomputers.

Example: Hadoop. An ecosystem of tools for processing big data. Simple computational model: a two-stage method for processing large amounts of data. You design an algorithm that operates on one chunk of the data in two stages (a Map stage and a Reduce stage), and MapReduce automatically distributes that algorithm across the cluster, hiding the complexity in the framework. Sources: http://hadoop.apache.org and http://architects.dzone.com/articles/how-hadoop-mapreduce-works
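The two-stage model can be sketched in plain Python, without Hadoop itself. This is only an illustration under stated assumptions: the word-count task and the names `map_chunk`, `shuffle`, `reduce_counts` and `run_job` are hypothetical and are not part of the Hadoop API. In a real cluster the framework would run the map calls on different machines and perform the grouping step itself.

```python
from collections import defaultdict


def map_chunk(chunk):
    """Map stage: turn one chunk of text into (word, 1) pairs."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Group values by key, as the framework would do between the two stages."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key, values):
    """Reduce stage: combine all values seen for one key."""
    return key, sum(values)


def run_job(chunks):
    """Run map on every chunk, then shuffle, then reduce each key."""
    mapped = [pair for chunk in chunks for pair in map_chunk(chunk)]
    return dict(reduce_counts(k, v) for k, v in shuffle(mapped).items())


# Two chunks stand in for data blocks stored on different cluster nodes.
word_counts = run_job(["big data and science", "Big Science and e-science"])
```

The author of the job only writes the map and reduce functions; distribution and grouping are the parts the framework hides.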

Hadoop in e-Science, example: astronomical image processing. Large telescopes survey the sky over a prolonged period of time. The Large Synoptic Survey Telescope (LSST), under construction, will capture half of the sky over 10 years, producing 30 TB of data every night and about 60 PB in 10 years. Astronomers pick out faint objects for study by capturing multiple images of the same area and combining them (coaddition). Challenge: how to organize and process all the resulting data. Source: http://www.lsst.org/lsst/
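As a rough illustration of coaddition, the sketch below combines several exposures of the same sky area by taking the pixel-wise mean. This is a simplifying assumption: real pipelines also align the images, subtract background and weight exposures by quality, none of which is shown here. The function name `coadd` and the tiny 2x2 "images" are hypothetical.

```python
def coadd(images):
    """Average several equally sized exposures pixel by pixel."""
    if not images:
        raise ValueError("need at least one image")
    rows, cols = len(images[0]), len(images[0][0])
    n = len(images)
    return [
        [sum(image[r][c] for image in images) / n for c in range(cols)]
        for r in range(rows)
    ]


# Three made-up 2x2 exposures of the same patch of sky.
exposures = [
    [[0.0, 4.0], [2.0, 0.0]],
    [[2.0, 0.0], [2.0, 0.0]],
    [[1.0, 2.0], [2.0, 3.0]],
]
combined = coadd(exposures)
```

Averaging suppresses random noise in individual exposures, which is why faint objects become easier to pick out in the combined image.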

Using Hadoop to help with image coaddition. Source: http://escience.washington.edu/get-help-now/astronomical-image-processing-hadoop

Virtual Science Environments: Not only HPC but also the sharing of knowledge and data is becoming a requirement for scientific discovery. This calls for useful mechanisms to facilitate sharing and to preserve and organize research data. Virtual Science Environments are virtual environments in which researchers work together through ubiquitous, trusted and easy access to services for scientific data, computing and networking, enabled by e-infrastructures.

Defining e-Infrastructures. European e-Infrastructure Reflection Group (e-IRG): "The term e-infrastructure refers to this new research environment in which all researchers, whether working in the context of their home institutions or in national or multinational scientific initiatives, have shared access to unique or distributed scientific facilities (including data, instruments, computing and communications), regardless of their type and location in the world." Source: http://www.e-irg.eu/about-e-irg.html

e-Infrastructures, goals: opening access to knowledge through reliable, distributed and participatory data e-infrastructures; cost-effective infrastructures for preservation and curation for re-use of data; persistent availability of information, and linking people and data through flexible and robust digital identifiers; interoperability for consistency of approaches on global data exchange (e.g. standards); enabling trust through authentication and authorisation mechanisms. Source: http://cordis.europa.eu/fp7/ict/e-infrastructure/docs/framework-for-action-in-h2020_en.pdf

Example: e-infrastructure OpenAIRE, the European Open Access Data Infrastructure for Scholarly and Scientific Communication. Functionality: harvests and stores information about publications from various repositories (via OAI-PMH); enables searching for publications and related information (e.g. funding); provides a list of OA repositories that can be used to store publications, and an orphan repository; shows statistics of the stored data. Source: https://www.openaire.eu
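Harvesting over OAI-PMH works through plain HTTP requests that carry a `verb` parameter. The sketch below builds a `ListRecords` request and extracts record identifiers from a response. The `verb`, `metadataPrefix` and `set` parameters and the XML namespace come from the OAI-PMH 2.0 protocol; the base URL, the helper names and the sample response are made up for illustration, and no network call is made.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# XML namespace used by OAI-PMH 2.0 responses.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build a ListRecords request URL for a repository's OAI-PMH endpoint."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


def record_identifiers(xml_text):
    """Return the OAI identifiers found in a ListRecords response."""
    root = ET.fromstring(xml_text)
    return [element.text for element in root.iter(OAI_NS + "identifier")]


# Hypothetical endpoint and a hand-written response, for illustration only.
url = list_records_url("https://repository.example.org/oai")
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:1</identifier></header></record>
    <record><header><identifier>oai:example.org:2</identifier></header></record>
  </ListRecords>
</OAI-PMH>"""
identifiers = record_identifiers(SAMPLE_RESPONSE)
```

A real harvester would also follow the `resumptionToken` that repositories return when a result list is split across several responses.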

OpenAIRE: Applications

Example: e-Infrastructures Austria (1/2). Source: http://www.e-infrastructures.at

Example: e-Infrastructures Austria (2/2)

Take-away message: Big Science / e-Science is data-driven, large-scale science; supercomputers and distributed computing; virtual research environments; e-infrastructures.

Bibliometric Network Analysis

Bibliometrics: the quantitative study of all kinds of bibliographic data, such as patterns of authorship, publications and citations, e.g. citation analysis of research outputs and publications. It is used to assess the research impact of individuals, groups and institutions, measuring by author (h-index), article (PLOS), or publication (Journal Impact Factor). It is a measure of output, not quality (quantitative, not qualitative). Other measures could include funding received, number of patents, awards granted, or qualitative measures such as peer review. Source: Maynooth University, 17/04/2015.
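The author-level h-index mentioned above has a simple definition: an author has index h if h of their papers have at least h citations each. A minimal sketch, with a hypothetical function name and made-up citation counts:

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


# Made-up citation counts for one author's five papers.
example_h = h_index([10, 8, 5, 4, 3])
```

For the counts above, four papers have at least four citations but there are not five papers with at least five, so the index is 4. The example also shows why this is a measure of output: one very highly cited paper changes the index no more than a moderately cited one.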

Why use bibliometrics? Measure the impact of research and publishing activity: CV, promotion, tenure, grants, feedback to funding bodies, industry and the public. Showcase individual, group or institutional research: identify areas of research strengths and weaknesses, inform research priorities. Identify the highest-impact or top-performing journals in a subject area: where to publish, learning about a particular subject area, identifying emerging areas of research. Identify the top researchers in a subject area: collaborations and competitors, recruitment, learning about a subject area. Source: Maynooth University, 17/04/2015.

Bibliometric Networks: represent scientific literature as networks built from bibliographic data. They help provide an overview of the structure of the scientific literature, e.g. in a domain or with respect to a topic. Applications: identify the main research areas within a field; analyze relationships between research areas.

Bibliometric Networks, types: co-authorship networks; citation networks; co-citation networks; co-occurrence maps (keywords, extracted topics).

Co-authorship Networks: scientific collaboration networks. Nodes are authors of publications. There is a link between two authors if they co-authored a publication. Collaboration networks are scale-free. Co-authorship networks are affiliation networks.
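A co-authorship network can be derived directly from author lists. The sketch below assumes each publication is given as a list of author names and counts, for every pair of authors, how many publications they share; the function name and the placeholder authors "A" to "D" are hypothetical.

```python
from collections import Counter
from itertools import combinations


def coauthorship_edges(papers):
    """Map each author pair to the number of publications they co-authored."""
    edges = Counter()
    for authors in papers:
        # Sorting gives every pair one canonical orientation, e.g. ("A", "B").
        for a, b in combinations(sorted(set(authors)), 2):
            edges[(a, b)] += 1
    return edges


# Three made-up publications with placeholder author names.
papers = [["A", "B", "C"], ["A", "B"], ["C", "D"]]
edges = coauthorship_edges(papers)
```

The counts can serve as edge weights, so that frequent collaborators are connected more strongly than authors who shared a single paper.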

Co-authorship networks: Example

Citation Networks: Nodes are publications. A link connects two nodes if one publication cites the other. The network reveals how often articles were cited.

Citation Networks. Source: http://eduinf.eu/2012/03/15/co-citation-analysis-of-the-topic-social-network-analysis/

Co-Citation Networks: Nodes are publications, and there is a link between two nodes if the two publications were cited together in a paper. The link weight reflects how often the two articles were cited together by some third article. Alternatively, nodes are authors, with a link between two nodes if the authors were cited together; this helps to identify clusters of authors.
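Co-citation counts can be computed from reference lists alone. The sketch below assumes a mapping from each citing paper to the publications it references; the function name and the placeholder identifiers are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(reference_lists):
    """Count how often each pair of publications appears in the same reference list."""
    counts = Counter()
    for references in reference_lists.values():
        for a, b in combinations(sorted(set(references)), 2):
            counts[(a, b)] += 1
    return counts


# Made-up citing papers and the publications they reference.
reference_lists = {
    "paper1": ["X", "Y", "Z"],
    "paper2": ["X", "Y"],
    "paper3": ["Y", "Z"],
}
cocitations = cocitation_counts(reference_lists)
```

Replacing the publication identifiers with the names of the cited authors yields the author co-citation variant in the same way.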

Author co-citation network of 15 history & philosophy of science journals. Two authors are connected if they are cited together in some article, and connected more strongly if they are cited together frequently. Source: http://www.scottbot.net/hial/?p=38272

Mining in Scientific Networks: find influential researchers; find influential papers; investigate patterns of scientific collaboration; ...

Centrality Measures: Degree centrality equals the number of links (connections) a node has. In citation networks, papers with high in-degree centrality have many citations. In-degree is a widely used metric for measuring the scientific impact of a paper.
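In a citation network the in-degree of a paper is simply its citation count. A minimal sketch, assuming the network is given as (citing, cited) pairs; the function name and the paper identifiers are hypothetical.

```python
from collections import Counter


def in_degree(citations):
    """Count incoming links per paper from (citing, cited) pairs."""
    counts = Counter(cited for _citing, cited in citations)
    # Papers that only cite others still belong to the network, with in-degree 0.
    for citing, _cited in citations:
        counts.setdefault(citing, 0)
    return counts


# Made-up citation links: ("p2", "p1") means that p2 cites p1.
citation_links = [("p2", "p1"), ("p3", "p1"), ("p3", "p2"), ("p4", "p1")]
citation_counts = in_degree(citation_links)
most_cited = max(citation_counts, key=citation_counts.get)
```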

Centrality Measures: Eigenvector centrality is an extension of degree centrality. Degree centrality awards one centrality point for every neighbor a node has. However, not all neighbors are equally important: in many cases the importance of a node is increased by connections to other nodes that are themselves important. With eigenvector centrality, not only the number of neighbors matters but also their importance. It gives each node a score proportional to the sum of the scores of its neighbors.
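The "score proportional to the sum of the neighbors' scores" condition can be approximated by power iteration. The sketch below is one possible implementation under stated assumptions: the graph is undirected and connected and is given as an adjacency dictionary; the function name, the parameters `iterations` and `tol`, and the example graph are hypothetical. Each update keeps the node's own score (iterating with A + I rather than A), which leaves the resulting ranking unchanged but avoids oscillation on bipartite graphs.

```python
def eigenvector_centrality(adjacency, iterations=100, tol=1e-9):
    """Approximate eigenvector centrality of an undirected graph by power iteration."""
    scores = {node: 1.0 for node in adjacency}
    for _ in range(iterations):
        # New score: the node's own score plus the sum of its neighbours' scores.
        new = {n: scores[n] + sum(scores[m] for m in adjacency[n]) for n in adjacency}
        # Rescale to unit length so the values stay bounded.
        norm = sum(value * value for value in new.values()) ** 0.5
        new = {n: value / norm for n, value in new.items()}
        if max(abs(new[n] - scores[n]) for n in adjacency) < tol:
            return new
        scores = new
    return scores


# Made-up graph: a triangle A-B-C with a tail C-D-E.
graph = {
    "A": ["B", "C"],
    "B": ["A", "C"],
    "C": ["A", "B", "D"],
    "D": ["C", "E"],
    "E": ["D"],
}
centrality = eigenvector_centrality(graph)
```

In this graph A and D both have two neighbors, so their degree centrality is equal. A scores higher on eigenvector centrality because both of its neighbors are well connected, whereas one of D's neighbors is the peripheral node E.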

Centrality Measures in Python: https://networkx.github.io/documentation/latest/reference/algorithms.centrality.html
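A short usage sketch with NetworkX, assuming the third-party `networkx` package is installed. `degree_centrality` and `eigenvector_centrality` are NetworkX functions; the wrapper `centralities` and the example edges are hypothetical. Note that NetworkX normalizes degree centrality by dividing each degree by n - 1, where n is the number of nodes.

```python
try:
    import networkx as nx  # third-party package, assumed to be installed
except ImportError:
    nx = None


def centralities(edges):
    """Return (degree, eigenvector) centrality dicts, or None without NetworkX."""
    if nx is None:
        return None
    graph = nx.Graph()
    graph.add_edges_from(edges)
    degree = nx.degree_centrality(graph)
    eigenvector = nx.eigenvector_centrality(graph, max_iter=1000)
    return degree, eigenvector


# Made-up undirected graph: a triangle A-B-C plus a pendant node D attached to C.
result = centralities([("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")])
```

For directed citation networks, a `DiGraph` together with `nx.in_degree_centrality` would be the corresponding choice.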

Summary: Big Science; e-Science; e-Infrastructure; Bibliometrics; Bibliometric Network Analysis.

Thank you for your attention!