e-neighbourhood Virtual Organisation and PNC
Simon C. Lin
Academia Sinica, Taipei, Taiwan 11529
sclin@gate.sinica.edu.tw
1 November 2005
- The name "e-neighbourhood" was suggested by C. C. Hsieh, who does not like the name "PNC Virtual Organisation (VO)"
- A VO is not necessarily confined to a single application domain, as in some e-science collaborations
- I am compiling others' slides, à la Confucius
Ultimately, the Globus Toolkit is designed to enable the creation and maintenance of Virtual Organizations
Virtual Organizations
- Distributed resources and people
- Linked by networks, crossing administrative domains
- Sharing resources, common goals
- Dynamic
- Fault tolerant
(Diagram labels: VO-A, VO-B)
It's all about Virtual Organizations!
Different views:
- Dynamic enterprises
- Coalitions
- e-science collaboration
- On-demand computing
- Utility computing
Same problem:
- Support work across dynamic communities with vested self-interest
"A new age has dawned in scientific and engineering research, pushed by continuing progress in computing, information, and communication technology, and pulled by the expanding complexity, scope and scale of today's challenges. The capacity of this technology has crossed thresholds that now make possible a comprehensive cyberinfrastructure on which to build new types of scientific and engineering knowledge environments and organizations, and to pursue research in new ways and with increased efficacy."
--- Report of the National Science Foundation Blue-Ribbon Advisory Panel, 2003
"Water, water, everywhere, nor any drop to drink"
--- S. T. Coleridge, 1797
The Data Deluge
- A large novel: 1 Mbyte
- The Bible: 5 Mbytes
- A Mozart symphony (compressed): 10 Mbytes
- A digital mammogram: 100 Mbytes
- OED on CD: 500 Mbytes
- Digital movie (compressed): 10 Gbytes
- Annual production of refereed journal literature (20 k journals; 2 M articles): 1 Tbyte
- Library of Congress: 20 Tbytes
- The Internet Archive (10 B pages, from 1996 to 2002): 100 Tbytes
- Annual production of information (print, film, optical & magnetic media): 1500 to 3000 Pbytes
- All worldwide telephone communication in 2002: 19.3 Exabytes
Moore's Law enables instruments and detectors to generate unprecedented amounts of data in all scientific disciplines
LHC/Atlas in Action
The LHC Data Challenge
- Starting from this event
- Selectivity: 1 in 10^13
- Like looking for 1 person in a thousand world populations!
- Or for a needle in 20 million haystacks!
- You are looking for this signature
Source: CERN
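The selectivity analogy can be sanity-checked with a little arithmetic. The sketch below assumes a world population of roughly 6.5 billion, which is an illustrative round figure for 2005 and not a number taken from the slide. Under that assumption, "one person in a thousand world populations" is within a factor of two of 1 in 10^13.

```python
# Rough check of the "1 in 10^13" analogies.
# WORLD_POPULATION is an assumed round figure for 2005, not taken from the slide.
WORLD_POPULATION = 6.5e9
SELECTIVITY = 1e13   # one interesting event per 10^13, as stated on the slide
HAYSTACKS = 20e6     # "a needle in 20 million haystacks"

people_in_thousand_worlds = 1000 * WORLD_POPULATION
ratio = SELECTIVITY / people_in_thousand_worlds

# Number of straws each haystack would need for the two analogies to agree
items_per_haystack = SELECTIVITY / HAYSTACKS

print(f"People in 1000 world populations: {people_in_thousand_worlds:.1e}")
print(f"Selectivity relative to that count: {ratio:.2f}x")
print(f"Implied items per haystack: {items_per_haystack:.0e}")
```

Both analogies are order-of-magnitude statements, so agreement within a small factor is all that should be expected.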
The Computing Needs Will Grow
- They may grow to the scale of exabytes of data and petaflops of computing by 2015; in particular, the luminosity will be enhanced even at an early stage
- The largest commercial database can currently handle only tens of terabytes
- The fastest stand-alone computer is currently capable of delivering only 70 teraflops peak
Enabling Grids for E-sciencE
Tera Bytes and Peta Bytes

                      Terabyte                  Petabyte
RAM time to move      15 minutes                2 months
1 Gb WAN move time    10 hours ($1000)          14 months ($1 million)
Disk cost             7 disks = $5000 (SCSI)    6800 disks + 490 units + 32 racks = $7 million
Disk power            100 Watts                 100 Kilowatts
Disk weight           5.6 Kg                    33 Tonnes
Disk footprint        Inside machine            60 m^2

May 2003, approximately correct
Source: Distributed Computing Economics, Jim Gray, Microsoft Research, MSR-TR-2003-24
INFSO-RI-508833, Academia Sinica, Taiwan
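As a rough cross-check of the WAN move times, the sketch below computes the ideal time to push one terabyte and one petabyte through a 1 Gb/s link at full line rate. It assumes decimal units (1 TB = 10^12 bytes) and no protocol overhead; the function name `transfer_seconds` is purely illustrative. The slide's figures (10 hours and 14 months) are several times larger, presumably because sustained wide-area throughput is well below the nominal rate, though that reading is an assumption rather than something the slide states.

```python
# Ideal time to move a data set over a network link at full line rate.
# Assumed decimal units: 1 TB = 10**12 bytes, 1 PB = 10**15 bytes, 1 Gb/s = 10**9 bit/s.

def transfer_seconds(num_bytes: float, link_bits_per_second: float) -> float:
    """Return the ideal transfer time in seconds, ignoring protocol overhead."""
    return num_bytes * 8 / link_bits_per_second

TB = 10**12
PB = 10**15
GBIT_PER_S = 10**9

tb_hours = transfer_seconds(TB, GBIT_PER_S) / 3600
pb_days = transfer_seconds(PB, GBIT_PER_S) / 86400

print(f"1 TB over 1 Gb/s: {tb_hours:.1f} hours")
print(f"1 PB over 1 Gb/s: {pb_days:.1f} days")
```

Whatever the effective throughput, the factor of 1000 between the two columns is what turns an overnight copy into a multi-month project.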
Mohammed & Mountains
- Petabytes of data cannot be moved
  - It stays where it is produced or curated: hospitals, observatories, European Bioinformatics Institute, ...
  - A few caches and a small proportion cached
- Distributed collaborating communities
  - Expertise in curation, simulation & analysis
- Distributed & diverse data collections
- Discovery depends on insights
  - Unpredictable sophisticated application code
  - Tested by combining data from many sources
  - Using novel sophisticated models & algorithms
- What can you do?
If the mountain won't come to Mohammed, Mohammed must go to the mountain
Move Computation to the Data
From Optimizing Architecture to Optimizing Organisation
"High-performance computing has moved from being a problem of optimizing the architecture of an individual supercomputer to one of optimizing the organization of large numbers of ordinary computers operating in parallel."
--- Scott Kirkpatrick, Science, Vol. 299, 2003, p. 668
It's about Collaboration
Open Source Model
- A world-wide community of people cooperatively developing software
- A software development analogue of open scientific inquiry
- Users have much greater control over their computing environment
- An attempt to account for the costs of software development honestly
- A new kind of knowledge- and community-building infrastructure
  Note: It potentially allows academic specialities, educators, civic organizations, business enterprises and others to develop their own innovative vehicles for sharing and elaborating a common body of knowledge.
- A force reshaping the software industry
Some Quotations
"If I have been able to see further, it was only because I stood on the shoulders of giants."
--- Isaac Newton, Letter to Robert Hooke
"The real problem has always been, in my opinion, getting people to collaborate on a solution [for a common problem]."
--- David Williams, from 50 years of Computing at CERN
Why work together
- Wonderful opportunity
  - Can do things that can't be done alone; it's more fun!
  - Recognising and establishing e-dreams
  - Combine our creativity, ingenuity and resources
- Challenge so hard we can't go it alone
  - Building a broad user community is hard
  - Building e-infrastructure is hard
  - Deploying e-infrastructure is hard
  - Sustaining e-infrastructure is very hard
- Realising our potential
  - Multipurpose & multidiscipline infrastructure
  - Amortise costs
  - International competition and collaboration
Source: Malcolm Atkinson, 2nd EGEE Conference, Den Haag, 23rd November 2004
Rules of Engagement
- A foundation of honesty
  - Admit what we don't understand
  - Admit the limits of our software
  - Admit the limits of our support
- Realism
  - Working together is worthwhile, and so is competition: choose!
  - But they both take time to show a profit: persist
- Openness and mutual respect
  - Stand shoulder-to-shoulder to meet challenges
  - Be prepared to adapt your ideas and plans
- Commitment
  - Make strong / explicit commitments or say no
  - Honour the commitments you make
Source: Malcolm Atkinson, 2nd EGEE Conference, Den Haag, 23rd November 2004
Grid: it's really about collaboration!
- It's about sharing and building a vision for the future
- And it's about getting connected
- It's about the democratization of science
- It takes advantage of Open Source!
Source: Vicky White
Success on a Worldwide Scale
If we can bring together people from all over the world (whether they be physicists, biologists, computer scientists, climate researchers or ...) and they
- Want to be part of building the cyberinfrastructure or Grid environments or e-science environments for the future
- Actively participate
- Get benefit from the collaboration
Then we will be succeeding
Source: Vicky White
IT Holy Grail
IT Historical Perspective
(Timeline: 1960, 1970, 1980, 1990, 2000)
Result of 40 years of technology evolution:
- Complex, multiple systems and processes
- 200 billion lines of legacy code on 30,000 mainframes worldwide
- 40-60 billion lines of code need modernization over the next five years
The CIO of the Future: Changing the Dialogue, 25 Oct 2005
Notes from EDS slide
- The convergence of business & IT agendas is underway, but there's no common language and their infrastructures can't support the business
- Nearly everyone has accumulated a legacy environment because of mergers and acquisitions, reorganizations, and decisions to centralize then decentralize and vice versa
- The result is IT sprawl: the unplanned, uncoordinated legacy systems and processes which create rigid environments incapable of supporting the needs of today and tomorrow
- Globalization will continue... changing competitive landscapes...
- Where, how, when and by whom work is performed will change...
- No single company can do it alone: business ecosystems will dominate
The Fourth Wave of IT Evolution
Grid computing as the 4th wave of IT evolution
Source: Insight Reports, Global Information Inc.
e-business, e-science and the Grid
- e-business captures an emerging view of corporations as dynamic virtual organizations linking employees, customers and stakeholders across the world. The growing use of outsourcing is one example.
- e-science is the similar vision for scientific research, with international participation in large accelerators, satellites or distributed gene analyses.
- The Grid integrates the best of the Web, traditional enterprise software, high performance computing and peer-to-peer systems to provide the information technology infrastructure for e-moreorlessanything.
- A deluge of data of unprecedented and inevitable size must be managed and understood.
- People, computers, data and instruments must be linked.
- On-demand assignment of experts, computers, networks and storage resources must be supported.
Source: G. Fox
Is the Timing Right? Does PNC have the capacity to do it?
EGEE/LCG-2 Grid Sites: September 2005
(Map legend: country providing resources; country anticipating joining)
- EGEE/LCG-2 grid: 160 sites, 36 countries, >15,000 processors, ~5 PB storage
- Other national & regional grids: ~60 sites, ~6,000 processors
Open Science Grid
OSG Production
- 46 CEs, 15459 CPUs
- 6 SEs
http://osg-cat.grid.iu.edu/
October 25, 2005, 4th EGEE Conference: Dane Skow
gLite Services for Release 1
(Diagram key: JRA3, CERN, UK, IT/CZ)
- Access Services: Grid Access Service, API
- Security Services: Authorization, Authentication, Auditing
- Information & Monitoring Services: Information & Monitoring, Application Monitoring
- Data Services: Metadata Catalog, File & Replica Catalog, Storage Element, Data Management
- Job Management Services: Accounting, Job Provenance, Package Manager, Site Proxy, Computing Element, Workload Management
ISGC 2005
OSG Services Release 0.2, 0.4
(Diagram key: OSG Specific; EGEE Compatible; TBD)
- Helper Services: Configuration & Installation Service, Agreement Service, Bandwidth Allocation & Reservation Service
- Security Services: Authorization Service, Authentication, Auditing, Dynamic Connectivity
- Information & Monitoring Services: Information & Monitoring, Network Monitoring, Job Monitoring, Service Discovery
- Data Services: Metadata Catalog, Storage Element, File & Replica Catalog, Data Movement
- Job Management Services: Job Provenance, Computing Element, Accounting, Package Manager, Workload Management
- VO Service
October 25, 2005, 4th EGEE Conference: Dane Skow
EGEE-II Mission
- Manage and operate production Grid infrastructure for the European Research Area
- Interoperate with e-infrastructure projects around the globe
- Contribute to Grid standardisation efforts
- Support applications deployed from diverse scientific communities
  - High Energy Physics
  - Biomedicine
  - Earth Sciences
  - Astrophysics
  - Computational Chemistry
  - Fusion
  - Geophysics (supporting the industrial application, EGEODE)
  - Finance, Multimedia, ...
- Reinforce links with the full spectrum of interested industrial partners
- Disseminate knowledge about the Grid through training
- Prepare for a permanent/sustainable European Grid Infrastructure (in a GÉANT2-like manner)
Bob Jones, 4th EGEE Conference, Pisa, 24th October 2005
Who is using OSG? The Virtual Organizations
- High Energy and Nuclear Physics: CMS, ATLAS, STAR, DZero, CDF, Fermilab
- Physics and Astronomy: LIGO, SDSS, Auger, DES
- Biology: fMRI, GADU, GRASE, GLOW
- Engineering: GRASE, GLOW
- Computer Science: iVDGL, GLOW
User support is entirely provided by the VOs
October 25, 2005, 4th EGEE Conference: Dane Skow
Asia Pacific Resource Centers
- BEIJING-LCG2
- LCG_KNU
- TOKYO-LCG2
- PAKGRID-LCG2
- TIFR-LCG2
- Taiwan-LCG2
- Taiwan-IPAS-LCG2
- TW-NCUHEP
- GOG-Singapore
Resources from Regional Centers
(Flattened table; each row lists its cells in column order)
Sites: Taiwan-LCG2; Taiwan-IPAS-LCG2; Taiwan-NCUHEP; Taiwan-NTU_HEP; GOG-Singapore; LCG-KNU; PAKGRID-LCG2; NCP-LCG2; TIFR-LCG2; BEIJING-LCG2; Tokyo-LCG2; Australia; New Zealand
# CPU: 400; 50; 60; 50; 90; 38; 2; 4; 26; 70; 84; 96+; ?
Disk (TB): 40; 5; 5; 5; 0.08; 0.05; 0.05; ?; 0.05; 3.00; 0.87; ?
VO: Dteam, Alice, Atlas, CMS, BioMed; Dteam, Atlas; Dteam, CMS; CMS; Dteam, Atlas, CMS; Dteam; Dteam, CMS; CMS; Dteam, CMS; Dteam, CMS, Atlas; Dteam, Atlas; Atlas; CMS
Notes: Tier-1 Center, 32+ CPUs dedicated for OSG; ATLAS Federated Tier-2; CMS Tier-2; OSG Site, Federated CMS Tier-2; 32 CPUs for Grid3/OSG; Pakistan; Pakistan; India; Tier-2 Regional Center; University of Melbourne, NorduGrid site, ATLAS Tier-2; U. of Auckland and U. Canterbury
APROC Website (www.twgrid.org/aproc)
Taiwan Tier-1 in CERN Courier
Academia Sinica drives e-science in Asia-Pacific
The Academia Sinica Grid Computing Centre (ASGC) in Taipei is currently the only LCG Tier-1 Centre in the Asia-Pacific area, with 400 kSI2K computing capacity, 50 TB disk space and a 35 TB tape library dedicated to the LCG. Since 2004, Academia Sinica has provided the services of a regional operation centre (ROC), site monitoring, virtual-organization (VO) support, middleware deployment, certificate authority (CA) and global Grid user support (GGUS: mainly first-line support and FAQs). The centre supports not only Tier-2 sites in Taiwan, but also Grid operations in South Korea, Singapore and other Asia-Pacific countries that are not supported by other Tier-1 sites.
To support service and data challenges, a maximum 1.6 Gbit/s transmission rate was achieved in the 2 Gbit/s network bandwidth between CERN and Taiwan in June 2005. During the CMS service challenge, ASGC received 20 TB of data from CERN at an average rate of 56 Mbit/s from 14 July to 14 August. The ASGC Tier-1 Centre provided 12% of the LCG-2 computing jobs, second only to the 14% of CERN in the ATLAS data challenge in 2004. Academia Sinica will work closely with Tokyo University and other Tier-2 sites in this region for the ATLAS and CMS service challenges in the near future.
ASGC is engaging in collaboration and sharing of information by taking advantage of e-science applications in the Asia-Pacific area. ASGC is also working with different partners to help form and support application-driven e-science communities in the Asia-Pacific region, to improve the next-generation research infrastructure and build up the e-science applications. Hosting the International Symposium on Grid Computing (ISGC) since ...
(Photo caption: Grid tutorial at the Academia Sinica Grid Computing Centre.)
CERN Computer Newsletter, September-October 2005
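The quoted service-challenge figures can be checked against each other. The sketch below assumes decimal terabytes and a continuous transfer over the whole period from 14 July to 14 August; the year 2005 is inferred from the newsletter date. Under those assumptions, 20 TB corresponds to an average just under 60 Mbit/s, which is in reasonable agreement with the quoted 56 Mbit/s given rounding in the published numbers.

```python
from datetime import date

# Figures quoted in the article; decimal terabytes are an assumption.
DATA_BYTES = 20 * 10**12
start, end = date(2005, 7, 14), date(2005, 8, 14)  # year inferred from the newsletter date

elapsed_days = (end - start).days
elapsed_seconds = elapsed_days * 86400

# Average rate needed to move DATA_BYTES in the elapsed time
average_mbit_per_s = DATA_BYTES * 8 / elapsed_seconds / 1e6

print(f"{elapsed_days} days, average rate {average_mbit_per_s:.1f} Mbit/s")
```

This average is a small fraction of the 1.6 Gbit/s peak mentioned in the same article, which illustrates the gap between peak and sustained transfer rates.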
Conclusion
- Critical mass decides which Grid technology/system will prevail; Collaboration, Data and Complexity Reduction are the main themes
- We are about to witness a Data Deluge in all disciplines of e-science
- Unprecedented ways of collaborating on a day-to-day basis will change the sociology of academic life, the eco-system of the business world and eventually everyone in society
- We have digital scholars, librarians, content owners...
- Some capacity from the Asia Tier-1 Centre in Taipei...
- John Taylor now talks about e-Research: Collaboratories and Curation in APAC meeting
- Could we build the PNC e-neighbourhood together?