Research Cyberinfrastructure Collaboration Resources

Computational resources are an essential part of research endeavors. Research varies in its data and processing demands, and in its need to compute, modify theories and codes, and recompute. Computational resources help build research programs, extend the value of extramural contracts, grants, and awards, and sustain programs over time. Some projects take weeks to realize; some take decades. Scalable, predictable, and sustainable resources are therefore required: research problems are not one-size-fits-all, and neither are computational demands.

I. Consultation and Engagement

Research computing depends on an Engagement Team of experienced scientists who are also adept with a variety of computational, information-processing, and data-management techniques. The Engagement Team is loosely organized by disciplinary families, for example:

- Physical, Information, Mathematical, and Computer Sciences
- Life and Environmental Sciences
- Health Outcomes and Clinical Research
- Economics, Social and Behavioral Sciences, and Business
- Humanities

If a project does not fit one of the above families easily, we assign an engagement member as appropriate. Engagement Team members perform three general functions: (i) user/group onboarding, (ii) disciplinary/project outreach, and (iii) advanced consultations. Contributions by Engagement Team members range from co-investigation and article co-authorship, to assisting lab teams with job submission scripts, to collaborating on scientific workshops.

II. Research CyberInfrastructure

A. Cluster-scale Computation and Information-Processing

High Performance Computation

The Killdevil cluster at UNC-Chapel Hill is a 772-node (9,152-core) Dell Linux cluster with a QDR Infiniband interconnect and a minimum of 4-GB of memory per core, plus two 32-core hosts with one terabyte of memory each to accommodate codes that require extremely large amounts of RAM.
Killdevil also includes 64 NVIDIA Tesla GPUs (M2070). A 125-TB Lustre parallel filesystem is presented to Killdevil over Infiniband, and a 225-TB high-performance NFS scratch filesystem is presented over Ethernet. Killdevil uses the IBM LSF batch scheduling system. In addition, a permanent 4-PB high-performance scale-out NFS storage cluster on Dell/EMC Isilon X-series hardware was installed in 2016 as a lifecycle replacement of a prior system; Killdevil nodes may access this space by request if required.
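Because Killdevil schedules work through IBM LSF, a typical MPI job is submitted via a batch script of the following shape. This is a minimal sketch only: the queue name, resource shape, and executable are hypothetical placeholders, and actual site policies will differ.

```shell
#!/bin/bash
# Hypothetical LSF submission script for a 16-rank MPI job on a
# Killdevil-style cluster; queue name and binary are placeholders.
#BSUB -J example_job          # job name
#BSUB -n 16                   # total MPI ranks
#BSUB -q week                 # queue (site-specific; hypothetical)
#BSUB -R "span[ptile=8]"      # place 8 ranks per host
#BSUB -o example.%J.out       # stdout; %J expands to the job ID
#BSUB -e example.%J.err       # stderr

mpirun ./my_mpi_app           # replace with the real executable
```

Note that LSF scripts are submitted by redirecting the file to stdin, e.g. `bsub < submit.lsf`, rather than passing it as an argument.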
Prior to July 1, 2017, UNC-Chapel Hill will implement a new cluster, Dogwood, explicitly designed for MPI and/or OpenMP+MPI hybrid (or similar) workloads typical of disciplines and programs with significant calculation and/or simulation demands. In subsequent expansions, the new cluster will add high-end GPU and Xeon Phi compute capability as well. Research groups, programs, investigators, and users in general whose typical workloads are MPI and/or OpenMP+MPI hybrid (or similar) will be provided access to, and resource allocations on, Dogwood after implementation.

High-throughput, data-intensive, regulated-data, and big-data computation

Longleaf is a new cluster at UNC-Chapel Hill explicitly designed to address the computational, data-intensive, memory-intensive, and big-data needs of researchers and research programs that require scalable information-processing capabilities that are not of the MPI and/or OpenMP+MPI hybrid variety. Longleaf includes 117 general-purpose nodes (24 cores each; 256-GB RAM; 2x10-Gbps NICs), 24 big-data nodes (12 cores each; 256-GB RAM; 2x10-Gbps and 2x40-Gbps NICs), 5 large-memory nodes (3-TB RAM each), and 5 GPU nodes, each with GeForce GTX 1080 cards (102,400 CUDA cores in total). The cluster has zero-hop connections to a high-performance, high-throughput parallel filesystem (GPFS, a.k.a. IBM Spectrum Scale) and storage subsystem with 14 controllers, over 225-TB of high-performance SSD storage, and approximately 2-PB of high-performance SAS disk. The nodes include local SSDs for a GPFS Local Read-Only Cache (LRoC) that serves the most frequent metadata and file requests from the node itself, eliminating traversals of the network fabric and disk subsystem. Both general-purpose and big-data nodes have 68-GB/s of memory bandwidth. General-purpose nodes have 10.67-GB of memory per core and 53.34-MB/s of network bandwidth per core.
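Longleaf's high-throughput, non-MPI workloads are scheduled with SLURM, and a typical job is a single-node, multi-threaded task. A minimal sketch of such a submission script follows; the job name, memory request, and program are hypothetical placeholders.

```shell
#!/bin/bash
# Hypothetical SLURM script for a single-node, multi-threaded job of
# the high-throughput style Longleaf targets; names are placeholders.
#SBATCH --job-name=example
#SBATCH --ntasks=1               # one task (no MPI)
#SBATCH --cpus-per-task=8        # threads on a single node
#SBATCH --mem=64G                # big-data nodes offer ~21 GB/core
#SBATCH --time=04:00:00
#SBATCH --output=example.%j.out  # %j expands to the job ID

./my_threaded_app --threads "$SLURM_CPUS_PER_TASK"
```

Unlike LSF, the script is passed as an argument, e.g. `sbatch example.sl`.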
Big-data nodes have 21.34-GB of memory per core and 213.34-MB/s of network bandwidth per core. Longleaf uses the SLURM resource management and batch scheduling system. Longleaf's total conventional compute core count is 6,496 cores (a count that reflects hyperthreading being enabled). In addition, the permanent 4-PB high-performance scale-out NFS storage cluster on Dell/EMC Isilon X-series hardware (installed in 2016 as a lifecycle replacement of a prior system) is presented to all Longleaf nodes. Research groups, programs, investigators, and users in general whose typical workloads are best satisfied by Longleaf are provided access to and resource allocations there. Killdevil will be retired or repurposed once all researchers and research programs have been provided appropriate access to and allocations on Longleaf and/or the new cluster system, Dogwood.

B. Permanent storage systems and data management

RC-Isilon

For comparatively large-capacity permanent storage, UNC-Chapel Hill presents a 4-PB high-performance scale-out NFS storage cluster on Dell/EMC Isilon X-series hardware (installed in 2016 as a lifecycle replacement of a prior system). Researchers whose work requires it may receive a 5-TB allocation upon request. On a project-by-project basis, researchers may request additional storage (usually not to exceed 25-TB of added space) for the duration of a time-delimited project (usually not to exceed 3 years), pending available capacity.

Network Attached Storage (NAS)

Researchers have access to NetApp filer storage providing predominantly NFS (and CIFS for specific use cases). High-performance storage is delivered via SATA disks; extreme-performance storage is delivered via SAS disks. All storage is configured with large controller caches and redundant hardware components to protect against single points of failure. This storage is snapshotted to support file recovery in the event of accidental deletion. Cluster users receive an institutional allocation of 10-GB per person.

Active archive

Quantum StorNext provides an active archive with a 600-TB disk cache and in excess of 4-PB of tape storage. Data are protected against media failure via two copies and are encrypted on tape. Faculty receive an institutional allocation of 2-TB per person; laboratories and project teams receive an institutional allocation of 10-TB per person. Additional capacity is available at incremental cost.

SecureFTP

To facilitate the deposition of files/data from external organizations into relevant computational resources, UNC-Chapel Hill offers a secure file-transfer-protocol service that allows files/data to be uploaded but prohibits downloading. This file-transfer service meets additional IT-security requirements for sensitive data.

Globus

Globus (http://www.globus.org) is available for secure data/file transfer among participating institutions.

C. Secure Research Workspace

Redesigned and re-architected in 2013 by UNC-Chapel Hill, the Secure Research Workspace (SRW) contains computational and storage resources specifically designed for managing and interacting with high-risk data.
The SRW is used for storage of and access to Electronic Health Records (EHR) and other highly sensitive or regulated data; it includes technical and administrative controls that satisfy applicable institutional policies. The SRW is specifically designed as an enclave that minimizes the risk of storing and computing on regulated or sensitive data. Technically, the SRW is an advanced implementation of a Virtual Desktop Infrastructure (VDI) based on VMware Horizon View, Cisco Unified Computing System, and NetApp Clustered Data ONTAP comprising standard disk and flash arrays, with network segmentation and protection guaranteed by design, by adaptive Palo Alto enterprise firewalls, and by enterprise TippingPoint Intrusion Prevention System appliances. Access controls and permissions are managed via centrally administered systems and technologies to ensure security practices and procedures are correctly and consistently applied.

ITS-Research Computing consults with the investigator or research group to arrive at a reasonable initial configuration suitable for their respective project(s). The default installed software is:

- ActivePerl
- Adobe Reader
- ArcGIS Workflow Manager
- ERD Concepts 6
- Google Chrome
- Internet Explorer
- Java Runtime and Java Development Kit
- Microsoft Accessories Bundle
- Microsoft SharePoint Workspace
- Microsoft Silverlight
- Microsoft SQL Server 2008
- Notepad++
- NSClient++
- Oracle Client
- R
- RStudio
- Rsyslog
- SAS
- Stata 13

In addition, Data Leakage Prevention software is available for installation on systems that enable data ingress and egress but require detailed access and transfer logging, or that require additional server-level controls. Two-step (two-factor) authentication is also available as required or requested.

D. Virtual Computing Lab

The Virtual Computing Lab (http://vcl.unc.edu) is a self-service private-cloud virtualization service. Originally developed by NC State University in collaboration with IBM, VCL (see http://vcl.apache.org) provides researchers with anytime, anywhere access to custom application environments created specifically for their use. With only a web browser, users can reserve an application, either in advance or immediately, and the VCL will provision that application on a centrally maintained server and provide the user with remote access to it. VCL gives users remote access to hardware and software that they would otherwise have to install themselves on their own systems, or visit a computer lab to use. It also reduces the burden on computer labs of maintaining large numbers of applications on individual lab computers, where it is often difficult for some applications to coexist on the same machine.
In the VCL, operating system images with the desired applications and custom configurations are stored in an image library, and deployed to a server on-demand when a user requests it.
E. Select Commercial Scientific Software

The research cyberinfrastructure environment licenses commercial software to support the research community. Notable software includes:

- Amber
- Atlas.ti
- Biovia (Discovery Studio, Materials Studio); formerly Accelrys
- COMSOL
- Cambridge Crystallographic
- ESRI
- Gaussian
- Globus Connect
- Harris Geospatial Solutions (ENVI+IDL); formerly Exelis
- Intel Compilers
- KEGG Database
- MapleSoft
- MathWorks
- Mplus
- nQuery (Statistical Solutions)
- Portland Group (Fortran/C/C++)
- PyMOL
- RogueWave (TotalView and IMSL)
- SAS
- Schrödinger
- Scientific Computing Modeling (ADF and BAND Modeling Suite)
- StataCorp (Stata/SE)
- Certara (SYBYL)
- Wolfram (Mathematica)
- X-Win32

The above list is not exhaustive.

III. Training

Short courses are currently delivered face-to-face at UNC-Chapel Hill during the Summer, Fall, and Spring terms. Courses are:

- Linux: Intermediate
- Linux: Introduction
- Matlab: Intermediate
- Matlab: Introduction
- Python for Scientific Computing
- Python Workshop
- QIIME
- Scientific Computing: Gaussian and GaussView
- Scientific Computing: Introduction to Computational Chemistry
- Shell Scripting
- TarHeel Linux
- Using Research Computing Clusters
- Web Scraping

See http://its.unc.edu/rc-services/research-computing-training/. Delivering training workshops effectively in a manner that does not require face-to-face presence is a goal.

Contact:
J. Michael Barker, Ph.D.
Assistant Vice Chancellor for Research Computing and Learning Technologies
UNC-Chapel Hill
michael_barker@unc.edu