COMPUTE CANADA GLOBUS PORTAL
Fast, user-friendly data transfer and sharing
Jason Hlady, University of Saskatchewan
WestGrid / Compute Canada
February 4, 2015
Why Globus?
- I need to easily, quickly, and reliably move or mirror portions of my data to other places:
  - Compute Canada HPC cluster
  - Campus filesystem
  - Lab server
  - Personal laptop or workstation
- I need to easily and securely share my data with my colleagues at other institutions.
- I need a good way to store / back up / archive my research data.
Globus highlights
- Software-as-a-Service (SaaS)
  - Compute Canada has partnered with Globus, a not-for-profit organization from the University of Chicago / Argonne National Laboratory
  - Globus operates the file transfer service for Compute Canada
  - 24 CC sites connected
- File transfer and replication
  - Reliable
  - Secure
  - High-performance
- File sharing
  - Share files with collaborators who do not have Compute Canada accounts
Compute Canada Globus Portal
Getting a Globus Account
- Create a Globus account
  - Separate and distinct from your Compute Canada account: the username can be the same or different
  - Identifies you to the Globus service
- Globus is hosted in the United States
  - Potentially personally identifying information, i.e. your Globus username and password, is stored in the USA
  - Research data does NOT travel through Globus: Globus brokers a point-to-point connection between source and destination
- Globus is currently English-only
Logging in to Globus
- Use your Globus account name and Globus password
Data transfer
- Fire-and-forget transfers
- Automatic fault recovery
- Powerful GUI, CLI, APIs
- Built-in security
[Diagram: Data Source to Data Destination]
1. User initiates transfer request
2. Globus moves and replicates files
3. Globus notifies user
Data transfer: high performance
- Globus uses GridFTP for high-speed, reliable, secure data transfer
- GridFTP is an extension of the standard File Transfer Protocol (FTP)
  - GSI security: uses Grid Security Infrastructure (GSI) for authentication and encryption of transferred files
  - Parallel transfers: supports multiple TCP streams to take advantage of fast networks for faster transfers
  - Automatic TCP optimization: automated performance tuning
  - Fault tolerance: tolerates network / server failure, supports automatic restart
Data transfer: fire-and-forget
- Start a data transfer of many files using a web browser / Globus:
  - No need to maintain a terminal connection to the server, or to keep the webpage open
  - Transfers are queued and handled by Globus
  - Globus emails you when the transfer has completed successfully
Data transfer: fault recovery
- Once a transfer is initiated, Globus monitors it and automatically restarts failed or stalled transfers
- When a problem is encountered part-way through the transfer, Globus resumes from the point of failure
  - It does not retransmit all of the data specified in the original request, only what remains to be transferred
- No need to babysit the data transfer
- Very useful for transferring large numbers of files or directories
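The resume-from-point-of-failure behaviour described above can be illustrated with a small simulation. This is only a sketch of the general idea, not Globus or GridFTP code: `transfer_with_resume` and `FlakyChannel` are hypothetical names, and the "network" is an in-memory object that fails on chosen calls.

```python
class FlakyChannel:
    """Hypothetical in-memory destination that drops the connection on chosen calls."""

    def __init__(self, fail_on_calls):
        self.fail_on_calls = set(fail_on_calls)
        self.calls = 0
        self.received = bytearray()

    def send_chunk(self, offset, chunk):
        self.calls += 1
        if self.calls in self.fail_on_calls:
            raise ConnectionError("simulated network drop")
        # The sender must continue exactly where the destination left off.
        if offset != len(self.received):
            raise ValueError("chunk sent at the wrong offset")
        self.received.extend(chunk)


def transfer_with_resume(source, send_chunk, chunk_size=4, max_failures=10):
    """Send `source` chunk by chunk, retrying only the chunk that failed.

    Returns the number of bytes that were delivered successfully.
    """
    offset = 0
    failures = 0
    while offset < len(source):
        chunk = source[offset:offset + chunk_size]
        try:
            send_chunk(offset, chunk)
        except ConnectionError:
            failures += 1
            if failures >= max_failures:
                raise
            continue  # resume at the same offset; earlier chunks are not resent
        offset += len(chunk)
    return offset


channel = FlakyChannel(fail_on_calls={2})
delivered = transfer_with_resume(b"abcdefghijkl", channel.send_chunk)
print(delivered, bytes(channel.received), channel.calls)
```

Only the chunk that failed is sent a second time, so the channel sees four calls for three chunks rather than a full restart.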
Data transfer: supported features
- Mirroring
  - Options are available to mimic rsync and/or mirroring:
    - transfer only new/changed files
    - delete files on the destination if they don't exist on the source
  - Keep file dates consistent at both ends
- File verification at both ends
  - Checksums are compared before and after file transfer; if they don't match, the entire file is retransferred until it succeeds
- Encryption
  - Typically results in slower performance
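The mirroring and verification options above can be sketched as follows. This is an illustration under stated assumptions, not the Globus implementation: the two "filesystems" are plain dictionaries mapping path to contents, SHA-256 is an arbitrary choice of checksum, and `plan_mirror` and `copy_verified` are hypothetical names.

```python
import hashlib


def checksum(data):
    """Checksum used for comparison (SHA-256 chosen arbitrarily for this sketch)."""
    return hashlib.sha256(data).hexdigest()


def plan_mirror(source, destination, delete_extra=False):
    """Return (paths to copy, paths to delete) to make destination mirror source."""
    to_copy = sorted(
        path for path, data in source.items()
        if path not in destination or checksum(destination[path]) != checksum(data)
    )
    to_delete = sorted(p for p in destination if p not in source) if delete_extra else []
    return to_copy, to_delete


def copy_verified(path, source, destination, max_retries=3):
    """Copy one file and re-copy it whole if the checksums do not match."""
    for _ in range(max_retries):
        destination[path] = source[path]  # stand-in for the real transfer
        if checksum(destination[path]) == checksum(source[path]):
            return True
    return False


src = {"a.txt": b"1", "b.txt": b"2", "c.txt": b"3"}
dst = {"a.txt": b"1", "b.txt": b"old", "d.txt": b"4"}

to_copy, to_delete = plan_mirror(src, dst, delete_extra=True)
for path in to_copy:
    copy_verified(path, src, dst)
for path in to_delete:
    del dst[path]
print(to_copy, to_delete, dst == src)
```

Unchanged files (`a.txt` here) are skipped, which is what makes repeated mirroring of a large directory cheap.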
Globus Endpoints
- Endpoints: locations you can transfer to / from using Globus
  - A logical address for a GridFTP server, similar to a domain name for a web server
  - Format: username#endpointname
- Endpoints can be configured on a variety of systems:
  - Compute Canada systems / clusters
  - Local research servers
  - Scientific instrument workstations
  - Researcher desktops / laptops
  - Research IT infrastructure around the world
- All Compute Canada systems can be found under computecanada#systemname
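As a small illustration of the username#endpointname format, the sketch below splits an endpoint name into its two parts. The function and type names are hypothetical, and the validation rules (exactly one `#`, both parts non-empty) are assumptions made for the example rather than documented Globus rules.

```python
from typing import NamedTuple


class EndpointName(NamedTuple):
    owner: str  # the part before '#', e.g. "computecanada"
    name: str   # the part after '#', e.g. "silo"


def parse_endpoint(text):
    """Split 'username#endpointname' into its owner and endpoint parts."""
    owner, sep, name = text.strip().partition("#")
    if not sep or not owner or not name or "#" in name:
        raise ValueError(f"expected username#endpointname, got {text!r}")
    return EndpointName(owner, name)


endpoint = parse_endpoint("computecanada#silo")
print(endpoint.owner, endpoint.name)
```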
Activating Endpoints
- To activate an endpoint for transfer, you must prove that you are a valid user on that system
- Use the appropriate regional consortium account name and password to activate a Compute Canada system's endpoint
  - For example, to activate an endpoint in WestGrid (e.g. computecanada#silo), use your WestGrid username and password
- Authentication and authorization are handled by Compute Canada using myproxy-oauth
  - On endpoint activation, Globus redirects the user to a Compute Canada consortium-level webpage for authentication/authorization
  - Your consortium username and password do NOT go through Globus
Activating Endpoints: OAuth MyProxy
- Globus redirects to a Compute Canada operated authentication page
- Activate the endpoint with your consortium username and password
- After authentication, you are returned to the Globus transfer page with the endpoint now active
Activating Endpoints OAuth MyProxy
Transfer demo
Globus Connect Personal
- A client for communicating with other GridFTP servers / Globus endpoints, using your local computer
  - Creates your own endpoint to transfer data to and from your computer
  - Uses GridFTP for high-performance transfers
- Available for Mac, Linux, Windows
- https://www.globus.org/globus-connect-personal
Globus Connect Personal Demo
- Download, install, and configure Globus Connect Personal on a laptop or desktop
  - Follow the detailed instructions on the CC website: https://computecanada.ca/en/globus-portal
- Activate the endpoint on the laptop/desktop
- Initiate transfers between Globus endpoints, including your laptop/desktop
Sharing
- Share large data with any user / group
- Shared directly from where the data currently resides
[Diagram: Data Source]
1. User A selects file(s) to share, selects user or group, and sets permissions
2. Globus tracks shared files; no need to move files to cloud storage!
3. User B logs in to Globus and accesses shared file(s)
Sharing
- Sharing enables collaborators to access files from within your Compute Canada account on a CC system, EVEN IF the collaborators do not have an account on the system you are sharing from
- Files can be shared with any Globus user, anywhere in the world
- You can set Globus permissions for who reads/writes
  - These can be overridden by the site's Globus permissions
  - which are in turn overridden by the operating system's permissions
  - e.g. you can't share /root on a system; you can't share files you don't have access to; you can't set write if the system administrators don't allow it
- Contact globus@computecanada.ca
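One way to picture the three permission layers described above is as an intersection: an action is possible only if the share, the site policy, and the operating system all allow it. The sketch below models that idea only; the function name and the "read"/"write" labels are illustrative assumptions, not Globus settings.

```python
def effective_permissions(share, site, operating_system):
    """Actions allowed by all three layers: the share, the site policy, and the OS."""
    return set(share) & set(site) & set(operating_system)


# You grant read and write, but the site only permits read-only sharing.
print(effective_permissions(
    share={"read", "write"},
    site={"read"},
    operating_system={"read", "write"},
))

# You have no OS-level access to the files at all (e.g. /root), so nothing is shared.
print(effective_permissions(
    share={"read"},
    site={"read", "write"},
    operating_system=set(),
))
```

The more restrictive layer always wins, which is why a share can never grant more than you yourself are allowed to do on the system.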
Sharing Demo
Sharing Caveats
- Sharing files entails a certain level of risk
  - By creating a share, you are giving others access to files that (up to now) have been under your exclusive control
- Make sure you have permission to share the files if you are not the data's owner
- Make sure you are sharing only with those you intend to
  - Verify that the person you add to the access list is the person you think they are; there are often people with the same or similar names
  - Remember that Globus usernames are not linked to Compute Canada usernames
  - Use the email address of the person(s) you wish to share with, unless you have the exact account name
Sharing Caveats
- If you are sharing with a group you do not control, make sure you trust the owner of the group
  - They may add people to their group who are not authorized to access your files
- If granting write access, make sure that you have backups of important files on the shared endpoint
  - Users of the shared endpoint may delete or overwrite files
  - Those users can do anything that you yourself can do to a file
- Restrict sharing to a subdirectory, rather than your top-level home directory
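The advice to share a subdirectory rather than the whole home directory amounts to a containment rule: anything outside the shared directory stays out of reach. The sketch below illustrates that rule with plain path arithmetic; it is not how Globus enforces shares, and the function name and example paths are hypothetical.

```python
import posixpath


def is_within_share(share_root, requested):
    """True if `requested` (relative to the share) stays inside `share_root`."""
    root = posixpath.normpath(share_root)
    # normpath collapses '..' segments so escapes out of the share are detected.
    target = posixpath.normpath(posixpath.join(root, requested.lstrip("/")))
    return target == root or target.startswith(root.rstrip("/") + "/")


share = "/home/alice/shared"
print(is_within_share(share, "results/run1.dat"))
print(is_within_share(share, "../private/notes.txt"))
```

With the share rooted at a dedicated subdirectory, a collaborator with write access can damage only what lives under that subdirectory.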
Future Directions
- Single sign-on / integration of CC and Globus accounts
- Improved bilingualism for the Globus service
Summary
- Portal: https://globus.computecanada.ca
- Documentation:
  - https://computecanada.ca/en/globus-portal
  - https://computecanada.ca/fr/globus-portal
- Support:
  - Email globus@computecanada.ca
  - Email support@westgrid.ca