Introduction to Scientific Data Management damien.francois@uclouvain.be October 2015 http://www.cism.ucl.ac.be/training 1
Goal of this session: Share tools, tips and tricks related to the storage, transfer, and sharing of scientific data 2
1. Data storage Block File Object Databases 3
Data storage stack, from top to bottom: RDBMS / NoSQL / Serialization (file formats, etc.); Global filesystem / Local filesystem; LVM / RAID / Object store; Block storage (software RAID, JBOD, erasure coding); Attachment (IDE, SAS, SATA, iSCSI, ATAoE, FC); Medium (Flash, hard drives, tapes, DRAM) 4
Storage abstraction levels 5
Storage Medium Technologies 6 http://www.slideshare.net/imexresearch/ss-ds-ready-for-enterprise-cloud
Storage performance 7 http://www.slideshare.net/imexresearch/ss-ds-ready-for-enterprise-cloud
Storage safety (RAID) 8 https://www.extremetech.com/computing/170748-how-long-do-hard-drives-actually-live-for
Storage safety (RAID) 9 https://en.wikipedia.org/wiki/standard_raid_levels
Storage abstraction levels 10
(local) Filesystems
Generation 0: No system at all, just an arbitrary stream of data. Think punch cards, data on audio cassette, Atari 2600 ROM carts.
Generation 1: Early random access. Multiple named files on one device, with no folders or other metadata. Think Apple ][ DOS (but not ProDOS!) as one example.
Generation 2: Early organization (aka folders). When devices became capable of holding hundreds of files, better organization became necessary. Think TRS-DOS, Apple //c ProDOS, MS-DOS FAT/FAT32, etc.
Generation 3: Metadata: ownership, permissions, etc. As the user count on machines grew higher, the ability to restrict and control access became necessary. This includes AT&T UNIX, Netware, early NTFS, etc.
Generation 4: Journaling! This is the killer feature defining all current, modern filesystems: ext4, modern NTFS, UFS2, XFS, you name it. Journaling keeps the filesystem from becoming inconsistent in the event of a crash, making it much less likely that you'll lose data, or even an entire disk, when the power goes off or the kernel crashes.
Generation 5: Copy-on-write snapshots, per-block checksumming, volume management, far-future scalability, asynchronous incremental replication, online compression. Generation 5 filesystems are Btrfs and ZFS. 11 http://arstechnica.com/information-technology/2014/01/bitrot-and-atomic-cows-inside-next-gen-filesystems/
Network filesystem One source, many consumers NAS: ex. NFS SAN: ex. GFS2 12 Pictures from https://www.redhat.com/magazine/008jun05/features/gfs_nfs/
Parallel / distributed filesystem Many sources, many consumers ex: Lustre, GPFS, BeeGFS, GlusterFS 13 Pictures from https://www.redhat.com/magazine/008jun05/features/gfs_nfs/
Special filesystems in memory 14
Filesystems 15
What filesystem for what usage Home (NFS): small size, small I/Os Global scratch (parallel FS): large size, large I/Os Local scratch (local FS): medium size, large I/Os In-memory (tmpfs): small size, very large I/Os Mass storage (NFS): large size, small I/Os 16
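The in-memory option can be used directly from a job script. A minimal sketch, assuming a Linux system where /dev/shm is the usual tmpfs mount (paths and file names are illustrative):

```shell
# /dev/shm is a tmpfs mount on most Linux systems: RAM-backed, very fast,
# but limited in size and volatile (contents vanish at reboot)
workdir=$(mktemp -d /dev/shm/job.XXXXXX)   # private scratch directory
echo "intermediate results" > "$workdir/scratch.dat"
out=$(cat "$workdir/scratch.dat")          # I/O-intensive work goes here
echo "$out"
rm -rf "$workdir"                          # always clean up: RAM is shared
```

Copy anything you want to keep back to a persistent filesystem before the job ends.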
Storage abstraction levels 17
Text File Formats JSON, YAML, XML 18
Text File Formats CSV,TSV 19
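One reason TSV is convenient on the command line: standard Unix tools understand tab-separated columns out of the box. A small sketch of converting CSV to TSV with awk (it assumes no quoted fields containing commas; file and column names are made up):

```shell
# CSV to TSV with awk (no quoted commas in fields assumed)
printf 'id,x,y\n1,0.5,0.7\n2,0.1,0.9\n' > points.csv
awk -F',' 'BEGIN{OFS="\t"} {$1=$1; print}' points.csv > points.tsv
# once tab-separated, standard tools handle columns directly:
cut -f2 points.tsv
```

For real CSV with quoting, use a dedicated parser instead of awk.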
Binary File Formats CDF, HDF 20 http://pro.arcgis.com/en/pro-app/help/data/multidimensional/fundamentals-of-netcdf-data-storage.htm
Binary File Formats CDF, HDF 21 https://www.nersc.gov/users/training/online-tutorials/introduction-to-scientific-i-o/
Binary File Formats CDF, HDF 22
What file format for what usage Metadata: Configuration file: INI, YAML Result with context information: JSON Data: Small data (KBs): CSV, TSV Medium data (MBs): compressed CSV Large data (GBs): NetCDF, HDF5, XDMF Huge data (TBs): database, object store ("loss of innocence") Use dedicated libraries to write and read them 23
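The "compressed CSV" option works well because CSV is highly repetitive. A quick sketch (the generated data is only an example):

```shell
# medium-sized data: plain CSV compresses very well
seq 1 100000 | awk '{print $1","$1*$1}' > squares.csv
gzip -c squares.csv > squares.csv.gz
ls -l squares.csv squares.csv.gz   # compare the two sizes
zcat squares.csv.gz | head -n3     # readable without unpacking to disk
```

Most scientific libraries can read gzipped CSV directly, so the compression is essentially free.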
Storage abstraction levels 24
Object storage Object: data (e.g. file) + meta data Often built on erasure coding Scale out easily Useful for web applications Access with REST API 25
RDBMS Mostly needed for categorical data and alphanumerical data (not suited for matrices, but good for end results) Indexes make finding a data element very fast (as well as computing sums, maxima, etc.) Encodes relations between data (constraints, etc.) Atomicity, Consistency, Isolation, and Durability (ACID) 26 Pictures from http://www.ibm.com/developerworks/library/x-matters8/
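A tiny illustration of indexes and in-engine aggregation, using the sqlite3 command-line shell where available (table and column names are invented for the example):

```shell
# minimal RDBMS demonstration with the sqlite3 CLI
sqlite3 results.db <<'EOF'
CREATE TABLE runs (id INTEGER PRIMARY KEY, tag TEXT, energy REAL);
INSERT INTO runs (tag, energy) VALUES ('a', 1.5), ('b', 2.5), ('c', 4.0);
CREATE INDEX idx_tag ON runs(tag);   -- index: fast lookups on 'tag'
SELECT sum(energy) FROM runs;        -- aggregate computed by the engine
EOF
```

The same schema and queries work unchanged on a client/server RDBMS such as PostgreSQL or MySQL.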
NoSQL Mostly needed for unstructured, semi-structured, and polymorphic data Scaling out is very easy Basic Availability, Soft-state, Eventual consistency (BASE) 27 Pictures from http://www.tomsitpro.com/articles/rdbms-sql-cassandra-dba-developer,2-547-2.html
When to use a database: when you have a large number of small files; when you perform a lot of direct writes in a large file; when you want to keep structure/relations between data; when software crashes have a non-negligible probability; when files are updated by several processes. When not to use: only sequential access; simple matrices/vectors, etc.; direct access on fixed-size records and no structure 28
Example: run a redis server Create a redis directory Copy /etc/redis.conf and modify the following lines: Choose a port at random 29
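The modified lines in the copied redis.conf might look like the fragment below (the port, paths, and user name are placeholders; pick a random unprivileged port to avoid clashing with other users):

```conf
# ~/redis/redis.conf -- adapted copy of /etc/redis.conf (paths are examples)
port 16379                         # random unprivileged port, not the default 6379
bind 0.0.0.0                       # reachable from compute nodes (trusted network only)
dir /home/user/redis               # working directory for dump files and logs
daemonize yes                      # run in the background
logfile /home/user/redis/redis.log
```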
Example: run a redis server Start the redis server Store values (normally you would do this in a Slurm job) 30
Example: run a redis server Check the values Retrieve the values 31
2. Data transfer Faster (but less secure) transfers Parallel transfers 32
scp -c cipher... 33 http://blog.famzah.net/2010/06/11/openssh-ciphers-performance-benchmark/
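You can check which ciphers your client offers before choosing one; the scp invocation in the comment is a sketch with placeholder host and path:

```shell
# list the ciphers this OpenSSH client supports (requires OpenSSH >= 6.3)
ciphers=$(ssh -Q cipher)
echo "$ciphers"
# on a trusted network, picking a fast cipher speeds up scp noticeably:
#   scp -c aes128-gcm@openssh.com bigfile.dat user@cluster:/scratch/
```

Hardware AES support (AES-NI) makes the aes*-gcm ciphers the usual fast choice on modern CPUs.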
Fastest: No SSH at all Need friendly firewall (choose direction accordingly) Only over trusted networks If rsh is installed: rcp instead of scp 34
Fastest: No SSH at all Need friendly firewall (choose direction accordingly) Only over trusted networks If rsh is installed: rcp instead of scp If rsh is not installed: nc on both ends 35
Resuming transfers When nothing changed but the transfer was interrupted rsync --size-only: compare files by size only, do not perform byte-level file comparison 36
Resuming transfers When nothing changed but the transfer was interrupted rsync --append: do not re-check partially transmitted files; resume the transfer where it was abandoned, assuming the first transfer attempt was with scp or with rsync --inplace 37
Parallel data transfer: bbcp Better use of the bandwidth than SCP Needs to be installed on both sides (easy to install) Needs friendly firewalls 38
Parallel data transfers: parsync 39 http://moo.nac.uci.edu/~hjm/parsync/
Parallel data transfers: sbcast 40
Transferring ZOT files Zillions Of Tiny files: more metadata than data, large overhead for rsync Solution: pre-tar, or tar on the fly (needs friendly firewall) Also avoid 'ls' and '*' as they sort the output; favor 'find' 41
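Tar-on-the-fly means packing, transferring, and unpacking in one pipeline, so no archive ever hits the disk. Over the network the middle of the pipe would be ssh (e.g. `tar cf - dir | ssh host 'tar xf - -C /dest'`); a local sketch with many tiny files:

```shell
# pack -> pipe -> unpack, without writing an intermediate archive
mkdir -p zot dest
for i in $(seq 1 100); do echo "$i" > "zot/f$i"; done   # 100 tiny files
tar cf - zot | (cd dest && tar xf -)
ls dest/zot | wc -l
```

One stream of sequential I/O replaces thousands of per-file metadata round trips.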
3. Data sharing with other users (Unix permissions, Encryption) with external users (Owncloud) 42
Data sharing Data sharing with other users 43
Sharing with all other users 44
Sharing with the group 45
Sharing and hiding 46
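The three sharing scenarios above map onto classic Unix permission patterns; a sketch using throwaway directories (names are invented, and group membership would additionally be set with chgrp):

```shell
# Unix permission patterns for sharing, group sharing, and hiding
mkdir -p demo/shared demo/groupdata demo/dropoff
chmod 755 demo/shared      # everyone may list and enter
chmod 750 demo/groupdata   # group members only; others get nothing
chmod 711 demo/dropoff     # "hiding": others can reach a file by its exact
                           # name but cannot list the directory contents
stat -c '%a %n' demo/shared demo/groupdata demo/dropoff
```

Remember that reaching a file also requires execute permission on every directory along its path.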
Sharing and encrypting 47
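For sharing a file with only those who know a secret, a symmetric encryption round trip is enough; a sketch with openssl (gpg -c is an alternative; `-pbkdf2` needs OpenSSL 1.1.1 or later, and the passphrase here is a placeholder):

```shell
# symmetric encrypt/decrypt round trip; never hard-code a real passphrase
echo "sensitive results" > results.txt
openssl enc -aes-256-cbc -pbkdf2 -salt -pass pass:changeme \
        -in results.txt -out results.txt.enc
openssl enc -d -aes-256-cbc -pbkdf2 -pass pass:changeme \
        -in results.txt.enc -out results.dec.txt
cmp results.txt results.dec.txt && echo "round trip OK"
```

The encrypted file can then be placed world-readable (or in a 711 "hidden" directory): only holders of the passphrase can use it.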
Data sharing Data sharing with external users 48
Data sharing with external users ownCloud (CISM login) 49
Dropbox-like 50
External SFTP connectors 51
Dropbox-like 52
My home on Manneback 53
Can create a share URL 54
And distribute it 55
Exercise: 1. Run a redis server on Hmem 2. Populate it from compute nodes with random data 3. Extract the data from it and create an HDF5 file 4. Encrypt the file 5. Copy it to lemaitre2 using nc 6. Make it available to others who know its name 56
Summary: Storage: choose the right filesystem and the right file format Transfer: use the parallel tools when possible and limit encryption in favor of throughput Sharing: use the full potential of the UNIX permissions and try ownCloud 57