Lecture 08: When disaster strikes and all else fails Hands-on Unix system administration DeCal 2012-10-22 1 / 27
Projects groups of four people submit one form per group with proposed project ideas and SSH public keys we ll be provisioning VMs and sending out an announcement 2 / 27
What s up? What s hosing? What s in use? Too much traffic files Low-level files terminals sudo Other tools 3 / 27
What s up? What s up? What s hosing? What s in use? Too much traffic files Low-level files terminals sudo Other tools uptime: how long continuously running, what s the load average 1, 5, 15 min average number of processes waiting for CPU (or IO) w, who: who s logged in on machine write: write to a logged-in user wall: write to all logged-in users 4 / 27
What s hosing? What s up? What s hosing? What s in use? Too much traffic files Low-level files terminals sudo Other tools top, htop (Linux), ps (ps aux, ps elf) similarly iftop for network interface bandwidth, iotop (Linux) for disk IO 5 / 27
What s in use? What s up? What s hosing? What s in use? Too much traffic files Low-level files terminals sudo Other tools The action can t be completed... in use (Windows) The operation can t be completed... in use (Mac OS X) lsof for files lsof -i for network ports see also: netstat -pant, fuser 6 / 27
Too much traffic What s up? What s hosing? What s in use? Too much traffic files Low-level files terminals sudo Other tools netcat: pipe over TCP/UDP wireshark, tshark, tcpdump: packet sniffer/analyzer nmap: network scanner 7 / 27
Too many files What s up? What s hosing? What s in use? Too much traffic files Low-level files terminals sudo Other tools du, df: directory, filesystem disk space usage scp (secure copy): transfer files over SSH rsync (remote sync): intelligently transfer files (often over SSH) tar (tape archiver): combine files into a tarball 8 / 27
Low-level files What s up? What s hosing? What s in use? Too much traffic files Low-level files terminals sudo Other tools fdisk, parted (Linux): edit partition table fsck: check filesystem for errors dd: copy block devices 9 / 27
Too many terminals What s up? What s hosing? What s in use? Too much traffic files Low-level files terminals sudo Other tools screen, tmux metaterminal access multiple terminal sessions inside a single terminal session other features: persistence (after logging off), session sharing (between users) 10 / 27
sudo What s up? What s hosing? What s in use? Too much traffic files Low-level files terminals sudo Other tools sudo: switch user do (usually used to give your command root powers) via xkcd.com 11 / 27
Other tools What s up? What s hosing? What s in use? Too much traffic files Low-level files terminals sudo Other tools ldd (shared library dependencies), truss or strace (trace system calls) md5sum: file checksum watch: execute command and repeatedly show output seq: print sequence of numbers 12 / 27
Software meltdowns Hardware meltdowns Criminals on the loose Escalation of problems 2003 Northeast blackout 2003 Northeast blackout 13 / 27
Software meltdowns Software meltdowns Hardware meltdowns Criminals on the loose Escalation of problems 2003 Northeast blackout 2003 Northeast blackout system load (uptime command) too damn high remote access (networking, firewall, SSH) broken 14 / 27
Hardware meltdowns Software meltdowns Hardware meltdowns Criminals on the loose Escalation of problems 2003 Northeast blackout 2003 Northeast blackout failed hard drives failed fans, power supplies, CPU, RAM 15 / 27
Criminals on the loose Software meltdowns Hardware meltdowns Criminals on the loose Escalation of problems 2003 Northeast blackout 2003 Northeast blackout crackers will do Bad Things compromised accounts looks can be deceiving, uncertain what to trust 16 / 27
Escalation of problems Software meltdowns Hardware meltdowns Criminals on the loose Escalation of problems 2003 Northeast blackout 2003 Northeast blackout we like to build systems on top of each other if one thing fails, it may break other things, causing other things to fail 17 / 27
2003 Northeast blackout August 13, 2003, 9:21pm EDT (via en.wikipedia.org) 18 / 27
2003 Northeast blackout August 14, 2003, 9:03pm EDT (via en.wikipedia.org) 19 / 27
Be Prepared Power Out-of-band Redundancy Monitoring Security Backups 20 / 27
Be Prepared Be Prepared Power Out-of-band Redundancy Monitoring Security Backups Boy Scout motto Murphy s Law: Anything that can go wrong, will go wrong. s happens 21 / 27
Power Be Prepared Power Out-of-band Redundancy Monitoring Security Backups Uninterruptible Power Supply (UPS) many UPSes can remotely power cycle servers 22 / 27
Out-of-band Be Prepared Power Out-of-band Redundancy Monitoring Security Backups separate hardware that can be remotely accessed independent from rest of hardware, dedicated NIC can access BIOS, power cycle, provide visual display e.g., IPMI, Dell DRAC, Sun LOM 23 / 27
Redundancy Be Prepared Power Out-of-band Redundancy Monitoring Security Backups dual redundant power supplies typical RAID failover servers for high availability spare parts (hard drives!) for swapping 24 / 27
Monitoring Be Prepared Power Out-of-band Redundancy Monitoring Security Backups many large scale operations (Google, Facebook) have many failed servers at any point in time, monitoring servers reroute traffic appropriately monitor syslog SNMP traps alarm notification by email, text message 25 / 27
Security Be Prepared Power Out-of-band Redundancy Monitoring Security Backups subscribe to OS security announcements Intrusion Detection Software (e.g., snort, bro) be wary of lax permissions limit root access 26 / 27
Backups Be Prepared Power Out-of-band Redundancy Monitoring Security Backups user data, system configuration ideally daily, weekly, monthly rotations RAID is not a backup e.g., rsync, cron, rsnapshot 27 / 27