LENS Server Maintenance Guide JZ 2017/07/28
Duty (p.2)
- Maintain the LENS servers with minimum downtime
- Patch critical vulnerabilities
- Assist lab members in using the LENS services
- Evaluate custom requirements, e.g. a dedicated VM for a lab member if necessary, or a service deployment for custom needs
- Keep track of the public IP leases
Required credentials for administration (p.3)
- Install the LastPass plugin for Chrome
- The LastPass password manager account contains the credentials needed for server administration
Repositories for the configuration files (p.4)
git clone git@git.lens.csie.ncku.edu.tw:2222/lens-devops/ansible
git clone git@git.lens.csie.ncku.edu.tw:2222/lens-devops/docker-library
git clone git@git.lens.csie.ncku.edu.tw:2222/lens-devops/rancher-compose
https://git.lens.csie.ncku.edu.tw
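The clone URLs above use the scp-style form, in which git treats everything after the colon as a path, so 2222 would not be used as a port. If 2222 is meant to be the SSH port of the Git server (an assumption; the slides do not say so), the ssh:// form states the port explicitly. A minimal sketch of the conversion, where to_ssh_url is a hypothetical helper name:

```shell
# Hypothetical helper: rewrite "user@host:PORT/path" as "ssh://user@host:PORT/path".
# Assumes the number after the colon is an SSH port, which the guide does not confirm.
to_ssh_url() {
  url=$1
  host=${url%%:*}     # text before the first colon, e.g. git@git.lens.csie.ncku.edu.tw
  rest=${url#*:}      # text after the first colon, e.g. 2222/lens-devops/ansible
  port=${rest%%/*}    # leading path component, e.g. 2222
  path=${rest#*/}     # remainder, e.g. lens-devops/ansible
  case $port in
    ''|*[!0-9]*) printf '%s\n' "$url" ;;                     # no numeric port: leave unchanged
    *) printf 'ssh://%s:%s/%s\n' "$host" "$port" "$path" ;;  # explicit-port form
  esac
}
```

Usage, if the assumption holds: git clone "$(to_ssh_url git@git.lens.csie.ncku.edu.tw:2222/lens-devops/ansible)"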
Physical Machines / Network (p.5)
- Two outbound connections are wired to the dept. switch, labeled 655013 and 65501-6; they are in separate VLANs
- Wiring: 65501-6 -> switch1 -> switch2
- The UPS battery powers the network switches, S1, S6, the NAS and the offline-backup NAS
- The UPS connects to this laptop
- S1 and S6 serve the LENS services; S2 to S5 are the testbed for the Hadoop cluster
- Diagram labels: Backup NAS; Switch 2; Switch 1; "Currently broken due to fan failure"; UPS; NAS; "Currently broken due to motherboard failure"; S5, S6, S2, S3, S1, S4
Architecture (p.6)
- [Public static IPv4] subnet 140.116.245.193/28, gateway 140.116.245.206
- [Public static IPv6] subnet 2001:288:7001:2717::/64, gateway 2001:288:7001:2702::fffe
- [DHCP IP] assigned by the dept.; subnet 192.168.50.0/24, gateway 192.168.50.254; routed by ns1, ns2; subnet 10.10.50.0/24, gateway 10.10.50.254 (only available on the VLAN)
- Core Services
- Diagram addresses: 10.10.50.6, 10.10.50.1, 10.10.50.34, 140.116.245.196, 10.10.50.35, 10.10.50.254, 10.10.50.50, 10.10.50.21, 10.10.50.23, 10.10.50.252, 10.10.50.253, 140.116.245.193, 140.116.245.194, 2001:288:7001:2717::1, 2001:288:7001:2717::2, 10.10.50.25, 10.10.50.26
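Each IPv4 gateway listed above is the last usable address of its subnet. A small sketch that derives the usable host range from an address and a prefix length, which may help when tracking which public addresses are still free. The function names are hypothetical, and 140.116.245.193/28 is read here as "the /28 block that contains .193":

```shell
# Convert a dotted IPv4 address to an integer.
ip_to_int() {
  old_ifs=$IFS
  IFS=.
  set -- $1            # split the address into four octets
  IFS=$old_ifs
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Convert an integer back to dotted notation.
int_to_ip() {
  echo "$(( ($1 >> 24) & 255 )).$(( ($1 >> 16) & 255 )).$(( ($1 >> 8) & 255 )).$(( $1 & 255 ))"
}

# Print "first-usable last-usable" for the block containing ADDRESS/PREFIX.
usable_range() {
  addr=$(ip_to_int "$1")
  size=$(( 1 << (32 - $2) ))        # number of addresses in the block
  net=$(( addr / size * size ))     # network address (block start)
  echo "$(int_to_ip $(( net + 1 ))) $(int_to_ip $(( net + size - 2 )))"
}

usable_range 140.116.245.193 28     # prints 140.116.245.193 140.116.245.206
```

The same call with 10.10.50.0 24 gives 10.10.50.1 10.10.50.254, matching the internal gateway.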
Services (p.7)
- Firewall and name server: ns1.lens.csie.ncku.edu.tw, ns2.lens.csie.ncku.edu.tw
- Homepage: lens.csie.ncku.edu.tw
- Personal website: lens.csie.ncku.edu.tw/~{username}
- Git service: git.lens.csie.ncku.edu.tw
- ownCloud (web access for the storage server): cloud.lens.csie.ncku.edu.tw
- Storage server: nas.lens.csie.ncku.edu.tw
- Storage server (admin page): nas-intra.lens.csie.ncku.edu.tw
- Storage server (offline backup): nasbk.lens.csie.ncku.edu.tw
- Docker registry: portus.lens.csie.ncku.edu.tw
- Windows Server workstation: ws.lens.csie.ncku.edu.tw
- Account management: ldm-admin.lens.csie.ncku.edu.tw
- UPS status: ups.lens.csie.ncku.edu.tw/viewpower
For more complete information, see the DNS zone file.
Hypervisor (p.8)
- vCenter Server 5.5
- vCenter ESXi 5.5
Hypervisor - Create VM (p.9)
- Login accounts: administrator: root; LENS user account: username@lens.csie.ncku.edu.tw
- Screenshot labels: Login; VM view
- Right-click the server where the VM will be created
Hypervisor - Create VM (p.10-18): screenshots of the remaining VM creation steps
Hypervisor - Create VM - Installation (p.19)
- Make sure VMware Tools is installed so that the VM can be shut down properly during a power outage
Hypervisor - Permission (p.20)
- Add permission for the LENS user
Hypervisor - Auto startup (p.21)
- Start the VMs automatically after the system boots
Hypervisor - Snapshot (p.22)
- Create a snapshot, especially before installing a major software upgrade or making system changes, so that you can roll back to the previous state if something goes wrong
Container vs Virtual Machine (p.23)
- Containerized applications run without the overhead of a guest OS and deploy consistently across different physical servers
Docker Registry - Public Cloud vs Self-hosted (p.24)
- Docker Hub: public or paid private repositories
- Portus: self-hosted Docker registry, https://portus.lens.csie.ncku.edu.tw
Docker File (p.25)
The LENS website as an example.
Base Docker image that defines the Joomla dependencies:
Build: docker build . --rm -t registry.lens.csie.ncku.edu.tw/library/joomla-base
Push: docker push registry.lens.csie.ncku.edu.tw/library/joomla-base:latest
Website image with the Joomla installation added:
Build: docker build . --rm -t registry.lens.csie.ncku.edu.tw/library/joomla
Push: docker push registry.lens.csie.ncku.edu.tw/library/joomla:latest
Docker Compose (p.26)
- Define the web container and the database container in a Docker Compose file; the resulting service stack can be run on a development machine for testing and development
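As an illustration of such a stack, the sketch below writes a two-service compose file from a shell function. Only the web image name comes from this guide; the function name write_compose, the mysql image, the credentials and the port mapping are placeholders, not the real LENS settings:

```shell
# Hypothetical sketch: write a compose file with a web and a database service.
# The mysql image, password, database name and port mapping are assumptions.
write_compose() {
  out=${1:-docker-compose.yml}
  cat > "$out" <<'EOF'
version: '2'
services:
  web:
    image: registry.lens.csie.ncku.edu.tw/library/joomla:latest
    ports:
      - "8080:80"
    depends_on:
      - db
  db:
    image: mysql:5.7
    environment:
      MYSQL_ROOT_PASSWORD: change-me
      MYSQL_DATABASE: joomla
EOF
}
```

On a development machine, write_compose followed by docker-compose up -d would start both containers.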
Rancher (p.28)
- Rancher CLI
- LENS-CORE & LENS GitHub Organization
Rancher (p.29)
- Docker orchestration platform: with Rancher, Docker containers are deployed across three VMs
- http://rma.lens.csie.ncku.edu.tw
Rancher Obtain API key for deployment 30
Deploy lens website on production (p.31)
Download the Rancher CLI:
wget https://releases.rancher.com/cli/v0.4.1/rancher-linux-amd64-v0.4.1.tar.gz
Extract the archive and move the binary into ~/bin:
tar -xvf rancher-linux-amd64-v0.4.1.tar.gz
mkdir -p ~/bin && mv rancher-v0.4.1/rancher ~/bin
Add the following to ~/.bashrc (vi ~/.bashrc):
export RANCHER_URL=http://rma.lens.csie.ncku.edu.tw/v1/projects/1a5
export RANCHER_ACCESS_KEY={PASTE-IN-THE-OBTAINED-ACCESS-KEY}
export RANCHER_SECRET_KEY={PASTE-IN-THE-OBTAINED-SECRET-KEY}
After logging out and back in, cd into the project directory and run one of the following:
rancher up --force-upgrade -p --batch-size 1 # forces an upgrade of the service
rancher up --upgrade -p --batch-size 1 # upgrades only if the compose file has changed
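The deploy commands rely on the three exported variables, and a shell that has not re-read ~/.bashrc will not have them. A minimal pre-flight sketch that lists whichever of them are still unset; rancher_env_missing is a hypothetical helper name:

```shell
# Print the name of every required Rancher variable that is unset or empty.
rancher_env_missing() {
  for name in RANCHER_URL RANCHER_ACCESS_KEY RANCHER_SECRET_KEY; do
    eval "value=\${$name:-}"                 # indirect lookup of the variable named in $name
    [ -n "$value" ] || printf '%s\n' "$name"
  done
}
```

One possible use: [ -z "$(rancher_env_missing)" ] && rancher up --upgrade -p --batch-size 1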
Google G Suite Admin mailing list Create account https://admin.google.com 33
Google G Suite Admin mailing list Select group Select admin 34
Google G Suite - Admin mailing list (p.35)
- Select user management
- Make sure your (the administrator's) email is in the list so that you receive alerts such as power loss and disk failure
Google G Suite Create account Select user Select add user 36
CSIE Mailing list Lab email: lens@csie.ncku.edu.tw Login https://mail.csie.ncku.edu.tw Email forwarding setup 37
Home Server (p.38)
- Web server for personal websites: https://lens.csie.ncku.edu.tw/~{account}
- FTP server for students to upload projects and homework
- Home directories are mounted through NFS; the data is stored on the NAS
- Ansible is used for configuration and state management
- ssh://lens.csie.ncku.edu.tw
Home Server (p.39)
- Run the defined configuration with ansible-playbook
Home Server - Maintain the personal website (p.40)
- Select the SFTP protocol and fill in the account credentials
- The personal website must be put into the public_html folder
Storage Server (p.41)
- FreeNAS 9.10
- Boot drive: 16 GB USB
- Data drives: ZFS, 2 x 3 TB in RAIDZ1
- NFS for Rancher and the home server
- SMB for ownCloud external storage
- Check email for critical alerts; make sure the administrator's email is in the mailing list admin@lens.csie.ncku.edu.tw
Storage Server Admin page 42
Storage Server (p.43)
- Back up the config before updating the FreeNAS OS
Account Management http://ldm-admin.lens.csie.ncku.edu.tw 44
Account Management (p.45)
- http://ldm-admin.lens.csie.ncku.edu.tw
- Click "Create new entry here"
- Select "generic user account"
- Copy this UID number; it is pasted in the following step
Account Management (p.46)
- http://ldm-admin.lens.csie.ncku.edu.tw
- Instead of creating an entry from scratch, select an existing user and click "copy this entry"
- Paste in the UID number
- Rename to the new name
- Type in the new password
Router and Firewall - pfsense (p.47)
- DNS record maintenance: Admin page > Services > BIND DNS server
Router and Firewall - pfsense (p.48)
- Edit the zone, add the record, then scroll down and click the save button
Router and Firewall - pfsense (p.49)
- The DNS record is updated on the master (ns1)
- The DNS record is not updated on the slave (ns2)
Router and Firewall - pfsense (p.50)
- Workaround: ssh into ns2, delete the slave folder, and restart the DNS service
Router and Firewall - pfsense (p.51)
- Restart the DNS service: edit the zone, then save the zone
UPS - scheduled shutdown (p.52)
The shutdown scripts are available at:
git clone git@git.lens.csie.ncku.edu.tw:2222/lens-devops/ups
SSH into the UPS laptop. For instance, to schedule a shutdown on 2017/07/12 05:00:
> ssh root@r60.lens.csie.ncku.edu.tw
# at 05:00 2017-07-12 < /root/shutdown-servers.sh
(Shuts down the servers powered by the UPS; typically run when a power loss event occurs.)
# at 05:00 2017-07-12 < /root/shutdown-servers-noups.sh
(Shuts down the servers without battery backup; must be scheduled on the day of power maintenance. Use this script if the UPS is active.)
# at 05:00 2017-07-12 < /root/shutdown-servers-all.sh
(Shuts down all servers; must be scheduled on the day of power maintenance. Use this script if the UPS is not active.)
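For a power maintenance day, the choice between the last two scripts depends only on whether the UPS is active. A sketch of that decision as a helper that prints the at command line instead of running it; the function names are hypothetical and the script paths are the ones listed above:

```shell
# Pick the maintenance-day script: "yes" means the UPS is active.
maintenance_script() {
  case $1 in
    yes) echo /root/shutdown-servers-noups.sh ;;   # UPS keeps S1, S6 and the NAS up
    no)  echo /root/shutdown-servers-all.sh ;;     # nothing has battery backup
    *)   echo "usage: maintenance_script yes|no" >&2; return 1 ;;
  esac
}

# Print the at command line for TIME DATE UPS_ACTIVE (dry run, nothing is scheduled).
schedule_line() {
  script=$(maintenance_script "$3") || return 1
  printf 'at %s %s < %s\n' "$1" "$2" "$script"
}

schedule_line 05:00 2017-07-12 yes   # prints: at 05:00 2017-07-12 < /root/shutdown-servers-noups.sh
```

Once a job is scheduled, atq lists the pending jobs and atrm removes one by its job number.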
UPS - monitor Enable flash View the power status 53
Restore on power loss (p.54)
Set the BIOS setting to "Power On" after power loss. Note that a server is powered on again only if the UPS ran out and actually cut the power; a server that was shut down while the UPS kept supplying power stays off.
To fix this issue, two methods are feasible:
1. Ensure the UPS is power cycled after all servers have shut down (use NUT, networkupstools.org)
2. Send a Wake-on-LAN signal after power is restored (only if the network card supports it)
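For the second method, a Wake-on-LAN "magic packet" consists of six 0xFF bytes followed by the target MAC address repeated 16 times, sent as a broadcast. The sketch below only builds that payload as a hex string. The helper name and the MAC address in the usage line are placeholders, and sending the packet is left to a tool such as wakeonlan or etherwake:

```shell
# Build the Wake-on-LAN payload for a MAC address, as lowercase hex.
wol_payload_hex() {
  mac=$(printf '%s' "$1" | tr -d ':-' | tr 'A-F' 'a-f')    # strip separators, lowercase
  case $mac in
    *[!0-9a-f]*) echo "invalid MAC: $1" >&2; return 1 ;;
  esac
  [ "${#mac}" -eq 12 ] || { echo "invalid MAC: $1" >&2; return 1; }
  payload=ffffffffffff                                     # six 0xFF bytes
  i=0
  while [ "$i" -lt 16 ]; do                                # MAC repeated 16 times
    payload=$payload$mac
    i=$(( i + 1 ))
  done
  printf '%s\n' "$payload"
}
```

Usage: wol_payload_hex 00:11:22:33:44:55 yields 204 hex characters, i.e. the 102-byte packet body.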
Steps to do for a new member (p.55)
- Create a LENS account (p.44)
- Create a G Suite account, using the same user ID as the LENS one (p.36)
NAS Troubleshooting (p.56)
- If a service is not available, try restarting it by toggling the on/off button on the service tab
- If a domain user cannot log in to the NAS (e.g. via Samba), try to rebuild the cache and save
Useful commands - NAS (p.57)
zpool status
zpool attach [pool-name] [existing-disk-id] [new-disk-id] // attach a disk to a mirrored pool
zpool detach [pool-name] [disk-id] // detach a disk from a mirrored pool
zpool replace [pool-name] [old-disk-id-to-be-replaced] [new-disk-id]
zpool replace [pool-name] [disk-id] // if the old drive was removed and replaced with a new one at the same location
gpart list
gpart show
camcontrol devlist
http://docs.oracle.com/cd/e19253-01/819-5461/index.html
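For a quick health check, zpool status -x prints the single line "all pools are healthy" when nothing is wrong, and per-pool details otherwise. A small sketch of a filter built on that; pools_healthy is a hypothetical helper name:

```shell
# Reads `zpool status -x` output on stdin; succeeds only if ZFS reports no problems.
pools_healthy() {
  grep -q '^all pools are healthy$'
}
```

One possible use on the NAS: zpool status -x | pools_healthy || echo "a pool needs attention"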
Useful commands - SSH Server (p.58)
apt-get update // update the package list
apt-get upgrade // upgrade the installed packages
mount -a // mount everything listed in /etc/fstab; useful for remounting the NFS storage once it is available again
df // list the mounted filesystems and their usage
Adding new server - Install vSphere ESXi (p.59)
- Use Rufus to flash the ISO to a USB drive to create bootable installation media: https://rufus.akeo.ie
- By default ESXi only supports server-grade network cards; a custom network adapter driver may need to be added to the ISO using ESXi-Customizer
- All the required software is located at \\nas.lens.csie.ncku.edu.tw\software\os\vsphere\ and \\nas.lens.csie.ncku.edu.tw\software\rufus
- Assign the network settings and hostname for the new server during the installation process, for example:
  Hostname: s7.lens.csie.ncku.edu.tw (a new DNS record must be added in pfsense)
  IP: 10.10.50.7
  DNS Server: 10.10.50.254
  Default Gateway: 10.10.50.254
Adding new server - Adding the server to the cluster (p.60)
- Select "add host" and follow the wizard
Adding new server - Pass-through PCI device (such as a GPU card) (p.61-63)
- Log in to vSphere Web, choose the server and go to the management tab
- Choose the PCI controller to be passed through
- Select the desired VM to attach the PCI device to, then select the PCI device
乖乖 (Kuai Kuai) (p.64)
- Exp. 2019/01/20. Remember to replace it before it expires XD