Security Service Challenge & Security Monitoring
Jinny Chien, Academia Sinica Grid Computing
OSCT Security Workshop on 7th March in Taipei
www.eu-egee.org
Motivation
After today's training, we expect you to be able to:
- Handle the Incident Response Procedure
- Ensure communication channels with the involved admins are in place
- Deal with sudden security attacks
- Etc.

Overview
- Introduction
- Security Service Challenge
- Security Monitoring
- Conclusion
Security Service Challenge (SSC)
Enabling Grids for E-sciencE
The objective: The goal of the LCG/EGEE Security Service Challenge is to investigate whether sufficient information is available to conduct an audit trace as part of an incident response, and to ensure that appropriate communication channels are available.
The concept: The CERN security team first submits a test job to specific sites. Each site's security contact must then follow the clues provided and reply with the answers within a limited time. In general, the challenge is executed once a year.
SSC-Objective
Stages / Roles of SSC
Stages of the SSC
1. Security Challenge targeting the principal site of each of the LCG/EGEE Regional Operation Centers (ROC)
2. Security Challenge targeting the individual sites in each ROC
Roles
1. The Test Operator (TOP), who submits the challenging job, issues the alert, escalates the alert as required and checks the response.
2. The Security Contact of the target site, who receives and acknowledges the alert, makes the necessary investigation and submits the response back to the TOP.
SSC
The challenge is executed by submitting a Grid job from a User Interface (UI).
- SSC level 1: challenges the Workload Management System (WMS) of the Grid: Resource Broker (RB) and Computing Element (CE)
- SSC level 2: challenges the Storage Elements (SE) on the Grid
- SSC level 3: challenges the Operational Diligence of the LCG/EGEE Grid Sites
- SSC level 4: coming soon
Materials for SSC
The materials are available for download from https://twiki.cern.ch/twiki/bin/view/lcg/lcgsecuritychallenge
SSC Common Setup
- SSCs were run in two stages:
  - Stage 1: targeting the principal sites in the regions
  - Stage 2: targeting the individual sites in each ROC
- The jobs were submitted from a User Interface (UI) to a chosen Grid Computing Element (CE) via a Resource Broker (RB) using standard Grid commands.
- They consist of a set of small, non-intrusive programs.
  - Only legal operations are executed (job submission, file transfer, ...)
  - No penetration tests, no execution of exploits etc.
Security Service Challenge 1
SSC-1 Objective and Setup
- SSC-1 (2005 - March 2006) targeted the Workload Management System (WMS): Resource Broker (RB) and Computing Element (CE).
- It tested whether sufficient information was available and whether communication channels were sufficiently open.
- It did not address the Security Incident Response Procedure.
- It used Savannah as the vehicle for communication between the Test Operator (TOP) and the target sites.
SSC-1 - Task
Given: time range, IP address of the target computer, UNIX UID of the challenging job on the target
The sites had to find out:
1. The DN of the grid credentials/certificate used by the job submitter
2. The IP address of the submitting network device (UI)
3. The name of the executable which ran on the target computer
4. The date and the precise time when the executable ran
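The lookup a site has to perform can be sketched in a few lines of Python. This is a minimal illustration, not the procedure any site actually uses: the record format, the DNs, the IP addresses and the executable names are all hypothetical, and real RB/CE logs look different. The time window and UID are the example values from the sample alert in this deck.

```python
import shlex
from datetime import datetime, timezone

# Hypothetical, simplified job records. Real RB/CE logs use other formats.
SAMPLE_LOG = """\
2006-03-08T08:20:11Z uid=18117 dn="/C=XX/O=Example/CN=Some User" ui=192.0.2.10 exe=/bin/hostname
2006-03-08T08:25:42Z uid=18118 dn="/C=XX/O=Example/CN=Test Operator" ui=192.0.2.44 exe=./challenge_job
2006-03-08T08:40:03Z uid=18118 dn="/C=XX/O=Example/CN=Test Operator" ui=192.0.2.44 exe=./later_job
"""


def parse_line(line):
    """Split one record into a UTC timestamp and a dict of key=value fields."""
    stamp, rest = line.split(" ", 1)
    when = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    # shlex keeps the quoted DN together even though it contains spaces.
    fields = dict(token.split("=", 1) for token in shlex.split(rest))
    return when, fields


def find_jobs(log_text, uid, start, end):
    """Return all records for the given UID that fall inside [start, end]."""
    matches = []
    for line in log_text.splitlines():
        if not line.strip():
            continue
        when, fields = parse_line(line)
        if fields.get("uid") == uid and start <= when <= end:
            matches.append({"time": when, **fields})
    return matches


# Time window and UID as given in the alert (UTC).
START = datetime(2006, 3, 8, 8, 23, 0, tzinfo=timezone.utc)
END = datetime(2006, 3, 8, 8, 34, 0, tzinfo=timezone.utc)

for job in find_jobs(SAMPLE_LOG, "18118", START, END):
    # These four values correspond to the four questions of the task.
    print(job["time"].isoformat(), job["dn"], job["ui"], job["exe"])
```

Filtering on both the UID and the time window matters because the same pool account may run unrelated jobs before and after the challenge.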
Sample: SSC-1
Subject: Security Service Challenge
Local date and time of request creation: 2006-03-08 10:38:39 (CET, UTC+2)
Initials of test operator: psa

Dear LCG/EGEE Site Security Officer,

This e-mail constitutes a security service challenge alert. You have received this because you have opened an e-mail destined to this site's security officer. In case you are not the security officer of this site, please forward this e-mail to - aproc-security@list.grid.sinica.edu.tw just stating so. This will allow us to improve our procedures, and we thank you in advance.

We thank you for your collaboration,

Date - 2006-03-08 and time period of challenge, between: 08:23:00 -and- 08:34:00 UTC
Virtual Organization (VO):
LCG/EGEE sitename:
Resource Broker (RB):
Regional Operation Center (ROC):
IP-address of the target computer: lcg00189.grid.sinica.edu.tw
UNIX-UID of challenging job on target: 18118

--- Security_Service_Challenge_Description ------------
Within the time period indicated above, a security service challenge was launched on your site. The UNIX-UID on the target computer as noted above, was associated with the challenge.
SSC-1 in AP
Execution period: 2006/3/5 - 2006/3/13
Targeted sites (8 in total): Australia-UNIMELB-LCG2, GOG-Singapore, INDIACMS-TIFR, LCG_KNU, Taiwan-IPAS-LCG2, Taiwan-NCUCC-LCG2, TOKYO-LCG2, TW-NCUHEP
The final report: https://twiki.cern.ch/twiki/pub/lcg/ssc1/SSC_1_Debrief_2006-04-18.pdf
Security Service Challenge 2
SSC-2 Objective and Setup
- SSC-2 tested the traceability of storage operations (2007).
- From the Worker Node (WN), a sequence of seven storage operations was executed: lcg_crx, lcg_lgx, lcg_repx, lcg_rx, lcg_cpx, lcg_delx
- It did not address the Security Incident Response Procedure.
- It used the Global Grid User Support (GGUS) as the vehicle for communication between the Test Operator and the target sites.
SSC-2 - Task
Given: user DN, time range and SE
The sites had to find out:
1. For each of the identified storage operations:
   - The exact time (UTC)
   - The type of operation
   - The URLs, filenames, catalog names and file paths involved
2. The IP address of the User Interface (UI) that was used for the job submission
SSC-2 in AP
Execution period: 2007/4/20 - 2007/5/4
Targeted sites: 18 sites in 8 countries
The procedure: http://lists.grid.sinica.edu.tw/apwiki/security_service_challenge?highlight=%28security%29
The final report: https://twiki.cern.ch/twiki/pub/LCG/SSC2/SSC_2_Stage_2_Report_AsiaPacific.pdf
The result of SSC2

Site name               Status  Reply  Feedback
Australia-UNIMELB-LCG2  OK      YES    YES
GOG-Singapore           Error   NO     NO
HK-HKU-CC-01            OK      YES    YES
IN-DAE-VECC-01          OK      NO     NO
INDIACMS-TIFR           Error   NO     NO
JP-KEK-CRC-01           Error   NO     NO
JP-KEK-CRC-02           OK      YES    NO
KR-KISTI-GCRT-01        OK      YES    YES
LCG_KNU                 OK      YES    NO
NCP-LCG2                OK      YES    YES
PAKGRID-LCG2            OK      YES    NO
Taiwan-IPAS-LCG2        OK      YES    NO
Taiwan-NCUCC-LCG2       OK      YES    YES
TOKYO-LCG2              OK      YES    YES
TW-FTT                  Error   NO     NO
TW-NTCU-HPC-01          OK      YES    YES
TW-NIU-EECS-01          OK      YES    NO
TW-NCUHEP               OK      NO     NO

Status: Error = the SSC job could not be submitted; OK = success
Reply: YES = the site replied with the answer; NO = no reply
Feedback: YES = the site provided feedback; NO = no feedback
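To summarise the table, the three columns can be tallied with a short Python script. This is only a convenience sketch: the rows are copied from the slide, while the function and variable names are made up for illustration.

```python
from collections import Counter

# Rows copied from the SSC2 result table: site, status, reply, feedback.
RESULTS = """\
Australia-UNIMELB-LCG2 OK YES YES
GOG-Singapore Error NO NO
HK-HKU-CC-01 OK YES YES
IN-DAE-VECC-01 OK NO NO
INDIACMS-TIFR Error NO NO
JP-KEK-CRC-01 Error NO NO
JP-KEK-CRC-02 OK YES NO
KR-KISTI-GCRT-01 OK YES YES
LCG_KNU OK YES NO
NCP-LCG2 OK YES YES
PAKGRID-LCG2 OK YES NO
Taiwan-IPAS-LCG2 OK YES NO
Taiwan-NCUCC-LCG2 OK YES YES
TOKYO-LCG2 OK YES YES
TW-FTT Error NO NO
TW-NTCU-HPC-01 OK YES YES
TW-NIU-EECS-01 OK YES NO
TW-NCUHEP OK NO NO
"""


def tally(text):
    """Count the values seen in each of the three result columns."""
    status, reply, feedback = Counter(), Counter(), Counter()
    for line in text.splitlines():
        if not line.strip():
            continue
        _site, s, r, f = line.split()
        status[s] += 1
        reply[r] += 1
        feedback[f] += 1
    return status, reply, feedback


status, reply, feedback = tally(RESULTS)
print("Status:", dict(status))
print("Reply:", dict(reply))
print("Feedback:", dict(feedback))
```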
Security Service Challenge 3
Preparing/Running Regional SSC3
TestOperator (TOp) is attacker and incident coordinator and...
- Get/install the SSC software from the svn repository:
  - Malicious binary (might need some tweaking)
  - Job-submission framework (scripts). Available for glite, globus (Aashish).
  - Job-monitoring webserver
- Certificate, VO and all the rest:
  - Get a grid certificate (short lived) for the TOp.
  - Negotiate an identity used for the TOp with a VO (this VO has to be supported by all sites).
- Make sure the default communication channels to the sites to be challenged work.
- Check for sufficient queue length/wallclock time. 72h is nice; anything less needs some additional tweaking but is possible. The minimum is 12h.
SSC-3 Objective and Setup
SSC-3 is a more realistic simulation of an incident; it challenges the Operational Responsiveness of LCG/EGEE Grid Sites.
- The job is launched from a User Interface (UI).
- It runs with valid credentials.
- Once running, it will exploit its environment to conceal its activities.
- Signs of life will be reported through an out-of-band channel.
SSC-3 Objective and Setup II: Alert
- The Alert is sent to the CSIRT e-mail address registered in the Grid Operations Center Data Base (GOCDB).
- The text clearly identifies the alert as a test.
- The Grid identity of the submitting user is indicated.
- The site is asked to deal with the Alert following approved Incident Response Procedures.
- Alert mails sent: to VO managers 4 weeks ago; to sites (roc-security-contact) 2 weeks ago.
SSC-3 Incident Response
The Incident Response is broken up into three activities:
- Communication
  - Acknowledgment/heads-up report to the indicated e-mail address
  - Alert to the VO manager
  - Verification that the responsible Certification Authority (CA) has been notified
  - Filing of the final report
- Containment
  - Identification of the job and killing of its processes
  - Suspension of the offending user at the challenged site
- Forensics
  - Discovery of the emitting site and contact with that site's CSIRT
  - Analysis of network traffic
  - Analysis of the submitted binaries
SSC-3 in AP
- Receive a ticket from GGUS
- Send a notification to ROC
- Initial analysis and classification
- Contact Certification Authority manager
- Contact Virtual Organization manager
- Post-incident analysis
Result of SSC3
Comment for SSC
Material for SSC
- The material is available for download from https://twiki.cern.ch/twiki/bin/view/lcg/lcgsecuritychallenge
- More details on the OSCT public web: http://osct.web.cern.ch/osct/ssc.html
- SSC4 is coming soon.
Security Monitoring
Goals
- Detecting operational problems or even incidents
- Help sites to keep their resources secure
- Warning sites exposing vulnerabilities
- Only a basic set of probes currently
- Main focus on higher levels (ROC, project)
  - Provide the project and ROC (OSCT) with information about site status
  - Not concerned with site level
- No special privileges required from sites
  - Only public interfaces used
https://twiki.cern.ch/twiki/pub/lcg/osct-egeeiiitasks/security-monitoring-v0.12.pdf
Current Status
- A few SAM tests used
  - CRL, file permission checks, Pakiti (patching status)
  - Results encrypted and only available to ROC security contacts
- Further focus on Nagios-based framework
  - Project and ROC view
  - SAM probes ported
  - Tests to be launched from ROC-level Nagios
  - Results collected in a standard way via message bus
  - Encryption must be applied
  - Access allowed to ROC security contacts and site admins
  - Synchronized with GOC DB
- Hopefully new probes will be developed
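As an illustration of what a file permission check can look like, here is a minimal Python sketch that flags world-writable files. It is not the actual SAM probe; the function names are hypothetical, and a real probe would decide for itself which paths to inspect and how to report the result.

```python
import os
import stat


def mode_is_world_writable(mode):
    """True if the permission bits grant write access to 'other'."""
    return bool(mode & stat.S_IWOTH)


def world_writable(paths):
    """Return the subset of existing paths that anyone on the host may write to."""
    flagged = []
    for path in paths:
        try:
            mode = os.stat(path).st_mode
        except FileNotFoundError:
            continue  # a missing file is not a permission finding
        if mode_is_world_writable(mode):
            flagged.append(path)
    return flagged


# Example call on a path that exists on most Unix hosts.
print(world_writable(["/etc/passwd"]))
```

Such a check needs no special privileges, in line with the goal that probes use only what an ordinary job can see.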
Incident statistics
- A number of local root exploits released in 2009
- Main entry points:
  - Compromised user accounts at other sites (very difficult to control)
  - Vulnerable Web applications
  - Weak passwords (!)
- Main escalation factors (= how the attacker got root):
  - Failure to apply security patches (Pakiti does help here)
  - Weak passwords (!)
Recent patching campaigns
- Lots of effort to eliminate critical vulnerabilities in 2009
- Most common reasons for not patching:
  - In the majority of cases, a communication problem (the recipients of our alerts, in the ROCs, at the sites, etc. thought somebody else would take care of this)
  - Only a part of the farm was upgraded for some reason
  - Some tried an exploit that did not work and concluded they were safe
  - Some did not understand/agree with the implications of the risk and ignored our alerts
  - Some thought they had closed the job queues and were surprised that (malicious) jobs could still be submitted
  - Some upgraded, but did not reboot the hosts
  - A very small number of sites reported they could not upgrade due to missing third-party drivers
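The "upgraded but not rebooted" case can be detected by comparing the running kernel with the newest installed one. The sketch below assumes the two lists have already been collected (for example from `uname -r` and `rpm -q kernel`). The version strings are illustrative, and the ordering is a rough numeric comparison, not RPM's real version-comparison algorithm.

```python
import re


def version_key(release):
    """Turn a release string such as '2.6.18-164.el5' into a tuple of ints.

    This is a rough ordering for illustration only; it is not RPM's algorithm.
    """
    return tuple(int(n) for n in re.findall(r"\d+", release))


def needs_reboot(running, installed):
    """True if a newer kernel is installed than the one currently running."""
    newest = max(installed, key=version_key)
    return version_key(running) < version_key(newest)


# Illustrative values: the host was upgraded but still runs the old kernel.
installed_kernels = ["2.6.18-128.1.1.el5", "2.6.18-164.el5"]
print(needs_reboot("2.6.18-128.1.1.el5", installed_kernels))
```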
Improve
Pakiti
Security patching status monitoring
- Simple design: a lightweight, unprivileged shell client sends data to a server:
  - List of installed packages (rpm -qa)
  - Running kernel and operating system version
- The Pakiti client DOES NOT modify/patch the system
- The Pakiti server:
  - Collects security + repository data from vendors
  - Compares the input from the client and the repo information
  - Concludes on the missing packages and applicable CVEs
  - Displays the results on a Web interface and offers many views/search options
- Pakiti can help with many common issues:
  - Is my cluster fully patched?
  - Is there any node where auto-update is broken?
  - Do I have any node vulnerable to CVE-2010-1234?
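The server-side comparison can be pictured with a small Python sketch. It is not Pakiti's code: the report format, the package names, the fixed versions and the version ordering are all assumptions made for illustration, and CVE-2010-1234 is the placeholder ID from the slide.

```python
import re


def version_key(version):
    """Rough numeric ordering of 'version-release' strings (not RPM's algorithm)."""
    return tuple(int(n) for n in re.findall(r"\d+", version))


def parse_report(text):
    """Parse hypothetical client lines of the form 'name version-release'."""
    installed = {}
    for line in text.splitlines():
        if line.strip():
            name, version = line.split()
            installed[name] = version
    return installed


def missing_fixes(installed, advisories):
    """Return {cve: [packages]} where the installed version predates the fix."""
    findings = {}
    for cve, fixed in advisories.items():
        stale = [
            pkg
            for pkg, fixed_version in fixed.items()
            if pkg in installed
            and version_key(installed[pkg]) < version_key(fixed_version)
        ]
        if stale:
            findings[cve] = sorted(stale)
    return findings


# Hypothetical client report and advisory data.
REPORT = """\
examplepkg 1.2.3-4
otherpkg 2.0-1
"""
ADVISORIES = {"CVE-2010-1234": {"examplepkg": "1.2.3-5", "otherpkg": "2.0-1"}}

print(missing_fixes(parse_report(REPORT), ADVISORIES))
```

Keeping the comparison on the server means the client only reports what is installed and never changes the host, which matches the design described above.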
Pakiti (cont.)
- Open source tool to check patching status: http://sourceforge.net/projects/pakiti/
  - Any site can run its own Pakiti server to monitor internal machines
- Server evaluates packages installed on clients
  - Detects security patches not applied
  - Allows for searching for particular vulnerabilities (CVE)
  - Proved very useful recently (CVE-2009-2692, CVE-2009-2698)
- Currently maintained by OSCT
  - A lot of improvements applied recently
  - New version designed and prototyped during summer
- OSCT operates a Pakiti server for EGEE
  - Information collected with SAM/Nagios probes (WNs)
  - Attention: only OSCT members are allowed access
Pakiti (cont.)
- Pakiti server: https://pakiti.cern.ch/
  - Data collected by production SAM probes (4500 hosts)
  - Any OSCT member can ask for access
  - Check the results and talk to sites to avoid miscommunication (PMB)
- Maintenance, development
  - New version prototyped
  - Sites installation possible
  - New release is now available to all from SourceForge
- Metrics for proper evaluations missing
  - Many vulnerable packages often do no harm
EGEE09: Security Monitoring
Pakiti Results
- 4500 machines (all ROCs represented)
- Only 135 sites fully patched
- Note that not all unpatched sites are vulnerable!
New Release Pakiti has been used internally by the OSCT to track CVE-2009-3547, CVE-2009-2692, CVE-2009-2698, etc. Pakiti 2.1 is now available to all from SourceForge http://pakiti.sourceforge.net/
Conclusion
SSC
- The challenge is run by the EGEE Operational Security Coordination Team (OSCT).
- The goal of the LCG/EGEE Security Service Challenge is to check that sufficient information is available to conduct an audit trace as part of an incident response, and that appropriate communication channels are in place.
- SSC4 will come soon!
Pakiti
- Open source, available from http://sourceforge.net/projects/pakiti/
- Security patching status monitoring
- Any site can run its own Pakiti server to monitor internal machines
- Do not forget to restart your hosts after a kernel update
Reference
- OSCT public webpage: http://osct.web.cern.ch/osct/
- Security Service Challenge: https://twiki.cern.ch/twiki/bin/view/lcg/lcgsecuritychallenge
- Incident Response Procedure: https://edms.cern.ch/file/428035/last_released/Incident_Response_Guide.pdf
- The SSC toolkit: https://twiki.cern.ch/twiki/bin/view/lcg/lcgsecuritychallenge
- Pakiti source: https://www.sf.net/projects/pakiti
Question