Staying Out of the Swamp


Staying Out of the Swamp Perforce User Conference 2001 Richard E. Baum

Introduction Perforce runs well when given proper resources. CPU requirements are quite small. A server's I/O bandwidth is generally the major performance limitation. A server that is responding poorly may in fact be swamped with requests.

What We Will Cover How to tell if your server is swamped Tools you can use to evaluate system performance External factors that cause swamp How Perforce can cause swamp

How Do I Tell If I'm in the Swamp?

How Do I Tell If I'm in the Swamp? Check for obvious signs of a problem. Excessive CPU utilization Abnormal memory usage I/O bandwidth problems Use available operating system tools to analyze system status. Standard OS tools provide most of the functionality required.

CPU Bottlenecks Check to see if there are any free processor cycles. If there are no free cycles, see what is using the CPU time. It may not be Perforce. Unix: use ps -ef or ps axl Windows: Programs -> Administrative Tools -> Performance Monitor
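On many modern Unix systems the process table can be sorted by CPU consumption directly; a minimal sketch, assuming GNU procps syntax (on Solaris-era systems without --sort, fall back to ps -ef and inspect the C column by hand):

```shell
# List the top ten CPU consumers, highest first.
ps -eo pid,ppid,pcpu,comm --sort=-pcpu | head -10
```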

CPU Bottlenecks See what Perforce processes are running Note the parent/child relationships Abbreviated process table output (Unix): chinadoll:reb reb% ps -ef UID PID PPID C STIME TTY TIME CMD perforce 795 680 0 10:38:39 pts/4 0:00./p4d -p 1667 -r. perforce 1909 795 7 11:59:25 pts/4 0:33./p4d -p 1667 -r. perforce 1911 795 9 11:59:41 pts/4 0:09./p4d -p 1667 -r.

Memory Bottlenecks How much RAM is in the system? How much swap space is defined? How much of each is available? Unix: use vmstat, swap -s, dmesg Windows: Use the Task Manager. (Ctrl-Alt-Del) -> Task Manager

Memory Bottlenecks vmstat: swap column - free pages of swap; free column - free pages of RAM swap -s: How much swap is configured?

Memory Bottlenecks Is the system swapping? swap -s output of a Solaris system: chinadoll:reb reb% swap -s total: 22232k bytes allocated + 4520k reserved = 26752k used, 1534024k available vmstat output of a Solaris system: chinadoll:reb reb% vmstat 3 procs memory page disk faults cpu r b w swap free re mf pi po fr de sr dd dd f0 s0 in sy cs us sy id 0 0 0 1532968 496240 118 315 0 2 2 0 0 0 0 0 0 328 512 154 1 5 94 0 0 0 1532968 496240 118 315 0 2 0 0 0 3 0 0 0 337 514 169 2 5 93 0 0 0 1532912 496312 468 334 28 10 2 2 0 0 28 0 0 0 386 865 173 21 8 71 0 0 0 1532760 508968 1537 315 113 84 0 0 0 0 95 0 0 0 516 1991 176 79 18 3 0 0 0 1532736 542296 1541 315 113 70 0 0 0 0 91 0 0 0 508 1909 157 81 18 1 0 0 0 1532728 557656 1402 315 102 08 0 0 0 0 89 0 0 0 505 1804 181 70 16 14

Memory Bottlenecks How much memory is in the system? Partial dmesg output of a Solaris system: Sep 9 21:45:42 chinadoll unix: [ID 389951 kern.info] mem = 655360K (0x28000000) Sep 9 21:45:42 chinadoll unix: [ID 930857 kern.info] avail mem = 638574592

What We Know About the System System has 640MB of RAM (655360kb) Between 496240kb and 557656kb of free memory. Percentage of user, system, and idle CPU time during vmstat run Use OS tools to establish a baseline of what normal values are.
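One way to establish that baseline is to record periodic snapshots before trouble starts; a minimal sketch (the log path is hypothetical, and the fallback branch only keeps the script runnable on hosts without vmstat):

```shell
#!/bin/sh
# Append a timestamped vmstat snapshot to a rolling baseline log.
LOG=./vmstat.baseline
date >> "$LOG"
vmstat 1 3 >> "$LOG" 2>&1 || echo "vmstat unavailable" >> "$LOG"
```

Run it from cron every 15 minutes or so; comparing a slow afternoon against the baseline makes abnormal paging or CPU numbers stand out immediately.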

Detecting Problems on Unix vmstat output of a Solaris system that is swapping: procs memory page disk faults cpu r b w swap free re mf pi po fr de sr s1 s2 s3 in sy cs us sy id 0 0 0 644688 28240 0 1263 608 5 5 0 0 103 2 5 762 4440 714 26 40 33 1 0 0 532216 18940 7 2281 226 8 8 0 0 9 2 53 511 2892 600 14 62 24 6 0 0 528912 7652 15 2159 44 232 818 3072 302 7 3 8 611 2488 613 32 61 6 5 0 0 521648 7112 10 2369 94 222 330 1640 44 35 8 2 781 2569 742 30 70 0 6 0 0 525804 7136 21 2381 1310 672 1840 1500 599 24 5 21 637 2741 634 31 69 0 2 0 0 527992 7880 9 1349 3405 557 2066 1100 684 7 4 133 675 1457 643 16 45 39 1 0 0 530208 6780 13 1261 3553 1170 3580 1220 1079 32 4 165 725 1375 661 12 43 46 1 0 0 526996 7028 7 855 36 181 530 1576 149 10 2 4 569 4596 539 66 23 11 0 0 0 527548 8440 16 541 65 114 250 1408 55 14 1 1 505 2002 559 44 20 36 0 0 0 530664 11008 2 499 25 2 0 1032 0 16 6 1 383 1209 511 6 15 79 1 0 0 531460 35916 0 370 130 0 0 756 0 4 6 0 367 866 519 3 24 73 0 0 0 645488 123780 0 108 20 0 0 516 0 5 5 3 376 830 457 4 7 89

Detecting Problems on Unix Over 100mb of swap space in use CPU utilization peaks when system starts to swap, with zero idle cycles CPU utilization lower later, when paging memory in/out
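A sustained nonzero scan rate (the sr column) is the telltale sign of memory pressure. A small awk filter can flag those samples automatically, shown here against two rows taken from the output above (sr is field 12 in this Solaris vmstat layout; column positions vary by platform):

```shell
# Print vmstat samples where the page scanner is active (sr > 0).
awk '$12 > 0 { print "scanning:", $0 }' <<'EOF'
6 0 0 528912 7652 15 2159 44 232 818 3072 302 7 3 8 611 2488 613 32 61 6
0 0 0 530664 11008 2 499 25 2 0 1032 0 16 6 1 383 1209 511 6 15 79
EOF
```

Live output can be piped through the same filter, e.g. vmstat 5 | awk '$12 > 0'.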

Windows Task Manager Performance screen is most useful overall. Beware of deceptive information! CPU utilization is displayed as a percentage of total.

Windows Task Manager 256mb (262000kb) RAM Windows shows RAM + swap as total RAM 296192kb in use This system is swapping!

Windows Task Manager Perforce server is using 11% of CPU Under 2mb of RAM

I/O Problems Perforce can only move data as fast as the slowest point in the data path Two main areas for problems: Disk Network

I/O Problems Disk access: Fast SCSI drives can transfer 45mb/sec RAID arrays can increase performance Multiple conflicting requests Disk errors

I/O Problems vmstat shows number of disk operations per second vmstat output of a Solaris system that is swapping: procs memory page disk faults cpu r b w swap free re mf pi po fr de sr s1 s2 s3 in sy cs us sy id 0 0 0 644688 28240 0 1263 608 5 5 0 0 103 2 5 762 4440 714 26 40 33 1 0 0 532216 18940 7 2281 226 8 8 0 0 9 2 53 511 2892 600 14 62 24 6 0 0 528912 7652 15 2159 44 232 818 3072 302 7 3 8 611 2488 613 32 61 6 5 0 0 521648 7112 10 2369 94 222 330 1640 44 35 8 2 781 2569 742 30 70 0 6 0 0 525804 7136 21 2381 1310 672 1840 1500 599 24 5 21 637 2741 634 31 69 0 2 0 0 527992 7880 9 1349 3405 557 2066 1100 684 7 4 133 675 1457 643 16 45 39 1 0 0 530208 6780 13 1261 3553 1170 3580 1220 1079 32 4 165 725 1375 661 12 43 46 1 0 0 526996 7028 7 855 36 181 530 1576 149 10 2 4 569 4596 539 66 23 11 0 0 0 527548 8440 16 541 65 114 250 1408 55 14 1 1 505 2002 559 44 20 36 0 0 0 530664 11008 2 499 25 2 0 1032 0 16 6 1 383 1209 511 6 15 79 1 0 0 531460 35916 0 370 130 0 0 756 0 4 6 0 367 866 519 3 24 73 0 0 0 645488 123780 0 108 20 0 0 516 0 5 5 3 376 830 457 4 7 89

I/O Problems For more I/O detail, try iostat. Network: The network is not as fast as you think (10mb/100mb/1gb) Use of a hub on a 100mb network (half duplex) Single network interface Same interface for NAS and users Physical problem (loose cable)
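A sketch of the iostat check (the -x extended-statistics flag exists on both Linux and Solaris iostat, though the column sets differ; the fallback message only keeps the command runnable on hosts without iostat):

```shell
# Two extended-statistics samples, one second apart. One disk pinned
# near 100% busy while others sit idle suggests splitting the Perforce
# database and depot files across spindles.
iostat -x 1 2 2>/dev/null || echo "iostat not available on this host"
```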

I/O Problems Network problems netstat output during large file transfer. netstat output of a Solaris system: chinadoll:reb reb% netstat -i -I le0 3 input le0 output input (Total) output packets errs packets errs colls packets errs packets errs colls 0 0 0 0 0 1106096 0 740342 0 0 0 0 0 0 0 37 0 47 0 0 0 0 0 0 0 40 0 48 0 0 0 0 0 0 0 81 0 117 0 0 0 0 0 0 0 2397 0 4395 0 0 0 0 0 0 0 2776 0 4898 0 0 0 0 0 0 0 878 0 1708 0 0 0 0 0 0 0 776 0 1498 0 0 0 0 0 0 0 83 0 167 0 0 0 0 0 0 0 66 0 91 0 0
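The figure worth computing from that output is the collision rate relative to output packets; sustained collisions above a few percent point at a saturated or half-duplex segment. A sketch using hypothetical sample counters (fields as in the netstat -i layout above: in-pkts, in-errs, out-pkts, out-errs, colls):

```shell
# Compute the collision rate from netstat-style interval counters.
awk 'NR > 1 && $3 > 0 { printf "colls/out-pkts = %.1f%%\n", 100 * $5 / $3 }' <<'EOF'
packets errs packets errs colls
2397 0 4395 0 219
EOF
```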

Windows Performance Monitor Allows monitoring of almost any part of the system. Monitor performance as it relates to: A particular process A particular thread The whole system

Windows Performance Monitor Example: A Perforce operation that submitted a large binary file to the depot; a scripted Perforce operation that performed many small submit operations. From the chart we can see that the network traffic is off the scale during the large submit. During the numerous small operations, the Perforce process itself is performing much more work.

Windows Performance Monitor Example counters: Number of bytes/sec processed by p4s Number of bytes/sec written to or read from the disk Number of bytes/sec handled by the network card

How Can Perforce Cause Server Swamp? There are many reasons your system can become overwhelmed. Not all of these have to do with Perforce. Your system may be used for other tasks as well.

Network Attached Storage Can allow faster disk access if properly configured Involves more complex configuration than local drives Opens additional areas for configuration problems

NAS Performance Issues Permission/locking problems NT service local user network access changed permissions Locking across the network can be slow Network topology problems Lack of adequate NAS server bandwidth Saturated network Improperly configured connection
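A crude way to check for inadequate NAS bandwidth is a timed bulk write, compared against the same test on local disk; a sketch (file name and size are arbitrary; point OUT at the mount under test; bs=1M is GNU dd syntax, older dd wants bs=1048576):

```shell
# Write 100MB and let dd report elapsed time and throughput.
OUT=./nas_throughput_test.bin
dd if=/dev/zero of="$OUT" bs=1M count=100 2>&1 | tail -2
rm -f "$OUT"
```

A large gap between the NAS and local-disk numbers implicates the network path rather than the drives.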

Confusing / Complex Client Maps Confusing client maps may generate unexpected results. Perforce tries to do what you've asked. In general, the last mapping wins. //depot1/... //client_name/... //depot2/... //client_name/subpath/...

Confusing / Complex Client Maps Client mappings do not use much memory unless multiple wildcards are used. Such views cause the server to do a lot of extra work mapping all combinations. //depot/.../subdir/... //client_name/.../subdir/...

Confusing / Complex Client Maps Mappings without a 1-to-1 relationship between client and server can be confusing. They can, given the proper conditions, also be harmful to Perforce servers.

Confusing / Complex Client Maps Example: With a new server, create a new client with a default view and submit this file: //depot/subdir/file Change the client view to this: //depot/... //client/... //depot/subdir/... //client/a/subdirectory/...

Confusing / Complex Client Maps Example: This command will then cause pre-2001.1 servers to enter an infinite loop: p4 dirs //client/* A 2001.1 server displays this message instead: Operation: user-dirs Operation 'user-dirs' failed. Client map too twisted for directory list.

Confusing / Complex Client Maps Perforce cannot check your client specifications for confusing, complex, or ambiguous mappings. You must do so by hand.

Confusing / Complex Client Maps Some ways to address these issues: Upgrade your server to release 2001.1. Use the server debug flags. Carefully analyze each client specification in your system. Educate your users.

Background Processes Other processes can consume resources even on dedicated Perforce servers. Backup utilities Virus scanners Don't run these on db.* metadata files!

Backup Utilities Saturate disk and/or network access. Compression uses a lot of CPU time. Locking db.* files can cause the server to fail, or the backup can capture inconsistent metadata files.

Virus Scanners Can cause problems even when not run on depot metadata. Many sites require scanning of versioned file tree. Monitor CPU utilization.

The Perforce Error Log The default error log messages help you see that client connections have failed. The default messages do not show what the clients were doing or which clients had problems. Perforce server error: Date 2001/08/27 11:17:32: TCP send failed. write: socket: WSAECONNRESET

The Perforce Error Log The server flag server=1 adds: Date and time Process ID of the server process Perforce user ID Client specification name IP address of the client The operation that the client invoked

The Perforce Error Log Sample output with server=1 set: Perforce server info: 2001/08/26 16:20:16 pid 760 lesh@drumz 192.168.1.2 'user-opened' Perforce server info: 2001/08/26 16:20:16 pid 760 lesh@drumz 192.168.1.2 'user-resolve -n' Perforce server info: 2001/08/26 16:20:16 pid 760 lesh@drumz 192.168.1.2 'user-resolved'

The Perforce Error Log The 2001.1 server flag server=2 adds an additional 'completed' message: Perforce server info: 2001/08/26 16:50:16 pid 1460 weir@home 192.168.1.3 'user-changes -l -s submitted' Perforce server info: 2001/08/26 16:50:16 pid 1460 completed Perforce server info: 2001/08/26 16:50:25 pid 1500 lesh@drumz 192.168.1.2 'user-dirs -C -D //depot/*' Perforce server info: 2001/08/26 16:50:58 pid 1308 fred@terrapin 192.168.1.7 'user-files //...' Perforce server info: 2001/08/26 16:51:10 pid 1500 lesh@drumz 192.168.1.2 'user-verify //...' Perforce server error: Date 2001/08/26 16:52:34: TCP send failed. write: socket: WSAECONNRESET Can't invoke remote operation 'client-outputdata'. TCP send failed. write: socket: WSAECONNRESET Perforce server info: 2001/08/26 16:58:36 pid 1500 completed
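With server=1 logging enabled, the log itself becomes a load-analysis tool. A sketch that tallies commands per user, run here against detail lines from the excerpt above (the same awk works on a real log once the 'Perforce server info:' header lines are stripped):

```shell
# Count logged commands per Perforce user, to see who generates load.
awk '$3 == "pid" { split($5, u, "@"); count[u[1]]++ }
     END { for (user in count) print user, count[user] }' <<'EOF'
2001/08/26 16:20:16 pid 760 lesh@drumz 192.168.1.2 'user-opened'
2001/08/26 16:20:16 pid 760 lesh@drumz 192.168.1.2 'user-resolve -n'
2001/08/26 16:50:58 pid 1308 fred@terrapin 192.168.1.7 'user-files //...'
EOF
```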

The Perforce Error Log Some errors point directly at server problems. Librarian errors indicate that the server cannot read or write the versioned file tree. Perforce server error: Date 2001/08/27 08:52:55: Operation: lbr-submitfile Operation 'lbr-submitfile' failed. Librarian checkin depot/file.txt failed. lock on depot/file.txt,v failed open for write: depot/,file.txt,: Access is denied.
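A quick audit for that class of failure is to look for archive files not owned by the account the server runs as; a sketch (the depot root and server account are hypothetical, and the mkdir only keeps the demo self-contained):

```shell
# List files under the depot root not owned by the server account.
DEPOT=${DEPOT:-./depot}        # hypothetical versioned-file root
SRVUSER=${SRVUSER:-$(id -un)}  # account p4d runs as
mkdir -p "$DEPOT"              # demo only: make sure the path exists
find "$DEPOT" ! -user "$SRVUSER" -print
```

Any path this prints is a candidate for 'Access is denied' librarian errors like the one above.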

Gigantic Operations User requests may seem innocuous but have large costs of execution. They cause disk, RAM, and I/O strain and block other operations.

Gigantic Operations Example: p4 submit Sends data to the server. Data is stored in a temporary location. Once the data is on the server, the compute phase begins and appropriate locks are taken. Submit writes to a number of tables. During the write, locks block other operations that access these tables.

Gigantic Operations Imprecise operations with wildcards can use a lot of resources. Example: p4 files //depot/.../file.txt Causes a full table scan of db.rev Locks other users out of operations using db.rev while this occurs.

Gigantic Operations Most large operations can be prevented from overwhelming your server with maxresults. Set at the group level. Restricts the maximum number of results returned by a query. Limits apply to a query's interim steps as well.

Gigantic Operations Requests triggering maxresults show: Request too large (over 10000); see 'p4 help maxresults'. Large operations can usually be easily broken up into smaller ones: p4 sync //depot/... Becomes: p4 sync //depot/dir1/... p4 sync //depot/dir2/... p4 sync //depot/dir3/...
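Generating those smaller commands can itself be scripted; a sketch that turns a directory listing (as p4 dirs //depot/* would produce) into one sync per top-level directory, demonstrated here on literal sample paths:

```shell
# Emit one "p4 sync" per top-level depot directory; pipe the output
# of a real "p4 dirs //depot/*" through the same awk to build the list.
awk '{ print "p4 sync " $0 "/..." }' <<'EOF'
//depot/dir1
//depot/dir2
//depot/dir3
EOF
```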

Gigantic Operations Use of p4 verify is quite CPU intensive. It checks or generates an MD5 checksum for each revision of each file. It degrades performance while it runs.
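One common mitigation is to run the checksum pass off-hours; a sketch of a wrapper you might schedule from cron on a quiet night (the paths are hypothetical, and the fallback branch only keeps the script runnable where no p4 client is installed):

```shell
#!/bin/sh
# Off-hours wrapper for "p4 verify"; schedule via cron, e.g.
#   0 2 * * 0 /usr/local/bin/p4verify.sh   (hypothetical path)
LOG=./p4verify.log
if command -v p4 >/dev/null 2>&1; then
    p4 verify //... >> "$LOG" 2>&1
else
    echo "p4 client not found; skipping verify" >> "$LOG"
fi
```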

Conclusions Perforce runs well when given proper resources Generate a baseline you can use when evaluating server performance Use available tools Educate users Call support!