Staying Out of the Swamp Perforce User Conference 2001 Richard E. Baum
Introduction Perforce runs well when given proper resources. CPU requirements are quite small. A server's I/O bandwidth is generally the major performance limitation. A server that is responding poorly may in fact be swamped with requests.
What We Will Cover How to tell if your server is swamped Tools you can use to evaluate system performance External factors that cause swamp How Perforce can cause swamp
How Do I Tell If I'm in the Swamp?
How Do I Tell If I'm in the Swamp? Check for obvious signs of a problem. Excessive CPU utilization Abnormal memory usage I/O bandwidth problems Use available operating system tools to analyze system status. Standard OS tools provide most of the functionality required.
CPU Bottlenecks Check to see if there are any free processor cycles. If there are no free cycles, see what is using the CPU time. It may not be Perforce. Unix: use ps -ef or ps axl Windows: Programs -> Administrative Tools -> Performance Monitor
CPU Bottlenecks See what Perforce processes are running Note the parent/child relationships Abbreviated process table output (Unix):
chinadoll:reb reb% ps -ef
UID PID PPID C STIME TTY TIME CMD
perforce 795 680 0 10:38:39 pts/4 0:00 ./p4d -p 1667 -r .
perforce 1909 795 7 11:59:25 pts/4 0:33 ./p4d -p 1667 -r .
perforce 1911 795 9 11:59:41 pts/4 0:09 ./p4d -p 1667 -r .
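Picking the parent/child relationships out of ps -ef output can be done mechanically. A minimal Python sketch, using the rows from the slide's example (the row format assumes standard Solaris ps -ef columns):

```python
# Sketch: group 'ps -ef' rows by parent PID to see which p4d children
# belong to the master server process (PPID 795 in the example above).
def children_of(ps_lines, parent_pid):
    """Return PIDs whose parent is parent_pid (fields: UID PID PPID ...)."""
    kids = []
    for line in ps_lines:
        fields = line.split()
        pid, ppid = int(fields[1]), int(fields[2])
        if ppid == parent_pid:
            kids.append(pid)
    return kids

rows = [
    "perforce 795 680 0 10:38:39 pts/4 0:00 ./p4d -p 1667 -r .",
    "perforce 1909 795 7 11:59:25 pts/4 0:33 ./p4d -p 1667 -r .",
    "perforce 1911 795 9 11:59:41 pts/4 0:09 ./p4d -p 1667 -r .",
]
print(children_of(rows, 795))   # [1909, 1911]
```

Here the two busy p4d processes (1909, 1911) are children of the master server, 795.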
Memory Bottlenecks How much RAM is in the system? How much swap space is defined? How much of each is available? Unix: use vmstat, swap -s, dmesg Windows: Use the Task Manager. (Ctrl-Alt-Del) -> Task Manager
Memory Bottlenecks vmstat: Swap column - free pages of swap Free column - free pages of RAM swap -s: How much swap is configured?
Memory Bottlenecks Is the system swapping? swap -s output of a Solaris system:
chinadoll:reb reb% swap -s
total: 22232k bytes allocated + 4520k reserved = 26752k used, 1534024k available
vmstat output of a Solaris system:
chinadoll:reb reb% vmstat 3
procs memory page disk faults cpu
r b w swap free re mf pi po fr de sr dd dd f0 s0 in sy cs us sy id
0 0 0 1532968 496240 118 315 0 2 2 0 0 0 0 0 0 328 512 154 1 5 94
0 0 0 1532968 496240 118 315 0 2 0 0 0 3 0 0 0 337 514 169 2 5 93
0 0 0 1532912 496312 468 334 28 10 2 2 0 0 28 0 0 0 386 865 173 21 8 71
0 0 0 1532760 508968 1537 315 113 84 0 0 0 0 95 0 0 0 516 1991 176 79 18 3
0 0 0 1532736 542296 1541 315 113 70 0 0 0 0 91 0 0 0 508 1909 157 81 18 1
0 0 0 1532728 557656 1402 315 102 08 0 0 0 0 89 0 0 0 505 1804 181 70 16 14
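Watching a trend in vmstat output is easier than eyeballing rows. A minimal sketch that pulls the swap and free columns (the 4th and 5th fields of a data row) so they can be tracked over time; the sample row is one of the rows above:

```python
# Sketch: extract the 'swap' and 'free' columns from a vmstat data row.
# Assumes the Solaris column layout shown above (swap = 4th field,
# free = 5th field, both in KB).
def parse_vmstat_row(row):
    fields = row.split()
    return {"swap_kb": int(fields[3]), "free_kb": int(fields[4])}

row = "0 0 0 1532968 496240 118 315 0 2 2 0 0 0 0 0 0 328 512 154 1 5 94"
print(parse_vmstat_row(row))   # {'swap_kb': 1532968, 'free_kb': 496240}
```

Feeding successive rows through this and plotting the results makes a falling free column, the early sign of swapping, obvious.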
Memory Bottlenecks How much memory is in the system? Partial dmesg output of a Solaris system:
Sep 9 21:45:42 chinadoll unix: [ID 389951 kern.info] mem = 655360K (0x28000000)
Sep 9 21:45:42 chinadoll unix: [ID 930857 kern.info] avail mem = 638574592
What We Know About the System The system has 640 MB of RAM (655360 KB) Between 496240 KB and 557656 KB of memory is free Percentage of user, system, and idle CPU time during the vmstat run Use OS tools to establish a baseline of what normal values are.
Detecting Problems on Unix vmstat output of a Solaris system that is swapping:
procs memory page disk faults cpu
r b w swap free re mf pi po fr de sr s1 s2 s3 in sy cs us sy id
0 0 0 644688 28240 0 1263 608 5 5 0 0 103 2 5 762 4440 714 26 40 33
1 0 0 532216 18940 7 2281 226 8 8 0 0 9 2 53 511 2892 600 14 62 24
6 0 0 528912 7652 15 2159 44 232 818 3072 302 7 3 8 611 2488 613 32 61 6
5 0 0 521648 7112 10 2369 94 222 330 1640 44 35 8 2 781 2569 742 30 70 0
6 0 0 525804 7136 21 2381 1310 672 1840 1500 599 24 5 21 637 2741 634 31 69 0
2 0 0 527992 7880 9 1349 3405 557 2066 1100 684 7 4 133 675 1457 643 16 45 39
1 0 0 530208 6780 13 1261 3553 1170 3580 1220 1079 32 4 165 725 1375 661 12 43 46
1 0 0 526996 7028 7 855 36 181 530 1576 149 10 2 4 569 4596 539 66 23 11
0 0 0 527548 8440 16 541 65 114 250 1408 55 14 1 1 505 2002 559 44 20 36
0 0 0 530664 11008 2 499 25 2 0 1032 0 16 6 1 383 1209 511 6 15 79
1 0 0 531460 35916 0 370 130 0 0 756 0 4 6 0 367 866 519 3 24 73
0 0 0 645488 123780 0 108 20 0 0 516 0 5 5 3 376 830 457 4 7 89
Detecting Problems on Unix Over 100 MB of swap space is in use CPU utilization peaks, with zero idle cycles, when the system starts to swap CPU utilization is lower later, when paging memory in/out
Windows Task Manager Performance screen is most useful overall. Beware of deceptive information! CPU utilization is displayed as a percentage of total.
Windows Task Manager 256 MB (262000 KB) of RAM Windows shows RAM + swap as the total 296192 KB in use This system is swapping!
Windows Task Manager The Perforce server is using 11% of the CPU Under 2 MB of RAM
I/O Problems Perforce can only move data as fast as the slowest point in the data path Two main areas for problems: Disk Network
I/O Problems Disk access: Fast SCSI drives can transfer 45 MB/sec RAID arrays can increase performance Multiple conflicting requests Disk errors
I/O Problems vmstat shows the number of disk operations per second (the s1, s2, and s3 columns) vmstat output of a Solaris system that is swapping:
procs memory page disk faults cpu
r b w swap free re mf pi po fr de sr s1 s2 s3 in sy cs us sy id
0 0 0 644688 28240 0 1263 608 5 5 0 0 103 2 5 762 4440 714 26 40 33
1 0 0 532216 18940 7 2281 226 8 8 0 0 9 2 53 511 2892 600 14 62 24
6 0 0 528912 7652 15 2159 44 232 818 3072 302 7 3 8 611 2488 613 32 61 6
5 0 0 521648 7112 10 2369 94 222 330 1640 44 35 8 2 781 2569 742 30 70 0
6 0 0 525804 7136 21 2381 1310 672 1840 1500 599 24 5 21 637 2741 634 31 69 0
2 0 0 527992 7880 9 1349 3405 557 2066 1100 684 7 4 133 675 1457 643 16 45 39
1 0 0 530208 6780 13 1261 3553 1170 3580 1220 1079 32 4 165 725 1375 661 12 43 46
1 0 0 526996 7028 7 855 36 181 530 1576 149 10 2 4 569 4596 539 66 23 11
0 0 0 527548 8440 16 541 65 114 250 1408 55 14 1 1 505 2002 559 44 20 36
0 0 0 530664 11008 2 499 25 2 0 1032 0 16 6 1 383 1209 511 6 15 79
1 0 0 531460 35916 0 370 130 0 0 756 0 4 6 0 367 866 519 3 24 73
0 0 0 645488 123780 0 108 20 0 0 516 0 5 5 3 376 830 457 4 7 89
I/O Problems For more I/O detail, try iostat Network: The network is not as fast as you think (10 Mb / 100 Mb / 1 Gb) Use of a hub on a 100 Mb network (half duplex) Single network interface Same interface for NAS and users Physical problem (loose cable)
I/O Problems Network problems: netstat output during a large file transfer. netstat output of a Solaris system:
chinadoll:reb reb% netstat -i -I le0 3
input le0 output input (Total) output
packets errs packets errs colls packets errs packets errs colls
0 0 0 0 0 1106096 0 740342 0 0
0 0 0 0 0 37 0 47 0 0
0 0 0 0 0 40 0 48 0 0
0 0 0 0 0 81 0 117 0 0
0 0 0 0 0 2397 0 4395 0 0
0 0 0 0 0 2776 0 4898 0 0
0 0 0 0 0 878 0 1708 0 0
0 0 0 0 0 776 0 1498 0 0
0 0 0 0 0 83 0 167 0 0
0 0 0 0 0 66 0 91 0 0
Windows Performance Monitor Allows monitoring of almost any part of the system. Monitor performance as it relates to: A particular process A particular thread The whole system
Windows Performance Monitor Example A Perforce operation that submitted a large binary file to the depot A scripted Perforce operation that performed many small submit operations From the chart we can see that the network traffic is off the scale during the large submit. During the numerous small operations, the Perforce process itself is performing much more work.
Windows Performance Monitor Example Number of bytes/sec processed by p4s Number of bytes/sec written to/read from the disk Number of bytes/sec handled by the network card
How Can Perforce Cause Server Swamp? There are many reasons your system can become overwhelmed Not all of these have to do with Perforce Your system may be used for other tasks as well.
Network Attached Storage Can allow faster disk access if properly configured Involves more complex configuration than local drives Opens additional areas for configuration problems
NAS Performance Issues Permission/locking problems: an NT service running as a local user without network access; changed permissions Locking across the network can be slow Network topology problems: lack of adequate NAS server bandwidth; saturated network; improperly configured connection
Confusing / Complex Client Maps Confusing client maps may generate unexpected results. Perforce tries to do what you've asked. In general, the last mapping wins. //depot1/... //client_name/... //depot2/... //client_name/subpath/...
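The "last mapping wins" rule can be illustrated with a toy resolver. This is not Perforce's actual mapping code, only a simplified model that handles '...'-suffixed prefix lines like the view above:

```python
# Toy model of "last mapping wins": walk the view in order and let every
# matching line overwrite the result, so the final match is the one kept.
# Only handles view lines ending in '...'; real Perforce views are richer.
def map_depot_path(view, depot_path):
    result = None
    for depot_line, client_line in view:
        depot_prefix = depot_line[:-3]    # strip the trailing '...'
        if depot_path.startswith(depot_prefix):
            result = client_line[:-3] + depot_path[len(depot_prefix):]
    return result

view = [("//depot1/...", "//client_name/..."),
        ("//depot2/...", "//client_name/subpath/...")]
print(map_depot_path(view, "//depot2/file.c"))
# //client_name/subpath/file.c
```

If a later line's depot side overlapped an earlier one, the later line would silently win, which is exactly how surprising results arise.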
Confusing / Complex Client Maps Client mappings do not use much memory unless multiple wildcards are used. Such views cause the server to do a lot of extra work mapping all combinations. //depot/.../subdir/... //client_name/.../subdir/...
Confusing / Complex Client Maps Mappings without a 1-to-1 relationship between client and server can be confusing. They can, given the proper conditions, also be harmful to Perforce servers.
Confusing / Complex Client Maps Example: With a new server create a new client with a default view and submit this file: //depot/subdir subdir/file Change the client view to this: //depot/... //client/... //depot/subdir/... //client/a/subdirectory/...
Confusing / Complex Client Maps Example: This command will then cause pre-2001.1 servers to enter an infinite loop: p4 dirs //client/* 2001.1 displays this message instead: Operation: user-dirs Operation 'user-dirs' failed. Client map too twisted for directory list.
Confusing / Complex Client Maps Perforce cannot check your client specifications for confusing, complex, or ambiguous mappings. You must do so by hand.
Confusing / Complex Client Maps Some ways to address these issues: Upgrade your server to release 2001.1. Use the server debug flags. Carefully analyze each client specification in your system. Educate your users.
Background Processes Other processes can consume resources even on dedicated Perforce servers. Backup utilities Virus scanners Don't run these on db.* metadata files!
Backup Utilities Saturate disk and/or network access Compression uses a lot of CPU time Can lock db.* files, which causes the server to fail, or can back up inconsistent metadata files
Virus Scanners Can cause problems even when not run on depot metadata. Many sites require scanning of versioned file tree. Monitor CPU utilization.
The Perforce Error Log The default error log messages help determine that client connections have failed. Default messages do not show what clients were doing or which clients had problems.
Perforce server error:
Date 2001/08/27 11:17:32:
TCP send failed. write: socket: WSAECONNRESET
The Perforce Error Log The server flag server=1 adds: Date and time Process ID of the server process Perforce user ID Client specification name IP address of the client The operation that the client invoked
The Perforce Error Log Sample output with server=1 set:
Perforce server info:
2001/08/26 16:20:16 pid 760 lesh@drumz 192.168.1.2 'user-opened'
Perforce server info:
2001/08/26 16:20:16 pid 760 lesh@drumz 192.168.1.2 'user-resolve -n'
Perforce server info:
2001/08/26 16:20:16 pid 760 lesh@drumz 192.168.1.2 'user-resolved'
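Since the server=1 lines have a fixed shape, the fields can be pulled out with a regular expression. A minimal sketch, assuming exactly the format shown above (date, time, pid, user@client, IP, quoted operation):

```python
import re

# Sketch: parse a 'server=1' info line into its fields. The pattern
# assumes the log format shown above; other log lines will not match.
LOG_RE = re.compile(
    r"(?P<date>\d{4}/\d{2}/\d{2}) (?P<time>[\d:]+) "
    r"pid (?P<pid>\d+) (?P<user>\w+)@(?P<client>\w+) "
    r"(?P<ip>[\d.]+) '(?P<op>[^']+)'"
)

line = "2001/08/26 16:20:16 pid 760 lesh@drumz 192.168.1.2 'user-opened'"
m = LOG_RE.match(line)
print(m.group("user"), m.group("client"), m.group("op"))
# lesh drumz user-opened
```

A small script built on this can answer "which user and which command were running when the server bogged down" directly from the log.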
The Perforce Error Log The 2001.1 server flag server=2 adds an additional completed message:
Perforce server info:
2001/08/26 16:50:16 pid 1460 weir@home 192.168.1.3 'user-changes -l -s submitted'
Perforce server info:
2001/08/26 16:50:16 pid 1460 completed
Perforce server info:
2001/08/26 16:50:25 pid 1500 lesh@drumz 192.168.1.2 'user-dirs -C -D //depot/*'
Perforce server info:
2001/08/26 16:50:58 pid 1308 fred@terrapin 192.168.1.7 'user-files //...'
Perforce server info:
2001/08/26 16:51:10 pid 1500 lesh@drumz 192.168.1.2 'user-verify //...'
Perforce server error:
Date 2001/08/26 16:52:34:
TCP send failed. write: socket: WSAECONNRESET
Can't invoke remote operation 'client-outputdata'.
TCP send failed. write: socket: WSAECONNRESET
Perforce server info:
2001/08/26 16:58:36 pid 1500 completed
The Perforce Error Log Some errors point directly at server problems. Librarian errors indicate that the server cannot read/write the versioned file tree.
Perforce server error:
Date 2001/08/27 08:52:55:
Operation: lbr-submitfile
Operation 'lbr-submitfile' failed.
Librarian checkin depot/file.txt failed.
lock on depot/file.txt,v failed
open for write: depot/,file.txt,: Access is denied.
Gigantic Operations User requests may seem innocuous but have large costs of execution. Cause disk, RAM, I/O strain Block other operations
Gigantic Operations Example: p4 submit Sends data to the server. Data is stored in a temporary location. Once the data is on the server, the compute phase begins and the appropriate locks are taken. Submit writes to a number of tables. During the write, a lock will block other operations that access these tables.
Gigantic Operations Imprecise operations with wildcards can use a lot of resources. Example: p4 files //depot/.../file.txt Causes a full table scan of db.rev Locks other users out of operations using db.rev while this occurs
Gigantic Operations Most large operations can be prevented from overwhelming your server with maxresults. Set at the group level. Restricts the maximum number of results returned by a query. Queries have interim steps.
Gigantic Operations Requests triggering maxresults show:
Request too large (over 10000); see 'p4 help maxresults'.
Large operations can usually be easily broken up into smaller ones:
p4 sync //depot/...
Becomes:
p4 sync //depot/dir1/...
p4 sync //depot/dir2/...
p4 sync //depot/dir3/...
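The splitting itself is easy to script. A minimal sketch that turns one depot-wide sync into per-directory syncs; the directory names are illustrative, not from a real depot:

```python
# Hypothetical helper: break 'p4 sync //depot/...' into one smaller
# sync command per top-level depot directory. The directory list would
# come from something like 'p4 dirs //depot/*' in practice.
def split_sync(depot_dirs):
    return ["p4 sync //depot/%s/..." % d for d in depot_dirs]

for cmd in split_sync(["dir1", "dir2", "dir3"]):
    print(cmd)
```

Each smaller sync holds its table locks for a shorter time, so other users interleave between them instead of waiting for one giant operation.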
Gigantic Operations Use of p4 verify is quite CPU intensive. Checks/generates the MD5 checksum of each revision of each file. Degrades performance while it runs.
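To see why verify is CPU-bound, consider what an MD5 pass over one file involves. A sketch of the same kind of streamed checksum (this is generic hashing code, not Perforce's implementation):

```python
import hashlib

def md5_of_file(path, blocksize=65536):
    """Stream a file through MD5 in chunks, the way a verify-style
    checksum pass would; memory stays flat but every byte costs CPU."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(blocksize), b""):
            digest.update(chunk)
    return digest.hexdigest().upper()
```

Multiply this cost by every revision of every file in the depot and the performance hit of a full p4 verify is unsurprising; schedule it for off-hours.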
Conclusions Perforce runs well when given proper resources Generate a baseline you can use when evaluating server performance Use available tools Educate users Call support!