IBM FileNet Content Manager support for IBM General Parallel File System (GPFS)

September 2014
IBM SWG Enterprise Content Management

IBM FileNet Content Manager and IBM GPFS
Copyright IBM Corporation 2014
Enterprise Content Management
www.ibm.com

No part of this document may be reproduced in any form by any means without prior written authorization of IBM. This document is provided as is without warranty of any kind, either expressed or implied, including, but not limited to, the implied warranty of merchantability or fitness for a particular purpose. This document is intended for informational purposes only. It could include technical inaccuracies or typographical errors. The information herein and any conclusions drawn from it are subject to change without notice. Many factors have contributed to the results described herein and IBM does not guarantee comparable results. Performance numbers will vary greatly depending upon system configuration. All data in this document pertains only to the specific test configuration and specific releases of the software described.
CONTENTS

Introduction
More about GPFS
GPFS configuration models
  SAN / direct access model (supported)
  NSD / shared disk model (supported)
  Share nothing model (not currently supported)
Configuration best practices
  Storage considerations
  NSD server considerations
  Tie Breaker Disks
  Optimal number of quorum nodes
GPFS performance considerations
  pagepool
  maxFilesToCache
  maxStatCache
  maxMBpS
  blocksize
  Separate disks for data and meta-data
Conclusion
References
Author Information
Introduction

IBM General Parallel File System (GPFS) is a POSIX compliant file management infrastructure that provides both outstanding performance and reliability. GPFS has been proven to perform in small as well as large clustered deployments, scaling to thousands of nodes hosting multi-petabyte file systems. GPFS also provides industry leading high availability for IBM FileNet Content Manager.

IBM FileNet Content Manager fully supports IBM GPFS for the file system tier of your enterprise ECM solution. By choosing the proper model, GPFS has the flexibility to meet any IT organization's needs. This document provides a high-level overview of GPFS and how FileNet Content Manager can leverage GPFS's capabilities to manage file system content efficiently and effectively, along with a brief discussion of how to configure GPFS for optimal performance and high availability with IBM FileNet Content Manager. Following the best practices outlined in this document will help ensure IBM FileNet Content Manager provides the performance and reliability expected from a superior file system technology.

More about GPFS

Advanced document management capabilities are achieved using the GPFS Information Lifecycle Management (ILM) toolset. The ability to define storage policies and create storage pools gives IT departments the flexibility to store content on multiple storage tiers. This improves application performance and saves money by moving less frequently accessed data to less costly storage. Robust clustering features, automatic replication and snapshot capabilities provide a high level of fault tolerance: a properly configured GPFS cluster can remain online even after suffering multiple failures, providing zero downtime. For more information regarding GPFS, please consult the IBM GPFS Knowledge Center or contact your IBM sales representative.
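As an illustration of the ILM capabilities described above, a tiered-storage policy might look like the following sketch. The pool names ('system', 'nearline'), the 30-day threshold, and the file system name (fs1) are assumptions chosen for illustration, not FileNet-specific recommendations.

```shell
# Hypothetical GPFS ILM policy file: place new files in the 'system' pool
# and migrate files not accessed for 30 days to a cheaper 'nearline' pool
# once 'system' reaches 80% full (draining it back down to 60%).
cat > policy.txt <<'EOF'
RULE 'default' SET POOL 'system'
RULE 'cold' MIGRATE FROM POOL 'system' THRESHOLD(80,60)
    TO POOL 'nearline'
    WHERE (DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME)) > 30
EOF

# Install the placement policy and run the migration rules against
# file system fs1 (names are illustrative).
mmchpolicy fs1 policy.txt
mmapplypolicy fs1 -P policy.txt
```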
GPFS configuration models

SAN / direct access model (supported)

In the SAN model, all GPFS nodes are designated as server nodes with direct access to the underlying storage devices. This is typically the best performing configuration, although it has drawbacks worth noting. The SAN model can be expensive and is less flexible than the NSD model, because every GPFS node requires the hardware to connect directly to the storage device. It may therefore be impractical for large GPFS clusters, and those customers should consider the NSD model instead.

The SAN model provides excellent high availability characteristics. When each node is designated as a server node and tie breaker disk(s) are in place, the cluster can remain online in the most trying of circumstances. Consider a 4 node cluster where each node has a direct fiber connection to an underlying SAN storage device. If 3 of the nodes lose SAN connectivity, quorum can still be maintained with one node and the use of a tie breaker disk. The nodes that lost connectivity to the storage device essentially become clients of the single remaining GPFS server and can retrieve data over the Ethernet communication channel until direct SAN connectivity is restored.

Figure 1: SAN model

NSD / shared disk model (supported)

The Network Shared Disk (NSD) model is considered the most flexible and scalable configuration. This model allows for GPFS servers and clients, where the servers have direct access to the storage device(s) and a client can access the data from any network addressable location. When using the NSD model, use the highest performing network option available between nodes; both Ethernet and InfiniBand are supported for the LAN fabric. A minimum of two GPFS servers with direct disk access is not only important for maintaining high availability; application performance may also be impacted negatively if the GPFS server(s) cannot meet the I/O demands of the GPFS clients.

Figure 2: NSD model

Share nothing model (not currently supported)

This model is not supported by IBM FileNet Content Manager. Because of the stateless nature of FileNet Content Manager, each Content Platform Engine (CPE) instance must have access to all underlying content. The share nothing model is primarily designed for Hadoop MapReduce and SAP HANA applications. Cloud deployments can also leverage this model, since each tenant represents a separate application environment with full data isolation between tenants, but that type of configuration is out of the scope of this document.
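For either supported model, the NSDs backing the file system are typically described in a stanza file and created with mmcrnsd. The device paths, node names, and failure groups below are assumptions for illustration; actual values depend on your environment.

```shell
# Hypothetical NSD stanza file: each NSD maps to a single LUN and lists
# the server nodes, in order of preference, that can serve it over the
# network if direct access is lost.
cat > nsd.stanza <<'EOF'
%nsd: device=/dev/sdb nsd=nsd1 servers=gpfs01,gpfs02 usage=dataAndMetadata failureGroup=1
%nsd: device=/dev/sdc nsd=nsd2 servers=gpfs02,gpfs01 usage=dataAndMetadata failureGroup=2
EOF

# Create the NSDs, then build a file system on them (names illustrative).
mmcrnsd -F nsd.stanza
mmcrfs fs1 -F nsd.stanza
```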
Configuration best practices

Storage considerations

It is important to ensure the SAN device is highly available, meaning there are no single points of failure. Highly available configurations include using multiple RAID controllers, configuring SAN failover, and defining primary and backup servers for each LUN. It is best practice for an NSD to be associated with a single LUN. Each LUN can have up to 8 GPFS servers defined.

NSD server considerations

For optimal performance, configure multiple I/O paths for each GPFS server; a minimum of 2 I/O paths is suggested for parallel I/O operations. For the SAN model, even though all GPFS servers have a direct SAN connection, you should still define servers for the NSDs so that if the fiber connection fails, access to the NSD from that I/O server can continue over the network. GPFS will always prefer a direct block device path over a network shared disk.

Tie Breaker Disks

Tie breaker disks should only be needed in small clusters. In a two node cluster, it is best to require both nodes for quorum and to create tie breaker disk(s) so that quorum can still be maintained if one node goes offline. GPFS best practices state it is best to create either 1 or 3 tie breaker disks. Large clusters do not typically benefit from tie breaker disks, since there are usually enough nodes in the cluster to maintain quorum without them.

Optimal number of quorum nodes

In general, it is suggested to have 3 to 5 quorum nodes and never more than 7, as more quorum nodes can have a negative impact on cluster performance during recovery without providing increased availability.

GPFS performance considerations

pagepool

Sufficient pagepool (pinned memory) is critical for optimal application performance. GPFS does not use the operating system file cache and instead relies on the pagepool, similar to a database bufferpool.
The pagepool can be increased dynamically but not reduced dynamically and can be set at the node level for increased flexibility.
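The quorum and pagepool recommendations above translate into a few mmchconfig settings. A hedged sketch follows; the NSD names, node class, and pagepool size are chosen purely for illustration:

```shell
# Designate tie breaker disks for a small cluster (best practice is 1 or
# 3 disks; the NSD names are illustrative).
mmchconfig tiebreakerDisks="nsd1;nsd2;nsd3"

# Increase the pagepool (pinned memory used as GPFS cache) on a specific
# set of nodes. The -i flag applies the change immediately; the pagepool
# can be raised dynamically but not lowered dynamically.
mmchconfig pagepool=4G -i -N cpeNodes
```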
maxFilesToCache

Controls the number of recently used files whose information GPFS can cache. General guidance states this setting should be large enough to handle the number of concurrently open files in addition to recently used files. Consider full text indexing of recently ingested content: FileNet Content Manager needs to access the content element once the index request is processed in order to perform text extraction, and if this content is found in the cache, overall indexing performance can improve dramatically. For scenarios where recently ingested content is not required for immediate retrieval, the default value should suffice.

maxStatCache

Defaults to 4 times maxFilesToCache, which should be sufficient for general purpose workloads. FileNet Content Manager workloads that are heavily oriented toward content retrieval may see improved performance by increasing this value. Note that if you have increased maxFilesToCache to a very large value, maxStatCache may end up unnecessarily high and consume memory that could be used by other applications.

maxMBpS

Recommended to be set to twice the throughput required by the system. In GPFS v3.5 the default is 2048, which should be sufficient for most applications.

blocksize

The default block size should be sufficient for most applications, although a larger block size may be considered if large content is being stored on the file system. For maximum performance, consider the GPFS block size when defining the RAID stripe size for the underlying LUN(s): the GPFS block size should match, or be a multiple of, the RAID stripe size.

Separate disks for data and meta-data

GPFS provides the ability to define a disk to hold data only, file system meta-data only, or both data and meta-data. For general purposes, it is not necessary to separate meta-data from data.
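The cache, throughput, block size, and disk usage settings discussed in this section can be sketched as follows. All values are illustrative starting points, not FileNet-specific recommendations:

```shell
# Raise the number of cached file entries for ingest-heavy workloads and
# the stat cache for retrieval-heavy ones (example values).
mmchconfig maxFilesToCache=100000
mmchconfig maxStatCache=400000

# Size maxMBpS at roughly twice the throughput required by the node.
mmchconfig maxMBpS=4096

# Block size is fixed at file system creation time; align it with the
# RAID stripe size of the underlying LUNs (1 MiB here is illustrative).
mmcrfs fs1 -F nsd.stanza -B 1M

# To dedicate a disk to file system meta-data only, set the usage field
# in its NSD stanza entry, for example:
#   %nsd: device=/dev/sdd nsd=nsd3 servers=gpfs01,gpfs02 usage=metadataOnly
```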
If your FileNet Content Manager deployment frequently performs file system meta-data operations, application performance may be improved by dedicating multiple disks to file system meta-data only.

Conclusion

IBM FileNet Content Manager fully supports IBM GPFS for the file system tier of your enterprise ECM solution. By choosing either the SAN or NSD model, GPFS has the flexibility to meet any IT organization's needs. Following the best practices outlined in this document will help ensure both models provide the performance and reliability expected from a superior file system technology.
References

1. IBM FileNet P8 software: http://www.ibm.com/software/ecm/filenet
2. IBM GPFS Knowledge Center: http://www.ibm.com/support/knowledgecenter/ssfkcn/gpfs_welcome.html
Author Information

Michael Bordash, ECM Server System Test Engineer

Contributors

Matthew Vest, ECM Server System Test & Performance Engineering Senior Manager
Dave Royer, ECM Performance Architect, Senior Software Engineer

Special thanks to the following members of the ECM CE development team: Tim Morgan

Disclaimer

The information in this publication is not intended as a substitute for the IBM FileNet product documentation provided by IBM. Please see www.ibm.com/software/ecm/filenet for more information about which publications are considered to be product documentation. References in this publication to IBM products, programs or services do not imply that IBM intends to make these available in all countries in which IBM operates. Any reference to an IBM product, program, or service is not intended to state or imply that only IBM's product, program, or service may be used. Any functionally equivalent program that does not infringe any of IBM's intellectual property rights may be used instead of the IBM product, program or service. Information in this publication was developed in conjunction with use of the equipment specified, and is limited in application to those specific hardware and software products and levels. The information contained in this publication was derived under specific operating and environmental conditions. While IBM has reviewed the information for accuracy under the given conditions, the results obtained in your operating environments may vary significantly. Accordingly, IBM does not provide any representations, assurances, guarantees, or warranties regarding performance. Any information about non-IBM ("vendor") products in this document has been supplied by the vendor and IBM assumes no responsibility for its accuracy or completeness.
IBM, IBM FileNet Content Manager, DB2, WebSphere, AIX, Rational, and Tivoli are trademarks or registered trademarks of IBM Corporation in the United States, other countries, or both. Java, and all Java-based trademarks and logos are trademarks or registered trademarks of Sun Microsystems, Inc. in the United States, other countries, or both. UNIX is a trademark of The Open Group. Windows is a registered trademark of Microsoft Corporation in the United States, other countries, or both. Other company, product, and service names may be trademarks or service marks of others. Copyright IBM Corporation 2012 Produced in the United States of America All Rights Reserved The e-business logo, the eserver logo, IBM, the IBM logo, IBM Directory Server, DB2, FileNet, FileNet Content Manager and WebSphere are trademarks of International Business Machines Corporation in the United States, other countries or both. The following are trademarks of other companies: Solaris, Java and all Java-based trademarks and logos are trademarks of Sun Microsystems, Inc. in the United States, other countries or both. Windows and Windows 2008 Enterprise Edition are trademarks of Microsoft Corporation in the United States and/or other countries Oracle 9i and all Oracle-based trademarks and logos are trademarks of the Oracle Corporation in the United States, other countries or both. Other company, product and service names may be trademarks or service marks of others. INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES THIS PAPER AS IS WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of express or implied warranties in certain transactions, therefore, this statement may not apply to you. Information in this paper as to the availability of products was believed accurate as of the time of publication. 
IBM cannot guarantee that identified products will continue to be made available by their suppliers. This information could include technical inaccuracies or typographical errors. Changes may be made periodically to the information herein; these changes may be incorporated in subsequent versions of the paper. IBM may make improvements and/or changes in the product(s) and/or the program(s) described in this paper at any time without notice. Any references in this document to non-IBM Web sites are provided for convenience only and do not in any manner serve as an endorsement of those Web sites. The materials at those Web sites are not part of the materials for this IBM product and use of those Web sites is at your own risk. IBM may have patents or pending patent applications covering subject matter described in this document. The furnishing of this document does not give you any license to these patents.
Copyright IBM Corporation 2014 IBM 3565 Harbor Boulevard Costa Mesa, CA 92626-1420 USA Printed in the USA 01-07 All Rights Reserved. IBM and the IBM logo are trademarks of IBM Corporation in the United States, other countries, or both. All other company or product names are registered trademarks or trademarks of their respective companies. The IBM home page on the Internet can be found at ibm.com