Axway API Management 7.5.x Cassandra Best practices. #axway


Adrian Fleming
5 years ago


Agenda
- Apache Cassandra - Overview
- Apache Cassandra - Focus on consistency level
- Apache Cassandra - Additional definitions
- Cassandra configuration for API Mgt - Overview
- Cassandra configuration for API Mgt - Single node deployment
- Cassandra configuration for API Mgt - 3 Node Cluster Deployment (Single Datacenter)
- Apache Cassandra - Reference
- Apache Cassandra - Tools
- Apache Cassandra - To go further in Cassandra understanding

Apache Cassandra - Overview

Cassandra overview: What is Cassandra?

Apache Cassandra is an open source, distributed and decentralized storage system (database) for managing very large amounts of structured data spread out across the world. It provides a highly available service with no single point of failure.

Cassandra is classified as a NoSQL database. The primary objectives of a NoSQL database are:
- simplicity of design
- horizontal scaling
- finer control over availability

Cassandra overview: Architecture

Cassandra is a peer-to-peer distributed system, and data is distributed among all the nodes in a cluster.
- All the nodes in a cluster play the same role. Each node is independent and at the same time interconnected with the other nodes.
- Each node in a cluster can accept read and write requests, regardless of where the data is actually located in the cluster.
- When a node goes down, read/write requests can be served from other nodes in the network.

[Figure: cluster with 4 Cassandra nodes (Node 1, Node 2, Node 3, Node 4)]

Cassandra overview: Write process

We consider the following data to illustrate the Cassandra features:

Primary key   Columns
Jim           Age: 36, Car: camaro, Gender: M
Carol         Age: 37, Car: subaru, Gender: F
Johnny        Age: 12, Gender: M
Suzy          Age: 10, Gender: F

Let's see how Cassandra stores this data.

Partitioner: it determines how data is distributed across the nodes in the cluster (including replicas). Basically, a partitioner is a function for deriving a token representing a row from its partition key, typically by hashing. Each row of data is then distributed across the cluster by the value of its token.

Several functions are available to distribute data. In our example, the MD5 hash function (128 bit) is applied to each primary key:

Primary key   Token
Jim           5e
Carol         A9a
Johnny        F4eb27cea7
Suzy          78b421309e

Cassandra overview: Write process (continued)

A range of hash values is associated with each node. The full token range is broken into ranges; in our example there are 4 nodes, so each node owns a quarter of it.

[Table: Start and End of the token range owned by Node 1, Node 2, Node 3 and Node 4]

Data is written to the node whose range contains the row's token:

Primary key   Token        Node
Jim           5e           Node 3
Carol         A9a          Node 4
Johnny        F4eb27cea7   Node 1
Suzy          78b421309e   Node 3
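The token-to-node mapping described above can be sketched in a few lines of Python. This is only an illustration: it assumes four equal ranges starting at 0, which is not how the range bounds on the slide were chosen, so the node numbers it prints will not necessarily match the slide. The names `token_for` and `node_for` are hypothetical helpers, not Cassandra APIs.

```python
import hashlib

RING_SIZE = 2 ** 128  # MD5 produces a 128-bit value


def token_for(partition_key: str) -> int:
    """Derive a token by hashing the partition key with MD5."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def node_for(token: int, node_count: int = 4) -> int:
    """Return the 1-based node whose equal-sized range contains the token.

    Assumption: ranges are equal and start at 0 (illustration only).
    """
    range_size = RING_SIZE // node_count
    return min(token // range_size, node_count - 1) + 1


for name in ("Jim", "Carol", "Johnny", "Suzy"):
    token = token_for(name)
    print(f"{name}: token={token:032x} -> Node {node_for(token)}")
```

The same key always hashes to the same token, which is what lets any node work out where a row lives without a central directory.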


Cassandra overview: Replication factor - Definition

The replication factor is the total number of replicas across the cluster. You define the replication factor for each data center. Generally you should set the replication factor greater than one, but no more than the number of nodes in the cluster.
- A replication factor of 1 means that there is only one copy of each row, on one node.
- A replication factor of 2 means two copies of each row, where each copy is on a different node.

All replicas are equally important; there is no primary or master replica.

Going back to our previous example, with a replication factor set to 3 there are 3 replicas of each row. Replication is made clockwise:

Node     Rows held
Node 1   Jim, Suzy, Carol, Johnny
Node 2   Carol, Johnny
Node 3   Jim, Suzy, Johnny
Node 4   Jim, Suzy, Carol
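The clockwise placement can be reproduced with a short sketch. It takes the first-replica node of each row from the write-process example in this deck and walks the ring from there; `replicas_for` is a hypothetical helper written for this illustration.

```python
from typing import Dict, List


def replicas_for(first_node: int, replication_factor: int, node_count: int) -> List[int]:
    """Walk the ring clockwise, starting at the node that owns the token."""
    if not 1 <= replication_factor <= node_count:
        raise ValueError("replication factor must be between 1 and the node count")
    return [(first_node - 1 + step) % node_count + 1
            for step in range(replication_factor)]


# First-replica nodes as given in the write-process example.
first_replica: Dict[str, int] = {"Jim": 3, "Carol": 4, "Johnny": 1, "Suzy": 3}

placement = {name: replicas_for(node, 3, 4) for name, node in first_replica.items()}
for name, nodes in placement.items():
    print(name, "->", nodes)
```

With 4 nodes and a replication factor of 3, every row is missing from exactly one node, which is why Node 1 ends up holding all four rows in this example.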

Cassandra overview: Reading data

Reading data is performed in parallel across a cluster. A user requests data from any node (which becomes that user's coordinator node), and the user's query is assembled from one or more nodes holding the necessary data.

Read process
- The client is aware of every node and can send a read request to any of them.
- In this example, Node 4 does not hold the requested data. Because Node 4 knows all the nodes of the cluster, it plays the role of coordinator.
- If a particular node holding the required data is down, Cassandra simply requests the data from another node holding a replicated copy of that data.

[Figure: the client sends a read to Node 4 (coordinator); Node 1 (primary) acknowledges in 5µs, Node 2 (copy of 1) in 12µs and Node 3 (copy of 1) in 500µs; Node 4 acknowledges the client in 12µs]

Cassandra overview: Read process (continued)

If Node 4 has crashed, the client sends its request to another node, here Node 1.

[Figure: the client reads from Node 1 (primary); Node 2 and Node 3 each hold a copy of Node 1's data; Node 4 is down]

Apache Cassandra - Focus on consistency level

Consistency: Definition

Consistency refers to how up-to-date and synchronized a row of Cassandra data is on all of its replicas. Cassandra extends the concept of eventual consistency by offering tunable consistency. For any given read or write operation, the client application decides how consistent the requested data must be.

Even at low consistency levels, Cassandra writes to all replicas of the partition key, even replicas in other data centers. The consistency level determines only the number of replicas that need to acknowledge the write success to the client application.

Typically, a client specifies a consistency level that is less than the replication factor specified by the keyspace. This practice ensures that the coordinating server node reports the write as successful even if some replicas are down or otherwise not responsive to the write.

Consistency (Write): Definition

The coordinator sends a write request to all replicas that own the row being written. As long as all replica nodes are up and available, they will get the write regardless of the consistency level specified by the client. The write consistency level determines how many replica nodes must respond with a success acknowledgment in order for the write to be considered successful. Success means that the data was written to the commit log and the memtable.

Level definition
- ONE: a write must be written to the commit log and memtable of at least one replica node.
- TWO: a write must be written to the commit log and memtable of at least two replica nodes.
- QUORUM (*): a write must be written to the commit log and memtable on a quorum of replica nodes across all data centers.
- LOCAL_QUORUM (*): strong consistency. A write must be written to the commit log and memtable on a quorum of replica nodes in the same data center as the coordinator node. Avoids the latency of inter-data center communication.
- ALL: a write must be written to the commit log and memtable on all replica nodes in the cluster for that partition.

Note (*): a quorum is Q = floor(N / 2) + 1, where N is the replication factor.

Consistency (Write): Example (1/3)

Let's go back to our previous example (4 nodes, replication factor = 3). The incoming write (1) goes to all 3 nodes (2) that own the requested row.

[Figure: the client sends "Write Johnny" to Node 4 (1), which forwards the write (2) to Node 1, Node 2 and Node 3; each of them stores Johnny]

Consistency (Write): Example (2/3)

If the write consistency level specified by the client is ONE (1), the first node to complete the write responds back to the coordinator (3), which then proxies the success message back to the client (4). A consistency level of ONE means that it is possible that 2 of the 3 replicas could miss the write if they happened to be down at the time the request was made. If a replica misses a write, Cassandra will make the row consistent later using one of its built-in repair mechanisms: hinted handoff, read repair, or anti-entropy node repair.

[Figure: write consistency level = ONE; Node 1 acknowledges in 5µs (3) and Node 4 acknowledges the client in 12µs (4), without waiting for Node 2 and Node 3]

Consistency (Write): Example (3/3)

If the write consistency level specified by the client is ALL (1), the write must be written to the commit log and memtable on all replica nodes in the cluster for that partition (2). The coordinator node must wait for the acknowledgement (3) of all the replicas required by the consistency level (3 in our example) before acknowledging the client (4).

[Figure: write consistency level = ALL; Node 1 acknowledges in 5µs, Node 2 in 12µs and Node 3 in 500µs (3); only then does Node 4 acknowledge the client in 12µs (4)]
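The effect of the consistency level on how long the coordinator waits can be illustrated with the acknowledgement times shown in the example figures (5µs, 12µs and 500µs, as read from the slides). This is a toy model, not Cassandra code; `required_acks` and `coordinator_wait_us` are hypothetical names.

```python
from typing import Dict

# Replica acknowledgement times in microseconds, as read from the example figures.
ACK_LATENCY_US: Dict[str, int] = {"Node 1": 5, "Node 2": 12, "Node 3": 500}


def required_acks(level: str, replication_factor: int) -> int:
    """Number of replica acknowledgements the coordinator must collect."""
    levels = {
        "ONE": 1,
        "TWO": 2,
        "QUORUM": replication_factor // 2 + 1,
        "ALL": replication_factor,
    }
    return levels[level]


def coordinator_wait_us(level: str, latencies: Dict[str, int]) -> int:
    """Time until enough replicas have acknowledged (every replica is a target)."""
    needed = required_acks(level, len(latencies))
    return sorted(latencies.values())[needed - 1]


for level in ("ONE", "QUORUM", "ALL"):
    print(level, coordinator_wait_us(level, ACK_LATENCY_US), "µs")
```

In this model the slow replica only delays the client at level ALL; at ONE or QUORUM the coordinator answers as soon as the faster replicas have responded.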

Consistency (Read): Definition

There are three types of read requests that a coordinator node can send to a replica:
- a direct read request
- a digest request
- a background read repair request

The coordinator node contacts one replica node with a direct read request. Then the coordinator sends a digest request to a number of replicas determined by the consistency level specified by the client. The digest request checks the data in the replica node to make sure it is up to date. Then the coordinator sends a digest request to all remaining replicas. If any replica nodes have out-of-date data, a background read repair request is sent. Read repair requests ensure that the requested row is made consistent on all replicas.

For a digest request, the coordinator first contacts the replicas specified by the consistency level, choosing the replicas that are currently responding the fastest. The nodes contacted respond with a digest of the requested data; if multiple nodes are contacted, the rows from each replica are compared in memory to see if they are consistent. If they are not, the coordinator uses the replica that has the most recent data (based on the timestamp) to forward the result back to the client.

To ensure that all replicas have the most recent version of frequently-read data, the coordinator also contacts and compares the data from all the remaining replicas that own the row in the background. If the replicas are inconsistent, the coordinator issues writes to the out-of-date replicas to update the row to the most recent values. This process is known as read repair.

Read repair can be configured per table for non-quorum consistency levels (using read_repair_chance), and is enabled by default.

Consistency (Read): Level definition

- ONE: returns a response from the closest replica, as determined by the snitch. By default, a read repair runs in the background to make the other replicas consistent.
- TWO: returns the most recent data from two of the closest replicas.
- QUORUM (*): returns the record after a quorum of replicas from all data centers has responded.
- LOCAL_QUORUM (*): returns the record after a quorum of replicas in the same data center as the coordinator node has responded. Avoids the latency of inter-data center communication.
- ALL: returns the record after all replicas have responded. The read operation fails if a replica does not respond.

Note (*): a quorum is Q = floor(N / 2) + 1, where N is the replication factor.

Consistency (Read): Example (1/2)

In a single data center cluster with a replication factor of 3 and a read consistency level of ONE (1), the closest replica for the given row is contacted to fulfill the read request (2). In the background, a read repair is potentially initiated.

[Figure: read consistency level = ONE; the client reads Johnny through Node 4 (1), which reads from Node 1 (2); Node 1 acknowledges in 5µs and Node 4 answers the client in 12µs; Node 2 and Node 3 also hold Johnny]

Consistency (Read): Example (2/2)

In a single data center cluster with a replication factor of 3 and a read consistency level of QUORUM (1), 2 of the 3 replicas for the given row must respond to fulfill the read request (2 and 3). If the contacted replicas have different versions of the row, the replica with the most recent version returns the requested data. In the background, the third replica is checked for consistency with the first two and, if needed, a read repair is initiated for the out-of-date replicas.

[Figure: read consistency level = QUORUM; Node 4 reads from Node 1, Node 2 and Node 3 (2); Node 1 acknowledges in 5µs and Node 2 in 12µs (3), after which Node 4 answers the client in 12µs (4); Node 3 acknowledges in 500µs]
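The QUORUM read with background read repair can be sketched as follows. The model is deliberately simplified: each replica stores one (timestamp, value) pair for the row, the newest timestamp wins, and the repair runs inline instead of in the background. `read_with_repair` and the sample values are hypothetical.

```python
from typing import Dict, List, Tuple

Version = Tuple[int, str]  # (write timestamp, value)


def read_with_repair(replicas: Dict[str, Version],
                     contacted: List[str]) -> Tuple[str, List[str]]:
    """Answer from the contacted replicas, then repair every stale replica.

    Returns the value sent to the client and the names of the repaired replicas.
    """
    # The client gets the most recent version among the contacted replicas.
    answer = max((replicas[name] for name in contacted), key=lambda v: v[0])
    # Read repair: compare all replicas and overwrite the out-of-date ones.
    latest = max(replicas.values(), key=lambda v: v[0])
    repaired = []
    for name in sorted(replicas):
        if replicas[name][0] < latest[0]:
            replicas[name] = latest
            repaired.append(name)
    return answer[1], repaired


# Hypothetical state: Node 3 missed the most recent write.
state = {"Node 1": (2, "Johnny v2"), "Node 2": (2, "Johnny v2"), "Node 3": (1, "Johnny v1")}
value, repaired = read_with_repair(state, contacted=["Node 1", "Node 2"])
print(value, repaired)
```

After the call, all three replicas hold the same version, which is the outcome the read repair step in the example aims for.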

Consistency: Summary

Using a replication factor of 3, a quorum is 2 nodes. The cluster can tolerate 1 replica down.
- 1 replica down: 2 nodes can still provide the data. The quorum can be achieved.
- 2 replicas down: only one node can still provide the data. The quorum cannot be achieved.

Using a replication factor of 6, a quorum is 4. The cluster can tolerate 2 replicas down.
- 2 replicas down: 4 nodes can still provide the data. The quorum can be achieved.
- 3 replicas down: only 3 nodes can still provide the data. The quorum cannot be achieved.
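The two cases in this summary follow directly from the quorum formula Q = N / 2 + 1 (integer division), which a short sketch can check. The function names are hypothetical.

```python
def quorum(replication_factor: int) -> int:
    """Q = N / 2 + 1, using integer division."""
    return replication_factor // 2 + 1


def tolerated_failures(replication_factor: int) -> int:
    """Replicas that can be down while a quorum is still reachable."""
    return replication_factor - quorum(replication_factor)


for rf in (3, 6):
    print(f"RF={rf}: quorum={quorum(rf)}, tolerated replicas down={tolerated_failures(rf)}")
```

Note that going from a replication factor of 3 to 4 raises the quorum to 3 without raising the number of tolerated failures, which is one reason odd replication factors are a common choice.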

Apache Cassandra - Additional definitions


Seed nodes: Definition

The seed node designation has no purpose other than bootstrapping the gossip process for new nodes joining the cluster. Cassandra nodes use this list of hosts to find each other and learn the topology of the ring.
- To prevent problems in gossip communications, use the same list of seed nodes for all nodes in a cluster.
- More than a single seed node per data center is recommended for fault tolerance.

[Example: seed list in cassandra.yaml]

Keyspace: Definition

A cluster is a container for keyspaces, typically a single keyspace. A keyspace is the outermost container for data in Cassandra, corresponding closely to a relational database. In the same way that a database is a container for tables, a keyspace is a container for a list of one or more column families.

A column family is roughly analogous to a table in the relational model, and is a container for a collection of rows. Each row contains ordered columns. Column families represent the structure of your data. Each keyspace has at least one and often many column families.

Like a relational database, a keyspace has a name and a set of attributes that define keyspace-wide behavior:
- Replication factor
- Replica placement strategy
- Column families

Cassandra configuration for API Mgt - Overview

Cassandra usage for API Management

- API Gateway KPS: custom KPS tables (*)
- API Gateway OAuth Token Store (*)
- API Manager (Client Registry, API Catalog, quotas)
- API Gateway Client Registry (Client Registry for API Keys and OAuth solution, when only API Gateway is used)

(*) Cassandra is optional for these; other data store options are available.

Cassandra configuration for API Management: Overview

Based on the previous chapters, the elements that must be configured are:

Element: each Cassandra node
  Configuration:
  - Node configuration: IP, port
  - Node configuration for the client (API GW/Manager) connection: rpc address and port
  - Node configuration related to the cluster: seed node (for the node to be aware of all other nodes in the cluster), and the listen address and port used for internode communication (used for replication)
  Configuration location / tool: cassandra.yaml

Element: keyspace
  Configuration: replication factor, replica placement strategy
  Configuration location / tool: Policy Studio

Element: Cassandra client (API Gateway/Manager)
  Configuration: read and write consistency level
  Configuration location / tool: Policy Studio

Best practices (1/2)

- Always build your Cassandra deployment pattern first.
- Configure JAVA_HOME to reference an independent Oracle JRE 1.8 installation. This completely removes the interdependency between API Gateway and Cassandra.
- Enable authentication and SSL between the Cassandra Hector client and the Cassandra server (multi-node cluster).
- Enable SSL communication between Cassandra servers (multi-node cluster).
- For optimal write performance, place the commit log on a separate disk partition (advice per DataStax).

Best practices (2/2)

- Configure Cassandra H/A before installing the product.
- The only supported configuration is strong consistency, with at minimum: 3 nodes, QUORUM as read and write consistency level, and a replication factor of 3.
- Synchronize time on all servers.
- Don't add a node to the cluster if the seed node is not started.
- Start Cassandra Node 1 (seed) first; after it has booted, start Cassandra Node 2; after that has booted, start Cassandra Node 3.

Cassandra configuration for API Mgt - Single node deployment

Single node

- A single node deployment is the simplest.
- Suitable for development environments ONLY.
- The use of LOCALHOST within cassandra.yaml and the Cassandra host (see Policy Studio server settings) is only suitable for a single node deployment.
- NOTE: IPv6 may need to be disabled.

[Figure: API Gateway (with or without API Manager) connected to Cassandra Node 1]

Single node configuration

cassandra.yaml

    seed_provider:
        # Addresses of hosts that are deemed contact points.
        # Cassandra nodes use this list of hosts to find each other and learn
        # the topology of the ring. You must change this if you are running
        # multiple nodes!
        - class_name: org.apache.cassandra.locator.SimpleSeedProvider
          parameters:
              # seeds is actually a comma-delimited list of addresses.
              # Ex: "<ip1>,<ip2>,<ip3>"
              - seeds: " "

listen_address: interface used for data connection (port 7000 or 7001)

    listen_address:
    # listen_interface: eth0

Note: choose listen_address or listen_interface, not both.

rpc_address: interface used for client connection (ports 9160 and 9042)

    rpc_address:
    # rpc_interface: eth1

Note: choose rpc_address or rpc_interface, not both.

Keyspace and client configuration
1. Register the host.
2. Designate the Admin Node Manager.
3. Configure the API Gateway instance.
4. Configure the Hector client via Policy Studio (Server Settings / Cassandra / Hosts). Once the configuration is deployed, the Cassandra keyspace is created.
5. Install APIMGR.

Cassandra configuration for API Mgt - 3 Node Cluster Deployment (Single Datacenter)

3 node cluster

[Figure: data center DC 1 with three Cassandra DB nodes (Node 1, Node 2, Node 3) and API Gateway instances, with or without API Manager, connected to the Cassandra nodes]

Each node configuration

cassandra.yaml

    seed_provider:
        # Addresses of hosts that are deemed contact points.
        # Cassandra nodes use this list of hosts to find each other and learn
        # the topology of the ring. You must change this if you are running
        # multiple nodes!
        - class_name: org.apache.cassandra.locator.SimpleSeedProvider
          parameters:
              # seeds is actually a comma-delimited list of addresses.
              # Ex: "<ip1>,<ip2>,<ip3>"
              - seeds:      (all Cassandra instances should reference the same seed)

listen_address: interface used for data connection (port 7000 or 7001)

    listen_address:      (change the address to correspond with the Cassandra instance)
    # listen_interface: eth0

Note: choose listen_address or listen_interface, not both.

rpc_address: interface used for client connection (ports 9160 and 9042)

    rpc_address:      (change the address to correspond with the Cassandra instance)
    # rpc_interface: eth1

Note: choose rpc_address or rpc_interface, not both.


Keyspace and client configuration: first API Gateway/Manager

1. Register the host.
2. Configure 1 API Gateway instance.
3. Configure the Hector client via Policy Studio. Once the configuration is deployed, the Cassandra keyspace is created.
4. Install APIMGR on the first gateway.
5. Update the read/write consistency level to QUORUM for the KPS collections via Policy Studio.
6. Register the remaining hosts and configure the API Gateway instances, but only AFTER the replication factor has been updated as described under "Update replication factor".

Update replication factor

Log in to Cassandra DB Node 1 and start cqlsh:

    # cd ../cassandra/bin
    # ./cqlsh <IP Address>
    or:
    # ./cqlsh

Find the keyspace:

    > DESCRIBE KEYSPACES;

Example: x8746e4a4_e423_40ac_95a7_ e4e5d_group_2

Execute the following command to alter the keyspace:

    > ALTER KEYSPACE x8746e4a4_e423_40ac_95a7_ e4e5d_group_2 WITH REPLICATION = {'class' : 'SimpleStrategy', 'replication_factor' : 3 };

Exit the cqlsh utility and run the following command on all Cassandra instances:

    # nodetool repair x8746e4a4_e423_40ac_95a7_ e4e5d_group_2
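When the replication factor has to be updated on several environments, it can help to generate the ALTER KEYSPACE statement rather than typing the quotes by hand. The sketch below only builds the statement text to paste into cqlsh; it does not connect to Cassandra. The helper name, the name check and the keyspace `example_group_2` are assumptions made for this illustration.

```python
import re


def alter_replication_cql(keyspace: str, replication_factor: int) -> str:
    """Build an ALTER KEYSPACE statement for SimpleStrategy (single data center)."""
    # Assumption: the keyspace name is an unquoted identifier.
    if not re.fullmatch(r"[A-Za-z][A-Za-z0-9_]*", keyspace):
        raise ValueError(f"unexpected keyspace name: {keyspace!r}")
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return (
        f"ALTER KEYSPACE {keyspace} WITH REPLICATION = "
        f"{{'class': 'SimpleStrategy', 'replication_factor': {replication_factor}}};"
    )


# 'example_group_2' is a hypothetical name; use the one listed by DESCRIBE KEYSPACES.
statement = alter_replication_cql("example_group_2", 3)
print(statement)
```

The nodetool repair step still has to be run on every Cassandra instance afterwards, as described above.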

Keyspace and client configuration: other API Gateway/Manager

Register the remaining hosts and configure the API Gateway instances.

Apache Cassandra - Reference

Reference: /tmp noexec (Cassandra only)

If /tmp is mounted with noexec, an error is generated when Cassandra is started.

Solution 1: create a tmp directory in /opt/cassandra, then edit the cassandra-env.sh file under the /opt/axway/cassandra/conf folder and add the following line:

    # vi cassandra-env.sh
    JVM_OPTS="$JVM_OPTS -Djava.io.tmpdir=$CASSANDRA_HOME/tmp"

Solution 2 (works for both API Gateway and Cassandra):

    sudo mount -o remount,exec /tmp

Reference: JAVA_HOME

    # tar xvzf jdk-8u101-linux-x64.tar.gz -C /opt/jdk/

Make sure the JAVA_HOME variable is available to all users by adding the following entries to the /etc/profile file:

    # sudo vi /etc/profile
    export JAVA_HOME=/opt/jdk/jdk1.8.0_101
    export PATH=$PATH:$JAVA_HOME/bin

Apache Cassandra - Tools

Tools

DBeaver - Linux

Apache Cassandra - To go further in Cassandra understanding

Cassandra components: Write process - Additional definitions

- Commit log: the commit log is a crash-recovery mechanism in Cassandra. Every write operation is written to the commit log.
- Memtable: a memtable is a memory-resident data structure. After the commit log, the data is written to the memtable. Sometimes, for a single column family, there will be multiple memtables.
- SSTable: a disk file to which the data is flushed from the memtable when its contents reach a threshold value.

Cassandra components: Write process (1/3)

1. The client sends the request to the node: UPDATE users SET firstname = 'Patrick' WHERE id = 'pmcfadin'
2. The write (row key and column: id = pmcfadin, firstname = Patrick) is appended to the commit log, which is stored on the server disk. This is very fast.

[Figure: client, Cassandra server and file system; the commit log holds id = pmcfadin, firstname = Patrick; the data directory is still empty]

Cassandra components: Write process (2/3)

3. The data is then put in a memtable stored in memory.
4. The write is acknowledged to the client.

[Figure: the memtable for table users holds id = pmcfadin, firstname = Patrick, lastname = McFadin; the commit log holds id = pmcfadin, firstname = Patrick]

Cassandra components: Write process (3/3)

5. The flush process writes the memtable data out to a file on disk called an SSTable. This is sequential I/O (sequential write), not random I/O. It is ordered by time.

[Figure: the memtable row (id = pmcfadin, firstname = Patrick, lastname = McFadin) is flushed to an SSTable in the data directory]
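The five steps of the write path can be mimicked with a toy in-memory model. It is a simplification under stated assumptions: the commit log and SSTables are plain Python lists rather than files, the flush threshold counts rows, and the flush runs inline instead of asynchronously. `ToyNode` and the row `jdoe` are hypothetical.

```python
class ToyNode:
    """Toy model of the write path: commit log, then memtable, then flush."""

    def __init__(self, flush_threshold: int = 2):
        self.commit_log = []   # append-only; stands in for the on-disk commit log
        self.memtable = {}     # in-memory rows keyed by row key
        self.sstables = []     # flushed snapshots; stand in for SSTable files
        self.flush_threshold = flush_threshold  # assumption: threshold counts rows

    def write(self, row_key: str, column: str, value: str) -> str:
        self.commit_log.append((row_key, column, value))        # step 2
        self.memtable.setdefault(row_key, {})[column] = value   # step 3
        if len(self.memtable) >= self.flush_threshold:
            self.flush()                                         # step 5 (inline here)
        return "ack"                                             # step 4

    def flush(self) -> None:
        """Write the memtable contents out as one new SSTable and empty it."""
        self.sstables.append({key: dict(cols) for key, cols in self.memtable.items()})
        self.memtable = {}


node = ToyNode(flush_threshold=2)
node.write("pmcfadin", "firstname", "Patrick")
node.write("pmcfadin", "lastname", "McFadin")
node.write("jdoe", "firstname", "John")  # hypothetical second row triggers the flush
print(len(node.commit_log), node.memtable, node.sstables)
```

The point of the model is the ordering: the write is durable in the commit log before it is visible in the memtable, and SSTables are only ever written as whole files.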

Replication strategies: Definition

A replication strategy determines the nodes where replicas are placed. Two replication strategies are available:
- SimpleStrategy: use only for a single data center. SimpleStrategy places the first replica on a node determined by the partitioner. Additional replicas are placed on the next nodes clockwise in the ring without considering topology (rack or data center location).
- NetworkTopologyStrategy: use when you have (or plan to have) your cluster deployed across multiple data centers. This strategy specifies how many replicas you want in each data center.

The strategy is configured per keyspace.

Snitch: Definition

A snitch determines which data centers and racks nodes belong to. Snitches inform Cassandra about the network topology so that requests are routed efficiently, and allow Cassandra to distribute replicas by grouping machines into data centers and racks. Specifically, the replication strategy places the replicas based on the information provided by the snitch. All nodes must return to the same rack and data center. Cassandra does its best not to have more than one replica on the same rack (which is not necessarily a physical location).

Examples
- SimpleSnitch (default): used only for single-data center deployments. It does not recognize data center or rack information and can be used only for single-data center deployments or a single zone in public clouds. It treats strategy order as proximity, which can improve cache locality when disabling read repair. Using a SimpleSnitch, you define the keyspace to use SimpleStrategy and specify a replication factor.
- GossipingPropertyFileSnitch: recommended for production. It automatically updates all nodes using gossip when new nodes are added. It uses the rack and data center information for the local node defined in the cassandra-rackdc.properties file and propagates this information to other nodes via gossip.

The snitch is referenced in cassandra.yaml:

    endpoint_snitch: SimpleSnitch

Internode communications (gossip): Definition

Cassandra uses a protocol called gossip to discover location and state information about the other nodes participating in a Cassandra cluster. Gossip is a peer-to-peer communication protocol in which nodes periodically exchange state information about themselves and about other nodes they know about.

The gossip process runs every second and exchanges state messages with up to three other nodes in the cluster. The nodes exchange information about themselves and about the other nodes that they have gossiped about, so all nodes quickly learn about all other nodes in the cluster.

A gossip message has a version associated with it, so that during a gossip exchange, older information is overwritten with the most current state for a particular node.
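The versioning rule in the last paragraph can be sketched as a merge of two state maps in which the higher version wins for each node. This only illustrates the rule; it is not the actual gossip message format, and `merge_gossip` and the sample states are hypothetical.

```python
from typing import Dict, Tuple

NodeState = Tuple[int, str]  # (version, status)


def merge_gossip(local: Dict[str, NodeState],
                 remote: Dict[str, NodeState]) -> Dict[str, NodeState]:
    """Keep, for every node, the state carrying the highest version."""
    merged = dict(local)
    for node, (version, status) in remote.items():
        if node not in merged or version > merged[node][0]:
            merged[node] = (version, status)
    return merged


local_view = {"Node 1": (5, "UP"), "Node 2": (3, "UP")}
remote_view = {"Node 2": (4, "DOWN"), "Node 3": (1, "UP")}
merged_view = merge_gossip(local_view, remote_view)
print(merged_view)
```

Here the local node learns about Node 3 for the first time and replaces its older state for Node 2, while its newer knowledge of Node 1 is left untouched.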

Thank you!
