Index. Sam R. Alapati 2018 S. R. Alapati, Expert Apache Cassandra Administration,


A
Accrual detection mechanism, 157 Administration tools cassandra-stress tool, 24 cassandra utility, 24 nodetool utility, 24 SSTable utilities, 24 Akka, 280 Allocation algorithm, ALTER KEYSPACE privilege, 433 Amazon EC2, 157 AMIs, 90 AWS, 91 configuring Cassandra cluster, create instance, 91 add storage, 92 AMI, 91 choose type, 91 configure details, 92 configure security group, 92 key pair, 93 review launch, 92 tag, 92 install Cassandra, Amazon Machine Image (AMI), 91 Amazon's DynamoDB, 17 Amazon Web Services (AWS), 91 Anti-entropy repair definition, 166 Merkle trees, 166 nodetool repair command, 167 ANY consistency level, 134 Apache Cassandra, 20 bloom filters, 21 build from source, 39 cassandra.yaml file, 42 check status, clearing cassandra data, 62 cluster, 27 compaction, 21 compression, 20 definition, 3 distributed data, 22 drawbacks immutable tables and mutation, 19 lack of support for transactions, 19 no indexes, 18 no joins, 18 querying data, 18 eventual consistency, 18 flags, highly scalable, 6 high performing database, 5 large environments, 21 node, 27 prerequisites Java, Python,

Apache Cassandra (cont.) service command, stopping, table rows, 26 user and group, 36 verify version, 63 write-heavy workloads, 22 writes, 26 Apache configuration Cassandra-specific plugins, installing and configuring NRPE, 367 Nagios server configuration, 368 Apache Kafka, 280 Apache Mesos, 280 Apache Spark, 109, 271 Cassandra database, configuration, 273 install, prerequisites, pyspark command, RDD, 275 spark-cassandra-connector, 275 spark-shell command, start up cluster, Apache Sqoop, 300 Atomic operation, 143 Atomicity, Consistency, Isolation, and Durability (ACID) atomicity, 129 batch method, 130 durability, 130 isolation, 130 periodic, 130 requirements, 14 Authentication, auto_bootstrap property, 83
B
Backing up data automatic snapshots, 287 incremental, 287 snapshot, SSTables, 283, 285 Basically available eventually consistent (BASE) system, 16 Batchlog, 143 Batch method, 130 Batch operations, single and multiple partition configuration, 144 INSERT statements, 145 single and multiple partitions, Behavior-driven development (BDD), 259 Coshxlabs code, 263 Gherkin syntax, 262 installing Cucumber, 261 installing Docker, 260 installing Docker-compose, 261 run with Cucumber, Big data, 7, 12 Bin directory, 53 Binary tarball, Bloom filters, 21, 30, 124, 379 B-tree, 117, 128 Bulk data COPY command cassandra table, import and export data, 296 sstableloader import, 300 load external data, 299 Bundler,

C
Cache directory, 45 Cardinality, 230 Cassandra, node, 150 CassandraAuthorizer, 431 Cassandra Cluster Manager (CCM) Apache Spark (see Apache Spark) BDD Coshxlabs code, 263 Gherkin syntax, 262 installing Cucumber, 261 installing Docker, 260 installing Docker-compose, 261 run with Cucumber, Cassandra cluster, 268 definition, 266 install, 267 SSTable, 270 status, 269 using cqlsh, 270 Cassandra data modeling avoid querying across partitions, 23 avoid updates and deletes, 23 duplicating data, 23 even data distribution, 23 vs. RDBMS, 31 Cassandra query language (CQL), 7 altering a table, 213 bin directory, 53 capture command, 48 cluster topology, 199 collection data types frozen values, 243 list type, 242 map type, 243 multiple/ addresses, 240 set type, collection set, list/map, column_definition, 202 command line options, 48 composite partition key, compound keys and clustering columns, conditional statement, connect, 65 copy command, 49 cqlshrc config file, create keyspace, 66 create table, 66 counter column to track values, 212 deleting rows and columns, table, 216 describe command, dropping a table, 214 expand command, find versions, 64 functions, aggregates and user types, 200 garbage collection, 219 getting help, indexing, Cassandra database dropping an index, 225 high and low cardinality columns, 223 partitions, 223 primary index, 221 secondary index, 221, 224 updated/deleted columns, 223 usage, 222 INSERT statement, insert test data, 67 keyspace altering, 194 creation, durable writes property,

Cassandra query language (CQL) (cont.) management, 190 multiple datacenters, 193 nodetool repair command, 195 qualifier, 197 relational database system, 189 removing, repairing, 197 replicas, data centers, 196 replication factor, materialized view creation, denormalization, 226 dropping, 226 updation, 227 options, 64 partition key, primary key, , query quarters table, 67 secondary index creation, 230 drawbacks, SASI, 231 SELECT statement built-in functions, collection column, 236 filtering, GROUP BY clause, 232 IN keyword, JSON format, 236 LIMIT N option, ORDER BY clause, selection clause, 228 syntax, 227 WHERE clause, start, 46 static columns, structures, 189 table creation, table options clustering order, 211 compact tables, 211 tombstones, table configuration, TIMESTAMP, 217 tombstones, deletion marker, 217 truncating a table, 215 TTL value, tuples, 243 UDFs, UPDATE statement, user-defined types, 244 zombie records and node repair, time zones, 46 tracing command, 52 Cassandra's access control matrix, 433 Cassandra-stress tool command, replication, compaction and compression options, running mixed workload, 393 multiple nodes, 395 read test, 393 write test, YAML-based profile, 395, 397 cassandra.yaml file, 43 Change data capture (CDC) logging, 239 Clearing cassandra data, 62 Clear the screen (CLS), 64 Cloud applications, 3 Cluster, 27, 150 Linux server (see Linux server) Cluster deployment Cassandra-stress, 69 choose CPUs,

choose storage, 70 compaction, 72 disk storage, 71 install PDSH, network considerations, 70 NFS, SAN and NAS, 71 RAM, 70 storage requirements, 72 usable disk capacity, 72 Cluster health check JConsole, JMX clients, 346 nodetool info command, 342, 343 nodetool status command, 342 thread pools statistics, Clustering key/column, 202 Cluster maintenance tasks flushing and draining data, handling data corruption, rebuilding indexes, 307 repairing data (nodetool repair), system.size_estimates table, 307 cluster_name parameter, 43 Column family databases, 11 Column-oriented database, 14 Commit log, 27, 45, 151 Commodity servers, 6 Compactions, 21, 30, 72, 125 Compaction strategies enabling and disabling, 404 global compaction parameters, configuration compaction_throughput_mb_per_sec parameter, 406 concurrent_compactors Property, 405 snapshot_before_compaction property, 405 sstable_preemptive_open_interval_in_mb Property, 406 LCS, logging, 408 setting, 407 STCS, testing, TWCS, Compare and set (CAS), 146 Compound primary key, 209 Conceptual modeling, 103 Concurrent-Mark Sweep (CMS), 75 conf directory, 54 Consistency, availability, and partition tolerance (CAP) theorem, 131 BASE system, 16 consistency, 17 principles availability, 15 consistency, 15 partition tolerance, 15 Container, 250 Coordinators, 28, 122 Counter cache, 119, Counter data type, 212 Cucumber, ,
D
Data caches, 20 Datacenter, 5, 27, 150 Datacenter-related maintenance tasks, Data corruption checking, 331 fixing, Data file directory,

Data manipulation language (DML), 189, 227 Data modeling cluster nodes, 106 data-driven vs. query-driven data, 100 ease of use, 102 partition, 106 physical, 105 Pro Cycling statistics, 103 queries, 107 read limitations, 109 reliability, 102 scalability, 102 sort order, 101 structured process, 102 write limitations, 108 DataStax, 13 courses, 4 development tools, 33 DSE, DataStax Enterprise (DSE), 13, DataStax, Inc., 32 Debian packages, 39, 41 Decentralized database, 13 Decommissioning datacenter, Disk storage, Distributed database system, 10, 16, 26 Docker broadcast_address, 258 Cassandra cluster, command line utility, 252 container, 250 dc property, 258 endpoint_snitch, 258 environment variables for, host server, 259 images, listen_address, 258 num_tokens, 258 rack property, 258 run cqlsh, 257 systemctl status command, 251 Ubuntu server installation, using volumes, 258 Document databases, 10 Dynamic ring participation, 181
E
Endpoint range vs. subrange repair, 169 Eventual consistency, 14, 18 anti-entropy, 111 consistency levels, 111 reconciliation, 111 repairing data, requirements, 110
F
Facebook, 4 Failure detection mechanism, 157 Fault tolerance, 4 Firewall, 45 port access, 79 ports configuration, 439 Flexible data model, 5 Full vs. incremental repair, 168
G
Garbage collection, 219 Gossip management, 29, 323, 325 Gossip protocols, 126, 149 accrual detection mechanism, 157 cluster_name,

failure detection mechanism, 157 listen_address, 156 process, seed nodes and, 156 seed_provider, 156 storage_port, 156 Graph databases, node, 12
H
Handling consistency, Handoff process, Hash ring, 31 Hinted handoff, 29, 31 consists of, 159 for datacenter, 160, 161 definition, 158 enable cluster, 158 max_hint_window_in_ms attribute, 162 sethintedhandoffthrottlekb property, 162 statushandoff command, 160 stores in directory, 160 truncatehints command, 161 write_request_timeout_in_ms property, 162
I
Incremental repairs, 167, 170, 172 files, 124
J
Javadoc directory, 55 Java garbage collection (GC), 75, Java heap size, 75 Java hugepages setting, 77 Java Management Extensions (JMX), 25, 346 Java Virtual Machine (JVM), 25 JConsole, 347 connection login page, 348 jmxsh, 350 Overview tab, 349 JMX authentication and authorization,
K
Kafka, Apache, 280 Key cache, 119, 381 Key nodetool maintenance commands decommissioning nodes cassandra.override_decommission=true Option, 312 command, data remove and restart, nodetool assassinate, 312, 314 Keyspaces, 28, 151 Key-value databases, 10
L
LeveledCompactionStrategy (LCS), lib directory, 54 Lightweight transactions, 108 cautions, 147 insert cyclist with ID number, set of operations, 147 Linearizable consistency, 146 Linux server disable swap, 76 disable zone_reclaim_mode, 73 Java heap size, 75 Java hugepages setting, 77 Java version,

Linux server (cont.) and Kernel settings, 73 NUMA systems, 73 PAM security settings, 75 setting shell limits, 76 synchronize clock and enable NTP, 73 TCP settings, 74 user resource limits, 74 listen_address property, 44 listen_address parameter, 82 listen_interface parameter, 44 Logback, 353 Logging configuration, 352 locations setting, 352 logback configure, 353 logback logging framework appender, benefits, 353 layout, 355 logback.xml file, logger class, 354 setting up log rotation, 359 nodetool setlogginglevel command, Logical modeling, 105 Log-structured merge-tree (LSM tree), 128 Log-structured storage engine, 13
M
Medium data, 7 Memtable, 27, 114 definition, 151 threshold, 416 memtable_cleanup_threshold parameter, 416 memtable_flush_writers parameter, 417 Merkle trees, 166 Mesos, Apache, 280 Minimal configuration properties, 43 MongoDB, 12, 17 Monitoring Cassandra LAMP stack installation, Nagios configuration file, 366 installation, plugins installation, 364 NRPE installation, Multi-node Cassandra cluster auto_bootstrap property, 83 broadcast_rpc_address property, 82 change node IP address, 87 client ports, 79 configuration, 81 datacenter configuration, endpoint_snitch option, firewall port access configuration, 79 initialize cluster with multiple datacenters, 84 inter-node ports, 79 IP addresses, 80 keyspaces, 89 listen_address parameter, 82 node is down, num_tokens property, 81 ports, 79 rack names, rpc_address property, 82 seed nodes, 80 seeds property, 82 select name for datacenter, startup process, stopping, 85 version mismatch, Murmur3 partitioning strategy,

N
Nagios build dependencies, 363 Cassandra cluster hosts, monitoring, 367 configuration, 366 installation, NRPE, 361 plugins, 361, 364 user and group, 363 Nagios Remote Plugin Executor (NRPE), 361 Network-attached storage (NAS), 71 Network information, 340 Network interface cards (NICs), 71 Network Time Protocol (NTP), 73 NetworkTopologyStrategy, 28 Node management adding, data center, cluster joining, dead node replacement, 319 decommission datacenter, moving, 320 removing node, 318 running node replacement, Node repair, definition, 158 hinted handoff (see Hinted handoff) Node restart method, 293, 295 Nodetool drain command, 326 nodetool proxyhistograms command, 337, 338 nodetool tablehistograms command, nodetool upgradesstables command, 414 Nodetool utility, 24, 53 nodetool info command, nodetool status command, 58 Normalization theory, 100 NoSQL databases column family, 11 document, 10 graph, key-value, 10 num_tokens property, 81
O
ONE consistency level, 133 Open source database, 4 OpsCenter, 33 Optimal storage, 70 Optimistic replication, 109 Oracle JDK, 6,
P
PAM security settings, 75 Parallel distributed shell (PDSH), Parallel repair, 168 Partitioner function, 29 Partitioner range repair, 168 Partitioners Murmur3Partitioner, 183 and partitioning strategies, 182 RandomPartitioner, 182 Partition key cache, 20 Paxos protocol, 19, Peer-to-peer architecture, 9 Peer-to-peer system, 149 Performance, Cassandra compression data ALTER TABLE statement, 414 configuration, efficacy testing, 415 modifying, compression algorithm, 414 turning off,

Performance, Cassandra (cont.) data caching Cassandra stores, 382 configuration, counter cache, global caching parameters, monitoring, tracing database operations, types, 381 JVM and garbage collection strategies, 422 stress testing cassandra (see Cassandra-stress tool) tracing to analyze performance managing tracing, 374 read request, write request, tuning bloom filters, 379 Phi Accrual Failure Detection, 155 Physical data modeling, 105 Probabilistic tracing strategy, Python,
Q
Query-driven data modeling, 100 Querying data, see Cassandra query language (CQL), SELECT statement Quorum calculate, 135 datacenter cluster, 135 EACH_QUORUM, 132 LOCAL_QUORUM, 133, 138 read consistency levels, 138 replication factor, write consistency levels, 133 Quorum reads and writes, 110
R
Rack, 150 Random selection algorithm, 178 Rapid read protection, 141 consistency levels, 164 speculative_retry property, 165 supports, 165 Read consistency levels ALL, 137 cluster with two datacenters, 142 requests, 137 single datacenter, 142 Reading data coordinator, 127 filter command, 127 gossip protocol, 126 request, 127 write data affects, 128 Read repair definition, 162 read_repair_chance property, 163 Read requests direct, 140 repair request, 140 replica node, 140 Referential integrity, 101 Relational database management system (RDBMS), 7, 31 Relational databases data locality, 9 lower cost, 9 no failover, 9 peer-to-peer architecture, 9 RDBMSs and big data, 7 reliability, 9 sharding, 8 third normal form, 8

Repairing data, Replica placement strategy, 28, 152 Replication strategy definition, 179 group, 180 NetworkTopologyStrategy, 180 SimpleStrategy, 180 switch keyspace, 182 Resilient Distributed Dataset (RDDs), 275 Restoring data commit log manual archive, 295 point-in-time recovery, 295 restore, 295 cycling keyspace, 289, 291, 293 node restart method, 293, 295 run repair, 293 set location, 296 set timestamp, 296 from snapshot, 288 using sstableloader, 293 Role-based access control cycling_admin, 436 granting permissions, 435 login accounts, object permissions, 436 permissions command, 437, 438 view, permissions granted, 438 Row caching, 20, 119, 381
S
SAN, 71 Secondary indexes, 18, 109 Security configuring authentication, firewall ports configuration, 439 JMX authentication and authorization, 451, 453 roles creation administrator privileges, 424 AllowAllAuthenticator, 425 assigning permissions, login accounts, configure authentication, logging, 424 password changing, 429 properties, 427 superuser account, 430 SSL encryption (see SSL encryption) Seed nodes, 80, 156 Secondary indexes, 105 Sequential vs. parallel repair, 168 Serial consistency settings, 139 Sharding, 8 Simple Build Tool (SBT), 272 Single-token architecture, 174 SizeTieredCompactionStrategy, SMACK stack, Snapshot before compacting data, 287 copy data, 289, 291, 293 list node, restoring data, 288 run repair, 293 using sstableloader, 293 schema, Snitches, 30 cassandra-rackdc.properties, cassandra-topology.properties, 187 CloudstackSnitch, 186 dynamic by default, 185 Ec2MultiRegionSnitch, 186 Ec2Snitch, 186 GoogleCloudSnitch,

Snitches (cont.) GossipingPropertyFileSnitch, 185 PropertyFileSnitch, 186 RackInferringSnitch, 186 SimpleSnitch, 185 Snitch serves, Solid-state drives (SSDs), 70, 129 Sorted string table (SSTable), 27 Speculative retrying, Sqoop, Apache, 300 SSH tools, 24 SSL encryption client encryption, enabling, inter-node encryption, enabling, 448 java cryptography extension files installation, 440 server certificates CA to Keystore, 446 certificate authority, creation, cluster configuration, keystores, 447 creation, nodes, server truststore, 447 signed certificates, keystore, 446 signing, CA's public key, 445 signing requests, 444 SSTable Attached Secondary (SASI), 222, 231 SSTables, 19, 21, 151 caching data, 119 compaction operation, data file, 117 data structures, for durable storage, 117 Stores data four-node, 152 hash values, Strict consistency, 110 Subrange repair, 169 Switching snitches, 321, 323 Symbolic link, 42
T
Table statistics, TCP settings, 74 Test-driven development (TDD), 259 Third normal form, 8 Time-to-live (TTL), 212, TimeWindowCompactionStrategy (TWCS), Tokens, Tombstones, 19, 21, 108, 217 Tools directory, 54 Tracing data, 372 Tunable consistency, 14, 29, 100, 109, 111, 131
U
Ubuntu server, Universally unique identifier (UUID), 205 User-defined aggregates (UDAs), 189, 247 User-defined functions (UDFs), 189, User-defined types (UDTs), 189, 244
V
Vagrant tool, 260 Virtual machine (VM), 250 Virtual nodes (vnodes), 81 disable, 179 num_tokens parameter, 178 rebalance data, 173 ring with, 176 tokens,

W, X
Write amplification (WA), 129 Write consistency ALL consistency level, 132 ANY consistency level, 134 default, 132 hinted handoff, 132 LOCAL_ONE consistency level, 134, 138 ONE consistency level, 133, 138 quorum-related levels, , 138 serial consistency settings, 139 TWO and THREE consistency levels, 133, 138 Writes data bloom filters, 124 commit log binary files, 123 to protect changes, space threshold, 115 index file, 124 internal operations, 122 memtable configure cleanup threshold, 115 configure flushing data, 114 database flushes, 113 durability, 114 flushing to disk, 123 nodetool drain command, 116 nodetool flush command, 116 request flow, 119, 121 role of hints, 122 SSTables (see SSTables)
Y
YAML-based profile file, 395
Z
Zombie,
