App Engine: Datastore Introduction, Part 1
Another very useful course: https://www.udacity.com/course/developing-scalableapps-in-java--ud859
Topics covered in this lesson
- What is Datastore?
- Datastore and relational databases
- Scalability, reliability and performance
- Datastore internals
- Bigtable
- Datastore basic operations
- Entities, properties and keys
- Properties and value types
- Datastore APIs
What is Datastore?
Datastore is a database (persistent storage) for AppEngine.

                          | AppEngine                         | Traditional Web Apps
Web application framework | AppEngine (Java, Python, PHP, Go) | Perl/CGI, PHP, Ruby on Rails
Persistent storage        | Datastore                         | RDBMS: MySQL, MS SQL, Oracle
What is Datastore?
- Persistent storage for AppEngine.
- AppEngine is very scalable => many instances => a central server is needed to store data from all instances.
- Why not an RDB? Scalability!
Datastore and RDBMS

                            | Datastore                                                  | RDBMS
Query language flexibility  | SQL-like query language, limited to simple filter and sort | Full support of SQL: table JOIN, flexible filtering, subquery
Reliability and scalability | Highly scalable and reliable                               | Hard to scale with performance

Datastore offers Google-level scalability.
Problems of Scalability and Reliability
- Single instance
  - Performance limited by machine resources
  - Single point of failure
- Replication (copies) increases reliability
  - Consistency among instances
- Sharding (split among machines)
  - Lock control (transactions)
- [Shard = split a server across multiple machines]
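
To make the idea of sharding concrete, here is a minimal sketch in which each key is routed to one of several in-memory dicts, each dict standing in for one machine. The names `shard_for` and `ShardedStore` are hypothetical, and hash-based routing is only one possible choice; the slide does not say how any particular database assigns data to shards.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index with a stable hash (hypothetical helper)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardedStore:
    """Toy key-value store whose data is split across several dicts.

    Each dict stands in for one machine; this is an illustration only.
    """

    def __init__(self, num_shards: int) -> None:
        self.shards = [dict() for _ in range(num_shards)]

    def put(self, key: str, value: object) -> None:
        # The same key always hashes to the same shard.
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key: str) -> object:
        return self.shards[shard_for(key, len(self.shards))].get(key)


store = ShardedStore(num_shards=4)
for i in range(100):
    store.put(f"user:{i}", {"id": i})

# Every key lives on exactly one shard, so the shard sizes add up to 100.
sizes = [len(shard) for shard in store.shards]
print(sizes, sum(sizes))
```

A read or write that touches a single key only involves one shard. An operation that must change keys on different shards together is where the lock control mentioned on the slide becomes a problem.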
Strong Consistency and Eventual Consistency
- Strong consistency
  - Data is always consistent among all database instances.
  - Just after a write operation, or after a crash in the middle of a write operation -> all servers return the same results.
- Eventual consistency
  - It takes time until all data becomes consistent after a write (think of DNS as an example).
  - DNS is a distributed database system. An updated configuration for a domain is not reflected on all DNS servers immediately. For a certain period of time, some DNS servers return the old data.
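
The DNS example can be imitated with a small simulation. This is a sketch under assumptions: the classes `Replica` and `EventuallyConsistentStore`, the explicit `propagate` step, and the addresses are all made up for illustration and do not describe how DNS or Datastore replicate data internally.

```python
class Replica:
    """One database instance holding its own copy of the data."""

    def __init__(self) -> None:
        self.data = {}


class EventuallyConsistentStore:
    """Toy store: a write lands on one replica and reaches the others later."""

    def __init__(self, num_replicas: int) -> None:
        self.replicas = [Replica() for _ in range(num_replicas)]
        self.pending = []  # (key, value) pairs not yet copied everywhere

    def write(self, key, value) -> None:
        # Only the first replica sees the write immediately.
        self.replicas[0].data[key] = value
        self.pending.append((key, value))

    def read(self, key, replica_index: int):
        return self.replicas[replica_index].data.get(key)

    def propagate(self) -> None:
        # Stand-in for background replication: apply pending writes in order.
        for key, value in self.pending:
            for replica in self.replicas:
                replica.data[key] = value
        self.pending.clear()


store = EventuallyConsistentStore(num_replicas=3)
store.write("example.com", "10.0.0.1")
store.propagate()

store.write("example.com", "10.0.0.2")
stale = store.read("example.com", replica_index=2)  # replica 2 still has the old value
store.propagate()
fresh = store.read("example.com", replica_index=2)  # now every replica agrees
print(stale, fresh)  # prints: 10.0.0.1 10.0.0.2
```

A strongly consistent store would not return from `write` until every replica had the new value, which is exactly the waiting that slows down writes.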
Scalability, Reliability and Performance on RDB
- Replication and/or sharding for scalability.
- But strong consistency on an RDB slows write operations due to locks.
- The join operation is a bottleneck due to data shuffling.
- An RDB ensures strong consistency -> hard to ensure scalability.
- -> Datastore for AppEngine
Datastore Internals
- Based on Bigtable, which offers very high scalability.
- High availability through the High Replication Datastore (HRD)
  - Synchronous writes to multiple datacenters.
- Supports strong consistency among multiple rows.
What is Bigtable?
- Scalable, distributed, highly available and structured storage.
- Bigtable is not a database by itself (it doesn't support queries).
- Consistency
  - Strong consistency for a single row
  - Eventual consistency at the multi-row level
- Google usage
  - In production since April 2005
  - Web search, YouTube
Automatic Scale-out of Bigtable
[Figure not reproduced; label: tablet server]
Bigtable Data Model
- Key-value data storage
  - A row has a key and columns.
- Sorted by key
  - In lexical order
  - Enables range queries by the application
Bigtable Operations
- CRUD on a row
  - Create, Read, Update and Delete operations
  - Preserves single-row strong consistency (not across multiple rows).
- Scan by range of keys
  - But cannot search by column values
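
The data model and operations above can be sketched with a small in-memory table. This is a toy model, not the Bigtable API: the class `SortedTable`, its method names, and the sample keys are assumptions chosen only to show why keeping rows in lexical key order makes range scans cheap.

```python
import bisect


class SortedTable:
    """Toy table: rows kept in lexical key order, each row a dict of columns."""

    def __init__(self) -> None:
        self._keys = []  # sorted list of row keys
        self._rows = {}  # key -> {column name: value}

    def put(self, key: str, columns: dict) -> None:
        # Create or update a single row.
        if key not in self._rows:
            bisect.insort(self._keys, key)
        self._rows[key] = dict(columns)

    def get(self, key: str):
        # Read a single row by its key.
        return self._rows.get(key)

    def delete(self, key: str) -> None:
        if key in self._rows:
            del self._rows[key]
            self._keys.remove(key)

    def scan(self, start: str, end: str):
        """Return (key, row) pairs with start <= key < end, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        return [(k, self._rows[k]) for k in self._keys[lo:hi]]


table = SortedTable()
table.put("user/bob", {"age": 25})
table.put("user/alice", {"age": 30})
table.put("post/1", {"title": "hello"})

# Every key starting with "user/" sorts between "user/" and "user0",
# because "0" is the character right after "/" in ASCII.
users = [key for key, _ in table.scan("user/", "user0")]
print(users)  # prints: ['user/alice', 'user/bob']
```

Note what is missing: there is no way to ask for "all rows where age is 30" without reading every row, which matches the limitation that Bigtable cannot search by column values.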
- Scalability is based on Bigtable's automated sharding.
- Megastore supports transactions (strong consistency).
Property = the actual data you want to store
A property can have multiple values (multiple data for one property).
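
As a rough picture of an entity whose property holds several values, consider the sketch below. The `Entity` class, its fields, and the sample book are hypothetical stand-ins written for this note; they are not the Datastore API.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """Toy stand-in for a Datastore entity (not the real API)."""

    kind: str
    key_name: str
    properties: dict = field(default_factory=dict)

    def values_of(self, name: str) -> list:
        """Return a property's values as a list, whether it holds one or many."""
        value = self.properties.get(name)
        if value is None:
            return []
        return list(value) if isinstance(value, list) else [value]


book = Entity(
    kind="Book",
    key_name="book-1",
    properties={
        "title": "Example Title",           # single value
        "tags": ["database", "appengine"],  # multiple values for one property
    },
)

print(book.values_of("title"))  # prints: ['Example Title']
print(book.values_of("tags"))   # prints: ['database', 'appengine']
```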
App Engine: Datastore Query, Index and Transactions, Part 2
- Bigtable can scan on a key, not on a value!
- Index table on Bigtable: property name and value.
- Queries are implemented on Bigtable (without reading the actual values).
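
One way to picture the index table is to put the property name and value into the index row's key, so that a query by value becomes a scan over a range of keys. The sketch below is built on assumptions: the key layout (kind, property name, value, entity key), the separator, and the helper names `build_index` and `query_equal` are made up for illustration and are not taken from the actual Datastore implementation.

```python
import bisect

# Entities table: entity key -> properties (stands in for the rows holding real data).
entities = {
    "Person/1": {"name": "alice", "city": "Tokyo"},
    "Person/2": {"name": "bob", "city": "Osaka"},
    "Person/3": {"name": "carol", "city": "Tokyo"},
}

SEP = "\x00"  # separator that sorts before any printable character


def build_index(entities: dict) -> list:
    """Build sorted index keys of the form kind, property name, value, entity key."""
    index = []
    for entity_key, props in entities.items():
        kind = entity_key.split("/")[0]
        for name, value in props.items():
            index.append(SEP.join([kind, name, str(value), entity_key]))
    index.sort()
    return index


def query_equal(index: list, kind: str, name: str, value: str) -> list:
    """Find entity keys where property == value using only a key-range scan."""
    prefix = SEP.join([kind, name, value]) + SEP
    start = bisect.bisect_left(index, prefix)
    matches = []
    for index_key in index[start:]:
        if not index_key.startswith(prefix):
            break  # past the end of the matching key range
        matches.append(index_key.split(SEP)[-1])
    return matches


index = build_index(entities)
print(query_equal(index, "Person", "city", "Tokyo"))  # prints: ['Person/1', 'Person/3']
```

The query never opens the entities table: the matching entity keys come straight out of the index keys, which is the point of the slide.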
Caveats = limitations