Review of Morphus
Feysal Ibrahim
Computer Science and Engineering, Ohio State University
Columbus, Ohio
ibrahim.71@osu.edu

Abstract

Relational databases dominated the market for the last 20 years; businesses and application developers were happy with them until industries started accumulating huge volumes of data. Data storage and data traffic became serious problems for SQL database systems, which demanded ever larger servers. Since a single server could not grow large enough to solve these problems, the industry turned to non-relational (NoSQL) database systems, which support multi-server deployments. NoSQL systems solved the storage and traffic problems, but reconfiguration became a major issue. In this short paper I summarize the paper Morphus: Supporting Online Reconfiguration in Sharded NoSQL Systems.

1. Introduction

NoSQL database systems such as MongoDB use several servers to avoid data-traffic bottlenecks and to ease the pain of system scaling; they scale out by adding servers as needed, which solves the data storage problem. MongoDB uses three types of servers. The first type, the mongod servers, store the data in chunks; they are organized into replica sets, where every server in a set holds identical data, one acting as the primary and the others as secondaries. CRUD operations always execute on the primary, which then propagates updates to the secondaries via oplog replay. The second type, the config servers, store the database configuration. The third type, the mongos front-end servers, handle all query operations: a mongos receives a query request and matches it to the right mongod servers using the metadata held in the config servers [1].

NoSQL database systems have disadvantages; as the article notes, they have problems with reconfiguration operations.
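The routing role described above can be sketched with a toy example: a mongos-style front end looks up which chunk covers a given shard-key value and forwards the request to the replica set owning that chunk. The chunk map below is a hypothetical simplification, not MongoDB's actual config-server metadata format.

```python
import bisect

# Hypothetical chunk map, as a config server might conceptually store it:
# each chunk covers a half-open shard-key range and lives on one replica set.
# (Illustrative values only.)
chunk_bounds = [0, 100, 200, 300]            # lower bound of each chunk's key range
chunk_owners = ["rs0", "rs1", "rs2", "rs3"]  # replica set holding each chunk

def route(shard_key_value):
    """Return the replica set whose chunk contains the key (the mongos's job)."""
    i = bisect.bisect_right(chunk_bounds, shard_key_value) - 1
    return chunk_owners[i]

print(route(42))    # key 42 falls in chunk [0, 100), owned by rs0
print(route(250))   # key 250 falls in chunk [200, 300), owned by rs2
```

Changing the shard key invalidates this entire map, which is why reconfiguration is so disruptive.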
For example, suppose the business owners want to change the shard key or the chunk size. NoSQL database systems like MongoDB offer two manual ways to perform such a reconfiguration: 1) export the database and re-import it under the new configuration, which causes a period of system shutdown, or 2) create a new cluster of servers with the new configuration and migrate the data over from the old cluster. The problem with the second approach is that no read or write operations can be served during the migration. NoSQL database systems therefore lack availability and do not support concurrent reads and writes during reconfiguration [1].

2. System Design

A system called Morphus was created to solve MongoDB's reconfiguration problems by automating reconfiguration, i.e., performing it online. In the first phase of an online reconfiguration, Morphus asks the mongod servers to create empty chunks and assign the new shard-key ranges to those chunks, which has no effect on read and write operations. Morphus then isolates one secondary server from each mongod replica set and uses these secondaries to carry out the data transfer. During this isolation phase, Morphus mutes the isolated secondary's oplog replay (the log that replays the write operations performed on the primary) to suppress writes on it, and records a timestamp marking where the replay must later resume. In the third phase, Morphus decides where to place the new chunks, using either a greedy algorithm or a load-balanced assignment via bipartite matching, and the data transfer itself happens in the execution phase. At the end of the execution phase, oplog replay is switched back on for the isolated secondaries, and all writes that occurred during the reconfiguration are replayed onto them. At this point the isolated servers are up to date under the new shard key, so Morphus promotes them to primaries, while the old primaries and the remaining secondaries are updated with the new chunks and the new shard key [1].

3. Network Awareness

The chunk-based data migration approach that Morphus introduced had two problems: 1) the volume of data to transfer and 2) the time the transfer takes. To solve them, a weighted fair sharing (WFS) transfer approach is used: let D be the amount of data to send from a source to a destination, L the latency of that transfer, and X = D × L the resulting weight. The weight decides how many sockets are assigned to each flow, and this WFS approach addresses both problems. Morphus also assumes that all the servers sit in a single datacenter; the question is what happens when the servers are spread across different datacenters.
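A minimal sketch of the weighted fair sharing idea follows, assuming a fixed socket budget split in proportion to each flow's weight X = D × L. The flow names, sizes, and latencies are illustrative assumptions, not values from the paper.

```python
# Weighted fair sharing (WFS) sketch: each migration flow gets sockets in
# proportion to its weight X = D * L (data volume times transfer latency).
# All values below are hypothetical.

flows = {              # flow -> (D: MB to move, L: estimated latency factor)
    "rs0->rs1": (400, 2.0),
    "rs0->rs2": (100, 1.0),
    "rs1->rs2": (100, 1.0),
}
TOTAL_SOCKETS = 12

weights = {f: d * l for f, (d, l) in flows.items()}   # X = D x L per flow
total = sum(weights.values())

# Proportional share of the socket pool, with at least one socket per flow
# so no migration flow starves.
sockets = {f: max(1, round(TOTAL_SOCKETS * x / total)) for f, x in weights.items()}
print(sockets)   # the heaviest flow receives most of the sockets
```

Giving more sockets to large, slow flows shortens the longest transfers, which is what bounds the total migration time.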
In that case, servers within the same datacenter are given a datacenter tag, which identifies the isolated secondary servers belonging to that datacenter, so that all the isolated secondaries in the same datacenter are reconfigured together [1].

4. Related Work

Morphus: Morphus replaces the old shard key with the new one by using, in each replica set, one of the secondary (slave) servers that holds the same data as the primary. These secondaries are isolated while the other servers keep operating normally under the old shard key. The isolated servers' shard key is changed during the reconfiguration, and they then catch up on the writes that happened in the meantime. These servers become the new primaries, the other primary and secondary servers are updated with the new shard key, and the old primaries become secondaries [1].

Transactional Auto Scaler: Elastic Scaling of Replicated In-Memory Transactional Data Grids: Elastic scaling has become crucial in cloud computing; not knowing how many users might visit a web application and commit transactions is a real risk. Many product-based applications run an auto-scaler on top of their in-memory transactional data grids to scale up or down with load trends. These auto-scalers add nodes as the number of transactions grows, but the ability to scale is limited by contention, as more users try to process the same data and network traffic increases. Transactional Auto Scaler (TAS) precisely predicts the performance an application will achieve at a given scale. TAS uses a black-box machine-learning model to predict how network latency changes when the system is scaled up or down, and an analytic model to predict data contention when different users access the same data and to capture CPU contention when multiple processes run. The analytic model also covers two weaknesses of machine learning: 1) forecasting situations for which no training data exists (limited extrapolation power) and 2) a long training phase [3].

Elasticity in Cloud Computing: In cloud computing, a system has many resources, but only some of them are needed at any moment, depending on data size and the number of users. Elasticity lets the system automatically adapt its capacity to the workload over time by activating or deactivating grid components. For example, if a system serving its users with three servers sees its user count grow, an elastic system automatically activates as many additional components as needed. A matching function M(w) = r captures the minimum set of grid components the system needs to meet its performance requirements [2].

Zoolander: Storage access can be slow; many industries attach latency targets to their database management, because longer storage-access latency increases the time a web page takes to load. The purpose of Zoolander is to keep slow storage from hurting response time. It takes a replication-for-predictability approach: it creates new nodes, copies all the data to every node, and sends each read/write access to all the duplicate nodes. Zoolander gives up throughput to achieve good response times.
[5]

Maestro: Storing data in disk arrays is difficult because different applications, each with a different workload, share the servers in the array. Maestro manages the servers in a disk array so as to provide different performance to different applications: it monitors each application's performance and places applications dynamically across the array's servers, achieving diverse performance goals through dynamic partitioning [4].

Adaptive Performance-Aware Distributed Memory Caching: Dynamic web applications use memcached to improve performance. Memcached caches data in RAM so that the number of reads hitting the database servers is reduced, but if the workload grows sharply the cache can become overloaded. Adaptive caching automatically adjusts the cache based on how each cache server performs: an adaptive hash-space scheduler computes the hit rate and usage rate of each cache server, and a controller can auto-scale the memcached servers to meet a response-time goal [6].

5. Conclusion

After evaluating Morphus on real-world big-data workloads, the authors observed that Morphus provides high availability of reads and writes during reconfiguration, with only a slight decrease in the write success rate. When both the number of chunks and the data size increase, the reconfiguration time goes up, and most of the reconfiguration time is spent in the execution phase. Increasing the number of replicas in a set yields faster reconfiguration, and WFS with a larger number of sockets improves migration performance. MongoDB performs reconfiguration operations far better with the Morphus system.

6. References

[1] Mainak Ghosh, Wenting Wang, Gopalakrishna Holla, and Indranil Gupta. Morphus: Supporting Online Reconfiguration in Sharded NoSQL Systems. In Proceedings of the 12th International Conference on Autonomic Computing (ICAC 2015).

[2] Nikolas R. Herbst, S. Kounev, and R. Reussner. Elasticity in Cloud Computing: What It Is, and What It Is Not. In Proceedings of the 10th International Conference on Autonomic Computing (ICAC 2013), San Jose, CA, June 24-28, 2013.

[3] D. Didona, P. Romano, S. Peluso, and F. Quaglia. Transactional Auto Scaler: Elastic Scaling of In-Memory Transactional Data Grids. In Proceedings of the 9th International Conference on Autonomic Computing (ICAC 2012).

[4] A. Merchant, M. Uysal, P. Padala, X. Zhu, S. Singhal, and K. Shin. Maestro: Quality-of-Service in Large Disk Arrays. In Proceedings of the 8th ACM International Conference on Autonomic Computing (ICAC 2011), Karlsruhe, Germany, June 14-18, 2011.

[5] C. Stewart, A. Chakrabarti, and R. Griffith. Zoolander: Efficiently Meeting Very Strict, Low-Latency SLOs. In Proceedings of the 10th International Conference on Autonomic Computing (ICAC 2013).

[6] J. Hwang and T. Wood. Adaptive Performance-Aware Distributed Memory Caching. In Proceedings of the 10th International Conference on Autonomic Computing (ICAC 2013).

[7] J. Li, N. K. Sharma, D. R. K. Ports, and S. D. Gribble. Tales of the Tail: Hardware, OS, and Application-Level Sources of Tail Latency. In Proceedings of the ACM Symposium on Cloud Computing (SoCC 2014).