Investigation of Techniques to Model and Reduce Latencies in Partial Quorum Systems

Wesley Chow and Ning Tan

December 17, 2012

Abstract

In this project, we investigate the possibility of using redundant requests and redundant replies to reduce read latency in Apache Cassandra. Our techniques are applicable to all distributed data storage systems. We test varying the threshold at which we send duplicate requests (fast-retries), both statically (in milliseconds) and dynamically (by percentile). We also experiment with sending duplicate replies with some small probability. We show that in most variations, fast-retry and duplicate-reply performance comes close to the baseline in the average case, and performs better than the baseline in the long tail. To give a systematic way to dynamically determine the optimal fast-retry threshold, we apply a graphical model to predict network latency in the system. We show that by capturing the correlation between nodes and across time, our model is more accurate than the previously proposed PBS model. We implement our redundant-request and redundant-reply techniques in Cassandra and stress test read performance using Berkeley's Psi Millennium cluster.

1 Introduction

1.1 Distributed Data Stores

Distributed data stores have attracted a lot of interest in recent years, mostly due to the scalability, availability, and efficiency they provide [20]. For example, Google developed BigTable [21] as the storage system for services including Gmail, Google Maps, Google Reader, YouTube, and Google Earth. Amazon developed Dynamo [23] as a primary-key-accessed data store for services such as its shopping cart. These systems typically replicate data across different machines and data centers [19, 20, 21, 23]. This allows the system to achieve high availability and partition tolerance when machines fail, as the data will still be available from other machines and other data centers [25]. It also provides a method of improving system performance: instead of waiting for a response from only one machine, the system sends messages to all replicas and waits for a fraction of them to respond, thus improving latency.

However, there is a price to pay for this beneficial latency decrease. In particular, the strong consistency constraint is replaced with eventual consistency in these systems, meaning there is no guarantee of returning the most recent version of the data [32]. The only assumption is that over a sufficiently long period of time and in the absence of writes, all replicas will eventually become consistent [19, 32]. This latency-consistency trade-off has had important implications in system design [19]. For systems with strict latency requirements, consistency is usually sacrificed. For instance, Amazon reported that 100ms of extra latency would result in a 1% loss in sales [28]. Therefore, they need to ensure that latency is low, even at the long tail (say, the 99.99th percentile). Google also reported that 500ms of extra latency would decrease their traffic by 20% [29], which results in a severe penalty to their revenue. On the other hand, this comes with a cost: contacting fewer replicas generally weakens the consistency guarantees on queried data. Therefore, in order to achieve optimal performance, one needs a better understanding of the latency vs. consistency trade-off.

1.2 Partial Quorums

Most distributed data stores other than Dynamo are open-sourced and thus highly customizable.
Users typically have the ability to choose the replication factor (N), the read quorum (R), and the write quorum (W). This gives them the ability to choose whatever consistency they desire. If R + W > N, then there is a strict quorum and any data read will have strong consistency. However, if R + W ≤ N, then there is only a partial quorum with eventual-consistency guarantees. Partial quorums provide latency benefits over strict quorums at the cost of consistency.
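
As a minimal illustration of the quorum condition above (our own sketch, not code from this project), the following Python function checks whether an (N, R, W) configuration forms a strict quorum; any read quorum of size R must overlap any write quorum of size W exactly when R + W > N. The example configurations are hypothetical.

def is_strict_quorum(n: int, r: int, w: int) -> bool:
    """True if every read quorum of size r must overlap every write quorum of size w."""
    return r + w > n

if __name__ == "__main__":
    # Hypothetical configurations, for illustration only.
    for n, r, w in [(3, 2, 2), (3, 1, 1), (4, 3, 2), (4, 2, 2)]:
        kind = "strict" if is_strict_quorum(n, r, w) else "partial"
        print(f"N={n}, R={r}, W={w}: {kind} quorum")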

Another benefit of the flexibility of setting R and W is that users can shift latencies around based on the application's needs. If the application is very read-heavy, for example, then a lower R and a higher W will provide better consistency and a lower aggregate latency for the same number of replicas.

1.3 Read Paths

Distributed data stores such as Dynamo, Cassandra [27], LinkedIn's Voldemort [11], and Basho's Riak [3] all process reads differently. When it comes to achieving read quorums, these systems vary between two extremes. Dynamo and Riak will always send read requests to all N replicas and wait for only R responses [15, 23]. Voldemort will only send to R replicas and wait for all of them to respond [18]. Cassandra, on the other hand, will send only R requests 90% of the time and N requests 10% of the time for consistency purposes (read-repair) [6, 7]. Sending more requests at the start is similar to what we are trying to achieve; both approaches increase the load on the system while attempting to decrease latency.

1.4 Probabilistic Graphical Model

The graphical model, also known as a Markov random field, is a well-studied model in statistics and machine learning. It brings together graph theory and probability theory in a powerful formalism [34]. It has proven to be very useful in various fields including bioinformatics, speech processing, image processing, and control theory [34]. For modeling network traffic, graphical models are very handy, as they encode the network topology naturally. One thing that separates graphical modeling from previous models like PBS (Probabilistically Bounded Staleness) [20] is that graphical modeling captures the correlation between the nodes. While the usual independent and identically distributed (i.i.d.) assumption makes analysis much easier, we remark that the correlations between nodes and across time usually play a crucial role in the performance of the system, especially when it comes to long-tail performance.

1.5 Previous Work

Sending redundant requests has proven to be a very successful technique in Google's BigTable, especially when it comes to long-tail performance. Sending out a redundant request within 10ms if the initial request had not completed improved BigTable's 99.9th percentile latency from 994ms to 50ms [22]. There has been intensive study of applying graphical models to traffic modeling, prediction, and classification [26, 30, 31, 36]. For example, [26] applies graphical modeling to model traffic in the Greater Seattle area, and [31] applies graphical modeling to model traffic in London. In terms of network traffic, [30] uses graphical modeling for semi-supervised traffic classification.

1.6 Contributions of this Project

We make the following contributions in this paper:

- We implement the fast-retry and duplicate-reply methods in Cassandra.
- We stress test many variations of fast-retry and duplicate reply on an eight-node Psi Millennium cluster.
- We develop a new way of modeling network latency based on an undirected graphical model. This model captures the correlations between replicas, and therefore models and predicts the network traffic more accurately.

The rest of the paper is organized as follows. Section 2 gives some background on graphical models. Section 3 describes the fast-retry and duplicate-reply techniques and their potential benefits. In Section 4, we describe a graphical model that can be used to predict network traffic. Section 5 discusses learning parameters and Section 6 discusses inference algorithms in the graphical model. Section 7 discusses our implementation of fast-retry and duplicate replies in Cassandra. Section 8 describes our evaluation methods and the variations we tried. Section 9 discusses the results from the evaluation. Finally, Sections 10 and 11 contain our conclusions and possible future directions.
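
To make the latency implications of the read strategies in Section 1.3 concrete, the following sketch (our own illustration, using an assumed heavy-tailed per-replica latency distribution) compares "send to R and wait for all R" with "send to all N and wait for the fastest R": the former pays the maximum of R latency samples, while the latter pays the R-th smallest of N samples, which trims the tail.

import random

def simulate(n=4, r=3, trials=100_000, seed=0):
    """Per-trial latency for 'send to R, wait for all R' vs 'send to all N, wait for R'."""
    rng = random.Random(seed)
    send_r, send_n = [], []
    for _ in range(trials):
        lat = [rng.lognormvariate(1.0, 0.8) for _ in range(n)]  # per-replica latency, ms
        send_r.append(max(lat[:r]))        # contact R replicas, wait for all of them
        send_n.append(sorted(lat)[r - 1])  # contact all N, wait for the fastest R
    return send_r, send_n

def percentile(xs, q):
    xs = sorted(xs)
    return xs[min(len(xs) - 1, int(q * len(xs)))]

if __name__ == "__main__":
    a, b = simulate()
    for q in (0.50, 0.99, 0.999):
        print(f"p{100 * q:g}: send-to-R {percentile(a, q):6.2f} ms   "
              f"send-to-N {percentile(b, q):6.2f} ms")

The gap between the two strategies is small at the median and grows in the tail, which is the same effect redundant requests exploit.
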
2 Preliminaries

2.1 Graphical Model

In a probabilistic undirected graphical model, we have a graph G = (V, E), where V is the set of vertices and E ⊆ V × V is the set of edges between the vertices. In the model, each vertex v ∈ V corresponds to a random variable X_v, which takes values in a domain D. For each maximal clique C in the graph, there is an associated potential function φ_C(x_C) : D^|C| → R_+, and the probability of a configuration is given by the following expression:

$$P(X = x) = \frac{1}{Z} \prod_{C \in \mathcal{C}} \phi_C(x_C) \qquad (2.1)$$

where the product ranges over the set of maximal cliques and Z is the normalization factor, also known as the partition function. While this definition can seem complicated and non-intuitive, the Hammersley-Clifford theorem shows that it is equivalent to a simple conditional-independence condition:

Theorem 2.1 (Hammersley-Clifford). The probability distribution on V can be written as in (2.1) if and only if for any sets S_1, S_2, S_3 ⊆ V such that S_3 separates S_1 and S_2, the variables X_{S_1} and X_{S_2} are independent conditioned on X_{S_3}.
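
As a small illustration of equation (2.1) (our own toy example, not part of this project), the following Python script brute-forces the partition function for a three-variable chain A - B - C with binary variables, whose maximal cliques are the two edges, and numerically checks the Hammersley-Clifford property that A and C are independent given B.

from itertools import product

def phi(u, v):
    """Edge potential: favor agreement between neighboring variables."""
    return 2.0 if u == v else 1.0

def unnormalized(a, b, c):
    # Maximal cliques of the chain A - B - C are the edges {A, B} and {B, C}.
    return phi(a, b) * phi(b, c)

Z = sum(unnormalized(a, b, c) for a, b, c in product((0, 1), repeat=3))

def p(a, b, c):
    return unnormalized(a, b, c) / Z

# Hammersley-Clifford check: P(a, c | b) == P(a | b) * P(c | b) for every b.
for b in (0, 1):
    pb = sum(p(a, b, c) for a, c in product((0, 1), repeat=2))
    for a, c in product((0, 1), repeat=2):
        joint = p(a, b, c) / pb
        factored = (sum(p(a, b, cc) for cc in (0, 1)) / pb) * \
                   (sum(p(aa, b, c) for aa in (0, 1)) / pb)
        assert abs(joint - factored) < 1e-12

print("partition function Z =", Z, "- A and C are conditionally independent given B")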

3 Fast-retry and Duplicate reply

The idea of sending duplicate requests is not new. However, to the best of our knowledge, there has not been any work on finding the optimal threshold at which to send a duplicate request. Send a request too quickly, and the response might already be on the way, so the extra load incurred is wasted. Wait too long, however, and the ability to reduce latencies is severely diminished. The ideal case would be to send one or more duplicate requests as soon as possible without inducing any latency penalty on the system. On the replica side, receiving a duplicate request incurs a disk operation in order to retrieve the value, and the replica has no way of knowing whether the duplicate it just received was sent before its original response arrived or because that response was never received. The disk operation cost might become irrelevant as more companies move their systems to solid state drives; Amazon built Dynamo using SSDs [33], but this is not the common case.

Duplicate replies are in a sense a pre-emptive move to counter a lossy network. If the network is constantly dropping some small percentage of packets, then in theory sending duplicate replies could help. The question is how to determine when to send a duplicate reply. We use the simplest method of sending a duplicate reply some percentage of the time. As there has been no prior work in this area, we set out to test this hypothesis and determine its usefulness.

4 Modeling

In this section we discuss the details of modeling and predicting network traffic using graphical modeling. We make the following assumptions about the network traffic distribution:

Condition 1. We assume that the connections between nodes satisfy a conditional-independence condition: given two sets of edges in the network topology that are separated by a third set, the traffic on the two sets is independent conditioned on the traffic on the third set. This is a reasonable assumption because all traffic between the two sets must go through the third set; once the third set is fixed, the two sets cannot interfere with each other, and therefore they should behave independently.

Condition 2. We assume that the traffic on the graph is affected only by the traffic in the K previous time steps. That is, we assume that traffic more than K steps in the past has negligible influence on current traffic.

Equipped with these assumptions, we are now ready to describe our graphical model. The graph in our model encodes both time and network topology. Specifically, each vertex in the graph is a tuple (V_i, V_j, T); that is, the vertex is a random variable representing the traffic condition between node i and node j at time T. Given two vertices (V_i, V_j, T_1) and (V_k, V_l, T_2), there is an edge between them if and only if either 1) T_1 = T_2 and (V_i, V_j) shares a node with (V_k, V_l), or 2) |T_2 - T_1| = 1 and (V_i, V_j) = (V_k, V_l).
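
The vertex and edge construction just described can be written down directly. The sketch below is our own illustration (variable and function names are ours), building the graph for a small cluster over a few time steps using exactly the two edge rules above.

from itertools import combinations

def build_traffic_graph(num_nodes: int, num_steps: int):
    """Vertices are (i, j, t) triples for each replica pair (i, j) and time step t."""
    pairs = list(combinations(range(num_nodes), 2))
    vertices = [(i, j, t) for (i, j) in pairs for t in range(num_steps)]
    edges = set()
    for (i, j, t1) in vertices:
        for (k, l, t2) in vertices:
            if (i, j, t1) >= (k, l, t2):
                continue  # consider each unordered pair of vertices once
            same_time_shared_node = (t1 == t2) and len({i, j} & {k, l}) > 0
            same_pair_adjacent_time = abs(t2 - t1) == 1 and (i, j) == (k, l)
            if same_time_shared_node or same_pair_adjacent_time:
                edges.add(((i, j, t1), (k, l, t2)))
    return vertices, edges

if __name__ == "__main__":
    v, e = build_traffic_graph(num_nodes=4, num_steps=3)
    print(len(v), "vertices,", len(e), "edges")
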
5 Learning Parameters in the Graphical Model

Given historical data, we use it as training data to build our graphical model; specifically, we learn the potential functions. Several algorithms have been introduced in this context. Before getting to the choice of learning algorithm, we first have to specify the metric by which we evaluate the quality of a set of parameters. In this work, we follow the classical framework of likelihood maximization, in which we try to find the set of parameters that maximizes the probability of observing the data. Maximum likelihood is an extremely well-studied framework in statistics, and several methods have been introduced to find the optimal parameters. In this work, we choose the EM algorithm introduced by Dempster, Laird, and Rubin [24], for the following reasons:

- The convergence of the EM algorithm is very well understood. Specifically, Wu showed that the EM algorithm converges under reasonable assumptions [37].
- The EM algorithm deals well with latent variables. This is very important in our application because the historical data we have may not be complete, due to server failures or packet drops.
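
To illustrate the E-step/M-step alternation, here is a generic EM sketch on a simple latent-variable model (a two-component one-dimensional Gaussian mixture, where the latent variable is which component generated each observation). This is our own illustration only; it is not the traffic model learned in this project, and the synthetic data is made up.

import math
import random
import statistics

def gaussian_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def em_two_component_mixture(data, iters=50):
    # Crude initialization (assumed, for illustration only).
    mu = [min(data), max(data)]
    var = [statistics.pvariance(data)] * 2
    pi = [0.5, 0.5]
    for _ in range(iters):
        # E-step: posterior responsibility of each component for each point.
        resp = []
        for x in data:
            w = [pi[k] * gaussian_pdf(x, mu[k], var[k]) for k in range(2)]
            s = sum(w)
            resp.append([wk / s for wk in w])
        # M-step: re-estimate mixture weights, means, and variances.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            pi[k] = nk / len(data)
            mu[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var[k] = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, data)) / nk
            var[k] = max(var[k], 1e-6)  # guard against variance collapse
    return pi, mu, var

if __name__ == "__main__":
    rng = random.Random(1)
    # Synthetic data: a fast mode plus a slow "tail" mode (values in ms, made up).
    data = [rng.gauss(5, 1) for _ in range(300)] + [rng.gauss(40, 8) for _ in range(60)]
    weights, means, variances = em_two_component_mixture(data)
    print("weights:", weights, "means:", means, "variances:", variances)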

6 Inference Algorithms in our Graphical Model

There are several ways to do inference in graphical models. For example, one could use the sum-product algorithm, which is known to compute all the marginals in linear time on trees. However, on general graphs, no polynomial-time algorithm is known to compute the marginals exactly; any algorithm has to give up either efficiency or accuracy. Indeed, algorithms in both regimes have been proposed. For example, the junction tree algorithm can be used for exact inference, but its running time has an exponential dependence on the size of the cliques in the graph. On the other hand, one can efficiently compute the Bethe approximation, which gives only an approximation of the marginals. In this work, we use the junction tree algorithm, reasoning that the number of nodes we are dealing with is relatively small, so the junction tree algorithm can still satisfy our time-efficiency needs. However, once the number of nodes grows, we will have to look into more efficient approximation algorithms.

7 Implementation

Apache Cassandra is an open-source distributed NoSQL data store used by many companies including Netflix [4], Twitter [5], and Reddit [8]. It uses the distributed system model from Amazon's Dynamo and the data model from Google's BigTable. It takes the idea of dividing work into stages with separate per-stage thread pools from SEDA [6, 35]. We focus only on the read path, since that is where our optimizations come into play. When a read is initiated, the following steps occur [6]:

1. The StorageProxy queries for the nodes (endpoints) that are responsible for replicas of the specified key.
2. The currently alive endpoints are sorted by proximity. In our case, this is simply the round-trip latency, which is tracked by the LatencyTracker and will also be used by our fast-retries later.
3. The closest endpoint is then sent a request for the actual data. This is handled by the ReadCallback class, which times out after a user-specified timeout.
4. The remaining R - 1 nodes are sent a digest request. Digests incur the same CPU and disk I/O cost as a regular read, but lessen the load on the network.
5. If there are no digest mismatches, the data is returned. Otherwise, read-repair occurs and then the results are read again.
6. As this is happening, the remaining replicas may probabilistically be sent messages to compute digests of their responses (read-repair) for increased consistency.

We modify Cassandra 1.1.6, the latest stable version at the time [27]. In our implementation, we ignore read-repair, as we are not concerned with the consistency or staleness of the data we get back [20]. In terms of lines of code, fast-retry and duplicate replies are extremely simple. Depending on our configuration, the ReadCallback timeout is shortened from the default (two seconds) to our fast-retry timeout. Once the ReadCallback times out the first time, we send another request to the same endpoint and wait the remaining amount of time; the user-set remote procedure call timeout is never exceeded, even with fast-retry. To use dynamic fast-retry thresholds, we need access to running percentile data from past reads. Luckily, the LatencyTracker keeps a count of total operations and a rough latency histogram. The histogram consists of ninety-two buckets spanning from one microsecond to thirty-six seconds, each bucket 1.2 times larger than the last, giving us an inexact latency percentile measurement. However, this approximation is acceptable for our purposes. Implementation of duplicate replies was very simple: with some probability, the ReadVerbHandler sends two replies instead of one. We chose to send the duplicate request to the same node instead of the N - R nodes which were still unsolicited, mostly because in our tests the replication factor N was 4, the read quorum was 3, and the nodes never went down. Had our test setup been more complex, perhaps with a larger N or with some system churn, sending to the N - R nodes instead of the same R nodes on fast-retry would have been worth implementing.
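
The following is an illustrative sketch (our Python rendering, not Cassandra's Java code) of how a dynamic fast-retry threshold can be read off a geometric-bucket latency histogram in the spirit of the LatencyTracker's, and how much of the RPC timeout remains for the duplicate request. The bucket counts in the example are hypothetical.

BUCKET_GROWTH = 1.2
NUM_BUCKETS = 92

def bucket_upper_bounds_us():
    """Upper bounds (in microseconds) of a geometric histogram starting at 1 us."""
    bounds, b = [], 1.0
    for _ in range(NUM_BUCKETS):
        bounds.append(b)
        b *= BUCKET_GROWTH
    return bounds

def percentile_from_histogram(counts, q):
    """Approximate the q-th quantile (0 < q < 1) from per-bucket counts, in microseconds."""
    bounds = bucket_upper_bounds_us()
    target = q * sum(counts)
    seen = 0
    for upper, c in zip(bounds, counts):
        seen += c
        if seen >= target:
            return upper
    return bounds[-1]

def fast_retry_deadlines(counts, q=0.97, min_wait_ms=5.0, rpc_timeout_ms=2000.0):
    """How long to wait before duplicating the request, and how long remains afterwards."""
    threshold_ms = percentile_from_histogram(counts, q) / 1000.0
    threshold_ms = min(max(threshold_ms, min_wait_ms), rpc_timeout_ms)
    return threshold_ms, rpc_timeout_ms - threshold_ms

if __name__ == "__main__":
    # Hypothetical bucket counts: most reads land around a few ms, with a slow tail.
    counts = [0] * NUM_BUCKETS
    counts[40], counts[55], counts[65] = 9000, 900, 100
    first, remaining = fast_retry_deadlines(counts, q=0.97)
    print(f"wait {first:.1f} ms before the duplicate request; {remaining:.1f} ms remain")
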
8 Evaluation

8.1 Setup

We tested our algorithms on the Psi Millennium cluster at Berkeley [17]. The cluster consisted of eight nodes, each with 32 8-core Intel(R) Xeon(R) 2.60GHz chips. While the nodes had large SSDs, we chose to use NFS to introduce a larger I/O bottleneck. This was mostly done to simulate what would happen in a more heavily loaded system without having to go through the trouble of running an I/O-intensive task on each of the nodes. However, the nodes each had 128GB of RAM, which significantly limited the number of actual disk operations performed, especially as we only inserted and read about 8GB of data. Unfortunately, this cluster is most likely not representative of what companies are running their distributed data storage instances on.

When the cluster initializes, each node is manually assigned a token in order to ensure completely even partitioning of the key-space. This was done in order to remove partitioning randomness and ensure that each run of our tests had the exact same environment. To prepare our tests, we used the built-in Cassandra Stress Tool [2]. The setup involves first placing five million keys that are each replicated to four nodes. Since our cluster is all in one data center, we used SimpleStrategy, which simply places replicas on nodes clockwise on the key ring without considering rack or data center location. As our nodes were all in the same data center, and their rack configuration unknown, SimpleStrategy was the best option. All our tests attempt to read the five million keys we just inserted, with a read quorum of 3. The test script was run on one node, but the script initiates read requests to all nodes in our cluster. This shares the coordination workload among the entire cluster instead of placing it on just one node.

8.2 Variations

We started our tests simply and added variations as we went. First, we tried sending a fast-retry at various static thresholds. Fig. 1 shows the performance between the 90th and 99th percentiles and Fig. 2 shows the performance between the 99th and 99.9th percentiles. We show only three static thresholds to avoid clutter. Fig. 3 shows the same experiment, except with an artificial CPU load added to every node in the cluster. This was done by running a script that created ten threads that would endlessly refine an estimate of π using Leibniz's formula [10] and periodically print the output.

After this experiment, we changed our algorithm and made our fast-retries dynamic. Instead of sending a duplicate request after waiting some static amount of time, we send another request at the current 95th, 97th, 99th, and 99.9th percentiles. Fig. 4 shows the behavior of percentile-based dynamic retries from the 0th to the 99th percentile, and Fig. 5 shows the 99th to 99.9th percentiles. In the above experiment, our fast-retries would be sent off at whatever the current percentile setting dictated, with a minimum of 1ms, up to the 2-second timeout. However, as it is unlikely that fast-retries would make much of a difference at the low end of that scale, we introduced a minimum wait time before a fast-retry is sent. We then re-ran the above percentiles with a 5ms minimum wait time. This is shown in Fig. 6.

Since our nodes sit in the same data center, network conditions such as packet drops are unlikely to occur. If a distributed data store is spread across a wide-area network, however, they are much more likely. Therefore, we used the netem [12] tool to introduce artificial packet drops into the network. We started with a 3% packet drop rate with a 25% correlation. This means that 3% of packets would be dropped, with the likelihood of each successive packet being dropped depending 25% on whether the last one was dropped. Using the correlation value allows us to simulate packet burst losses, a common packet loss pattern in real-world networks. We varied the percentile at which we sent a fast-retry, keeping the same 5ms minimum wait time as before. The results of the 3% packet loss trial are shown in Fig. 7; 5% packet loss with 25% correlation is shown in Fig. 8.

The last variation we tried was to test the effects of randomly sending a duplicate reply. The system maintained a 5% packet loss with 25% correlation, and we sent a duplicate reply with 3%, 5%, 7%, and 10% probability. This variation always sent a duplicate request at the 97th percentile threshold. The results of this trial are shown in Fig. 9. Note that different variations should not be compared against each other: certain variations logged a lot more data than others, leading to different performance baselines.
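
To make the "25% correlation" concrete, here is a simple simulation of correlated (bursty) packet loss matching the description above: with probability corr, a packet repeats the fate of the previous packet; otherwise its fate is drawn fresh with probability loss. This keeps the average drop rate at loss while producing burst losses. It is our own sketch of the configured behavior; netem's internal algorithm differs in its details.

import random

def correlated_drops(n_packets, loss=0.05, corr=0.25, seed=0):
    rng = random.Random(seed)
    drops = []
    prev = rng.random() < loss
    for _ in range(n_packets):
        if rng.random() < corr:
            dropped = prev            # burst: copy the previous packet's fate
        else:
            dropped = rng.random() < loss
        drops.append(dropped)
        prev = dropped
    return drops

if __name__ == "__main__":
    d = correlated_drops(1_000_000)
    rate = sum(d) / len(d)
    back_to_back = sum(1 for i in range(1, len(d)) if d[i] and d[i - 1])
    print(f"observed drop rate: {rate:.3%}; back-to-back drops: {back_to_back}")
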
9 Results

9.1 Fast-retry and Duplicate reply

Static fast-retry did not provide any latency benefits over regular Cassandra until approximately the 99.9th percentile. Below this threshold, static fast-retry performs on par with or slightly worse than the baseline. This held true whether or not we applied a heavy CPU load to nodes in the system. The lack of difference between the light and heavy CPU-load cases can be explained by the hardware our tests ran on: having 32 8-core processors per node makes it very difficult to make the performance CPU-bound. In most real-world applications, the bottleneck is in disk I/O anyway, not the CPU.

Dynamic fast-retry, on the other hand, performed well at every percentile. Between the 0th and 99th percentiles, dynamic fast-retry performed on par with the baseline. In fact, we would be surprised if dynamic fast-retry affected the performance at lower percentiles, as no duplicate work is done unless a request's duration reaches the specified percentile. Between the 99th and 99.9th percentiles, dynamic fast-retry performs just slightly better than the baseline. This is true for both the 1ms and 5ms minimum wait times. As our system is very powerful and the network is very reliable, the minimal performance gains here are not discouraging.

Introducing packet loss into our cluster allows fast-retry to demonstrate its potential latency improvements more easily. After introducing a 3% packet drop into the network, we begin to see a more noticeable difference between the baseline and our dynamic fast-retry implementation. Increase that rate to 5%, and fast-retry suddenly performs much better than the baseline. Using a 95th-percentile dynamic threshold results in a 67ms 99.9th-percentile latency, while the baseline achieves 117ms, almost double the fast-retry result. We ran the duplicate reply tests with a 97th-percentile fast-retry and the same 5% packet loss settings as before.

Unfortunately, sending duplicate replies does not seem to produce a noticeable performance gain over just having 97th-percentile fast-retry.

9.2 Graphical Model

In order to verify the accuracy of our graphical model, we use the baseline data as training data, then use our model to predict the performance of the fast-retry algorithm. We also compare our performance with that of the PBS model's predictions. Things are slightly subtle in our case, as the LatencyTracker only records the minimum latency among all the replicas, and we do not have any information about the remaining latencies. We handle this by assigning a latent variable to each time step, indicating the id of the replica with the minimum latency. As mentioned before, the EM algorithm can deal with latent variables, so we are still able to compute the maximum-likelihood distribution. However, this approach comes with a penalty: the lack of information severely limits the power of the model. In particular, we will not be able to capture any correlation between replicas, since the data does not distinguish them. Indeed, in our final prediction, the replicas behave identically and independently. However, we are still able to capture the correlation between latencies and time, which plays a crucial role in the result.

9.3 Modeling Evaluation

We compare the performance of our modeling algorithm against the PBS prediction algorithm. We compare them in the simplest configuration: no heavy background workload and a fixed-threshold fast-retry. This is because neither model takes background workload or packet drops into consideration, and both would therefore yield poor predictions in the presence of these factors. However, we remark that both models can easily be modified to take these factors into account. As shown in Fig. 10 and Fig. 11, we compare the predictions in two settings: 7ms fast-retry and 15ms fast-retry. We can see that our graphical model yields more accurate predictions in both settings. The main reason is that the PBS prediction tends to be over-optimistic about latency. For example, under PBS it is nearly impossible for two consecutive requests to both end up in the long tail (above the 99th-percentile latency), while this scenario actually happens frequently in practice. The graphical model avoids these kinds of inaccuracies by exploiting the correlation between two consecutive requests, therefore yielding a more accurate prediction.

10 Conclusions

The static fast-retry technique should only be used very carefully by the owner of a system. From the 0th to the 99th percentile, fast-retry latencies are on par with baseline Cassandra. However, from the 99th percentile to around the 99.8th percentile, the baseline does better than fast-retry in most of the settings we tried. Only after this point does fast-retry improve upon the baseline performance. Instead, users of distributed storage systems should use percentile-based fast-retry. Percentile-based fast-retry improves performance over the baseline implementation across the board, in every percentile, for every configuration we tried. The latency improvements are especially noticeable when the network consistently drops some percentage of packets (5% in our tests). As for modeling network latencies, we show in this work that our graphical model provides more accurate modeling and prediction compared to PBS. However, we did not implement our prediction algorithm in Cassandra.
We hope this can be done in the future, so that the fast-retry threshold can be adjusted dynamically based on the model's predictions. Sending duplicate replies, at least in our implementation, does not appear to be all that beneficial. The results are fairly close to sending only one reply, but one set of runs did not generate enough data for us to draw a confident conclusion as to the efficacy of this method. That said, our guess would be that this method (as implemented now) is ineffective and should not be used.

11 Future Work

Our results were all gathered from a very powerful cluster living in a single data center. While the techniques we tried were effective, it is hard for the effects to be noticeable at such a small scale using such powerful computational resources. Using a testbed that is closer to real-world applications could potentially show that our techniques have a much larger effect. Also, while each trial was five million operations, each trial was only run once. Running the trials more times would reduce noise that may have been introduced through other environmental factors. One factor that comes to mind is whether other nodes in the Psi Millennium cluster ran a bandwidth-intensive application that introduced cross-traffic into our system and skewed our results.

In our fast-retry implementation, the coordinator sends the duplicate request to the same node it sent the original to. In a variation on fast-retry, the coordinator could instead send a request to a node that has not been solicited yet, provided the read quorum R is lower than the replication factor N. By sending to the remaining N - R nodes, we could potentially decrease latencies further by contacting different nodes that might not be down; perhaps the reason the initial node did not reply is that it suddenly became inaccessible.

This technique would most likely show the best results over the baseline in cases of system churn.

As the duplicate-reply method did not yield very positive or very negative results, the technique warrants further study. The ideal behavior would be for some oracle to know that a packet will be dropped and pre-emptively send a duplicate. As this is not possible, one could attempt to achieve the next best thing. Since packet losses tend to happen in bursts, a system could approximate the packet loss rate at any given time. Using this data, it could be possible to know when to scale up or down the percentage with which it sends a duplicate reply, achieving improved read latencies.

For the graphical model approach, we suspect that the performance of the modeling algorithm will improve if more detailed and accurate data is provided, that is, the detailed latencies between replicas. Once such data is available, our modeling algorithm will capture the correlation between replicas. We believe this will be particularly useful when the approach is applied to a distributed data store where the data is placed across a wide area, as correlation among replicas plays a more crucial role in that scenario.

Acknowledgements

We would like to thank Peter Bailis and Shivaram Venkataraman for their guidance throughout the project. We would like to thank the AMP Genomics project for lending us their systems, and Jon Kuroda for facilitating the process. We also thank Anthony D. Joseph, John D. Kubiatowicz, and Aaron Davidson for help at various points.

References

[1] Apache Cassandra 1.1 documentation - cassandra-stress. references/stress_java, December.

[2] Basho Riak. riak-overview/, December.

[3] Benchmarking Cassandra scalability on AWS - over a million writes per second. benchmarking-cassandra-scalability-on.html, November.

[4] Cassandra at Twitter today. http://engineering.twitter.com/2010/07/cassandra-at-twitter-today.html, July.

[5] Cassandra wiki: Architecture internals. apache.org/cassandra/architectureinternals, December.

[6] Cassandra wiki: Operations. org/cassandra/operations#repairing_missing_or_inconsistent_data, December.

[7] January state of the servers. reddit.com/search/label/cassandra, January.

[8] Leibniz formula for pi. wiki/leibniz_formula_for_pi, December.

[9] LinkedIn Voldemort. project-voldemort.com/voldemort/, December.

[10] netem. collaborate/workgroups/networking/netem, December.

[11] Riak read path - get_fsm. https://github.com/basho/riak_kv/blob/42eb6951b369e3fd9a42f7f54fb7618a40f1a9fb/src/riak_kv_get_fsm.erl#l153, June.

[12] UC Berkeley cluster computing. millennium.berkeley.edu/wiki/psi, December.

[13] Voldemort read path - PipelineRoutedStore. master/src/java/voldemort/store/routed/PipelineRoutedStore.java#L186, September.

[14] D. Abadi. Consistency tradeoffs in modern distributed database system design: CAP is only part of the story. Computer, 45(2):37-42.

[15] P. Bailis, S. Venkataraman, M. J. Franklin, J. M. Hellerstein, and I. Stoica. Probabilistically bounded staleness for practical partial quorums. Proceedings of the VLDB Endowment, 5(8).

[16] F. Chang, J. Dean, S. Ghemawat, W. C. Hsieh, D. A. Wallach, M. Burrows, T. Chandra, A. Fikes, and R. E. Gruber. Bigtable: A distributed storage system for structured data. ACM Transactions on Computer Systems (TOCS), 26(2):4.
[17] J. Dean. Achieving rapid response times in large online services. googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en/us/people/jeff/berkeley-latency-mar2012.pdf, May.

[18] G. DeCandia, D. Hastorun, M. Jampani, G. Kakulapati, A. Lakshman, A. Pilchin, S. Sivasubramanian, P. Vosshall, and W. Vogels. Dynamo: Amazon's highly available key-value store. In ACM SIGOPS Operating Systems Review, volume 41. ACM.

[19] A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B (Methodological), pages 1-38.

[20] Seth Gilbert and Nancy Lynch. Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services. ACM SIGACT News, 2002.

[21] E. J. Horvitz, J. Apacible, R. Sarin, and L. Liao. Prediction, expectation, and surprise: Methods, designs, and study of a deployed traffic forecasting service. arXiv preprint.

[22] A. Lakshman and P. Malik. Cassandra: a decentralized structured storage system. Operating Systems Review, 44(2):35.

[23] G. Linden. Make data useful. Presentation, Amazon, November.

[24] G. Linden. Marissa Mayer at Web 2.0. Online.

[25] C. Rotsos, J. Van Gael, A. W. Moore, and Z. Ghahramani. Probabilistic graphical models for semi-supervised traffic classification. In Proceedings of the 6th International Wireless Communications and Mobile Computing Conference. ACM.

[26] S. Sun, C. Zhang, and G. Yu. A Bayesian network approach to traffic flow forecasting. IEEE Transactions on Intelligent Transportation Systems, 7(1).

[27] W. Vogels. Eventually consistent. Communications of the ACM, 52(1):40-44.

[28] W. Vogels. Amazon DynamoDB - a fast and scalable NoSQL database service designed for internet scale applications. /01/amazon-dynamodb.html, January.

[29] M. J. Wainwright and M. I. Jordan. Graphical models, exponential families, and variational inference. Foundations and Trends in Machine Learning, 1(1-2):1-305.

[30] M. Welsh, D. Culler, and E. Brewer. SEDA: An architecture for well-conditioned, scalable internet services. In ACM SIGOPS Operating Systems Review, volume 35. ACM.

[31] J. Whittaker, S. Garside, and K. Lindveld. Tracking and predicting a network traffic process. International Journal of Forecasting, 13(1):51-61.

[32] C. F. Wu. On the convergence properties of the EM algorithm. The Annals of Statistics, 11(1):95-103.

Appendix

This appendix contains the graphs for this report. All graphs were generated with MATLAB. Unless otherwise specified, all latencies are measured in milliseconds.

Figures 1 through 11 (referenced in Sections 8 and 9) appear here.


More information

Random projection for non-gaussian mixture models

Random projection for non-gaussian mixture models Random projection for non-gaussian mixture models Győző Gidófalvi Department of Computer Science and Engineering University of California, San Diego La Jolla, CA 92037 gyozo@cs.ucsd.edu Abstract Recently,

More information

SoftNAS Cloud Performance Evaluation on AWS

SoftNAS Cloud Performance Evaluation on AWS SoftNAS Cloud Performance Evaluation on AWS October 25, 2016 Contents SoftNAS Cloud Overview... 3 Introduction... 3 Executive Summary... 4 Key Findings for AWS:... 5 Test Methodology... 6 Performance Summary

More information

4/9/2018 Week 13-A Sangmi Lee Pallickara. CS435 Introduction to Big Data Spring 2018 Colorado State University. FAQs. Architecture of GFS

4/9/2018 Week 13-A Sangmi Lee Pallickara. CS435 Introduction to Big Data Spring 2018 Colorado State University. FAQs. Architecture of GFS W13.A.0.0 CS435 Introduction to Big Data W13.A.1 FAQs Programming Assignment 3 has been posted PART 2. LARGE SCALE DATA STORAGE SYSTEMS DISTRIBUTED FILE SYSTEMS Recitations Apache Spark tutorial 1 and

More information

Introduction Aggregate data model Distribution Models Consistency Map-Reduce Types of NoSQL Databases

Introduction Aggregate data model Distribution Models Consistency Map-Reduce Types of NoSQL Databases Introduction Aggregate data model Distribution Models Consistency Map-Reduce Types of NoSQL Databases Key-Value Document Column Family Graph John Edgar 2 Relational databases are the prevalent solution

More information

CSE-E5430 Scalable Cloud Computing Lecture 10

CSE-E5430 Scalable Cloud Computing Lecture 10 CSE-E5430 Scalable Cloud Computing Lecture 10 Keijo Heljanko Department of Computer Science School of Science Aalto University keijo.heljanko@aalto.fi 23.11-2015 1/29 Exam Registering for the exam is obligatory,

More information

QUANTIZER DESIGN FOR EXPLOITING COMMON INFORMATION IN LAYERED CODING. Mehdi Salehifar, Tejaswi Nanjundaswamy, and Kenneth Rose

QUANTIZER DESIGN FOR EXPLOITING COMMON INFORMATION IN LAYERED CODING. Mehdi Salehifar, Tejaswi Nanjundaswamy, and Kenneth Rose QUANTIZER DESIGN FOR EXPLOITING COMMON INFORMATION IN LAYERED CODING Mehdi Salehifar, Tejaswi Nanjundaswamy, and Kenneth Rose Department of Electrical and Computer Engineering University of California,

More information

Introduction to Computer Science. William Hsu Department of Computer Science and Engineering National Taiwan Ocean University

Introduction to Computer Science. William Hsu Department of Computer Science and Engineering National Taiwan Ocean University Introduction to Computer Science William Hsu Department of Computer Science and Engineering National Taiwan Ocean University Chapter 9: Database Systems supplementary - nosql You can have data without

More information

Large-Scale Network Simulation Scalability and an FPGA-based Network Simulator

Large-Scale Network Simulation Scalability and an FPGA-based Network Simulator Large-Scale Network Simulation Scalability and an FPGA-based Network Simulator Stanley Bak Abstract Network algorithms are deployed on large networks, and proper algorithm evaluation is necessary to avoid

More information

6.UAP Final Report: Replication in H-Store

6.UAP Final Report: Replication in H-Store 6.UAP Final Report: Replication in H-Store Kathryn Siegel May 14, 2015 This paper describes my research efforts implementing replication in H- Store. I first provide general background on the H-Store project,

More information

Dynamo: Key-Value Cloud Storage

Dynamo: Key-Value Cloud Storage Dynamo: Key-Value Cloud Storage Brad Karp UCL Computer Science CS M038 / GZ06 22 nd February 2016 Context: P2P vs. Data Center (key, value) Storage Chord and DHash intended for wide-area peer-to-peer systems

More information

Indexing Large-Scale Data

Indexing Large-Scale Data Indexing Large-Scale Data Serge Abiteboul Ioana Manolescu Philippe Rigaux Marie-Christine Rousset Pierre Senellart Web Data Management and Distribution http://webdam.inria.fr/textbook November 16, 2010

More information

Spotify. Scaling storage to million of users world wide. Jimmy Mårdell October 14, 2014

Spotify. Scaling storage to million of users world wide. Jimmy Mårdell October 14, 2014 Cassandra @ Spotify Scaling storage to million of users world wide! Jimmy Mårdell October 14, 2014 2 About me Jimmy Mårdell Tech Product Owner in the Cassandra team 4 years at Spotify

More information

Introduction to the Active Everywhere Database

Introduction to the Active Everywhere Database Introduction to the Active Everywhere Database INTRODUCTION For almost half a century, the relational database management system (RDBMS) has been the dominant model for database management. This more than

More information

arxiv: v1 [cs.db] 26 Apr 2012

arxiv: v1 [cs.db] 26 Apr 2012 Probabilistically Bounded Staleness for Practical Partial Quorums Peter Bailis, Shivaram Venkataraman, Michael J. Franklin, Joseph M. Hellerstein, Ion Stoica University of California, Berkeley {pbailis,

More information

Horizontal or vertical scalability? Horizontal scaling is challenging. Today. Scaling Out Key-Value Storage

Horizontal or vertical scalability? Horizontal scaling is challenging. Today. Scaling Out Key-Value Storage Horizontal or vertical scalability? Scaling Out Key-Value Storage COS 418: Distributed Systems Lecture 8 Kyle Jamieson Vertical Scaling Horizontal Scaling [Selected content adapted from M. Freedman, B.

More information

HyperDex. A Distributed, Searchable Key-Value Store. Robert Escriva. Department of Computer Science Cornell University

HyperDex. A Distributed, Searchable Key-Value Store. Robert Escriva. Department of Computer Science Cornell University HyperDex A Distributed, Searchable Key-Value Store Robert Escriva Bernard Wong Emin Gün Sirer Department of Computer Science Cornell University School of Computer Science University of Waterloo ACM SIGCOMM

More information

Programming Project. Remember the Titans

Programming Project. Remember the Titans Programming Project Remember the Titans Due: Data and reports due 12/10 & 12/11 (code due 12/7) In the paper Measured Capacity of an Ethernet: Myths and Reality, David Boggs, Jeff Mogul and Chris Kent

More information

Fusion iomemory PCIe Solutions from SanDisk and Sqrll make Accumulo Hypersonic

Fusion iomemory PCIe Solutions from SanDisk and Sqrll make Accumulo Hypersonic WHITE PAPER Fusion iomemory PCIe Solutions from SanDisk and Sqrll make Accumulo Hypersonic Western Digital Technologies, Inc. 951 SanDisk Drive, Milpitas, CA 95035 www.sandisk.com Table of Contents Executive

More information

Block Storage Service: Status and Performance

Block Storage Service: Status and Performance Block Storage Service: Status and Performance Dan van der Ster, IT-DSS, 6 June 2014 Summary This memo summarizes the current status of the Ceph block storage service as it is used for OpenStack Cinder

More information

Ranking Clustered Data with Pairwise Comparisons

Ranking Clustered Data with Pairwise Comparisons Ranking Clustered Data with Pairwise Comparisons Alisa Maas ajmaas@cs.wisc.edu 1. INTRODUCTION 1.1 Background Machine learning often relies heavily on being able to rank the relative fitness of instances

More information

Search Engines. Information Retrieval in Practice

Search Engines. Information Retrieval in Practice Search Engines Information Retrieval in Practice All slides Addison Wesley, 2008 Web Crawler Finds and downloads web pages automatically provides the collection for searching Web is huge and constantly

More information

Migrating Oracle Databases To Cassandra

Migrating Oracle Databases To Cassandra BY UMAIR MANSOOB Why Cassandra Lower Cost of ownership makes it #1 choice for Big Data OLTP Applications. Unlike Oracle, Cassandra can store structured, semi-structured, and unstructured data. Cassandra

More information

Cassandra, MongoDB, and HBase. Cassandra, MongoDB, and HBase. I have chosen these three due to their recent

Cassandra, MongoDB, and HBase. Cassandra, MongoDB, and HBase. I have chosen these three due to their recent Tanton Jeppson CS 401R Lab 3 Cassandra, MongoDB, and HBase Introduction For my report I have chosen to take a deeper look at 3 NoSQL database systems: Cassandra, MongoDB, and HBase. I have chosen these

More information

QLIKVIEW SCALABILITY BENCHMARK WHITE PAPER

QLIKVIEW SCALABILITY BENCHMARK WHITE PAPER QLIKVIEW SCALABILITY BENCHMARK WHITE PAPER Hardware Sizing Using Amazon EC2 A QlikView Scalability Center Technical White Paper June 2013 qlikview.com Table of Contents Executive Summary 3 A Challenge

More information

DataStax Enterprise 4.0 In-Memory Option A look at performance, use cases, and anti-patterns. White Paper

DataStax Enterprise 4.0 In-Memory Option A look at performance, use cases, and anti-patterns. White Paper DataStax Enterprise 4.0 In-Memory Option A look at performance, use cases, and anti-patterns White Paper Table of Contents Abstract... 3 Introduction... 3 Performance Implications of In-Memory Tables...

More information

Research. Eurex NTA Timings 06 June Dennis Lohfert.

Research. Eurex NTA Timings 06 June Dennis Lohfert. Research Eurex NTA Timings 06 June 2013 Dennis Lohfert www.ion.fm 1 Introduction Eurex introduced a new trading platform that represents a radical departure from its previous platform based on OpenVMS

More information

Weak Consistency as a Last Resort

Weak Consistency as a Last Resort Weak Consistency as a Last Resort Marco Serafini and Flavio Junqueira Yahoo! Research Barcelona, Spain { serafini, fpj }@yahoo-inc.com ABSTRACT It is well-known that using a replicated service requires

More information

Trade- Offs in Cloud Storage Architecture. Stefan Tai

Trade- Offs in Cloud Storage Architecture. Stefan Tai Trade- Offs in Cloud Storage Architecture Stefan Tai Cloud computing is about providing and consuming resources as services There are five essential characteristics of cloud services [NIST] [NIST]: http://csrc.nist.gov/groups/sns/cloud-

More information

Data Analytics on RAMCloud

Data Analytics on RAMCloud Data Analytics on RAMCloud Jonathan Ellithorpe jdellit@stanford.edu Abstract MapReduce [1] has already become the canonical method for doing large scale data processing. However, for many algorithms including

More information

CS6450: Distributed Systems Lecture 15. Ryan Stutsman

CS6450: Distributed Systems Lecture 15. Ryan Stutsman Strong Consistency CS6450: Distributed Systems Lecture 15 Ryan Stutsman Material taken/derived from Princeton COS-418 materials created by Michael Freedman and Kyle Jamieson at Princeton University. Licensed

More information