Open-Source Databases: Within, Outside, or Beyond Lehman s Laws of Software Evolution?

Size: px
Start display at page:

Download "Open-Source Databases: Within, Outside, or Beyond Lehman s Laws of Software Evolution?"

Transcription

1 Open-Source Databases: Within, Outside, or Beyond Lehman s Laws of Software Evolution? Ioannis Skoulis, Panos Vassiliadis, and Apostolos Zarras Dept. of Computer Science and Engineering University of Ioannina (Hellas) {iskoulis, pvassil, zarras}@cs.uoi.gr Abstract. Lehman s laws of software evolution is a well-established set of observations (matured during the last forty years) on how the typical software systems evolve. However, the applicability of these laws on databases has not been studied so far. To this end, we have performed a thorough, large-scale study on the evolution of databases that are part of larger open source projects, publicly available through open source repositories, and report on the validity of the laws on the grounds of properties like size, growth, and amount of change per version. Keywords: Schema evolution, software evolution, Lehman s laws 1 Introduction Software evolution is the change of a software system over time, typically performed via a remarkably difficult, complicated and time consuming process, software maintenance. In an attempt to understand the mechanics behind the evolution of software and facilitate a smoother, lest disruptive maintenance process, Meir Lehman and his colleagues introduced a set of rules in mid seventies [1], also known as the Laws on Software Evolution (Sec. 2). Their findings, that were reviewed and enhanced for nearly 40 years [2], [3], have, since then, given an insight to managers, software developers and researchers, as to what evolves in the lifetime of a software system, and why it does so. Other studies ([4], [5], [6] to name a few significant ones) have complemented these insights in this field, typically with particular focus to open-source software projects. In sharp distinction to traditional software systems, database evolution has been hardly studied throughout the entire lifetime of the data management discipline. This deficit in our knowledge is disproportional to the severity of the implications of database evolution, and in particular, of database schema evolution. A change in the schema of a database may immediately drive surrounding applications to crash (in case of deletions or renamings) or be semantically defective or inaccurate (in the case of information addition, or restructuring). Overall, schema evolution threatens the syntactic and semantic validity of the surrounding applications and severely affects both developers and end-users. Given this importance, it is only amazing to find out that in the past 40 years of database

2 research, only three(!) studies [7], [8] and [9] have attempted a first step towards understanding the mechanics of schema evolution. Those studies, however, focus on the statistical properties of the evolution and do not provide details on the actual events, or the mechanism that governs the evolution of database schemata. In this paper, we perform the first large-scale study of schema evolution in the related literature. Specifically, we study the evolution of the logical schema of eight databases, that are parts of publicly available, open-source software projects. To achieve the above goal, we have collected, cleansed and processed the available versions of the database schemata for the eight case studies. Moreover, we have extracted the changes that have been performed in these versions and, finally, we have come up with the respective datasets that can serve as a foundation for future analysis by the research community (Sec. 3). Concerning the applicability of Lehman s laws to open-source databases, our results show that the essence of Lehman s laws holds: evolution is not about uncontrolled growth; on the contrary, there appears to be a stabilization mechanism that employs perfective maintenance to control the otherwise growing trend of increase in information capacity of the database (Sec. 4). Having said that, we also observe that the growth mechanisms and the change patterns are quite different between open source databases and typical software systems. 2 Lehman Laws of Software Evolution in a Nutshell Meir M. Lehman and his colleagues, have introduced, and subsequently amended, enriched and corrected a set of rules on the behavior of software as it evolves over time [1], [2], [3]. Lehman s laws focus on E-type systems, that concern software solving a problem or addressing an application in the real-world [2]. The main idea behind the laws of evolution for E-type software systems is that their evolution is a process that follows the behavior of a feedback-based system. Being a feedback-based system, the evolution process has to balance (a) positive feedback, or else the need to adapt to a changing environment and grow to address the need for more functionality, and, (b) negative feedback, or else the need to control, constrain and direct change in ways that prevent the deterioration of the maintainability and manageability of the software. In the sequel we list the definitions of the laws as they are presented in [3], in a more abstract form than previous versions and with the benefit of retrospect, after thirty years of maturity and research findings. (I) Law of Continuing Change An E-type system must be continually adapted or else it becomes progressively less satisfactory in use. (II) Law of Increasing Complexity As an E-type system is changed its complexity increases and becomes more difficult to evolve unless work is done to maintain or reduce the complexity. (III) Law of Self Regulation Global E-type system evolution is feedback regulated.

3 (IV) Law of Conservation of Organisational Stability The work rate of an organisation evolving an E-type software system tends to be constant over the operational lifetime of that system or phases of that lifetime. (V) Law of Conservation of Familiarity In general, the incremental growth (growth ratio trend) of E-type systems is constrained by the need to maintain familiarity. (VI) Law of Continuing Growth The functional capability of E-type systems must be continually enhanced to maintain user satisfaction over system lifetime. (VII) Law of Declining Quality Unless rigorously adapted and evolved to take into account changes in the operational environment, the quality of an E-type system will appear to be declining. (VIII) Law of Feedback System E-type evolution processes are multi-level, multi-loop, multi-agent feedback systems. Before proceeding with our study, we present a first apodosis of the laws, taking into consideration both the wording of the laws, but most importantly their accompanying explanations [3]. An E-Type software system continuously changes over time (I) obeying a complex feedback-based evolution process (VIII). On the one hand, due to the need for growth and adaptation that acts as positive feedback, this process results in an increasing functional capacity of the system (VI), produced by a growth ratio that is slowly declining in the long term (V). The process is typically guided by a pattern of growth that demonstrates its self-regulating nature: growth advances smoothly; still, whenever there are excessive deviations from the typical, baseline rate of growth (either in a single release, or accumulated over time), the evolution process obeys the need for calibrating releases of perfective maintenance (expressed via minor growth and demonstrating negative feedback) to stop the unordered growth of the system s complexity (III). On the other hand, to regulate the ever-increasing growth, there is negative feedback in the system controlling both the overall quality of the system (VII), with particular emphasis to its internal quality (II). The effort consumed for the above process is typically constant over phases, with the phases disrupted with bursts of effort from time to time (IV). 3 Experimental Setup of the Study Datasets. We have studied eight database schemata from open-source software projects. ATLAS 1 is a particle physics experiment at CERN, with the goal of learning about the basic forces that have shaped our universe famously known for the attempt on the Higgs boson. BioSQL 2 is a generic relational model covering sequences, features, sequence and feature annotation, a reference taxonomy, and ontologies from various sources such as GenBank or Swissport. Ensembl is Page

4 a joint scientific project between the European Bioinformatics Institute (EBI) 3 and the Wellcome Trust Sanger Institute (WTSI) 4 which was launched in 1999 in response to the imminent completion of the Human Genome Project. The goal of Ensembl was to automatically annotate the three billion base pairs of sequences of the genome, integrate this annotation with other available biological data and make all this publicly available via the web. MediaWiki 5 was first introduced in early 2002 by the Wikimedia Foundation along with Wikipedia, and hosts Wikipedia s content since then. Coppermine 6 is a photo gallery web application. OpenCart 7 is an open source shopping cart system. PhpBB 8 is an Internet forum package. TYPO3 9 is a free and open source web content management framework. Dataset Collection and Processing. A first collection of links to available datasets was made by the authors of [9], [10] 10 ; for this, these authors deserve honorable credit. We isolated eight databases that appeared to be alive and used (as already mentioned, some of them are actually quite prominent). For each dataset we gathered as many schema versions (DDL files) as we could from their public source code repositories (cvs, svn, git). We have targeted main development branches and trunks to maximize the validity of the gathered resources. We are interested only on changes of the database part of the project as they are integrated in the trunk of the project. Hence, we collected all the versions of the database, committed at the trunk or master branch, and ignored all other branches of the project. We collected the files during June For all of the projects, we focused on their release for MySQL (except ATLAS Trigger, available only for Oracle). The files were then processed by sequential pairs from our tool, Hecate, that allows the detection of (a) changes at the attribute level, and specifically, attributes inserted, deleted, having a changed data type, or participation in a changed primary key, and (b) changes at the relation level, with relations inserted and deleted, in a fully automated way. Hecate, was then used to give us (a) the differences between two subsequent committed versions, and (b) the measures we needed to conduct this study for example, the size of the schema (in number of tables and attributes), the total number of changes for each transition from a version to the next, which we also call heartbeat, or the growth assessed as the difference in the size of the schema between subsequent versions. Hecate, along with all the data sets and our results are available at our group s public repository

5 Fig. 1. Combined demonstration of heartbeat (number of changes per version) and schema size (no. of tables) for Coppermine and Ensemble. The left axis signifies the amount of change and the right axis the number of tables. 4 Assessing the Laws for Schema Evolution The laws of software evolution where developed and reshaped over forty years. Explaining each law in isolation from the others is precarious as it risks losing the deeper essence and inter-dependencies of the laws [3]. To this end, in this section, we organize the laws in three thematic areas of the overall evolution management mechanism that they reveal. The first group of laws discusses the existence of a feedback mechanism that constrains the uncontrolled evolution of software. The second group discusses the properties of the growth of the system,

6 Fig. 2. Combined demonstration of heartbeat (number of changes per version) and schema size (no. of tables) for Atlas and BioSQL. The left axis signifies the amount of change and the right axis the number of tables. i.e., the part of the evolution mechanism that accounts for positive feedback. The third group of laws discusses the properties of perfective maintenance that constrains the uncontrolled growth, i.e., the part of the evolution mechanism that accounts for negative feedback. 4.1 Is There a Feedback-based System for Schema Evolution? Law of continuing change (Law I) The main argument of the first law is that the schema continuously changes over time. To validate the hypothesis that the law of continuing change holds, we study the heartbeat of the schema s life (see Fig. 1 and 2 for a combined demonstration of heartbeat and schema size). With the exception of BioSQL that appeared to be sleeping for some years and was later re-activated, in all other cases, we have changes (sometimes moderate, sometimes even excessive) over the entire lifetime of the database schema. An important observation stemming from the visual inspection of our changeover-time data, is that the term continually in the law s definition is challenged: we observe that database schema evolution happens in bursts, in grouped periods of evolutionary activity, and not as a continuous process! Take into account that the versions with zero changes are versions where either commenting and beautification takes place, or the changes do not refer to the information capacity of the schema (relations attributes and constraints) but rather, they concern the physical level properties (indexes, storage engines, etc) that pertain to performance aspects of the database. Can we state that this stillness makes the schema unsatisfactory (referring back to the wording of the first law by Lehman)? We believe that the answer to the question is negative: since the system hosting the database continues to be in use, user dissatisfaction would actually call for continuous growth of the database, or eventual rejection of the system. This does not happen. On the other hand, our explanation relies on the reference nature of the database

7 in terms of software architecture: if the database evolves, the rest of the code, which is basically using the database (and not vice versa), breaks! Overall, if we account for the exact wording of the law, we conclude that the law partially holds. Law of feedback system (Law VIII) The wording of Law VIII refers to the existence of a self-stabilizing feedback mechanism that governs evolution. Its experimental evaluation typically refers to the possibility of demonstrating adherence to a basic formula of feedback, by estimating the size of the system (here: in terms of number of relations) accurately i.e., with small error compared to the actual values. The formula typically used [2] is: Ŝi = Ŝi 1 + E, where Ŝ refers to the estimated system size and E is a model parameter approximating effort (actually obtained as the average value of a set of past assessments of E). Related literature [2] suggests computing E as the average value of individual E i, one per transition. Then, we need to estimate these individual effort approximations. [2] suggests two formulae that we generalize here as follows: E i = si sα, where s i refers to the actual size of the schema at version i i 1 1 j=α s 2 j and α refers to the version from which counting starts. Specifically, [2] suggests two values for α, specifically (i) 1 (the first version) and (ii) s i 1 (the previous version). We now move on to discuss what seems to work and what not for the case of schema evolution. We will use the OpenCart data set as a reference example; however, all datasets demonstrate exactly the same behavior. First, we assessed the formulae of [2]. In this case, we compute the average E of the individual E i over the entire dataset. We employ four different values for α, specifically 1, 5, 10, and n, with n being the entire data set size, and depict the result in Fig. 3, where the actual size is represented by the blue solid line. The results indicate that the approximation modestly succeeds in predicting an overall increasing trend for all four cases, and, in fact, all four approximations targeted towards predicting an increasing tendency that the actual schema does not demonstrate. At the same time, all four approximations fail to capture the individual fluctuations within the schema lifetime. A better estimation occurred when we realized that back in 1997 people considered that the parameter E was constant over the entire lifetime of the project; however, later observations (see [3]) led to the revelation that the project was split in phases. So, for every version i, we compute E as an average over the last τ E j values, with small values for τ (1/5/10). As we can see in Fig. 3, the idea of computing the average E with a short memory of 5 or 10 versions produced extremely accurate results. This holds for all data sets. This observation also suggests that, if the phases that [3] mentioned actually exist for the case of database schema, they are really small and a memory of 5-10 versions is enough to produce very accurate results. Overall, the evolution of the database schema appears to obey the behavior of a feedback-based mechanism, as the schema size of a certain version of the Ŝ 2 i 1

8 Fig. 3. Actual and estimated schema size via a total (left) and a bounded (right) average of individual E i for OpenCart; the x-axis shows the version id. database can be accurately estimated via a regressive formula that exploits the amount of changes in recent, previous versions. Law of self-regulation (Law III). Whereas the law simply states that the evolution of software is feedback regulated, its experimental validation in the area of software systems is typically supported by the observation of a recurring pattern of smooth expansion of the system s size(a.k.a. baseline growth), that is interrupted with releases of perfective maintenance with size reductions or with releases of growth. Moreover, due to a previous wording of the law (e.g., see [2]) that described change to follow a normal distribution, the experimental assessment included the validation of whether growth demonstrates oscillations around the average value [1,2,3]. Size. The evolution of size can be observed in Fig. 1 and 2. We have to say that we simply do not detect the same behaviour that Lehman did (contrast Fig. 1, 2 to the respective figures of articles [1] and [2]): in sharp contrast to the smooth baseline growth that Lehman has highlighted, the evolution of the size of the studied database schemata provides a landscape with a large variety of sequences of the following three fundamental behaviors. In all schemata, we can see periods of increase, especially at the beginning of their lifetime or after a large drop in the schema size. This is an indication of positive feedback, i.e., the need to expand the schema to cover the information needs of the users. In all schemata, there are versions with drops in schema size. Those drops are typically sudden and steep and usually take place in short periods of time. Sometimes, in fact, these drops are of significantly larger size than the typical change. We can safely say that the existence of these drops in the schema size indicate perfective maintenance and thus, the existence of a negative feedback mechanism in the evolution process.

9 Fig. 4. Growth for Coppermine and Ensembl (over version id, concealed for fig. clarity); the slowly dropping solid line shows the linear interpolation of the values In all schemata, there are periods of stability (i.e., size stays still, or nearstill). Growth. Growth (i.e., the difference in the size between two subsequent versions) in all datasets has the following broad characteristics (Fig. 4, 6). In terms of tables, in most cases, growth is small (typically ranging within 0 and 1), and moderately small when it comes to attributes. We have too many occurrences of zero growth, typically iterating between small non-zero growth and zero growth. Due to perfective maintenance, we also have negative values of growth (less than the positive ones). We do not have a constant flow of versions where the schema size is continuously changing; rather, we have small spikes between one and zero. Thus, we have to state that the growth comes with a pattern of spikes. Due to this characteristic, the average value is typically very close to zero (on the positive side) in all datasets, both for tables and attributes. There are few cases of large change too; we forward the reader to Law V for a discussion of their characteristics. We would like to put special emphasis to the observation that change is small. In terms of tables, growth is mostly bounded in small values. This is not directly obvious in the charts, because they show the ripples; however, almost all numbers are in the range of [-2..2] in fact, mostly in the range [0..2]. Few abrupt changes occur. In terms of attributes, the numbers are higher, of course, and depend on the dataset. Typically those values are bounded within [-20,20]. However, the deviations from this range are not many. In the course of our deliberations, we have observed a pattern common in all datasets: there is a Zipfian model in the distribution of frequencies. Observe Fig. 5 that comes with two parts, both depicting how often a growth value appears in the attributes of Ensemble. The x-axis keeps the delta size and the y-axis the number of occurrences of this delta. In the left part we include zeros in the counting (343 occurrences out of 528 data points) and in the right part we exclude them (to show that the power law does not hold only for the most popular value). We observe that there is a small range of deltas, between -2 and

10 Fig. 5. Frequency of change values for Ensembl attributes 4 that takes up 450 changes out of the 528. This means that, despite the large outliers, change is strongly biased towards small values close to zero. Despite the fact that change does not follow the pattern of baseline smooth growth of Lehman and the fact that change obeys a Zipfian distribution with a peak at zero, we have to say that the presence of feedback in the evolution process is evident; thus the law holds. 4.2 Properties of Growth for Schema Evolution Law of continuing growth (Law VI). The sixth law of continuing growth requires us to verify whether the information capacity of the system (schema size) continuously grows. In all occasions, the schema size increases in the long run (Fig. 1, 2). We frequently observe some shrinking events in the timeline of schema growth in all data sets. However, all data sets demonstrate the tendency to grow over time. However, we also have differences from the traditional software systems that the law studies: as with Law I, the term continually is questionable. As already mentioned (refer to Law III and Fig. 1, 2), change comes with frequent (and sometimes long) periods of stability, where the size of the schema does not change (or changes very little). Therefore we can conclude that the law holds, albeit modified to accommodate the particularities of database schemata. Law of conservation of familiarity (Law V). A first question, of central interest for the fifth law s intuition is: What happens after excessive changes? Do we observe small ripples of change, showing the absorbing of the change s impact in terms of corrective maintenance and developer acquaintance with the new version of the schema? An accompanying question, typically encountered in the literature, is: What is the effect of age over the growth and the growth ratio of the schema? Is it slowly declining, constant or oblivious to age? Again, we would like to remind the reader on the properties of growth, discussed in Law III of self-regulation: the changes are small, come with spike patterns between zero and non-zero deltas and the average value of growth is very close to zero.

11 Fig. 6. Different patterns of change in attribute growth of Mediawiki (over version-id, concealed for fig. clarity) Concerning the ripples after large changes, we can detect several patterns. Observe Fig. 6, depicting attribute growth for the MediaWiki dataset. Due to the fact that this involves the growth of attributes, the phenomena are amplified compared to the case of tables. Reading from right to left, we can see that there are indeed cases where a large spike is followed by small or no changes (case 1). However, within the small pool of large changes that exist overall, it is quite frequent to see sequences of large oscillations one after the other, and quite frequently being performed around zero too (case 2). In some occurrences, we see both (case 3). Concerning the effect of age, we do not see a diminishing trend in the values of growth; however, age results to a reduction in the density of changes and the frequency of non-zero values in the spikes. This explains the drop of the average value in almost all the studied data sets (Fig. 4): the linear interpolation drops; however, this is not due to the decrease of the height of the spikes, but due to the decrease of their density. The heartbeat of the systems tells a similar story: typically, change is quite more frequent in the beginning, despite the fact that existence of large changes and dense periods of activities can occur in any period of the lifetime. Fig. 1 and 2 clearly demonstrate this by combining schema size and activity. This trend is typical for almost all of the studied databases (with phpbb being the only exception, demonstrating increased activity in its latest versions with the schema size oscillating between 60 and 63 tables). Concerning the validity of the law, we believe that the law is possible but not confirmed. The law states that the growth is constrained by the need to maintain familiarity. However, the peculiarity of databases, compared to typical software systems, is that there are other good reasons to constrain growth: (a)

12 a high degree of dependence of other modules from the database, and, (b) an intense effort to make the database clean and organized. Therefore, conservation of familiarity, although important cannot solely justify the limited growth. The extent of the contribution of each reason is unclear. Law of conservation of organizational stability (Law IV). To validate the hypothesis that the law of conservation of organizational stability holds, we need to establish that the project s lifetime is divided in phases, each of which (a) demonstrates a constant growth, and, (b) is connected to the next phase with an abrupt change. Moreover, abrupt changes should occur from time to time and not all the time (resulting in extremely short phases). If we focus on the essence of the law, we can safely say that it does not hold. The heartbeats of Fig. 1 and 2 and the arbitrary sequencing of spikes and stability (Fig. 4, 6) make it impossible to speak about constant growth, even in phases. The open-source nature of our cases plays a role to that too. 4.3 Perfective Maintenance for Schema Evolution Law of increasing complexity (Law II). The law states that complexity increases with age, unless effort is taken to prevent this nevertheless, the law does not prescribe a clear assessment method for its validity. The rationale behind verifying the law dictates the observation of (a) an increasing trend in complexity of a software system, battled by (b) a perfective maintenance activity that attempts to reduce it and demonstrated by drops in the system size and rate of expansion. As there is no precise definition and measurement of complexity in the law, different metrics have been employed (coupling, cyclomatic complexity, etc. see [6] for a review). Unfortunately, as most of these metrics are non-applicable to the case of databases, we take a definition already found in Lehman [1]: complexity is defined as the number of modules handled (in our case tables added or modified) over the absolute value of growth per transition. This formula approximates how much effort has been invested in expanding the system over the actual difference achieved (large values demonstrate too much effort for too small change). Related literature typically speaks for increasing complexity [1], [2], [3], [6], although there have been counterarguments for the case of open source software [5]. In our case, in all the datasets but Biosql, complexity, as defined in the previous paragraph, does not increase (Fig. 7). The phenomenon must be coupled with the drop in change density (Law V) and although we cannot provide undisputable explanation, we offer the synergy of two causes: (a) the increasing dependence of the surrounding code to the database that makes developers more cautious to perform schema changes as they incur higher maintenance costs, and, (b) the success of the perfective maintenance, which results in a clean schema, requiring less corrective maintenance in the future. Although we cannot confirm or disprove the law based on undisputed objective measurements, we have indications that the second law partially holds, albeit with

13 completely different connotations than the ones reported by Lehman for typical software systems: in the case of database schemata, complexity, when measured as the fraction of expansion effort over actual growth, drops. Law of declining quality (Law VII) The seventh law postulates that quality declines with age unless the system is rigorously adapted to its external environment. Lehman and Fernandez-Ramil [3] avoid both (a) a definition of quality the definition, measurement, modelling and monitoring of software qualityrelated characteristics are very dependent on application, organisation, product and process characteristics and goals, and, (b) giving any other support to the law than a logical proof: as the system expands over time, its complexity rises and thus the addressing of user requirements and removal of defects becomes more and more difficult, unless work is done to confront the phenomenon ( the decline in software quality with age, appears to relate to a growth in complexity that must be associated with ageing ). We have already demonstrated that the rationale behind complexity increase is not supported by our observations. At the same time, we cannot assess schema quality with undisputed means. Therefore, we cannot confirm or disprove the law based on undisputed objective measurements. 5 Discussion In this section, we summarize fundamental observations and patterns that have been detected in our study. We intentionally avoid the term law, as we do not have unshakeable evidence for their explanation. Apart from the empirical grounding, due a very large amount of datasets that obey the same patterns (which we believe we have fairly attained), we would require an undisputed rationalized grounding, that can be obtained via a clear explanation of the underlying mechanism that guides them, also established on measured, undisputed data. Feedback-based Behavior for Schema Evolution. As an overall trend, the information capacity of the database schema is enhanced i.e., the size Fig. 7. Complexity for Coppermine and Ensembl (over version-id, concealed for clarity)

14 grows in the long term (VI). The existence of perfective maintenance is evident in almost all datasets with the existence of relation and attributes removals, as well as observable drops in growth and size of the schema (sometimes large ones). In fact, growth frequently oscillates between positive and negative values (III). The schema size of a certain version of the database can be accurately estimated via a regressive formula that exploits the amount of changes in recent, previous versions (VIII). Based on the above, we can state that the essence of Lehman s laws applies to open-source databases too: Schema evolution demonstrates the behavior of a feedback-regulated system, as it obeys the antagonism between the need for expanding its information capacity to address user needs and the need to control the unordered expansion, with perfective maintenance. Observations concerning the heartbeat of change. The database is not continuously adapted, but rather, alterations occur from time to time, both in terms of versions and in terms of time (I). Change does not follow patterns of constant behaviour (IV). Age results in a reduction of the density of changes to the database schema in most cases (V). Schema growth is small (observations): Growth is typically small in the evolution of database schemata, compared to traditional software systems (III). The distribution of occurrences of the amount of schema change follows a Zipfian distribution, with a predominant amount of zero growth in all data sets. Plainly put, there is a very large amount of versions with zero growth, both in the case of attributes and in the case of tables. The rest of the frequently occurring values are close to zero, too. The average value of growth is typically close to zero (although positive) (III) and drops with time, mainly due to the drop in change density (V). Threats to validity. We start with a fundamental inquiry: are databases E-type systems, so that this research is meaningful in the first place? Despite a fundamental difference (as databases involve information and not functional capacity), databases, closely resemble E-type systems as they address the real problem of query answering, come with their own user community (developers, DBA s), and act as fairly independent modules in information systems. Concerning the external validity of our study, its context concerns the study of the evolution of the logical schema of databases in open-source software. We avoid generalizing our findings to databases operating in closed environments and we stress that our study has focused only on the logical structure of databases, avoiding physical properties (let alone instance-level observations). Overall, we believe we have provided a safe, representative experiment with a significant number of schemata, having different purposes in the real world and time span (from rather few (40) to numerous (500+) versions). Our findings are generally consistent (with few exceptions that we mentioned). Concerning internal validity and cause-effect relationships, we avoid directly relating age with phenomena like the dropping density of changes or the size growth; on the contrary, we attribute the phenomena to a confounding variable, perfective maintenance actions, which we anticipate to be causing the observed behavior. When it comes to construct validity, all the measures we have employed are accurate, consistent with the metrics used in the related literature and appropriate for assessing the law to

15 which they are employed. The only exceptions to this statement are Laws II and VII dealing with the complexity and the quality of the schemata. Both terms are very general and the related database literature does not really provide adequate metrics other than size-related (which we deem too simple for our purpose); our own measurement of complexity requires deeper investigation. Therefore, the undisputed assessment of these laws remains open. Future Work. The extension of the study to more datasets, possibly nonrelational too, and the study of databases in closed environments for large periods of time, are possible roads for future research. Concerning the current findings of our study, the detailed understanding of the feedback mechanism, especially when it comes to ageing and complexity (Law II) as well as patterns of growth (Laws III and V), or patterns in the heartbeat of the evolution, are open issues worth investigating. Acknowledgment. This research has been co-financed by the European Union (European Social Fund - ESF) and Greek national funds through the Operational Program Education and Lifelong Learning of the National Strategic Reference Framework (NSRF) - Research Funding Program: Thales. Investing in knowledge society through the European Social Fund. References 1. Belady, L.A., Lehman, M.M.: A model of large program development. IBM Systems Journal 15(3) (1976) Lehman, M.M., Ramil, J.F., Wernick, P., Perry, D.E., Turski, W.M.: Metrics and laws of software evolution - the nineties view. In: 4th IEEE International Software Metrics Symposium (METRICS 1997). (1997) Lehman, M.M., Fernandez-Ramil, J.C.: Rules and Tools for Software Evolution Planning and Management. In: Software Evolution and Feedback: Theory and Practice. John Wiley and Sons Ltd (2006) ISBN-13: Xing, Z., Stroulia, E.: Analyzing the evolutionary history of the logical design of object-oriented software. IEEE Trans. Software Eng. 31(10) (2005) Fernández-Ramil, J., Lozano, A., Wermelinger, M., Capiluppi, A.: Empirical studies of open source evolution. In: Software Evolution. (2008) Xie, G., Chen, J., Neamtiu, I.: Towards a better understanding of software evolution: An empirical study on open source software. In: 25th IEEE International Conference on Software Maintenance (ICSM 2009), Edmonton, Alberta, Canada. (2009) Sjøberg, D.: Quantifying schema evolution. Information and Software Technology 35(1) (1993) Papastefanatos, G., Vassiliadis, P., Simitsis, A., Vassiliou, Y.: Metrics for the prediction of evolution impact in etl ecosystems: A case study. J. Data Semantics 1(2) (2012) Curino, C., Moon, H.J., Tanca, L., Zaniolo, C.: Schema evolution in wikipedia: toward a web information system benchmark. In: Proceedings of ICEIS 2008, Citeseer (2008) 10. Curino, C.A., Moon, H.J., Zaniolo, C.: Graceful database schema evolution: the prism workbench. Proceedings of the VLDB Endowment 1 (2008)

Growing up with Stability: how Open-Source Relational Databases Evolve

Growing up with Stability: how Open-Source Relational Databases Evolve Growing up with Stability: how Open-Source Relational Databases Evolve Ioannis Skoulis a,1, Panos Vassiliadis b, Apostolos V. Zarras b a Opera, Helsinki, Finland b University of Ioannina, Ioannina, Hellas

More information

Open-Source Databases: Within, Outside, or Beyond Lehman's Laws of Software Evolution?

Open-Source Databases: Within, Outside, or Beyond Lehman's Laws of Software Evolution? This research has been co-financed by the European Union (European Social Fund - ESF) and Greek national funds through the Operational Program "Education and Lifelong Learning" of the National Strategic

More information

Gravitating to Rigidity: Patterns of Schema Evolution -and its Absence- in the Lives of Tables

Gravitating to Rigidity: Patterns of Schema Evolution -and its Absence- in the Lives of Tables Gravitating to Rigidity: Patterns of Schema Evolution -and its Absence- in the Lives of Tables Panos Vassiliadis a, Apostolos V. Zarras a, Ioannis Skoulis b,1 a University of Ioannina, Ioannina, Hellas

More information

Software Evolution: An Empirical Study of Mozilla Firefox

Software Evolution: An Empirical Study of Mozilla Firefox Software Evolution: An Empirical Study of Mozilla Firefox Anita Ganpati Dr. Arvind Kalia Dr. Hardeep Singh Computer Science Dept. Computer Science Dept. Computer Sci. & Engg. Dept. Himachal Pradesh University,

More information

Survival in schema evolution: putting the lives of survivor and dead tables in counterpoint

Survival in schema evolution: putting the lives of survivor and dead tables in counterpoint Survival in schema evolution: putting the lives of survivor and dead tables in counterpoint Panos Vassiliadis and Apostolos V. Zarras Department of Computer Science and Engineering, University of Ioannina,

More information

Schema Evolution Survival Guide for Tables: Avoid Rigid Childhood and You re En Route to a Quiet Life

Schema Evolution Survival Guide for Tables: Avoid Rigid Childhood and You re En Route to a Quiet Life Noname manuscript No. (will be inserted by the editor) Schema Evolution Survival Guide for Tables: Avoid Rigid Childhood and You re En Route to a Quiet Life Panos Vassiliadis Apostolos V. Zarras This is

More information

Graph Structure Over Time

Graph Structure Over Time Graph Structure Over Time Observing how time alters the structure of the IEEE data set Priti Kumar Computer Science Rensselaer Polytechnic Institute Troy, NY Kumarp3@rpi.edu Abstract This paper examines

More information

How is Life for a Table in an Evolving Relational Schema? Birth, Death & Everything in Between

How is Life for a Table in an Evolving Relational Schema? Birth, Death & Everything in Between This work was partially supported from the European Community's FP7/2007-2013 under grant agreement number 257178 (project CHOReOS). How is Life for a Table in an Evolving Relational Schema? Birth, Death

More information

Guidelines for the application of Data Envelopment Analysis to assess evolving software

Guidelines for the application of Data Envelopment Analysis to assess evolving software Short Paper Guidelines for the application of Data Envelopment Analysis to assess evolving software Alexander Chatzigeorgiou Department of Applied Informatics, University of Macedonia 546 Thessaloniki,

More information

COPYRIGHTED MATERIAL. Introduction. Chapter 1

COPYRIGHTED MATERIAL. Introduction. Chapter 1 Chapter 1 Introduction Performance Analysis, Queuing Theory, Large Deviations. Performance analysis of communication networks is the branch of applied probability that deals with the evaluation of the

More information

Categorizing Migrations

Categorizing Migrations What to Migrate? Categorizing Migrations A version control repository contains two distinct types of data. The first type of data is the actual content of the directories and files themselves which are

More information

Empirical Study on Impact of Developer Collaboration on Source Code

Empirical Study on Impact of Developer Collaboration on Source Code Empirical Study on Impact of Developer Collaboration on Source Code Akshay Chopra University of Waterloo Waterloo, Ontario a22chopr@uwaterloo.ca Parul Verma University of Waterloo Waterloo, Ontario p7verma@uwaterloo.ca

More information

Automatic Identification of Important Clones for Refactoring and Tracking

Automatic Identification of Important Clones for Refactoring and Tracking Automatic Identification of Important Clones for Refactoring and Tracking Manishankar Mondal Chanchal K. Roy Kevin A. Schneider Department of Computer Science, University of Saskatchewan, Canada {mshankar.mondal,

More information

Software Evolution. Dr. James A. Bednar. With material from

Software Evolution. Dr. James A. Bednar.  With material from Software Evolution Dr. James A. Bednar jbednar@inf.ed.ac.uk http://homepages.inf.ed.ac.uk/jbednar With material from Massimo Felici, Conrad Hughes, and Perdita Stevens SAPM Spring 2012: Evolution 1 Software

More information

Implementing ITIL v3 Service Lifecycle

Implementing ITIL v3 Service Lifecycle Implementing ITIL v3 Lifecycle WHITE PAPER introduction GSS INFOTECH IT services have become an integral means for conducting business for all sizes of businesses, private and public organizations, educational

More information

Sample Exam. Advanced Test Automation - Engineer

Sample Exam. Advanced Test Automation - Engineer Sample Exam Advanced Test Automation - Engineer Questions ASTQB Created - 2018 American Software Testing Qualifications Board Copyright Notice This document may be copied in its entirety, or extracts made,

More information

An Archiving System for Managing Evolution in the Data Web

An Archiving System for Managing Evolution in the Data Web An Archiving System for Managing Evolution in the Web Marios Meimaris *, George Papastefanatos and Christos Pateritsas * Institute for the Management of Information Systems, Research Center Athena, Greece

More information

Basic Concepts of Reliability

Basic Concepts of Reliability Basic Concepts of Reliability Reliability is a broad concept. It is applied whenever we expect something to behave in a certain way. Reliability is one of the metrics that are used to measure quality.

More information

Chapter 9. Software Testing

Chapter 9. Software Testing Chapter 9. Software Testing Table of Contents Objectives... 1 Introduction to software testing... 1 The testers... 2 The developers... 2 An independent testing team... 2 The customer... 2 Principles of

More information

Evaluating the Evolution of a C Application

Evaluating the Evolution of a C Application Evaluating the Evolution of a C Application Elizabeth Burd, Malcolm Munro Liz.Burd@dur.ac.uk The Centre for Software Maintenance University of Durham South Road Durham, DH1 3LE, UK Abstract This paper

More information

Training & Documentation. Different Users. Types of training. Reading: Chapter 10. User training (what the system does)

Training & Documentation. Different Users. Types of training. Reading: Chapter 10. User training (what the system does) Training & Documentation Reading: Chapter 10 Different Users Types of training User training (what the system does) Operator training (how the system works) Special training needs: new users vs. brush-up

More information

Chapter 5. Track Geometry Data Analysis

Chapter 5. Track Geometry Data Analysis Chapter Track Geometry Data Analysis This chapter explains how and why the data collected for the track geometry was manipulated. The results of these studies in the time and frequency domain are addressed.

More information

Allstate Insurance Claims Severity: A Machine Learning Approach

Allstate Insurance Claims Severity: A Machine Learning Approach Allstate Insurance Claims Severity: A Machine Learning Approach Rajeeva Gaur SUNet ID: rajeevag Jeff Pickelman SUNet ID: pattern Hongyi Wang SUNet ID: hongyiw I. INTRODUCTION The insurance industry has

More information

Challenges on Combining Open Web and Dataset Evaluation Results: The Case of the Contextual Suggestion Track

Challenges on Combining Open Web and Dataset Evaluation Results: The Case of the Contextual Suggestion Track Challenges on Combining Open Web and Dataset Evaluation Results: The Case of the Contextual Suggestion Track Alejandro Bellogín 1,2, Thaer Samar 1, Arjen P. de Vries 1, and Alan Said 1 1 Centrum Wiskunde

More information

Joint Entity Resolution

Joint Entity Resolution Joint Entity Resolution Steven Euijong Whang, Hector Garcia-Molina Computer Science Department, Stanford University 353 Serra Mall, Stanford, CA 94305, USA {swhang, hector}@cs.stanford.edu No Institute

More information

Multi-threading technology and the challenges of meeting performance and power consumption demands for mobile applications

Multi-threading technology and the challenges of meeting performance and power consumption demands for mobile applications Multi-threading technology and the challenges of meeting performance and power consumption demands for mobile applications September 2013 Navigating between ever-higher performance targets and strict limits

More information

TERMINOLOGY MANAGEMENT DURING TRANSLATION PROJECTS: PROFESSIONAL TESTIMONY

TERMINOLOGY MANAGEMENT DURING TRANSLATION PROJECTS: PROFESSIONAL TESTIMONY LINGUACULTURE, 1, 2010 TERMINOLOGY MANAGEMENT DURING TRANSLATION PROJECTS: PROFESSIONAL TESTIMONY Nancy Matis Abstract This article briefly presents an overview of the author's experience regarding the

More information

From Craft to Science: Rules for Software Design -- Part II

From Craft to Science: Rules for Software Design -- Part II From Craft to Science: Rules for Software Design -- Part II by Koni Buhrer Software Engineering Specialist Rational Software Developing large software systems is notoriously difficult and unpredictable.

More information

Algorithms for Event-Driven Application Brownout

Algorithms for Event-Driven Application Brownout Algorithms for Event-Driven Application Brownout David Desmeurs April 3, 2015 Master s Thesis in Computing Science, 30 credits Supervisor at CS-UmU: Johan Tordsson, Co-Supervisor: Cristian Klein Examiner:

More information

Information systems design: a procedural approach

Information systems design: a procedural approach Information systems design: a procedural approach G. Haramis 1, G. Pavlidis 2, Th. Fotiadis 1, Ch. Vassiliadis 1 & Ch. Tsialtas 1 University of Macedonia 2 University of Patras Abstract The procedure of

More information

Whitepaper Spain SEO Ranking Factors 2012

Whitepaper Spain SEO Ranking Factors 2012 Whitepaper Spain SEO Ranking Factors 2012 Authors: Marcus Tober, Sebastian Weber Searchmetrics GmbH Greifswalder Straße 212 10405 Berlin Phone: +49-30-3229535-0 Fax: +49-30-3229535-99 E-Mail: info@searchmetrics.com

More information

Simulating Growth of Transportation Networks

Simulating Growth of Transportation Networks The Eighth International Symposium on Operations Research and Its Applications (ISORA 09) Zhangjiajie, China, September 20 22, 2009 Copyright 2009 ORSC & APORC, pp. 348 355 Simulating Growth of Transportation

More information

M. Xie, G. Y. Hong and C. Wohlin, "A Study of Exponential Smoothing Technique in Software Reliability Growth Prediction", Quality and Reliability

M. Xie, G. Y. Hong and C. Wohlin, A Study of Exponential Smoothing Technique in Software Reliability Growth Prediction, Quality and Reliability M. Xie, G. Y. Hong and C. Wohlin, "A Study of Exponential Smoothing Technique in Software Reliability Growth Prediction", Quality and Reliability Engineering International, Vol.13, pp. 247-353, 1997. 1

More information

Employment of Multiple Algorithms for Optimal Path-based Test Selection Strategy. Miroslav Bures and Bestoun S. Ahmed

Employment of Multiple Algorithms for Optimal Path-based Test Selection Strategy. Miroslav Bures and Bestoun S. Ahmed 1 Employment of Multiple Algorithms for Optimal Path-based Test Selection Strategy Miroslav Bures and Bestoun S. Ahmed arxiv:1802.08005v1 [cs.se] 22 Feb 2018 Abstract Executing various sequences of system

More information

Managing Changes to Schema of Data Sources in a Data Warehouse

Managing Changes to Schema of Data Sources in a Data Warehouse Association for Information Systems AIS Electronic Library (AISeL) AMCIS 2001 Proceedings Americas Conference on Information Systems (AMCIS) December 2001 Managing Changes to Schema of Data Sources in

More information

Applied Cloning Techniques for a Genetic Algorithm Used in Evolvable Hardware Design

Applied Cloning Techniques for a Genetic Algorithm Used in Evolvable Hardware Design Applied Cloning Techniques for a Genetic Algorithm Used in Evolvable Hardware Design Viet C. Trinh vtrinh@isl.ucf.edu Gregory A. Holifield greg.holifield@us.army.mil School of Electrical Engineering and

More information

IMPACT OF DEPENDENCY GRAPH IN SOFTWARE TESTING

IMPACT OF DEPENDENCY GRAPH IN SOFTWARE TESTING IMPACT OF DEPENDENCY GRAPH IN SOFTWARE TESTING Pardeep kaur 1 and Er. Rupinder Singh 2 1 Research Scholar, Dept. of Computer Science and Engineering, Chandigarh University, Gharuan, India (Email: Pardeepdharni664@gmail.com)

More information

Dynamic Design of Cellular Wireless Networks via Self Organizing Mechanism

Dynamic Design of Cellular Wireless Networks via Self Organizing Mechanism Dynamic Design of Cellular Wireless Networks via Self Organizing Mechanism V.Narasimha Raghavan, M.Venkatesh, Divya Sridharabalan, T.Sabhanayagam, Nithin Bharath Abstract In our paper, we are utilizing

More information

Whitepaper US SEO Ranking Factors 2012

Whitepaper US SEO Ranking Factors 2012 Whitepaper US SEO Ranking Factors 2012 Authors: Marcus Tober, Sebastian Weber Searchmetrics Inc. 1115 Broadway 12th Floor, Room 1213 New York, NY 10010 Phone: 1 866-411-9494 E-Mail: sales-us@searchmetrics.com

More information

turning data into dollars

turning data into dollars turning data into dollars Tom s Ten Data Tips November 2008 Neural Networks Neural Networks (NNs) are sometimes considered the epitome of data mining algorithms. Loosely modeled after the human brain (hence

More information

The Encoding Complexity of Network Coding

The Encoding Complexity of Network Coding The Encoding Complexity of Network Coding Michael Langberg Alexander Sprintson Jehoshua Bruck California Institute of Technology Email: mikel,spalex,bruck @caltech.edu Abstract In the multicast network

More information

SYSTEM CONCEPTS. Definition of a System

SYSTEM CONCEPTS. Definition of a System 2 SYSTEM CONCEPTS A system is a group of interrelated components working together towards a common goal, by accepting inputs and producing outputs in an organized transformation process. The interrelated

More information

Scope and Sequence for the New Jersey Core Curriculum Content Standards

Scope and Sequence for the New Jersey Core Curriculum Content Standards Scope and Sequence for the New Jersey Core Curriculum Content Standards The following chart provides an overview of where within Prentice Hall Course 3 Mathematics each of the Cumulative Progress Indicators

More information

Guarding the Quality of Your Data, Your Most Valued Asset

Guarding the Quality of Your Data, Your Most Valued Asset Guarding the Quality of Your Data, Your Most Valued Asset Introduction Over the last decade or so, organizations have increasingly harnessed Business Intelligence to positively impact top line growth.

More information

LDO Current Foldback/Current Limiter Circuits

LDO Current Foldback/Current Limiter Circuits Technical Information Paper No.0030(Ver.01) LDO Current Foldback/Current Limiter Circuits INTRODUCTION This TIP is aimed at improving the knowledge of engineers and sales people, as it is not really possible

More information

Predicting Messaging Response Time in a Long Distance Relationship

Predicting Messaging Response Time in a Long Distance Relationship Predicting Messaging Response Time in a Long Distance Relationship Meng-Chen Shieh m3shieh@ucsd.edu I. Introduction The key to any successful relationship is communication, especially during times when

More information

(S)LOC Count Evolution for Selected OSS Projects. Tik Report 315

(S)LOC Count Evolution for Selected OSS Projects. Tik Report 315 (S)LOC Count Evolution for Selected OSS Projects Tik Report 315 Arno Wagner arno@wagner.name December 11, 009 Abstract We measure the dynamics in project code size for several large open source projects,

More information

Tips and Guidance for Analyzing Data. Executive Summary

Tips and Guidance for Analyzing Data. Executive Summary Tips and Guidance for Analyzing Data Executive Summary This document has information and suggestions about three things: 1) how to quickly do a preliminary analysis of time-series data; 2) key things to

More information

CHAPTER 6 MODIFIED FUZZY TECHNIQUES BASED IMAGE SEGMENTATION

CHAPTER 6 MODIFIED FUZZY TECHNIQUES BASED IMAGE SEGMENTATION CHAPTER 6 MODIFIED FUZZY TECHNIQUES BASED IMAGE SEGMENTATION 6.1 INTRODUCTION Fuzzy logic based computational techniques are becoming increasingly important in the medical image analysis arena. The significant

More information

Whitepaper Italy SEO Ranking Factors 2012

Whitepaper Italy SEO Ranking Factors 2012 Whitepaper Italy SEO Ranking Factors 2012 Authors: Marcus Tober, Sebastian Weber Searchmetrics GmbH Greifswalder Straße 212 10405 Berlin Phone: +49-30-3229535-0 Fax: +49-30-3229535-99 E-Mail: info@searchmetrics.com

More information

Impact of Dependency Graph in Software Testing

Impact of Dependency Graph in Software Testing Impact of Dependency Graph in Software Testing Pardeep Kaur 1, Er. Rupinder Singh 2 1 Computer Science Department, Chandigarh University, Gharuan, Punjab 2 Assistant Professor, Computer Science Department,

More information

Maintaining Mutual Consistency for Cached Web Objects

Maintaining Mutual Consistency for Cached Web Objects Maintaining Mutual Consistency for Cached Web Objects Bhuvan Urgaonkar, Anoop George Ninan, Mohammad Salimullah Raunak Prashant Shenoy and Krithi Ramamritham Department of Computer Science, University

More information

NATIONAL CYBER SECURITY STRATEGY. - Version 2.0 -

NATIONAL CYBER SECURITY STRATEGY. - Version 2.0 - NATIONAL CYBER SECURITY STRATEGY - Version 2.0 - CONTENTS SUMMARY... 3 1 INTRODUCTION... 4 2 GENERAL PRINCIPLES AND OBJECTIVES... 5 3 ACTION FRAMEWORK STRATEGIC OBJECTIVES... 6 3.1 Determining the stakeholders

More information

Conclusions. Chapter Summary of our contributions

Conclusions. Chapter Summary of our contributions Chapter 1 Conclusions During this thesis, We studied Web crawling at many different levels. Our main objectives were to develop a model for Web crawling, to study crawling strategies and to build a Web

More information

Method to Study and Analyze Fraud Ranking In Mobile Apps

Method to Study and Analyze Fraud Ranking In Mobile Apps Method to Study and Analyze Fraud Ranking In Mobile Apps Ms. Priyanka R. Patil M.Tech student Marri Laxman Reddy Institute of Technology & Management Hyderabad. Abstract: Ranking fraud in the mobile App

More information

Recording end-users security events: A step towards increasing usability

Recording end-users security events: A step towards increasing usability Section 1 Network Systems Engineering Recording end-users security events: A step towards increasing usability Abstract D.Chatziapostolou and S.M.Furnell Network Research Group, University of Plymouth,

More information

Computer Sciences Department

Computer Sciences Department Computer Sciences Department SIP: Speculative Insertion Policy for High Performance Caching Hongil Yoon Tan Zhang Mikko H. Lipasti Technical Report #1676 June 2010 SIP: Speculative Insertion Policy for

More information

EVALUATION OF THE USABILITY OF EDUCATIONAL WEB MEDIA: A CASE STUDY OF GROU.PS

EVALUATION OF THE USABILITY OF EDUCATIONAL WEB MEDIA: A CASE STUDY OF GROU.PS EVALUATION OF THE USABILITY OF EDUCATIONAL WEB MEDIA: A CASE STUDY OF GROU.PS Turgay Baş, Hakan Tüzün Hacettepe University (TURKEY) turgaybas@hacettepe.edu.tr, htuzun@hacettepe.edu.tr Abstract In this

More information

Thwarting Traceback Attack on Freenet

Thwarting Traceback Attack on Freenet Thwarting Traceback Attack on Freenet Guanyu Tian, Zhenhai Duan Florida State University {tian, duan}@cs.fsu.edu Todd Baumeister, Yingfei Dong University of Hawaii {baumeist, yingfei}@hawaii.edu Abstract

More information

RiMOM Results for OAEI 2009

RiMOM Results for OAEI 2009 RiMOM Results for OAEI 2009 Xiao Zhang, Qian Zhong, Feng Shi, Juanzi Li and Jie Tang Department of Computer Science and Technology, Tsinghua University, Beijing, China zhangxiao,zhongqian,shifeng,ljz,tangjie@keg.cs.tsinghua.edu.cn

More information

Practical Design of Experiments: Considerations for Iterative Developmental Testing

Practical Design of Experiments: Considerations for Iterative Developmental Testing Practical Design of Experiments: Considerations for Iterative Developmental Testing Best Practice Authored by: Michael Harman 29 January 2018 The goal of the STAT COE is to assist in developing rigorous,

More information

INSPIRE and SPIRES Log File Analysis

INSPIRE and SPIRES Log File Analysis INSPIRE and SPIRES Log File Analysis Cole Adams Science Undergraduate Laboratory Internship Program Wheaton College SLAC National Accelerator Laboratory August 5, 2011 Prepared in partial fulfillment of

More information

Improving Stack Overflow Tag Prediction Using Eye Tracking Alina Lazar Youngstown State University Bonita Sharif, Youngstown State University

Improving Stack Overflow Tag Prediction Using Eye Tracking Alina Lazar Youngstown State University Bonita Sharif, Youngstown State University Improving Stack Overflow Tag Prediction Using Eye Tracking Alina Lazar, Youngstown State University Bonita Sharif, Youngstown State University Jenna Wise, Youngstown State University Alyssa Pawluk, Youngstown

More information

The Linux Kernel as a Case Study in Software Evolution

The Linux Kernel as a Case Study in Software Evolution The Linux Kernel as a Case Study in Software Evolution Ayelet Israeli Dror G. Feitelson Department of Computer Science The Hebrew University, 9194 Jerusalem, Israel Abstract We use over 8 versions of the

More information

6.871 Expert System: WDS Web Design Assistant System

6.871 Expert System: WDS Web Design Assistant System 6.871 Expert System: WDS Web Design Assistant System Timur Tokmouline May 11, 2005 1 Introduction Today, despite the emergence of WYSIWYG software, web design is a difficult and a necessary component of

More information

Section of Achieving Data Standardization

Section of Achieving Data Standardization Section of Achieving Data Standardization Whitemarsh Information Systems Corporation 2008 Althea Lane Bowie, Maryland 20716 Tele: 301-249-1142 Email: mmgorman@wiscorp.com Web: www.wiscorp.com 1 Why Data

More information

A4.8 Fitting relative potencies and the Schild equation

A4.8 Fitting relative potencies and the Schild equation A4.8 Fitting relative potencies and the Schild equation A4.8.1. Constraining fits to sets of curves It is often necessary to deal with more than one curve at a time. Typical examples are (1) sets of parallel

More information

A nodal based evolutionary structural optimisation algorithm

A nodal based evolutionary structural optimisation algorithm Computer Aided Optimum Design in Engineering IX 55 A dal based evolutionary structural optimisation algorithm Y.-M. Chen 1, A. J. Keane 2 & C. Hsiao 1 1 ational Space Program Office (SPO), Taiwan 2 Computational

More information

Lotus Sametime 3.x for iseries. Performance and Scaling

Lotus Sametime 3.x for iseries. Performance and Scaling Lotus Sametime 3.x for iseries Performance and Scaling Contents Introduction... 1 Sametime Workloads... 2 Instant messaging and awareness.. 3 emeeting (Data only)... 4 emeeting (Data plus A/V)... 8 Sametime

More information

HEURISTIC OPTIMIZATION USING COMPUTER SIMULATION: A STUDY OF STAFFING LEVELS IN A PHARMACEUTICAL MANUFACTURING LABORATORY

HEURISTIC OPTIMIZATION USING COMPUTER SIMULATION: A STUDY OF STAFFING LEVELS IN A PHARMACEUTICAL MANUFACTURING LABORATORY Proceedings of the 1998 Winter Simulation Conference D.J. Medeiros, E.F. Watson, J.S. Carson and M.S. Manivannan, eds. HEURISTIC OPTIMIZATION USING COMPUTER SIMULATION: A STUDY OF STAFFING LEVELS IN A

More information

Efficient Voting Prediction for Pairwise Multilabel Classification

Efficient Voting Prediction for Pairwise Multilabel Classification Efficient Voting Prediction for Pairwise Multilabel Classification Eneldo Loza Mencía, Sang-Hyeun Park and Johannes Fürnkranz TU-Darmstadt - Knowledge Engineering Group Hochschulstr. 10 - Darmstadt - Germany

More information

Panel: Pattern management challenges

Panel: Pattern management challenges Panel: Pattern management challenges Panos Vassiliadis Univ. of Ioannina, Dept. of Computer Science, 45110, Ioannina, Hellas E-mail: pvassil@cs.uoi.gr 1 Introduction The increasing opportunity of quickly

More information

5. Computational Geometry, Benchmarks and Algorithms for Rectangular and Irregular Packing. 6. Meta-heuristic Algorithms and Rectangular Packing

5. Computational Geometry, Benchmarks and Algorithms for Rectangular and Irregular Packing. 6. Meta-heuristic Algorithms and Rectangular Packing 1. Introduction 2. Cutting and Packing Problems 3. Optimisation Techniques 4. Automated Packing Techniques 5. Computational Geometry, Benchmarks and Algorithms for Rectangular and Irregular Packing 6.

More information

TCM Health-keeping Proverb English Translation Management Platform based on SQL Server Database

TCM Health-keeping Proverb English Translation Management Platform based on SQL Server Database 2019 2nd International Conference on Computer Science and Advanced Materials (CSAM 2019) TCM Health-keeping Proverb English Translation Management Platform based on SQL Server Database Qiuxia Zeng1, Jianpeng

More information

Administration of Interconnection between Networks and Faced Issues. Wang Jian

Administration of Interconnection between Networks and Faced Issues. Wang Jian Administration of Interconnection between Networks and Faced Issues Wang Jian Jiangsu Communications Administration (JSCA) August, 2002 1 Contents Goals of Interconnection Successful Factors for Interconnection

More information

The influence of caching on web usage mining

The influence of caching on web usage mining The influence of caching on web usage mining J. Huysmans 1, B. Baesens 1,2 & J. Vanthienen 1 1 Department of Applied Economic Sciences, K.U.Leuven, Belgium 2 School of Management, University of Southampton,

More information

Chapter 3. Set Theory. 3.1 What is a Set?

Chapter 3. Set Theory. 3.1 What is a Set? Chapter 3 Set Theory 3.1 What is a Set? A set is a well-defined collection of objects called elements or members of the set. Here, well-defined means accurately and unambiguously stated or described. Any

More information

Harmonization of usability measurements in ISO9126 software engineering standards

Harmonization of usability measurements in ISO9126 software engineering standards Harmonization of usability measurements in ISO9126 software engineering standards Laila Cheikhi, Alain Abran and Witold Suryn École de Technologie Supérieure, 1100 Notre-Dame Ouest, Montréal, Canada laila.cheikhi.1@ens.etsmtl.ca,

More information

1 Homophily and assortative mixing

1 Homophily and assortative mixing 1 Homophily and assortative mixing Networks, and particularly social networks, often exhibit a property called homophily or assortative mixing, which simply means that the attributes of vertices correlate

More information

EC121 Mathematical Techniques A Revision Notes

EC121 Mathematical Techniques A Revision Notes EC Mathematical Techniques A Revision Notes EC Mathematical Techniques A Revision Notes Mathematical Techniques A begins with two weeks of intensive revision of basic arithmetic and algebra, to the level

More information

Towards more robust internetworks:

Towards more robust internetworks: Towards more robust internetworks: an application of graph theory Authors Jamie Greenwood, MSc (Royal Holloway, 2016) Stephen Wolthusen, ISG, Royal Holloway Abstract As networks become increasingly connected

More information

Cost Effectiveness of Programming Methods A Replication and Extension

Cost Effectiveness of Programming Methods A Replication and Extension A Replication and Extension Completed Research Paper Wenying Sun Computer Information Sciences Washburn University nan.sun@washburn.edu Hee Seok Nam Mathematics and Statistics Washburn University heeseok.nam@washburn.edu

More information

Investigating F# as a development tool for distributed multi-agent systems

Investigating F# as a development tool for distributed multi-agent systems PROCEEDINGS OF THE WORKSHOP ON APPLICATIONS OF SOFTWARE AGENTS ISBN 978-86-7031-188-6, pp. 32-36, 2011 Investigating F# as a development tool for distributed multi-agent systems Extended abstract Alex

More information

Automatic Merging of Specification Documents in a Parallel Development Environment

Automatic Merging of Specification Documents in a Parallel Development Environment Automatic Merging of Specification Documents in a Parallel Development Environment Rickard Böttcher Linus Karnland Department of Computer Science Lund University, Faculty of Engineering December 16, 2008

More information

CSC 408F/CSC2105F Lecture Notes

CSC 408F/CSC2105F Lecture Notes CSC 408F/CSC2105F Lecture Notes These lecture notes are provided for the personal use of students taking CSC 408H/CSC 2105H in the Fall term 2004/2005 at the University of Toronto. Copying for purposes

More information

Strategies for sound Internet measurement

Strategies for sound Internet measurement Strategies for sound Internet measurement Vern Paxson IMC 2004 Paxson s disclaimer There are no new research results Some points apply to large scale systems in general Unfortunately, all strategies involve

More information

5 The Control Structure Diagram (CSD)

5 The Control Structure Diagram (CSD) 5 The Control Structure Diagram (CSD) The Control Structure Diagram (CSD) is an algorithmic level diagram intended to improve the comprehensibility of source code by clearly depicting control constructs,

More information

System Models. 2.1 Introduction 2.2 Architectural Models 2.3 Fundamental Models. Nicola Dragoni Embedded Systems Engineering DTU Informatics

System Models. 2.1 Introduction 2.2 Architectural Models 2.3 Fundamental Models. Nicola Dragoni Embedded Systems Engineering DTU Informatics System Models Nicola Dragoni Embedded Systems Engineering DTU Informatics 2.1 Introduction 2.2 Architectural Models 2.3 Fundamental Models Architectural vs Fundamental Models Systems that are intended

More information

International Journal of Emerging Technologies in Computational and Applied Sciences (IJETCAS)

International Journal of Emerging Technologies in Computational and Applied Sciences (IJETCAS) International Association of Scientific Innovation and Research (IASIR) (An Association Unifying the Sciences, Engineering, and Applied Research) International Journal of Emerging Technologies in Computational

More information

Properties of Biological Networks

Properties of Biological Networks Properties of Biological Networks presented by: Ola Hamud June 12, 2013 Supervisor: Prof. Ron Pinter Based on: NETWORK BIOLOGY: UNDERSTANDING THE CELL S FUNCTIONAL ORGANIZATION By Albert-László Barabási

More information

How to Conduct a Heuristic Evaluation

How to Conduct a Heuristic Evaluation Page 1 of 9 useit.com Papers and Essays Heuristic Evaluation How to conduct a heuristic evaluation How to Conduct a Heuristic Evaluation by Jakob Nielsen Heuristic evaluation (Nielsen and Molich, 1990;

More information

How to re-open the black box in the structural design of complex geometries

How to re-open the black box in the structural design of complex geometries Structures and Architecture Cruz (Ed) 2016 Taylor & Francis Group, London, ISBN 978-1-138-02651-3 How to re-open the black box in the structural design of complex geometries K. Verbeeck Partner Ney & Partners,

More information

KDD, SEMMA AND CRISP-DM: A PARALLEL OVERVIEW. Ana Azevedo and M.F. Santos

KDD, SEMMA AND CRISP-DM: A PARALLEL OVERVIEW. Ana Azevedo and M.F. Santos KDD, SEMMA AND CRISP-DM: A PARALLEL OVERVIEW Ana Azevedo and M.F. Santos ABSTRACT In the last years there has been a huge growth and consolidation of the Data Mining field. Some efforts are being done

More information

Final Project Report

Final Project Report 16.04.02 Final Project Report Document information Project Title HP Tool Repository of SESAR standard HP methods and tools Project Number 16.04.02 Project Manager DFS Deliverable Name 16.04.02 Final Project

More information

Enhanced Performance of Database by Automated Self-Tuned Systems

Enhanced Performance of Database by Automated Self-Tuned Systems 22 Enhanced Performance of Database by Automated Self-Tuned Systems Ankit Verma Department of Computer Science & Engineering, I.T.M. University, Gurgaon (122017) ankit.verma.aquarius@gmail.com Abstract

More information

THE CONTRAST ASSESS COST ADVANTAGE

THE CONTRAST ASSESS COST ADVANTAGE WHITEPAPER THE CONTRAST ASSESS COST ADVANTAGE APPLICATION SECURITY TESTING COSTS COMPARED WELCOME TO THE ERA OF SELF-PROTECTING SOFTWARE CONTRASTSECURITY.COM EXECUTIVE SUMMARY Applications account for

More information

Review of feature selection techniques in bioinformatics by Yvan Saeys, Iñaki Inza and Pedro Larrañaga.

Review of feature selection techniques in bioinformatics by Yvan Saeys, Iñaki Inza and Pedro Larrañaga. Americo Pereira, Jan Otto Review of feature selection techniques in bioinformatics by Yvan Saeys, Iñaki Inza and Pedro Larrañaga. ABSTRACT In this paper we want to explain what feature selection is and

More information

THE STATE OF IT TRANSFORMATION FOR RETAIL

THE STATE OF IT TRANSFORMATION FOR RETAIL THE STATE OF IT TRANSFORMATION FOR RETAIL An Analysis by Dell EMC and VMware Dell EMC and VMware are helping IT groups at retail organizations transform to business-focused service providers. The State

More information

THE ADHERENCE OF OPEN SOURCE JAVA PROGRAMMERS TO STANDARD CODING PRACTICES

THE ADHERENCE OF OPEN SOURCE JAVA PROGRAMMERS TO STANDARD CODING PRACTICES THE ADHERENCE OF OPEN SOURCE JAVA PROGRAMMERS TO STANDARD CODING PRACTICES Mahmoud O. Elish Department of Computer Science George Mason University Fairfax VA 223-44 USA melish@gmu.edu ABSTRACT The use

More information

Kanban Size and its Effect on JIT Production Systems

Kanban Size and its Effect on JIT Production Systems Kanban Size and its Effect on JIT Production Systems Ing. Olga MAŘÍKOVÁ 1. INTRODUCTION Integrated planning, formation, carrying out and controlling of tangible and with them connected information flows

More information