Large-Scale Incremental OWL/RDFS Reasoning

Large-Scale Incremental OWL/RDFS Reasoning over Fuzzy RDF Data

Batselem Jagvaral, Lee Wangon, Hyun-Kyu Park, Myungjoong Jeon, Nam-Gee Lee, and Young-Tack Park
School of Computer Science and Engineering, Soongsil University (SSU), Seoul, South Korea
IEEE BigComp 2017, p. 269

Abstract

Ontological RDF data are extracted from multiple sources on the web through mapping and alignment for various purposes, but extracting and reasoning about ontologies from different sources causes information ambiguity and uncertainty. A reasonable solution to this problem is to annotate extracted ontology data with truth values that indicate the reliability of the information. However, the recent growth in data has made it difficult to ascertain the credibility of numerous ontologies during OWL/RDFS reasoning. In this paper, we present a distributed and incremental reasoning approach for RDF data with uncertainty. We focus on RDFS and OWL pd* semantics and develop methods for incremental OWL reasoning with uncertainty. We also introduce parallel algorithms that address the scalable reasoning problem. To evaluate the efficiency of the proposed system, we conducted OWL/RDFS reasoning over fuzzy LUBM3000 and achieved a performance three times higher than that achieved with the fastest reasoning system.

Keywords: uncertainty reasoning; incremental reasoning; ontology; OWL/RDFS; distributed computing; Spark

I. INTRODUCTION

Manually extracting voluminous ontological data, such as RDF data, from the web is impractical [1]. Ontology extraction therefore relies on an automated knowledge extractor that determines plausible relations between concepts from diverse knowledge sources. The problem with such extractors is that they are prone to errors when they are used to extract large numbers of ontologies from multiple sources. They can, for example, provide uncertain results on whether a relation holds because of errors from the extractors themselves or errors in the published data (e.g., an extractor is 20% uncertain about the date of Obama's birth). Sources may also be equally untrustworthy, which leads to mutually exclusive and conflicting claims by multiple sources. To represent this uncertainty or trustworthiness, which is sometimes referred to as provenance, an extractor assigns a fuzzy value or a numerical certainty to the extracted relations (triples), which extends RDF ontology data with fuzzy logic [2]. The extracted fuzzy RDF data are then deployed to Semantic Web applications for a variety of purposes, such as ontological reasoning and querying.

Over the past decade, ontology reasoning with fuzzy logic and uncertainty has been pursued by a very small research community. Several studies have argued for the need to annotate RDF data with fuzzy or truth values given that this approach improves data credibility [2,3], but relatively little research has addressed the scalable incremental reasoning problem. Meanwhile, much successful work has been done on scalable ontology reasoning with OWL and RDFS semantics by using frameworks such as MapReduce and Hive [2]. These frameworks have been proven capable of handling large-scale reasoning [3,4]. Nevertheless, performing fuzzy reasoning over RDF data with OWL semantics poses challenges to a reasoner. In addition, reasoning systems receive new ontology data continuously and need to perform incremental reasoning over each update. We therefore argue that RDF data arrive incrementally and with uncertainty, which calls for uncertainty reasoning that does not re-infer previously inferred knowledge [5].

Accordingly, in this paper, we develop a framework for scalable incremental reasoning with OWL and RDFS vocabulary over fuzzy RDF data on the basis of the Spark framework. Our focus is directed particularly toward parallelizing fuzzy RDFS and OWL pd* rules because the pd* vocabulary is less computationally complex than the OWL Full or DL vocabulary while offering a rich set of complete reasoning rules. Our approach is to harness the full power of a currently trending distributed framework, namely Spark [6], for performing efficient incremental uncertainty ontology reasoning over RDF data that scale up to millions of triples.

The rest of the paper is organized as follows. We provide a brief introduction to the reasoning problem in Sections II and III. We explain our algorithms in Section IV. We discuss experimental results in Section V and conclude the paper in Section VI.

II. RELATED WORK

Most of the methods developed for large-scale ontology reasoning are based on either MapReduce or massively parallel computing [3,4,5]. An example is Urbani et al.'s reasoning system, WebPIE [10], which performs large-volume ontology reasoning on the distributed processing framework Hadoop MapReduce. WebPIE supports scalable ontology reasoning for RDFS and OWL Horst semantics. The authors compress the input data with dictionary encoding to reduce the data workload. Despite these advantages, WebPIE's performance in iterative processing degrades because intermediate reasoning results are written to disk. Previously, [3] presented a fuzzy pd* OWL reasoner and proposed
MapReduce to process forward inferencing over large-scale data under fuzzy pd* semantics (i.e., an extension of OWL Horst semantics with fuzzy vagueness). Cichlid is another OWL pd* reasoning system, designed for the Spark framework [8]; we compare against it in our experiments. However, these approaches must re-compute previously inferred data whenever they receive a new set of data. Urbani et al. [5] also proposed an incremental reasoning system using MapReduce, but only for RDF Schema reasoning rules.

III. BACKGROUND

A. Fuzzy RDF and Fuzzy pd* Reasoning

A fuzzy triple is expressed in the form $(s, p, o)[n]$, where $(s, p, o)$ is a triple and $n$, written between brackets, represents its fuzzy degree. We adopt the fuzzy-logic notion of the degree of uncertainty in OWL axioms formalized in [2]. For example, the inference rule for owl:SymmetricProperty supports fuzzy annotations as follows:

\[ (p, \mathrm{type}, \mathrm{SymmetricProperty})[n],\ (v, p, w)[m] \;\Rightarrow\; (w, p, v)[n \otimes m] \]

where $n$ and $m$ are the fuzzy values of the triples and $\otimes$ denotes the minimum combination function, a triangular norm in fuzzy logic that corresponds to the logical AND operation. The entire set of fuzzy RDF pd* rules, as described in [2], is listed in Table I.

TABLE I.
FUZZY PD* REASONING RULES

Fuzzy IF-THEN rules (premises $\Rightarrow$ conclusion):

f-rdfp1: $(p, \mathrm{type}, \mathrm{FunctionalProperty})[n],\ (u, p, v)[m],\ (u, p, w)[l] \Rightarrow (v, \mathrm{sameAs}, w)[n \otimes m \otimes l]$
f-rdfp2: $(p, \mathrm{type}, \mathrm{InverseFunctionalProperty})[n],\ (u, p, w)[m],\ (v, p, w)[l] \Rightarrow (u, \mathrm{sameAs}, v)[n \otimes m \otimes l]$
f-rdfp3: $(p, \mathrm{type}, \mathrm{SymmetricProperty})[n],\ (v, p, w)[m] \Rightarrow (w, p, v)[n \otimes m]$
f-rdfp4: $(p, \mathrm{type}, \mathrm{TransitiveProperty})[n],\ (u, p, v)[m],\ (v, p, w)[l] \Rightarrow (u, p, w)[n \otimes m \otimes l]$
f-rdfp5: $(v, \mathrm{sameAs}, w)[n] \Rightarrow (w, \mathrm{sameAs}, v)[n]$
f-rdfp6: $(u, \mathrm{sameAs}, v)[n],\ (v, \mathrm{sameAs}, w)[m] \Rightarrow (u, \mathrm{sameAs}, w)[n \otimes m]$
f-rdfp7ab: $(p, \mathrm{inverseOf}, q)[n],\ (v, p, w)[m] \Rightarrow (w, q, v)[n \otimes m]$
f-rdfp11: $(u, p, v)[n],\ (u, \mathrm{sameAs}, u')[m],\ (v, \mathrm{sameAs}, v')[l] \Rightarrow (u', p, v')[n \otimes m \otimes l]$
f-rdfp12a: $(v, \mathrm{equivalentClass}, w)[n] \Rightarrow (v, \mathrm{subClassOf}, w)[n]$
f-rdfp12b: $(v, \mathrm{subClassOf}, w)[n],\ (w, \mathrm{subClassOf}, v)[m] \Rightarrow (v, \mathrm{equivalentClass}, w)[n \otimes m]$
f-rdfp13a: $(v, \mathrm{equivalentProperty}, w)[n] \Rightarrow (v, \mathrm{subPropertyOf}, w)[n]$
f-rdfp13b: $(v, \mathrm{subPropertyOf}, w)[n],\ (w, \mathrm{subPropertyOf}, v)[m] \Rightarrow (v, \mathrm{equivalentProperty}, w)[n \otimes m]$
f-rdfp14a: $(v, \mathrm{hasValue}, w)[n],\ (v, \mathrm{onProperty}, p)[m],\ (u, p, w)[l] \Rightarrow (u, \mathrm{type}, v)[n \otimes m \otimes l]$
f-rdfp14b: $(v, \mathrm{hasValue}, w)[n],\ (v, \mathrm{onProperty}, p)[m],\ (u, \mathrm{type}, v)[l] \Rightarrow (u, p, w)[n \otimes m \otimes l]$
f-rdfp15: $(v, \mathrm{someValuesFrom}, w)[n],\ (v, \mathrm{onProperty}, p)[m],\ (u, p, x)[l],\ (x, \mathrm{type}, w)[k] \Rightarrow (u, \mathrm{type}, v)[n \otimes m \otimes l \otimes k]$
f-rdfp16: $(v, \mathrm{allValuesFrom}, w)[n],\ (v, \mathrm{onProperty}, p)[m],\ (u, \mathrm{type}, v)[l],\ (u, p, x)[k] \Rightarrow (x, \mathrm{type}, w)[n \otimes m \otimes l \otimes k]$

B. Resilient Distributed Dataset

Resilient Distributed Datasets (RDDs), developed for the Spark framework, are fault-tolerant, parallel data sets that are distributed across multiple nodes [6]. They support parallel transformations such as map, filter, mapPartitions, and join, and Spark additionally provides broadcast variables for shipping read-only data to every node. Most of these transformations are higher-order functions that take functions as input parameters. For example, map transforms a given RDD into a new RDD by applying a user-defined function to each RDD element. For more specific information, we refer the reader to [6].

C.
Rule Dependency Graph

To leverage distributed computing, we devise a rule dependency strategy extended from [3] and [7]. Our approach differs from theirs in its iterative loop, which considers only newly inferred data on each iteration. Rules are categorized into four groups: schema, instance, sameAs, and type. Schema rules involve RDF Schema semantics [2], and instance rules are f-rdfp13, f-rdfp4, f-rdfp8ab, and f-rdfp3. Type rules are the RDF Schema domain and range rules and the OWL reasoning rules rdfp15, rdfp16, rdfp14a, and rdfp14b. SameAs rules are f-rdfp1, f-rdfp2, f-rdfp5, and f-rdfp6.

[Fig. 1. Rule dependency graph. Node labels: schema rules, SPO, instance rules, f-rdfp5, type rules, sameAs rules.]

Moreover, in pd* semantics, different rules may generate the same triples. For example, both f-rdfp1 and f-rdfp2 can assert the same triple with different fuzzy values. Deriving a triple from multiple sources of evidence increases the confidence in its truth. Fig. 2 illustrates this process as a trustworthiness scheme in which s denotes an input triple and c denotes a conclusion derived from a specific rule. Accordingly, distributed fuzzy reasoning consists of two steps: the first executes rules and derives new triples, and the second eliminates duplicates with the same conclusion by combining their fuzzy values (i.e., duplicates are merged with a fuzzy-logic s-norm, which corresponds to the logical OR operation). Each step is addressed in the following section.

[Fig. 2. Applying fuzzy pd* reasoning. Input triples s1 to s4; rules f-rdfp1 and f-rdfp2; conclusions c1 and c2.]

IV. METHODOLOGY

The purpose of our methods is to reduce the reasoning cost in a distributed setting while incrementally inferring new triples. In general, it is inefficient to re-compute a large set of inferred data whenever a new set of data arrives. To remove this bottleneck, we introduce an incremental reasoning approach that receives
new data and performs reasoning over them without re-inferring previous data. Fig. 3 depicts how the Spark workflow progresses for ontological reasoning. Each reasoning task consists of two steps: the first retrieves the necessary triples from the triple store and executes a rule, and the second computes fuzzy values for the duplicated triples.

[Fig. 3. Reasoning flow in Spark.]

A. Distributed OWL Reasoning

OWL reasoning rules that require a single join, such as f-rdfp3, are implemented as in Algorithm 1. The difference between our approach and [3,8] is that we broadcast annotated triples to the local machines over the network and use the mapPartitions transformation to perform the single-join operations. Consider a simple domain reasoning rule:

\[ (p, \mathrm{rdfs{:}domain}, c)[n],\ (v, p, w)[m] \;\Rightarrow\; (v, \mathrm{rdf{:}type}, c)[n \otimes m] \]

TABLE II. REASONING DOMAIN KNOWLEDGE

Algorithm 1: RDFS Domain Axiom Reasoning
function REASON-DOMAIN(T, B)
    inputs: T, an RDD of triples
            B, broadcasted domain relations
    for each partition T_part ∈ T in parallel do
        D ← GET-LOCAL-DATA(B)    /* key-value dictionary */
        for each t ∈ T_part do
            if D.contains(t.pred) then
                c, n_2 ← D.getByKey(t.pred)
                yield (t.subj, rdf:type, c, t.n_1 ⊗ n_2)

a. The parallel loop is implemented by the mapPartitions transformation in Spark.

In Algorithm 2, we demonstrate how to perform the multi-join operation for OWL reasoning rules such as f-rdfp16 and f-rdfp15. The idea behind our approach is that the schema relations A and O are relatively small, so joining them first greatly reduces the computation cost of the overall reasoning. After joining these relations, we select the instance triples associated with $A \bowtie O$ using the broadcast transformation. Then, after the type and instance triples are filtered using $A \bowtie O$, both the instance relation T and the type relation Y are reduced in size, so that joining them has low selectivity and causes less of a communication bottleneck on the cluster.

TABLE III. ALLVALUES REASONING

Algorithm 2: AllValuesFrom Axiom Reasoning (f-rdfp16)
function REASON-ALLVALUES(T, A)
    inputs: T, an RDD of instance triples
            Y, an RDD of type relations
            O, an RDD of onProperty relations
            A, an RDD of allValuesFrom relations
    J ← Π_{v, w, p, n_1 ⊗ n_2} ( A(v, w, n_1) ⋈ O(v, p, n_2) )
    B ← BROADCAST(J)
    T_s ← GET-ALLVALUES-FROM-SPO(T, B)
    Q_s ← GET-ALLVALUES-FROM-TYPE(Y, B)
    T ← Π_{u, p, v, n_4 ⊗ n_5} ( T_s(u, x, n_4) ⋈ Q_s(u, v, p, n_5) )

b. The GET-ALLVALUES-FROM-SPO and GET-ALLVALUES-FROM-TYPE functions filter the relations T and Y using the broadcasted values B and follow the same approach as Algorithm 1.
c. The ⋈ symbol denotes the join transformation in Spark.
d. The Π symbol denotes the map transformation in Spark.

B. Incremental Reasoning

We extended the existing approaches [3,7,8] to handle incremental reasoning. To illustrate our approach, we take the following example:

\[ P(X, Y),\ Q(X, Z) \;\Rightarrow\; R(X, Y) \]

where P and Q together conclude the relation R. Suppose that, after R has been inferred, a new set $\Delta Q$ is added. It is unnecessary to re-derive the existing R relations: using only P and $\Delta Q$, we can derive $\Delta R$ and compute $R \cup \Delta R$. Based on this principle, we developed our incremental algorithm for OWL reasoning, shown in Algorithm 3. It derives new inferences from the $\Delta T$ relations without re-computing the previous inferences T, and it iterates until there is no new triple to derive.

TABLE IV. MAIN ALGORITHM FOR OWL REASONING

Algorithm 3: OWL Reasoning algorithm
    inputs: T, a set of instance triples
            Q, a set of type triples
            S, a set of schema triples
    T ← SCHEMA-REASONING(ΔT, S) ∪ T
    Q_1 ← Q
    while Q_1 is not Ø do
        T_1 ← T
        while T_1 is not Ø do
            T_1 ← INSTANCE-REASONING(S, Q_1, T_1)
            T ← T ∪ T_1
        Q_2 ← Q_1
        while Q_2 is not Ø do
            Q_2 ← TYPE-REASONING(ΔQ_2, T)
            Q_1 ← Q_1 ∪ Q_2
    SAMEAS-REASONING(ΔT)

C.
Handling Fuzzy Values and Network Shuffling

During reasoning, fuzzy values are handled as shown in the above algorithms, but after reasoning, different rules may conclude the same triple with different truth values, as described in Fig. 2. To group identical triples, we can use the reduceByKey transformation, but it requires the RDD to be fully shuffled through the network. On the other hand, when RDDs are pre-partitioned, the values on a single machine are computed locally and only the finalized values are sent from the workers to the driver [7]. To apply this principle to our approach, we partition the RDD with a hash partitioner, preserve that partitioning, and reduce identical triples locally to compute their truth values. This requires all RDDs to be partitioned by the same partitioner in advance, which can be accomplished by partitioning all input triples before the reasoning process commences. Also, as defined in [7], a join transformation called on two RDDs that are pre-partitioned with the same partitioner and cached on the same machine is computed locally, with no shuffling across the network. We apply this approach in our multiple-join algorithm to avoid the high communication cost.

V. EXPERIMENT

To evaluate the efficiency of the proposed reasoner, we conducted OWL reasoning over LUBM [9]. The LUBM dataset consists of a university-domain ontology and is widely used to evaluate ontology reasoning systems. For the experiments, we set up a Hadoop multi-node cluster with five worker nodes and one master node. Each compute node is equipped with a 2.4 GHz CPU with 24 cores and 32 GB of main memory. To evaluate fuzzy reasoning, we assign arbitrary fuzzy values to LUBM instance triples. We show the throughput of the reasoner in Fig. 4 and the scalability of incremental reasoning in Fig. 5. When we scale the LUBM datasets up to 3000 universities, the throughput remains quite stable. For comparison, we run WebPIE on the same cluster. With annotated triples, our reasoner is slower than with standard (non-fuzzy) reasoning, but overall it is much faster than WebPIE.
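To make the handling of fuzzy values on annotated triples concrete, the following is a minimal sketch in plain Python rather than Spark. It applies rule f-rdfp3 with the minimum as the combination function and then merges duplicate conclusions inside hash partitions, mirroring the local reduction described in Section IV. The function names, the toy triples, and the choice of max as the s-norm are assumptions made for illustration; they are not taken from the implementation evaluated here.

```python
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]
Annotated = Tuple[Triple, float]


def t_norm(*degrees: float) -> float:
    """Combine premise degrees (logical AND): the minimum, as in the text."""
    return min(degrees)


def s_norm(a: float, b: float) -> float:
    """Combine degrees of duplicate conclusions (logical OR): max is an assumption."""
    return max(a, b)


def apply_symmetric(symmetric: Dict[str, float],
                    triples: Iterable[Annotated]) -> List[Annotated]:
    """f-rdfp3: (p, type, SymmetricProperty)[n], (v, p, w)[m] => (w, p, v)[n (x) m]."""
    return [((w, p, v), t_norm(symmetric[p], m))
            for (v, p, w), m in triples if p in symmetric]


def hash_partition(annotated: Iterable[Annotated],
                   num_parts: int) -> List[List[Annotated]]:
    """Send equal triples to the same partition, so merging can stay local."""
    parts: List[List[Annotated]] = [[] for _ in range(num_parts)]
    for triple, degree in annotated:
        parts[hash(triple) % num_parts].append((triple, degree))
    return parts


def merge_locally(part: Iterable[Annotated]) -> Dict[Triple, float]:
    """Collapse duplicates inside one partition with the s-norm."""
    merged: Dict[Triple, float] = {}
    for triple, degree in part:
        merged[triple] = s_norm(merged[triple], degree) if triple in merged else degree
    return merged


# Hypothetical toy data: one symmetric property and two annotated facts.
symmetric_props = {"friendOf": 0.9}
facts = [(("alice", "friendOf", "bob"), 0.8),
         (("bob", "friendOf", "alice"), 0.6)]

derived = apply_symmetric(symmetric_props, facts)

closure: Dict[Triple, float] = {}
for part in hash_partition(facts + derived, num_parts=4):
    closure.update(merge_locally(part))  # partitions hold disjoint triples

print(closure)
```

Because equal triples always hash to the same partition, each partition can be reduced independently and the partial results never overlap. In Spark, the same idea would roughly correspond to calling reduceByKey on RDDs that were first partitioned with the same partitioner via partitionBy.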
[Fig. 4. Fuzzy reasoning comparison with WebPIE using LUBM datasets. Axes: reasoning time (min) versus number of universities; series: WebPIE, our approach.]

In Fig. 5, we show the reasoning comparison with Cichlid, a Spark-based reasoner [8]. As we push 1k instance triples to the triple store, our reasoner performs OWL reasoning in relatively few minutes, while Cichlid's runtime increases to about twice its current runtime. In addition, this experiment was conducted without computing fuzzy values.

[Fig. 5. OWL reasoning comparison with Cichlid. Axes: reasoning time (min) versus number of universities; series: Cichlid, update operation, our approach.]

e. The update operation adds 1k instance triples to the triple store using our approach.

VI. CONCLUSION

In this paper, we presented a scalable, incremental, and fuzzy OWL pd* semantic reasoning approach for large-scale ontologies with uncertainty annotations. We presented methods to calculate fuzzy values and incremental reasoning approaches that avoid re-deriving previously inferred data. To evaluate the efficiency of the proposed reasoner, we conducted OWL reasoning over LUBM3000 and achieved about three times higher throughput than WebPIE, which employs MapReduce.

ACKNOWLEDGMENT

This work was also supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIP) (NRF-2016R1A2B ), Republic of Korea.

REFERENCES

[1] K. Ahmad and L. Gillam, "Automatic ontology extraction from unstructured texts," in Proc. of the 2005 OTM Confederated International Conference on the Move to Meaningful Internet Systems: CoopIS, COA, and ODBASE.
[2] C. Liu, G. Qi, H. Wang, and Y. Yu, "Fuzzy Reasoning over RDF Data Using OWL Vocabulary," Web Intelligence and Intelligent Agent Technology (WI-IAT), 2011 IEEE/WIC/ACM International Conference on, Lyon, 2011, pp.
[3] C. Liu, G. Qi, H. Wang, and Y. Yu, "Large Scale Fuzzy pd* Reasoning using MapReduce," in Proc. of the Semantic Web, ISWC 2011, vol. 7031, pp.
[4] J. Urbani, S. Kotoulas, E. Oren, and F.V. Harmelen, "Scalable Distributed Reasoning using MapReduce," in Proc. of the Semantic Web, ISWC 09, vol. 5823, pp.
[5] J. Urbani, A. Margara, F.V. Harmelen, and H. Bal, "DynamiTE: Parallel Materialization of Dynamic RDF Data," in Proc. of the Semantic Web, ISWC 2013, vol. 8218, pp.
[6] M. Zaharia, M. Chowdhury, T. Das, A. Dave, J. Ma, M. McCauley, M. J. Franklin, S. Shenker, and I. Stoica, "Resilient distributed datasets: a fault-tolerant abstraction for in-memory cluster computing," in Proc. of the 9th USENIX Conference on Networked Systems Design and Implementation, pp. 2-2.
[7] K. Jemin and P. Young-Tack, "Scalable OWL-Horst ontology reasoning using Spark," in Proc. of BigComp 2015, pp.
[8] R. Gu, S. Wang, F. Wang, C. Yuan, and Y. Huang, "Cichlid: Efficient Large Scale RDFS/OWL Reasoning with Spark," Parallel and Distributed Processing Symposium (IPDPS), 2015 IEEE International, Hyderabad, 2015, pp.
[9] Y. Guo, Z. Pan, and J. Heflin, "LUBM: A Benchmark for OWL Knowledge Base Systems," Journal of Web Semantics 3(2), pp.
[10] J. Urbani, S. Kotoulas, J. Maassen, F.V. Harmelen, and H. Bal, "OWL Reasoning with WebPIE: Calculating the Closure of 100 Billion Triples," in Proc. of the 7th European Semantic Web Conference, vol. 6088, pp.


More information

Performance Evaluation of Large Table Association Problem Implemented in Apache Spark on Cluster with Angara Interconnect

Performance Evaluation of Large Table Association Problem Implemented in Apache Spark on Cluster with Angara Interconnect Performance Evaluation of Large Table Association Problem Implemented in Apache Spark on Cluster with Angara Interconnect Alexander Agarkov and Alexander Semenov JSC NICEVT, Moscow, Russia {a.agarkov,semenov}@nicevt.ru

More information

Accelerating Spark RDD Operations with Local and Remote GPU Devices

Accelerating Spark RDD Operations with Local and Remote GPU Devices Accelerating Spark RDD Operations with Local and Remote GPU Devices Yasuhiro Ohno, Shin Morishima, and Hiroki Matsutani Dept.ofICS,KeioUniversity, 3-14-1 Hiyoshi, Kohoku, Yokohama, Japan 223-8522 Email:

More information

Keywords Data alignment, Data annotation, Web database, Search Result Record

Keywords Data alignment, Data annotation, Web database, Search Result Record Volume 5, Issue 8, August 2015 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Annotating Web

More information

PROFILING BASED REDUCE MEMORY PROVISIONING FOR IMPROVING THE PERFORMANCE IN HADOOP

PROFILING BASED REDUCE MEMORY PROVISIONING FOR IMPROVING THE PERFORMANCE IN HADOOP ISSN: 0976-2876 (Print) ISSN: 2250-0138 (Online) PROFILING BASED REDUCE MEMORY PROVISIONING FOR IMPROVING THE PERFORMANCE IN HADOOP T. S. NISHA a1 AND K. SATYANARAYAN REDDY b a Department of CSE, Cambridge

More information

Today: RDF syntax. + conjunctive queries for OWL. KR4SW Winter 2010 Pascal Hitzler 3

Today: RDF syntax. + conjunctive queries for OWL. KR4SW Winter 2010 Pascal Hitzler 3 Today: RDF syntax + conjunctive queries for OWL KR4SW Winter 2010 Pascal Hitzler 3 Today s Session: RDF Schema 1. Motivation 2. Classes and Class Hierarchies 3. Properties and Property Hierarchies 4. Property

More information

SparkBench: A Comprehensive Spark Benchmarking Suite Characterizing In-memory Data Analytics

SparkBench: A Comprehensive Spark Benchmarking Suite Characterizing In-memory Data Analytics SparkBench: A Comprehensive Spark Benchmarking Suite Characterizing In-memory Data Analytics Min LI,, Jian Tan, Yandong Wang, Li Zhang, Valentina Salapura, Alan Bivens IBM TJ Watson Research Center * A

More information

Benchmarking Reasoners for Multi-Ontology Applications

Benchmarking Reasoners for Multi-Ontology Applications Benchmarking Reasoners for Multi-Ontology Applications Ameet N Chitnis, Abir Qasem and Jeff Heflin Lehigh University, 19 Memorial Drive West, Bethlehem, PA 18015 {anc306, abq2, heflin}@cse.lehigh.edu Abstract.

More information

Distributed Implementation of BG Benchmark Validation Phase Dimitrios Stripelis, Sachin Raja

Distributed Implementation of BG Benchmark Validation Phase Dimitrios Stripelis, Sachin Raja Distributed Implementation of BG Benchmark Validation Phase Dimitrios Stripelis, Sachin Raja {stripeli,raja}@usc.edu 1. BG BENCHMARK OVERVIEW BG is a state full benchmark used to evaluate the performance

More information

Review of Fuzzy Logical Database Models

Review of Fuzzy Logical Database Models IOSR Journal of Computer Engineering (IOSRJCE) ISSN: 2278-0661, ISBN: 2278-8727Volume 8, Issue 4 (Jan. - Feb. 2013), PP 24-30 Review of Fuzzy Logical Database Models Anupriya 1, Prof. Rahul Rishi 2 1 (Department

More information

38050 Povo Trento (Italy), Via Sommarive 14 IWTRUST: IMPROVING USER TRUST IN ANSWERS FROM THE WEB

38050 Povo Trento (Italy), Via Sommarive 14   IWTRUST: IMPROVING USER TRUST IN ANSWERS FROM THE WEB UNIVERSITY OF TRENTO DEPARTMENT OF INFORMATION AND COMMUNICATION TECHNOLOGY 38050 Povo Trento (Italy), Via Sommarive 14 http://www.dit.unitn.it IWTRUST: IMPROVING USER TRUST IN ANSWERS FROM THE WEB Ilya

More information

Clash of the Titans: MapReduce vs. Spark for Large Scale Data Analytics

Clash of the Titans: MapReduce vs. Spark for Large Scale Data Analytics Clash of the Titans: MapReduce vs. Spark for Large Scale Data Analytics Presented by: Dishant Mittal Authors: Juwei Shi, Yunjie Qiu, Umar Firooq Minhas, Lemei Jiao, Chen Wang, Berthold Reinwald and Fatma

More information

Evolution From Shark To Spark SQL:

Evolution From Shark To Spark SQL: Evolution From Shark To Spark SQL: Preliminary Analysis and Qualitative Evaluation Xinhui Tian and Xiexuan Zhou Institute of Computing Technology, Chinese Academy of Sciences and University of Chinese

More information

Forward Chaining Reasoning Tool for Rya

Forward Chaining Reasoning Tool for Rya Forward Chaining Reasoning Tool for Rya Rya Working Group, 6/29/2016 Forward Chaining Reasoning Tool for Rya 6/29/2016 1 / 11 OWL Reasoning OWL (the Web Ontology Language) facilitates rich ontology definition

More information

CS294 Big Data System Course Project Report Gemini: Boosting Spark Performance with GPU Accelerators

CS294 Big Data System Course Project Report Gemini: Boosting Spark Performance with GPU Accelerators Gemini: Boosting Spark Performance with GPU Accelerators Guanhua Wang Zhiyuan Lin Ion Stoica AMPLab EECS AMPLab UC Berkeley UC Berkeley UC Berkeley Abstract Compared with MapReduce, Apache Spark is more

More information

Ontology Merging: on the confluence between theoretical and pragmatic approaches

Ontology Merging: on the confluence between theoretical and pragmatic approaches Ontology Merging: on the confluence between theoretical and pragmatic approaches Raphael Cóbe, Renata Wassermann, Fabio Kon 1 Department of Computer Science University of São Paulo (IME-USP) {rmcobe,renata,fabio.kon}@ime.usp.br

More information

OSDBQ: Ontology Supported RDBMS Querying

OSDBQ: Ontology Supported RDBMS Querying OSDBQ: Ontology Supported RDBMS Querying Cihan Aksoy 1, Erdem Alparslan 1, Selçuk Bozdağ 2, İhsan Çulhacı 3, 1 The Scientific and Technological Research Council of Turkey, Gebze/Kocaeli, Turkey 2 Komtaş

More information

Falcon-AO: Aligning Ontologies with Falcon

Falcon-AO: Aligning Ontologies with Falcon Falcon-AO: Aligning Ontologies with Falcon Ningsheng Jian, Wei Hu, Gong Cheng, Yuzhong Qu Department of Computer Science and Engineering Southeast University Nanjing 210096, P. R. China {nsjian, whu, gcheng,

More information

A Framework for Performance Study of Semantic Databases

A Framework for Performance Study of Semantic Databases A Framework for Performance Study of Semantic Databases Xianwei Shen 1 and Vincent Huang 2 1 School of Information and Communication Technology, KTH- Royal Institute of Technology, Kista, Sweden 2 Services

More information

On Web-scale Reasoning

On Web-scale Reasoning On Web-scale Reasoning Jacopo Urbani ii Copyright 2013 by Jacopo Urbani VRIJE UNIVERSITEIT On Web-scale Reasoning ACADEMISCH PROEFSCHRIFT ter verkrijging van de graad Doctor aan de Vrije Universiteit Amsterdam,

More information

Oracle Spatial and Graph: Benchmarking a Trillion Edges RDF Graph ORACLE WHITE PAPER NOVEMBER 2016

Oracle Spatial and Graph: Benchmarking a Trillion Edges RDF Graph ORACLE WHITE PAPER NOVEMBER 2016 Oracle Spatial and Graph: Benchmarking a Trillion Edges RDF Graph ORACLE WHITE PAPER NOVEMBER 2016 Introduction One trillion is a really big number. What could you store with one trillion facts?» 1000

More information

Analysis of Extended Performance for clustering of Satellite Images Using Bigdata Platform Spark

Analysis of Extended Performance for clustering of Satellite Images Using Bigdata Platform Spark Analysis of Extended Performance for clustering of Satellite Images Using Bigdata Platform Spark PL.Marichamy 1, M.Phil Research Scholar, Department of Computer Application, Alagappa University, Karaikudi,

More information

An Initial Investigation into Querying an Untrustworthy and Inconsistent Web

An Initial Investigation into Querying an Untrustworthy and Inconsistent Web An Initial Investigation into Querying an Untrustworthy and Inconsistent Web Yuanbo Guo and Jeff Heflin Dept. of Computer Science and Engineering, Lehigh University, Bethlehem, PA18015, USA {yug2, heflin}@cse.lehigh.edu

More information

Integration of Machine Learning Library in Apache Apex

Integration of Machine Learning Library in Apache Apex Integration of Machine Learning Library in Apache Apex Anurag Wagh, Krushika Tapedia, Harsh Pathak Vishwakarma Institute of Information Technology, Pune, India Abstract- Machine Learning is a type of artificial

More information

Automated and Massive-scale CCNx Experiments with Software-Defined SmartX Boxes

Automated and Massive-scale CCNx Experiments with Software-Defined SmartX Boxes Network Research Workshop Proceedings of the Asia-Pacific Advanced Network 2014 v. 38, p. 29-33. http://dx.doi.org/10.7125/apan.38.5 ISSN 2227-3026 Automated and Massive-scale CCNx Experiments with Software-Defined

More information

High Utility Web Access Patterns Mining from Distributed Databases

High Utility Web Access Patterns Mining from Distributed Databases High Utility Web Access Patterns Mining from Distributed Databases Md.Azam Hosssain 1, Md.Mamunur Rashid 1, Byeong-Soo Jeong 1, Ho-Jin Choi 2 1 Database Lab, Department of Computer Engineering, Kyung Hee

More information

Partitioning OWL Knowledge Bases for Parallel Reasoning

Partitioning OWL Knowledge Bases for Parallel Reasoning Partitioning OWL Knowledge Bases for Parallel Reasoning Sambhawa Priya, Yuanbo Guo, Michael Spear and Jeff Heflin Department of Computer Science and Engineering, Lehigh University 19 Memorial Drive West,

More information

Developing MapReduce Programs

Developing MapReduce Programs Cloud Computing Developing MapReduce Programs Dell Zhang Birkbeck, University of London 2017/18 MapReduce Algorithm Design MapReduce: Recap Programmers must specify two functions: map (k, v) * Takes

More information

DAta warehouses and the Web comprise enormous

DAta warehouses and the Web comprise enormous 1 Efficient Skew Handling for Outer Joins in a Cloud Computing Environment Long Cheng and Spyros Kotoulas Abstract Outer joins are ubiquitous in many workloads and Big Data systems. The question of how

More information

Adaptive Control of Apache Spark s Data Caching Mechanism Based on Workload Characteristics

Adaptive Control of Apache Spark s Data Caching Mechanism Based on Workload Characteristics Adaptive Control of Apache Spark s Data Caching Mechanism Based on Workload Characteristics Hideo Inagaki, Tomoyuki Fujii, Ryota Kawashima and Hiroshi Matsuo Nagoya Institute of Technology,in Nagoya, Aichi,

More information

Apache Spark is a fast and general-purpose engine for large-scale data processing Spark aims at achieving the following goals in the Big data context

Apache Spark is a fast and general-purpose engine for large-scale data processing Spark aims at achieving the following goals in the Big data context 1 Apache Spark is a fast and general-purpose engine for large-scale data processing Spark aims at achieving the following goals in the Big data context Generality: diverse workloads, operators, job sizes

More information

SQL-to-MapReduce Translation for Efficient OLAP Query Processing

SQL-to-MapReduce Translation for Efficient OLAP Query Processing , pp.61-70 http://dx.doi.org/10.14257/ijdta.2017.10.6.05 SQL-to-MapReduce Translation for Efficient OLAP Query Processing with MapReduce Hyeon Gyu Kim Department of Computer Engineering, Sahmyook University,

More information

Shark. Hive on Spark. Cliff Engle, Antonio Lupher, Reynold Xin, Matei Zaharia, Michael Franklin, Ion Stoica, Scott Shenker

Shark. Hive on Spark. Cliff Engle, Antonio Lupher, Reynold Xin, Matei Zaharia, Michael Franklin, Ion Stoica, Scott Shenker Shark Hive on Spark Cliff Engle, Antonio Lupher, Reynold Xin, Matei Zaharia, Michael Franklin, Ion Stoica, Scott Shenker Agenda Intro to Spark Apache Hive Shark Shark s Improvements over Hive Demo Alpha

More information

Lecture 11 Hadoop & Spark

Lecture 11 Hadoop & Spark Lecture 11 Hadoop & Spark Dr. Wilson Rivera ICOM 6025: High Performance Computing Electrical and Computer Engineering Department University of Puerto Rico Outline Distributed File Systems Hadoop Ecosystem

More information

Embedded Technosolutions

Embedded Technosolutions Hadoop Big Data An Important technology in IT Sector Hadoop - Big Data Oerie 90% of the worlds data was generated in the last few years. Due to the advent of new technologies, devices, and communication

More information

Joint Entity Resolution

Joint Entity Resolution Joint Entity Resolution Steven Euijong Whang, Hector Garcia-Molina Computer Science Department, Stanford University 353 Serra Mall, Stanford, CA 94305, USA {swhang, hector}@cs.stanford.edu No Institute

More information

Information Retrieval System Based on Context-aware in Internet of Things. Ma Junhong 1, a *

Information Retrieval System Based on Context-aware in Internet of Things. Ma Junhong 1, a * Information Retrieval System Based on Context-aware in Internet of Things Ma Junhong 1, a * 1 Xi an International University, Shaanxi, China, 710000 a sufeiya913@qq.com Keywords: Context-aware computing,

More information

GPU-Accelerated Apriori Algorithm

GPU-Accelerated Apriori Algorithm GPU-Accelerated Apriori Algorithm Hao JIANG a, Chen-Wei XU b, Zhi-Yong LIU c, and Li-Yan YU d School of Computer Science and Engineering, Southeast University, Nanjing, China a hjiang@seu.edu.cn, b wei1517@126.com,

More information

D1.5.1 First Report on Models for Distributed Computing

D1.5.1 First Report on Models for Distributed Computing http://latc-project.eu D1.5.1 First Report on Models for Distributed Computing Project GA No. FP7-256975 Project acronym LATC Start date of project 2010-09-01 Document due date 2011-08-31 Actual date of

More information

A Spark Scheduling Strategy for Heterogeneous Cluster

A Spark Scheduling Strategy for Heterogeneous Cluster Copyright 2018 Tech Science Press CMC, vol.55, no.3, pp.405-417, 2018 A Spark Scheduling Strategy for Heterogeneous Cluster Xuewen Zhang 1, Zhonghao Li 1, Gongshen Liu 1, *, Jiajun Xu 1, Tiankai Xie 2

More information

Semantic Web. Ontology Pattern. Gerd Gröner, Matthias Thimm. Institute for Web Science and Technologies (WeST) University of Koblenz-Landau

Semantic Web. Ontology Pattern. Gerd Gröner, Matthias Thimm. Institute for Web Science and Technologies (WeST) University of Koblenz-Landau Semantic Web Ontology Pattern Gerd Gröner, Matthias Thimm {groener,thimm}@uni-koblenz.de Institute for Web Science and Technologies (WeST) University of Koblenz-Landau July 18, 2013 Gerd Gröner, Matthias

More information

Parallelizing Structural Joins to Process Queries over Big XML Data Using MapReduce

Parallelizing Structural Joins to Process Queries over Big XML Data Using MapReduce Parallelizing Structural Joins to Process Queries over Big XML Data Using MapReduce Huayu Wu Institute for Infocomm Research, A*STAR, Singapore huwu@i2r.a-star.edu.sg Abstract. Processing XML queries over

More information

CS435 Introduction to Big Data FALL 2018 Colorado State University. 10/22/2018 Week 10-A Sangmi Lee Pallickara. FAQs.

CS435 Introduction to Big Data FALL 2018 Colorado State University. 10/22/2018 Week 10-A Sangmi Lee Pallickara. FAQs. 10/22/2018 - FALL 2018 W10.A.0.0 10/22/2018 - FALL 2018 W10.A.1 FAQs Term project: Proposal 5:00PM October 23, 2018 PART 1. LARGE SCALE DATA ANALYTICS IN-MEMORY CLUSTER COMPUTING Computer Science, Colorado

More information

Cloud Computing & Visualization

Cloud Computing & Visualization Cloud Computing & Visualization Workflows Distributed Computation with Spark Data Warehousing with Redshift Visualization with Tableau #FIUSCIS School of Computing & Information Sciences, Florida International

More information

Semantic Web. Tahani Aljehani

Semantic Web. Tahani Aljehani Semantic Web Tahani Aljehani Motivation: Example 1 You are interested in SOAP Web architecture Use your favorite search engine to find the articles about SOAP Keywords-based search You'll get lots of information,

More information

Accelerate Big Data Insights

Accelerate Big Data Insights Accelerate Big Data Insights Executive Summary An abundance of information isn t always helpful when time is of the essence. In the world of big data, the ability to accelerate time-to-insight can not

More information

Shark: SQL and Rich Analytics at Scale. Michael Xueyuan Han Ronny Hajoon Ko

Shark: SQL and Rich Analytics at Scale. Michael Xueyuan Han Ronny Hajoon Ko Shark: SQL and Rich Analytics at Scale Michael Xueyuan Han Ronny Hajoon Ko What Are The Problems? Data volumes are expanding dramatically Why Is It Hard? Needs to scale out Managing hundreds of machines

More information

New Developments in Spark

New Developments in Spark New Developments in Spark And Rethinking APIs for Big Data Matei Zaharia and many others What is Spark? Unified computing engine for big data apps > Batch, streaming and interactive Collection of high-level

More information

CAVA: Exploring Memory Locality for Big Data Analytics in Virtualized Clusters

CAVA: Exploring Memory Locality for Big Data Analytics in Virtualized Clusters 2018 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing : Exploring Memory Locality for Big Data Analytics in Virtualized Clusters Eunji Hwang, Hyungoo Kim, Beomseok Nam and Young-ri

More information

DBpedia-An Advancement Towards Content Extraction From Wikipedia

DBpedia-An Advancement Towards Content Extraction From Wikipedia DBpedia-An Advancement Towards Content Extraction From Wikipedia Neha Jain Government Degree College R.S Pura, Jammu, J&K Abstract: DBpedia is the research product of the efforts made towards extracting

More information

Research Article Mobile Storage and Search Engine of Information Oriented to Food Cloud

Research Article Mobile Storage and Search Engine of Information Oriented to Food Cloud Advance Journal of Food Science and Technology 5(10): 1331-1336, 2013 DOI:10.19026/ajfst.5.3106 ISSN: 2042-4868; e-issn: 2042-4876 2013 Maxwell Scientific Publication Corp. Submitted: May 29, 2013 Accepted:

More information

Evaluation of RDF Archiving strategies with Spark

Evaluation of RDF Archiving strategies with Spark Evaluation of RDF Archiving strategies with Spark Meriem Laajimi 1, Afef Bahri 2, and Nadia Yacoubi Ayadi 3 1 High Institute of Management Tunis, Tunisia laajimimeriem@yahoo.fr 2 MIRACL Laboratory, University

More information

Distributed computing: index building and use

Distributed computing: index building and use Distributed computing: index building and use Distributed computing Goals Distributing computation across several machines to Do one computation faster - latency Do more computations in given time - throughput

More information

Stream Processing on IoT Devices using Calvin Framework

Stream Processing on IoT Devices using Calvin Framework Stream Processing on IoT Devices using Calvin Framework by Ameya Nayak A Project Report Submitted in Partial Fulfillment of the Requirements for the Degree of Master of Science in Computer Science Supervised

More information

Parallel Materialization of the Finite RDFS Closure for Hundreds of Millions of Triples

Parallel Materialization of the Finite RDFS Closure for Hundreds of Millions of Triples Parallel Materialization of the Finite RDFS Closure for Hundreds of Millions of Triples Jesse Weaver and James A. Hendler Tetherless World Constellation, Rensselaer Polytechnic Institute, Troy, NY, USA

More information

IBM Data Science Experience White paper. SparkR. Transforming R into a tool for big data analytics

IBM Data Science Experience White paper. SparkR. Transforming R into a tool for big data analytics IBM Data Science Experience White paper R Transforming R into a tool for big data analytics 2 R Executive summary This white paper introduces R, a package for the R statistical programming language that

More information

Indexing Strategies of MapReduce for Information Retrieval in Big Data

Indexing Strategies of MapReduce for Information Retrieval in Big Data International Journal of Advances in Computer Science and Technology (IJACST), Vol.5, No.3, Pages : 01-06 (2016) Indexing Strategies of MapReduce for Information Retrieval in Big Data Mazen Farid, Rohaya

More information

Jans Aasman, Ph.D. CEO Franz Inc Optimizing Sparql and Prolog for reasoning on large scale diverse ontologies

Jans Aasman, Ph.D. CEO Franz Inc Optimizing Sparql and Prolog for reasoning on large scale diverse ontologies Jans Aasman, Ph.D. CEO Franz Inc Ja@Franz.com Optimizing Sparql and Prolog for reasoning on large scale diverse ontologies This presentation Triples and a Graph database (2 minutes, I promise) AllegroGraph

More information

ABSTRACT I. INTRODUCTION

ABSTRACT I. INTRODUCTION International Journal of Scientific Research in Computer Science, Engineering and Information Technology 2018 IJSRCSEIT Volume 3 Issue 3 ISS: 2456-3307 Hadoop Periodic Jobs Using Data Blocks to Achieve

More information

Evaluating DBOWL: A Non-materializing OWL Reasoner based on Relational Database Technology

Evaluating DBOWL: A Non-materializing OWL Reasoner based on Relational Database Technology Evaluating DBOWL: A Non-materializing OWL Reasoner based on Relational Database Technology Maria del Mar Roldan-Garcia, Jose F. Aldana-Montes University of Malaga, Departamento de Lenguajes y Ciencias

More information