Master thesis: Automatic Extraction of Design Decision Relationships from a Task Management System

1 Master thesis: Automatic Extraction of Design Decision Relationships from a Task Management System
Matthias Ruppel, 8th of November 2017, Munich
Chair of Software Engineering for Business Information Systems (sebis), Faculty of Informatics, Technische Universität München
wwwmatthes.in.tum.de

2 Outline
I. Introduction and Motivation
II. Concepts: Architectural Design Decision, Quality Attributes
III. OSS and NFR Dataset
IV. Classification by Keywords
V. Methodology
VI. Results
VII. Conclusion and Outlook
Matthias Ruppel, 8th of November 2017, Master thesis final presentation

3 Introduction: Motivation
- Many architectural design decisions are made during development and maintenance
- Documenting them takes a lot of effort, time and cost
- Architectural design decisions are hard to capture
- Current design decisions may interfere with previous design decisions
- Decisions are taken implicitly and are not explicitly captured and documented
- The rationale, cause or concern is not evident in the documentation

4 Architectural Design Decision
Definition: A description of the choice and considered alternatives that (partially) realize one or more requirements. Alternatives consist of a set of architectural additions, subtractions and modifications to the software architecture, the rationale, and the design rules, design constraints and additional requirements.
Source: Jansen, A. G. J. (2008). Architectural design decisions. s.n.
Source: Zimmermann et al. (2009). Managing architectural decision models with dependency relations, integrity constraints, and production rules

5 Non-Functional Requirements: ISO Types of Quality
ISO classifies software quality in a structured set of characteristics and sub-characteristics. Each quality sub-characteristic is further divided into attributes.
Source: ISO/IEC :2001 Software engineering - Product quality - Part 1: Quality model.

6 OSS and NFR Dataset
NFR Dataset
- Dataset of requirements of a software project, provided by PROMISE
- Used by other scholars in text classification publications
- 40% FR and 60% NFR
- Potential issue: underrepresentation of certain quality attributes
OSS Dataset
- Apache Spark and Apache Hadoop OSS
- Publicly available Jira issues
- Complex and extensive open source frameworks provided for Scala, Java & Python
- Limited documentation
Figure: preparation of the OSS Dataset in three steps: Data Extraction, Data Curation, Manual Labeling
Labels: Usability (US), Security (SE), Scalability (SC), Portability (PO), Performance (PE), Operational (O), Maintainability (MN), Look & Feel (LF), Legal (L), Fault Tolerance (FT), Availability (A), Functional (F)

7 Classification by Keywords
Quality Attribute: Keywords
Security: Confidentiality, integrity, completeness, accuracy, perturbation, virus, access, authorization, rule, validation, audit, biometrics, card, key, password, alarm, encryption, noise
Performance: Space, time, memory, storage, response, throughput, peak, mean, index, compress, uncompress, runtime, perform, execute, dynamic, offset, reduce, fixing, early, late
Results
- The keyword lists were taken from publications that extracted keywords to predict quality attributes
- Keywords are dependent on context
- Keywords are available only for a few NFRs
- Poor performance on the OSS and NFR datasets: very low precision, e.g. 1% for Usability with keywords from Slankas et al. (2013)
- Recall on the NFR dataset is very high, mostly 92% - 100%
Figure (Keyword Classification): Design Decision -> Keyword Matching -> Quality Attribute
Source: Cleland-Huang et al. (2007). Automated classification of non-functional requirements
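
The keyword matching on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the thesis implementation: the matching rule (exact match on lower-cased word tokens), the three issue summaries and their labels are assumptions made for the example. Only the two keyword lists come from the slide.

```python
import re

# Keyword lists copied from the slide, lower-cased.
KEYWORDS = {
    "Security": {
        "confidentiality", "integrity", "completeness", "accuracy",
        "perturbation", "virus", "access", "authorization", "rule",
        "validation", "audit", "biometrics", "card", "key", "password",
        "alarm", "encryption", "noise",
    },
    "Performance": {
        "space", "time", "memory", "storage", "response", "throughput",
        "peak", "mean", "index", "compress", "uncompress", "runtime",
        "perform", "execute", "dynamic", "offset", "reduce", "fixing",
        "early", "late",
    },
}


def classify_by_keywords(text):
    """Return every quality attribute that has at least one keyword in the text.

    Assumption: a keyword matches only if it appears as a whole word.
    """
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    return {qa for qa, words in KEYWORDS.items() if tokens & words}


def precision_recall(predictions, gold, label):
    """Precision and recall of one label; predictions are sets, gold is one label each."""
    tp = fp = fn = 0
    for predicted, actual in zip(predictions, gold):
        if label in predicted and actual == label:
            tp += 1
        elif label in predicted:
            fp += 1
        elif actual == label:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical issue summaries with hypothetical manual labels.
issues = [
    ("Encrypt the password before storing it", "Security"),
    ("Reduce memory usage of the shuffle", "Performance"),
    ("Fix typo at compile time in the README", "Functional"),
]
predictions = [classify_by_keywords(text) for text, _ in issues]
gold = [label for _, label in issues]
precision, recall = precision_recall(predictions, gold, "Performance")
```

In this toy data the generic keyword "time" also fires on the README issue, so Performance gets full recall but only half precision. That is the same kind of imbalance the slide reports, at a much smaller scale.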

8 Methodology: Text Classification
Figure (training): Design Decisions labeled with a Quality Attribute -> Feature Extraction (features such as remov, authent, test) -> Machine Learning Algorithm
Figure (prediction): Documents to Categorize -> Feature Extraction (features such as cooki, token, on) -> Classifier Model -> Quality Attribute
Source: Adapted from Witten (2016). Practical Machine Learning Tools and Techniques
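
The two phases of this pipeline, training on labeled features and predicting a quality attribute for new features, can be sketched with a small multinomial Naive Bayes classifier, one of the algorithms named later in the deck. This is a minimal sketch under stated assumptions: the class name, the add-one smoothing, the training documents and the labels SE/PE assigned to them are all illustrative and not taken from the thesis. Only the stems themselves appear on the slides.

```python
import math
from collections import Counter, defaultdict


class MultinomialNaiveBayes:
    """Toy multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        # docs: list of token lists; labels: one quality attribute per doc.
        self.class_counts = Counter(labels)
        self.term_counts = defaultdict(Counter)
        for tokens, label in zip(docs, labels):
            self.term_counts[label].update(tokens)
        self.vocab = {t for counts in self.term_counts.values() for t in counts}
        return self

    def predict(self, tokens):
        n_docs = sum(self.class_counts.values())
        best_label, best_score = None, float("-inf")
        for label, n in self.class_counts.items():
            total_terms = sum(self.term_counts[label].values())
            score = math.log(n / n_docs)  # log prior
            for t in tokens:
                if t not in self.vocab:
                    continue  # unseen terms carry no evidence
                count = self.term_counts[label][t]
                score += math.log((count + 1) / (total_terms + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training data built from stems shown on the slides.
train_docs = [
    ["remov", "authent", "test"],
    ["add", "cooki", "token"],
    ["improv", "perform", "speed"],
    ["upgrad", "perform", "column"],
]
train_labels = ["SE", "SE", "PE", "PE"]

model = MultinomialNaiveBayes().fit(train_docs, train_labels)
prediction = model.predict(["cooki", "token", "on"])
```

Here `prediction` is "SE" because "cooki" and "token" occur only in the security examples, while "on" was never seen in training and is ignored.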

9 Methodology: Feature Extraction and Selection
Figure: the Feature Extraction step of the training pipeline (features such as remov, authent, test), expanded into its parts:
- Removing Digits & Punctuation Marks
- Stemming
- Removing Stop Words
- Tokenizing Text
- Feature Selection with Information Gain
Source: Own Illustration
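
The preprocessing steps and the information gain criterion named on this slide can be sketched as follows. Several things here are assumptions for illustration: the order of the steps, the tiny stop word list, and the crude suffix-stripping `stem` function, which is only a stand-in for a real stemming algorithm. The slide does not say which stemmer or stop word list was used.

```python
import math
import re
from collections import Counter

STOP_WORDS = {"the", "a", "an", "of", "to", "and", "in", "for", "is", "it"}
# Crude stand-in for a real stemmer: strip the first matching suffix.
SUFFIXES = ("ication", "ation", "ing", "ed", "es", "e", "s")


def stem(word):
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Remove digits and punctuation, tokenize, drop stop words, stem."""
    cleaned = re.sub(r"[^a-z\s]", " ", text.lower())
    tokens = cleaned.split()
    return [stem(t) for t in tokens if t not in STOP_WORDS]


def entropy(labels):
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(docs, labels, term):
    """Reduction in label entropy from knowing whether `term` occurs in a document."""
    with_term = [l for d, l in zip(docs, labels) if term in d]
    without_term = [l for d, l in zip(docs, labels) if term not in d]
    n = len(labels)
    remainder = (len(with_term) / n) * entropy(with_term) \
        + (len(without_term) / n) * entropy(without_term)
    return entropy(labels) - remainder


features = preprocess("Remove the authentication tests in 2.0!")

# Hypothetical labeled documents (already preprocessed).
docs = [["remov", "authent"], ["authent", "token"], ["perform"], ["perform", "test"]]
labels = ["SE", "SE", "PE", "PE"]
gain_authent = information_gain(docs, labels, "authent")
gain_test = information_gain(docs, labels, "test")
```

With this toy data "authent" separates the two classes perfectly and gets the maximum gain of 1 bit, while "test" occurs in only one document and scores lower. Feature selection would keep the terms with the highest gain.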

10 Methodology: Tokenization and Machine Learning Algorithms
Datasets: OSS Dataset, NFR Dataset
Machine Learning Algorithm: SVM, C4.5, Multinomial Naïve Bayes
Tokenizing Text: Bag of Words, N-gram
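
The difference between the two tokenization schemes can be shown in a short sketch. The function names and the maximum n-gram length of 3 are assumptions for illustration. The slide does not state the n-gram range that was used.

```python
from collections import Counter


def bag_of_words(tokens):
    """Count each single token; word order is discarded."""
    return Counter(tokens)


def ngrams(tokens, max_n=3):
    """All contiguous token sequences of length 1..max_n, joined by spaces."""
    grams = []
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


bow = bag_of_words(["support", "unsaferow", "support"])
grams = ngrams(["shall", "be", "avail"])
```

For the tokens "shall be avail" the n-gram scheme yields the single words plus "shall be", "be avail" and "shall be avail", so short phrases become features in their own right. Bag of words keeps only the single-word counts.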

11 Methodology: Features
OSS Dataset, Bag of Words: remov, authent, test, add, support, upgrad, configur, unsaferow, perform, column, renam, auth, token, credenti, spee, cooki, password
OSS Dataset, N-gram: remov, add, authent, test, support, upgrad, perform, improv, unsaferow in, column, renam, unsaferow, support unsaferow, in, auth, support unsaferow, token, remove it, credenti, speed, improve perform, is based on, cooki, authentication mechan, the authent, http authent, password
NFR Dataset, Bag of Words: second, onli, us, no, access, with, interfac, avail, user, than, oper, minut, time, year, compli, author, easi, more, under, 90%, hour, player, allow, server, support, after, standard, respons, let, includ, updat, 0, can, class, per, train, longer, regul, mainten, ensur, environ, successfulli, simultan, expect
NFR Dataset, N-gram: us, second, onli, user, the, shall, no, product shall b, access, with, interfac, interface with, avail, be avail, to, oper, minut, time, than, of, year, after, comply with, compli, author, updat, interface with th, easi, in und, under, allow, 90%, 90% of, shall interfac, hour, shall allow, shall be avail, be easi, to us, server, shall interface with, standard, and, the product must, product must, shall be easi, player, users shal, system shall let, 5 second, response tim, shall let, let, be available for, available for, includ, by, 0, per, respons, train, longer, regul, class, displai, mainten, ensur, environ, for us, shall ensur, only author, shall ensure that, ensure that, easy to, be easy to, longer than, successfully, available for us, seconds th, using th, simultan, no mor, no more than, expect, expected to, have access to, have access, to successfulli, in under 5, be no mor, under 5, be no

12 Results: Performance Evaluation
Figure: four bar charts of the F-Measure (axis from 0 to 1) per quality attribute for the classifiers J4.8, NaiveBayesMult and SVM. Panels: NFR Dataset Bag of Words, OSS Dataset Bag of Words, NFR Dataset N-gram, OSS Dataset N-gram.
Categories on the NFR Dataset panels: F, A, L, LF, MN, O, PE, SC, SE, US, FT, PO
Categories on the OSS Dataset panels: PO, F, PE, MN, FT, O, US, SE, A
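
The F-Measure plotted per quality attribute can be computed as sketched below. This is a minimal sketch assuming the balanced F1 variant (the harmonic mean of precision and recall), since the slide does not name the variant. The predicted and actual labels are hypothetical.

```python
def f_measure(predicted, actual, label):
    """Balanced F-measure (F1) of one label over parallel label lists."""
    tp = sum(p == label and a == label for p, a in zip(predicted, actual))
    fp = sum(p == label and a != label for p, a in zip(predicted, actual))
    fn = sum(p != label and a == label for p, a in zip(predicted, actual))
    if tp == 0:
        return 0.0  # no correct prediction for this label
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical manual labels and classifier output.
actual = ["SE", "SE", "PE", "PE", "F"]
predicted = ["SE", "PE", "PE", "PE", "F"]

scores = {label: f_measure(predicted, actual, label) for label in ("SE", "PE", "F")}
```

One score per label, as in `scores`, corresponds to one bar per category in each chart. A label the classifier never predicts correctly gets a score of 0.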

13 Conclusion and Outlook
- Quality Attributes (QAs) are often considered the most important decision drivers and have a positive influence on the satisfaction of stakeholders
- During the elicitation process, requirements are kept in various documents and different formats, and usually they are not properly categorized
  - Information should be kept in a central place
- A framework should be used to capture design decisions
  - How could this be included in the development process of a real project?

14 Matthias Ruppel
Technische Universität München, Faculty of Informatics
Chair of Software Engineering for Business Information Systems
Boltzmannstraße, Garching bei München
wwwmatthes.in.tum.de


More information

In this project, I examined methods to classify a corpus of s by their content in order to suggest text blocks for semi-automatic replies.

In this project, I examined methods to classify a corpus of  s by their content in order to suggest text blocks for semi-automatic replies. December 13, 2006 IS256: Applied Natural Language Processing Final Project Email classification for semi-automated reply generation HANNES HESSE mail 2056 Emerson Street Berkeley, CA 94703 phone 1 (510)

More information

MIDDLE EAST TECHNICAL UNIVERSITY ENGINEERING FACULTY DEPARTMENT OF COMPUTER ENGINEERING. Vitriol. Software Design Document GROUP MALLORN

MIDDLE EAST TECHNICAL UNIVERSITY ENGINEERING FACULTY DEPARTMENT OF COMPUTER ENGINEERING. Vitriol. Software Design Document GROUP MALLORN MIDDLE EAST TECHNICAL UNIVERSITY ENGINEERING FACULTY DEPARTMENT OF COMPUTER ENGINEERING Software Design Document GROUP MALLORN Merve Bozo Yaşar Berk Arı Sertaç Kağan Aydın Mustafa Orkun Acar Team Leader:

More information

Classification. 1 o Semestre 2007/2008

Classification. 1 o Semestre 2007/2008 Classification Departamento de Engenharia Informática Instituto Superior Técnico 1 o Semestre 2007/2008 Slides baseados nos slides oficiais do livro Mining the Web c Soumen Chakrabarti. Outline 1 2 3 Single-Class

More information

Machine Learning and SystemML. Nikolay Manchev Data Scientist Europe E-

Machine Learning and SystemML. Nikolay Manchev Data Scientist Europe E- Machine Learning and SystemML Nikolay Manchev Data Scientist Europe E- mail: nmanchev@uk.ibm.com @nikolaymanchev A Simple Problem In this activity, you will analyze the relationship between educational

More information

BIOINF 585: Machine Learning for Systems Biology & Clinical Informatics

BIOINF 585: Machine Learning for Systems Biology & Clinical Informatics BIOINF 585: Machine Learning for Systems Biology & Clinical Informatics Lecture 12: Ensemble Learning I Jie Wang Department of Computational Medicine & Bioinformatics University of Michigan 1 Outline Bias

More information

Question Answering Systems

Question Answering Systems Question Answering Systems An Introduction Potsdam, Germany, 14 July 2011 Saeedeh Momtazi Information Systems Group Outline 2 1 Introduction Outline 2 1 Introduction 2 History Outline 2 1 Introduction

More information

Matrix Computations and " Neural Networks in Spark

Matrix Computations and  Neural Networks in Spark Matrix Computations and " Neural Networks in Spark Reza Zadeh Paper: http://arxiv.org/abs/1509.02256 Joint work with many folks on paper. @Reza_Zadeh http://reza-zadeh.com Training Neural Networks Datasets

More information

CMPSC 311- Introduction to Systems Programming Module: Systems Programming

CMPSC 311- Introduction to Systems Programming Module: Systems Programming CMPSC 311- Introduction to Systems Programming Module: Systems Programming Professor Patrick McDaniel Fall 2015 WARNING Warning: for those not in the class, there is an unusually large number of people

More information

Learning Ontology-Based User Profiles: A Semantic Approach to Personalized Web Search

Learning Ontology-Based User Profiles: A Semantic Approach to Personalized Web Search 1 / 33 Learning Ontology-Based User Profiles: A Semantic Approach to Personalized Web Search Bernd Wittefeld Supervisor Markus Löckelt 20. July 2012 2 / 33 Teaser - Google Web History http://www.google.com/history

More information

Processing of big data with Apache Spark

Processing of big data with Apache Spark Processing of big data with Apache Spark JavaSkop 18 Aleksandar Donevski AGENDA What is Apache Spark? Spark vs Hadoop MapReduce Application Requirements Example Architecture Application Challenges 2 WHAT

More information

Managing Open Bug Repositories through Bug Report Prioritization Using SVMs

Managing Open Bug Repositories through Bug Report Prioritization Using SVMs Managing Open Bug Repositories through Bug Report Prioritization Using SVMs Jaweria Kanwal Quaid-i-Azam University, Islamabad kjaweria09@yahoo.com Onaiza Maqbool Quaid-i-Azam University, Islamabad onaiza@qau.edu.pk

More information

Data Analytics with HPC. Data Streaming

Data Analytics with HPC. Data Streaming Data Analytics with HPC Data Streaming Reusing this material This work is licensed under a Creative Commons Attribution- NonCommercial-ShareAlike 4.0 International License. http://creativecommons.org/licenses/by-nc-sa/4.0/deed.en_us

More information

SEMI-AUTOMATIC ASSIGNMENT OF WORK ITEMS

SEMI-AUTOMATIC ASSIGNMENT OF WORK ITEMS SEMI-AUTOMATIC ASSIGNMENT OF WORK ITEMS Jonas Helming, Holger Arndt, Zardosht Hodaie, Maximilian Koegel, Nitesh Narayan Institut für Informatik,Technische Universität München, Garching, Germany {helming,

More information

Integration of Safety & Security in Automotive Electronic Systems

Integration of Safety & Security in Automotive Electronic Systems Automotive SPIN 12 November 2015 Integration of & in Automotive Electronic Systems John Favaro john.favaro@intecs.it 1 A Balance of Attributes We have come a long way Availability Dependability Reliability

More information

AUTOMATED STUDENT S ATTENDANCE ENTERING SYSTEM BY ELIMINATING FORGE SIGNATURES

AUTOMATED STUDENT S ATTENDANCE ENTERING SYSTEM BY ELIMINATING FORGE SIGNATURES AUTOMATED STUDENT S ATTENDANCE ENTERING SYSTEM BY ELIMINATING FORGE SIGNATURES K. P. M. L. P. Weerasinghe 149235H Faculty of Information Technology University of Moratuwa June 2017 AUTOMATED STUDENT S

More information

Web Service Recommendation Using Hybrid Approach

Web Service Recommendation Using Hybrid Approach e-issn 2455 1392 Volume 2 Issue 5, May 2016 pp. 648 653 Scientific Journal Impact Factor : 3.468 http://www.ijcter.com Web Service Using Hybrid Approach Priyanshi Barod 1, M.S.Bhamare 2, Ruhi Patankar

More information

CATEGORIZATION OF THE DOCUMENTS BY USING MACHINE LEARNING

CATEGORIZATION OF THE DOCUMENTS BY USING MACHINE LEARNING CATEGORIZATION OF THE DOCUMENTS BY USING MACHINE LEARNING Amol Jagtap ME Computer Engineering, AISSMS COE Pune, India Email: 1 amol.jagtap55@gmail.com Abstract Machine learning is a scientific discipline

More information

MLI - An API for Distributed Machine Learning. Sarang Dev

MLI - An API for Distributed Machine Learning. Sarang Dev MLI - An API for Distributed Machine Learning Sarang Dev MLI - API Simplify the development of high-performance, scalable, distributed algorithms. Targets common ML problems related to data loading, feature

More information

Knowledge-based pattern recognition and visualization of error logs of time-based engine sensor data: Requirements engineering and tool-support

Knowledge-based pattern recognition and visualization of error logs of time-based engine sensor data: Requirements engineering and tool-support Knowledge-based pattern recognition and visualization of error logs of time-based engine sensor data: Requirements engineering and tool-support Viet Tiep Do, 09 February 2015 Software Engineering for Business

More information

BUILT FOR THE SPEED OF BUSINESS

BUILT FOR THE SPEED OF BUSINESS BUILT FOR THE SPEED OF BUSINESS 2 Pivotal MPP Databases and In-Database Analytics Shengwen Yang 2013-12-08 Outline About Pivotal Pivotal Greenplum Database The Crown Jewels of Greenplum (HAWQ) In-Database

More information

COMPARATIVE EVALUATION OF BIG DATA FRAMEWORKS ON BATCH PROCESSING

COMPARATIVE EVALUATION OF BIG DATA FRAMEWORKS ON BATCH PROCESSING Volume 119 No. 16 2018, 937-948 ISSN: 1314-3395 (on-line version) url: http://www.acadpubl.eu/hub/ http://www.acadpubl.eu/hub/ COMPARATIVE EVALUATION OF BIG DATA FRAMEWORKS ON BATCH PROCESSING K.Anusha

More information

INTEGRATED SECURITY SYSTEM FOR E-GOVERNMENT BASED ON SAML STANDARD

INTEGRATED SECURITY SYSTEM FOR E-GOVERNMENT BASED ON SAML STANDARD INTEGRATED SECURITY SYSTEM FOR E-GOVERNMENT BASED ON SAML STANDARD Jeffy Mwakalinga, Prof Louise Yngström Department of Computer and System Sciences Royal Institute of Technology / Stockholm University

More information

Digital Preservation: How to Plan

Digital Preservation: How to Plan Digital Preservation: How to Plan Preservation Planning with Plato Christoph Becker Vienna University of Technology http://www.ifs.tuwien.ac.at/~becker Sofia, September 2009 Outline Why preservation planning?

More information

Security analysis and assessment of threats in European signalling systems?

Security analysis and assessment of threats in European signalling systems? Security analysis and assessment of threats in European signalling systems? New Challenges in Railway Operations Dr. Thomas Störtkuhl, Dr. Kai Wollenweber TÜV SÜD Rail Copenhagen, 20 November 2014 Slide

More information

Influence of Word Normalization on Text Classification

Influence of Word Normalization on Text Classification Influence of Word Normalization on Text Classification Michal Toman a, Roman Tesar a and Karel Jezek a a University of West Bohemia, Faculty of Applied Sciences, Plzen, Czech Republic In this paper we

More information

CSE 444: Database Internals. Lecture 23 Spark

CSE 444: Database Internals. Lecture 23 Spark CSE 444: Database Internals Lecture 23 Spark References Spark is an open source system from Berkeley Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing. Matei

More information

Developing Focused Crawlers for Genre Specific Search Engines

Developing Focused Crawlers for Genre Specific Search Engines Developing Focused Crawlers for Genre Specific Search Engines Nikhil Priyatam Thesis Advisor: Prof. Vasudeva Varma IIIT Hyderabad July 7, 2014 Examples of Genre Specific Search Engines MedlinePlus Naukri.com

More information

Hadoop 2.x Core: YARN, Tez, and Spark. Hortonworks Inc All Rights Reserved

Hadoop 2.x Core: YARN, Tez, and Spark. Hortonworks Inc All Rights Reserved Hadoop 2.x Core: YARN, Tez, and Spark YARN Hadoop Machine Types top-of-rack switches core switch client machines have client-side software used to access a cluster to process data master nodes run Hadoop

More information

ASC Chairman. Best Practice In Data Security In The Cloud. Speaker Name Dr. Eng. Bahaa Hasan

ASC Chairman. Best Practice In Data Security In The Cloud. Speaker Name Dr. Eng. Bahaa Hasan Regional Forum on Cybersecurity in the Era of Emerging Technologies & the Second Meeting of the Successful Administrative Practices -2017 Cairo, Egypt 28-29 November 2017 Best Practice In Data Security

More information