KNIME for the life sciences Cambridge Meetup
Greg Landrum, Ph.D., KNIME.com AG, 12 July 2016
What is KNIME?
- A bit of motivation: tool blending, data blending, documentation, automation, reproducibility
- More about the company and the community
- Some highlights from recent releases
Copyright 2014 KNIME.com AG
The KNIME Analytics Platform
Visual KNIME Workflows
- NODES perform tasks on data. Each node has inputs, outputs, and a status: Not Configured, Idle, Executed, or Error.
- Nodes are combined to create WORKFLOWS.
Over 1000 native and embedded nodes included:
- Data Access: MySQL, Oracle, ...; SAS, SPSS, ...; Excel, Flat, ...; Hive, Impala, ...; XML, JSON, PMML; Text, Doc, Image, ...; Web Crawlers; Industry Specific; Community / 3rd party
- Transformation: Row, Column, Matrix; Text, Image; Time Series; Java; Python; Community / 3rd party
- Analysis & Mining: Statistics; Data Mining; Machine Learning; Web Analytics; Text Mining; Network Analysis; Social Media Analysis; R, Weka, Python; Community / 3rd party
- Visualization: R; JFreeChart; JavaScript; Community / 3rd party
- Deployment: via BIRT; PMML; XML, JSON; Databases; Excel, Flat, etc.; Text, Doc, Image; Industry Specific; Community / 3rd party
Why is this important?
- Real-world data analysis:
  - Lots of heterogeneous data from multiple sources (data blending)
  - Complex questions to ask of the data
  - Need to apply multiple tools (tool blending)
The problem is that we don't have one of these for working with our data
- If all of your problems look like this, then this is the perfect tool.
- It's amazing how simple you can make complex things if you control the entire process.
We tend to need a broader assortment of tools for our data
- If we're lucky, they are this well organized...
...but this is a lot more common
Why is this important? (continued)
- It would be great if:
  - We didn't have to spend half our time converting file formats
  - We could figure out later what we did, repeat it, and share it with others
- This is where KNIME comes in.
KNIME: the company
KNIME.com AG
- Founded in 2008
- Offices in Zurich (HQ), Konstanz, Berlin, and San Francisco
- 20+ employees
- Maintainer of the open source KNIME Analytics Platform:
  - Comprehensive platform for data loading, processing, analysis, and modeling
  - Visual frontend
  - Open: to all sorts of data, to other tools (R and Python, among others), and to various user personas
  - 20 open source releases since 2006
- KNIME Commercial Extensions for collaboration, productivity, and performance
  - 14 commercial product releases since 2008
Broad Range of KNIME Application Areas & Customers
- Pharma
- Manufacturing
- Health Care
- Advanced Analytics
- Customer Intelligence
- Finance
- Retail
Happy users! Source:
KNIME Analytics Platform: Try it Now!
1. Download from
2. Browse the KNIME Learning Hub at
3. Download your free copy of the KNIME Beginner's Guide from: (use code: KNIME_Boston2016)
4. Visit us here or at our Forum:
The KNIME Ecosystem
KNIME Software
- KNIME commercial extensions to the platform for collaboration, productivity, and performance
KNIME Server
KNIME Big Data Extensions (commercial license required!)
- KNIME Big Data Connectors
  - Package the required drivers/libraries for specific HDFS, Hive, and Impala access
  - Hive (Big Data Extension)
  - Cloudera Impala (Big Data Extension)
  - Extend the open source database integration
- KNIME Spark Executor
  - Packages the required drivers to submit Spark jobs
  - Wraps Spark DB manipulations and MLlib modules
Big Data Connectors
- Same mode of operation as the standard KNIME database connectors
- Operations are performed within the database
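KNIME's database nodes are configured visually, so the slide itself has no code to show. As a rough illustration of what "operations are performed within the database" means, the sketch below uses Python's built-in `sqlite3` module as a stand-in for Hive or Impala. That substitution is an assumption, and the `assays` table, its columns, and its rows are hypothetical. The filter and aggregation are written in SQL and executed by the database engine, so only the small summary result comes back to the client.

```python
import sqlite3

# In-memory SQLite database as a stand-in for Hive/Impala (illustrative assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE assays (compound TEXT, target TEXT, ic50_nm REAL)")
conn.executemany(
    "INSERT INTO assays VALUES (?, ?, ?)",
    [  # hypothetical example rows
        ("cpd1", "kinaseA", 12.0),
        ("cpd2", "kinaseA", 250.0),
        ("cpd3", "kinaseB", 40.0),
        ("cpd4", "kinaseB", 8.0),
    ],
)

# The WHERE filter, GROUP BY, and aggregates are evaluated by the database engine;
# only the per-target summary rows travel back to the client.
summary = conn.execute(
    """
    SELECT target, COUNT(*) AS n_active, MIN(ic50_nm) AS best_ic50
    FROM assays
    WHERE ic50_nm < 100
    GROUP BY target
    ORDER BY target
    """
).fetchall()
print(summary)  # [('kinaseA', 1, 12.0), ('kinaseB', 2, 8.0)]
conn.close()
```

The alternative would be to fetch every row and aggregate on the client. With large Hive or Impala tables that means moving far more data, and avoiding that transfer is the point of in-database processing.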
KNIME Spark Executor
- Based on Spark MLlib, a scalable machine learning library that runs on Hadoop
- Algorithms for:
  - Classification (decision tree, naïve Bayes, ...)
  - Regression (logistic regression, linear regression, ...)
  - Clustering (k-means)
  - Collaborative filtering (ALS)
  - Dimensionality reduction (SVD, PCA)
Familiar Usage Model
- Usage model and dialogs similar to existing nodes
- No coding required
The KNIME community
Openness and the community
- Very active user community (check the forums)
  - >250 people at the 2016 KNIME Summit in Berlin
- The KNIME Analytics Platform is both open source and an open platform.
- Technology partners provide and support nodes for their (usually commercial) software. Some examples: Schrödinger, ChemAxon/InfoCom, CCG, Cresset
- We encourage the community to produce nodes (or sets of nodes) and share them with each other.
- Trusted Community Extensions: community contributions that meet a certain quality level
Some of the community contributions:
- This is the subset most relevant to drug discovery
Highlights of recent additions in KNIME 3.1 and 3.2
- Complete lists:
Streaming
- Default execution vs. streaming execution
- Streaming execution:
  - Row-wise: process, pass & forget
  - Faster, with less I/O overhead
  - Concurrent execution
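The difference between default and streaming execution can be illustrated with a minimal Python sketch built on generators. It uses a hypothetical two-step pipeline (parse, then filter) and is not KNIME's API. In the default style each step materializes its full output before the next step starts. In the streaming style each row is processed, passed on, and forgotten. Python generators interleave lazily in a single thread, so this shows the "pass & forget" behavior but not the concurrent execution that KNIME's streaming also provides.

```python
from typing import Iterable, Iterator, List


# Default (node-by-node) style: each step materializes the whole table.
def parse_all(rows: List[str]) -> List[float]:
    return [float(r) for r in rows]


def filter_all(values: List[float], threshold: float) -> List[float]:
    return [v for v in values if v > threshold]


# Streaming style: each row is processed, passed downstream, and forgotten.
def parse_stream(rows: Iterable[str]) -> Iterator[float]:
    for r in rows:
        yield float(r)


def filter_stream(values: Iterable[float], threshold: float) -> Iterator[float]:
    for v in values:
        if v > threshold:
            yield v


rows = ["1.5", "3.0", "0.2", "7.25"]  # hypothetical input column
batch_result = filter_all(parse_all(rows), 1.0)
stream_result = list(filter_stream(parse_stream(rows), 1.0))
print(batch_result, stream_result)  # [1.5, 3.0, 7.25] [1.5, 3.0, 7.25]
```

Both pipelines return the same rows. The streaming version never holds an intermediate list, which is also why streamed execution gives up inspectable intermediate results.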
Streaming pros and cons
- Advantages:
  - Less I/O overhead (process, pass & forget)
  - Parallelization
- Disadvantages:
  - No intermediate results, no interactive execution
  - Not all nodes can be streamed
Trees / Forest / Ensembles
- Random Forest node (a simplification of the Tree Ensemble node)
- Support of binary splits for nominal attributes
- Missing value handling
- Support of byte vector data (high-dimensional count fingerprints)
- Code optimization: runtime and memory
Trees and Tree Ensembles: New nodes
- Gradient Boosting
  - Also based on tree ensembles
  - Boosting: improving an existing model by adding a new model
  - Shallow trees
- Random Forest Distance
  - Distance measure induced by a random forest
  - Based on proximity
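The boosting idea (improving an existing model by adding a new model, using shallow trees) can be sketched in plain Python. This is the textbook gradient boosting recipe for squared-error regression with depth-1 trees ("stumps"), written for illustration only. It is not KNIME's implementation, and the function names and parameters (`n_rounds`, `learning_rate`) are hypothetical choices.

```python
from typing import List, Tuple

Stump = Tuple[float, float, float]  # (threshold, left_value, right_value)


def fit_stump(x: List[float], r: List[float]) -> Stump:
    """Fit a depth-1 regression tree (a 'shallow tree') to targets r by least squares."""
    best = None
    for t in sorted(set(x)):
        left = [ri for xi, ri in zip(x, r) if xi <= t]
        right = [ri for xi, ri in zip(x, r) if xi > t]
        if not left or not right:
            continue
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        sse = sum((ri - lv) ** 2 for ri in left) + sum((ri - rv) ** 2 for ri in right)
        if best is None or sse < best[0]:
            best = (sse, (t, lv, rv))
    if best is None:  # all x identical: fall back to a constant prediction
        m = sum(r) / len(r)
        return (x[0], m, m)
    return best[1]


def predict_stump(s: Stump, xi: float) -> float:
    t, lv, rv = s
    return lv if xi <= t else rv


def gradient_boost(x: List[float], y: List[float], n_rounds: int = 50, learning_rate: float = 0.1):
    base = sum(y) / len(y)  # start from a constant model
    pred = [base] * len(y)
    stumps: List[Stump] = []
    for _ in range(n_rounds):
        # For squared-error loss the negative gradient is simply the residual.
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        s = fit_stump(x, residuals)
        stumps.append(s)
        # Improve the existing model by adding the new (shrunken) shallow tree.
        pred = [pi + learning_rate * predict_stump(s, xi) for pi, xi in zip(pred, x)]
    return base, stumps, learning_rate


def predict(model, xi: float) -> float:
    base, stumps, lr = model
    return base + lr * sum(predict_stump(s, xi) for s in stumps)


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]
model = gradient_boost(x, y)
print(round(predict(model, 2.0), 2), round(predict(model, 5.0), 2))  # 1.01 4.99
```

Each round shrinks the remaining residual by the factor `1 - learning_rate`. Many weak, shallow trees added one after another therefore converge on the targets.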
Feature Selection
- Automated help for narrowing down the best set of features for a model
- Supports forward and backward selection
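Forward selection can be sketched as a greedy loop. Start from an empty feature set, repeatedly add whichever remaining feature most improves a model score, and stop when nothing helps. The sketch below assumes a user-supplied `score` function (hypothetical; in practice, for example, the cross-validated accuracy of a model trained on that subset). Backward selection is the mirror image: start from all features and repeatedly drop the one whose removal hurts the score least. This is a generic sketch, not the KNIME node's implementation.

```python
from typing import Callable, FrozenSet, List, Sequence


def forward_selection(
    features: Sequence[str],
    score: Callable[[FrozenSet[str]], float],
    max_features: int,
) -> List[str]:
    """Greedily add the feature that most improves `score` (higher is better)."""
    selected: List[str] = []
    best_score = score(frozenset())
    while len(selected) < max_features:
        candidates = [f for f in features if f not in selected]
        if not candidates:
            break
        # Evaluate every one-feature extension of the current subset.
        cand_score, cand = max((score(frozenset(selected + [f])), f) for f in candidates)
        if cand_score <= best_score:  # stop when no candidate improves the score
            break
        selected.append(cand)
        best_score = cand_score
    return selected


# Toy additive score over hypothetical descriptor names; "noise" hurts the model.
weights = {"logP": 0.5, "MW": 0.3, "TPSA": 0.1, "noise": -0.2}


def toy_score(subset: FrozenSet[str]) -> float:
    return sum(weights[f] for f in subset)


print(forward_selection(list(weights), toy_score, max_features=4))  # ['logP', 'MW', 'TPSA']
```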
Deeplearning4j KNIME Integration
- Easy network architecture design
- Modular, layer-wise design of networks
- Model import/export
- Caffe import
- Beginner friendly
- Import pretrained networks
- Highly configurable
- Supports word2vec and doc2vec
Active Learning Labs Extension
- Involves the user in constructing the training data set
- Workflow loop to query and label interesting data points
- Applies the model trained on the user-labeled data set to the remaining data
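The active learning loop can be sketched as follows. The sketch assumes uncertainty sampling as the query strategy (one common choice; the slide does not say which strategy the extension uses) and a toy one-dimensional threshold classifier. The `user_label` function stands in for the user who labels each queried point, and all names here are hypothetical.

```python
from typing import Callable, Dict, List, Tuple


def train_threshold(labeled: Dict[float, int]) -> float:
    """Toy 1-D classifier: threshold halfway between the largest point labeled 0
    and the smallest point labeled 1 (assumes both classes present and separable)."""
    zeros = [x for x, y in labeled.items() if y == 0]
    ones = [x for x, y in labeled.items() if y == 1]
    return (max(zeros) + min(ones)) / 2


def active_learning(
    pool: List[float],
    oracle: Callable[[float], int],
    seed: List[float],
    n_queries: int,
) -> Tuple[Dict[float, int], float, Dict[float, int]]:
    # Start from a small labeled seed set (must contain both classes).
    labeled = {x: oracle(x) for x in seed}
    for _ in range(n_queries):
        t = train_threshold(labeled)
        unlabeled = [x for x in pool if x not in labeled]
        if not unlabeled:
            break
        # Uncertainty sampling: query the point closest to the decision boundary.
        query = min(unlabeled, key=lambda x: abs(x - t))
        labeled[query] = oracle(query)  # the user labels the queried point
    # Train on the user-labeled set, then apply the model to the remaining data.
    t = train_threshold(labeled)
    predictions = {x: int(x > t) for x in pool if x not in labeled}
    return labeled, t, predictions


def user_label(x: float) -> int:
    # Hypothetical stand-in for the human expert's judgment.
    return int(x > 0.42)


pool = [i / 10 for i in range(11)]
labeled, t, predictions = active_learning(pool, user_label, seed=[0.0, 1.0], n_queries=4)
print(len(labeled), round(t, 2))  # 6 0.45
```

With only six labels (two seeds plus four queries), the final threshold classifies every remaining point the same way the "user" would.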
R Integration
- Rewrite of the infrastructure
- Significantly faster
- Concurrent execution
- No change to the usage model
MongoDB and JSON (I)
- MongoDB is a NoSQL database based on JSON documents
- Special set of nodes, due to the lack of a standard SQL interface
MongoDB and JSON (II)
- JSON nodes for working with JSON data, similar to the XML nodes
- Use a combination of MongoDB and JSON nodes
Semantic Web/Linked Data Integration
- Access and manipulate semantic web resources, e.g. DBpedia
- Execute semantic queries via SPARQL
- Usage model similar to the database integration
Other cool stuff
- Workflow Coach: suggests the next nodes to use
Take-home messages
- Open platform based on open source software, backed by a commercial entity providing enterprise extensions and support
- Strong focus on data blending and tool blending
- Active and engaged community
- Great support for life sciences/chemistry from the community
Thanks! Enjoy the other talks.
14-16 September 2016, San Francisco
More informationDistributed Machine Learning" on Spark
Distributed Machine Learning" on Spark Reza Zadeh @Reza_Zadeh http://reza-zadeh.com Outline Data flow vs. traditional network programming Spark computing engine Optimization Example Matrix Computations
More informationInformatica Enterprise Information Catalog
Data Sheet Informatica Enterprise Information Catalog Benefits Automatically catalog and classify all types of data across the enterprise using an AI-powered catalog Identify domains and entities with
More informationOracle Big Data SQL. Release 3.2. Rich SQL Processing on All Data
Oracle Big Data SQL Release 3.2 The unprecedented explosion in data that can be made useful to enterprises from the Internet of Things, to the social streams of global customer bases has created a tremendous
More informationBacktesting with Spark
Backtesting with Spark Patrick Angeles, Cloudera Sandy Ryza, Cloudera Rick Carlin, Intel Sheetal Parade, Intel 1 Traditional Grid Shared storage Storage and compute scale independently Bottleneck on I/O
More informationCIS 612 Advanced Topics in Database Big Data Project Lawrence Ni, Priya Patil, James Tench
CIS 612 Advanced Topics in Database Big Data Project Lawrence Ni, Priya Patil, James Tench Abstract Implementing a Hadoop-based system for processing big data and doing analytics is a topic which has been
More informationWindows Azure Overview
Windows Azure Overview Christine Collet, Genoveva Vargas-Solar Grenoble INP, France MS Azure Educator Grant Packaged Software Infrastructure (as a Service) Platform (as a Service) Software (as a Service)
More informationApache Spark and Scala Certification Training
About Intellipaat Intellipaat is a fast-growing professional training provider that is offering training in over 150 most sought-after tools and technologies. We have a learner base of 600,000 in over
More informationCertified Big Data and Hadoop Course Curriculum
Certified Big Data and Hadoop Course Curriculum The Certified Big Data and Hadoop course by DataFlair is a perfect blend of in-depth theoretical knowledge and strong practical skills via implementation
More informationDatabricks, an Introduction
Databricks, an Introduction Chuck Connell, Insight Digital Innovation Insight Presentation Speaker Bio Senior Data Architect at Insight Digital Innovation Focus on Azure big data services HDInsight/Hadoop,
More information