Spotfire Data Science with Hadoop Using Spotfire Data Science to Operationalize Data Science in the Age of Big Data
- Virgil Daniels
- 5 years ago
THE RISE OF BIG DATA

BIG DATA: A REVOLUTION IN ACCESS

Large-scale data sets are nothing new. Long before the term "big data" was coined, airline reservation systems tracked millions of flight segments and bookings, and phone companies kept billions of call detail records. What has changed is that small companies and individuals can now access the same massive computational and storage resources using inexpensive commodity hardware and the cloud.

Central to this data-ubiquity story is the open-source distributed computational framework called Apache Hadoop. Developed largely at Yahoo and based on Google's MapReduce and Google File System publications, Hadoop allows large datasets to be stored and processed in parallel by spreading files across a large number of small commodity servers. Hadoop has a large following in both the commercial and open-source software communities. The falling cost of hardware and the linear scalability of Hadoop have resulted in an unprecedented amount of data being stored and analyzed to increase our understanding of the physical world, predict human behavior, and improve performance and security.
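The MapReduce model that Hadoop builds on can be illustrated with a minimal, single-process sketch. This is not Hadoop code: the partitions, record layout, and function names below are hypothetical, chosen only to show how a total is computed by mapping over data spread across several "servers", grouping by key, and reducing each group.

```python
from collections import defaultdict

# Hypothetical call detail records, split across three partitions ("servers").
partitions = [
    [("alice", 120), ("bob", 30)],
    [("alice", 45), ("carol", 300)],
    [("bob", 15), ("carol", 60), ("alice", 5)],
]


def map_phase(records):
    """Emit (key, value) pairs for one partition: (caller, seconds)."""
    for caller, seconds in records:
        yield caller, seconds


def shuffle(mapped_partitions):
    """Group values by key, as a framework would do between map and reduce."""
    grouped = defaultdict(list)
    for pairs in mapped_partitions:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Collapse each key's values into a single total."""
    return {key: sum(values) for key, values in grouped.items()}


def total_seconds_by_caller(parts):
    # Each partition is mapped independently; on a real cluster this step
    # would run in parallel on the machines that hold the data.
    mapped = [map_phase(part) for part in parts]
    return reduce_phase(shuffle(mapped))


totals = total_seconds_by_caller(partitions)
print(totals)
```

Because each partition is mapped without reference to the others, adding servers adds capacity, which is the property behind the scalability described above.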
DATA SCIENCE WITH HADOOP

Hadoop is ideally suited for data science due to a number of important capabilities:

- Storing and processing extremely large datasets on inexpensive hardware (that can be scaled up as data volume increases and return on investment is proven)
- Storing data without having to conform it, a priori, to a particular data model
- Handling diverse and rapidly changing data streams
- Job tracking and management tools that break down complex analytic routines into simple map and reduce steps

Hadoop presents a compelling opportunity for any organization that wants to base decisions on insights gained from mining detailed data. It makes petabytes of data available for in-depth analysis across hundreds, if not thousands, of CPUs while keeping costs under control, either through scale-as-you-go commodity hardware or by leveraging the elasticity of the cloud.

Furthermore, the MapReduce paradigm has become prevalent in machine learning research. Increasingly, researchers are attempting to adapt the sequential nature of learning and convex optimization theories to the parallelization paradigm of MapReduce.

ADVANCED ANALYTICS AT SCALE FOR THE FEW

The caveat, however, is that the user has to possess the expertise to program in one or more of Hadoop's highly technical frameworks and languages, such as MapReduce, Apache Hive, or Apache Pig. Translating the tools and techniques of analytics into these frameworks represents a significant challenge. The result is that only a small number of Internet properties, social media sites, and ecommerce sites have ventured into using Hadoop for data science, while most organizations still use it mainly for data transformations and the most basic analytics. To reap the benefits of Hadoop, these early adopters make substantial investments in teams of Java engineers and statisticians, and often contribute heavily to Hadoop-related open-source projects.
While promising, the results often suffer from limitations in performance, ease of use, agility, or flexibility.

SPOTFIRE MAKES DATA SCIENCE ON HADOOP TURN-KEY

TIBCO Spotfire Data Science is the first native-Hadoop data science application. It allows experienced and aspiring data scientists to leverage the parallel-processing capabilities of Hadoop using an intuitive web-based, drag-and-drop user interface. Spotfire Data Science eliminates the need to program complex statistical functions such as linear and logistic regressions, k-means clustering, decision trees, scatter plots, and so on. Instead, it allows users to concentrate on data analysis and model development.

Spotfire Data Science handles the entire analytics lifecycle: data exploration, transformations, model building, model validation, and model deployment. Highly accurate predictive and descriptive models can be built with the Spotfire Data Science Workflow Editor in a matter of minutes, since the need to program is eliminated and data is processed where it resides.

The Spotfire Data Science web-based application is designed for rapid, iterative, and collaborative model development. Users can start either with a blank canvas
and then rapidly assemble an analytic workflow by dragging and dropping Hadoop files and various operators, or they can extend an existing analytic workflow created by one of their colleagues. Workflows are version controlled, and detailed logs are available for each run, including the visual results of each operator as well as performance statistics. Spotfire Data Science has undergone extensive testing and validation to meet enterprise-level standards of performance and security in the context of the rapidly evolving Hadoop ecosystem of tools and technologies.

A COMPLETE DATA SCIENCE ENVIRONMENT

TRADITIONAL APPROACH TO MODELING

The traditional approach to building and deploying models starts with extracting a sample of data from one or more databases or Hadoop clusters into a flat-file format. This limited dataset is then used for analysis and training of the model in a scripting-based tool such as R or SAS. The model parameters are then communicated via a specification document to the data engineer, who uses it to create scoring code (in Java, SQL, etc.). Finally, the data is either scored directly in the data warehouse or, if it doesn't all reside in one place, scored using flat files. Final results are imported back into the database to drive the behavior of operational applications (for example, to determine the specific offer that should be discussed with a customer the next time she phones a call center).

AGILE APPROACH

TIBCO's approach is radically different. We have done all the difficult programming so that the user does not have to. The user experience is as easy as drawing a process diagram. The algorithms that provide this powerful capability are uniformly designed with regard to data inputs, outputs, and exception reporting. In addition, all operators are clearly documented, so there is no need to read code to understand how a particular algorithm is going to behave.
We also ensure all programming logic is brought to where the data resides, and no data or model information is ever moved between environments.

For practitioners who prefer more code-intensive, notebook-style interfaces, Spotfire Data Science integrates directly with Jupyter Notebooks. Data scientists can create data pipelines in Python and store these as managed analytic assets within the platform, so their work is never lost and is always associated with a dedicated project.

In addition to providing a highly scalable and parallel analytics environment, Spotfire Data Science allows users to collaborate more effectively with their business counterparts, from defining the goals of a data science project to operationalizing their results.

STAGES OF DATA SCIENCE

Intuitive and highly visual, Spotfire supports all the major phases of data science. The Spotfire Data Science Workflow Editor provides a rich palette of operators, allowing users to quickly create complete workflows that cover the typical progression of a data science project.
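The idea of a workflow assembled from uniform operators can be sketched in plain Python. This is an illustration only and not the Spotfire Data Science API: the operator names (`row_filter`, `add_column`, `summarize`), the `run_workflow` helper, and the booking records are all hypothetical. The point is that every operator takes rows in and hands a result on, so operators can be chained in any sensible order.

```python
from statistics import mean


def row_filter(predicate):
    """Operator that keeps only rows matching the predicate."""
    def op(rows):
        return [row for row in rows if predicate(row)]
    return op


def add_column(name, fn):
    """Operator that derives a new column from each row."""
    def op(rows):
        return [{**row, name: fn(row)} for row in rows]
    return op


def summarize(column):
    """Terminal operator that returns summary statistics for one column."""
    def op(rows):
        values = [row[column] for row in rows]
        return {"count": len(values), "mean": mean(values)}
    return op


def run_workflow(rows, operators):
    """Feed the output of each operator into the next one."""
    result = rows
    for op in operators:
        result = op(result)
    return result


# Hypothetical booking records used only for this example.
bookings = [
    {"segment": "SFO-JFK", "fare": 420.0, "seats": 2},
    {"segment": "SFO-LAX", "fare": 95.0, "seats": 1},
    {"segment": "JFK-LHR", "fare": 780.0, "seats": 3},
]

workflow = [
    row_filter(lambda row: row["fare"] > 100),
    add_column("revenue", lambda row: row["fare"] * row["seats"]),
    summarize("revenue"),
]

summary = run_workflow(bookings, workflow)
print(summary)
```

Keeping one calling convention for every operator is what makes it possible to rearrange or extend a workflow without rewriting the steps around it.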
CREATING ANALYTIC WORKFLOWS

The process typically starts with the user browsing the files available on the Hadoop Distributed File System (HDFS). The user can then drag and drop an icon representing the HDFS file onto an analytic workflow. Spotfire assists the user in applying structure to the file, whether it is a delimited flat file, JSON, XML, Apache Log, etc. Once structure has been applied, a right-click menu exposes the various exploration operations available, such as summary statistics, frequency analysis, box plots, etc.

To gain a more in-depth understanding of complex datasets, and to identify patterns hidden in the data, the user can run an unsupervised algorithm like k-means clustering. The variable selection operator can help the user find the fields that have the most influence on the quantity being analyzed.

Spotfire provides common transformation operators like row/column filters, aggregations, pivots, etc. However, the user can also directly inject Pig scripts for more complex transformations. The data can then be randomly sampled for model training and validation.

Spotfire supports a comprehensive set of classic model types, including regressions, decision trees, time series, and clustering. With these, the user can mine data for new insights: predicting events, segmenting customers, and optimizing campaigns. Once a model has been trained, Spotfire provides a number of tools for evaluating the accuracy of the model and comparing it with others.

DEPLOYING MODELS

Spotfire can export models in industry-standard formats such as PMML and PFA, allowing users to operationalize their results on third-party platforms. Users can import PFA models to score against new data and use them in their Spotfire Data Science Workflows. Spotfire also provides a variety of standalone RESTful scoring engines that support PFA, a powerful option for those seeking to operationalize models in an efficient way.
Spotfire Data Science also manages and version controls models so work is never lost between teams, and previous versions can be found easily.
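The sample, train, and evaluate steps described under CREATING ANALYTIC WORKFLOWS can be sketched in a few lines of standard-library Python. This is a toy illustration under stated assumptions, not how Spotfire implements them: the data is synthetic, the model is a one-feature decision stump (the simplest possible decision tree), and the function names are hypothetical.

```python
import random


def train_validation_split(rows, validation_fraction=0.3, seed=0):
    """Randomly split rows into a training set and a validation set."""
    rng = random.Random(seed)  # fixed seed so the split is repeatable
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_validation = round(len(shuffled) * validation_fraction)
    cut = len(shuffled) - n_validation
    return shuffled[:cut], shuffled[cut:]


def accuracy(rows, threshold):
    """Fraction of rows where the rule 'x > threshold' matches the label."""
    correct = sum(1 for x, label in rows if (x > threshold) == label)
    return correct / len(rows)


def train_stump(rows):
    """Pick the threshold, among observed x values, with the best training accuracy."""
    candidates = sorted({x for x, _ in rows})
    return max(candidates, key=lambda t: accuracy(rows, t))


# Synthetic data: one numeric feature and a boolean label (assumption for the demo).
data = [(x, x > 50) for x in range(100)]

train, valid = train_validation_split(data)
threshold = train_stump(train)
train_accuracy = accuracy(train, threshold)
validation_accuracy = accuracy(valid, threshold)

print(f"threshold={threshold}")
print(f"training accuracy={train_accuracy:.2f}")
print(f"validation accuracy={validation_accuracy:.2f}")
```

Evaluating on rows the model never saw during training is what makes the validation figure a fair basis for comparing one model with another.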
CONCLUSION

Within all but a few organizations, the promise of data science on big data has yet to be realized. While platforms such as Hadoop have already demonstrated the power of parallel numerical processing applied to real-world problems, the techniques of data science are largely confined to separate silos of processing, accessible to a few highly trained individuals, and rarely applied to anything but small samples of highly structured data. Nevertheless, early research indicates that most machine learning algorithms can be fully implemented within the Hadoop framework. Spotfire has gone one step further: making those cutting-edge implementations available to non-programmers and aspiring data scientists in a web-based, collaborative application that supports the analytics process from end to end.

SYSTEM REQUIREMENTS & SELECTED PLATFORMS

WEB REQUIREMENTS:
- Chrome
- Firefox

SERVER REQUIREMENTS:
- Dedicated Server
- Quad Core CPU (Multiple recommended)
- 48GB of RAM or higher recommended
- 500GB Storage (RAID 1 mirroring)

OPERATING SYSTEM:
- RHEL/CentOS

INTEGRATIONS:
- MADlib
- PMML
- Python (Jupyter Notebooks)
- R
- Tableau

SUPPORTED HADOOP DISTRIBUTIONS:
- Cloudera CDH
- Hortonworks

SUPPORTED DATA PLATFORMS AS DATA SOURCES:
- Greenplum Database
- Oracle Database (11g, Exadata)
- PostgreSQL
- SQL Server
- Teradata

SUPPORTED DATA PLATFORMS AS ANALYTICAL SOURCES:
- Cloudera CDH
- Greenplum Database
- Hive
- Hortonworks
- MapR
- Oracle Database (11g, Exadata)
- Pivotal HD
- Pivotal HAWQ
- PostgreSQL

IBM Big Insights, MapR, Pivotal HAWQ

... and SQL are stored for future use. The platform also offers an API extension for embedding Spotfire Data Science logic into different applications and processes.

Global Headquarters 3307 Hillview Avenue Palo Alto, CA TEL FAX

TIBCO fuels digital business by enabling better decisions and faster, smarter actions through the TIBCO Connected Intelligence Cloud.
From APIs and systems to devices and people, we interconnect everything, capture data in real time wherever it is, and augment the intelligence of your business through analytical insights. Thousands of customers around the globe rely on us to build compelling experiences, energize operations, and propel innovation. Learn how TIBCO makes digital smarter at , TIBCO Software Inc. All rights reserved. TIBCO, the TIBCO logo, and Spotfire are trademarks or registered trademarks of TIBCO Software Inc. or its subsidiaries in the United States and/or other countries. Apache, Hadoop, Hive, and Pig are trademarks of The Apache Software Foundation in the United States and/or other countries. All other product and company names and marks in this document are the property of their respective owners and mentioned for identification purposes only. 02/13/18
More informationAccelerating Hadoop Applications with the MapR Distribution Using Flash Storage and High-Speed Ethernet
WHITE PAPER Accelerating Hadoop Applications with the MapR Distribution Using Flash Storage and High-Speed Ethernet Contents Background... 2 The MapR Distribution... 2 Mellanox Ethernet Solution... 3 Test
More informationBig Data Specialized Studies
Information Technologies Programs Big Data Specialized Studies Accelerate Your Career extension.uci.edu/bigdata Offered in partnership with University of California, Irvine Extension s professional certificate
More informationStrategic Briefing Paper Big Data
Strategic Briefing Paper Big Data The promise of Big Data is improved competitiveness, reduced cost and minimized risk by taking better decisions. This requires affordable solution architectures which
More informationNVIDIA DGX SYSTEMS PURPOSE-BUILT FOR AI
NVIDIA DGX SYSTEMS PURPOSE-BUILT FOR AI Overview Unparalleled Value Product Portfolio Software Platform From Desk to Data Center to Cloud Summary AI researchers depend on computing performance to gain
More informationBuilding a Data Strategy for a Digital World
Building a Data Strategy for a Digital World Jason Hunter, CTO, APAC Data Challenge: Pushing the Limits of What's Possible The Art of the Possible Multiple Government Agencies Data Hub 100 s of Service
More informationOutrun Your Competition With SAS In-Memory Analytics Sascha Schubert Global Technology Practice, SAS
Outrun Your Competition With SAS In-Memory Analytics Sascha Schubert Global Technology Practice, SAS Topics AGENDA Challenges with Big Data Analytics How SAS can help you to minimize time to value with
More informationFalling Out of the Clouds: When Your Big Data Needs a New Home
Falling Out of the Clouds: When Your Big Data Needs a New Home Executive Summary Today s public cloud computing infrastructures are not architected to support truly large Big Data applications. While it
More informationREDUCE TCO AND IMPROVE BUSINESS AND OPERATIONAL EFFICIENCY
SOLUTION OVERVIEW REDUCE TCO AND IMPROVE BUSINESS AND OPERATIONAL EFFICIENCY Drive Up Operational Efficiency and Drive Down TCO VMware HCI with Operations Management is the foundation for modern infrastructure,
More informationWhy Converged Infrastructure?
Why Converged Infrastructure? Three reasons to consider converged infrastructure for your organization Converged infrastructure isn t just a passing trend. It s here to stay. According to a recent survey
More informationBig Data Hadoop Developer Course Content. Big Data Hadoop Developer - The Complete Course Course Duration: 45 Hours
Big Data Hadoop Developer Course Content Who is the target audience? Big Data Hadoop Developer - The Complete Course Course Duration: 45 Hours Complete beginners who want to learn Big Data Hadoop Professionals
More informationCisco and Cloudera Deliver WorldClass Solutions for Powering the Enterprise Data Hub alerts, etc. Organizations need the right technology and infrastr
Solution Overview Cisco UCS Integrated Infrastructure for Big Data and Analytics with Cloudera Enterprise Bring faster performance and scalability for big data analytics. Highlights Proven platform for
More informationQuickPivot s Interact Coordinated, Dynamic Messaging
QuickPivot s Interact Coordinated, Dynamic Messaging Marketers are often saddled with conflicting or redundant marketing tools that don t make it easy for them to deliver consistent customer experiences.
More informationTECHNOLOGY SOLUTION EVOLUTION
JAR PLATFORM JORVAK TECHNOLOGY SOLUTION EVOLUTION 1990s Build Your Own Time to Production Present Time Highly Configurable Hybrid Platforms Universal Connectivity Application Screens Integrations/Reporting
More informationTIBCO Enterprise Runtime for R Performance Guide
TIBCO Enterprise Runtime for R Performance Guide http://www.tibco.com Global Headquarters 3303 Hillview Avenue Palo Alto, CA 94304 Tel: +1 650-846-1000 Toll Free: 1 800-420-8450 Fax: +1 650-846-1005 2014,
More informationTaming Structured And Unstructured Data With SAP HANA Running On VCE Vblock Systems
1 Taming Structured And Unstructured Data With SAP HANA Running On VCE Vblock Systems The Defacto Choice For Convergence 2 ABSTRACT & SPEAKER BIO Dealing with enormous data growth is a key challenge for
More informationCloud Computing & Visualization
Cloud Computing & Visualization Workflows Distributed Computation with Spark Data Warehousing with Redshift Visualization with Tableau #FIUSCIS School of Computing & Information Sciences, Florida International
More informationThe Hadoop Ecosystem. EECS 4415 Big Data Systems. Tilemachos Pechlivanoglou
The Hadoop Ecosystem EECS 4415 Big Data Systems Tilemachos Pechlivanoglou tipech@eecs.yorku.ca A lot of tools designed to work with Hadoop 2 HDFS, MapReduce Hadoop Distributed File System Core Hadoop component
More informationIBM dashdb Local. Using a software-defined environment in a private cloud to enable hybrid data warehousing. Evolving the data warehouse
IBM dashdb Local Using a software-defined environment in a private cloud to enable hybrid data warehousing Evolving the data warehouse Managing a large-scale, on-premises data warehouse environments to
More informationInternational Journal of Advance Engineering and Research Development. A Study: Hadoop Framework
Scientific Journal of Impact Factor (SJIF): e-issn (O): 2348- International Journal of Advance Engineering and Research Development Volume 3, Issue 2, February -2016 A Study: Hadoop Framework Devateja
More informationCisco Start. IT solutions designed to propel your business
Cisco Start IT solutions designed to propel your business Small and medium-sized businesses (SMBs) typically have very limited resources to invest in new technologies. With every IT investment made, they
More informationWhat does SAS Data Management do? For whom is SAS Data Management designed? Key Benefits
FACT SHEET SAS Data Management Transform raw data into a valuable business asset What does SAS Data Management do? SAS Data Management helps transform, integrate, govern and secure data while improving
More informationHierarchy of knowledge BIG DATA 9/7/2017. Architecture
BIG DATA Architecture Hierarchy of knowledge Data: Element (fact, figure, etc.) which is basic information that can be to be based on decisions, reasoning, research and which is treated by the human or
More informationCapture Business Opportunities from Systems of Record and Systems of Innovation
Capture Business Opportunities from Systems of Record and Systems of Innovation Amit Satoor, SAP March Hartz, SAP PUBLIC Big Data transformation powers digital innovation system Relevant nuggets of information
More information