Design Document. Version 1.1. (A Project sponsored by HadoopExpress.com, a Net Serpents enterprise) All rights reserved
- Jeremy Little
Design Document
Product Intelligence on Medical Devices
Version 1.1
(A Project sponsored by HadoopExpress.com, a Net Serpents enterprise)
Copyright 2016 Net Serpents LLC, 2001 Route 46, Parsippany NJ. All rights reserved.
This document contains material protected under international and federal copyright laws and treaties. Any unauthorized reprint or use of material is strictly prohibited.
Contents
1. Introduction
   1.1 Scope
   1.2 Overview
2. General Description
   2.1 POC Perspective
   2.2 Tools Used
   2.3 General Constraints
   2.4 Assumption
3. Design Details
   3.1 Technology Architecture
   3.2 Database Schema
   3.3 R Shiny User Interface
4. Reusable Components
1. Introduction

1.1 Scope
This design document gives an overview of the architectural design and the prediction model to be implemented for product intelligence on medical devices, using the MAUDE (Manufacturer and User Facility Device Experience) data available on the FDA site. It also shows how big data components are used to store the raw data, preprocess it, and build a predictive model. The Product Intelligence on Medical Devices solution shall also support informed decision-making based on the forecasting implemented as part of this scope.

2. General Description

2.1 Proof of Concept Perspective
The datasets in the local file system are moved to HDFS. Preprocessing, filtering, and cleaning of the data are done with shell scripts on HDFS, and the cleaned datasets are moved to HIVE. The final dataset is prepared in HIVE and comprises the following columns:
- MDR Report Key
- Device Event Key
- Brand Name
- Generic Name
- Manufacturer Name
- Manufacturer City
- Manufacturer State Code
- Manufacturer Country Code
- Problem Description
- Number of Devices in Event
- Number of Patients in Event
- Adverse Event Flag
- Date of Event
- Event Location
- Remedial Action
- Event Type

2.2 Big Data Components Used
1. Apache Hadoop Distributed File System (HDFS), a Java-based file system that provides scalable and reliable data storage.
2. Apache HIVE, a data warehouse system on top of Apache Hadoop whose query language, HiveQL, works like SQL.
3. RStudio, an open-source development environment for R, the language for statistical computing and graphics.

2.3 General Constraints
The process is to be automated and made as scalable as possible so that it can be reused.

2.4 Assumption
The project is based on the idea of post-market surveillance of medical devices, and the goal is to make this a reality using Big Data and analytics capabilities.

3. Design Details

3.1 Technology Architecture
The MAUDE data from the FDA site is downloaded to the file system using shell scripts and moved onto HDFS. Once the raw data is available in HDFS, HIVE tables are created on top of the HDFS files and the data is loaded into them. The tables are then integrated in HIVE, and the data in all tables is filtered and preprocessed. Once the final processed data is available in HIVE, it is brought into R through R streaming for model building and visualization.

3.2 Database Schema
In total, four datasets are downloaded, preprocessed, and stored in HIVE.
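Section 3.1 says the HIVE tables are created on top of the raw files once they are in HDFS (for example, after copying them with `hdfs dfs -put`). The document does not include its DDL, so the following is only a minimal Python sketch that builds a HiveQL `CREATE EXTERNAL TABLE` statement. It assumes each dataset is a pipe-delimited text file in its own HDFS directory and that every column is first declared as `STRING`. The helper name, the table name, the column names, and the HDFS path are all hypothetical.

```python
def external_table_ddl(table, columns, hdfs_dir):
    """Build a HiveQL CREATE EXTERNAL TABLE statement for a pipe-delimited file.

    Hypothetical helper: every column is typed STRING and is assumed to be
    cast later, during preprocessing in HIVE (not the project's documented DDL).
    """
    # One "name STRING" entry per column, lower-cased for HIVE.
    cols = ",\n  ".join(f"{name.lower()} STRING" for name in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n  {cols}\n)\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'\n"
        "STORED AS TEXTFILE\n"
        f"LOCATION '{hdfs_dir}';"
    )


if __name__ == "__main__":
    # Hypothetical table name and HDFS directory for the foidevproblem file.
    print(external_table_ddl(
        "foidevproblem",
        ["MDR_REPORT_KEY", "DEVICE_PROBLEM_CODE"],
        "/user/maude/raw/foidevproblem",
    ))
```

Declaring the table `EXTERNAL` means that dropping it in HIVE leaves the underlying files in HDFS untouched. This suits a pipeline that rebuilds its tables from the downloaded raw data.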
Notes:
- foidevxxxx relates to device data for year xxxx.
- mdrfoithru2015 relates to master event data.
- The final data table is to be derived from the above two plus foi_devproblem and deviceproblemcodes.
- MDR Report Key should be used to join the tables.

Source Files

MDRFOITHRU2015: This file contains the following master event data fields, in the order presented. Bold columns are to be extracted.
1. MDR_REPORT_KEY
2. EVENT_KEY
3. REPORT_NUMBER
4. REPORT_SOURCE_CODE
5. MANUFACTURER_LINK_FLAG_
6. NUMBER_DEVICES_IN_EVENT
7. NUMBER_PATIENTS_IN_EVENT
8. DATE_RECEIVED
9. ADVERSE_EVENT_FLAG
10. PRODUCT_PROBLEM_FLAG
11. DATE_REPORT
12. DATE_OF_EVENT
13. REPROCESSED_AND_REUSED_FLAG
14. REPORTER_OCCUPATION_CODE
15. HEALTH_PROFESSIONAL
16. INITIAL_REPORT_TO_FDA
17. DATE_FACILITY_AWARE
18. REPORT_DATE
19. REPORT_TO_FDA
20. DATE_REPORT_TO_FDA
21. EVENT_LOCATION
22. DATE_REPORT_TO_MANUFACTURER
23. MANUFACTURER_CONTACT_T_NAME
24. MANUFACTURER_CONTACT_F_NAME
25. MANUFACTURER_CONTACT_L_NAME
26. MANUFACTURER_CONTACT_STREET_1
27. MANUFACTURER_CONTACT_STREET_2
28. MANUFACTURER_CONTACT_CITY
29. MANUFACTURER_CONTACT_STATE
30. MANUFACTURER_CONTACT_ZIP_CODE
31. MANUFACTURER_CONTACT_ZIP_EXT
32. MANUFACTURER_CONTACT_COUNTRY
33. MANUFACTURER_CONTACT_POSTAL
34. MANUFACTURER_CONTACT_AREA_CODE
35. MANUFACTURER_CONTACT_EXCHANGE
36. MANUFACTURER_CONTACT_PHONE_NO
37. MANUFACTURER_CONTACT_EXTENSION
38. MANUFACTURER_CONTACT_PCOUNTRY
39. MANUFACTURER_CONTACT_PCITY
40. MANUFACTURER_CONTACT_PLOCAL
41. MANUFACTURER_G1_NAME
42. MANUFACTURER_G1_STREET_1
43. MANUFACTURER_G1_STREET_2
44. MANUFACTURER_G1_CITY
45. MANUFACTURER_G1_STATE_CODE
46. MANUFACTURER_G1_ZIP_CODE
47. MANUFACTURER_G1_ZIP_CODE_EXT
48. MANUFACTURER_G1_COUNTRY_CODE
49. MANUFACTURER_G1_POSTAL_CODE
50. DATE_MANUFACTURER_RECEIVED
51. DEVICE_DATE_OF_MANUFACTURE
52. SINGLE_USE_FLAG
53. REMEDIAL_ACTION
54. PREVIOUS_USE_CODE
55. REMOVAL_CORRECTION_NUMBER
56. EVENT_TYPE
57. DISTRIBUTOR_NAME
58. DISTRIBUTOR_ADDRESS_1
59. DISTRIBUTOR_ADDRESS_2
60. DISTRIBUTOR_CITY
61. DISTRIBUTOR_STATE_CODE
62. DISTRIBUTOR_ZIP_CODE
63. DISTRIBUTOR_ZIP_CODE_EXT
64. REPORT_TO_MANUFACTURER
65. MANUFACTURER_NAME
66. MANUFACTURER_ADDRESS_1
67. MANUFACTURER_ADDRESS_2
68. MANUFACTURER_CITY
69. MANUFACTURER_STATE_CODE
70. MANUFACTURER_ZIP_CODE
71. MANUFACTURER_ZIP_CODE_EXT
72. MANUFACTURER_COUNTRY_CODE
73. MANUFACTURER_POSTAL_CODE
74. TYPE_OF_REPORT
75. SOURCE_TYPE
76. DATE_ADDED
77. DATE_CHANGED

foidevxxxx.zip (Device Data), where xxxx is the year, contains the following 45 fields, delimited by pipe (|), one record per line:
1. MDR Report Key
2. Device Event Key
3. Implant Flag (D6, newly added)
4. Date Removed Flag (D7, newly added; 2006); if the flag is M or Y, print the date:
   U = Unknown
   A = Not available
   I = No information at this time
   M = Month and year provided only; day defaults to 01
   Y = Year provided only; day defaults to 01, month defaults to January
5. Device Sequence No (from the device report table)
6. Date Received (from the mdr_document table)
SECTION D
7. Brand Name (D1)
8. Generic Name (D2)
9. Manufacturer Name (D3)
10. Manufacturer Address 1 (D3)
11. Manufacturer Address 2 (D3)
12. Manufacturer City (D3)
13. Manufacturer State Code (D3)
14. Manufacturer Zip Code (D3)
15. Manufacturer Zip Code ext (D3)
16. Manufacturer Country Code (D3)
17. Manufacturer Postal Code (D3)
18. Expiration Date of Device (D4)
19. Model Number (D4)
20. Catalog Number (D4)
21. Lot Number (D4)
22. Other ID Number (D4)
23. Device Operator (D5)
24. Device Availability (D10)
   Y = Yes
   N = No
   R = Device was returned to manufacturer
   * = No answer provided
25. Date Returned to Manufacturer (D10)
26. Device Report Product Code
27. Device Age (F9)
28. Device Evaluated by Manufacturer (H3)
   Y = Yes
   N = No
   R = Device not returned to manufacturer
   * = No answer provided
BASELINE SECTION (for records prior to 2009)
29. Baseline brand name
30. Baseline generic name
31. Baseline model no
32. Baseline catalog no
33. Baseline other id no
34. Baseline device family
35. Baseline shelf life contained in label
   Y = Yes
   N = No
   A = Not applicable
   * = No answer provided
36. Baseline shelf life in months
37. Baseline PMA flag
38. Baseline PMA no
39. Baseline 510(k) flag
40. Baseline 510(k) no
41. Baseline preamendment
42. Baseline transitional
43. Baseline 510(k) exempt flag
44. Baseline date first marketed
45. Baseline date ceased marketing

foidevproblem (Device Data for foidev problem) contains the following 2 fields, delimited by pipe (|), one record per line:
1. MDR Report Key
2. Device Problem Code (F10, newly added)

DEVICEPROBLEMCODES contains the following 2 fields, delimited by pipe (|), one record per line:
1. Device Problem Code
2. Problem Description

3.3 R Shiny User Interface
The user interface is an R Shiny web application with a very simple, plain layout that showcases the visualization, the classification tree, and the forecasted data. Screenshots have been provided below to demonstrate the user interface.

Visualization
To visualize the number of events reported in each state, year by year, a thematic choropleth map is used. The shading of each state represents the range into which the number of events in that state falls.
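The choropleth needs one count per state and year, plus a shading class for each count. The document does not say which state field drives the map or how the ranges are chosen. The minimal Python sketch below is written only to show the aggregation (the project does this in R). It assumes the map uses the Manufacturer State Code from the final dataset, that Date of Event ends in a four-digit year, and that the breakpoints are fixed by hand. The record keys and break values are hypothetical.

```python
from bisect import bisect_right
from collections import Counter


def events_by_state_year(records):
    """Count reported events per (state, year) for the choropleth.

    Assumptions for illustration: each record is a dict from the final HIVE
    table with 'manufacturer_state_code' and 'date_of_event' keys, and the
    date ends in a four-digit year (e.g. MM/DD/YYYY).
    """
    counts = Counter()
    for rec in records:
        state = (rec.get("manufacturer_state_code") or "").strip()
        date = (rec.get("date_of_event") or "").strip()
        year = date[-4:]
        if not state or len(year) != 4 or not year.isdigit():
            continue  # skip rows without a usable state or year
        counts[(state, year)] += 1
    return counts


def shade_bin(count, breaks):
    """Map a count to a shading class.

    `breaks` are ascending cut points; with [10, 100, 1000] the classes are
    0: <10, 1: 10-99, 2: 100-999, 3: >=1000.
    """
    return bisect_right(breaks, count)
```

A map layer would then colour each state by `shade_bin(n, breaks)` for the selected year. Fixed breakpoints keep the colours comparable across years. Quantile breakpoints would instead spread the states evenly over the classes.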
Forecasting

4. Reusable Components

Shell Scripting
- Shell scripts: crawl the MAUDE data from the FDA site and load it into the Hadoop environment.

Big Data Storage and Preprocessing
- HDFS: stores the downloaded raw data as well as the integrated, preprocessed data used for reporting and model building.
- HIVE: Hive tables are created for the master data, device data, device problem codes data, and device problem data. Using the common key, the tables are merged to form the final datasets used for model building. While preparing the final datasets, custom UDFs were also created in HIVE to get the data into the required format.
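The merge performed in HIVE can be illustrated with a small in-memory Python sketch. According to the field lists in section 3.2, the master event, device, and device problem files share MDR Report Key. DEVICEPROBLEMCODES has no MDR Report Key, so it is looked up through Device Problem Code instead. The snake_case column names are assumed versions of the documented fields, and inner-join semantics are an assumption. In the project itself, this join runs as HiveQL.

```python
from collections import defaultdict


def index_by(rows, key):
    """Group rows (dicts) by the value of `key`; one key may map to many rows."""
    idx = defaultdict(list)
    for row in rows:
        idx[row[key]].append(row)
    return idx


def build_final_dataset(master, devices, dev_problems, problem_codes):
    """Inner-join the four MAUDE datasets into flat records (illustrative only).

    master, devices and dev_problems are joined on 'mdr_report_key';
    problem_codes maps 'device_problem_code' to 'problem_description'.
    All column names are assumptions.
    """
    dev_idx = index_by(devices, "mdr_report_key")
    prob_idx = index_by(dev_problems, "mdr_report_key")
    descriptions = {
        r["device_problem_code"]: r["problem_description"] for r in problem_codes
    }

    final = []
    for event in master:
        key = event["mdr_report_key"]
        for device in dev_idx.get(key, []):
            for problem in prob_idx.get(key, []):
                code = problem["device_problem_code"]
                if code not in descriptions:
                    continue  # inner join: drop codes without a description
                final.append({
                    **event,
                    **device,
                    "device_problem_code": code,
                    "problem_description": descriptions[code],
                })
    return final
```

One report can list several devices and several problem codes, so a single report can produce several output rows. An equivalent chain of inner `JOIN ... ON` clauses in HiveQL produces the same fan-out.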
Analytics
- R: builds the classification tree and the forecasting model.
- R web interface using Shiny: creates an interactive, lightweight web application that is accessible on intranets and corporate LANs.
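The document does not say which forecasting model the R code uses. As a stand-in, the following minimal Python sketch shows the simplest possible option: an ordinary least-squares linear trend fitted to yearly event counts and extrapolated forward. It is not the project's model. A production forecast would more likely use a time-series method such as exponential smoothing or ARIMA. The function name is hypothetical.

```python
def linear_trend_forecast(counts, horizon):
    """Forecast the next `horizon` values of a yearly series with an OLS trend.

    Stand-in technique for illustration: fits y = a + b*t (t = 0, 1, ...) by
    ordinary least squares and extrapolates; not the project's R model.
    """
    n = len(counts)
    if n < 2:
        raise ValueError("need at least two observations to fit a trend")
    t_mean = (n - 1) / 2
    y_mean = sum(counts) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(counts))
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    # Future time indices continue after the last observed index n - 1.
    return [intercept + slope * t for t in range(n, n + horizon)]
```

For example, yearly counts of `[10, 20, 30]` give a slope of 10 per year, so a two-year horizon returns `[40.0, 50.0]`.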
Oracle Big Data SQL Release 3.2 The unprecedented explosion in data that can be made useful to enterprises from the Internet of Things, to the social streams of global customer bases has created a tremendous
More informationProject Requirements
Project Requirements Version 4.0 2 May, 2016 2015-2016 Computer Science Department, Texas Christian University Revision Signatures By signing the following document, the team member is acknowledging that
More informationTalend Open Studio for Data Quality. User Guide 5.5.2
Talend Open Studio for Data Quality User Guide 5.5.2 Talend Open Studio for Data Quality Adapted for v5.5. Supersedes previous releases. Publication date: January 29, 2015 Copyleft This documentation is
More informationMicrosoft. Exam Questions Perform Data Engineering on Microsoft Azure HDInsight (beta) Version:Demo
Microsoft Exam Questions 70-775 Perform Data Engineering on Microsoft Azure HDInsight (beta) Version:Demo NEW QUESTION 1 HOTSPOT You install the Microsoft Hive ODBC Driver on a computer that runs Windows
More informationHadoop is supplemented by an ecosystem of open source projects IBM Corporation. How to Analyze Large Data Sets in Hadoop
Hadoop Open Source Projects Hadoop is supplemented by an ecosystem of open source projects Oozie 25 How to Analyze Large Data Sets in Hadoop Although the Hadoop framework is implemented in Java, MapReduce
More informationData Science Training
Data Science Training R, Predictive Modeling, Machine Learning, Python, Bigdata & Spark 9886760678 Introduction: This is a comprehensive course which builds on the knowledge and experience a business analyst
More informationOracle. Oracle Big Data 2017 Implementation Essentials. 1z Version: Demo. [ Total Questions: 10] Web:
Oracle 1z0-449 Oracle Big Data 2017 Implementation Essentials Version: Demo [ Total Questions: 10] Web: www.myexamcollection.com Email: support@myexamcollection.com IMPORTANT NOTICE Feedback We have developed
More informationNida Afreen Rizvi 1, Anjana Pandey 2, Ratish Agrawal 2 and Mahesh Pawar 2
P r e d i c t i o n m o d e l o f D i a b e t e s D r u g U s i n g H i v e a n d R Prediction model of Diabetes Drug Using Hive and R Nida Afreen Rizvi 1, Anjana Pandey 2, Ratish Agrawal 2 and Mahesh
More informationActivator Library. Focus on maximizing the value of your data, gain business insights, increase your team s productivity, and achieve success.
Focus on maximizing the value of your data, gain business insights, increase your team s productivity, and achieve success. ACTIVATORS Designed to give your team assistance when you need it most without
More informationOracle Big Data Connectors
Oracle Big Data Connectors Oracle Big Data Connectors is a software suite that integrates processing in Apache Hadoop distributions with operations in Oracle Database. It enables the use of Hadoop to process
More informationA Fast and High Throughput SQL Query System for Big Data
A Fast and High Throughput SQL Query System for Big Data Feng Zhu, Jie Liu, and Lijie Xu Technology Center of Software Engineering, Institute of Software, Chinese Academy of Sciences, Beijing, China 100190
More informationBusiness Cockpit. Controlling the digital enterprise. Fabio Casati Hewlett-Packard
Business Cockpit Controlling the digital enterprise Fabio Casati Hewlett-Packard UC Berkeley, Oct 4, 2002 page 1 Managing Operational Systems Develop a platform for the semantic management of operational
More information