New in This Version. Numeric Filtergram


Table of Contents
o New in This Version
o Changed in This Version
o Upgrade Notes
o Supported Browsers, Processing Engines, Data Sources and Hadoop Distributions
o Resolved Issues
o Known Issues

New in This Version

Filtergram enhancements

All Filtergrams have been redesigned to provide a unified interface with powerful new tools to dynamically filter your data with great precision. A summary of the new functionality is provided in these release notes. For a comprehensive explanation of the filtering actions and operations you can take for each filter type, see the application's online help.

Numeric Filtergram

Make selections from the histogram to dynamically filter your data. Explore your selections and make additional refinements with the "Show Selected Items" option. All filtering actions you take continue to dynamically update your dataset.

View your entire range of data as a list. Continue to make very precise selections from the list. Your dataset continues to dynamically update with every selection you make. Search for values using the enhanced search function.

Text Filtergram

The new "Show Selected Items" option and the enhanced search capabilities allow you to explore your selections and make additional refinements. All filtering actions you take continue to dynamically update your dataset.

New Date/Time Filtergram

In addition to the Filtergram enhancements, this release introduces a new Filtergram that displays date/time values as a Timeline histogram. All of the new functionality implemented for Numeric Filtergrams is also available for the Timeline histogram:
o Drag your mouse over values in the histogram to make single or multiple selections; the dataset dynamically filters to display your selection(s).
o Move your mouse over the x-axis and use the scroll wheel to zoom into your timeline selections. Data bins display to represent a granular view of the selected range. Mouse over any bin to view the number of values in a bin. Use the scroll wheel to zoom deeper. Turn on the Overview tool to view your zoomed locations.
o Use the "Show Selected Items" option to explore your selections and make additional refinements: manually add specific dates or date ranges. Toggle to "Exclude" and hide any specified dates or ranges while working with your data.
o View the entire range of date/time values as a list from which you can make very precise selections. Search for any values with enhanced search capability.

Additionally, the Date/Time filter has five different charts for filtering your date/time data. The Timeline is the default chart. The additional charts can be opened from the "available charts" tab, and you can work simultaneously with all open charts. The application's online help provides an explanation of all filtering actions and operations you can take with the Date/Time Filtergram.

Sampling

Sampling on Import, Lookup and Append

Sampling is now available when you import a base dataset into a Project and perform a Lookup or Append Step in a Project. This option allows you to sample a very large data source for initial discovery before bringing all of the data into a Project. This is particularly useful if your Administrator has source file size limits in place to protect cluster resources.

Sampling as Project Step with new Sampling Tool

In addition to sampling a data source, the new Sampling tool gives you the flexibility to filter down to a specific set of rows in your data and then sample on those rows. When your exploration is complete, you can easily remove the sampling operation by either muting or deleting it in the Steps panel.

For both Sampling on Import/Lookup/Append and Sampling as a Step in your Project, the sampling operation can be based on a percentage of the dataset or a specific number of rows in the dataset. When sampling by percentage, you have the option to specify a column in the source file to use for generating the sample. In this case, only the data in the column is used for determining the sample.
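The sampling modes described above can be illustrated with a small Python sketch. This is not the application's implementation: the function names are hypothetical, the per-row random draw is only one way to sample by percentage (it returns approximately, not exactly, the requested share), and the hash-based column variant is an assumed interpretation of "only the data in the column is used for determining the sample." A fixed seed makes each sample repeatable.

```python
import hashlib
import random


def sample_by_count(rows: list, n: int, seed: int) -> list:
    """Return n rows chosen at random (or all rows if fewer exist)."""
    rng = random.Random(seed)  # fixed seed -> same sample every run
    return rng.sample(rows, min(n, len(rows)))


def sample_by_percentage(rows: list, percent: float, seed: int) -> list:
    """Keep each row with probability percent/100 (approximate share)."""
    rng = random.Random(seed)
    return [row for row in rows if rng.random() * 100 < percent]


def sample_by_column(rows: list, column: str, percent: float, seed: int) -> list:
    """Assumed interpretation: decide from the column value alone, so rows
    sharing a value in that column are kept or dropped together."""
    kept = []
    for row in rows:
        key = f"{seed}:{row[column]}".encode("utf-8")
        digest = hashlib.sha256(key).digest()
        bucket = int.from_bytes(digest[:8], "big") / 2**64  # value in [0, 1)
        if bucket * 100 < percent:
            kept.append(row)
    return kept
```

Calling any of these functions twice with the same seed returns the same rows, which is the property the sampling seed is meant to provide.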

Note: when performing any sampling operation, a "sampling seed" is provided to ensure that you can always repeat your sample.

New grid tools

Three new tools are available in the application's footer for working with data on the grid. The application's online help provides details for how to use each of these tools.

Column Lineage

A column's lineage can now be displayed in the Steps panel through "Lineage Mode." In Lineage Mode, all Steps that affected the column are displayed with an orange outline. The outlines allow you to quickly identify the Steps that affected the column or changed its data. If there are Steps in the Editor that did not affect the column, those Steps are grayed out, collapsed and labeled to note how many Steps are collapsed.

New tools for working with Cluster + Edit

Two new tools provide visual cues to better recognize how the suggested value for a Cluster was derived:
o Fixed-width font setting: by default, Cluster values display in a variable-width font. Click this option to display Cluster values in a fixed-width font. The fixed-width option aligns all text characters, which allows you to more easily identify extra spaces within a Cluster value and differentiate characters across the Clusters.
o Highlight tools: highlighting allows you to easily recognize how the suggested Cluster replacement value was derived. The Additions tool highlights the characters that are in addition to all common characters. The Deletions tool indicates where deletions have been made in order to derive the common characters. Deletions are condensed into a red (x). The Additions and Deletions tools can be simultaneously enabled.
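The idea behind the highlight tools, comparing a Cluster value with the suggested replacement character by character, can be sketched with Python's standard difflib module. How the application actually computes its highlights is not documented here; the function name and the mapping of its two result lists to the Additions and Deletions tools are assumptions made for illustration.

```python
from difflib import SequenceMatcher


def compare_to_suggestion(value: str, suggested: str):
    """Return (only_in_value, only_in_suggested) as lists of character runs."""
    only_in_value, only_in_suggested = [], []
    matcher = SequenceMatcher(a=value, b=suggested, autojunk=False)
    for tag, a0, a1, b0, b1 in matcher.get_opcodes():
        # "delete"/"replace": characters present in value but not in suggested
        if tag in ("delete", "replace"):
            only_in_value.append(value[a0:a1])
        # "insert"/"replace": characters present in suggested but not in value
        if tag in ("insert", "replace"):
            only_in_suggested.append(suggested[b0:b1])
    return only_in_value, only_in_suggested
```

For example, comparing "Cisco  Systems" (two spaces) with "Cisco Systems" reports a single extra space in the value, the kind of difference the fixed-width font option is meant to make visible.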

Data Library enhancements

The following enhancements have been implemented for Data Library import and export functionality:
o Compressed files from HDFS can now be imported into the Data Library. The following compression formats are currently supported: bzip, gzip, deflate.
o When adding a dataset to the library, new options under the Advanced Settings tab allow you to strip smart quotes from a delimited file on import and parse a file that uses row separators other than the standard defaults for Value and Line Separators.
o When exporting a dataset from the Data Library, you can specify an alternative row separator other than the standard default value.

Support for proxy connectivity from Cisco Data Prep to Salesforce: a new configuration enhancement allows Cisco Data Prep to connect to a cloud-based Salesforce server through a proxy server. Refer to the "Cisco Data Prep Installation" guide for configuration details.

Admin updates

The following enhancements have been implemented for the administration pages:
o Users page: when logged in as "superuser" or "admin", a new Last Login column shows the last login date and time for each user.
o Roles page: a new Library permission has been introduced to separate the rights required to locally download a dataset versus exporting it to HDFS.

Tenant configuration

A tenant-based user session timeout option is now available. This option specifies the maximum number of seconds that a user session can be idle before the system automatically closes the session. The default setting is to never time out users.

New clusters.properties parameter to transform case-insensitive names from LDAP/SAML to HDFS

If you use HDFS for import or export with the passthru option, there is a new parameter in the clusters.properties file:

px.cluster.clustername.passthru.transform

This parameter allows you to specify a function to transform case-insensitive names from LDAP/SAML to HDFS user names, which are case sensitive. Refer to the Server Administration Guide for details.

Changed in This Version

Nested XML and JSON

Smart parsing for nested XML and JSON data now creates new columns for all of the nested data instead of dropping or writing it into a single row.

Example: Nested XML data

[Figure: results prior to the 1.2 release]
[Figure: smart parsing results with this release]

Important note: the order of columns in your source XML or JSON file is not always preserved during the import, and you may notice a different column order after the file is imported into the Data Library. This is a known issue and a fix is forthcoming. The current and recommended workaround is to bring the data into a Project and use the Columns tool to re-order the columns.
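The general idea of giving each nested field its own column can be sketched in Python as follows. This is a minimal illustration, not the product's parser: the function name, the dot-separated column names and the choice to leave lists untouched are all assumptions.

```python
import json


def flatten(record: dict, parent_key: str = "", sep: str = ".") -> dict:
    """Turn nested objects into one flat mapping of column name -> value."""
    columns = {}
    for key, value in record.items():
        # Hypothetical naming scheme: join parent and child keys with a dot.
        name = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            columns.update(flatten(value, name, sep))  # recurse into nesting
        else:
            columns[name] = value  # scalars (and lists) become one column
    return columns


nested = json.loads(
    '{"id": 1, "customer": {"name": "Ann", "address": {"city": "Oslo"}}}'
)
flat = flatten(nested)
```

Here `flat` contains the keys `id`, `customer.name` and `customer.address.city`, so every nested value ends up in a column of its own rather than being dropped or packed into a single cell.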

Upgrade Notes

Resource Level Permissions are not enforced after export to Hive because the Cisco Data Prep application does not provision any Hive authorization platform.

Supported Browsers, Processing Engines, Data Sources and Hadoop Distributions

Browsers
o Mozilla Firefox: Extended Support Release (ESR) 38.6.1 for Mac and Windows
o Google Chrome: 48 for Mac and Windows
The recommended resolution for the Cisco Data Prep application is 1024x768.

Processing engines
Apache Spark 1.4
CDH 5.4.X Spark 1.3
o YARN
o standalone

Supported data sources and export formats
Sources
o HDFS
o JDBC
o Salesforce
o Local files:
  Microsoft: .xls, .xlsx, .xlsm
  Character-separated or fixed-field-length text: .txt, .csv, .tsv
  Structured text: .json, .xml
  Hadoop Avro: .avro
Export formats
o JSON
o AVRO
o XML
o CSV; delimiter-separated; tab-separated
o fixed-width

Supported Hadoop Distributions
o Cloudera CDH 5.4.0
o Hortonworks HDP 2.3.2

Resolved Issues

o Filtergrams: if you have more than one filter open for your dataset, the remove row operation now works when one of the filters is "inverted".
o Project automation setup: the Cancel button for "Import Dataset Set it up Now?" now correctly cancels the function for a dataset that cannot be automated.
o The user interface treats all numeric values as double precision (64-bit) IEEE 754 values. Integer values that exceed this supported range are no longer rounded in the user interface.
o A computed column expression with a negative number in the LEFT or RIGHT argument now returns a syntax error notifying the user that negative numbers are not permitted. Previously, an "unexpected error" was issued and the Project's dataset did not refresh.
o When performing a split column operation on a Lookup Step, the "min" and "max" column values now correctly display.
o When working on a shared Project in which one user does not have permission to a dataset being used by the Project, an error message now displays in the Steps editor panel to notify the user of the permission issue.
o JDBC imports, particularly from Hive, are now significantly faster.
o Newly provisioned users in newly provisioned LDAP groups no longer see an error on initial login to the application.
o The keytab file used for connecting users to a Hadoop cluster can now successively log in to the cluster to obtain new Kerberos authentication tickets.

Known Issues

Issue: Rapid actions on the Steps Editor panel trigger an error and loss of local changes
Description: When performing rapid actions on Steps in a Project, for example quickly muting Steps, an error message is issued and the local changes cannot be saved.

Issue: Erroneous error message when deleting datafiles for projects that no longer exist in Cisco Data Prep
Description: If you delete a Cisco Data Prep project and then later delete the datafile(s) associated with that project, you receive a message indicating the datafile(s) are currently in use even though you've deleted the project.

Issue: Arrays not properly published
Description: When publishing the results of an Array aggregation from a shape step, the Array column is not properly published. To publish an Array column, the column must be converted to text after the shape step.

Issue: Regular expression syntax with backslash requires an additional backslash
Description: If you need to use a backslash (\) in a regular expression, you must add a second backslash (\\) so that one escapes the other. For example, the RegEx \d is entered as: \\d

Issue: Numeric columns not properly cast after a transpose operation
Description: After a transpose operation on a column with numeric values, the column type is set to text. Before mathematical operations can be conducted on this column, the column must be transformed back to a numeric type.
[Figure: column remains marked as text data type; transform to numeric]

The pipeline installation package has 4 different versions, each supporting a specific distribution. Run cisco-data-prep-pipeline on the correct distribution.
o cisco-data-prep-pipeline-cdh5.4.0-2.9.2-1.noarch.rpm ==> compatible with Cloudera CDH 5.4.x distribution
o cisco-data-prep-pipeline-cdh5.5.0-2.9.2-1.noarch.rpm ==> compatible with Cloudera CDH 5.5.x distribution
o cisco-data-prep-pipeline-db1.4.1-2.9.2-1.noarch.rpm ==> compatible with Databricks Spark 1.4.1 distribution
o cisco-data-prep-pipeline-db1.5.1-2.9.2-1.noarch.rpm ==> compatible with Databricks Spark 1.5.1 distribution

For users upgrading pipeline to 1.2, first back up the pipeline/config files, then remove the existing cisco-data-prep-pipeline, and then install the new cisco-data-prep-pipeline rpm file.
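The doubled-backslash known issue listed above follows a pattern familiar from string literals in many languages: one layer consumes a backslash before the regular expression engine sees the pattern. The Python sketch below is only an analogy. It does not model the application's expression parser, and the names are hypothetical.

```python
import re

# Written with two backslashes in source, as the known issue requires;
# after one layer of escaping the stored pattern is the three characters \d+
typed_pattern = "\\d+"


def find_digit_runs(text: str) -> list:
    """Return every run of digits matched by the unescaped pattern."""
    return re.findall(typed_pattern, text)
```

In this analogy, `typed_pattern` holds a single backslash followed by `d+`, so `find_digit_runs("a12b345")` returns the digit runs "12" and "345".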