Smarter Document Capture

Smarter Document Capture
  This presentation will begin at 2:00 PM EDT (1 PM Central, 12 PM Mountain, 11 AM Pacific).
  Please check that the volume on your computer is on; this presentation runs through voice over IP.
  Until then, enjoy the sounds of silence.

AIIM Presents: Smarter Document Capture
  Peggy Winton, VP, AIIM Market Access
  Ari Gross, CEO, CVISION Technologies Inc.
  Ralph Gammon, editor, Document Imaging Report

About AIIM AIIM is the community focused on providing education, research, and best practices to help organizations find, control, and optimize their information for maximum value. Learn more about AIIM at www.aiim.org.

About AIIM
  We offer year-round programming in:
    Market Education
    Peer Networking
    Industry Advocacy & Research
    Professional Development & Training

Smarter Document Capture Ari Gross CEO CVISION Technologies Inc.

Smart Captured Documents
  Web-optimization: on-demand access
  Recognition: OCR, ICR, bar codes, form coding
  PDF/A: reproducibility, long-term archiving
  Compression: image files at electronic file sizes
  Metadata: embedded field info, database independence
  Color imaging: improved appearance and recognition

Compression
  Significant progress in compression technology
  Scanned files can be compressed as small as the original generated files
  Amenable to web hosting, email, and backups
  Print on demand

  File type         Size
  Word Document        921 KB
  Scanned TIFF      13,124 KB
  Standard PDF      13,058 KB
  Compressed PDF       870 KB
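The size figures on this slide can be turned into compression ratios with a few lines of arithmetic. The sketch below uses only the four sizes quoted on the slide; the dictionary and the helper name `ratio_to` are illustrative, not part of any product.

```python
# File sizes in KB, exactly as quoted on the slide.
sizes_kb = {
    "Word Document": 921,
    "Scanned TIFF": 13_124,
    "Standard PDF": 13_058,
    "Compressed PDF": 870,
}


def ratio_to(baseline: str, target: str) -> float:
    """Return how many times larger `baseline` is than `target`."""
    return sizes_kb[baseline] / sizes_kb[target]


for name in ("Scanned TIFF", "Standard PDF", "Word Document"):
    print(f"{name}: {ratio_to(name, 'Compressed PDF'):.2f}x the compressed PDF")
```

On these numbers the compressed PDF is roughly 15 times smaller than the scanned TIFF and slightly smaller than the original Word file.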

Recognition
  OCR (Optical Character Recognition): recognizes printed text
  ICR (Intelligent Character Recognition): recognizes handwritten text
  Barcode: identifies and reads barcodes
  Form recognition: identifies the form type and extracts the relevant database fields

Metadata
  Metadata insertion supports document portability, i.e., platform independence
  Makes documents self-aware, e.g., dead documents can be re-attached
  Consistent with ARMA & NARA recommendations
  Useful for encoding important document information, e.g., database field data, retention policy
  Automated insertion into the document management system

Recognition (OCR, ICR): Advantage Color
  [Chart: Color vs. B&W Recognition Rates. Number of words recognized per invoice across 36 invoices; green = color invoices, blue = bitonal (B&W) invoices.]
  Words recognized, color invoices: 4,390
  Words recognized, B&W invoices: 2,824
  55% improvement
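The 55% figure follows directly from the two word counts on the slide. A minimal check, using only those two totals (the variable names are illustrative):

```python
# Total words recognized across the 36 test invoices, as quoted on the slide.
color_words = 4390
bw_words = 2824

# Relative improvement of color over bitonal (B&W) recognition.
improvement = (color_words - bw_words) / bw_words
print(f"Color recognized {improvement:.1%} more words than B&W")
```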

Elements of Smart Captured Documents
  Smart captured documents increase the functionality of image files
  Smart documents include support for web-optimization, OCR, reproducibility, metadata, and auto-indexing
  PDF supports smart documents
  PDF/A is a restricted version of PDF (1.4), especially suited for document reproducibility and archiving
  Smart captured documents result in improved corporate ROI
  Smart captured documents are very compelling for Web-based, distributed database, and email applications

Smarter Document Capture
  Exploring next-generation document images
  Ralph Gammon, editor, Document Imaging Report

Traditional Document Images
  Captured in centralized environments with high-speed scanners
  Black-and-white, TIFF, Group 4 compressed
  Metadata managed through document management systems
  Not considered a long-term archiving format

Who am I?
  Editor of the Document Imaging Report since 1998
  Premier source of news and analysis in the document capture and imaging market
  Accepts no advertising in the print publication
  Paid subscription
  www.documentimagingreport.com
  Publisher: RMG Enterprises

The Potential of Document Imaging
  Color scanners now available for the same price as black-and-white
  Distributed capture infrastructure in place
  Advanced compression methods increase usability of color images
  PDF/A approved as an ISO standard

PDF: A better file format?
  Stands for Portable Document Format
  Introduced by Adobe in 1993
  According to Adobe, more than 500 million free PDF readers have been downloaded
  In 2007, Adobe submitted the PDF specs to ISO for ratification as an international standard
  PDF/A (archiving) approved in 2005

PDF: A Versatile Document Format
  Supports both imaged and electronically generated files
  Supports color and bi-tonal compression: Group 4, JBIG2, JPEG, JPEG 2000
  Supports image segmentation and layering
  Self-describing image format

PDF: A self-defining image format
  Provides a structured container for carrying important document information
    Full-text OCR results for searchability
    Metadata such as when the document was created, who the author is, when it was scanned, what type of document it is, etc.
  Historically, this information has been kept in a database separate from the image
  If you change image management systems, this information needs to be transitioned
  Metadata kept in a separate database is not portable

Capturing Metadata
  Several options for capturing metadata
    Key entry
    Bar codes
    OCR/ICR/IDR
  Metadata entry increasingly automated
    Improvements in OCR/ICR
    Voting
    Database look-ups
    Better image quality
    Introduction of intelligent document recognition (IDR)
  More metadata means more options
    Data mining
    Records management
    Automated workflows
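"Voting" here refers to combining the output of several recognition engines and keeping the reading most of them agree on. The slide gives no details, so the following is only a minimal sketch under strong assumptions: the function name `vote` is hypothetical, the readings are assumed to be already aligned and of equal length, and each engine counts equally. Production voting systems typically also align the strings and weight engines by confidence.

```python
from collections import Counter
from typing import List


def vote(readings: List[str]) -> str:
    """Majority vote per character position across aligned readings."""
    if not readings:
        return ""
    length = len(readings[0])
    if any(len(r) != length for r in readings):
        raise ValueError("this sketch assumes readings are already aligned")
    result = []
    for chars in zip(*readings):
        # most_common(1) picks the most frequent character; on a tie the
        # character seen first (the earliest engine listed) wins.
        result.append(Counter(chars).most_common(1)[0][0])
    return "".join(result)


# Three hypothetical engine outputs for the same field; two engines each
# confuse the letter O and the digit 0 in a different position.
engines = ["INV0ICE 1042", "INVOICE 1O42", "INVOICE 1042"]
print(vote(engines))
```

Because each error is made by only one of the three engines, the vote recovers the correct field value.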

PDF Compression: Why Smaller is Better
  Cost of storage is falling, but it can still be significant when talking about millions of document images
  When viewing on the Web, smaller files mean faster downloads and a better user experience
  In distributed scanning environments, smaller files are simpler to move around
  JBIG2 compression can create bi-tonal PDF files similar in size to the original electronically created files
  PDF color compression techniques can create file sizes up to 100 times smaller (and higher quality) than their JPEG counterparts

PDFs for Web Viewing
  PDF viewer is universal
  PDFs can be optimized for Web viewing
  This is helped by advanced compression that creates smaller files to download
  Also supports multi-page files and use of intelligent downloads

Why color images?

Why Color Document Images?
  Truer representation of the original
  Contains more information
  Adoption of color printing
  Better for Web viewing
  Improved recognition rates

Why not color scanning?
  Color file sizes can be very large
  A typical 300 dpi scanned color page represents 24 MB of raw data
  Even a JPEG-compressed document image can be more than 10 times the size of its bi-tonal counterpart (400 KB for color vs. 40 KB for bi-tonal)
  JPEG not optimized for image viewing
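The 24 MB figure can be reproduced with simple arithmetic. The slide does not state the page size or bit depth, so the sketch below assumes a US Letter page (8.5 x 11 inches) and 24-bit color (3 bytes per pixel); both are assumptions.

```python
# Assumptions (not stated on the slide): US Letter page, 24-bit RGB.
DPI = 300
WIDTH_IN, HEIGHT_IN = 8.5, 11.0
BYTES_PER_PIXEL = 3

width_px = round(WIDTH_IN * DPI)    # 2550 pixels
height_px = round(HEIGHT_IN * DPI)  # 3300 pixels
raw_bytes = width_px * height_px * BYTES_PER_PIXEL
raw_mib = raw_bytes / 1024**2

print(f"{width_px} x {height_px} px, {raw_bytes:,} bytes, about {raw_mib:.1f} MB raw")

# Size ratio of the JPEG color page to the bi-tonal page quoted on the slide.
jpeg_color_kb, bitonal_kb = 400, 40
print(f"JPEG color is {jpeg_color_kb / bitonal_kb:.0f}x the bi-tonal size")
```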

How advanced color compression works
  PDF supports MRC (Mixed Raster Content)
  Enables segmenting of a document into layers
  Once segmented, those layers can be compressed separately
  Enables optimal compression of each layer and file sizes 10 to 100 times smaller
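The layering idea can be illustrated with a toy example. This is a sketch only: it works on a tiny grayscale grid, uses a fixed threshold to find "text" pixels, and the function names `segment` and `recompose` are hypothetical. Real MRC encoders use far more sophisticated segmentation and then compress each layer with a codec suited to it (for example a bi-tonal codec for the mask and a continuous-tone codec for the background).

```python
from typing import List, Tuple

Image = List[List[int]]   # grayscale rows, 0 = black, 255 = white
Mask = List[List[bool]]   # True where a pixel is treated as text


def segment(page: Image, threshold: int = 128) -> Tuple[Mask, Image, Image]:
    """Split a page into (mask, foreground, background) layers."""
    mask = [[px < threshold for px in row] for row in page]
    # Foreground keeps the values of text pixels; other positions are unused (0).
    foreground = [[px if m else 0 for px, m in zip(row, mrow)]
                  for row, mrow in zip(page, mask)]
    # Background keeps non-text pixels; text positions are filled with white
    # so the layer is smooth and compresses well.
    background = [[255 if m else px for px, m in zip(row, mrow)]
                  for row, mrow in zip(page, mask)]
    return mask, foreground, background


def recompose(mask: Mask, foreground: Image, background: Image) -> Image:
    """Rebuild the page: the mask selects foreground or background per pixel."""
    return [[f if m else b for m, f, b in zip(mrow, frow, brow)]
            for mrow, frow, brow in zip(mask, foreground, background)]


page = [
    [250, 250, 20, 250],
    [245, 30, 25, 240],
    [250, 250, 250, 250],
]
mask, fg, bg = segment(page)
print(recompose(mask, fg, bg) == page)
```

With uncompressed layers the page is reconstructed exactly; the size savings come from compressing each layer separately and, typically, storing the background at lower resolution.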

Lossy vs. Lossless
  Lossless can be a misnomer, as any color document captured in black-and-white is losing information
  Perceptually lossless images are those that, when viewed from a certain distance, appear identical to human observers
  While compression formats like JBIG2 and JPEG are not technically lossless, they can be classified as perceptually lossless
  Best practices call for users to adjust their advanced compression settings until they are satisfied that images for a certain type of document are perceptually lossless

PDF/A: long-term archiving format
  Designed so that a PDF/A file created today will be able to be decoded by a PDF/A reader in perpetuity
  Internally contains all resources necessary to be rendered
  Contains provisions for metadata
  Approved as an ISO standard in 2005
  Has yet to gain widespread adoption, but we are starting to see some initiatives at the international and state government level
  Applicable across electronic files and images
  Applications available for testing validity of PDF/A files
  PDF/A Competence Center dedicated to developing best practices around PDF/A adoption (www.pdfa.org)

Levels of image enhancement
  Basic: deskewing, despeckling, auto-crop, blank-page removal, analog color dropout
  More advanced: line removal, electronic color dropout, auto-rotation based on text, multi-streaming
  Most advanced: grayscale thresholding, color segmentation

Grayscale can be as good as color

Example of Grayscale Thresholding
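Grayscale thresholding converts a grayscale scan into a bi-tonal image by choosing a cutoff between "ink" and "paper". The slide's example image is not available, and the presentation does not say which method scanners use. The sketch below therefore assumes Otsu's method, a widely used global technique that picks the cutoff maximizing the separation between the dark and light pixel groups; the function names are illustrative.

```python
from typing import List


def otsu_threshold(pixels: List[int]) -> int:
    """Return the cutoff t (0-255) that maximizes between-class variance.

    Pixels <= t are treated as dark (ink), pixels > t as light (paper).
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels in the dark class so far
    sum0 = 0  # sum of intensities in the dark class so far
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue  # no dark pixels yet
        if w1 == 0:
            break     # every pixel is already in the dark class
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (mean0 - mean1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(pixels: List[int], t: int) -> List[int]:
    """Map each pixel to 0 (black) or 255 (white) using cutoff t."""
    return [0 if p <= t else 255 for p in pixels]


# A tiny hypothetical scan line: four dark ink pixels, four light paper pixels.
scan = [30, 35, 40, 45, 200, 210, 220, 230]
t = otsu_threshold(scan)
print(t, binarize(scan, t))
```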

Summary
  PDF represents a more versatile file format than TIFF or JPEG
  PDF represents a smarter, self-contained file format
  PDF/A represents an ISO certified long-term image file format
  Technological advances in the following areas have combined to make PDF a more attractive imaging format:
    Compression
    Display
    Metadata capture
    Scanning

Questions?
  On the bottom left-hand side of your screen, type your question in the white box and click the Submit Question button.

Upcoming Webinars
  April 23rd: Implement Your ECM Roadmap in 2008
  May 7th: Finding Content: The best information in the world is worthless if you can't access and use it.
  May 14th: Shop Smart: Critical Buying Decisions for Capture
  June 4th: Enterprise Report Management Can't be Overlooked
  June 18th: Get Rid of Your Paper! Or not.
  Register today at www.aiim.org/webinars