Challenges Finding PDFs in SharePoint or Office 365

Ensure Your Documents are Fully Text Searchable with Searchlight

Why Can't I Find That PDF?

So you have just spent half an hour searching for an important document that you know was stored in SharePoint. Or maybe your colleague asked you to find a contract in O365, but you just cannot find it? Yep, we've been there, and so have countless others.

There are estimated to be trillions of PDF files currently in existence, and many of them are important documents that reside in SharePoint collections. Worryingly, we estimate that in a typical organization, some 20% of PDF documents cannot be located by SharePoint text search for a variety of reasons. Many types of documents are not searchable without special processing. For example:

- Scanned TIFF Files
- Image PDF Files
- Faxes

As well as being pretty annoying, if you cannot identify these unsearchable documents, you cannot take corrective action. This ebook will share the most common reasons why you "can't find that PDF" in SharePoint or O365 whilst also showing you how you can.

1. Some PDFs are Image-only

PDFs that originated as scanned documents, faxes or other images will be Image-Only and not contain any text for the SharePoint indexer to index unless they have been through an OCR process and a text layer added to the image. To check whether a particular PDF is Image-Only you can try to select and copy what appears to be text, or try searching for text; if you cannot do this then you are looking at an image PDF.

2. Partially Image-only PDFs

To make things more complex, some PDFs may be partially Image-Only, i.e. they have non-searchable sections that are purely images alongside some text areas.

3. Password-protected PDFs

Surprisingly, password-protected PDFs often make their way into SharePoint. As the indexer cannot open the document to extract the text, it isn't possible for the contents to be added to the search index.

4. Size Limits

Be wary of documents that run into many hundreds of pages: SharePoint indexing does have limits. Our tests on O365 showed that O365 will index less than 2MB of text. In our test case this corresponded to around 400 pages of text.

5. Vector Images

Some PDFs, such as the one shown, may appear to contain text, but in fact the "text" is rendered by drawing lines, so the document actually contains no searchable text. This is common in architectural diagrams.
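The size limit in item 4 can be turned into a rough check. The sketch below is illustrative only: it takes the two figures quoted above (about 2MB of indexed text, about 400 pages in the test case) at face value, and it assumes the text of each page has already been extracted by some PDF library of your choice. The names INDEX_LIMIT_BYTES, bytes_per_page and likely_truncated are hypothetical and are not part of SharePoint or Searchlight.

```python
from typing import Sequence

# Figures quoted in the text above; treat them as approximate.
INDEX_LIMIT_BYTES = 2 * 1024 * 1024   # "less than 2MB of text"
TEST_CASE_PAGES = 400                 # "around 400 pages of text"


def bytes_per_page() -> float:
    """Average text volume per page implied by the quoted test case."""
    return INDEX_LIMIT_BYTES / TEST_CASE_PAGES


def likely_truncated(page_texts: Sequence[str]) -> bool:
    """Return True if the extracted text reaches the assumed index limit.

    page_texts holds the already-extracted text of each page (an assumption:
    extraction itself is outside the scope of this sketch).
    """
    total = sum(len(text.encode("utf-8")) for text in page_texts)
    return total >= INDEX_LIMIT_BYTES


if __name__ == "__main__":
    print(f"Implied average: about {bytes_per_page():.0f} bytes of text per page")
    print(likely_truncated(["x" * 6000] * 400))
```

A document flagged this way is not necessarily unsearchable; under the limit described above, only the text beyond the limit would be at risk of being left out of the index.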

The Business Costs

Now that you have a clearer idea of why you can't find that PDF, it is also good to understand the cost of having unsearchable documents. These costs are often not realised until they have already caused a massive problem, and they lead to a number of worrying legal, decision-making and employee impacts. We have outlined the main ones our customers are faced with; which ones could apply to you?

Legal Impact

Compliance audits, freedom of information requests, and legal discovery mandates require organisations to recover all of the relevant electronically stored information, and that information is often required at short notice. Can you be sure that you can retrieve all of the relevant documents in time, and would you even know whether you had retrieved them all? Could there be vital documents that are not searchable and thus cannot be found? Is it a risk you are willing to take?

Decision Making Impact

Business decisions are a daily occurrence; some are small, but some have more vital implications for company operations. The majority of the more important decisions will need to be thoroughly researched and backed up by documentation, usually stored in SharePoint or O365. If you had not seen that document about X when searching about the X case and made a decision, was this a fully informed decision? This is a massive risk with huge implications.

Employee Time and Cost

You have already spent half an hour looking for that PDF, but what about your 400 colleagues in your building? How long have they spent? Maybe longer. Some may even have had to spend time recreating documents because they cannot find the one they were looking for. This presents a massive opportunity cost in your time and theirs, not to mention the financial cost to the business.

The Solution

Good news. There is a solution that will provide both corrective and preventative action for these business issues. Short of manually opening these PDFs one by one and reading them, it is virtually impossible to determine which documents are fully searchable without an automated tool. To make these documents text searchable, they need to be transformed into a format that can be searched and indexed by the SharePoint crawler. This is where Searchlight comes in. Searchlight is able to audit SharePoint document stores, identify image-only PDFs and turn them into searchable PDFs using Optical Character Recognition (OCR), thus allowing the SharePoint crawler to index them.

Step 1: Audit

Before it is possible to make a document library searchable, it is necessary to identify the unsearchable PDFs. Searchlight will perform an Audit on the document library in order to determine which documents are candidates for processing by examining each document's searchability status and the document library's processing settings. Searchlight identifies how many of your documents are:

- Non-Searchable (scans, faxes, TIFFs and image PDFs)
- Partially Searchable
- Fully Searchable
- Non-searchable due to file errors

The searchability status determines the processing method, according to the conversion rules. Each of the reasons mentioned earlier for not being able to find a PDF has a different conversion rule, so the processing method for a partially searchable document differs from that for a document with file errors.
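To make the four audit categories concrete, here is a minimal sketch of how such a classification could work. It is not Searchlight's implementation: it assumes a page-level heuristic (a page counts as searchable if any text can be extracted from it), and it assumes the per-page text has already been extracted by a PDF library, with None standing in for a file that could not be opened. The names Status, classify and audit are hypothetical.

```python
from collections import Counter
from enum import Enum
from typing import Dict, Optional, Sequence


class Status(Enum):
    NON_SEARCHABLE = "Non-Searchable"
    PARTIALLY_SEARCHABLE = "Partially Searchable"
    FULLY_SEARCHABLE = "Fully Searchable"
    FILE_ERROR = "Non-searchable due to file errors"


def classify(page_texts: Optional[Sequence[str]]) -> Status:
    """Classify one document from its per-page extracted text.

    None represents a file that could not be opened (assumption of this sketch).
    """
    if page_texts is None:
        return Status.FILE_ERROR
    pages_with_text = sum(1 for text in page_texts if text.strip())
    if pages_with_text == 0:
        return Status.NON_SEARCHABLE
    if pages_with_text < len(page_texts):
        return Status.PARTIALLY_SEARCHABLE
    return Status.FULLY_SEARCHABLE


def audit(library: Dict[str, Optional[Sequence[str]]]) -> Counter:
    """Count how many documents in a library fall into each category."""
    return Counter(classify(pages) for pages in library.values())


if __name__ == "__main__":
    sample = {
        "scan.pdf": ["", ""],             # no text on any page
        "mixed.pdf": ["Contract", ""],    # text on some pages only
        "report.pdf": ["Intro", "Body"],  # text on every page
        "broken.pdf": None,               # could not be opened
    }
    for status, count in audit(sample).items():
        print(f"{status.value}: {count}")
```

A page-level check like this would miss a page that mixes a little real text with a large scanned region, and it cannot detect the vector-drawn "text" described earlier, so a real audit tool needs finer-grained analysis.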

Step 2: Make Searchable

Once the document library has been audited and the unsearchable documents have been identified, Searchlight's Optical Character Recognition (OCR) technology will create a text version of the file contents. This allows a searchable PDF to be created by merging the original page images with a hidden text layer.

Step 3: Monitor

Unsearchable documents will continually be added to your SharePoint or O365, meaning that there is no "one time" solution. Therefore, Searchlight ensures that document stores are automatically monitored to deal with new and updated documents. The Searchlight service controls the execution of all job runs. It is used by the scheduler and enables the monitoring and processing of document libraries at regular time intervals without interfering with other work being performed on the machine on which it is installed.

For More Information About Aquaforest Searchlight

Please visit aquaforest.com or contact Neil Pitman by email at neil.pitman@aquaforest.com.

About Aquaforest

Aquaforest was established in 2001 to provide High Performance PDF, OCR and SharePoint products to a world-wide market. We are experts in Searchable PDFs. Thousands of organizations rely on Aquaforest solutions as part of their document workflow processes. As a company, we are passionate about what we do and about the software and solutions that we provide. Our teams are dedicated to delivering high quality products backed up by outstanding support and customer service. Please visit www.aquaforest.com for further information about our products and services.

Over 2,000 Organizations Rely on Aquaforest Software