SOLUTION OVERVIEW DATA CATALOGS FOR DATA RATIONALIZATION
Intrductin Hw big f a prblem is data redundancy? If yu are like mst cmpanies, it is much bigger than yu wuld care t admit. Fr mst businesses data has becme like flies t flypaper. When yu cnsider that fr every prductin database, yu als tend t spawn a develpment, test, QA, staging, experimental and reprting database, it desn t take lng fr that single database t result in 6x mre in ttal vlume. That additinal vlume requires additinal database licenses, plus strage, as well as dat base administratrs t manage the entire thing. In fact, if we think abut the data prliferatin pyramid, the multiplicatin factr is almst 50x. And nce cst f strage began its steep descent in 2005, and clud prviders came alng and began t prvide lts f space at lw cst, rganizatins used that as an excuse t cntinue t hard data, even if they weren t actually using it. SOLUTION OVERVIEW DATA CATALOGS FOR DATA RATIONALIZATION WATERLINE DATA 2017 2
The Hidden Cst f Big Data In 2016, Veritas surveyed 10,000 rganizatins and 83% f IT decisin makers said their rganizatins were data harders. This is neither a surprise nr a bad thing because the cst f strage has becme relatively cheap. But while it is cheap, it still isn t free. Let s d the math! 83% f rganizatins are data harders In ne example, a cnsumer retail cmpany has ver 8 petabytes f data currently csting them abut $1 per gigabyte (this is a very cnservative number that des nt cnsider lts f factrs including database sftware csts, peple csts r strage management sftware), r $8 millin t stre. They als estimate that apprximately 30% f that data (2.4 petabytes) is redundant and can be eliminated, resulting in savings f $2.4 millin. S, what is stpping them? They knw at a high level they have lts f duplicate data, but at a granular level they dn t knw exactly what infrmatin is the riginal surce, what is replicated and what can be eliminated. They have n data map tracking all their data, its freshness r its prvenance. The result is that they hard everything, keeping redundant legacy develpment and QA data arund fr ld prjects that have lng since been cancelled. They maintain data marts that were created fr a specific ne-time analysis, but then never delete them, all the time cntinuing t pay the $1 per GB fr duplicate data that is n lnger in use. And these strage csts dn t even cver all the ancillary csts f redundant dark data. Much f that dark data culd als pse security, cmpliance and ther crprate risks that are hard t calculate. Imagine if sme f that data vilated the upcming EU Gverning Data Prtectin Regulatin (GDPR), als knwn as the right fr a cnsumer t be frgtten. That regulatin carries a fine up t 4% f wrldwide revenues. The implicatin is that redundant, dark-data can n lnger be left sitting arund. The physical cst and the ptential risk is becming t high. SOLUTION OVERVIEW DATA CATALOGS FOR DATA RATIONALIZATION WATERLINE DATA 2017 3
Using a Data Catalg t Ratinalize Data The reslutin t this issue is cnceptually simple. By implementing Waterline Smart Data Catalg, yur rganizatin can dcument the lcatin, quality and lineage f data lcated in relatinal surces, in the clud r in Hadp clusters. A smart data catalg autmatically tags the data fields acrss different data sets with cmmn business terms, dcuments its verall quality, existing data lineage and inferred missing lineage. During this prcess, it als identifies where yu have high levels f redundant data that can be ratinalized. the catalg identifies high levels f duplicate data fr ratinalizatin SOLUTION OVERVIEW DATA CATALOGS FOR DATA RATIONALIZATION WATERLINE DATA 2017 4
Hw Des It Wrk? The Waterline Data Catalg is built n the premise that there are three parts t the data catalging prcess. The first is t help data prfessinals discver, rganize and curate the data; the secnd is t allw gvernance prfessinals leverage tagged data fr cmpliance reprting and access cntrl and the third is t expse the newly rganized data fr business prfessinals t use. Discver Organize Curate Cmpliance Reprting Access Cntrl Search Rate & Cllabratin DATA PROFESSIONALS GOVERANCE PROFESSIONALS BUSINESS PROFESSIONALS Discver: Autmatically and incrementally fingerprint data and infer data lineage at scale by analyzing actual data values Organize: Uses machine learning t autmatically tag and match data fingerprints t glssary terms Curate: Human reviewers accept r reject tags, machine learning fine tunes the tagging prcess and imprves the matching algrithm Cmpliance: Map yur cmpliance plicies t yur data assets: acceptable use, legal hlds, expiry. Reprting: Simplify nging mandated reprting as the catalg uncvers dark data and catalgs it dynamically Access cntrl: Autmate data access cntrl via tag based security Search: Search fr data thrugh the Waterline GUI r thrugh integratin via 3rd party applicatins Rate & Cllabrate: Users cllabrate t create subjective crwdsurced ratings/reviews which cmbined with bjective prfiling metadata prvides users with a view int data quality and usefulness. SOLUTION OVERVIEW DATA CATALOGS FOR DATA RATIONALIZATION WATERLINE DATA 2017 5
Benefits fr Data Ratinalizatin Autmating the end t end data catalging prcess with Waterline Smart Data Catalg significantly reduces the excess cst f redundant data: Ratinalizes excess data, reducing the cst ft Redundant database sftware licensing & supprt Excess server & strage cst Excess data center perating expenses Reduces business risk due t: Data stred fr gvernment cmpliances rules and regulatins Database perfrmance degradatin Questinable data quality Cnclusin Data ratinalizatin presents a large cst savings pprtunity fr mst IT rganizatins. By autmating the underlying prcess fr inventrying, tagging and curating data and data lineage, redundant data can be identified, ratinalized and the savings can be easily harvested. The bttm line is that data redundancy is a much larger hidden cst than mst rganizatins realize. The flip side hwever is that this is a cst that rganizatins are already bearing. This means it is als a surce t mine fr savings and budget. By eliminating the excess strage, database and assciated csts f data redundancy, there is a significant vein f gld t mine in cst savings that can be used t fund ther data related prjects. waterlinedata.cm Sales Technical Supprt Crprate Headquarters sales@waterlinedata.cm Visit the Supprt Center 201 San Antni Circle Suite 260 help@waterlinedata.cm Muntain View CA 94040 SOLUTION OVERVIEW DATA CATALOGS FOR DATA RATIONALIZATION WATERLINE DATA 2017 (650) 946-2104 6