ConceptualModelingforETLProcesses

Size: px
Start display at page:

Download "ConceptualModelingforETLProcesses"

Transcription

1 ConceptualModelingforETLProcesses PanosVassiliadis,AlkisSimitsis,SpirosSkiadopoulos 1 1 NationalTechnicalUniversityofAthens,Dept.ofElectricalandComputerEng., ComputerScienceDivision,IroonPolytechniou9,15773,Athens,Greece {pvassil,asimi,spiros}@dbnet.ece.ntua.gr 1. INTRODUCTION RELATEDWORKANDSTATEOFTHEART CONCEPTUALMODEL CONCEPTSANDATTRIBUTES TRANSFORMATIONS,CONSTRAINTSANDNOTES PART-OFANDCANDIDATERELATIONSHIPS PROVIDERRELATIONSHIPSANDSERIALCOMPOSITIONOFTRANSFORMATIONS METHODOLOGYFORTHEUSAGEOFTHECONCEPTUALMODEL INSTANTIATIONANDSPECIALIZATIONLAYERS CONCLUSIONS...23 ACKNOWLEDGMENTS...24 REFERENCES...24 p.1of25

2 ConceptualModelingforETLProcesses PanosVassiliadis,AlkisSimitsis,SpirosSkiadopoulos 1 1 NationalTechnicalUniversityofAthens,Dept.ofElectricalandComputerEng., ComputerScienceDivision,IroonPolytechniou9,15773,Athens,Greece {pvassil,asimi,spiros}@dbnet.ece.ntua.gr Abstract.Extraction-Transformation-Loading(ETL)toolsarepiecesofsoftwareresponsiblefor theextractionofdatafromseveralsources,theircleansing,customizationandinsertionintoadata warehouse. Research has only recently dealt with the above problem and has not established commonly accepted models, formalisms and techniques for tasks like the job definition, scheduling,monitoringanderrorhandlinginanetlenvironment.inthispaper,wefocusonthe problemofthedefinitionofetlactivitiesandprovideformalfoundationsfortheirrepresentation. The proposed model is (a) customized for the tracing of inter-attribute relationships and the respectiveetlactivitiesintheearlystagesof adata warehouseproject;(b)enrichedwitha 'palette'ofasetoffrequentlyusedetlactivities,liketheassignmentofsurrogatekeys,thecheck fornullvalues,etc;and(c)constructedinacustomizableandextensible manner,sothatthe designercanenrichitwithhisownre-occurringpatternsforetlactivities. 1. Introduction Extraction-Transformation-Loading(ETL)toolsisacategoryofspecializedtoolswiththe taskofdealingwithdatawarehousehomogeneity,cleaningandloadingproblems.[shty98] reportsthatetlanddatacleaningtoolsareestimatedtocostatleastonethirdofeffortand expensesinthebudgetofthedatawarehousewhile[dema97]mentionsthatthisnumbercan riseupto80%ofthedevelopmenttimeinadatawarehouseproject.[inmo97]mentionsthat theetlprocesscosts55%ofthetotalcostsofdatawarehouseruntime.still,duetothe complexityandthelonglearningcurveofthesetools,manyorganizationsprefertoturntoinhousedevelopmenttoperformetlanddatacleaningtasks.infact,whiledatawarehouse expensesareexpectedtocomeupto14billiondollarsworldwide,projectedsalesforetl anddatacleaningtoolsareexpectedtorisetoonly(!)300milliondollars.thus,itisapparent thatthedesign,developmentanddeploymentofetlprocesses,whichiscurrentlyperformed p.2of25

3 in an ad-hoc, in house fashion, needs modeling, design and methodological foundations. Unfortunately,asweshallshowinthesequel,theresearchcommunityhasalotofworktodo toconfrontthisshortcoming.intherestofthepaper,wewillnotdiscriminatebetweenthe tasksofetlanddatacleaningandadoptthenameetlforboththesekindsofactivities. InFig.1.1weabstractlydescribethegeneralframeworkforETLprocesses.Inthebottom layerwedepictthedatastoresthatareinvolvedintheoverallprocess.ontheleftside,wecan observetheoriginaldataproviders.typically,dataprovidersarerelationaldatabasesand files.thedatafromthesesourcesareextracted(asshownintheupperleftpartoffig.1.1)by extraction routines, which provide either complete snapshots or differentials of the data sources.then,thesedataarepropagatedtothedatastagingarea(dsa)wheretheyare transformedandcleanedbeforebeingloadedtothedatawarehouse.thedatawarehouseis depictedintherightpartofthedatastorelayerandcomprisesthetargetdatastores,i.e.,(a) facttablesforthestorageofinformationand(b)dimensiontableswiththedescriptionofthe multidimensional,roll-uphierarchiesofthestoredfacts.eventually,theloadingofthecentral warehouseisperformedthroughtheloadingactivitiesdepictedontheupperrightpartofthe figure. Extract Transform &Clean Load Sources DSA DW Fig.1.1TheenvironmentofExtract-Transform-Loadprocesses In [KRRT98] one can find a detailed description of the different aspects of the ETL process,including:thejobdefinitionoftheprocess;time/eventbasedscheduling;monitoring (i.e., on-line information about the progress/status of the process); logging (i.e., off-line information, presented at the end of the execution); error handling (for incoming data p.3of25

4 violatingconsistencyconstraintsorbusinessrules);crashrecoveryandstop/startcapabilities (e.g.,committingatransactioneveryfewrows)andmiscellaneousotherfeatures. Inthispaper,wefocusontheconceptualpartofthedefinitionoftheETLprocess.More specifically,wearedealingwiththeearlieststagesofthedatawarehousedesign.duringthis period, the data warehouse designer is concerned with two tasks which are practically executedinparallel.thefirstofthesetasksinvolvesthecollectionofrequirementsfromthe partoftheusers.thesecondtask,whichisofequalimportanceforthesuccessofthedata warehousingproject,involvestheanalysisofthestructureandcontentoftheexistingdata sources and their intentional mapping to the common data warehouse model. Related literature[krrt98]andpersonalexperience[vass00]suggestthatthedesignofanetl processaimstowardstheproductionofacrucialdeliverable:themappingoftheattributesof thedatasourcestotheattributesofthedatawarehousetables. Theproductionofthisdeliverableinvolvesseveralinterviewsthatresultintherevisionand redefinition of original assumptions and mappings; thus it is imperative that a simple conceptualmodelisemployedinorderto(a)facilitatethesmoothredefinitionandrevision effortsand(b)serveasthemeansofcommunicationwiththerestoftheinvolvedparties. Webelievethattheformalmodelingofthestartingconceptsofadatawarehousedesign processhasnotbeenadequatelydealtbytheresearchcommunity.tothisend,inthispaper we propose a conceptual model for this task, with a particular focus on (a) the interrelationshipsofattributesandconceptsand(b)thenecessarytransformationsthatneedto takeplaceduringtheloadingofthewarehouse.thelatterpartisdirectlycapturedinthe proposedmetamodelasafirstclasscitizen;weemploytransformationsasagenerictermfor therestructuringofschemaandvaluesorfortheselectionandeventhetransformationof data.attributeinterrelationshipsarecapturedthroughproviderrelationshipsthatmapdata provider attributes at the sources to data consumers in the warehouse. Apart from these fundamental relationships, the proposed model is able to capture constraints and transformationcomposition,too.duetothenatureofthedesignprocess,inthispaper,we presentthefeaturesoftheconceptualmodelinasetofdesignsteps,whichleadtothebasic p.4of25

5 target,i.e.,theattributeinterrelationships.thesestepsconstitutethemethodologyforthe designoftheconceptualpartoftheoveralletlprocess. Theproposedmodelischaracterizedbydifferentinstantiationandspecializationlayers. Thegenericmetamodelthatweproposeinvolvesasmallsetofgenericconstructsthatare powerful enough to capture all cases. We call these entities themetamodel layer in our architecture.moreover,weintroduceaspecializationmechanismthatallowstheconstruction ofa palette offrequentlyusedetlactivities(e.g.,transformationslikethesurrogatekey transformation or checks for null values, primary key violations, etc.). This set of ETLspecific constructs, constitute a subset of the larger metamodel layer, which we call the TemplateLayer.TheconstructsintheTemplatelayerarealsometa-classes,buttheyarequite customizedfortheregularcasesofetlprocesses.alltheentities(datastores,inter-attribute mappings,transformations,etc.)thatadesignerusesinhisparticularscenarioareinstancesof theentitiesinthemetamodellayer.forthecommonetltransformations,theemployed instancescorrespondtotheentitiesofthetemplatelayer. Morespecifically,ourcontributionliesin: 1.Theproposalofanovelconceptualmodelwhichiscustomizedforthetracingofinterattribute relationships and the respective ETL activities in the early stages of a data warehouseproject. 2.Theconstructionoftheproposedmodelinacustomizableandextensiblemanner,sothat thedesignercanenrichitwithhisownre-occurringpatternsforetlactivities. 3.The introduction of a palette of a set of frequently used ETL activities, like the assignmentofsurrogatekeys,thecheckfornullvalues,etc. This paper is organized as follows. In Section 2, we present related work. Section 3 presents the conceptual model for ETL processes. In Section 4, we demonstrate the methodology for the usage of the conceptual model. In Section 5, we introduce the instantiationandthespecializationlayersfortherepresentationofetlprocesses.finally,in Section6weconcludeourresults. p.5of25

6 2.RelatedWorkandStateoftheArt Inthissectionwewillreviewrelatedworkinthefieldsofconceptualmodelingfordata warehousing and ETL in general. For lack of space, we refer the interested reader to [VVS+01]foranextendeddiscussionoftheissuesthatwebrieflypresentinthissection. Conceptual models for data warehouses. The front end of the data warehouse has monopolizedtheresearchontheconceptualpartofdatawarehousemodeling.infact,mostof theworkonconceptualmodeling,inthefieldofdatawarehousing,hasbeendedicatedtothe capturing of the conceptual characteristics of the star schema of the warehouse and the subsequentdatamartsandaggregations(see[tsoi01]forabroaderdiscussion).research effortscanbegroupedinfourmajortrends,including:(a)dimensionalmodeling[kimb97, KRRT98], (b) (extensions of) standard E/R modeling [SBHD98, CDL+98, CDL+99, HuLV00, MoKo00, TrBC99] (c) UML modeling [NgTW00, TrPG00] and(d) sui-generis models [GoMR98, GoRi98, Tsoi01] without a clear winner. The supporters of the dimensional modeling method argue that the model is characterized by its minimality, understandability(especiallybytheend-users)anditsdirectmappingtologicalstructures. ThesupportersoftheE/RandUMLmethodsmodelsbasetheirargumentsonthepopularity oftherespectivemodelsandtheavailablesemanticfoundationsforthewell-formednessof datawarehouseconceptualschemata.thesui-generismodelsareempoweredbytheirnovelty andadaptationtotheparticularitiesoftheolapsetting. ConceptualmodelsforETL.Therearefewattemptsaroundthespecificproblemofthis paper, although we are not aware of any other approach that concretely deals with the specifics of ETL activities ina conceptual setting.we can mention [BaFM99] as afirst attempt to clearly separate the data warehouse refreshment process from its traditional treatment as a view maintenance or bulk loading process. Still, the proposed model is informalandthefocusisonprovingthecomplexityoftheeffort,ratherthantheformal modelingoftheactivitiesthemselves.[cdl+98,cdl+99]introducethenotionofintermodel assertions,inordertocapturethemappingsbetweenthesourcesandthedatawarehouse. However,anytransformationisdereferencedforthelogicalmodelwhereacoupleofgeneric operatorsareemployedtoperformthistask.intermsofindustrialapproaches,themodelthat stemsfrom[krrt98]wouldbeaninformaldocumentationoftheoveralletlprocess. p.6of25

7 RelatedworkonETLlogicalandphysicalaspects.Finally,apartfromthecommercial ETLtools[Arde01,Data01,Micr01,ETI01,Orac01]therealsoexistresearcheffortsincluding [GFSS00, RaHe01, Mong00, BoDS00, RaHa00, VQVJ01, VVS+01, VaSS02, LWGG00]. Themanagementofqualityindatawarehousesisdiscussedextensivelyin[JJQV99,JLVV00, JeQJ98]. Wewouldliketostressthat,inthispaper,wedonotproposeanotherprocess/workflow model;thus,wedonotintendtocoverthecompositeworkflowofetlactivitiesforthe populationofthewarehouse.therearetwobasicreasonsforthisapproach.first,inthe conceptual model for the ETL process, the focus is on documenting/formalizing the particularitiesofthedatasourceswithrespecttothedatawarehouseandnotinprovidinga technicalsolutionfortheimplementationoftheprocess.secondly,theetlconceptualmodel is constructed in the early stages of the data warehouse project during which, the time constraintsoftheprojectrequireaquickdocumentationoftheinvolveddatastoresandtheir relationships,ratherthananin-depthdescriptionofacompositeworkflow(seealsosection5 forthis).inthissense,ourapproachiscomplementarytotheaforementionedlogicalmodels, sinceitinvolvesanearlierstageofthedesignprocess.werefertheinterestedreaderto [VaSS02]foraformalmodelingofthisworkflow,whichisbeyondthescopeofthispaper. 3. ConceptualModel ThepurposeofthissectionistopresenttheconceptualmodelforETLactivities.Ourgoalis tospecifythehighlevel,user-orientedentitieswhichareusedtocapturethesemanticsofthe ETLprocess.First,wewillpresentthegraphicalnotationandthemetamodelofourproposal. Then,wewilldetailandformallydefinealltheentitiesofthemetamodel.Throughoutallthe section,wewillclarifytheintroducedconceptsthroughtheirapplicationtoamotivating examplethatwillsupportthediscussion. p.7of25

8 concept attribute transformation Note ETL_constraint activecanditate partof provider 1:1 provider N:M serial composition candidate 1... candidate n {XOR} target Fig.3.1NotationfortheconceptualmodelingofETLactivities InFig.3.1wegraphicallydepictthedifferententitiesoftheproposedmodel.Wedonot employstandardumlnotationforconceptsandattributes,forthesimplereasonthatweneed totreatattributesasfirstclasscitizensofourmodel.thus,wedonotembedattributesinthe definitionoftheirencompassingentity,likeforexample,aumlclassorarelationaltable. Wetrytobeorthogonaltotheconceptualmodelswhichareavailableforthemodelingofdata warehousestarschemata;infact,anyoftheproposalsforthedatawarehousefrontendcanbe combinedwithourapproach,whichisspecificallytailoredforthebackendofthewarehouse. Fig.3.2TheproposedmetamodelasaUMLdiagram InFig.3.2wedepictthebasicentitiesoftheproposedmetamodelinaUMLdiagram.All theconstructsofourconceptualmodelwhichareintroducedintherestofthissectionreferto theentitiesoffig.3.2. p.8of25

9 Database Concept Attributes S 1 S 1.PARTSUPP PKEY, SUPPKEY, QTY, COST S 2 S 2.PARTSUPP PKEY, SUPPKEY, DEPARTMENT, DATE, QTY, COST DW DW.PARTSUPP PKEY, SUPPKEY, DATE, QTY, COST Fig.3.3Theschemataofthesourcedatabasesandofthedatawarehouse TomotivatethediscussionwewillintroduceanexampleinvolvingtwosourcedatabasesS 1 ands 2 aswellasacentraldatawarehousedw.theavailableconceptsforthesedatabasesare listedinfig.3.3,alongwiththeirattributes.thescenarioinvolvesthepropagationofdata fromtheconceptpartsuppofsources 1 aswellasfromtheconceptpartsuppofsources 2 to the data warehouse. Table DW.PARTSUPP stores daily(date) information for the available quantity(qty)andcost(cost)ofparts(pkey)persupplier(suppkey).weassumethatthe firstsupplieriseuropeanandthesecondisamerican,thusthedatacomingfromthesecond sourceneedtobeconvertedtoeuropeanvaluesandformats. Fig.3.4Thediagramoftheconceptualmodelforourmotivatingexample InFig.3.4wedepictthefullfledgeddiagramofourmotivatingexample.Intherestofthis section,wewillexplaineachpartofit. p.9of25

10 3.1 ConceptsandAttributes Attributes.Agranularmoduleofinformation.Theroleofattributesisthesameasinthe standarder/dimensionalmodels.asinstandardermodeling,attributesaredepictedwith ovalshapes. Concepts.Aconceptrepresentsanentityinthesourcedatabasesorinthedatawarehouse. Conceptinstancesarethefilesinthesourcedatabases,thedatawarehousefactanddimension tablesandsoon.aconceptisformallydefinedbyanameandafinitesetofattributes.in termsoftheermodel,aconceptisageneralizationofentitiesandrelationships;depending ontheemployedmodel(dimensionalmodelorerextension)allentitiescomposedofasetof attributesaregenerallyinstancesofclassconcept. Asmentionedin[VQVJ01],wecantreatseveralphysicalstoragestructuresasfinitelistsof fields,includingrelationaldatabases,cobolorsimpleasciifiles,multidimensionalcubes and dimensions. Concepts are fully capable of modeling this kind of structures, possibly throughageneralization(isa)mechanism.letustakeolapstructuresasanexample.we shouldnotethattheinterdependenciesoflevelsandvalues,whicharethecoreofallthe approachesmentionedinsection2,arenotrelevantforthecaseofetl;thus,employing simply concepts is sufficient for the problem of ETL modeling. Still, we can refine the generic Concept structure to subclasses pertaining to the characteristics of any of the aforementioned approaches (e.g., classes Fact Table and Dimension), achieving thusa homogeneouswaytotreatolapandetlmodeling. Inourmotivatingexampleonecanobserveseveralconcepts.TheconceptsS1.PARTSUPP, S2.PARTSUPP and DW.PARTSUPP are depicted in Fig. 3.4., along with their respective attributes. 3.2 Transformations,ConstraintsandNotes Transformations.Inourframework,transformationsareabstractionsthatrepresentparts,or fullmodulesofcode,executingasingletask.twolargecategoriesoftransformationsinclude (a)filteringordatacleaningoperations,likethecheckforprimaryorforeignkeyviolations and (b) transformation operations, during which the schema of the incoming data is transformed(e.g.,aggregation).formally,atransformationisdefinedby(a)afinitesetof p.10of25

11 input attributes; (b) a finite set of output attributes and (c) a symbol that graphically characterizesthenatureofthetransformation.atransformationisgraphicallydepictedasa hexagontaggedwithitscorrespondingsymbol. InourmotivatingexampleofFig.3.4,onecanobserveseveraltransformations.Notethe onespertinenttothemappingofs1.partsupptodw.partsupp.onecanobserveasurrogate keyassignmenttransformation(sk),afunctionapplicationcalculatingthesystemdate(f)and a Not Null(NN) check for attributecost. We will elaborate more on the functionality of transformationsonceproviderrelationships(thatactuallyemploythem)areintroduced. ETLConstraints.Thereareseveraloccasionswhenthedesignerwantstoexpressthefact thatthedataofacertainconceptfulfillseveralrequirements.forexample,thedesignermight wishtoimposeaprimarykeyornullvalueconstraintovera(setof)attribute(s).thisis achievedthroughtheapplicationofetlconstraints,whichareformallydefinedasfollows: (a) a finite set of attributes, over which the constraint is imposed and (b) a single transformation,whichimplementstheenforcementoftheconstraint.note,thatdespitethe similarityinthename,etlconstraintsaredifferentmodelingelementsfromthewellknown UMLconstraints.AnETLconstraintisgraphicallydepictedasasetofsolidedgesstarting fromtheinvolvedattributesandtargetingthefacilitatortransformation.inourmotivating example, observe that we apply a Primary Key ETL constraint to DW.PARTSUPP for the attributespkey,suppkey,date. Notes.ExactlyasinUMLmodeling,notesareinformaltagstocaptureextracommentsthat thedesignerwishestomakeduringthedesignphaseorrenderumlconstraintsattachedtoan elementorsetofelements[4].asinuml,notesaredepictedasrectangleswithadog-eared corner.inourframework,notesareusedfor: Simplecommentsexplainingdesigndecisions. Explanationsofthesemanticsoftheappliedtransformations.Forexample,inthecaseof relationalselections/joinsthisinvolvesthespecificationoftherespectiveselection/join condition,whereasinthecaseoffunctionsthiswouldinvolvethespecificationofthe functionsignatures. p.11of25

12 TracingofruntimeconstraintsthatrangeoverdifferentaspectsoftheETLprocess,such asthetime/eventbasedscheduling,monitoring,logging,errorhandling,crashrecovery etc. Forexample,intheupperpartofFig.5wecanobservearuntimeconstraintspecifyingthat theoverallexecutiontimefortheloadingofdw.partsupp(thatinvolvestheloadingof S1.PARTSUPPandS2.PARTSUPP)cannottakelongerthan4hours. 3.3 Part-OfandCandidateRelationships Part-ofRelationships.Webringuppart-ofrelationshipsinordertoemphasizethefactthata conceptiscomposedofasetofattributes.ingeneral,standardermodelingdoesnottreat thiskindofrelationshipasafirst-classcitizenofthemodel;umlmodelingontheother hand, hides attributes inside classes and treats part-of relationships with a much broader meaning.wedonotwishtoredefineumlpart-ofrelationships,butrathertoemphasizethe relationshipofaconceptwithitsattributes(sinceweneedattributesasfirstclasscitizensin the inter-attribute mappings). Naturally, we do not preclude the usage of the part-of relationship for other purposes, as in standard UML modeling. As usually, a part-of relationshipisdenotedbyanedgewithasmalldiamondatthesideofthecontainerobject. Candidate relationships. In the case of data warehousing, it is most common a phenomenon,especiallyintheearlystagesoftheproject,tohavemorethanonecandidate sourcefiles/tablesthatcouldpopulateatarget,datawarehousetable.thus,asetofcandidate relationshipscapturesthefactthatacertaindatawarehouseconceptcanbepopulatedbymore thanonecandidatesourceconcepts.formally,acandidaterelationshipcomprises(a)asingle candidateconceptand(b)asingletargetconcept.candidaterelationshipsaredepictedwith bolddottedlinesbetweenthecandidatesandthetargetconcepts.wheneverexactlyoneof themcanbeselected,weannotatethesetofcandidaterelationshipsforthesameconceptwith auml{xor}constraint. Activecandidaterelationships.Anactivecandidaterelationshipdenotesthefactthatoutof asetofcandidates,acertainonehasbeenselectedforthepopulationofthetargetconcept. Thus,anactivecandidaterelationshipisaspecializationofcandidaterelationships,withthe p.12of25

13 same structure and refined semantics. We denote an active candidate relationship with a directedbolddottedarrowfromtheprovidertowardsthetargetconcept. Forthepurposeofourmotivatingexample,letusassumethatsourceS 2 hasmorethanone productionsystems(e.g.,cobolfiles),whicharecandidatesfors2.partsupp.assumethat theavailablecandidates(depictedintheleftupperpartoffig.3.4)are: Aconcept AnnualPartSupp s(practicallyrepresentingafilef1),thatcontainsthefull annualhistoryaboutpartsuppliers;itisusedbasicallyforreportingpurposesandcontains asupersetoffieldsthantheonesrequiredforthepurposeofthedatawarehouse. AconceptRecentPartSupp s(practicallyrepresentingafilef2),containingonlythedata ofthelastmonth;itusedon-linebyend-usersfortheinsertionorupdateofdataaswellas for some reporting applications. The diagram also shows that RecentPartSupp swas eventuallyselectedastheactivecandidate;anotecapturesthedetailsofthisdesignchoice. 3.4 ProviderRelationshipsandSerialCompositionofTransformations Providerrelationships.Aproviderrelationshipmapsasetofinputattributestoasetofoutput attributesthrougharelevanttransformation.inthesimple1:1case,providerrelationships capturethefactthataninputattributeinthesourcesidepopulatesanoutputattributeinthe data warehouse side. If the attributes are semantically and physically compatible, no transformationisrequired.ifthisisnotthecasethough,wepassthismappingthroughthe appropriatetransformation(e.g.,europeantoamericandataformat,notnullcheck,etc.). Ingeneral,itispossiblethatsomeformofschemarestructuringtakesplace;thus,the formaldefinitionofproviderrelationshipscomprises(a)afinitesetofinputattributes;(b)a finitesetofoutputattributes;(c)anappropriatetransformation(i.e.,onewhoseinputand outputattributescanbemappedonetoonetotherespectiveattributesoftherelationship).in the case of N:M relationships the graphical representation obscures the linkage between providerandtargetattributes.tocompensateforthisshortcoming,weannotatethelinkofa provider relationship with each of the involved attributes with a tag, so that there is no disambiguityfortheactualproviderofatargetattribute. Inthe1:1case,aproviderrelationshipisdepictedwithasolidbolddirectedarrowfrom theinputtowardstheoutputattribute,taggedwiththeparticipatingtransformation.inthe general N:Mcase, a provider relationshipis graphically depicted as asetofsolid arrows p.13of25

14 startingfromtheprovidersandtargetingtheconsumers,allpassingthroughthefacilitator transformation. Finally,weshouldalsomentionasyntacticsugaradd-onofourmodel.Itcanbethecase wherea certain providerrelationship involves allthe attributes of a set ofconcepts. For example,inthecaseofarelationalunionoperation,alltheattributesoftheinputandthe outputconceptswouldparticipateinthetransformation.toavoidoverloadingthediagram withtoomanyrelationships,weemployasyntacticsugarnotationmappingtheinputtothe outputconcepts(insteadofattributes).thiscanalsobetreatedasazoomin/outoperationon thediagramperse:atthecoarselevel,onlyconceptsaredepictedandanoverviewofthe model is given; at the detailed level, the inter-concept relationships are expanded to the respectiveinter-attributerelationships,presentingthedesignerwithalltheavailabledetail. Letusexaminenow,therelationshipbetweentheattributesofconcepts S1.PARTSUPP, S2.PARTSUPP and DW.PARTSUPP, as depictedin Fig Forstarters, we will ignorethe aggregation that takes place for the rows of source S2 and focus on the elementary transformations. AttributePKeyisdirectlypopulatedfromitshomonymousattributeinS1andS2,througha surrogatekey(sk)transformation.surrogatekeyassignmentiscommontacticsindata warehousing, employed in order to replace the keys of the production systems with a uniform key 1. For example, it could be the case that the part Steering Wheel has PKEY=30 for sources1, PKEY=40 for sources2, while at the same time sources2 has PKEY=30forpart Automobile Door.Theseconflictscanbeeasilyresolvedbyaglobal replacementmechanismthroughtheassignmentofauniformsurrogatekey. AttributeSuppKeyispopulatedfromthehomonymousattributesinthesources. Attribute Date is directly populated from its homonymous attribute in S2, through an American-To-EuropeanDatetransformationfunction.Atthesametime,thedateforthe rows coming from source S1, is determined through the application of a Sysdate() function (since S1.PARTSUPP does not have a corresponding attribute). Observe the 1 Ingeneral,thebasicreasonsforasurrogatekeytransformationarebothperformanceandsemantichomogeneity. Textualattributesarenotthebestcandidatesforindexedkeysandthusneedtobereplacedbyintegerkeys.At thesametime,differentproductionsystemsmightusedifferentkeysforthesameobject,orthesamekeyfor differentobjects,resultingintheneedforaglobalreplacementofthesevaluesinthedatawarehouse. p.14of25

15 functionappliedfortherowscomingfromsources1:itusesasinputalltheattributesof S1.PARTSUPP(inordertodeterminethattheproducedvalueisanewattributeoftherow), passesthroughafunctionapplicationtransformationcalculatingthesystemdate,which,in turns,isdirectedtoattributedw.date. Attribute Qtyisdirectlypopulatedfromitshomonymousattributesinthetwosources, withouttheneedforanytransformation. AttributeCostispopulatedfromitshomonymousattributesinthetwosources.Asfaras source S2 is concerned, we need a $2 transformation in order to convert the cost to Europeanvalues.AsfarassourceS1isconcerned,weapplyaNotNull(NN)check,to avoidloadingthedatawarehousewithrowshavingnocostfortheirparts. Notealsothattherecanbeinputattributes,likeforexampleS2.PARTSUPP.Department, whichareignoredduringtheetlprocess. Transformation Serial Composition. It is possible that we need to combine several transformationsinasingleproviderrelationship.forexample,wewouldpossiblyliketo groupincomingdatawithrespecttoasetofattributes,havingensuredatthesametimethat nonullvaluesareinvolvedinthisoperation.inthiscase,wewouldneedtoperformanotnull checkforeachoftheattributesandpropagateonlythecorrectrowstotheaggregation.in ordertomodelthissetting,weemployaserialcompositionoftheinvolvedtransformations. Theproblemwouldbetherequirementthatatransformationhasasetofattributesasinputs and attributes; thus, simply connecting two transformations would be inconsistent. To compensatethisshortcoming,weemployaserialtransformationcomposition.formally,a serialtransformationcompositioncomprises(a)asingleinitiatingtransformationand(b)a singlesubsequenttransformation.serialtransformationcompositionsaredepictedassolid boldlinesconnectingthetwoinvolvedtransformations. Arathercomplexpartofthemotivatingexampleistheaggregationthattakesplaceforthe rowsofsources2.practically,sources2capturesinformationforpartsuppliersaccordingto theparticulardepartmentofthesupplierorganization.loadingthisdatatothedatawarehouse thatignoresthiskindofdetailrequirestheaggregationofdataperpkey,suppkey,dateand thesummationofcostandqty.thisisperformedfromtheaggregationtransformationγ. Still,alltheaforementionedelementarytransformationsarenotignoredorbanished;onthe p.15of25

16 contrary, they are linked to the aggregation transformation through the appropriate serial compositionrelationships.notealsothetagsontheoutputoftheaggregationtransformation, determining their providers (e.g., S2.PARTSUPP.PKey for DW.PARTSUPP.PKey and SUM[S2.PARTSUPP.Qty]forDW.PARTSUPP.Qty). 4. Methodologyfortheusageoftheconceptualmodel Inthissection,wewillgiveanoverviewoftheassumedsequenceofstepsthatadesigner follows,duringtheconstructionofthedatawarehouse.eachstepofthismethodologywillbe presentedintermsofourmotivatingexample.asalreadymentioned,theultimategoalofthis designprocessistheproductionofinter-attributemappings,alongwithanyrelevantauxiliary information. Step1:Identificationoftheproperdatastores.Thefirstthingthatadesignerfacesduring therequirementsandanalysisperiodofthedatawarehouseprocessistheidentificationof relativedatasources.assumethatforaparticularsubsetofthedatawarehouse,wehave identified the concept DW.PARTSUPPwhichisafacttableofhowpartswheredistributed according to their respective suppliers. Assume also that we have decided that for completenessreasons,weneedtopopulatethetargettablewithdatafrombothsourcess 1 and S 2, i.e., we need the union of the data of these two sources. Then, the diagram for the identifieddatasourcesandtheirinterrelationshipsisdepictedinfig.4.1a. S1.PartSupp Duetoacccuracy andsmallsize (<updatewindow) S1.PartSupp S2.PartSupp U Necessaryproviders: S1andS2 DW.PartSupp Annual PartSupp s Recent PartSupp s {XOR} S2.PartSupp U Necessaryproviders: S1andS2 DW.PartSupp (a) (b) Fig.4.1(a)Identificationoftheproperdatastores;(b)Candidatesandactivecandidates forthemotivatingexample p.16of25

17 Step 2: Candidates and active candidates for the involved data stores. As already mentionedbefore,duringtherequirementsandanalysisstage,thedesignercanpossiblyfind outthatmorethanonedatastorescouldbecandidatestopopulateacertainconcept.theway thedecisionistakenisoutsidethescopeofthispaper(althoughourpersonalexperience suggeststhat,atleastthesize,thedataqualityandtheavailabilityofthesourcesplayamajor roleinthiskindofdecision).forthepurposeofourmotivatingexample,letusassumethat sources 2 hasmorethanoneproductionsystems(e.g.,cobolfiles),whicharecandidatesfor concepts2.partsupp.assumethattheavailablesourcesare: Aconcept AnnualPartSupp s(practicallyrepresentingafilef1),thatcontainsthefull annualhistoryaboutpartsuppliers;itisusedbasicallyforreportingpurposesandcontains asupersetoffieldsthantheonesrequiredforthepurposeofthedatawarehouse. AconceptRecentPartSupp s(practicallyrepresentingafilef2),containingonlythedata ofthelastmonth;itusedon-linebyend-usersfortheinsertionorupdateofdataaswellas forsomereportingapplications. Thecandidateconceptsforconcept S2.PartSupp andthe (eventuallyselected)active candidate RecentPartSupp s, along with the tracing of the decision for the choice, are depictedinfig.4.1b. Clearly,onceadecisionontheactivecandidatehasbeenreached,asimplified working copy ofthescenariothateliminatesallothercandidatescanalsobeproduced.asshownin Fig.4.2,therearetwowaystoachievethis(a)byignoringalltheinformationoncandidates forthetargetconceptand(b)byreplacingthetargetwiththeactivecandidate. activecanditate activecanditate candidate 1 target target... candidate n {XOR} candidate 1 active target candidate... {XOR} candidate n (a) (b) Fig.4.2Workingcopytransformation(a)ignoringcandidates;(b)replacetargetswith activecandidates. p.17of25

18 Step3:Attributemappingbetweentheprovidersandtheconsumers.Themostdifficult task of thedata warehouse designer isto determine the mapping of theattributes of the sourcestotheonesofthedatawarehouse.thistaskinvolvesseveraldiscussionswiththe sourceadministratorstoexplainthecodesandimplicitrulesorvalues,whicharehiddenin thedataandthesourceprograms.moreover,itinvolvesquiteafew datapreview attempts (intheformofsampling,orsimplecountingqueries)todiscoverthepossibleproblemsofthe provideddata.naturally,thisprocessisinteractiveanderror-prone.thesupportthatatool canprovidethedesignerwith,liesmainlyintheareaofinter-attributemapping,withfocuson thetracingoftransformationandcleaningoperationsforthismapping. Foreachtargetattribute,asetofproviderrelationshipsmustbedefined.Insimplecases, theproviderrelationshipsaredefineddirectlyamongsourceandtargetattributes.inthecases where a transformation is required, we pass the mapping through the appropriate transformation.inthecaseswheremorethanonetransformationsarerequired,wecaninserta sequenceoftransformationsbetweentheinvolvedattributes,alllinkedthroughcomposition relationships.etlconstraintsarealsospecifiedinthispartofthedesignprocess.anexample ofinter-attributerelationshipfortheconceptss1.partsupp,s2.partsuppanddw.partsupp, isdepictedinfig.4.3. S2.PARTSUPP DW.PARTSUPP PK S1.PARTSUPP PKey SK PKey SK PKey SuppKey Qty γ S 2.PKey S 2.S u ppk ey SuppKey Date f SuppKey Date f SUM S (S UM( 2.Q ty) S 2.C o st) Qty Qty Department Cost NN Cost Cost f American to $2 European Date Date = SysDate() Fig.4.3AttributemappingforthepopulationofrecordsetDW.PartSupp p.18of25

19 Step4:Annotatingthediagramwithruntimeconstraints.Apartforthejobdefinitionforan ETL scenario, which specifies how the mapping from sources to the data warehouse is performed,alongwiththeappropriatetransformations,thereareseveralotherparametersthat possiblyneedtobespecifiedfortheruntimeenvironment.thiskindofruntimeconstraints include: - Time/Eventbasedscheduling.Thedesignerneedstodeterminethefrequencyofthe ETL process, so that data are fresh and the overall process fits within the refreshmenttimewindow. - Monitoring. On-line information about the progress/status of the process is necessary,sothattheadministratorcanbeawareofwhatsteptheloadison,itsstart time,duration,etc.filedumps,notificationmessagesontheconsole, ,printed pages,orvisualdemonstrationcanbeemployedforthispurpose. - Logging.Off-lineinformation,presentedattheendoftheexecutionofthescenario, on the outcome of the overall process. All the relevant information (e.g., the previoustracingstuff)isshown,byemployingtheaforementionedtechniques. - Exceptionhandling.Therow-wisetreatmentofdatabase/businessrulesviolationis necessaryfortheproperfunctionoftheetlprocess.informationlikewhichrows are problematic, or how many rows are acceptable (acceptance rate, that is), characterizesthisaspectoftheetlprocess. - Error handling. Crash recovery and stop/start capabilities (e.g., committing a transaction every few rows) are absolutely necessary for the robustness and the efficiencyoftheprocess. To capture all this information, we adopt notes, tagged to the appropriate concepts, transformationsorrelationships.infig.4.4wedepictthediagramoffig.4.2b,annotated witha runtime constraintspecifying thatthe overall executiontime for the loading of DW.PARTSUPP(thatinvolvestheloadingofS1.PARTSUPPandS2.PARTSUPP)cannottake longerthan4hours. p.19of25

20 Duetoacccuracy andsmallsize (<updatewindow) S1.PartSupp {Duration<4h} Annual PartSupp s U DW.PartSupp S2.PartSupp Recent PartSupp s {XOR} Necessaryproviders: S1andS2 Fig.4.4AnnotationofFig.4.2(b)withconstraintsontheruntimeexecutionoftheETL scenario 5.InstantiationandSpecializationLayers WebelievethatthekeyissueintheconceptualrepresentationofETLactivitieslies(a)onthe identificationofasmallsetofgenericconstructsthatarepowerfulenoughtocaptureallcases and(b)onanextensibilitymechanismthatallowstheconstructionofa palette offrequently usedtypes(e.g.,fordatastoresandactivities). Fig.5.1TheframeworkforthemodelingofETLactivities OurmetamodelingframeworkisdepictedinFig.5.1.ThelowerlayerofFig.5.1,namely SchemaLayer,involvesaspecificETLscenario.AlltheentitiesoftheSchemalayerare instances of the classes Concept, Attribute, Transformation, ETL Constraint and Relationship.Thus,asonecanseeontheupperpartofFig.5.1,weintroduceameta-class layer,namelymetamodellayerinvolvingtheaforementionedclasses.thelinkagebetween the Metamodel and the Schema layers is achieved through instantiation ( instanceof ) relationships.themetamodellayerimplementstheaforementionedgenericitydesideratum: p.20of25

21 thefiveclasseswhichareinvolvedinthemetamodellayeraregenericenoughtomodelany ETLscenario,throughtheappropriateinstantiation. Still,wecandobetterthanthesimpleprovisionofameta-andaninstancelayer.Inorder tomakeourmetamodeltrulyusefulforpracticalcasesofetlprocesses,weenrichitwitha set of ETL-specific constructs, which constitute a subset of the larger metamodel layer, namelythetemplatelayer.theconstructsinthetemplatelayerarealsometa-classes,but theyarequitecustomizedfortheregularcasesofetlprocesses.thus,theclassesofthe Templatelayerasspecializations(i.e.,subclasses)ofthegenericclassesoftheMetamodel layer(depictedas IsA relationshipsinfig.5.1).throughthiscustomizationmechanism,the designercanpicktheinstancesoftheschemalayerfromamuchricherpaletteofconstructs; inthissetting,theentitiesoftheschemalayerareinstantiations,notonlyoftherespective classesofthemetamodellayer,butalsooftheirsubclassesinthetemplatelayer. IntheexampleofFig.5.1theconceptDW.PARTSUPPmustbepopulatedfromacertain sources 2.Severaloperationsmustinterveneduringthepropagation:forexample,asurrogate key assignment and an aggregation take place in the scenario. Moreover, there are two candidatessuitablefortheconcepts 2.PARTSUPP;outofthem,exactlyone(Candidate 2 )is eventually selected for the task. As one can observe, the concepts that take part in this scenarioareinstancesofclassconcept(belongingtothemetamodellayer)andspecificallyof its subclass ER Entity (assuming that we adopt an ER model extension). Instances and encompassingclassesarerelatedthroughlinksoftypeinstanceof.thesamemechanism applies to all the transformations of the scenario, which are (a) instances of class Transformationand(b)instancesofoneofitssubclasses,depictedinFig.5.1.Relationships donotescapetheruleeither:observehowtheproviderlinksfromtheconcepts2.partsupp towardstheconceptdw.partsupparerelatedtoclassprovider Relationshipthroughthe appropriateinstanceoflinks. AsfarastheclassConceptisconcerned,intheTemplatelayerwecanspecializeitto severalsubclasses,dependingontheemployedmodel.inthecaseoftheermodel,wehave thesubclasseser Entity,andER Relationship,whereasinthecaseoftheDimensional Model,wecanhavesubclassesasFact TableorDimension. p.21of25

22 Filters - Selection (σ) - Not null (NN) - Primary key violation (PK) - Foreign key violation (FK) - Unique value (UN) - Domain mismatch (DM) Unarytransformations - Push - Aggregation (γ) - Projection (π) - Function application (f) - Surrogate key assignment (SK) - Tuple normalization (N) - Tuple denormalization (DN) Binarytransformations - Union (U) - Join (><) - Diff ( ) - Update Detection ( UPD) Compositetransformations - Slowly changing dimension (Type 1,2,3) (SDC-1/2/3) - Format mismatch (FM) - Data type conversion (DTC) - Switch (σ*) - Extended union (U) Fileoperations - EBCDIC to ASCII conversion (EB2AS) - Sort file (Sort) Transferoperations - Ftp (FTP) - Compress/Decompress (Z/dZ) - Encrypt/Decrypt (Cr/dCr) Fig.5.2Templatetransformations,alongwiththeirsymbols,groupedbycategory Following the same framework, class Transformation is further specialized to an extensiblesetofreoccurringpatternsofetlactivities,depictedinfig.5.2.wenowpresent eachoftheaforementionedclassesinmoredetail.asonecanseeonthetopsideoffig.5.2, wegroupthetemplateactivitiesinsixmajorlogicalgroups.wedonotdepictthegroupingof activitiesinsubclassesinfig.5.1,inordertoavoidoverloadingthefigure;instead,wedepict thespecializationofclasstransformationtothreeofitssubclasseswhoseinstancesappear intheemployedscenariooftheschemalayer. Thefirstgroup,namedFilters,provideschecksfortherespectofacertaincondition.The semanticsofthesefiltersaretheobvious(startingfromagenericselectionconditionand proceedingtothecheckfornullvalues,primaryorforeignkey violation,etc.).the secondgroupoftemplateactivitiesiscalledunarytransformationsandexceptforthemost generic pushactivity(which simply propagates data from the provider tothe consumer), consists ofthe classical aggregation and function application operations along with threedatawarehousespecifictransformations(surrogate keyassignment,normalization and denormalization) 2.ThethirdgroupconsistsofclassicalBinaryOperations,suchas 2 Normalization/Denormalization. It is common for source data to be long denormalized records, possibly involvingmorethanahundredattributes.thisisduetothefactthatbaddatabasedesignorclassicalcobol tacticsledtothegatheringofallthenecessaryinformationforanapplicationinasingletable/file.althougha datawarehouseisalsocharacterizedbyacarefullyselectedsetofdenormalizedstructures,sometimesitis imperative to normalize somehow the input data. Consider, for example, a table of the form R(KEY,TAX,DISCOUNT,PRICE), which we would like to transform to a table of the form R (KEY,CODE,AMOUNT). For example, the input tuple t[key,30,60,70] is transformed into the tuples t1[key,1,30]; t2[key,2,60]; t3[key,3,70]. The operator NORMALIZATION performs this kind of transformation.denormalization,atthesametimeistheinverseoperation,particularlyusefulinthecase ofhighlynormalizedproductionsystemsthatactassources.inthissituation,wewouldliketodenormalizethe sourcedata(e.g.,considertheinverseoftheaforementionedexample)inordertoachievebetterperformancein theansweringofaggregatequeriesinthedatawarehouse. p.22of25

23 union, join and difference ofconceptsaswellaswithaspecialcaseofdifference involvingthedetection of updates. Moreover, there is a set of Composite/Special-Purpose transformations, which can be derivedfromsimplecombinationsoftemplateactivities.forexample,slowly changing dimensionisatechniqueforextractingthechangesinthesourcesthatpopulatedimension tables.also,formatmismatchisaspecialcaseofselectionandfunctionapplication(inthe casewhenanattributeshouldfulfillacertainregularexpression),datatypeconversionis alsoaspecialcaseoffunctionapplication,aswitchisthecombinationofseveralselections andextended unionisacombinationofbinaryunionstoproducetherespectiven-ary operator. Except for the aforementioned template activities, which mainly refer to logical transformations, we can also consider the case of physical operators that refer to the applicationofphysicaltransformationstowholefiles/tables.mainly,wediscussinter-concept physical operations like Transfer Operations (ftp, compress/decompress, encrypt/decrypt)andfileoperations(ebcdic to ASCII,sort file). Summarizing,theMetamodellayerisasetofgenericentities,abletorepresentanyETL scenario.atthesametime,thegenericityofthemetamodellayeriscomplementedwiththe extensibilityofthetemplatelayer,whichisasetof built-in specializationsoftheentitiesof thetemplatelayer,specificallytailoredforthemostfrequentelementsofetlscenarios. Moreover,apartfromthis built-in,etl-specificextensionofthegenericmetamodel,ifthe designerdecidesthatseveral patterns,notincludedinthepaletteofthetemplatelayer, occur repeatedly in his data warehousing projects, he/she can easily fit them into the customizabletemplatelayerthroughaspecializationmechanism. 6.Conclusions Extraction-Transformation-Loading(ETL)toolsarepiecesofsoftwareresponsibleforthe extractionofdatafromseveralsources,theircleansing,customizationandinsertionintoa data warehouse. In this paper, wehave focused onthe problem ofthe definition of ETL activitiesandprovidedfoundationsfortheirconceptualrepresentation.morespecifically,we p.23of25

24 haveproposedanovelconceptualmodel,whichiscustomizedforthetracingofinter-attribute relationshipsandtherespectiveetlactivitiesintheearlystagesofadatawarehouseproject. The proposed model is constructedin a customizable andextensible manner,so thatthe designercanenrichitwithhisownre-occurringpatternsforetlactivities,while,atthesame time,wealsooffera'palette'ofasetoffrequentlyusedetlactivities,liketheassignmentof surrogatekeys,thecheckfornullvalues,etc. As far as future work is concerned, the first objective is the linkage of the proposed conceptualmodeltoitslogicalandphysicalcounterparts,withparticularfocus(a)onthe relationship of the ETL activities to the underlying data stores; (b) the capturing of the compositeworkflowoftheetlscenarioand(c)theoptimizationofitsexecution[vass02]. Acknowledgments This research has been partially funded by the European Union's Information Society TechnologiesProgramme(IST)underprojectEDITH(IST ). References [Arde01] ArdentSoftware.DataStageSuite.Availableathttp:// [BaFM99] M.Bouzeghoub,F.Fabret,M.Matulovic.ModelingDataWarehouseRefreshmentProcessasa WorkflowApplication.InProc.Intl.WorkshoponDesignandManagementofDataWarehouses (DMDW 99),Heidelberg,Germany,(1999). [BoDS00] V. Borkar, K. Deshmuk, S. Sarawagi. Automatically Extracting Structure from Free Text Addresses.BulletinoftheTechnicalCommitteeonDataEngineering,23(4),(2000). [BoJR98] G. Booch, I. Jacobson, J. Rumbaugh. The Unified Modeling Language User Guide. Addison- WesleyPubCo;ISBN: ;1stedition(October30,1998). [CDL+98] D.Calvanese,G.DeGiacomo,M.Lenzerini,D.Nardi,andR.Rosati.Informationintegration: Conceptualmodelingandreasoningsupport.InProc.ProceedingsoftheInternationalConference oncooperativeinformationsystems(coopis),newyork,usa,pp ,(august1998) [CDL+99] D.Calvanese,G.DeGiacomo,M.Lenzerini,D.Nardi,R.Rosati.Aprincipledapproachtodata integration and reconciliation in data warehousing. In Proc. Intl. Workshop on Design and ManagementofDataWarehouses(DMDW 99),Heidelberg,Germany,(1999). [Data01] DataMirrorCorporation.TransformationServer.Availableathttp:// [Dema97] M. Demarest. The politics of data warehousing. Available at [ETI01] EvolutionaryTechnologiesIntl.ETI*EXTRACT.Availableathttp:// [GFSS00] H.Galhardas,D.Florescu,D.ShashaandE.Simon.Ajax:AnExtensibleDataCleaningTool.In Proc.ACMSIGMODIntl.Conf.OntheManagementofData,pp.590,Dallas,Texas,(2000). [GoMR98] M.Golfarelli,D.Maio,S.Rizzi.TheDimensionalFactModel:aConceptualModelforData Warehouses.InvitedPaper,InternationalJournalofCooperativeInformationSystems,vol.7,n. 2&3,1998. [GoRi98] M.Golfarelli,S.Rizzi:MethodologicalFrameworkforDataWarehouseDesign.InACMFirst InternationalWorkshoponDataWarehousingandOLAP(DOLAP 98),pp.3-9,November1998, Bethesda,Maryland,USA. [HuLV00] B.Husemann,J.Lechtenborger,G.Vossen.Conceptualdatawarehousemodeling.InProc.2 nd Intl. WorkshoponDesignandManagementofDataWarehouses(DMDW),pp ,Stockholm, Sweden(2000). [Inmo97] B. Inmon. The Data Warehouse Budget. DM Review Magazine, January Available at p.24of25

25 [JeQJ98] [JJQV99] [JLVV00] [Kimb97] [KRRT98] [LWGG00] [Micr01] [MoKo00] [Mong00] [NgTW00] [Orac01] [RaHa00] [RaHe01] [SBHD98] [ShTy98] [TrBC99] [TrPG00] [Tsoi01] [Vass00] [VaSS02] [VQVJ01] [VVS+01] M.A. Jeusfeld, C. Quix, M. Jarke: Design and Analysis of Quality Information for Data Warehouses. In Proc. of the 17 th Intl. Conf. On Conceptual Modeling (ER 98), pp , Singapore(1998). M.Jarke,M.A.Jeusfeld,C.Quix,P.Vassiliadis:Architectureandqualityindatawarehouses:An extendedrepositoryapproach.informationsystems,24(3): (1999).apreviousversion appearedinproc.10 th Conf.ofAdvancedInformationSystemsEngineering(CAiSE 98),Pisa, Italy(1998). M.Jarke,M.Lenzerini,Y.Vassiliou,P.Vassiliadis(eds.).FundamentalsofDataWarehouses. Springer,(2000). R.Kimball.ADimensionalModelingManifesto.DBMSMagazine.August1997. R.Kimbal,L.Reeves,M.Ross,W.Thornthwaite.TheDataWarehouseLifecycleToolkit:Expert Methods for Designing, Developing, and Deploying Data Warehouses. John Wiley & Sons, February1998. W. Labio, J.L. Wiener, H. Garcia-Molina, V. Gorelik. Efficient Resumption of Interrupted Warehouse Loads. In Proceedings of the 2000 ACM SIGMOD International Conference on ManagementofData(SIGMOD2000),pp.46-57,Dallas,Texas,USA(2000). MicrosoftCorp.MSDataTransformationServices.Availableatwww.microsoft.com/sq D.L.Moody,M.A.R.Kortink:Fromenterprisemodelstodimensionalmodels:amethodologyfor datawarehouseanddatamartdesign.proceedingsofthesecondintl.workshopondesignand ManagementofDataWarehouses,DMDW2000,Stockholm,Sweden,June5-6,2000.Availableat A.Monge.MatchingAlgorithmsWithinaDuplicateDetectionSystem.BulletinoftheTechnical CommitteeonDataEngineering,23(4),(2000). T.B.Nguyen,AMinTjoa,R.R.Wagner.AnObjectOrientedMultidimensionalDataModelfor OLAP.Proc.oftheFirstInternationalConferenceonWeb-AgeInformationManagement(WAIM- 00),Shanghai,China,June2000. OracleCorp.Oracle9i WarehouseBuilderUser sguide,release9.0.2.november2001. E.Rahm,H.Do.DataCleaning:ProblemsandCurrentApproaches.BulletinoftheTechnical CommitteeonDataEngineering,23(4),(2000). V.Raman,J.Hellerstein.Potter'sWheel:AnInteractiveDataCleaningSystem.InProceedingsof 27th International Conference on Very Large Data Bases (VLDB), pp , Roma, Italy, (2001). C.Sapia,M.Blaschka,G.Höfling,B.Dinter:ExtendingtheE/RModelfortheMultidimensional Paradigm.InERWorkshops1998,pp LectureNotesinComputerScience1552Springer 1999 C.Shilakes,J.Tylman.EnterpriseInformationPortals.EnterpriseSoftwareTeam.Availableat N.Tryfona,F.Busborg,J.G.B.Christiansen.starER:AConceptualModelforDataWarehouse Design.InACMSecondInternationalWorkshoponDataWarehousingandOLAP(DOLAP 99), pp.3-8,november1999,kansascity,missouri,usa. J.C.Trujillo,M.Palomar,J.Gómez:ApplyingObject-OrientedConceptualModelingTechniques to the Design of Multidimensional Databases and OLAP Applications. Proc. of the First InternationalConferenceonWeb-AgeInformationManagement(WAIM-00)pp.83-94,Shanghai, China,June2000. A.Tsois.MAC:ConceptualdatamodelingforOLAP.InProc.3rdIntl.WorkshoponDesignand ManagementofDataWarehouses(DMDW),pp ,Interlaken,Switzerland(2001). P.Vassiliadis.Gulliverinthelandofdatawarehousing:practicalexperiencesandobservationsofa researcher.inproc.2 nd Intl.WorkshoponDesignandManagementofDataWarehouses(DMDW), pp ,Stockholm,Sweden(2000). P.Vassiliadis,A.Simitsis,S.Skiadopoulos.ModelingETLactivitiesasgraphs.InProc.ofDesign andmanagementofdatawarehouses(dmdw'2002)4thinternationalworkshopinconjunction withcaise 02,pp.52-61,Toronto,Canada,May27,2002. P.Vassiliadis,C.Quix,Y.Vassiliou,M.Jarke.DataWarehouseProcessManagement.Information Systems,vol.26,no.3,pp ,June2001. P.Vassiliadis,Z.Vagena,S.Skiadopoulos,N.Karayannidis,T.Sellis.ARKTOS:Towardsthe modeling,design,controlandexecutionofetlprocesses.informationsystems,26(8),pp ,december2001,elsevierscienceltd. p.25of25

Conceptual modeling for ETL

Conceptual modeling for ETL Conceptual modeling for ETL processes Author: Dhananjay Patil Organization: Evaltech, Inc. Evaltech Research Group, Data Warehousing Practice. Date: 08/26/04 Email: erg@evaltech.com Abstract: Due to the

More information

On the Logical Modeling of ETL Processes

On the Logical Modeling of ETL Processes On the Logical Modeling of ETL Processes Panos Vassiliadis, Alkis Simitsis, Spiros Skiadopoulos National Technical University of Athens {pvassil,asimi,spiros}@dbnet.ece.ntua.gr What are Extract-Transform-Load

More information

From conceptual design to performance optimization of ETL workflows: current state of research and open problems

From conceptual design to performance optimization of ETL workflows: current state of research and open problems The VLDB Journal (2017) 26:777 801 DOI 10.1007/s00778-017-0477-2 REGULAR PAPER From conceptual design to performance optimization of ETL workflows: current state of research and open problems Syed Muhammad

More information

Graph-Based Modeling of ETL Activities with Multi-level Transformations and Updates

Graph-Based Modeling of ETL Activities with Multi-level Transformations and Updates Graph-Based Modeling of ETL Activities with Multi-level Transformations and Updates Alkis Simitsis 1, Panos Vassiliadis 2, Manolis Terrovitis 1, and Spiros Skiadopoulos 1 1 National Technical University

More information

Logical Optimization of ETL Workflows

Logical Optimization of ETL Workflows Logical Optimization of ETL Workflows 1001 11001 111001 1001 1100011001 100011 100011 011001 10010 100101 10010 0101001 1001 1001 1001001 101001 010101001 010101001 1001001 1001001 1001001 1001001 1001

More information

Modeling ETL Activities as Graphs

Modeling ETL Activities as Graphs Modeling ETL Activities as Graphs Panos Vassiliadis, Alkis Simitsis, Spiros Skiadopoulos National Technical University of Athens, Dept. of Electrical and Computer Eng., Computer Science Division, Iroon

More information

MODELING THE PHYSICAL DESIGN OF DATA WAREHOUSES FROM A UML SPECIFICATION

MODELING THE PHYSICAL DESIGN OF DATA WAREHOUSES FROM A UML SPECIFICATION MODELING THE PHYSICAL DESIGN OF DATA WAREHOUSES FROM A UML SPECIFICATION Sergio Luján-Mora, Juan Trujillo Department of Software and Computing Systems University of Alicante Alicante, Spain email: {slujan,jtrujillo}@dlsi.ua.es

More information

A Data Warehouse Engineering Process

A Data Warehouse Engineering Process A Data Warehouse Engineering Process Sergio Luján-Mora and Juan Trujillo D. of Software and Computing Systems, University of Alicante Carretera de San Vicente s/n, Alicante, Spain {slujan,jtrujillo}@dlsi.ua.es

More information

A Data warehouse within a Federated database architecture

A Data warehouse within a Federated database architecture Association for Information Systems AIS Electronic Library (AISeL) AMCIS 1997 Proceedings Americas Conference on Information Systems (AMCIS) 8-15-1997 A Data warehouse within a Federated database architecture

More information

Managing Changes to Schema of Data Sources in a Data Warehouse

Managing Changes to Schema of Data Sources in a Data Warehouse Association for Information Systems AIS Electronic Library (AISeL) AMCIS 2001 Proceedings Americas Conference on Information Systems (AMCIS) December 2001 Managing Changes to Schema of Data Sources in

More information

Blueprints and Measures for ETL Workflows

Blueprints and Measures for ETL Workflows Blueprints and Measures for ETL Workflows Panos Vassiliadis 1, Alkis Simitsis 2, Manolis Terrovitis 2, Spiros Skiadopoulos 2 1 University of Ioannina, Dept. of Computer Science, Ioannina, Hellas pvassil@cs.uoi.gr

More information

Proceed Requirements Meta-Model For Adequate Business Intelligence Using Workflow

Proceed Requirements Meta-Model For Adequate Business Intelligence Using Workflow International Journal of Research in Engineering and Science (IJRES) ISSN (Online): 2320-9364, ISSN (Print): 2320-9356 Volume 1 Issue 5 ǁ Sep. 2013 ǁ PP.46-50 Proceed Requirements Meta-Model For Adequate

More information

Data Warehousing and OLAP Technologies for Decision-Making Process

Data Warehousing and OLAP Technologies for Decision-Making Process Data Warehousing and OLAP Technologies for Decision-Making Process Hiren H Darji Asst. Prof in Anand Institute of Information Science,Anand Abstract Data warehousing and on-line analytical processing (OLAP)

More information

Novel Materialized View Selection in a Multidimensional Database

Novel Materialized View Selection in a Multidimensional Database Graphic Era University From the SelectedWorks of vijay singh Winter February 10, 2009 Novel Materialized View Selection in a Multidimensional Database vijay singh Available at: https://works.bepress.com/vijaysingh/5/

More information

Data Warehouse Design Using Row and Column Data Distribution

Data Warehouse Design Using Row and Column Data Distribution Int'l Conf. Information and Knowledge Engineering IKE'15 55 Data Warehouse Design Using Row and Column Data Distribution Behrooz Seyed-Abbassi and Vivekanand Madesi School of Computing, University of North

More information

IJSRD - International Journal for Scientific Research & Development Vol. 4, Issue 06, 2016 ISSN (online):

IJSRD - International Journal for Scientific Research & Development Vol. 4, Issue 06, 2016 ISSN (online): IJSRD - International Journal for Scientific Research & Development Vol. 4, Issue 06, 2016 ISSN (online): 2321-0613 Tanzeela Khanam 1 Pravin S.Metkewar 2 1 Student 2 Associate Professor 1,2 SICSR, affiliated

More information

Fig 1.2: Relationship between DW, ODS and OLTP Systems

Fig 1.2: Relationship between DW, ODS and OLTP Systems 1.4 DATA WAREHOUSES Data warehousing is a process for assembling and managing data from various sources for the purpose of gaining a single detailed view of an enterprise. Although there are several definitions

More information

UML-Based Conceptual Modeling of Pattern-Bases

UML-Based Conceptual Modeling of Pattern-Bases UML-Based Conceptual Modeling of Pattern-Bases Stefano Rizzi DEIS - University of Bologna Viale Risorgimento, 2 40136 Bologna - Italy srizzi@deis.unibo.it Abstract. The concept of pattern, meant as an

More information

Designing Data Warehouses. Data Warehousing Design. Designing Data Warehouses. Designing Data Warehouses

Designing Data Warehouses. Data Warehousing Design. Designing Data Warehouses. Designing Data Warehouses Designing Data Warehouses To begin a data warehouse project, need to find answers for questions such as: Data Warehousing Design Which user requirements are most important and which data should be considered

More information

Updates through Views

Updates through Views 1 of 6 15 giu 2010 00:16 Encyclopedia of Database Systems Springer Science+Business Media, LLC 2009 10.1007/978-0-387-39940-9_847 LING LIU and M. TAMER ÖZSU Updates through Views Yannis Velegrakis 1 (1)

More information

ETL Work Flow for Extract Transform Loading

ETL Work Flow for Extract Transform Loading Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 3, Issue. 6, June 2014, pg.610

More information

This article appeared in a journal published by Elsevier. The attached copy is furnished to the author for internal non-commercial research and

This article appeared in a journal published by Elsevier. The attached copy is furnished to the author for internal non-commercial research and This article appeared in a journal published by Elsevier. The attached copy is furnished to the author for internal non-commercial research and education use, including for instruction at the authors institution

More information

A MAS Based ETL Approach for Complex Data

A MAS Based ETL Approach for Complex Data A MAS Based ETL Approach for Complex Data O. Boussaid, F. Bentayeb, J. Darmont Abstract : In a data warehousing process, the phase of data integration is crucial. Many methods for data integration have

More information

Towards Development of Solution for Business Process-Oriented Data Analysis

Towards Development of Solution for Business Process-Oriented Data Analysis Towards Development of Solution for Business Process-Oriented Data Analysis M. Klimavicius Abstract This paper proposes a modeling methodology for the development of data analysis solution. The Author

More information

Open Work of Two-Hemisphere Model Transformation Definition into UML Class Diagram in the Context of MDA

Open Work of Two-Hemisphere Model Transformation Definition into UML Class Diagram in the Context of MDA Open Work of Two-Hemisphere Model Transformation Definition into UML Class Diagram in the Context of MDA Oksana Nikiforova and Natalja Pavlova Department of Applied Computer Science, Riga Technical University,

More information

Blueprints for ETL workflows

Blueprints for ETL workflows Blueprints for ETL workflows Panos Vassiliadis 1, Alkis Simitsis 2, Manolis Terrovitis 2, Spiros Skiadopoulos 2 1 University of Ioannina, Dept. of Computer Science, 45110, Ioannina, Greece pvassil@cs.uoi.gr

More information

Comparative Analysis of Architectural Views Based on UML

Comparative Analysis of Architectural Views Based on UML Electronic Notes in Theoretical Computer Science 65 No. 4 (2002) URL: http://www.elsevier.nl/locate/entcs/volume65.html 12 pages Comparative Analysis of Architectural Views Based on UML Lyrene Fernandes

More information

Towards the integration of security patterns in UML Component-based Applications

Towards the integration of security patterns in UML Component-based Applications Towards the integration of security patterns in UML Component-based Applications Anas Motii 1, Brahim Hamid 2, Agnès Lanusse 1, Jean-Michel Bruel 2 1 CEA, LIST, Laboratory of Model Driven Engineering for

More information

An UML-XML-RDB Model Mapping Solution for Facilitating Information Standardization and Sharing in Construction Industry

An UML-XML-RDB Model Mapping Solution for Facilitating Information Standardization and Sharing in Construction Industry An UML-XML-RDB Model Mapping Solution for Facilitating Information Standardization and Sharing in Construction Industry I-Chen Wu 1 and Shang-Hsien Hsieh 2 Department of Civil Engineering, National Taiwan

More information

Data warehouse architecture consists of the following interconnected layers:

Data warehouse architecture consists of the following interconnected layers: Architecture, in the Data warehousing world, is the concept and design of the data base and technologies that are used to load the data. A good architecture will enable scalability, high performance and

More information

Horizontal Aggregations for Mining Relational Databases

Horizontal Aggregations for Mining Relational Databases Horizontal Aggregations for Mining Relational Databases Dontu.Jagannadh, T.Gayathri, M.V.S.S Nagendranadh. Department of CSE Sasi Institute of Technology And Engineering,Tadepalligudem, Andhrapradesh,

More information

Chapter 4. The Relational Model

Chapter 4. The Relational Model Chapter 4 The Relational Model Chapter 4 - Objectives Terminology of relational model. How tables are used to represent data. Connection between mathematical relations and relations in the relational model.

More information

8) A top-to-bottom relationship among the items in a database is established by a

8) A top-to-bottom relationship among the items in a database is established by a MULTIPLE CHOICE QUESTIONS IN DBMS (unit-1 to unit-4) 1) ER model is used in phase a) conceptual database b) schema refinement c) physical refinement d) applications and security 2) The ER model is relevant

More information

MAC: Conceptual Data Modeling for OLAP

MAC: Conceptual Data Modeling for OLAP MAC: Conceptual Data Modeling for OLAP Aris Tsois Knowledge and Database Systems Laboratory, NTUA atsois@dblab.ece.ntua.gr Nikos Karayannidis Knowledge and Database Systems Laboratory, NTUA nikos@dblab.ece.ntua.gr

More information

Integrating SysML and OWL

Integrating SysML and OWL Integrating SysML and OWL Henson Graves Lockheed Martin Aeronautics Company Fort Worth Texas, USA henson.graves@lmco.com Abstract. To use OWL2 for modeling a system design one must be able to construct

More information

1 Dulcian, Inc., 2001 All rights reserved. Oracle9i Data Warehouse Review. Agenda

1 Dulcian, Inc., 2001 All rights reserved. Oracle9i Data Warehouse Review. Agenda Agenda Oracle9i Warehouse Review Dulcian, Inc. Oracle9i Server OLAP Server Analytical SQL Mining ETL Infrastructure 9i Warehouse Builder Oracle 9i Server Overview E-Business Intelligence Platform 9i Server:

More information

The GOLD Model CASE Tool: an environment for designing OLAP applications

The GOLD Model CASE Tool: an environment for designing OLAP applications The GOLD Model CASE Tool: an environment for designing OLAP applications Juan Trujillo, Sergio Luján-Mora, Enrique Medina Departamento de Lenguajes y Sistemas Informáticos. Universidad de Alicante. Campus

More information

DC Area Business Objects Crystal User Group (DCABOCUG) Data Warehouse Architectures for Business Intelligence Reporting.

DC Area Business Objects Crystal User Group (DCABOCUG) Data Warehouse Architectures for Business Intelligence Reporting. DC Area Business Objects Crystal User Group (DCABOCUG) Data Warehouse Architectures for Business Intelligence Reporting April 14, 2009 Whitemarsh Information Systems Corporation 2008 Althea Lane Bowie,

More information

A Tutorial on Agent Based Software Engineering

A Tutorial on Agent Based Software Engineering A tutorial report for SENG 609.22 Agent Based Software Engineering Course Instructor: Dr. Behrouz H. Far A Tutorial on Agent Based Software Engineering Qun Zhou December, 2002 Abstract Agent oriented software

More information

An Agent Modeling Language Implementing Protocols through Capabilities

An Agent Modeling Language Implementing Protocols through Capabilities An Agent Modeling Language Implementing Protocols through Capabilities Nikolaos Spanoudakis 1,2 1 Technical University of Crete, Greece nikos@science.tuc.gr Pavlos Moraitis 2 2 Paris Descartes University,

More information

Arktos: towards the modeling, design, control and execution of ETL processes

Arktos: towards the modeling, design, control and execution of ETL processes Information Systems 26 (2001) 537 561 Arktos: towards the modeling, design, control and execution of ETL processes Panos Vassiliadis*, Zografoula Vagena, Spiros Skiadopoulos, Nikos Karayannidis, Timos

More information

An Ontological Analysis of Metamodeling Languages

An Ontological Analysis of Metamodeling Languages An Ontological Analysis of Metamodeling Languages Erki Eessaar and Rünno Sgirka 2 Department of Informatics, Tallinn University of Technology, Estonia, eessaar@staff.ttu.ee 2 Department of Informatics,

More information

Foundations of Data Warehouse Quality (DWQ)

Foundations of Data Warehouse Quality (DWQ) DWQ Foundations of Data Warehouse Quality (DWQ) v.1.1 Document Number: DWQ -- INRIA --002 Project Name: Foundations of Data Warehouse Quality (DWQ) Project Number: EP 22469 Title: Author: Workpackage:

More information

DSS based on Data Warehouse

DSS based on Data Warehouse DSS based on Data Warehouse C_13 / 19.01.2017 Decision support system is a complex system engineering. At the same time, research DW composition, DW structure and DSS Architecture based on DW, puts forward

More information

Fausto Giunchiglia and Mattia Fumagalli

Fausto Giunchiglia and Mattia Fumagalli DISI - Via Sommarive 5-38123 Povo - Trento (Italy) http://disi.unitn.it FROM ER MODELS TO THE ENTITY MODEL Fausto Giunchiglia and Mattia Fumagalli Date (2014-October) Technical Report # DISI-14-014 From

More information

DKMS Brief No. Five: Is Data Staging Relational? A Comment

DKMS Brief No. Five: Is Data Staging Relational? A Comment 1 of 6 5/24/02 3:39 PM DKMS Brief No. Five: Is Data Staging Relational? A Comment Introduction In the data warehousing process, the data staging area is composed of the data staging server application

More information

Constructing Object Oriented Class for extracting and using data from data cube

Constructing Object Oriented Class for extracting and using data from data cube Constructing Object Oriented Class for extracting and using data from data cube Antoaneta Ivanova Abstract: The goal of this article is to depict Object Oriented Conceptual Model Data Cube using it as

More information

Domain-Driven Development with Ontologies and Aspects

Domain-Driven Development with Ontologies and Aspects Domain-Driven Development with Ontologies and Aspects Submitted for Domain-Specific Modeling workshop at OOPSLA 2005 Latest version of this paper can be downloaded from http://phruby.com Pavel Hruby Microsoft

More information

ETL and OLAP Systems

ETL and OLAP Systems ETL and OLAP Systems Krzysztof Dembczyński Intelligent Decision Support Systems Laboratory (IDSS) Poznań University of Technology, Poland Software Development Technologies Master studies, first semester

More information

Introducing MESSIA: A Methodology of Developing Software Architectures Supporting Implementation Independence

Introducing MESSIA: A Methodology of Developing Software Architectures Supporting Implementation Independence Introducing MESSIA: A Methodology of Developing Software Architectures Supporting Implementation Independence Ratko Orlandic Department of Computer Science and Applied Math Illinois Institute of Technology

More information

Data Warehouse Logical Design. Letizia Tanca Politecnico di Milano (with the kind support of Rosalba Rossato)

Data Warehouse Logical Design. Letizia Tanca Politecnico di Milano (with the kind support of Rosalba Rossato) Data Warehouse Logical Design Letizia Tanca Politecnico di Milano (with the kind support of Rosalba Rossato) Data Mart logical models MOLAP (Multidimensional On-Line Analytical Processing) stores data

More information

The Relational Data Model and Relational Database Constraints

The Relational Data Model and Relational Database Constraints CHAPTER 5 The Relational Data Model and Relational Database Constraints Copyright 2017 Ramez Elmasri and Shamkant B. Navathe Slide 1-2 Chapter Outline Relational Model Concepts Relational Model Constraints

More information

Building a Data Warehouse step by step

Building a Data Warehouse step by step Informatica Economică, nr. 2 (42)/2007 83 Building a Data Warehouse step by step Manole VELICANU, Academy of Economic Studies, Bucharest Gheorghe MATEI, Romanian Commercial Bank Data warehouses have been

More information

2.0.3 attributes: A named property of a class that describes the range of values that the class or its instances (i.e., objects) may hold.

2.0.3 attributes: A named property of a class that describes the range of values that the class or its instances (i.e., objects) may hold. T0/06-6 revision 2 Date: May 22, 2006 To: T0 Committee (SCSI) From: George Penokie (IBM/Tivoli) Subject: SAM-4: Converting to UML part Overview The current SCSI architecture follows no particular documentation

More information

Chapter 9: Relational DB Design byer/eer to Relational Mapping Relational Database Design Using ER-to- Relational Mapping Mapping EER Model

Chapter 9: Relational DB Design byer/eer to Relational Mapping Relational Database Design Using ER-to- Relational Mapping Mapping EER Model Chapter 9: Relational DB Design byer/eer to Relational Mapping Relational Database Design Using ER-to- Relational Mapping Mapping EER Model Constructs to Relations Relational Database Design by ER- and

More information

Data warehousing in telecom Industry

Data warehousing in telecom Industry Data warehousing in telecom Industry Dr. Sanjay Srivastava, Kaushal Srivastava, Avinash Pandey, Akhil Sharma Abstract: Data Warehouse is termed as the storage for the large heterogeneous data collected

More information

Data Mining & Data Warehouse

Data Mining & Data Warehouse Data Mining & Data Warehouse Asso. Profe. Dr. Raed Ibraheem Hamed University of Human Development, College of Science and Technology Department of Information Technology 2016 2017 (1) Points to Cover Problem:

More information

Data Warehousing and OLAP Technology for Primary Industry

Data Warehousing and OLAP Technology for Primary Industry Data Warehousing and OLAP Technology for Primary Industry Taehan Kim 1), Sang Chan Park 2) 1) Department of Industrial Engineering, KAIST (taehan@kaist.ac.kr) 2) Department of Industrial Engineering, KAIST

More information

Data Warehouse and Data Mining

Data Warehouse and Data Mining Data Warehouse and Data Mining Lecture No. 03 Architecture of DW Naeem Ahmed Email: naeemmahoto@gmail.com Department of Software Engineering Mehran Univeristy of Engineering and Technology Jamshoro Basic

More information

3.4 Data-Centric workflow

3.4 Data-Centric workflow 3.4 Data-Centric workflow One of the most important activities in a S-DWH environment is represented by data integration of different and heterogeneous sources. The process of extract, transform, and load

More information

Data Strategies for Efficiency and Growth

Data Strategies for Efficiency and Growth Data Strategies for Efficiency and Growth Date Dimension Date key (PK) Date Day of week Calendar month Calendar year Holiday Channel Dimension Channel ID (PK) Channel name Channel description Channel type

More information

Efficiency Gains in Inbound Data Warehouse Feed Implementation

Efficiency Gains in Inbound Data Warehouse Feed Implementation Efficiency Gains in Inbound Data Warehouse Feed Implementation Simon Eligulashvili simon.e@gamma-sys.com Introduction The task of building a data warehouse with the objective of making it a long-term strategic

More information

Guiding System Modelers in Multi View Environments: A Domain Engineering Approach

Guiding System Modelers in Multi View Environments: A Domain Engineering Approach Guiding System Modelers in Multi View Environments: A Domain Engineering Approach Arnon Sturm Department of Information Systems Engineering Ben-Gurion University of the Negev, Beer Sheva 84105, Israel

More information

INCONSISTENT DATABASES

INCONSISTENT DATABASES INCONSISTENT DATABASES Leopoldo Bertossi Carleton University, http://www.scs.carleton.ca/ bertossi SYNONYMS None DEFINITION An inconsistent database is a database instance that does not satisfy those integrity

More information

COSC344 Database Theory and Applications. σ a= c (P) Lecture 3 The Relational Data. Model. π A, COSC344 Lecture 3 1

COSC344 Database Theory and Applications. σ a= c (P) Lecture 3 The Relational Data. Model. π A, COSC344 Lecture 3 1 COSC344 Database Theory and Applications σ a= c (P) S P Lecture 3 The Relational Data π A, C (H) Model COSC344 Lecture 3 1 Overview Last Lecture Database design ER modelling This Lecture Relational model

More information

Database Systems Relational Model. A.R. Hurson 323 CS Building

Database Systems Relational Model. A.R. Hurson 323 CS Building Relational Model A.R. Hurson 323 CS Building Relational data model Database is represented by a set of tables (relations), in which a row (tuple) represents an entity (object, record) and a column corresponds

More information

Modern Software Engineering Methodologies Meet Data Warehouse Design: 4WD

Modern Software Engineering Methodologies Meet Data Warehouse Design: 4WD Modern Software Engineering Methodologies Meet Data Warehouse Design: 4WD Matteo Golfarelli Stefano Rizzi Elisa Turricchia University of Bologna - Italy 13th International Conference on Data Warehousing

More information

A Step towards Centralized Data Warehousing Process: A Quality Aware Data Warehouse Architecture

A Step towards Centralized Data Warehousing Process: A Quality Aware Data Warehouse Architecture A Step towards Centralized Data Warehousing Process: A Quality Aware Data Warehouse Architecture Maqbool-uddin-Shaikh Comsats Institute of Information Technology Islamabad maqboolshaikh@comsats.edu.pk

More information

Chapter 2: Entity-Relationship Model

Chapter 2: Entity-Relationship Model Chapter 2: Entity-Relationship Model! Entity Sets! Relationship Sets! Design Issues! Mapping Constraints! Keys! E-R Diagram! Extended E-R Features! Design of an E-R Database Schema! Reduction of an E-R

More information

A Framework for Semi-Automated Implementation of Multidimensional Data Models

A Framework for Semi-Automated Implementation of Multidimensional Data Models Database Systems Journal vol. III, no. 2/2012 31 A Framework for Semi-Automated Implementation of Multidimensional Data Models Ilona Mariana NAGY Faculty of Economics and Business Administration, Babes-Bolyai

More information

Data Warehousing. Jens Teubner, TU Dortmund Summer Jens Teubner Data Warehousing Summer

Data Warehousing. Jens Teubner, TU Dortmund Summer Jens Teubner Data Warehousing Summer Jens Teubner Data Warehousing Summer 2018 1 Data Warehousing Jens Teubner, TU Dortmund jens.teubner@cs.tu-dortmund.de Summer 2018 Jens Teubner Data Warehousing Summer 2018 160 Part VI ETL Process ETL Overview

More information

Automation of Semantic Web based Digital Library using Unified Modeling Language Minal Bhise 1 1

Automation of Semantic Web based Digital Library using Unified Modeling Language Minal Bhise 1 1 Automation of Semantic Web based Digital Library using Unified Modeling Language Minal Bhise 1 1 Dhirubhai Ambani Institute for Information and Communication Technology, Gandhinagar, Gujarat, India Email:

More information

Chapter 3. The Multidimensional Model: Basic Concepts. Introduction. The multidimensional model. The multidimensional model

Chapter 3. The Multidimensional Model: Basic Concepts. Introduction. The multidimensional model. The multidimensional model Chapter 3 The Multidimensional Model: Basic Concepts Introduction Multidimensional Model Multidimensional concepts Star Schema Representation Conceptual modeling using ER, UML Conceptual modeling using

More information

Contents. Database. Information Policy. C03. Entity Relationship Model WKU-IP-C03 Database / Entity Relationship Model

Contents. Database. Information Policy. C03. Entity Relationship Model WKU-IP-C03 Database / Entity Relationship Model Information Policy Database C03. Entity Relationship Model Code: 164323-03 Course: Information Policy Period: Spring 2013 Professor: Sync Sangwon Lee, Ph. D 1 Contents 01. Overview of Database Design 02.

More information

Database Design Process

Database Design Process Database Design Process Real World Functional Requirements Requirements Analysis Database Requirements Functional Analysis Access Specifications Application Pgm Design E-R Modeling Choice of a DBMS Data

More information

Inference in Hierarchical Multidimensional Space

Inference in Hierarchical Multidimensional Space Proc. International Conference on Data Technologies and Applications (DATA 2012), Rome, Italy, 25-27 July 2012, 70-76 Related papers: http://conceptoriented.org/ Inference in Hierarchical Multidimensional

More information

Data Warehousing ETL. Esteban Zimányi Slides by Toon Calders

Data Warehousing ETL. Esteban Zimányi Slides by Toon Calders Data Warehousing ETL Esteban Zimányi ezimanyi@ulb.ac.be Slides by Toon Calders 1 Overview Picture other sources Metadata Monitor & Integrator OLAP Server Analysis Operational DBs Extract Transform Load

More information

Hierarchies in a multidimensional model: From conceptual modeling to logical representation

Hierarchies in a multidimensional model: From conceptual modeling to logical representation Data & Knowledge Engineering 59 (2006) 348 377 www.elsevier.com/locate/datak Hierarchies in a multidimensional model: From conceptual modeling to logical representation E. Malinowski *, E. Zimányi Department

More information

01/01/2017. Chapter 5: The Relational Data Model and Relational Database Constraints: Outline. Chapter 5: Relational Database Constraints

01/01/2017. Chapter 5: The Relational Data Model and Relational Database Constraints: Outline. Chapter 5: Relational Database Constraints Chapter 5: The Relational Data Model and Relational Database Constraints: Outline Ramez Elmasri, Shamkant B. Navathe(2017) Fundamentals of Database Systems (7th Edition),pearson, isbn 10: 0-13-397077-9;isbn-13:978-0-13-397077-7.

More information

ANALYTICS DRIVEN DATA MODEL IN DIGITAL SERVICES

ANALYTICS DRIVEN DATA MODEL IN DIGITAL SERVICES ANALYTICS DRIVEN DATA MODEL IN DIGITAL SERVICES Ng Wai Keat 1 1 Axiata Analytics Centre, Axiata Group, Malaysia *Corresponding E-mail : waikeat.ng@axiata.com Abstract Data models are generally applied

More information

Data Warehousing Alternatives for Mobile Environments

Data Warehousing Alternatives for Mobile Environments Data Warehousing Alternatives for Mobile Environments I. Stanoi D. Agrawal A. El Abbadi Department of Computer Science University of California Santa Barbara, CA 93106 S. H. Phatak B. R. Badrinath Department

More information

Enabling Off-Line Business Process Analysis: A Transformation-Based Approach

Enabling Off-Line Business Process Analysis: A Transformation-Based Approach Enabling Off-Line Business Process Analysis: A Transformation-Based Approach Arnon Sturm Department of Information Systems Engineering Ben-Gurion University of the Negev, Beer Sheva 84105, Israel sturm@bgu.ac.il

More information

Information Modeling and Relational Databases

Information Modeling and Relational Databases Information Modeling and Relational Databases Second Edition Terry Halpin Neumont University Tony Morgan Neumont University AMSTERDAM» BOSTON. HEIDELBERG LONDON NEW YORK OXFORD PARIS SAN DIEGO SAN FRANCISCO

More information

The Basic (Flat) Relational Model. Copyright 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley

The Basic (Flat) Relational Model. Copyright 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley The Basic (Flat) Relational Model Copyright 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 3 Outline The Relational Data Model and Relational Database Constraints Relational

More information

Data Mining. Asso. Profe. Dr. Raed Ibraheem Hamed. University of Human Development, College of Science and Technology Department of CS (1)

Data Mining. Asso. Profe. Dr. Raed Ibraheem Hamed. University of Human Development, College of Science and Technology Department of CS (1) Data Mining Asso. Profe. Dr. Raed Ibraheem Hamed University of Human Development, College of Science and Technology Department of CS 2016 2017 (1) Points to Cover Problem: Heterogeneous Information Sources

More information

Architecting Object Applications for High Performance with Relational Databases

Architecting Object Applications for High Performance with Relational Databases Architecting Object Applications for High Performance with Relational Databases Shailesh Agarwal 1 Christopher Keene 2 Arthur M. Keller 3 1.0 Abstract This paper presents an approach for architecting OO

More information

Data Warehouse Testing. By: Rakesh Kumar Sharma

Data Warehouse Testing. By: Rakesh Kumar Sharma Data Warehouse Testing By: Rakesh Kumar Sharma Index...2 Introduction...3 About Data Warehouse...3 Data Warehouse definition...3 Testing Process for Data warehouse:...3 Requirements Testing :...3 Unit

More information

Database Management

Database Management 204320 - Database Management Chapter 9 Relational Database Design by ER and EERto-Relational Mapping Adapted for 204320 by Areerat Trongratsameethong Copyright 2011 Pearson Education, Inc. Publishing as

More information

Basics of Dimensional Modeling

Basics of Dimensional Modeling Basics of Dimensional Modeling Data warehouse and OLAP tools are based on a dimensional data model. A dimensional model is based on dimensions, facts, cubes, and schemas such as star and snowflake. Dimension

More information

Database Instance And Relational Schema Design A Fact Oriented Approach

Database Instance And Relational Schema Design A Fact Oriented Approach Database Instance And Relational Schema Design A Fact Oriented Approach File-oriented approaches create problems for organizations because of d) how master files maintain facts used by certain application

More information

Seminars of Software and Services for the Information Society. Data Warehousing Design Issues

Seminars of Software and Services for the Information Society. Data Warehousing Design Issues DIPARTIMENTO DI INGEGNERIA INFORMATICA AUTOMATICA E GESTIONALE ANTONIO RUBERTI Master of Science in Engineering in Computer Science (MSE-CS) Seminars in Software and Services for the Information Society

More information

The i com Tool for Intelligent Conceptual Modelling

The i com Tool for Intelligent Conceptual Modelling The i com Tool for Intelligent Conceptual Modelling Enrico Franconi and Gary Ng Department of Computer Science, University of Manchester, UK {franconi ngg}@cs.man.ac.uk http://www.cs.man.ac.uk/ franconi/

More information

Data Warehouse and Data Mining

Data Warehouse and Data Mining Data Warehouse and Data Mining Lecture No. 05 Data Modeling Naeem Ahmed Email: naeemmahoto@gmail.com Department of Software Engineering Mehran Univeristy of Engineering and Technology Jamshoro Data Modeling

More information

A New Approach of Extraction Transformation Loading Using Pipelining

A New Approach of Extraction Transformation Loading Using Pipelining A New Approach of Extraction Transformation Loading Using Pipelining Dr. Rajender Singh Chhillar* (Professor, CS Department, M.D.U) Barjesh Kochar (Head(MCA,IT),GNIM) Abstract Companies have lots of valuable

More information

Keywords: Abstract Factory, Singleton, Factory Method, Prototype, Builder, Composite, Flyweight, Decorator.

Keywords: Abstract Factory, Singleton, Factory Method, Prototype, Builder, Composite, Flyweight, Decorator. Comparative Study In Utilization Of Creational And Structural Design Patterns In Solving Design Problems K.Wseem Abrar M.Tech., Student, Dept. of CSE, Amina Institute of Technology, Shamirpet, Hyderabad

More information

SIMULATING SECURE DATA EXTRACTION IN EXTRACTION TRANSFORMATION LOADING (ETL) PROCESSES

SIMULATING SECURE DATA EXTRACTION IN EXTRACTION TRANSFORMATION LOADING (ETL) PROCESSES http:// SIMULATING SECURE DATA EXTRACTION IN EXTRACTION TRANSFORMATION LOADING (ETL) PROCESSES Ashish Kumar Rastogi Department of Information Technology, Azad Group of Technology & Management. Lucknow

More information

Extended TDWI Data Modeling: An In-Depth Tutorial on Data Warehouse Design & Analysis Techniques

Extended TDWI Data Modeling: An In-Depth Tutorial on Data Warehouse Design & Analysis Techniques : An In-Depth Tutorial on Data Warehouse Design & Analysis Techniques Class Format: The class is an instructor led format using multiple learning techniques including: lecture to present concepts, principles,

More information

Data Warehouses Chapter 12. Class 10: Data Warehouses 1

Data Warehouses Chapter 12. Class 10: Data Warehouses 1 Data Warehouses Chapter 12 Class 10: Data Warehouses 1 OLTP vs OLAP Operational Database: a database designed to support the day today transactions of an organization Data Warehouse: historical data is

More information

Chapter 5. The Relational Data Model and Relational Database Constraints. Slide 5-١. Copyright 2007 Ramez Elmasri and Shamkant B.

Chapter 5. The Relational Data Model and Relational Database Constraints. Slide 5-١. Copyright 2007 Ramez Elmasri and Shamkant B. Slide 5-١ Chapter 5 The Relational Data Model and Relational Database Constraints Chapter Outline Relational Model Concepts Relational Model Constraints and Relational Database Schemas Update Operations

More information

Copyright 2007 Ramez Elmasri and Shamkant B. Navathe. Slide 5-1

Copyright 2007 Ramez Elmasri and Shamkant B. Navathe. Slide 5-1 Slide 5-1 Chapter 5 The Relational Data Model and Relational Database Constraints Chapter Outline Relational Model Concepts Relational Model Constraints and Relational Database Schemas Update Operations

More information