
Meeting the Challenges of Data Quality Management

Laura Sebastian-Coleman

Academic Press is an imprint of Elsevier
125 London Wall, London EC2Y 5AS, United Kingdom
525 B Street, Suite 1650, San Diego, CA 92101, United States
50 Hampshire Street, 5th Floor, Cambridge, MA 02139, United States
The Boulevard, Langford Lane, Kidlington, Oxford OX5 1GB, United Kingdom

Copyright © 2022 Elsevier Inc. All rights reserved.

No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Details on how to seek permission, further information about the Publisher’s permissions policies and our arrangements with organizations such as the Copyright Clearance Center and the Copyright Licensing Agency, can be found at our website: www.elsevier.com/permissions

This book and the individual contributions contained in it are protected under copyright by the Publisher (other than as may be noted herein).

Notices

Knowledge and best practice in this field are constantly changing. As new research and experience broaden our understanding, changes in research methods, professional practices, or medical treatment may become necessary.

Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using any information, methods, compounds, or experiments described herein. In using such information or methods they should be mindful of their own safety and the safety of others, including parties for whom they have a professional responsibility.

To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors, assume any liability for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions, or ideas contained in the material herein.

British Library Cataloguing-in-Publication Data

A catalogue record for this book is available from the British Library

Library of Congress Cataloging-in-Publication Data

A catalog record for this book is available from the Library of Congress

ISBN: 978-0-12-821737-5

For information on all Academic Press publications visit our website at https://www.elsevier.com/books-and-journals

The cover image comes from a 12th century English manuscript (currently housed at The Walters Art Museum, Baltimore) that served as a scientific textbook for monks. This diagram presents a geocentric view of the universe, describing the movement of the seven planetary bodies (the Sun, the Moon, and the five known planets, listed on the vertical axis) around the earth. The zodiacal names are listed on the top horizontal. The path of each planetary body through the zodiac moves from left to right. The bottom of the diagram lists the distances of planetary bodies using musical values (tone, semitone, three semitones).

© ISO. This material is adapted from ISO 8000-16:2016, with permission of the American National Standards Institute (ANSI) on behalf of the International Organization for Standardization. All rights reserved.

Publisher: Mara Conner

Editorial Project Manager: Mariana L. Kuhl

Production Project Manager: Punithavathy Govindaradjane

Cover Designer: Vicky Pearson

Typeset by MPS Limited, Chennai, India

In Memoriam

In words, like weeds, I’ll wrap me o’er,
Like coarsest clothes against the cold:
But that large grief which these enfold
Is given in outline and no more.

– Alfred, Lord Tennyson.

Mr. J. J. O’Brien, 1933–2015

Ryan O. Sebastian, 1988–2017

Mr. C. M. Sidlo, 1928–2018

Professor Joe Voelker, 1947–2020

Section 1 Data in Today’s Organizations

Section 3 Data Quality Management Practices

Foreword

Why PROFESSIONAL Data Quality Management Is so Important

Have you ever wondered if finance professionals actually like finance? I can’t figure it out. On the one hand, it seems like a cool profession. On the other hand, heads of finance groups report that their teams spend three-quarters of their time on mundane data tasks, not finance. So maybe not.

Have you ever wondered if data scientists like to see their clever, new AI-driven models succeed in helping solve some of their companies’ most difficult problems? On the one hand, Harvard Business Review advises that data science is the sexiest job of the 21st century, and solving tough problems seems pretty sexy to me. On the other hand, data scientists are well aware of “garbage in, garbage out” and that poor data quality keeps their models in the lab.¹ So maybe not.

Have you ever wondered if senior managers enjoy making timely, inspired decisions? That seems like their job. But most agree that they must do so in the face of numbers that are clearly wrong and disparate reports. That seems miserable to me, yet they don’t take steps to resolve the issues. So maybe not.

Note that I would get the same answers if I asked equivalent questions about practically any profession. It seems as if everyone’s job has two components: Their job and dealing with the mundane data issues that slow them down and make them less effective.

We all experience the impact of poor data quality. And most of us have grown way too tolerant of the problem and the little indignities it adds to our lives.

In a similar vein, have you ever wondered if companies really want to make money? Everyone I talk to insists that this is the case. Yet poor data means all employees must spend/waste incredible amounts of time dealing with data issues. People do their best, but plenty of errors leak through to customers, causing all sorts of problems. Bad data breeds mistrust among teams, departments and customers. It makes it more difficult to unleash data’s transformative powers – at today’s levels of data quality, AI is downright scary! A good first estimate is the associated costs come to 20% of revenue.² Attacking data quality issues may be the single best step a company can take to improve profits and build its future. So maybe companies aren’t all that interested in making money.

On the broad canvas, poor-quality data threatens democracies, gets people killed, and throttles much innovation.

Owing to the impact, scope, and ubiquity of data quality issues, we’re all going to have to get involved in making improvements. And if the general indifference to the issues isn’t enough, other factors complicate matters still further. Consider the following scenario. You receive a phone call on your commute home. It’s the principal of your teenager’s school. Your teenager was involved in a fight and is being suspended for the next week.

You arrive home and ask your teenager, “How was your day?” “Great,” they reply. “I got an A on my Chinese test.”

1 Nithya Sambasivan, Shivani Kapania, Hannah Highfill, Diana Akrong, Praveen Kumar Paritosh, Lora Mois Aroyo. SIGCHI, ACM (2021). “Everyone wants to do the model work, not the data work: Data Cascades in High-Stakes AI.” https://research.google/pubs/pub49953/
2 Redman, T. C. (2017). Seizing Opportunity in Data Quality. MIT Sloan Management Review. November 27, 2017. https://sloanreview.mit.edu/article/seizing-opportunity-in-data-quality/

Copyright © 2022 Thomas C. Redman. Published by Elsevier Inc. All rights reserved.


Presumably, your kid told the truth. But clearly, they did not provide the high-quality data you sought. Data quality, it seems, is more than just “correctness.”

The second complication is that many people make addressing data quality seem easier than it is. Throughout my career, I’ve heard hundreds of versions of the following, all promising an easy solution: “We all know you do data quality as you move the data to the warehouse.” “In the age of big data, the sheer quantities of data mean you don’t have to worry about quality.” “Our data catalog will solve all your data quality problems.” “Data governance will solve data quality woes.” “The problems will all melt away when we start using AI.” Yet here we are!

Dealing with all this requires the leadership of PROFESSIONAL data quality managers; skilled and experienced in the approaches and techniques needed to attack the issues properly; savvy enough to build needed political support and get everyone involved; wise enough to know the effort will take a long time; impatient enough to get on with it; courageous enough to rebut the nonsense promoted by amateurs.

There is much to do and much to learn. My top-shelf features works by AT&T, Beer, Covey, Brown and Duguid, Davenport, Deming, Drucker, English, Goldratt, Hay, Juran, Kent, Kotter, Ladley, Laney, McGilvray, Porter, the early work of my group at Bell Labs, Roberts, Silver, Tufte, the early work of Wang’s group at MIT, Zachman, and others. I am adding Laura Sebastian-Coleman to that shelf. This book covers the hard stuff, from in-depth discussion on why data quality matters, setting strategy, why managing data quality at scale is so difficult, to the brutal politics associated with data. It is the only one aimed specifically at PROFESSIONAL DATA QUALITY MANAGERS. So, if you are one of those, or hope to become one, start here.

Professional data quality manager is a big job. It is most certainly not the “sexiest job of the 21st century,” but it is quite possibly the “most needed and important job of the 21st century.”

Thomas C. Redman

“the Data Doc”, Rumson, NJ, United States
June 2021

Glossary

“Words are not as satisfactory as we should like them to be, but, like our neighbors, we have got to live with them and must make the best and not the worst of them.”
Samuel Butler

ABC Controls Audit/balance/control (ABC) processes govern the movement of data within and between systems. Also referred to as system controls or control reports. These controls involve counting or summing records at different points during data processing to ensure that data has not been lost or “dropped” as part of processing. These work on traditional formats (e.g., tables, files) in which the data itself is static, rather than dynamic.

Access Metadata A type of operational metadata that describes what people, processes, and systems accessed which data sets at what time, and for what data. It includes details like access IDs, SQL code, and execution time of queries. It can be mined for access and usage patterns.

Accessibility A dimension of quality that measures the degree to which data consumers can obtain data from an appropriate source and in a form that allows them to use it.

Accountability To be accountable means to be answerable, literally “liable to be called to account” (Online Etymological Dictionary). Accountability means a person is expected to act in a defined manner and must be able to explain their actions. To have accountability for data means having overall responsibility for a data set and defining clear responsibilities for those that produce the data; people accountable for data also ensure that data is in its expected condition and are able to explain the condition of data under their control.

Accuracy A dimension of quality. The degree to which data correctly represents attributes of a real-world object, concept, or event.

Asset A useful or valuable thing; a kind of property available to meet debts and other commitments.

Attribute In logical data modeling, an attribute represents a characteristic of an entity. Because of this use, attribute is sometimes understood as a data element (a piece of data used to represent a characteristic of an entity), a field (part of a system used to display or intake data), or a column (a place in a table to store a defined characteristic of a represented entity, i.e., to store values associated with data elements).

Authorized Data Source A system or process that has been designated by a data governance team as the place from which to obtain data for specific uses. For example, the customer relationship management system may be designated as the authorized data source for obtaining data for email campaigns. Synonyms include: authoritative data source and system of record.

Basel Accords A voluntary regulatory framework for the financial services industry created in response to the global financial crisis of 2008. The accords provide a means to test bank capital adequacy, stress testing, and market liquidity risk. They have important effects on how financial institutions handle their data.

Baseline Measurement A baseline measurement is taken to serve as a point of comparison for subsequent measurement. ASQ defines it as “The beginning point, based on an evaluation of output over a period of time, used to determine the process parameters prior to any improvement effort; the basis against which change is measured.” (Source: ASQ.)

Big Data Data characterized by large volume (large amounts of data created), variety (a range of formats), and velocity (data created and made accessible very quickly). For processing and storage, Big Data requires different technologies than does traditional data.

Business Glossary A system used to document and manage terms and definitions needed to understand an organization’s business concepts and to describe the relationship among concepts. In a complex organization, business glossaries are needed to ensure that people understand each other. Business glossaries play a critical role in data management because they enable data consumers to make connections at the conceptual or logical level among data elements that may have different names and different structures in different systems. A business glossary is intended to supply system-agnostic information related to data (as opposed to a data dictionary, which describes data that has been instantiated in a system).

Business Metadata Business metadata focuses largely on the content and condition of the data and includes details related to data governance. Business metadata includes the non-technical names and definitions of concepts, subject areas, entities, and attributes; attribute data types and other attribute properties; range descriptions; calculations; algorithms and business rules; and valid domain values and their definitions. (Source: DMBOK2.)

Business Process Owner A person who is accountable for execution of a business process that produces data. Ideally the business process owner should be accountable for the quality of the outputs from the process, one of which is data.

Business Rule A defined constraint on a business process. Business rules describe what must happen or what cannot happen within a process. Business rules can be used as requirements for technical processes or to establish data quality expectations.

Business Strategy A plan for an organization to achieve its goals. This includes aligning the people, processes, technology, and information required to meet these goals.

Business Terminology The words an organization uses to describe itself and its processes. The concept of business terminology recognizes that each organization has its own vocabulary for its business processes and that there is often a difference between how concepts are defined in business processes and how they are named when they are defined as data in systems. There are also differences in naming conventions among systems. Business terminology provides a means by which data can be understood in relation to business processes and in relation to other data. In doing so, it helps people recognize when instances of the same data are named differently as well as when instances of different data have the same name.

California Consumer Privacy Act (CCPA) Data privacy and protection legislation enacted by the state of California in 2018.

Certified Data Data that has passed certification requirements. These requirements may include a range of criteria, such as: data is populated completely, data reconciles to a defined standard, data has passed all data quality rules, data is supported by metadata. Criteria will differ, depending on the data set, but certification requirements must clearly state what the criteria are for a data set.

Clarity A dimension of quality. The degree to which data values are unambiguous, distinct, non-overlapping, and comprehensible.

Codified Data Information that is represented by code values. To codify a thing is to reduce it to a code. For example, an ICD-10 code provides a short-hand way to refer to a medical diagnosis. A US Zip code refers to an area where mail is delivered.

Column Column/field/attribute. A column is a component part of a table in a database. Tables are made up of rows, each representing one instance of an entity represented by the table, and columns, containing characteristics of the represented entity. Field and attribute are usually understood as synonyms for columns because they also contain characteristics of a represented entity. But columns are specific to tables.

Column Profile See Distribution of Values.

Commerce-based Organizational Data See Organizational Data.

Completeness A dimension of quality that answers the question: Do I have all of the data I need or expect to have? Completeness describes the degree to which a data set contains all required data. It can be understood at the data set level (Does the data set cover the population required?); the record level (Does the data set contain all of the fields needed for the analysis of the population?); the field level (Are all mandatory fields populated? Are optional fields populated according to population rules?). Completeness can also be seen through the lens of other requirements: all the data for a population, from a system, for a time period.
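The record-level and field-level questions above can be sketched in a few lines of Python. This is a minimal illustration, not a method from the book: the function names, the sample `customers` records, and the choice to treat `None` and the empty string as "not populated" are all assumptions made for the example.

```python
from typing import Any


def missing_fields(records: list[dict[str, Any]], required: set[str]) -> set[str]:
    """Record-level check: required fields that appear in no record at all."""
    present: set[str] = set()
    for record in records:
        present.update(record.keys())
    return required - present


def populated_fraction(records: list[dict[str, Any]], field: str) -> float:
    """Field-level check: share of records in which the field is populated.

    Assumption: None and the empty string both count as unpopulated.
    """
    if not records:
        return 0.0
    populated = sum(1 for r in records if r.get(field) not in (None, ""))
    return populated / len(records)


# Hypothetical sample data, invented for illustration only.
customers = [
    {"id": 1, "name": "Ada", "email": "ada@example.com"},
    {"id": 2, "name": "Grace", "email": None},
    {"id": 3, "name": "", "email": "alan@example.com"},
    {"id": 4, "name": "Edsger", "email": "ed@example.com"},
]

print(missing_fields(customers, {"id", "name", "email", "postal_code"}))
print(populated_fraction(customers, "email"))
```

Data set-level completeness (whether the population is covered) cannot be checked from the data alone; it needs an external reference count, which is why it is left out of this sketch.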

Consistency A dimension of quality that answers the question: Does the data conform to patterns defined through other instances of the same data set? The degree to which data exhibits expected patterns.

Consumer Data Rights (CDR) Data privacy and protection legislation enacted by the Australian federal government in 2019.

Control A means of providing feedback within a system. A data control can be set up as binary; the data either meets the conditions of the control, or it does not. If the data does not meet the conditions of the control, then the feedback to the system is to stop the process. A control can also provide feedback through a tolerance level or threshold (e.g., x% of data met the condition of the control; 100% − x% did not). (Source: Sebastian-Coleman, 2013.)
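The two kinds of control described above, binary and threshold-based, might be sketched as follows. The function names, the `amounts` sample, and the non-negativity condition are hypothetical and chosen only to make the distinction concrete.

```python
from typing import Callable, Iterable


def binary_control(values: Iterable[float], condition: Callable[[float], bool]) -> bool:
    """Binary control: passes only if every value meets the condition."""
    return all(condition(v) for v in values)


def threshold_control(
    values: Iterable[float],
    condition: Callable[[float], bool],
    min_pass_pct: float,
) -> tuple[float, bool]:
    """Threshold control: returns (x, passed), where x% of values met the
    condition (so 100 - x% did not) and passed means x >= min_pass_pct."""
    items = list(values)
    if not items:
        raise ValueError("cannot evaluate a control on an empty data set")
    pass_pct = 100.0 * sum(1 for v in items if condition(v)) / len(items)
    return pass_pct, pass_pct >= min_pass_pct


def is_non_negative(amount: float) -> bool:
    """Hypothetical control condition: amounts must not be negative."""
    return amount >= 0


# Hypothetical sample data, invented for illustration only.
amounts = [10.0, 25.5, -3.0, 40.0]

print(binary_control(amounts, is_non_negative))
print(threshold_control(amounts, is_non_negative, min_pass_pct=70.0))
```

In a real pipeline, a failed binary control would typically halt processing, while a threshold control would feed a report or alert; what happens on failure is a design decision the sketch leaves open.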

Correctness A dimension of quality that answers the question: Does this data accurately represent what it purports to represent? Correctness can be understood in terms of format (Is the data formatted correctly?), definition (Does the definition accurately describe the data?), or data values (Is the value in the column the right value for this entity instance?). Correctness is a synonym for accuracy.

Cost of Poor Quality (COPQ) The cost associated with providing poor-quality products or services. There are four categories: internal failure costs (costs associated with defects found before the customer receives the product or service), external failure costs (costs associated with defects found after the customer receives the product or service), appraisal costs (costs incurred to determine the degree of conformance to quality requirements), and prevention costs (costs incurred to keep failure and appraisal costs to a minimum). (Source: ASQ.)

Cost-Benefit Analysis (CBA) A comparison of the costs of implementing a change and the expected benefits of implementing the change. Also called benefit-cost analysis. A CBA accounts for hard numbers (direct measurable costs) and soft numbers (expected effects on people’s behavior and attitudes). Data quality improvement efforts should include a CBA.

Critical Data Element (CDE) Data that is required by the business to execute its processes and must be of high quality for those processes to execute successfully. Critical data includes data required to serve customers; meet strategic goals; ensure compliance with laws, regulations, and contractual obligations; and measure its own success. CDEs are the focus of data quality monitoring and improvement efforts. (Source: Jugulum, 2014.)

Critical Data Elements and Relationships (CDERs) The R added to CDE emphasizes the fact that many data quality challenges are not limited to isolated data elements but to the relationship among component pieces of data.

Currency A dimension of quality. The degree to which data is maintained and values correctly reflect their real-world counterparts within a given timeframe.

Data A means of encoding and sharing knowledge and information about the real world. Data is the representation of selected characteristics of objects, events, and concepts, expressed and understood through explicitly defined conventions related to their meaning, collection, and storage. Since the introduction of the computer in the mid-20th century, the word data has also been used to refer to any information captured in or processed by a computer or other information technology system.

Data as an Asset An organization’s data is an asset in that it can be used by the organization to create value. Data is often compared to other organizational assets: people, equipment, money, and intellectual property. When talked about as an asset, data is considered a broad category. The totality of an organization’s data is an asset to that organization. (Source: DMBOK2.)

Data as Data An understanding of how data functions in a semiotic system; how it is created or collected, structured, and organized for use; and how it may change over time and have different uses in different circumstances (e.g., knowledge of the data life cycle).

Data Asset The recognition of data as an asset has led to the idea of a data asset. A data asset is a set of data that brings value to the organization in particular ways that can be recognized as distinct from those of other data sets. A data asset can be large or small, formally or informally defined. The ability to distinguish individual data assets within the entirety of an organization’s data is useful because different data sets have different value to the organization (just as different financial instruments or investments may have different value to the organization). Being able to identify and recognize the value of individual data sets helps an organization prioritize work related to managing different data sets. For example, it may be far more important to get customer data correct than to get vendor data correct. Taken together, these individual data assets comprise the overall data assets of the organization.

Data Catalog A list or inventory of objects in a database or other system/platform, with basic metadata about them (e.g., name, definition, origin, business purpose, subject area, format, categories, tags of data, and other attributes about the data object that may be important to its use by people, processes, or systems). A data catalog can contain references to data at different object levels (e.g., files, tables, and views), and it is organized in a way that allows it to be associated with finer levels of grain (e.g., fields, relationships). Data catalogs support data discovery. They enable people to bring together sets of data for different purposes (e.g., via tagging or categorization). Data catalogs can be created manually or generated by running a program against a database, or through a combination of both.

Data Certification The process of measuring data against defined certification criteria (certification requirements) and determining the degree to which it adheres to the criteria. Criteria may include standards for data quality, conditions for management, level of security/protection, quality of metadata and other supporting information, or a combination of these things. Standards for certification may be based on requirements external to the organization that wants its data certified (e.g., data is “certified” if it meets requirements for BASEL or SOX), or standards may be internally defined (Critical Data Elements are certified based on business-defined requirements). Different levels of data may be certified (e.g., system, data set, data relationship, data element). Data certification is not a one-time event. After initial certification, data must be periodically re-audited for certification requirements. Different standards for certification may also be defined and measured (e.g., “Gold” standard certification means data has passed all requirements; “Silver” may mean it has passed 95% of requirements; “Bronze” may mean it has passed 90%, and so on.)
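The tiered example in this entry can be expressed as a small function. This is only a sketch: the entry says the tiers "may mean" these pass rates, so the cut-offs below are taken as illustrative, and the function name and the "Not certified" label are assumptions introduced here.

```python
def certification_level(passed: int, total: int) -> str:
    """Map a count of passed requirements to an illustrative certification tier.

    Assumed tiers, following the glossary's example: Gold = all requirements
    passed, Silver = at least 95%, Bronze = at least 90%.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= passed <= total:
        raise ValueError("passed must be between 0 and total")
    # Integer comparisons avoid floating-point rounding at the tier boundaries.
    if passed == total:
        return "Gold"
    if passed * 100 >= 95 * total:
        return "Silver"
    if passed * 100 >= 90 * total:
        return "Bronze"
    return "Not certified"


print(certification_level(40, 40))
print(certification_level(38, 40))
print(certification_level(36, 40))
print(certification_level(35, 40))
```

Because certification is not a one-time event, a function like this would be rerun at each re-audit, and the resulting tier may change over time.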

Data Chain See Data Supply Chain.

Data Consumer Any person, process, or system that uses data. The term is used to distinguish between data producers (who create data) and those who use data. A data consumer for one process may be a data producer for another process. Synonyms include data customer, end user, and target system.

Data Democratization The process of enabling a wider group of people to access and use data. It involves removing obstacles to data usage and encouraging more people to become data literate. It is also associated with reduced reliance on IT resources to use data.

Data Dictionary A data dictionary contains table and column names and definitions along with other information that helps data consumers use data. Data dictionaries often contain details about the physical structure of data (e.g., the key structure for a table and the data type and field lengths for columns). A comprehensive data dictionary for a relational database will include information on standard joins between tables. It may even include filter criteria for particular kinds of reports. A data dictionary is usually set to support the use of a particular system. See also Business Glossary.

Data Discovery The exploration and assessment of data that is conducted to understand its structure, content, and potential for use. Discovery often includes data preparation. It also may use data visualization techniques to identify patterns and outliers (thus going beyond basic data profiling). Discovery can be used to understand what data exists in a system and to determine whether data can be used for a particular purpose. It can be conducted against an existing data store or against candidate data for a project. It can be focused on multiple sources and the relationship among them, on a single data source, or even on an individual data set.

Data Element A data element is a part of a data set. It is usually understood as a column or field, but a data element can include multiple attributes (e.g., “address” with all of its attributes can be thought of as a single data element).

Data Environment A broad term for the collection of factors that influence the creation and use of data: the business processes through which data is produced, the technical systems through which it is produced and stored, the data itself, metadata (including data standards, definitions, and specifications), technical architecture (including data access tools), and data uses. All of these have implications for understanding data quality. The data environment includes the people, processes, and technology involved in creating and using data. (See English, 1999; McGilvray, 2021.)

Data Governance Organizational oversight of data and data-related processes. The Data Governance Institute describes it as “a system of decision rights and accountabilities for information-related processes, executed according to agreed-upon models which describe who can take what actions with what information, and when, under what circumstances, using what methods.” (See The Data Governance Institute, Datagovernance.com.)

Data Governance Organization A team formally assigned to implement and manage a data governance program.

Data Governance Program A set of projects and processes put together to define responsibilities for data, establish and execute processes for making decisions about data, and ensure that responsibilities for managing data are executed consistently. Data governance programs may include multiple components that focus on aspects of data management, such as data policy, data security, metadata management, data quality management, reference data management, and master data management.

Data Governance Strategy A data governance strategy describes how an organization will implement, execute, and derive benefit from data governance functions. For example, such a strategy will state how the organization will define decision rights and accountabilities for data as well as the behaviors that the organization will adopt and enforce to ensure that data is used to benefit the organization and not used to the detriment of the organization.

Data Issue See Data Quality Issue.

Data Life Cycle A set of high-level phases related to how data is created, changes over time, and is disposed of. Based on the product development life cycle, this set of phases has been described differently by different experts, but all versions contain the idea that data is created or obtained, stored, used, and disposed of. The data life cycle is critical to data management because there are different management requirements at the different phases of the life cycle. The data life cycle differs from both the system development life cycle (SDLC), which describes how projects are executed, and the data chain, which describes how data moves within and between systems to meet the needs of a particular organization. (Source: Sebastian-Coleman, 2013.)

Data Lineage Data lineage is a form of metadata that describes the movement of and changes to data as it passes through systems and is adopted for different uses. Lineage can be described at different levels of detail (process-to-process, system-to-system, table-to-table, column-to-column). A documented data chain for an organization is a version of data lineage. Data lineage refers to a set of identifiable points that can be used to understand details of data movement and transformation (e.g., transactional source field names, file names, data processing job names, programming rules, target table fields). Most people who are concerned with the lineage of data want to understand two aspects of it: the data’s origin and the ways in which the data has changed since it was originally created. Change can take place within one system or among systems. (Source: Sebastian-Coleman, 2013.)

Data Literacy The ability to read, understand, interpret, and learn from data in different contexts and the ability to communicate about data to other people.

Data Management Strategy A data management strategy defines how the organization will manage and support the data it needs over time. Data management includes a range of functional areas, each of which may have its own strategy. All component pieces of a data management strategy must support the business strategy.

Data Mapping Specification A data mapping specification documents the rules associated with moving data point-to-point along the data supply chain. A mapping specification may describe movement at the file/table level or at the column/field level. Also called a source-to-target map (STM or STTM).
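A column-level source-to-target map can be pictured as a list of rules, each naming a source field, a target field, and a transformation. The sketch below is one hypothetical way to represent and apply such a specification; the field names (`cust_nm`, `st_cd`) and the rule structure are invented for illustration and do not come from the book.

```python
from typing import Any, Callable

# Each rule: (source field, target field, transformation applied in transit).
MappingRule = tuple[str, str, Callable[[Any], Any]]

# Hypothetical column-level mapping specification.
customer_mapping: list[MappingRule] = [
    ("cust_nm", "customer_name", str.strip),  # trim surrounding whitespace
    ("st_cd", "state_code", str.upper),       # standardize to upper case
]


def apply_mapping(source_record: dict[str, Any], spec: list[MappingRule]) -> dict[str, Any]:
    """Build a target record by applying each mapping rule to the source record."""
    return {target: rule(source_record[source]) for source, target, rule in spec}


source = {"cust_nm": "  Ada Lovelace ", "st_cd": "ct"}
print(apply_mapping(source, customer_mapping))
```

Keeping the specification as data, separate from the code that applies it, means the same document can serve as both executable rules and lineage documentation.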

Data Mart A data collection put together to serve specific purposes.

Data Model A visual representation of data content and the relationships between data entities and attributes, created for purposes of understanding how data can be (or actually is) organized or structured. Data models include entities (understood as tables), attributes (understood as columns containing characteristics about represented entities), relationships between entities, and integrity rules along with definitions of all of these pieces. Logical data models and physical data models have different attributes and are related to each other. A data model contains a set of symbols with text labels that attempts visually to represent data requirements as communicated to the data modeler, for a specific set of data that can range in size from small (for a project) to large (for an organization). The model is a form of documentation for data requirements and data definitions resulting from the modeling process. Data models are the main medium used to communicate data requirements from business to IT and within IT, from analysts, modelers, and architects, to database designers and developers. (Sources: DAMA, 2017; Hoberman, 2009; Sebastian-Coleman, 2013.)

Data Modeling The process of discovering, analyzing, and scoping data requirements and then representing and communicating these data requirements in a precise form called the data model. Data models depict and enable an organization to understand its data assets. (Source: Hoberman, 2009.)

Data Monetization The process of deriving economic value from data, either directly (by selling data or incorporating data into other products) or indirectly (by using data to support the exchange of other goods and services). (Source: Laney, 2018.)

Data Owner A person who is accountable for the quality of data in the widest sense of the term quality. Ownership can be defined at the process, system, domain, or data set level. Ownership must be defined not only in relation to the scope of data, but also within the data chain and the data life cycle. Ownership of data may change at different points in the data chain or at different phases within the data life cycle, or one person may be accountable for data throughout the data chain or across the data life cycle.

Data Pipeline A term used to describe data coming into a data lake or other large unstructured data storage system.

Data Processing Metadata A type of operational metadata collected via the execution of programs that move data between and within systems, often transforming it as part of this movement. It includes the history of extracts and results; details of the timing, size, and success of ETL processes; data captured via ETL logs and audit/balance/control processes; and error logs. This metadata can be mined to identify schedule anomalies, patterns in file sizes, and the consistency of data delivery and processing. Data processing metadata is supported by metadata that describes SLA requirements, processing schedules, and source system contact information. It can also be used to aggregate volume metrics.

Data Producer Any person, process, or system that creates data. The term is used to distinguish between data consumers (who use data) and data creators (who make data). A data consumer for one process may be a data producer for another process.

DataProfiling Aformofdataanalysisusedtoinspectdataandassessthequalityofdataandmetadata. Usingstatisticaltechniquestodiscoverdatastructureandcontent,profilingenablesanalyststodetermine howcloselydataalignswiththeexpectationsofdataconsumers.Profilingprovidesapictureofdatastructure,content,rules,andrelationshipsbyapplyingstatisticalmethodologiestoreturnasetofstandardcharacteristicsaboutdata:datatypes;fieldlengths;cardinalityofcolumns;granularity;valuesets;format patterns;contentpatterns;impliedrules;cross-columnandcross-filedatarelationships;andcardinalityof theserelationships.Therearebenefitstoprofilingdataindifferentcontexts.Forexample,comprehensive profilingofdataatthebeginningoftheprojectlifecyclecanbeusedtotestassumptionsaboutdatain ordertoreduceprojectrisks.Profilingofdatawithinasystemcanalsobeusedinqualityimprovement effortstoidentifydataissues,improvemetadata,definedataqualitymeasurements,setthresholds,andso on.(Sources:Olson,2003;Sebastian-Coleman,2013.)

DataProfilingResults Thedetaileddatavaluesreturnedwhendataisrunthroughaprofilingprocess.For example,profilingresultsforcolumnsusuallyincludeminimumandmaximumvalues,impliedformatand datatype,alongwithafrequencydistributionofthesetofvalues.
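
As an illustration only, the column-level results described above might be produced by a sketch like the following. The function name profile_column, the two-way integer/string type inference, and the sample values are assumptions made for this example; real profiling tools infer many more types and formats.

```python
from collections import Counter

def profile_column(values):
    """Return a small column profile (hypothetical helper, not a library API).

    Assumes values are strings or None, as they might arrive from a flat file.
    """
    non_null = [v for v in values if v is not None]

    def is_int(text):
        # A value "looks like" an integer if int() can parse it.
        try:
            int(text)
            return True
        except (TypeError, ValueError):
            return False

    # Implied type: integer only if every non-null value parses as one.
    all_ints = bool(non_null) and all(is_int(v) for v in non_null)
    implied_type = "integer" if all_ints else "string"

    # Compare as numbers when the implied type is integer, else as text.
    comparable = [int(v) for v in non_null] if all_ints else non_null

    return {
        "min": min(comparable) if comparable else None,
        "max": max(comparable) if comparable else None,
        "implied_type": implied_type,
        "null_count": len(values) - len(non_null),
        "frequencies": dict(Counter(non_null)),
    }

# Made-up sample column.
ages = ["34", "27", "34", None, "61"]
profile = profile_column(ages)
print(profile)
```

Comparing values as integers when the implied type is integer avoids the alphabetical ordering problem, where "9" would otherwise sort above "61".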

Data Provenance The origin of the data, usually understood as the system or process through which data is created.

Data Quality A measure of the degree to which data meets the expectations and requirements of data consumers. This idea is sometimes expressed as the degree to which data is “fit for a purpose.” The term data quality is also sometimes used to refer to the activities and tools used to manage the quality of data. (See Data Quality Management.)

Data Quality Dimension A characteristic of data that can be measured and through which its quality can be quantified. There are many frameworks that define data quality dimensions. There is no agreed-to set. However, all account for similar concepts that have a commonsense meaning.

Data Quality Issue Any condition of data that presents an obstacle or a risk to a data consumer’s use of that data regardless of who discovered the issue, where/when it was discovered, what its root causes are, or what the options are for remediation.

Data Quality Issue Management The process of removing or reducing the impact of obstacles to the use of data by data consumers (people, processes, and systems) by identifying, analyzing, quantifying, prioritizing, and remediating the root causes of obstacles to data consumers’ uses of data. (Source: Sebastian-Coleman, 2013.)

Data Quality Management A set of activities intended to ensure that data is fit for the purposes of data consumers; it includes the core activities required to assess, measure, and report on the condition of data as well as those required to manage data issues, prevent problems, and improve the quality of data.

Data Quality Management Strategy A plan that defines the core data quality management capabilities the organization will implement, what data sets they will be applied to, what tools will be used, who will be responsible for executing them, and what the expected benefits will be to the organization.

Data Quality Measurement Results The detailed data values returned when data quality rules or reasonability tests are executed. For example, measuring the completeness of a column will return the count of records with a NULL, the count of total records, the percentage of records violating the rule, and potentially other calculations that put the individual measurement result in context (e.g., threshold value, mean of past results). (Source: Sebastian-Coleman, 2013.)
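
The completeness example above can be sketched in a few lines. This is a minimal illustration, not a prescribed method: the function name measure_completeness, the result field names, and the sample Member records are all assumptions made for the example.

```python
def measure_completeness(records, column, threshold_pct):
    """Measure how many records have no value in `column` (hypothetical helper).

    A record violates the rule when the column is missing or None.
    """
    total = len(records)
    null_count = sum(1 for record in records if record.get(column) is None)
    pct_violating = null_count * 100 / total if total else 0.0
    return {
        "column": column,
        "total_records": total,
        "null_records": null_count,
        "pct_violating": pct_violating,
        # Context for the individual result: the threshold it is judged against.
        "threshold_pct": threshold_pct,
        "within_threshold": pct_violating <= threshold_pct,
    }

# Made-up Member records; one is missing its first name.
members = [
    {"member_id": 1, "first_name": "Ana"},
    {"member_id": 2, "first_name": None},
    {"member_id": 3, "first_name": "Li"},
    {"member_id": 4, "first_name": "Omar"},
]
result = measure_completeness(members, "first_name", threshold_pct=5.0)
print(result)
```

Returning the raw counts together with the percentage and the threshold keeps each measurement result interpretable on its own, which is the point of the contextual calculations the definition mentions.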

Data Quality Monitoring The ability, through measurements and other controls, to track levels of quality within a system or process to ensure that it continues to meet requirements, to detect unexpected changes in patterns of size, composition, or other characteristics of the population, and to detect potential issues so that action can be taken in response to them.

Data Quality Rule A constraint defined on a quality dimension for a data set, a data element, or the relationship between data elements. Data quality rules can be used to validate or measure data as well as to monitor data. Data quality rules can be the basis of transformation rules to cleanse data.

Data Quality Standard An assertion about the expected condition of data, usually related directly to a quality dimension: how complete the data is; how well it conforms to defined rules for validity, integrity, and consistency; and how it adheres to defined expectations for presentation. A standard may be expressed as a simple rule (“First Name must be populated on Member records”) or may describe a set of conditions that must be met (“Address data must conform to USPS standards”). Data quality standards should be measurable. Determining whether data meets a set of conditions may involve multiple measurements.

Data Set A collection of data brought together for a purpose. The definition of set may be simple with a clear purpose; for example, a table is a data set put together to represent attributes associated with a specific entity. Or it may be more complicated and serve multiple purposes; for example, a data domain may comprise multiple tables from multiple systems and be used for billing and interacting with customers as well as analysis and reporting.

Data Standards Assertions about how data should be created, presented, transformed, or conformed for purposes of consistency in presentation and meaning and to enable more efficient use. Data standards can be defined at the process (describing the required inputs or expected outputs), value (column/field), structure (table), or database levels. They have an impact on technical processing and storage of data as well as on data consumer access to and use of data.

Data Steward A steward is a person whose job is to manage the property of another person. Data stewards manage data assets on behalf of others in the best interests of the organization (McGilvray, 2021). Informally, a subject matter expert in a data domain, data set, process, or data element who acts accountably toward data and on whom others rely for information and expertise. Formally, as a job title, a person who has specifically defined responsibilities related to helping the organization create, manage, govern, use, and derive value from its data.

Data Stewardship Stewarding data is a way of interacting with data; specifically, acting with accountability for data for the good of the organization. Data can be “stewarded” at different levels, using informal and formal approaches. Stewardship is required of every individual who creates or uses organizational data. Data stewardship is the most common label to describe accountability and responsibility for data and processes that ensure effective control and use of data assets. (Sources: DMBOK2; Seiner, 2014.)

Data Store A generic name for a database, data warehouse, mart, lake, fabric, or other system that holds data for analytics, operations, or other purposes.

Data Strategy A plan that describes how the organization intends to get value from its data. A data strategy requires that the organization has the data it needs to support business goals and that the organization can use the data effectively. A data strategy must account for the data itself (how the organization will obtain or create the data it needs and how this data will be managed for value over its life cycle) and for the ability of the organization to use the data (accessibility and tooling; metadata and other explicit knowledge required for use; and the skills, knowledge, and experience of data consumers, that is, their data literacy).

Data Supply Chain The set of processes through which data is created and distributed within an organization or among organizations.

Data Validation A process of executing tests against data to determine whether it is usable or not. Validation can include a range of functions, from checking received data against control files to interrogating specific fields and rules. Different actions can be defined based on the results of the validation tests.

Data Valuation Data asset valuation is the process of understanding and calculating the economic value of data to an organization.

Data Warehouse An integrated, centralized decision support database and the related software programs used to collect, cleanse, transform, and store data from a variety of operational sources to support business intelligence. A data warehouse may also include dependent data marts. (Source: DAMA, 2011.)

Default Value A value populated in a column or field to indicate that a meaningful value is not available or does not pertain to the attribute in the context of the record.

Derived Data Data created within a system from other data, for example, through a complex transformation rule or calculation.

Designed Data Data that has been created through processes that account for its quality in relation to its potential uses. Designed data includes characteristics that help enable its use; for example, it is supported by high-quality metadata, and it meets identified standards for interoperability.

Dimension of Quality See Data Quality Dimension.

Distribution of Values A synonym for a column profile of values. It includes both the count of each distinct value in a column and the percentage of records associated with that value. In many cases, a distribution of values provides a quick way of assessing the reasonability of data content. A distribution for a data set or increment can be compared with other data sets (such as a benchmark or a previous instance of the same data set) to identify whether there are differences that may indicate a problem with the data.
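
One way to picture the comparison described above is the sketch below, which computes a distribution for two loads of the same column and flags values whose share of records has moved by more than a chosen number of percentage points. The function names, the 10-point cutoff, and the sample loads are assumptions for illustration; the comparison method is one simple option among many.

```python
from collections import Counter

def distribution(values):
    """Map each distinct value to (count, percentage of records)."""
    counts = Counter(values)
    total = len(values)
    return {value: (n, n * 100 / total) for value, n in counts.items()}

def compare_distributions(current, benchmark, max_shift_pct):
    """Return values whose percentage differs from the benchmark by more than
    max_shift_pct points, as {value: (benchmark_pct, current_pct)}."""
    cur = distribution(current)
    ref = distribution(benchmark)
    shifted = {}
    for value in set(cur) | set(ref):
        # A value absent from one side counts as 0 percent there.
        cur_pct = cur.get(value, (0, 0.0))[1]
        ref_pct = ref.get(value, (0, 0.0))[1]
        if abs(cur_pct - ref_pct) > max_shift_pct:
            shifted[value] = (ref_pct, cur_pct)
    return shifted

# Made-up status codes from a previous load and the current load.
previous_load = ["A"] * 50 + ["B"] * 40 + ["C"] * 10
current_load = ["A"] * 52 + ["B"] * 18 + ["C"] * 30
flagged = compare_distributions(current_load, previous_load, max_shift_pct=10.0)
print(flagged)
```

A flagged value is not necessarily an error; it is a prompt to check whether the change reflects the real population or a problem in how the data was produced.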

Edit A control on data input. Edits constrain what data can be input to a system. Some edits are simple (e.g., using a drop-down list of options to prevent an invalid value from being entered in a field). Others can be more complex (e.g., preventing a record from being inserted if an instance of the record already exists in the system).

Enterprise Data Model (EDM) A holistic, enterprise-level, implementation-independent conceptual or logical data model that provides a common consistent view of data across the enterprise. It is common to use the term to mean a high-level, simplified data model, but this is a question of abstraction for presentation. An EDM includes key enterprise data entities (i.e., business concepts), their relationships, critical guiding business rules, and some critical attributes. It sets forth the foundation for all data and data-related projects. (Source: DMBOK2.)

Entity In the process of entity resolution, an entity is “a real-world person, place, or thing that has a unique identity that distinguishes it from all other entities of the same type” (Talburt, 2011, p. 205). In the process of modeling, an entity is a concept being modeled and is sometimes used as a synonym for table.

Extract, Transform, Load (ETL) The process of pulling data from a source system, preparing it for use in a target system, and then loading it to that target system. ETL is the standard process for populating data in data warehouses and marts. In Big Data environments, an ELT (extract, load, transform) process is sometimes adopted. ELT speeds up the process of making data available by removing the need to transform data before loading it.

Field See Column.

File A set of data that has not been put into a data structure (e.g., a table).

Foreign Key Relationship A foreign key in a table is a reference back to another table. For example, a diagnosis code in a claim table will join to the diagnosis code table and connect with descriptions and other information about the meaning of the code.

Format Conformity A dimension of data quality that answers the question of whether data is in the expected form, usually defined via data type and length. Format conformity refers to the adherence of data to a defined physical format. For example, US Zip codes are five digits long. A Zip code that contains fewer or more than five digits does not conform to the expected length of a Zip code. A US Zip code that contains letters does not conform to the expected format of a Zip code. Some data elements have strict format requirements that are enforced via data type constraint (e.g., dates). Others have very broad requirements that may not be worth measuring (e.g., first name must be made up of letters, but can be of almost any length).
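
The Zip code example above translates directly into a pattern check. The sketch below assumes the five-digit rule exactly as stated in the definition (extended formats are deliberately out of scope), and the function name conforms_to_zip5 and the sample values are made up for the example.

```python
import re

# Exactly five ASCII digits, per the rule stated in the definition above.
ZIP5_PATTERN = re.compile(r"[0-9]{5}")

def conforms_to_zip5(value):
    """Return True when value is a string of exactly five digits."""
    return isinstance(value, str) and ZIP5_PATTERN.fullmatch(value) is not None

# Made-up samples: valid, too short, too long, and containing a letter.
samples = ["06103", "6103", "061034", "O6103"]
results = {sample: conforms_to_zip5(sample) for sample in samples}
print(results)
```

Note that the check treats the code as text rather than as a number, so a leading zero is preserved. Storing Zip codes in a numeric column is a common way to lose format conformity.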

Frequency Distribution See Distribution of Values.

General Data Protection Regulation (GDPR) European Union data protection legislation implemented in 2018.

Health Insurance Portability and Accountability Act (HIPAA) Legislation enacted by the US federal government in 1996; contains provisions that affect the use of protected health information.

Integrity A dimension of data quality that answers the question: Do the pieces of a data set relate to each other (fit together) in expected ways? Integrity refers to the state of being whole and undivided or the condition of being unified. Integrity is the degree to which data conform to data relationship rules (usually as defined by the data model) that are intended to ensure the complete, consistent, and valid presentation of data representing the same concepts. Integrity represents the internal consistency of a data set.

Issue Management See Data Quality Issue Management.

Management The process of controlling people, activities, or things to meet a set of goals.

Master Data Master data objects are core business objects used in different applications across an organization, along with their associated metadata, attributes, definitions, roles, connections, and taxonomies. Master data objects represent the “things” that matter most to an organization: those that are logged in transactions, reported on, measured, and analyzed. (Source: Loshin, 2008.)

Metadata Explicit (i.e., documented) knowledge about data that enables data to be created, understood, and used. Metadata is required for the use of data. The absence of metadata is a data quality issue because lack of information about data is an obstacle to the use of data. A wide range of information is included under the umbrella of metadata. Metadata includes information about technical and business processes, data rules and constraints, and logical and physical data structures. It describes the data itself (e.g., databases, data elements, data models), the concepts the data represents (e.g., business processes, application systems, software code, technology infrastructure), and the connections/relationships between data and concepts. (Source: DMBOK2.)

Metadata Asset A set of metadata that is used for particular purposes that bring value to the organization. For example, a data dictionary for a particular system is a metadata asset. Metadata assets are sets of metadata that, taken all together, constitute the metadata assets of the organization.

Metadata Management A specialized form of data management that focuses on managing metadata throughout its life cycle. Metadata management encompasses planning, implementation, and control activities to enable access to high-quality, integrated metadata. Metadata management activities focus on ensuring that high-quality metadata is widely accessible throughout the enterprise. (Source: DMBOK2.)

Metadata Model A data model that defines the attributes of and relationships among different types of metadata assets (e.g., business glossary, catalog, data dictionary, data quality standards, data processing metadata). As with other data models, it describes the attributes associated with each entity. For example, metadata attributes for each table in a database include things like: Table Business Name, Table Physical Name, Table Subject Area, Originating system for the table, and so on. It can be used to define requirements and to support overall metadata management.

MIN/MAX In a column profile, the minimum (lowest) and maximum (highest) values in a distribution of values, determined numerically, alphabetically, or through a combination of the two.

Normalization The process of structuring relational data so it follows the standards for normal form as defined by E. F. Codd. In less technical language, it means reducing multiple instances of an attribute to one instance so there is less redundancy and one and only one source for each value.

Operational Metadata Operational metadata describes details of the processing and accessing of data.

Optionality As part of data modeling, defining whether the population of a column is mandatory/required or optional (populated only under particular circumstances).

Organizational Culture “The pattern of beliefs, values and learned ways of coping with experience that have developed during the course of an organization’s history, and which tend to be manifested in its material arrangements and in the behaviors of its members” (Brown, 1998). More simply put, culture describes the way people work within an organization, with some reflection on why they work the way they do. The culture of an organization may be described through characteristics such as the attitude toward work itself, leadership style, and so on.

Organizational Data Data created and captured as part of the process of exchanging goods and services. Commercial organizations, governments, educational institutions, and non-profits all create data that reflects the operational practices of those organizations as well as the people and other organizations with which a given organization interacts.

Originating Source System A system that creates data. Used to signify where a particular data set, or even a single data element, was first created. The term is used to describe the role a system plays in the data life cycle for a particular subset of data. Systems can play multiple roles.

Parent-Child Relationship In a referential relationship, the entity that must “come first” is referred to as the parent entity. The entity that depends on the first entity is the child entity. For example, for an organization to sell products, it must first have a listing of its products. “Product” would be a parent entity to “Sales.” The use of parent/child as a metaphor for these relationships allows for other assertions; for example, orphan records are records on a child table for which references are not present on the parent table.

Personal Information Protection and Electronic Documents Act (PIPEDA) Canadian privacy protection legislation passed in 2000.

Precision A dimension of quality that describes the degree to which data meets a requirement for exactitude.

Process A series of steps that turns inputs into outputs.

Product The output from a process.

Reasonability A dimension of data quality that answers the question: Does this data conform to general expectations based on the population it represents?

Reference Data Reference data, sometimes referred to as look-up data or code and description data, associates codified data values with their meanings. It is used “to characterize other data in an organization or to relate data to information beyond the boundaries of an organization” (Chisholm, 2001). Reference data is critical to the use of other data. Without reference data, other data is often incomprehensible.

Referential Integrity (RI) The degree to which data in two or more tables related through a foreign key relationship is complete. Referential integrity is often explained through parent/child table relationships. Within a database, all values that are present in a child table should also be present in its parent table. If a child table has values that are not in its parent table (orphan values), it does not have referential integrity with the parent table. The concept of referential integrity can also be applied at the record level.
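
A check for orphan values, as described above, could look like the following sketch. It reuses the claim and diagnosis code tables from the Foreign Key Relationship entry, but the function name find_orphans, the column names, and the code values are invented for illustration. Skipping unpopulated keys is also an assumption; whether a NULL foreign key counts as a violation depends on the optionality of the column.

```python
def find_orphans(child_rows, parent_rows, child_key, parent_key):
    """Return child rows whose key value is not present in the parent table."""
    parent_values = {row[parent_key] for row in parent_rows}
    return [
        row
        for row in child_rows
        # Assumption: an unpopulated (None) key is treated as "not applicable",
        # not as an orphan.
        if row[child_key] is not None and row[child_key] not in parent_values
    ]

# Made-up parent (reference) table and child (transaction) table.
diagnosis_codes = [{"code": "D001"}, {"code": "D002"}]
claims = [
    {"claim_id": 1, "diagnosis_code": "D002"},
    {"claim_id": 2, "diagnosis_code": "D999"},  # no matching parent row
    {"claim_id": 3, "diagnosis_code": "D001"},
]
orphans = find_orphans(claims, diagnosis_codes, "diagnosis_code", "code")
print(orphans)
```

Building the set of parent values once keeps the check to a single pass over each table, which matters when the child table is large.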

Relevance A dimension of quality that answers the question: Will this data meet my needs? Relevance measures the applicability of the data to the goals of data consumers.

Risk The possibility that something unpleasant or unwelcome will happen. Risk to data is the possibility that something will negatively affect its quality and make it less fit for use or that it will be misused, either intentionally or unintentionally.

Risk Management Using managerial resources to integrate risk identification, risk assessment, risk prioritization, development of risk-handling strategies, and mitigation of risk to acceptable levels. (Source: ASQ.)

Root Cause The root cause of a problem is the fundamental reason the problem exists.

Service Level Agreement (SLA) A formal commitment between a team that provides services and the users of those services to meet specific levels of performance, for example, to provide services within a defined time frame. Production support teams for IT systems often have SLAs to define expected delivery of data and availability of the system.

Single Source of Truth A term used to describe an aspirational goal of some data management processes: to establish one system or data set that will contain the highest-quality data in the organization and that all data consumers will use as their source of data. The concept of a single source of truth is also contrasted with the idea of “multiple versions of the truth,” that is, multiple data sets that may represent the same people, objects, or events, but that may have variation based on how data is collected, structured, processed, or maintained.

Source System An application or database from which a person, process, or other system obtains data. The data may originate in the source system, or it may simply be stored there for usage.

Source-to-Target Mapping (STTM) See Data Mapping Specifications.

Standard Something considered by an authority or by general consent as a basis of comparison; an approved model. A standard can also be a rule or principle that is used as a basis for judgment. Standards embody expectations in a formal manner. To standardize something means to cause it to conform to a standard or to choose or establish a standard for something. ASQ defines standard as “the metric, specification, gauge, statement, category, segment, grouping, behavior, event or physical product sample against which the outputs of a process are compared and declared acceptable or unacceptable.” (Source: ASQ.)

Standardization To standardize data is to make it conform to a standard, however that standard is defined. (Note: In data management, standardization is different from normalization.)

Structured Data Data that is defined through a data model.

System An interconnected set of elements that is coherently organized to achieve a purpose. (Source: Meadows, 2008.)

System/Application An information technology (IT) system is an application designed to enable users of the system to execute processes and meet goals, whether these are business oriented (sell products) or have other ends (play a game). Information systems include hardware, software, peripheral equipment, and data.

System Constraint A condition in a system that prevents or minimizes actions that the system can perform.

System Control A means of providing feedback about automated activities carried out by an automated process or system. Controls may be put in place to stop a process or send an alert if conditions of the control are met. See ABC Controls.

System of Record A system that is designated as the place where the best copy of a data set or even a single data element exists.

Table A data-based structure comprising rows and columns that organize data about a defined entity.

Target System A system or application to which data is brought in order to be stored and used. For example, a data warehouse is a target system that is populated with data from disparate applications and databases.

Technical Data Skills The ability to query data, organize it, aggregate it, and present it for purposes of communicating about the data.

Technical Metadata Technical metadata provides information about the technical details of data, the systems that store data, and the processes that move it within and among systems.

Technology Strategy A technology strategy describes how the organization will improve and leverage technology in support of business goals. It will include the future state technical architecture the organization intends to build, how the organization will make decisions about technology, and what strategic drivers will influence these decisions.

Threshold See Tolerance.

Timeliness A dimension of quality. The degree to which data is available for use within a required time frame.

Tolerance The level of rule violation, error, or unexpected condition that is acceptable. May be expressed as a raw number or as a percentage of records or variance from a set number. May be calculated in different ways, depending on what is being measured and how measurements are executed.
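
To make the first two forms of tolerance concrete, the sketch below checks a count of violations against a limit expressed either as a raw number or as a percentage of records. The function name within_tolerance, its parameters, and the sample figures are assumptions for illustration; tolerance expressed as variance from a set number is not covered here.

```python
def within_tolerance(violations, total_records, max_count=None, max_pct=None):
    """Return True when the violations stay inside every limit that is supplied.

    max_count is a raw-number tolerance; max_pct is a percentage-of-records
    tolerance. Either may be omitted (None).
    """
    if max_count is not None and violations > max_count:
        return False
    if max_pct is not None:
        pct = violations * 100 / total_records if total_records else 0.0
        if pct > max_pct:
            return False
    return True

# Made-up measurement: 12 rule violations in a load of 1,000 records (1.2%).
print(within_tolerance(12, 1000, max_count=10))  # raw-number limit exceeded
print(within_tolerance(12, 1000, max_pct=2.0))   # percentage limit respected
```

The same measurement passes one check and fails the other, which is why the form of the tolerance needs to be agreed on along with its value.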

Traditional Data With the emergence of the term Big Data, the term traditional data has emerged to refer to data created in older technology. It is worth noting that even traditional data can be “big.”

Transformation Rule A transformation rule describes how data should be changed as it is brought from a source system into a target system. Transformation rules can be simple (directly moving a value; reformatting a field) or complex (combining a set of records to a single record).

Trustworthiness A dimension of data quality that measures the perceived reasonability of the data itself, confidence in the source of the data, and the reliability of the systems in which it is created and managed.

Uniqueness/non-duplication A dimension of quality; refers to the degree to which redundant data is present within a system or data structure. Uniqueness can be understood at the entity instance level (Does the master data contain multiple representations of the same customer?), at the record level (Does the sales data
