Foreword
Why PROFESSIONAL Data Quality Management Is so Important
Have you ever wondered if finance professionals actually like finance? I can’t figure it out. On the one hand, it seems like a cool profession. On the other hand, heads of finance groups report that their teams spend three-quarters of their time on mundane data tasks, not finance. So maybe not.
Have you ever wondered if data scientists like to see their clever, new AI-driven models succeed in helping solve some of their companies’ most difficult problems? On the one hand, Harvard Business Review advises that data science is the sexiest job of the 21st century, and solving tough problems seems pretty sexy to me. On the other hand, data scientists are well aware of “garbage in, garbage out” and that poor data quality keeps their models in the lab.[1] So maybe not.
Have you ever wondered if senior managers enjoy making timely, inspired decisions? That seems like their job. But most agree that they must do so in the face of numbers that are clearly wrong and disparate reports. That seems miserable to me, yet they don’t take steps to resolve the issues. So maybe not.
Note that I would get the same answers if I asked equivalent questions about practically any profession. It seems as if everyone’s job has two components: their job, and dealing with the mundane data issues that slow them down and make them less effective.
We all experience the impact of poor data quality. And most of us have grown way too tolerant of the problem and the little indignities it adds to our lives.
In a similar vein, have you ever wondered if companies really want to make money? Everyone I talk to insists that this is the case. Yet poor data means all employees must spend/waste incredible amounts of time dealing with data issues. People do their best, but plenty of errors leak through to customers, causing all sorts of problems. Bad data breeds mistrust among teams, departments, and customers. It makes it more difficult to unleash data’s transformative powers: at today’s levels of data quality, AI is downright scary! A good first estimate is that the associated costs come to 20% of revenue.[2] Attacking data quality issues may be the single best step a company can take to improve profits and build its future. So maybe companies aren’t all that interested in making money.
On the broad canvas, poor-quality data threatens democracies, gets people killed, and throttles much innovation.
Owing to the impact, scope, and ubiquity of data quality issues, we’re all going to have to get involved in making improvements. And if the general indifference to the issues isn’t enough, other factors complicate matters still further. Consider the following scenario. You receive a phone call on your commute home. It’s the principal of your teenager’s school. Your teenager was involved in a fight and is being suspended for the next week.
You arrive home and ask your teenager, “How was your day?” “Great,” they reply. “I got an A on my Chinese test.”
[1] Nithya Sambasivan, Shivani Kapania, Hannah Highfill, Diana Akrong, Praveen Kumar Paritosh, Lora Mois Aroyo. SIGCHI, ACM (2021). “Everyone wants to do the model work, not the data work: Data Cascades in High-Stakes AI.” https://research.google/pubs/pub49953/
[2] Redman, T. C. (2017). Seizing Opportunity in Data Quality. MIT Sloan Management Review. November 27, 2017. https://sloanreview.mit.edu/article/seizing-opportunity-in-data-quality/
Copyright © 2022 Thomas C. Redman. Published by Elsevier Inc. All rights reserved.
Presumably, your kid told the truth. But clearly, they did not provide the high-quality data you sought. Data quality, it seems, is more than just “correctness.”
The second complication is that many people make addressing data quality seem easier than it is. Throughout my career, I’ve heard hundreds of versions of the following, all promising an easy solution: “We all know you do data quality as you move the data to the warehouse.” “In the age of big data, the sheer quantities of data mean you don’t have to worry about quality.” “Our data catalog will solve all your data quality problems.” “Data governance will solve data quality woes.” “The problems will all melt away when we start using AI.” Yet here we are!
Dealing with all this requires the leadership of PROFESSIONAL data quality managers: skilled and experienced in the approaches and techniques needed to attack the issues properly; savvy enough to build needed political support and get everyone involved; wise enough to know the effort will take a long time; impatient enough to get on with it; and courageous enough to rebut the nonsense promoted by amateurs.
There is much to do and much to learn. My top shelf features works by AT&T, Beer, Covey, Brown and Duguid, Davenport, Deming, Drucker, English, Goldratt, Hay, Juran, Kent, Kotter, Ladley, Laney, McGilvray, Porter, the early work of my group at Bell Labs, Roberts, Silver, Tufte, the early work of Wang’s group at MIT, Zachman, and others. I am adding Laura Sebastian-Coleman to that shelf. This book covers the hard stuff, from in-depth discussion of why data quality matters, to setting strategy, to why managing data quality at scale is so difficult, to the brutal politics associated with data. It is the only one aimed specifically at PROFESSIONAL DATA QUALITY MANAGERS. So, if you are one of those, or hope to become one, start here.
Professional data quality manager is a big job. It is most certainly not the “sexiest job of the 21st century,” but it is quite possibly the “most needed and important job of the 21st century.”
Thomas C. Redman
“the Data Doc,” Rumson, NJ, United States
June 2021
Glossary
“Words are not as satisfactory as we should like them to be, but, like our neighbors, we have got to live with them and must make the best and not the worst of them.”
ABC Controls Audit/balance/control (ABC) processes govern the movement of data within and between systems. Also referred to as system controls or control reports. These controls involve counting or summing records at different points during data processing to ensure that data has not been lost or “dropped” as part of processing. These work on traditional formats (e.g., tables, files) in which the data itself is static, rather than dynamic.
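As an illustration of the counting and summing this entry describes, here is a minimal Python sketch. The function name `balance_check`, the record layout, and the `amount` field are hypothetical choices made for the example, not part of any particular tool.

```python
from decimal import Decimal


def balance_check(source_rows, target_rows, amount_field):
    """Compare record counts and a summed amount at two processing points."""
    # Decimal avoids floating-point drift when summing monetary amounts.
    source_total = sum(Decimal(str(r[amount_field])) for r in source_rows)
    target_total = sum(Decimal(str(r[amount_field])) for r in target_rows)
    return {
        "source_count": len(source_rows),
        "target_count": len(target_rows),
        "records_dropped": len(source_rows) - len(target_rows),
        "amount_difference": source_total - target_total,
        "balanced": len(source_rows) == len(target_rows)
        and source_total == target_total,
    }


# Hypothetical extract and load: record 2 is lost during processing.
extracted = [
    {"id": 1, "amount": "10.50"},
    {"id": 2, "amount": "4.25"},
    {"id": 3, "amount": "7.00"},
]
loaded = [
    {"id": 1, "amount": "10.50"},
    {"id": 3, "amount": "7.00"},
]
report = balance_check(extracted, loaded, "amount")
```

In this sketch, a report with `balanced` set to false would be the signal to investigate before downstream processing continues.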
Access Metadata A type of operational metadata that describes what people, processes, and systems accessed which data sets at what time, and for what data. It includes details like access IDs, SQL code, and execution time of queries. It can be mined for access and usage patterns.
Accessibility A dimension of quality that measures the degree to which data consumers can obtain data from an appropriate source and in a form that allows them to use it.
Accountability To be accountable means to be answerable, literally “liable to be called to account” (Online Etymology Dictionary). Accountability means a person is expected to act in a defined manner and must be able to explain their actions. To have accountability for data means having overall responsibility for a data set and defining clear responsibilities for those that produce the data; people accountable for data also ensure that data is in its expected condition and are able to explain the condition of data under their control.
Accuracy A dimension of quality. The degree to which data correctly represents attributes of a real-world object, concept, or event.
Asset A useful or valuable thing; a kind of property available to meet debts and other commitments.
Attribute In logical data modeling, an attribute represents a characteristic of an entity. Because of this use, attribute is sometimes understood as a data element (a piece of data used to represent a characteristic of an entity), a field (part of a system used to display or intake data), or a column (a place in a table to store a defined characteristic of a represented entity, i.e., to store values associated with data elements).
Authorized Data Source A system or process that has been designated by a data governance team as the place from which to obtain data for specific uses. For example, the customer relationship management system may be designated as the authorized data source for obtaining data for email campaigns. Synonyms include authoritative data source and system of record.
Basel Accords A series of voluntary regulatory frameworks for the financial services industry; the most recent, Basel III, was created in response to the global financial crisis of 2008. The accords address bank capital adequacy, stress testing, and market liquidity risk. They have important effects on how financial institutions handle their data.
Baseline Measurement A baseline measurement is taken to serve as a point of comparison for subsequent measurement. ASQ defines it as “The beginning point, based on an evaluation of output over a period of time, used to determine the process parameters prior to any improvement effort; the basis against which change is measured.” (Source: ASQ.)
Big Data Data characterized by large volume (large amounts of data created), variety (a range of formats), and velocity (data created and made accessible very quickly). For processing and storage, Big Data requires different technologies than does traditional data.
Samuel Butler
Business Glossary A system used to document and manage terms and definitions needed to understand an organization’s business concepts and to describe the relationship among concepts. In a complex organization, business glossaries are needed to ensure that people understand each other. Business glossaries play a critical role in data management because they enable data consumers to make connections at the conceptual or logical level among data elements that may have different names and different structures in different systems. A business glossary is intended to supply system-agnostic information related to data (as opposed to a data dictionary, which describes data that has been instantiated in a system).
Business Metadata Business metadata focuses largely on the content and condition of the data and includes details related to data governance. Business metadata includes the non-technical names and definitions of concepts, subject areas, entities, and attributes; attribute data types and other attribute properties; range descriptions; calculations; algorithms and business rules; and valid domain values and their definitions. (Source: DMBOK2.)
Business Process Owner A person who is accountable for execution of a business process that produces data. Ideally, the business process owner should be accountable for the quality of the outputs from the process, one of which is data.
Business Rule A defined constraint on a business process. Business rules describe what must happen or what cannot happen within a process. Business rules can be used as requirements for technical processes or to establish data quality expectations.
Business Strategy A plan for an organization to achieve its goals. This includes aligning the people, processes, technology, and information required to meet these goals.
Business Terminology The words an organization uses to describe itself and its processes. The concept of business terminology recognizes that each organization has its own vocabulary for its business processes and that there is often a difference between how concepts are defined in business processes and how they are named when they are defined as data in systems. There are also differences in naming conventions among systems. Business terminology provides a means by which data can be understood in relation to business processes and in relation to other data. In doing so, it helps people recognize when instances of the same data are named differently as well as when instances of different data have the same name.
California Consumer Privacy Act (CCPA) Data privacy and protection legislation enacted by the state of California in 2018.
Certified Data Data that has passed certification requirements. These requirements may include a range of criteria, such as: data is populated completely, data reconciles to a defined standard, data has passed all data quality rules, data is supported by metadata. Criteria will differ, depending on the data set, but certification requirements must clearly state what the criteria are for a data set.
Clarity A dimension of quality. The degree to which data values are unambiguous, distinct, non-overlapping, and comprehensible.
Codified Data Information that is represented by code values. To codify a thing is to reduce it to a code. For example, an ICD-10 code provides a shorthand way to refer to a medical diagnosis. A US ZIP code refers to an area where mail is delivered.
Column Column/field/attribute. A column is a component part of a table in a database. Tables are made up of rows, each representing one instance of an entity represented by the table, and columns, containing characteristics of the represented entity. Field and attribute are usually understood as synonyms for column because they also contain characteristics of a represented entity. But columns are specific to tables.
Column Profile See Distribution of Values.
Commerce-based Organizational Data See Organizational Data.
Completeness A dimension of quality that answers the question: Do I have all of the data I need or expect to have? Completeness describes the degree to which a data set contains all required data. It can be understood at the data set level (Does the data set cover the population required?); the record level (Does the data set contain all of the fields needed for the analysis of the population?); the field level (Are all mandatory fields populated? Are optional fields populated according to population rules?). Completeness can also be seen through the lens of other requirements: all the data for a population, from a system, for a time period.
Consistency A dimension of quality that answers the question: Does the data conform to patterns defined through other instances of the same data set? The degree to which data exhibits expected patterns.
Consumer Data Rights (CDR) Data privacy and protection legislation enacted by the Australian federal government in 2019.
Control A means of providing feedback within a system. A data control can be set up as binary; the data either meets the conditions of the control, or it does not. If the data does not meet the conditions of the control, then the feedback to the system is to stop the process. A control can also provide feedback through a tolerance level or threshold (e.g., x% of data met the condition of the control; 100% − x% did not). (Source: Sebastian-Coleman, 2013.)
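The binary and threshold forms of a control can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function `apply_control`, the `member_id` field, and the 70% tolerance are hypothetical.

```python
def apply_control(records, condition, threshold_pct=100.0):
    """Evaluate a control; a threshold of 100 makes it binary (all must pass)."""
    total = len(records)
    passed = sum(1 for r in records if condition(r))
    # An empty batch is treated as passing here; that is a design assumption.
    pass_pct = 100.0 * passed / total if total else 100.0
    return {
        "pass_pct": pass_pct,
        "fail_pct": 100.0 - pass_pct,
        # The feedback to the system: continue, or stop the process.
        "proceed": pass_pct >= threshold_pct,
    }


# Hypothetical batch in which one of four records lacks a member ID.
rows = [
    {"member_id": "A1"},
    {"member_id": "A2"},
    {"member_id": ""},
    {"member_id": "A4"},
]


def has_member_id(record):
    return bool(record["member_id"])


binary_result = apply_control(rows, has_member_id)
tolerant_result = apply_control(rows, has_member_id, threshold_pct=70.0)
```

The same batch stops the process under the binary control but proceeds under the 70% tolerance, which is the trade-off a threshold introduces.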
Correctness A dimension of quality that answers the question: Does this data accurately represent what it purports to represent? Correctness can be understood in terms of format (Is the data formatted correctly?), definition (Does the definition accurately describe the data?), or data values (Is the value in the column the right value for this entity instance?). Correctness is a synonym for accuracy.
Cost of Poor Quality (COPQ) The cost associated with providing poor-quality products or services. There are four categories: internal failure costs (costs associated with defects found before the customer receives the product or service), external failure costs (costs associated with defects found after the customer receives the product or service), appraisal costs (costs incurred to determine the degree of conformance to quality requirements), and prevention costs (costs incurred to keep failure and appraisal costs to a minimum). (Source: ASQ.)
Cost-Benefit Analysis (CBA) A comparison of the costs of implementing a change and the expected benefits of implementing the change. Also called benefit-cost analysis. A CBA accounts for hard numbers (direct measurable costs) and soft numbers (expected effects on people’s behavior and attitudes). Data quality improvement efforts should include a CBA.
Critical Data Element (CDE) Data that is required by the business to execute its processes and must be of high quality for those processes to execute successfully. Critical data includes data required to serve customers; meet strategic goals; ensure compliance with laws, regulations, and contractual obligations; and measure its own success. CDEs are the focus of data quality monitoring and improvement efforts. (Source: Jugulum, 2014.)
Critical Data Elements and Relationships (CDERs) The R added to CDE emphasizes the fact that many data quality challenges are not limited to isolated data elements but extend to the relationships among component pieces of data.
Currency A dimension of quality. The degree to which data is maintained and values correctly reflect their real-world counterparts within a given time frame.
Data A means of encoding and sharing knowledge and information about the real world. Data is the representation of selected characteristics of objects, events, and concepts, expressed and understood through explicitly defined conventions related to their meaning, collection, and storage. Since the introduction of the computer in the mid-20th century, the word data has also been used to refer to any information captured in or processed by a computer or other information technology system.
Data as an Asset An organization’s data is an asset in that it can be used by the organization to create value. Data is often compared to other organizational assets: people, equipment, money, and intellectual property. When talked about as an asset, data is considered a broad category. The totality of an organization’s data is an asset to that organization. (Source: DMBOK2.)
Data as Data An understanding of how data functions in a semiotic system; how it is created or collected, structured, and organized for use; and how it may change over time and have different uses in different circumstances (e.g., knowledge of the data life cycle).
Data Asset The recognition of data as an asset has led to the idea of a data asset. A data asset is a set of data that brings value to the organization in particular ways that can be recognized as distinct from those of other data sets. A data asset can be large or small, formally or informally defined. The ability to distinguish individual data assets within the entirety of an organization’s data is useful because different data sets have different value to the organization (just as different financial instruments or investments may have different value to the organization). Being able to identify and recognize the value of individual data sets helps an organization prioritize work related to managing different data sets. For example, it may be far more important to get customer data correct than to get vendor data correct. Taken together, these individual data assets comprise the overall data assets of the organization.
Data Catalog A list or inventory of objects in a database or other system/platform, with basic metadata about them (e.g., name, definition, origin, business purpose, subject area, format, categories, tags of data, and other attributes about the data object that may be important to its use by people, processes, or systems). A data catalog can contain references to data at different object levels (e.g., files, tables, and views), and it is organized in a way that allows it to be associated with finer levels of grain (e.g., fields, relationships). Data catalogs support data discovery. They enable people to bring together sets of data for different purposes (e.g., via tagging or categorization). Data catalogs can be created manually or generated by running a program against a database, or through a combination of both.
Data Certification The process of measuring data against defined certification criteria (certification requirements) and determining the degree to which it adheres to the criteria. Criteria may include standards for data quality, conditions for management, level of security/protection, quality of metadata and other supporting information, or a combination of these things. Standards for certification may be based on requirements external to the organization that wants its data certified (e.g., data is “certified” if it meets requirements for BASEL or SOX), or standards may be internally defined (Critical Data Elements are certified based on business-defined requirements). Different levels of data may be certified (e.g., system, data set, data relationship, data element). Data certification is not a one-time event. After initial certification, data must be periodically re-audited for certification requirements. Different standards for certification may also be defined and measured (e.g., “Gold” standard certification means data has passed all requirements; “Silver” may mean it has passed 95% of requirements; “Bronze” may mean it has passed 90%, and so on).
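The tiered standards in this entry lend themselves to a small worked example. The Python sketch below assumes the illustrative cut-offs given above (all requirements, 95%, 90%) and treats them as "at least" thresholds; the function name `certification_level` and the "Not certified" label are hypothetical.

```python
def certification_level(requirements_passed, requirements_total):
    """Map a count of passed requirements to an illustrative certification tier."""
    if requirements_total <= 0:
        raise ValueError("at least one certification requirement is needed")
    if not 0 <= requirements_passed <= requirements_total:
        raise ValueError("passed count must be between 0 and the total")
    # Integer arithmetic keeps the 95% and 90% boundaries exact.
    if requirements_passed == requirements_total:
        return "Gold"
    if requirements_passed * 100 >= 95 * requirements_total:
        return "Silver"
    if requirements_passed * 100 >= 90 * requirements_total:
        return "Bronze"
    return "Not certified"


# Hypothetical audit results for a data set with 40 requirements.
levels = [certification_level(n, 40) for n in (40, 38, 36, 30)]
```

Because certification is not a one-time event, a function like this would be rerun at each periodic re-audit rather than stored as a permanent label.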
Data Chain See Data Supply Chain.
Data Consumer Any person, process, or system that uses data. The term is used to distinguish between data producers (who create data) and those who use data. A data consumer for one process may be a data producer for another process. Synonyms include data customer, end user, and target system.
Data Democratization The process of enabling a wider group of people to access and use data. It involves removing obstacles to data usage and encouraging more people to become data literate. It is also associated with reduced reliance on IT resources to use data.
Data Dictionary A data dictionary contains table and column names and definitions along with other information that helps data consumers use data. Data dictionaries often contain details about the physical structure of data (e.g., the key structure for a table and the data type and field lengths for columns). A comprehensive data dictionary for a relational database will include information on standard joins between tables. It may even include filter criteria for particular kinds of reports. A data dictionary is usually set up to support the use of a particular system. See also Business Glossary.
Data Discovery The exploration and assessment of data that is conducted to understand its structure, content, and potential for use. Discovery often includes data preparation. It also may use data visualization techniques to identify patterns and outliers (thus going beyond basic data profiling). Discovery can be used to understand what data exists in a system and to determine whether data can be used for a particular purpose. It can be conducted against an existing data store or against candidate data for a project. It can be focused on multiple sources and the relationship among them, on a single data source, or even on an individual data set.
Data Element A data element is a part of a data set. It is usually understood as a column or field, but a data element can include multiple attributes (e.g., “address” with all of its attributes can be thought of as a single data element).
Data Environment A broad term for the collection of factors that influence the creation and use of data: the business processes through which data is produced, the technical systems through which it is produced and stored, the data itself, metadata (including data standards, definitions, and specifications), technical architecture (including data access tools), and data uses. All of these have implications for understanding data quality. The data environment includes the people, processes, and technology involved in creating and using data. (See English, 1999; McGilvray, 2021.)
Data Governance Organizational oversight of data and data-related processes. The Data Governance Institute describes it as “a system of decision rights and accountabilities for information-related processes, executed according to agreed-upon models which describe who can take what actions with what information, and when, under what circumstances, using what methods.” (See The Data Governance Institute, Datagovernance.com.)
Data Governance Organization A team formally assigned to implement and manage a data governance program.
Data Governance Program A set of projects and processes put together to define responsibilities for data, establish and execute processes for making decisions about data, and ensure that responsibilities for managing data are executed consistently. Data governance programs may include multiple components that focus on aspects of data management, such as data policy, data security, metadata management, data quality management, reference data management, and master data management.
Data Governance Strategy A data governance strategy describes how an organization will implement, execute, and derive benefit from data governance functions. For example, such a strategy will state how the organization will define decision rights and accountabilities for data as well as the behaviors that the organization will adopt and enforce to ensure that data is used to benefit the organization and not used to the detriment of the organization.
Data Issue See Data Quality Issue.
Data Life Cycle A set of high-level phases related to how data is created, changes over time, and is disposed of. Based on the product development life cycle, this set of phases has been described differently by different experts, but all versions contain the idea that data is created or obtained, stored, used, and disposed of. The data life cycle is critical to data management because there are different management requirements at the different phases of the life cycle. The data life cycle differs from both the system development life cycle (SDLC), which describes how projects are executed, and the data chain, which describes how data moves within and between systems to meet the needs of a particular organization. (Source: Sebastian-Coleman, 2013.)
Data Lineage Data lineage is a form of metadata that describes the movement of and changes to data as it passes through systems and is adopted for different uses. Lineage can be described at different levels of detail (process-to-process, system-to-system, table-to-table, column-to-column). A documented data chain for an organization is a version of data lineage. Data lineage refers to a set of identifiable points that can be used to understand details of data movement and transformation (e.g., transactional source field names, file names, data processing job names, programming rules, target table fields). Most people who are concerned with the lineage of data want to understand two aspects of it: the data’s origin and the ways in which the data has changed since it was originally created. Change can take place within one system or among systems. (Source: Sebastian-Coleman, 2013.)
Data Literacy The ability to read, understand, interpret, and learn from data in different contexts and the ability to communicate about data to other people.
Data Management Strategy A data management strategy defines how the organization will manage and support the data it needs over time. Data management includes a range of functional areas, each of which may have its own strategy. All component pieces of a data management strategy must support the business strategy.
Data Mapping Specification A data mapping specification documents the rules associated with moving data point-to-point along the data supply chain. A mapping specification may describe movement at the file/table level or at the column/field level. Also called a source-to-target map (STM or STTM).
Data Mart A data collection put together to serve specific purposes.
Data Model A visual representation of data content and the relationships between data entities and attributes, created for purposes of understanding how data can be (or actually is) organized or structured. Data models include entities (understood as tables), attributes (understood as columns containing characteristics about represented entities), relationships between entities, and integrity rules along with definitions of all of these pieces. Logical data models and physical data models have different attributes and are related to each other. A data model contains a set of symbols with text labels that attempts visually to represent data requirements as communicated to the data modeler, for a specific set of data that can range in size from small (for a project) to large (for an organization). The model is a form of documentation for data requirements and data definitions resulting from the modeling process. Data models are the main medium used to communicate data requirements from business to IT and within IT, from analysts, modelers, and architects, to database designers and developers. (Sources: DAMA, 2017; Hoberman, 2009; Sebastian-Coleman, 2013.)
Data Modeling The process of discovering, analyzing, and scoping data requirements and then representing and communicating these data requirements in a precise form called the data model. Data models depict and enable an organization to understand its data assets. (Source: Hoberman, 2009.)
Data Monetization The process of deriving economic value from data, either directly (by selling data or incorporating data into other products) or indirectly (by using data to support the exchange of other goods and services). (Source: Laney, 2018.)
Data Owner A person who is accountable for the quality of data in the widest sense of the term quality. Ownership can be defined at the process, system, domain, or data set level. Ownership must be defined not only in relation to the scope of data, but also within the data chain and the data life cycle. Ownership of data may change at different points in the data chain or at different phases within the data life cycle, or one person may be accountable for data throughout the data chain or across the data life cycle.
Data Pipeline A term used to describe data coming into a data lake or other large unstructured data storage system.
Data Processing Metadata A type of operational metadata collected via the execution of programs that move data between and within systems, often transforming it as part of this movement. It includes the history of extracts and results; details of the timing, size, and success of ETL processes; data captured via ETL logs and audit/balance/control processes; and error logs. This metadata can be mined to identify schedule anomalies, patterns in file sizes, and the consistency of data delivery and processing. Data processing metadata is supported by metadata that describes SLA requirements, processing schedules, and source system contact information. It can also be used to aggregate volume metrics.
Data Producer Any person, process, or system that creates data. The term is used to distinguish between data consumers (who use data) and data creators (who make data). A data consumer for one process may be a data producer for another process.
Data Profiling A form of data analysis used to inspect data and assess the quality of data and metadata. Using statistical techniques to discover data structure and content, profiling enables analysts to determine how closely data aligns with the expectations of data consumers. Profiling provides a picture of data structure, content, rules, and relationships by applying statistical methodologies to return a set of standard characteristics about data: data types; field lengths; cardinality of columns; granularity; value sets; format patterns; content patterns; implied rules; cross-column and cross-file data relationships; and cardinality of these relationships. There are benefits to profiling data in different contexts. For example, comprehensive profiling of data at the beginning of the project life cycle can be used to test assumptions about data in order to reduce project risks. Profiling of data within a system can also be used in quality improvement efforts to identify data issues, improve metadata, define data quality measurements, set thresholds, and so on. (Sources: Olson, 2003; Sebastian-Coleman, 2013.)
Data Profiling Results The detailed data values returned when data is run through a profiling process. For example, profiling results for columns usually include minimum and maximum values, implied format and data type, along with a frequency distribution of the set of values.
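To make the shape of such results concrete, the following Python sketch profiles a single column held as a list. It is a simplified illustration, not a profiling tool: the function `profile_column` and the sample `ages` column are hypothetical, and the sketch assumes the populated values share one comparable type.

```python
from collections import Counter


def profile_column(values):
    """Return basic profiling results for one column of values."""
    populated = [v for v in values if v is not None]
    return {
        "count": len(values),
        "null_count": len(values) - len(populated),
        "distinct_count": len(set(populated)),
        # min/max assume the populated values are mutually comparable.
        "min": min(populated) if populated else None,
        "max": max(populated) if populated else None,
        "max_length": max((len(str(v)) for v in populated), default=0),
        # The implied data type is read off the Python types actually present.
        "implied_types": sorted({type(v).__name__ for v in populated}),
        # Frequency distribution, most common value first.
        "frequencies": Counter(populated).most_common(),
    }


# Hypothetical column with one missing value and one repeated value.
ages = [34, 51, None, 34, 27]
result = profile_column(ages)
```

An analyst would compare results like these with documented expectations, for example checking that the implied type matches the declared data type.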
Data Provenance The origin of the data, usually understood as the system or process through which data is created.
Data Quality A measure of the degree to which data meets the expectations and requirements of data consumers. This idea is sometimes expressed as the degree to which data is “fit for a purpose.” The term data quality is also sometimes used to refer to the activities and tools used to manage the quality of data. (See Data Quality Management.)
Data Quality Dimension A characteristic of data that can be measured and through which its quality can be quantified. There are many frameworks that define data quality dimensions. There is no agreed-to set. However, all account for similar concepts that have a commonsense meaning.
Data Quality Issue Any condition of data that presents an obstacle or a risk to a data consumer’s use of that data regardless of who discovered the issue, where/when it was discovered, what its root causes are, or what the options are for remediation.
Data Quality Issue Management The process of removing or reducing the impact of obstacles to the use of data by data consumers (people, processes, and systems) by identifying, analyzing, quantifying, prioritizing, and remediating the root causes of obstacles to data consumers’ uses of data. (Source: Sebastian-Coleman, 2013.)
Data Quality Management A set of activities intended to ensure that data is fit for the purposes of data consumers; it includes the core activities required to assess, measure, and report on the condition of data as well as those required to manage data issues, prevent problems, and improve the quality of data.
Data Quality Management Strategy A plan that defines the core data quality management capabilities the organization will implement, what data sets they will be applied to, what tools will be used, who will be responsible for executing them, and what the expected benefits will be to the organization.
Data Quality Measurement Results The detailed data values returned when data quality rules or reasonability tests are executed. For example, the result of measuring the completeness of a column will return the count of records with a NULL, count of total records, percentage of records violating the rule, and potentially other calculations that put the individual measurement result in context (e.g., threshold value, mean of past results). (Source: Sebastian-Coleman, 2013.)
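The completeness example in this entry can be written out as a short Python sketch. The function `measure_completeness`, the 10% threshold, and the past results are hypothetical values chosen for illustration; NULL is represented by Python's `None`.

```python
from statistics import mean


def measure_completeness(values, threshold_pct, past_null_pcts):
    """Return measurement results for a 'column must be populated' rule."""
    total_count = len(values)
    null_count = sum(1 for v in values if v is None)
    null_pct = 100.0 * null_count / total_count if total_count else 0.0
    return {
        "null_count": null_count,
        "total_count": total_count,
        "null_pct": null_pct,
        # Context that helps interpret the individual result.
        "threshold_pct": threshold_pct,
        "mean_past_null_pct": mean(past_null_pcts) if past_null_pcts else None,
        "exceeds_threshold": null_pct > threshold_pct,
    }


# Hypothetical column with 2 NULLs in 8 records, measured against history.
column = ["a", None, "c", "d", None, "f", "g", "h"]
result = measure_completeness(
    column, threshold_pct=10.0, past_null_pcts=[5.0, 7.0, 6.0]
)
```

Here the 25% NULL rate stands well above both the threshold and the historical mean, which is the kind of context the entry says should accompany a raw count.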
Data Quality Monitoring The ability, through measurements and other controls, to track levels of quality within a system or process to ensure that it continues to meet requirements and to detect unexpected changes in patterns of size, composition, or other characteristics of population, and to detect potential issues so action can be taken in response to them.
Data Quality Rule A constraint defined on a quality dimension for a data set, a data element, or the relationship between data elements. Data quality rules can be used to validate or measure data as well as to monitor data. Data quality rules can be the basis of transformation rules to cleanse data.
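One way to picture a data quality rule is as a named constraint tied to a dimension, plus a check that each record either satisfies or violates. The Python sketch below is a hypothetical representation, not a standard one; the class `DataQualityRule`, the `first_name` field, and the sample records are assumptions for the example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class DataQualityRule:
    name: str                       # human-readable statement of the constraint
    dimension: str                  # quality dimension the rule is defined on
    check: Callable[[dict], bool]   # returns True when a record satisfies the rule


def violations(rule, records):
    """Return the records that fail the rule, for measurement or monitoring."""
    return [record for record in records if not rule.check(record)]


first_name_populated = DataQualityRule(
    name="FirstName must be populated on Member records",
    dimension="completeness",
    # Missing, None, and whitespace-only values all count as unpopulated.
    check=lambda record: bool((record.get("first_name") or "").strip()),
)

# Hypothetical Member records: only the first satisfies the rule.
members = [
    {"first_name": "Ada"},
    {"first_name": "  "},
    {"first_name": None},
    {},
]
failed = violations(first_name_populated, members)
```

Keeping the rule separate from the data makes it reusable: the same object can validate an incoming file, feed a periodic measurement, or drive a cleansing step.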
Data Quality Standard An assertion about the expected condition of data, usually related directly to a quality dimension: how complete the data is; how well it conforms to defined rules for validity, integrity, and consistency; and how it adheres to defined expectations for presentation. A standard may be expressed as a simple rule (“First Name must be populated on Member records”) or may describe a set of conditions that must be met (“Address data must conform to USPS standards”). Data quality standards should be measurable. Determining whether data meets a set of conditions may involve multiple measurements.
Data Set A collection of data brought together for a purpose. The definition of set may be simple with a clear purpose; for example, a table is a data set put together to represent attributes associated with a specific entity. Or it may be more complicated and serve multiple purposes; for example, a data domain may comprise multiple tables from multiple systems and be used for billing and interacting with customers as well as analysis and reporting.
Data Standards Assertions about how data should be created, presented, transformed, or conformed for purposes of consistency in presentation and meaning and to enable more efficient use. Data standards can be defined at the process (describing the required inputs or expected outputs), value (column/field), structure (table), or database level. They have an impact on technical processing and storage of data as well as on data consumer access to and use of data.
Data Steward A steward is a person whose job is to manage the property of another person. Data stewards manage data assets on behalf of others in the best interests of the organization (McGilvray, 2021). Informally, a subject matter expert in a data domain, data set, process, or data element who acts accountably toward data and on whom others rely for information and expertise. Formally, as a job title, a person who has specifically defined responsibilities related to helping the organization create, manage, govern, use, and derive value from its data.
Data Stewardship Stewarding data is a way of interacting with data; specifically, acting with accountability for data for the good of the organization. Data can be “stewarded” at different levels, using informal and formal approaches. Stewardship is required of every individual who creates or uses organizational data. Data stewardship is the most common label to describe accountability and responsibility for data and processes that ensure effective control and use of data assets. (Sources: DMBOK2; Seiner, 2014.)
DataStore Agenericnameforadatabase,datawarehouse,mart,lake,fabricorothersystemthatholdsdata foranalytics,operations,orotherpurposes.
Data Strategy A plan that describes how the organization intends to get value from its data. A data strategy requires that the organization has the data it needs to support business goals and that the organization can use the data effectively. A data strategy must account for the data itself (how the organization will obtain or create the data it needs and how this data will be managed for value over its life cycle) and for the ability of the organization to use the data (accessibility and tooling; metadata and other explicit knowledge required for use; and the skills, knowledge, and experience of data consumers, that is, their data literacy).
Data Supply Chain The set of processes through which data is created and distributed within an organization or among organizations.
Data Validation A process of executing tests against data to determine whether it is usable or not. Validation can include a range of functions, from checking received data against control files to interrogating specific fields and rules. Different actions can be defined based on the results of the validation tests.
Data Valuation Data asset valuation is the process of understanding and calculating the economic value of data to an organization.
Data Warehouse An integrated, centralized decision support database and the related software programs used to collect, cleanse, transform, and store data from a variety of operational sources to support business intelligence. A data warehouse may also include dependent data marts. (Source: DAMA, 2011.)
Default Value A value populated in a column or field to indicate that a meaningful value is not available or does not pertain to the attribute in the context of the record.
Derived Data Data created within a system from other data, for example, through a complex transformation rule or calculation.
Designed Data Data that has been created through processes that account for its quality in relation to its potential uses. Designed data includes characteristics that help enable its use; for example, it is supported by high-quality metadata, and it meets identified standards for interoperability.
Dimension of Quality See Data Quality Dimension.
Distribution of Values A synonym for a column profile of values. It includes both the count of each distinct value in a column and the percentage of records associated with that value. In many cases, a distribution of values provides a quick way of assessing the reasonability of data content. A distribution for a data set or increment can be compared with other data sets (such as a benchmark or a previous instance of the same data set) to identify whether there are differences that may indicate a problem with the data.
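As an illustration, a distribution of values can be computed in a few lines of Python. This is a minimal sketch under stated assumptions, not a method prescribed by the text: the function names `distribution_of_values` and `largest_shift` and the sample column values are hypothetical.

```python
from collections import Counter


def distribution_of_values(values):
    """Return {distinct value: (count, percentage of records)} for one column."""
    total = len(values)
    if total == 0:
        return {}
    counts = Counter(values)
    return {value: (n, 100.0 * n / total) for value, n in counts.items()}


def largest_shift(current, benchmark):
    """Largest absolute difference, in percentage points, between two distributions."""
    keys = set(current) | set(benchmark)
    return max(
        (abs(current.get(k, (0, 0.0))[1] - benchmark.get(k, (0, 0.0))[1]) for k in keys),
        default=0.0,
    )


# Hypothetical column values: a previous instance of the data set and the current one.
previous = distribution_of_values(["A", "B", "A", "B"])
current = distribution_of_values(["A", "B", "A", "A"])
shift = largest_shift(current, previous)  # percentage-point change worth reviewing
```

A large shift does not prove that the data is wrong; it only flags a difference that may be worth investigating.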
Edit A control on data input. Edits constrain what data can be input to a system. Some edits are simple (e.g., using a drop-down list of options to prevent an invalid value from being entered in a field). Others can be more complex (e.g., preventing a record from being inserted if an instance of the record already exists in the system).
Enterprise Data Model (EDM) A holistic, enterprise-level, implementation-independent conceptual or logical data model that provides a common consistent view of data across the enterprise. It is common to use the term to mean a high-level, simplified data model, but this is a question of abstraction for presentation. An EDM includes key enterprise data entities (i.e., business concepts), their relationships, critical guiding business rules, and some critical attributes. It sets forth the foundation for all data and data-related projects. (Source: DMBOK2.)
Entity In the process of entity resolution, an entity is “a real-world person, place, or thing that has a unique identity that distinguishes it from all other entities of the same type” (Talburt, 2011, p. 205). In the process of modeling, an entity is a concept being modeled and is sometimes used as a synonym for table.
Extract, Transform, Load (ETL) The process of pulling data from a source system, preparing it for use in a target system, and then loading it to that target system. ETL is the standard process for populating data in data warehouses and marts. In Big Data environments, an ELT (extract, load, transform) process is sometimes adopted. ELT speeds up the process of making data available by removing the need to transform data to load it.
Field See Column.
File A set of data that has not been put into a data structure (e.g., a table).
Foreign Key Relationship A foreign key in a table is a reference back to another table. For example, a diagnosis code in a claim table will join to the diagnosis code table and connect with descriptions and other information about the meaning of the code.
Format Conformity A dimension of data quality that answers the question of whether data is in the expected form, usually defined via data type and length. Format conformity refers to the adherence of data to a defined physical format. For example, US Zip codes are five digits long. A Zip code that contains fewer or more than five digits does not conform to the expected length of a Zip code. A US Zip code that contains letters does not conform to the expected format of a Zip code. Some data elements have strict format requirements that are enforced via data type constraint (e.g., dates). Others have very broad requirements that may not be worth measuring (e.g., first name must be made up of letters, but can be of almost any length).
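The Zip code example can be expressed as a simple format check. The sketch below assumes the five-digit rule stated in the definition and nothing more; the names `ZIP5` and `conforms_to_zip5` are hypothetical, and the check says nothing about whether a conforming code is actually valid.

```python
import re

# Assumed rule from the definition: exactly five ASCII digits.
ZIP5 = re.compile(r"[0-9]{5}")


def conforms_to_zip5(value):
    """True if value is a string of exactly five digits."""
    return isinstance(value, str) and ZIP5.fullmatch(value) is not None


# Hypothetical sample values: one conforming, three non-conforming.
samples = ["06103", "6103", "061035", "0610A"]
results = [conforms_to_zip5(v) for v in samples]
```

Storing Zip codes as strings rather than integers matters here, because a numeric type would drop the leading zero and make a conforming value look too short.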
Frequency Distribution See Distribution of Values.
General Data Protection Regulation (GDPR) European Union data protection legislation implemented in 2018.
Health Insurance Portability and Accountability Act (HIPAA) Legislation enacted by the US federal government in 1996; contains provisions that affect the use of protected health information.
Integrity A dimension of data quality that answers the question: Do the pieces of a data set relate to each other (fit together) in expected ways? Integrity refers to the state of being whole and undivided or the condition of being unified. Integrity is the degree to which data conform to data relationship rules (usually as defined by the data model) that are intended to ensure the complete, consistent, and valid presentation of data representing the same concepts. Integrity represents the internal consistency of a data set.
Issue Management See Data Quality Issue Management.
Management The process of controlling people, activities, or things to meet a set of goals.
Master Data Master data objects are core business objects used in different applications across an organization, along with their associated metadata, attributes, definitions, roles, connections, and taxonomies. Master data objects represent the “things” that matter most to an organization: those that are logged in transactions, reported on, measured, and analyzed. (Source: Loshin, 2008.)
Metadata Explicit (i.e., documented) knowledge about data that enables data to be created, understood, and used. Metadata is required for the use of data. The absence of metadata is a data quality issue because lack of information about data is an obstacle to the use of data. A wide range of information is included under the umbrella of metadata. Metadata includes information about technical and business processes, data rules and constraints, and logical and physical data structures. It describes the data itself (e.g., databases, data elements, data models), the concepts the data represents (e.g., business processes, applications systems, software code, technology infrastructure), and the connections/relationships between data and concepts. (Source: DMBOK2.)
Metadata Asset A set of metadata that is used for particular purposes that bring value to the organization. For example, a data dictionary for a particular system is a metadata asset. Metadata assets are sets of metadata that, taken all together, constitute the metadata assets of the organization.
Metadata Management A specialized form of data management that focuses on managing metadata throughout its life cycle. Metadata management encompasses planning, implementation, and control activities to enable access to high-quality, integrated metadata. Metadata management activities focus on ensuring that high-quality metadata is widely accessible throughout the enterprise. (Source: DMBOK2.)
Metadata Model A data model that defines the attributes of and relationships among different types of metadata assets (e.g., business glossary, catalog, data dictionary, data quality standards, data processing metadata). As with other data models, it describes the attributes associated with each entity. For example, metadata attributes for each table in a database include things like: Table Business Name, Table Physical Name, Table Subject Area, Originating system for the table, and so on. It can be used to define requirements and to support overall metadata management.
MIN/MAX In a column profile, the minimum (lowest) and maximum (highest) values in a distribution of values, ordered numerically, alphabetically, or through a combination of the two.
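The choice of ordering changes which values count as the minimum and maximum. The following sketch is only an illustration; the function name `min_max` and the sample values are hypothetical, and it assumes that every value can be read as a number when numeric ordering is requested.

```python
def min_max(values, numeric=False):
    """Return (minimum, maximum) of a column under numeric or alphabetical ordering."""
    key = float if numeric else str
    return min(values, key=key), max(values, key=key)


# Hypothetical column stored as text.
column = ["9", "10", "100"]
alphabetical = min_max(column)            # compares character by character
numerical = min_max(column, numeric=True)  # compares as numbers
```

With these values, alphabetical ordering reports "10" as the minimum and "9" as the maximum, while numeric ordering reports "9" and "100". A profile of a numeric column that was stored as text can therefore show misleading MIN/MAX values.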
Normalization The process of structuring relational data so it follows the standards for normal form as defined by E. F. Codd. In less technical language, it means reducing multiple instances of an attribute to one instance so there is less redundancy and one and only one source for each value.
Operational Metadata Operational metadata describes details of the processing and accessing of data.
Optionality As part of data modeling, defining whether the population of a column is mandatory/required or optional (populated only under particular circumstances).
Organizational Culture “The pattern of beliefs, values and learned ways of coping with experience that have developed during the course of an organization’s history, and which tend to be manifested in its material arrangements and in the behaviors of its members” (Brown, 1998). More simply put, culture describes the way people work within an organization, with some reflection on why they work the way they do. The culture of an organization may be described through characteristics such as the attitude toward work itself, leadership style, and so on.
Organizational Data Data created and captured as part of the process of exchanging goods and services. Commercial organizations, governments, educational institutions, and non-profits all create data that reflects the operational practices of those organizations as well as the people and other organizations with which a given organization interacts.
Originating Source System A system that creates data. Used to signify where a particular data set, or even a single data element, was first created. The term is used to describe the role a system plays in the data life cycle for a particular subset of data. Systems can play multiple roles.
Parent-Child Relationship In a referential relationship, the entity that must “come first” is referred to as the parent entity. The entity that depends on the first entity is the child entity. For example, for an organization to sell products, it must first have a listing of its products. “Product” would be a parent entity to “Sales.” The use of parent/child as a metaphor for these relationships allows for other assertions; for example, orphan records are records on a child table for which references are not present on the parent table.
Personal Information Protection and Electronic Documents Act (PIPEDA) Canadian privacy protection legislation passed in 2000.
Precision A dimension of quality that describes the degree to which data meets a requirement for exactitude.
Process A series of steps that turns inputs into outputs.
Product The output from a process.
Reasonability A dimension of data quality that answers the question: Does this data conform to general expectations based on the population it represents?
Reference Data Reference data, sometimes referred to as look-up data or code and description data, associates codified data values with their meanings. It is used “to characterize other data in an organization or to relate data to information beyond the boundaries of an organization” (Chisholm, 2001). Reference data is critical to the use of other data. Without reference data, other data is often incomprehensible.
Referential Integrity (RI) The degree to which data in two or more tables related through a foreign key relationship is complete. Referential integrity is often explained through parent/child table relationships. Within a database, all values that are present in a child table should also be present in its parent table. If a child table has values that are not in its parent table (orphan values), it does not have referential integrity with the parent table. The concept of referential integrity can also be applied at the record level.
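The orphan-value test described above can be sketched as a set comparison. This is an illustrative example only; the function names and the sample diagnosis codes are hypothetical, and a real check would normally run as a query inside the database.

```python
def orphan_values(child_keys, parent_keys):
    """Return the distinct child key values that are missing from the parent table."""
    parents = set(parent_keys)
    return sorted({key for key in child_keys if key not in parents})


def has_referential_integrity(child_keys, parent_keys):
    """True when every child key value is also present in the parent table."""
    return not orphan_values(child_keys, parent_keys)


# Hypothetical keys: diagnosis codes on a claim table (child) and the code table (parent).
claim_codes = ["D1", "D2", "D9", "D9"]
diagnosis_codes = ["D1", "D2", "D3"]
orphans = orphan_values(claim_codes, diagnosis_codes)
```

Parent values with no matching child records (here "D3") are not a violation; only child values without a parent count as orphans.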
Relevance A dimension of quality that answers the question: Will this data meet my needs? Relevance measures the applicability of the data to the goals of data consumers.
Risk The possibility that something unpleasant or unwelcome will happen. Risk to data is the possibility that something will negatively affect its quality and make it less fit for use or that it will be misused, either intentionally or unintentionally.
Risk Management Using managerial resources to integrate risk identification, risk assessment, risk prioritization, development of risk-handling strategies, and mitigation of risk to acceptable levels. (Source: ASQ.)
Root Cause The root cause of a problem is the fundamental reason the problem exists.
Service Level Agreement (SLA) A formal commitment between a team that provides services and the users of those services to meet specific levels of performance, for example, to provide services within a defined timeframe. Production support teams for IT systems often have SLAs to define expected delivery of data and availability of the system.
Single Source of Truth A term used to describe an aspirational goal of some data management processes: to establish one system or data set that will contain the highest-quality data in the organization and that all data consumers will use as their source of data. The concept of a single source of truth is also contrasted with the idea of “multiple versions of the truth,” that is, multiple data sets that may represent the same people, objects, or events, but that may have variation based on how data is collected, structured, processed, or maintained.
Source System An application or database from which a person, process, or other system obtains data. The data may originate in the source system, or it may simply be stored there for usage.
Source-to-Target Mapping (STTM) See Data Mapping Specifications.
Standard Something considered by an authority or by general consent as a basis of comparison; an approved model. Or it is a rule or principle that is used as a basis for judgment. Standards embody expectations in a formal manner. To standardize something means to cause it to conform to a standard or to choose or establish a standard for something. ASQ defines standard as “the metric, specification, gauge, statement, category, segment, grouping, behavior, event or physical product sample against which the outputs of a process are compared and declared acceptable or unacceptable.” (Source: ASQ.)
Standardization To standardize data is to make it conform to a standard, however that standard is defined. (Note: In data management, standardization is different from normalization.)
Structured Data Data that is defined through a data model.
System An interconnected set of elements that is coherently organized to achieve a purpose. (Source: Meadows, 2008.)
System/Application An information technology (IT) system is an application designed to enable users of the system to execute processes and meet goals, whether these are business oriented (sell products) or have other ends (play a game). Information systems include hardware, software, peripheral equipment, and data.
System Constraint A condition in a system that prevents or minimizes actions that the system can perform.
System Control A means of providing feedback about automated activities carried out by an automated process or system. Controls may be put in place to stop a process or send an alert if conditions of the control are met. See ABC Controls.
System of Record A system that is designated as the place where the best copy of a data set or even a single data element exists.
Table A data-based structure comprising rows and columns that organize data about a defined entity.
Target System A system or application to which data is brought in order to be stored and used. For example, a data warehouse is a target system that is populated with data from disparate applications and databases.
Technical Data Skills The ability to query data, organize it, aggregate it, and present it for purposes of communicating about the data.
Technical Metadata Technical metadata provides information about the technical details of data, the systems that store data, and the processes that move it within and among systems.
Technology Strategy A technology strategy describes how the organization will improve and leverage technology in support of business goals. It will include the future state technical architecture the organization intends to build, how the organization will make decisions about technology, and what strategic drivers will influence these decisions.
Threshold See Tolerance.
Timeliness A dimension of quality. The degree to which data is available for use within a required timeframe.
Tolerance The level of rule violation, error, or unexpected condition that is acceptable. May be expressed as a raw number or as a percentage of records or variance from a set number. May be calculated in different ways, depending on what is being measured and how measurements are executed.
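A tolerance expressed as a raw number or as a percentage of records might be checked as follows. This is a sketch under stated assumptions: the function name `within_tolerance`, its parameters, and the sample limits are hypothetical, and it does not cover variance from a set number.

```python
def within_tolerance(violations, total_records, max_count=None, max_percent=None):
    """True if the violation count stays within every limit that was supplied."""
    if max_count is not None and violations > max_count:
        return False
    if max_percent is not None and total_records > 0:
        if 100.0 * violations / total_records > max_percent:
            return False
    return True


# Hypothetical measurement: 5 rule violations found in 1,000 records.
ok_by_percent = within_tolerance(5, 1000, max_percent=1.0)  # 0.5% against a 1% limit
ok_by_count = within_tolerance(5, 1000, max_count=3)        # 5 against a limit of 3
```

The same measurement passes one tolerance and fails the other, which is why the way a tolerance is expressed needs to be stated alongside the rule it applies to.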
Traditional Data With the emergence of the term Big Data, the term traditional data has emerged to refer to data created in older technology. It is worth noting that even traditional data can be “big.”
Transformation Rule A transformation rule describes how data should be changed as it is brought from a source system into a target system. Transformation rules can be simple (directly moving a value; reformatting a field) or complex (combining a set of records to a single record).
Trustworthiness A dimension of data quality that measures the perceived reasonability of the data itself, confidence in the source of the data, and the reliability of the systems in which it is created and managed.
Uniqueness/non-duplication A dimension of quality; refers to the degree to which redundant data is present within a system or data structure. Uniqueness can be understood at the entity instance level (Does the master data contain multiple representations of the same customer?), at the record level (Does the sales data