
Morgan Kaufmann is an imprint of Elsevier
50 Hampshire Street, 5th Floor, Cambridge, MA 02139, USA
Copyright © 2016 Elsevier Inc. All rights reserved.
No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Details on how to seek permission, further information about the Publisher’s permissions policies and our arrangements with organizations such as the Copyright Clearance Center and the Copyright Licensing Agency, can be found at our website: www.elsevier.com/permissions.
This book and the individual contributions contained in it are protected under copyright by the Publisher (other than as may be noted herein).
Knowledge and best practice in this field are constantly changing. As new research and experience broaden our understanding, changes in research methods, professional practices, or medical treatment may become necessary.
Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using any information, methods, compounds, or experiments described herein. In using such information or methods they should be mindful of their own safety and the safety of others, including parties for whom they have a professional responsibility.
To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors, assume any liability for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions, or ideas contained in the material herein.
British Library Cataloguing in Publication Data
A catalogue record for this book is available from the British Library
Library of Congress Cataloging-in-Publication Data
A catalog record for this book is available from the Library of Congress
ISBN: 978-0-12-803781-2
For information on all MK publications visit our website at https://www.elsevier.com/
Publisher: Todd Green
Acquisition Editor: Todd Green
Editorial Project Manager: Amy Invernizzi
Production Project Manager: Punithavathy Govindaradjane
Designer: Mark Rogers
Typeset by SPi Global, India
“solve” big data problems. Although much of data science innovation focuses on hardware, cloud computing, and novel algorithms to solve BD2K problems, the critical issues remain at the level of the utility of the data (eg, simplification) addressed in this important book by Berman.
Data Simplification provides easy, free solutions to the unintended consequences of data complexity. This book should be the first (and probably most important) guide to success in the data sciences. I will be providing copies to my trainees, programmers, analysts, and faculty, as required reading.
Michael J. Becich, MD, PhD
Associate Vice-Chancellor for Informatics in the Health Sciences
Chairman and Distinguished University Professor, Department of Biomedical Informatics
Director, Center for Commercial Application (CCA) of Healthcare Data
University of Pittsburgh School of Medicine
purposes, can be created for a single document or data set. As data accrues, indexes can be updated. When data sets are combined, their respective indexes can be merged. A good way of thinking about indexes is that the document contains all of the complexity; the index contains all of the simplicity. Data scientists who understand how to create and use indexes will be in the best position to search, retrieve, and analyze textual data. Methods are provided for automatically creating customized indexes designed for specific analytic pursuits and for binding index terms to standard nomenclatures.
Chapter 4, Understanding Your Data, describes how data can be quickly assessed, prior to formal quantitative analysis, to develop some insight into what the data means. A few simple visualization tricks and simple statistical descriptors can greatly enhance a data scientist’s understanding of complex and large data sets. Various types of data objects, such as text files, images, and time-series data, can be profiled with a summary signature that captures the key features that contribute to the behavior and content of the data object. Such profiles can be used to find relationships among different data objects, or to determine when data objects are not closely related to one another.
Chapter 5, Identifying and Deidentifying Data, tackles one of the most under-appreciated and least understood issues in data science. Measurements, annotations, properties, and classes of information have no informational meaning unless they are attached to an identifier that distinguishes one data object from all other data objects, and that links together all of the information that has been or will be associated with the identified data object. The method of identification and the selection of objects and classes to be identified relate fundamentally to the organizational model of complex data. If the simplifying step of data identification is ignored or implemented improperly, data cannot be shared, and conclusions drawn from the data cannot be believed. All well-designed information systems are, at their heart, identification systems: ways of naming data objects so that they can be retrieved. Only well-identified data can be usefully deidentified. This chapter discusses methods for identifying data and deidentifying data.
Chapter 6, Giving Meaning to Data, explores the meaning of meaning, as it applies to computer science. We shall learn that data, by itself, has no meaning. It is the job of the data scientist to assign meaning to data, and this is done with data objects, triples, and classifications (see Glossary items, Data object, Triple, Classification, Ontology). Unfortunately, coursework in the information sciences often omits discussion of the critical issue of "data meaning"; advancing from data collection to data analysis without stopping to design data objects whose relationships to other data objects are defined and discoverable. In this chapter, readers will learn how to prepare and classify meaningful data.
Chapter 7, Object-Oriented Data, shows how we can understand data, using a few elegant computational principles. Modern programming languages, particularly object-oriented programming languages, use introspective data (ie, the data with which data objects describe themselves) to modify the execution of a program at run-time; an elegant process known as reflection. Using introspection and reflection, programs can integrate data objects with related data objects. The implementations of introspection, reflection, and integration are among the most important achievements in the field of computer science.
Chapter 8, Problem Simplification, demonstrates that it is just as important to simplify problems as it is to simplify data. This final chapter provides simple but powerful methods for analyzing data, without resorting to advanced mathematical techniques. The use of random number generators to simulate the behavior of systems, and the application of Monte Carlo, resampling, and permutative methods to a wide variety of common problems in data analysis, will be discussed. The importance of data reanalysis, following preliminary analysis, is emphasized.
case, it would be impossible to produce any valid results based on a study of patients diagnosed as MFH. The results would be a biased and irreproducible cacophony of data collected across different, and undetermined, classes of tumors. Believe it or not, this specific example, of the blended MFH class of tumors, is selected from the real-life annals of tumor biology.30,31 The literature is rife with research of dubious quality, based on poorly designed classifications and blended classes. A detailed discussion of this topic is found in Section 6.5, Properties that Cross Multiple Classes. One caveat. Efforts to eliminate class blending can be counterproductive if undertaken with excess zeal. For example, in an effort to reduce class blending, a researcher may choose groups of subjects who are uniform with respect to every known observable property. Suppose, for instance, that you want to actually compare apples with oranges. To avoid class blending, you might want to make very sure that your apples do not include any kumquats or persimmons. You should be certain that your oranges do not include any limes or grapefruits. Imagine that you go even further, choosing only apples and oranges of one variety (eg, Macintosh apples and navel oranges), size (eg, 10 cm), and origin (eg, California). How will your comparisons apply to the varieties of apples and oranges that you have excluded from your study? You may actually reach conclusions that are invalid and irreproducible for more generalized populations within each class. In this case, you have succeeded in eliminating class blending, at the expense of losing representative populations of the classes. See Simpson’s paradox.
Child Class The direct or first generation subclass of a class. Sometimes referred to as the daughter class or, less precisely, as the subclass. See Parent class. See Classification.
Class A class is a group of objects that share a set of properties that define the class and that distinguish the members of the class from members of other classes. The word "class," lowercase, is used as a general term. The word "Class," uppercase, followed by an uppercase noun (eg, Class Animalia), represents a specific class within a formal classification. See Classification.
Classification A system in which every object in a knowledge domain is assigned to a class within a hierarchy of classes. The properties of superclasses are inherited by the subclasses. Every class has one immediate superclass (ie, parent class), although a parent class may have more than one immediate subclass (ie, child class). Objects do not change their class assignment in a classification, unless there was a mistake in the assignment. For example, a rabbit is always a rabbit, and does not change into a tiger. Classifications can be thought of as the simplest and most restrictive type of ontology, and serve to reduce the complexity of a knowledge domain.32 Classifications can be easily modeled in an object-oriented programming language and are nonchaotic (ie, calculations performed on the members and classes of a classification should yield the same output, each time the calculation is performed). A classification should be distinguished from an ontology. In an ontology, a class may have more than one parent class and an object may be a member of more than one class. A classification can be considered a special type of ontology wherein each class is limited to a single parent class and each object has membership in one and only one class. See Nomenclature. See Thesaurus. See Vocabulary. See Dictionary. See Terminology. See Ontology. See Parent class. See Child class. See Superclass. See Unclassifiable objects.
Coding The term "coding" has several very different meanings, depending on which branch of science influences your thinking. For programmers, coding means writing the code that constitutes a computer program. For cryptographers, coding is synonymous with encrypting (ie, using a cipher to encode a message). For medics, coding is calling an emergency team to handle a patient in extremis. For informaticians and library scientists, coding involves assigning an alphanumeric identifier, representing a concept listed in a nomenclature, to a term. For example, a surgical pathology report may include the diagnosis, "Adenocarcinoma of prostate." A nomenclature may assign a code C4863000 that uniquely identifies the concept "Adenocarcinoma." Coding the report may involve annotating every occurrence of the word "Adenocarcinoma" with the "C4863000" identifier. For a detailed explanation of coding, and its importance for searching and retrieving data, see the full discussion in Section 3.4, "Autoencoding and Indexing with Nomenclatures." See Autocoding. See Nomenclature.
Command Line Instructions to the operating system that can be directly entered as a line of text from a system prompt (eg, the so-called C prompt, "c:\>", in Windows and DOS operating systems; the so-called shell prompt, "$", in Linux-like systems).
Command Line Utility Programs lacking graphic user interfaces that are executed via command line instructions. The instructions for a utility are typically couched as a series of arguments, on the command line, following the name of the executable file that contains the utility. See Utility.
Data Quality Act In the U.S., the data upon which public policy is based must have quality and must be available for review by the public. Simply put, public policy must be based on verifiable data. The Data Quality Act of 2002 requires the Office of Management and Budget to develop government-wide standards for data quality.33
Data Object A data object is whatever is being described by the data. For example, if the data is "6 feet tall," then the data object is the person or thing to which "6 feet tall" applies. Minimally, a data object is a metadata/data pair, assigned to a unique identifier (ie, a triple). In practice, the most common data objects are simple data records, corresponding to a row in a spreadsheet or a line in a flat-file. Data objects in object-oriented programming languages typically encapsulate several items of data, including an object name, an object unique identifier, multiple data/metadata pairs, and the name of the object’s class. See Triple. See Identifier. See Metadata.
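The correspondence between a flat-file record and a set of triples can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the file contents, the column names, and the identifiers (obj_001, obj_002) are invented for the example and do not come from the book.

```python
import csv
import io

# Hypothetical flat-file data; column names and identifiers are invented.
FLAT_FILE = """id,name,height
obj_001,Sam,6 feet tall
obj_002,Lee,5 feet tall
"""

def rows_to_triples(text, key_column="id"):
    """Return (identifier, metadata, data) triples, one per non-key cell."""
    triples = []
    for row in csv.DictReader(io.StringIO(text)):
        identifier = row[key_column]
        for metadata, data in row.items():
            if metadata != key_column:
                # The column header is the metadata; the cell is the data.
                triples.append((identifier, metadata, data))
    return triples

triples = rows_to_triples(FLAT_FILE)
for triple in triples:
    print(triple)
```

Each row yields one triple per data column, so the row key plays the role of the unique identifier and the column header plays the role of the metadata.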
Data Repurposing Involves using old data in new ways that were not foreseen by the people who originally collected the data. Data repurposing comes in the following categories: (1) Using the preexisting data to ask and answer questions that were not contemplated by the people who designed and collected the data; (2) Combining preexisting data with additional data, of the same kind, to produce aggregate data that suits a new set of questions that could not have been answered with any one of the component data sources; (3) Reanalyzing data to validate assertions, theories, or conclusions drawn from the original studies; (4) Reanalyzing the original data set using alternate or improved methods to attain outcomes of greater precision or reliability than the outcomes produced in the original analysis; (5) Integrating heterogeneous data sets (ie, data sets with seemingly unrelated types of information), for the purpose of answering questions or developing concepts that span diverse scientific disciplines; (6) Finding subsets in a population once thought to be homogeneous; (7) Seeking new relationships among data objects; (8) Creating on-the-fly, novel data sets through data file linkages; (9) Creating new concepts or ways of thinking about old concepts, based on a reexamination of data; (10) Fine-tuning existing data models; and (11) Starting over and remodeling systems.34 See Heterogeneous data.
Dictionary A terminology or word list accompanied by a definition for each item. See Nomenclature. See Vocabulary. See Terminology.
Exe File A file with the filename suffix ".exe". In common parlance, filenames with the ".exe" suffix are executable code. See Executable file.
Executable File A file that contains compiled computer code that can be read directly by the computer’s CPU, without interpretation by a programming language. A compiler for a language such as C translates C code into executables. Scripting languages, such as Perl, Python, and Ruby, interpret plain-text scripts and send instructions to a run-time engine, for execution. Because executable files eliminate the interpretation step, they typically run faster than plain-text scripts. See Exe file.
FOSS Free and open source software. Equivalent to FLOSS (Free Libre Open Source Software), an abbreviation that trades redundancy for international appeal. See Free Software Movement versus Open Source Initiative.
Free Software Movement Versus Open Source Initiative Beyond trivial semantics, the difference between free software and open source software relates to the essential feature necessary for "open source" software (ie, access to the source code) and to the different distribution licenses of free software and open source software. Sticklers insist that free software always comes with permission to modify and redistribute software in a prescribed manner as discussed in the software license; a permission not always granted in open source software. In practice, there is very little difference between free software and open source software. Richard Stallman has written an essay that summarizes the two different approaches to creating free software and open source software.35
Free Software The concept of free software, as popularized by the Free Software Foundation, refers to software that can be used freely, without restriction. The term "free" does not necessarily relate to the actual cost of the software.
Namespace A namespace is the realm in which a metadata tag applies. The purpose of a namespace is to distinguish metadata tags that have the same name, but a different meaning. For example, within a single XML file, the metadata tag "date" may be used to signify a calendar date, or the fruit, or the social engagement. To avoid confusion, metadata terms are assigned a prefix that is associated with a Web document that defines the term (ie, establishes the tag’s namespace). In practical terms, a tag that can have different descriptive meanings in different contexts is provided with a prefix that links to a web document wherein the meaning of the tag, as it applies in the XML document, is specified.
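The "date" example can be made concrete with a short Python sketch that uses the standard library's ElementTree parser. The two namespace addresses (under example.org) and the prefixes "cal" and "food" are hypothetical and were chosen only for illustration; a real document would point at Web pages that actually define the tags.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace addresses, invented for illustration only.
XML_TEXT = """<record xmlns:cal="http://example.org/calendar#"
        xmlns:food="http://example.org/fruit#">
  <cal:date>2016-03-01</cal:date>
  <food:date>Medjool</food:date>
</record>"""

root = ET.fromstring(XML_TEXT)

# Map each prefix to the address that establishes the tag's namespace.
namespaces = {
    "cal": "http://example.org/calendar#",
    "food": "http://example.org/fruit#",
}

# The same tag name, "date", is resolved separately within each namespace.
calendar_date = root.find("cal:date", namespaces).text
fruit_date = root.find("food:date", namespaces).text

print(calendar_date, fruit_date)
```

Internally, the parser stores each tag with its full namespace address, so the two "date" tags never collide even though they share a name.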
Nomenclature A nomenclature is a listing of terms that cover all of the concepts in a knowledge domain. A nomenclature is different from a dictionary for three reasons: (1) the nomenclature terms are not annotated with definitions, (2) nomenclature terms may be multi-word, and (3) the terms in the nomenclature are limited to the scope of the selected knowledge domain. In addition, most nomenclatures group synonyms under a group code. For example, a food nomenclature might collect submarine, hoagie, po’boy, grinder, hero, and torpedo under an alphanumeric code such as "F63958". Nomenclatures simplify textual documents by uniting synonymous terms under a common code. Documents that have been coded with the same nomenclature can be integrated with other documents that have been similarly coded, and queries conducted over such documents will yield the same results, regardless of which term is entered (ie, a search for either hoagie or po’boy will retrieve the same information, if both terms have been annotated with the synonym code, "F63958"). Optimally, the canonical concepts listed in the nomenclature are organized into a hierarchical classification.38,39 See Coding. See Autocoding.
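A minimal Python sketch of coding with a nomenclature follows. It assumes a toy nomenclature holding only the sandwich synonyms and the code "F63958" from the example above; the function names (code_text, search) and the sample documents are hypothetical, and a real autocoder would need to handle much larger term lists and multi-word matching more carefully.

```python
import re

# Toy nomenclature: every synonym maps to the same concept code.
NOMENCLATURE = {
    "submarine": "F63958",
    "hoagie": "F63958",
    "po'boy": "F63958",
    "grinder": "F63958",
    "hero": "F63958",
    "torpedo": "F63958",
}

# Match whole terms only, trying longer terms first.
_TERMS = sorted(NOMENCLATURE, key=len, reverse=True)
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(term) for term in _TERMS) + r")\b",
    re.IGNORECASE,
)

def _annotate(match):
    """Append the concept code, in brackets, to a matched term."""
    term = match.group(0)
    return term + " [" + NOMENCLATURE[term.lower()] + "]"

def code_text(text):
    """Annotate every nomenclature term in the text with its code."""
    return _PATTERN.sub(_annotate, text)

def search(documents, term):
    """Return the documents that contain any synonym of the query term."""
    code = NOMENCLATURE[term.lower()]
    return [doc for doc in documents if code in code_text(doc)]

documents = [
    "I ate a hoagie for lunch.",
    "The po'boy was excellent.",
    "Salad again.",
]
print(code_text(documents[0]))
print(search(documents, "hoagie") == search(documents, "po'boy"))
```

Because the search is conducted on the code rather than on the term, either synonym retrieves the same documents.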
Nonatomicity Nonatomicity is the assignment of a collection of objects to a single, composite object, that cannot be further simplified or sensibly deconstructed. For example, the human body is composed of trillions of individual cells, each of which lives for some length of time, and then dies. Many of the cells in the body are capable of dividing to produce more cells. In many cases, the cells of the body that are capable of dividing can be cultured and grown in plastic containers, much like bacteria can be cultured and grown in Petri dishes. If the human body is composed of individual cells, why do we habitually think of each human as a single living entity? Why don’t we think of humans as bags of individual cells? Perhaps the reason stems from the coordinated responses of cells. When someone steps on the cells of your toe, the cells in your brain sense pain, the cells in your mouth and vocal cords say ouch, and an army of inflammatory cells rush to the scene of the crime. The cells in your toe are not capable of registering an actionable complaint, without a great deal of assistance. Another reason that organisms, composed of trillions of living cells, are generally considered to have nonatomicity, probably relates to the "species" concept in biology. Every cell in an organism descended from the same zygote, and every zygote in every member of the same species descended from the same ancestral organism. Hence, there seems to be little benefit to assigning unique entity status to the individual cells that compose organisms, when the class structure for organisms is based on descent through zygotes. See Species.
Notation 3 Also called n3. A syntax for expressing assertions as triples (unique subject + metadata + data). Notation 3 expresses the same information as the more formal RDF syntax, but n3 is easier for humans to read.40 RDF and n3 are interconvertible, and either one can be parsed and equivalently tokenized (ie, broken into elements that can be reorganized in a different format, such as a database record). See RDF. See Triple.
Ontology An ontology is a collection of classes and their relationships to one another. Ontologies are usually rule-based systems (ie, membership in a class is determined by one or more class rules). Two important features distinguish ontologies from classifications. Ontologies permit classes to have more than one parent class. For example, the class of automobiles may be a direct subclass of "motorized devices" and a direct subclass of "mechanized transporters." In addition, an instance of a class can be an instance of any number of additional classes. For example, a Lamborghini may be a member of class "automobiles" and of class "luxury items." This means that the lineage of an instance in an ontology can be highly complex, with a single instance occurring in multiple classes, and with many connections between classes. Because recursive relations are permitted, it is possible to build an ontology wherein a class is both an ancestor class and a descendant class of itself. A classification is a highly restrained ontology wherein instances can belong to only one class, and each class may have only one parent class. Because classifications have an enforced linear hierarchy, they can be easily modeled, and the lineage of any instance can be traced unambiguously. See Classification. See Multiclass classification. See Multiclass inheritance.
Open Access A document is open access if its complete contents are available to the public. Open access applies to documents in the same manner as open source applies to software.
Open Source Software is open source if the source code is available to anyone who has access to the software. See Open source movement. See Open access.
Open Source Movement Open source software is software for which the source code is available. The Open Source Software movement is an offspring of the Free Software movement. Although a good deal of free software is no-cost software, the intended meaning of the term "free" is that the software can be used without restrictions. The Open Source Initiative posts an open source definition and a list of approved licenses that can be attached to open source products.
Parent Class The immediate ancestor, or the next-higher class (ie, the direct superclass) of a class. For example, in the classification of living organisms, Class Vertebrata is the parent class of Class Gnathostomata. Class Gnathostomata is the parent class of Class Teleostomi. In a classification, which imposes single class inheritance, each child class has exactly one parent class; whereas one parent class may have several different child classes. Furthermore, some classes, in particular the bottom class in the lineage, have no child classes (ie, a class need not always be a superclass of other classes). A class can be defined by its properties, its membership (ie, the instances that belong to the class), and by the name of its parent class. When we list all of the classes in a classification, in any order, we can always reconstruct the complete class lineages, with their correct order and branchings, if we know the name of each class’s parent class. See Instance. See Child class. See Superclass.
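The claim that a lineage can be reconstructed from nothing more than the name of each class's parent can be sketched in Python. The three classes come from the example above; treating Class Vertebrata as the top of the hierarchy (parent of None) is a simplifying assumption made only for this illustration, and the function name is hypothetical.

```python
# Each class is listed, in no particular order, with the name of its parent.
# Assumption for this sketch only: Vertebrata is treated as the root class.
PARENT_OF = {
    "Teleostomi": "Gnathostomata",
    "Vertebrata": None,
    "Gnathostomata": "Vertebrata",
}

def lineage(class_name, parent_of):
    """Return the list of classes from class_name up to the root."""
    chain = [class_name]
    seen = {class_name}
    while parent_of.get(chain[-1]) is not None:
        parent = parent_of[chain[-1]]
        if parent in seen:
            # A classification may not contain a class that is its own ancestor.
            raise ValueError("cycle detected at class " + parent)
        chain.append(parent)
        seen.add(parent)
    return chain

print(" > ".join(lineage("Teleostomi", PARENT_OF)))
```

Single class inheritance is what makes the loop work: because every class has at most one parent, there is exactly one path upward from any class.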
RDF Resource Description Framework (RDF) is a syntax in XML notation that formally expresses assertions as triples. The RDF triple consists of a uniquely identified subject plus a metadata descriptor for the data plus a data element. Triples are necessary and sufficient to create statements that convey meaning. Triples can be aggregated with other triples from the same data set or from other data sets, so long as each triple pertains to a unique subject that is identified equivalently through the data sets. Enormous data sets of RDF triples can be merged or functionally integrated with other massive or complex data resources. For a detailed discussion see Open Source Tools for Chapter 6, "Syntax for Triples." See Notation 3. See Semantics. See Triple. See XML.
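The aggregation of triples across data sets can be illustrated with plain Python tuples. This sketch does not use RDF syntax or any RDF library; the two data sets, the identifiers, and the function name merge_triples are invented for the example. The only requirement it demonstrates is the one stated above: the same subject must carry the same identifier in every data set.

```python
from collections import defaultdict

# Two hypothetical data sets that identify their subjects equivalently.
dataset_a = [("obj_001", "name", "Sam"), ("obj_002", "name", "Lee")]
dataset_b = [("obj_001", "height", "6 feet tall")]

def merge_triples(*datasets):
    """Group every triple, from every data set, under its identifier."""
    merged = defaultdict(dict)
    for dataset in datasets:
        for identifier, metadata, data in dataset:
            # Keep a list per metadata tag, since a tag may recur.
            merged[identifier].setdefault(metadata, []).append(data)
    return dict(merged)

merged = merge_triples(dataset_a, dataset_b)
for identifier, record in merged.items():
    print(identifier, record)
```

No schema had to be agreed upon in advance; the shared identifier alone is enough to bring the assertions about obj_001 together.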
RDF Schema Resource Description Framework Schema (RDFS). A document containing a list of classes, their definitions, and the names of the parent class(es) for each class. In an RDF Schema, the list of classes is typically followed by a list of properties that apply to one or more classes in the Schema. To be useful, RDF Schemas are posted on the Internet, as a Web page, with a unique Web address. Anyone can incorporate the classes and properties of a public RDF Schema into their own RDF documents (public or private) by linking named classes and properties, in their RDF document, to the web address of the RDF Schema where the classes and properties are defined. See Namespace. See RDFS.
RDFS Same as RDF Schema.
Reflection A programming technique wherein a computer program will modify itself, at run-time, based on information it acquires through introspection. For example, a computer program may iterate over a collection of data objects, examining the self-descriptive information for each object in the collection (ie, object introspection). If the information indicates that the data object belongs to a particular class of objects, then the program may call a method appropriate for the class. The program executes in a manner determined by descriptive information obtained during run-time; metaphorically reflecting upon the purpose of its computational task. See Introspection.
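The iteration described above can be sketched in Python, which exposes an object's class and methods at run-time. The classes Rabbit and Tiger, the method name "describe", and the function name process are hypothetical, invented for this illustration; the sketch shows run-time selection of a method based on introspection, which is one modest form of the technique.

```python
class Rabbit:
    def describe(self):
        return "a rabbit hops"

class Tiger:
    def describe(self):
        return "a tiger prowls"

def process(objects):
    """Inspect each object at run-time and call a method if it has one."""
    results = []
    for obj in objects:
        # Introspection: ask the object for the name of its own class.
        class_name = type(obj).__name__
        # Introspection: ask whether the object offers a "describe" method.
        method = getattr(obj, "describe", None)
        if callable(method):
            results.append((class_name, method()))
        else:
            results.append((class_name, "no describe method"))
    return results

results = process([Rabbit(), Tiger(), 42])
for class_name, outcome in results:
    print(class_name, outcome)
```

The function never names Rabbit or Tiger; what it does with each object is decided by what the object reports about itself while the program is running.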
Reproducibility Reproducibility is achieved when repeat studies produce the same results, over and over. Reproducibility is closely related to validation, which is achieved when you draw the same conclusions, from the data, over and over again. Implicit in the concept of "reproducibility" is that the original research must somehow convey the means by which the study can be reproduced. This usually requires the careful recording of methods, algorithms, and materials. In some cases, reproducibility requires access to the data produced in the original studies. If there is no feasible way for scientists to undertake a reconstruction of the original study, or if the results obtained in the original study cannot be obtained in subsequent attempts, then the study is irreproducible. If the work is reproduced, but the results and the conclusions cannot be repeated, then the study is considered invalid. See Validation. See Verification.
Script A script is a program that is written in plain-text, in a syntax appropriate for a particular programming language, that needs to be parsed through that language’s interpreter before it can be compiled and executed. Scripts tend to run a bit slower than executable files, but they have the advantage that they can be understood by anyone who is familiar with the script’s programming language. Scripts can be identified by the so-called shebang line at the top of the script. See Shebang. See Executable file. See Halting a script.
Semantics The study of meaning (Greek root, semantikos, significant meaning). In the context of data science, semantics is the technique of creating meaningful assertions about data objects. A meaningful assertion, as used here, is a triple consisting of an identified data object, a data value, and a descriptor for the data value. In practical terms, semantics involves making assertions about data objects (ie, making triples), combining assertions about data objects (ie, merging triples), and assigning data objects to classes; hence relating triples to other triples. As a word of warning, few informaticians would define semantics in these terms, but most definitions for semantics are functionally equivalent to the definition offered here. Most language is unstructured and meaningless. Consider the assertion: Sam is tired. This is an adequately structured sentence with a subject, verb, and object. But what is the meaning of the sentence? There are a lot of people named Sam. Which Sam is being referred to in this sentence? What does it mean to say that Sam is tired? Is "tiredness" a constitutive property of Sam, or does it only apply to specific moments? If so, for what moment in time is the assertion, "Sam is tired" actually true? To a computer, meaning comes from assertions that have a specific, identified subject associated with some sensible piece of fully described data (metadata coupled with the data it describes). See Triple. See RDF.
Shebang Standard scripts written in Ruby, Perl, or Python all begin with a shebang, a colloquialism referring to the concatenation of the pound character, "#" (known by the musically-inclined as a SHarp character), followed by an exclamation sign, "!" (connoting a surprise or a bang). Typical shebang lines (ie, top lines) for Perl, Ruby, Python, and Bash scripts are:
#!/usr/local/bin/perl
#!/usr/local/bin/ruby
#!/usr/local/bin/python
#!/usr/local/bin/sh
In each case, the shebang is followed by the directory path to the language’s interpreter, and this is traditionally followed by optional programming arguments specific to each language. The shebang line, though essential in some Unix-like systems, is unnecessary in the Windows operating system. In this book, I use the shebang line to indicate, at a glance, the language in which a script is composed.
Simpson’s Paradox Occurs when a correlation that holds in two different data sets is reversed if the data sets are combined. For example, baseball player A may have a higher batting average than player B for each of two seasons, but when the data for the two seasons are combined, player B may have the higher 2-season average. Simpson’s paradox is just one example of unexpected changes in outcome when variables are unknowingly hidden or blended.41
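The batting average example can be checked with a few lines of Python. The hit and at-bat counts below are invented for the illustration; they are not taken from any real players. Exact fractions are used so that no rounding obscures the comparison.

```python
from fractions import Fraction

# Hypothetical (hits, at-bats) for two seasons; the numbers are invented.
RECORDS = {
    "A": [(4, 10), (25, 100)],
    "B": [(35, 100), (2, 10)],
}

def season_averages(record):
    """Batting average for each season, as exact fractions."""
    return [Fraction(hits, at_bats) for hits, at_bats in record]

def combined_average(record):
    """Batting average over all seasons pooled together."""
    total_hits = sum(hits for hits, _ in record)
    total_at_bats = sum(at_bats for _, at_bats in record)
    return Fraction(total_hits, total_at_bats)

for player, record in RECORDS.items():
    seasons = [float(average) for average in season_averages(record)]
    print(player, seasons, float(combined_average(record)))
```

Player A has the higher average in each season (4/10 versus 35/100, then 25/100 versus 2/10), yet player B has the higher pooled average (37/110 versus 29/110). The reversal arises because the two players had very different numbers of at-bats in each season, so the pooled averages weight the seasons differently.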
Species Species is the bottom-most class of any classification or ontology. Because the species class contains the individual objects of the classification, it is the only class which is not abstract. The special significance of the species class is best exemplified in the classification of living organisms. Every species of organism contains individuals that share a common ancestral relationship to one another. When we look at a group of squirrels, we know that each squirrel in the group has its own unique personality, its own unique genes (ie, genotype), and its own unique set of physical features (ie, phenotype). Moreover, although the DNA sequences of individual squirrels are unique, we assume that there is a commonality to the genome of squirrels that distinguishes it from the genome of every other species. If we use the modern definition of species as an evolving gene pool, we see that the species can be thought of as a biological life form, with substance (a population of propagating genes), and a function (evolving to produce new species).42–44 Put simply, species speciate; individuals do not. As a corollary, species evolve; individuals simply propagate. Hence, the species class is a separable biological unit with form and function. We, as individuals, are focused on the lives of individual things, and we must be reminded of the role of species in biological and nonbiological classifications. The concept of species is discussed in greater detail in Section 6.4. See Blended class. See Nonatomicity.
Specification A specification is a formal method for describing objects (physical objects such as nuts and bolts or symbolic objects, such as numbers, or concepts expressed as text). In general, specifications do not require the inclusion of specific items of information (ie, they do not impose restrictions on the content that is included in or excluded from documents), and specifications do not impose any order of appearance of the data contained in the document (ie, you can mix up and rearrange specified objects, if you like). Specifications are not generally certified by a standards organization. They are generally produced by special interest organizations, and the legitimacy of a specification depends on its popularity. Examples of specifications are RDF (Resource Description Framework) produced by the W3C (World Wide Web Consortium), and TCP/IP (Transmission Control Protocol/Internet Protocol), maintained by the Internet Engineering Task Force. The most widely implemented specifications are simple and easily implemented. See Specification versus standard.
Specification Versus Standard Data standards, in general, tell you what must be included in a conforming document, and, in most cases, dictate the format of the final document. In many instances, standards bar inclusion of any data that is not included in the standard (eg, you should not include astronomical data in a clinical x-ray report). Specifications simply provide a formal way for describing the data that you choose to include in your document. XML and RDF, a semantic dialect of XML, are examples of specifications. They both tell you how data should be represented, but neither tells you what data to include, or how your document or data set should appear. Files that comply with a standard are rigidly organized and can be easily parsed and manipulated by software specifically designed to adhere to the standard. Files that comply with a specification are typically self-describing documents that contain within themselves all the information necessary for a human or a computer to derive meaning from the file contents. In theory, files that comply with a specification can be parsed and manipulated by generalized software designed to parse the markup language of the specification (eg, XML, RDF) and to organize the data into data structures defined within the file. The relative strengths and weaknesses of standards and specifications are discussed in Section 2.6, "Specifications Good, Standards Bad." See XML. See RDF.
Superclass Any of the ancestral classes of a subclass. For example, in the classification of living organisms, the class of vertebrates is a superclass of the class of mammals. The immediate superclass of a class is its parent class. In common parlance, when we speak of the superclass of a class, we are usually referring to its parent class. See Parent class.
Syntax Syntax is the standard form or structure of a statement. What we know as English grammar is equivalent to the syntax for the English language. Charles Mead distinctly summarized the difference between syntax and semantics: "Syntax is structure; semantics is meaning."45 See Semantics.
Taxonomic Order In biological taxonomy, the hierarchical lineage of organisms is divided into a descending list of named orders: Kingdom, Phylum (Division), Class, Order, Family, Genus, and Species. As we have learned more and more about the classes of organisms, modern taxonomists have added additional ranks to the classification (eg, supraphylum, subphylum, suborder, infraclass, etc.). Was this really necessary? All of this taxonomic complexity could be averted by dropping named ranks and simply referring to every class as "Class." Modern specifications for class hierarchies (eg, RDF Schema) encapsulate each class with the name of its superclass. When every object yields its class and superclass, it is possible to trace any object’s class lineage. For example, in the classification of living organisms, if you know the name of the parent for each class, you can write a simple script that generates the complete ancestral lineage for every class and species within the classification.46 See Class. See Taxonomy. See RDF Schema. See Species.
Taxonomy A taxonomy is the collection of named instances (class members) in a classification or an ontology. When you see a schematic showing class relationships, with individual classes represented by geometric shapes and the relationships represented by arrows or connecting lines between the classes, then you are essentially looking at the structure of a classification, minus the taxonomy. You can think of building a taxonomy as the act of pouring all of the names of all of the instances into their proper classes. A taxonomy is similar to a nomenclature; the difference is that in a taxonomy, every named instance must have an assigned class. See Taxonomic order.
Terminology The collection of words and terms used in some particular discipline, field, or knowledge domain. Nearly synonymous with vocabulary and with nomenclature. Vocabularies, unlike terminologies, are not confined to the terms used in a particular field. Nomenclatures, unlike terminologies, usually aggregate equivalent terms under a canonical synonym.
Thesaurus A vocabulary that groups together synonymous terms. A thesaurus is very similar to a nomenclature. There are two minor differences. Nomenclatures include multi-word terms, whereas a thesaurus is typically composed of one-word terms. In addition, nomenclatures are typically restricted to a well-defined topic or knowledge domain (eg, names of stars, infectious diseases, etc.). See Nomenclature. See Vocabulary. See Classification. See Dictionary. See Terminology. See Ontology.
Triple In computer semantics, a triple is an identified data object associated with a data element and the description of the data element. In the computer science literature, the syntax for the triple is commonly described as: subject, predicate, object, wherein the subject is an identifier, the predicate is the description of the object, and the object is the data. The definition of triple, using grammatical terms, can be off-putting to the data scientist, who may think in terms of spreadsheet entries: a key that identifies the line record, a column header containing the metadata description of the data, and a cell that contains the data. In this book, the three components of a triple are described as: (1) the identifier for the data object, (2) the metadata that describes the data, and (3) the data itself. In theory, all data sets, databases, and spreadsheets can be constructed or deconstructed as collections of triples. See Introspection. See Data object. See Semantics. See RDF. See Meaning.
Unclassifiable Objects Classifications create a class for every object and taxonomies assign each and every object to its correct class. This means that a classification is not permitted to contain unclassified objects, a condition that puts fussy taxonomists in an untenable position. Suppose you have an object, and you simply do not know enough about the object to confidently assign it to a class. Or, suppose you have an object that seems to fit more than one class, and you can’t decide which class is the correct class. What do you do? Historically, scientists have resorted to creating a "miscellaneous" class into which otherwise unclassifiable objects are given a temporary home, until more suitable accommodations can be provided. I have spoken with numerous data managers, and everyone seems to be of a mind that "miscellaneous" classes, created as a stopgap measure, serve a useful purpose. Not so. Historically, the promiscuous application of "miscellaneous" classes has proven to be a huge impediment to the advancement of science. In the case of the classification of living organisms, the class of protozoans stands as a case in point. Ernst Haeckel, a leading biological taxonomist in his time, created the Kingdom Protista (ie, protozoans), in 1866, to accommodate a wide variety of simple organisms with superficial commonalities. Haeckel himself understood that the protists were a blended class that included unrelated organisms, but he believed that further study would resolve the confusion. In a sense, he was right, but the process took much longer than he had anticipated, occupying generations of taxonomists over the following 150 years. Today, Kingdom Protista no longer exists. Its members have been reassigned to other classes. Nonetheless, textbooks of microbiology still describe the protozoans, just as though this name continued to occupy a legitimate place among terrestrial organisms. In the meantime, therapeutic opportunities for eradicating so-called protozoal infections, using class-targeted agents, have no doubt been missed.47 You might think that the creation of a class of living organisms, with no established scientific relation to the real world, was a rare and ancient event in the annals of biology, having little or no chance of being repeated. Not so. A special pseudoclass of fungi, deuteromycetes (spelled with a lowercase "d," signifying its questionable validity as a true biologic class) has been created to hold fungi of indeterminate speciation. At present, there are several thousand such fungi, sitting in a taxonomic limbo, waiting to be placed into a definitive taxonomic class.47, 48 See Blended class.
Undifferentiated Software Intellectual property disputes have driven developers to divide software into two categories: undifferentiated software and differentiated software. Undifferentiated software comprises the fundamental algorithms that everyone uses whenever they develop a new software application. It is in nobody’s interest to assign patents to basic algorithms and their implementations. Nobody wants to devote their careers to prosecuting or defending tenuous legal claims over the ownership of the fundamental building blocks of computer science. Differentiated software comprises customized applications that are sufficiently new and different from any preceding product that patent protection would be reasonable.
Utility In the context of software, a utility is an application that is dedicated to performing one specific task very well and very fast. In most instances, utilities are short programs, often running from the command line, and thus lacking any graphic user interface. Many utilities are available at no cost, with open source code. In general, simple utilities are preferable to multipurpose software applications.29 Remember, an application that claims to do everything for the user is, most often, an application that requires the user to do everything for the application. See Command line. See Command line utility.
Validation Validation is the process that checks whether the conclusions drawn from data analysis are correct.49 Validation usually starts with repeating the same analysis of the same data, using the methods that were originally recommended. Obviously, if a different set of conclusions is drawn from the same data and methods, the original conclusions cannot be validated. Validation may involve applying a different set of analytic methods to the same data, to determine if the conclusions are consistent. It is always reassuring to know that conclusions are repeatable, with different analytic methods. In prior eras, experiments were validated by repeating the entire experiment, thus producing a new set of observations for analysis. Many of today’s scientific experiments are far too complex and costly to repeat. In such cases, validation requires access to the complete collection of the original data, and to the detailed protocols under which the data was generated. One of the most useful methods of data validation involves testing new hypotheses, based on the assumed validity of the original conclusions. For example, if you were to accept Darwin’s analysis of barnacle data, leading to his theory of evolution, then you would expect to find a chronologic history of fossils in ascending layers of shale. This was the case; thus, paleontologists studying the Burgess shale reserves provided some validation to Darwin’s conclusions. Validation should not be mistaken for proof. Nonetheless, the repeatability of conclusions, over time, with the same or different sets of data, and the demonstration of consistency with related observations, is about all that we can hope for in this imperfect world. See Verification. See Reproducibility.
Verification The process by which data is checked to determine whether the data was obtained properly (ie, according to approved protocols), and that the data accurately measured what it was intended to measure, on the correct specimens, and that all steps in data processing were done using well-documented protocols. Verification often requires a level of expertise that is at least as high as the expertise of the individuals who produced the data.49 Data verification requires a full understanding of the many steps involved in data collection and can be a time-consuming task. In one celebrated case, in which two statisticians reviewed a microarray study performed at Duke University, the time devoted to their verification effort was reported to be 2000 hours.50 To put this statement in perspective, the official work-year, according to the U.S. Office of Personnel Management, is 2087 hours. Verification is different from validation. Verification is performed on data; validation is done on the results of data analysis. See Validation. See Microarray. See Introspection.
Vocabulary A comprehensive collection of words and their associated meanings. In some quarters, "vocabulary" and "nomenclature" are used interchangeably, but they are different from one another. Nomenclatures typically focus on terms confined to one knowledge domain. Nomenclatures typically do not contain definitions for the contained terms. Nomenclatures typically group terms by synonymy. Lastly, nomenclatures include multi-word terms. Vocabularies are collections of single words, culled from multiple knowledge domains, with their definitions, and assembled in alphabetic order. See Nomenclature. See Thesaurus. See Taxonomy. See Dictionary. See Terminology.
XML Acronym for eXtensible Markup Language, a syntax for marking data values with descriptors (ie, metadata). The descriptors are commonly known as tags. In XML, every data value is enclosed by a start-tag, containing the descriptor and indicating that a value will follow, and an end-tag, containing the same descriptor and indicating that a value preceded the tag. For example: <name>Conrad Nervig</name>. The enclosing angle brackets, "<>", and the end-tag marker, "/", are hallmarks of HTML and XML markup. This simple but powerful relationship between metadata and data allows us to employ metadata/data pairs as though each were a miniature database. The semantic value of XML becomes apparent when we bind a metadata/data pair to a unique object, forming a so-called triple. See Triple. See Meaning. See Semantics. See HTML.
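The Triple and XML entries describe two routes to the same structure: a spreadsheet row can be deconstructed into (identifier, metadata, data) triples, and an XML metadata/data pair becomes a triple once it is bound to a unique object. The following Python sketch illustrates both under stated assumptions. The function names, the sample table, and the identifier `object_0001` are hypothetical; only the `<name>Conrad Nervig</name>` fragment comes from the glossary.

```python
import csv
import io
import xml.etree.ElementTree as ET


def table_to_triples(csv_text, key_column):
    """Deconstruct a table into (identifier, metadata, data) triples.

    The key column supplies the identifier, each remaining column header
    is the metadata, and each cell is the data.
    """
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        identifier = row[key_column]
        for header, value in row.items():
            if header != key_column:
                triples.append((identifier, header, value))
    return triples


def xml_pair_to_triple(identifier, xml_fragment):
    """Bind one XML metadata/data pair to an identifier, forming a triple."""
    element = ET.fromstring(xml_fragment)
    # The tag is the metadata; the enclosed text is the data.
    return (identifier, element.tag, (element.text or "").strip())


if __name__ == "__main__":
    # Hypothetical one-row table; the identifier and city are made up.
    table = "id,name,city\nobject_0001,Conrad Nervig,Boston\n"
    for triple in table_to_triples(table, "id"):
        print(triple)
    print(xml_pair_to_triple("object_0001", "<name>Conrad Nervig</name>"))
```

Both routes yield the same triple for the name, ("object_0001", "name", "Conrad Nervig"), which suggests why triples can serve as a common form for tables and tagged data alike.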
Jules Berman holds two bachelor of science degrees from MIT (Mathematics, and Earth and Planetary Sciences), a PhD from Temple University, and an MD from the University of Miami. He was a graduate researcher in the Fels Cancer Research Institute, at Temple University, and at the American Health Foundation in Valhalla, New York. His post-doctoral studies were completed at the U.S. National Institutes of Health, and his residency was completed at the George Washington University Medical Center in Washington, D.C. Dr. Berman served as Chief of Anatomic Pathology, Surgical Pathology, and Cytopathology at the Veterans Administration Medical Center in Baltimore, Maryland, where he held joint appointments at the University of Maryland Medical Center and at the Johns Hopkins Medical Institutions. In 1998, he transferred to the U.S. National Institutes of Health, as a Medical Officer, and as the Program Director for Pathology Informatics in the Cancer Diagnosis Program at the National Cancer Institute. Dr. Berman is a past president of the Association for Pathology Informatics, and the 2011 recipient of the association’s Lifetime Achievement Award. He is a listed author on over 200 scientific publications and has written more than a dozen books in his three areas of expertise: informatics, computer programming, and cancer biology. Dr. Berman is currently a freelance writer.