How to Detect Online Hate towards Migrants and Refugees?


sustainability

Article

How to Detect Online Hate towards Migrants and Refugees? Developing and Evaluating a Classifier of Racist and Xenophobic Hate Speech Using Shallow and Deep Learning

Carlos Arcila-Calderón 1,*, Javier J. Amores 1, Patricia Sánchez-Holgado 1, Lazaros Vrysis 2, Nikolaos Vryzas 2 and Martín Oller Alonso 3

1 Facultad de Ciencias Sociales, Campus Unamuno, University of Salamanca, 37007 Salamanca, Spain

2 Multidisciplinary Media & Mediated Communication Research Group (M3C), Aristotle University of Thessaloniki, 54124 Thessaloniki, Greece

3 Department of Social and Political Sciences, Università degli Studi di Milano, 20122 Milano, Italy

* Correspondence: carcila@usal.es

Citation: Arcila-Calderón, C.; Amores, J.J.; Sánchez-Holgado, P.; Vrysis, L.; Vryzas, N.; Oller Alonso, M. How to Detect Online Hate towards Migrants and Refugees? Developing and Evaluating a Classifier of Racist and Xenophobic Hate Speech Using Shallow and Deep Learning. Sustainability 2022, 14, 13094. https://doi.org/10.3390/su142013094

Academic Editors: Stefano Ruggieri and Alessia Passanisi

Received: 19 September 2022

Accepted: 11 October 2022

Published: 13 October 2022

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Copyright: © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Abstract: Hate speech spreading online is a matter of growing concern since social media allows for its rapid, uncontrolled, and massive dissemination. For this reason, several researchers are already working on the development of prototypes that allow for the detection of cyberhate automatically and on a large scale. However, most of them are developed to detect hate only in English, and very few focus specifically on racism and xenophobia, the category of discrimination in which the most hate crimes are recorded each year. In addition, ad hoc datasets manually generated by several trained coders are rarely used in the development of these prototypes since almost all researchers use already available datasets. The objective of this research is to overcome the limitations of those previous works by developing and evaluating classification models capable of detecting racist and/or xenophobic hate speech being spread online, first in Spanish, and later in Greek and Italian. In the development of these prototypes, three differentiated machine learning strategies are tested. First, various traditional shallow learning algorithms are used. Second, deep learning is used, specifically, an ad hoc developed RNN model. Finally, a BERT-based model is developed in which transformers and neural networks are used. The results confirm that deep learning strategies perform better in detecting anti-immigration hate speech online. It is for this reason that the deep architectures were the ones finally improved and tested for hate speech detection in Greek and Italian and in multisource. The results of this study represent an advance in the scientific literature in this field of research, since up to now, no online anti-immigration hate detectors had been tested in these languages and using this type of deep architecture.

Keywords: hate speech; racism; xenophobia; migration; social media; deep learning

1. Introduction

Violent speech is not an exclusive communicational dysfunction of our contemporary societies, but it seems more worrying today than ever due to its massive diffusion on digital platforms. The internet and information and communication technologies have allowed online hate speech to increase unabated. In this new context, social media has become the forum in which this type of message spreads more quickly and uncontrollably, as evidenced by the latest reports published by the Anti-Defamation League [1,2]. This growth in online hate speech also coincides with an unstoppable increase in registered hate crimes in Europe [3], which could evidence the correlation between both phenomena pointed out by Müller and Schwarz [4]. Moreover, if this connection holds, since most of the hate crimes committed in Europe are due to racist and/or xenophobic reasons (according to the data collected by the OSCE’s hate crime reporting), we could affirm that most of the increasing hate speech that is spread online is based on this type of discrimination and is aimed mainly towards migrants, refugees, and asylum seekers who come to or are within European borders. In this same line, recent works developed by the authors have evidenced a negative trend in the representation of migrants and refugees that is spread by the main media of Mediterranean countries [5] and in Western Europe [6], which could also be related to the increase in racist and xenophobic hate. Other studies also indicate the phenomenon of deresponsibilisation [7] of hate speech spreading online, especially by younger audiences who feel that their public hate language should not be taken seriously.

Sustainability 2022, 14, 13094. https://doi.org/10.3390/su142013094 https://www.mdpi.com/journal/sustainability

With these premises, some researchers have understood that it is urgent to explore new methods for detecting and preventing online hate at the global level, but also in regional contexts, where online and offline hate has not stopped increasing either. For this reason, in recent years, diverse public and private institutions have been making great efforts to try to detect and counter hate speech online, although mostly in a general way and not dealing specifically with racist and xenophobic hate. In addition, the large amount of information offered on digital platforms today makes it more difficult than ever to monitor, detect, and combat these hateful contents. This, in turn, means that victims of online hate might be increasing, something that, in the Spanish case, the latest Raxen reports [8,9] show, even though most incidents might not be recorded. In this situation, it is important to try to develop new methodological strategies that allow us to monitor the violent speech that spreads on social platforms, paying special attention to racist and xenophobic hate. Taking this into account, it is surprising that, although there are already several researchers who are addressing this problem in English, there are still so few researchers who are doing it in other language contexts, focusing specifically on anti-immigration hate online, the category that internationally worries the most [10,11].

With these premises, the aim of this work is to generate and test a detector of racist and xenophobic online hate speech. This work presents several novelties compared to the previous ones. In the first place, for the development of the prototypes, we will generate ad hoc datasets with messages extracted from social media, manually annotated by several trained coders. Second, in addition to shallow learning algorithms, deep architectures will also be used in the development of the prototypes. On the one hand, an ad hoc RNN model is developed, and on the other hand, a BERT-based model is developed, in which transformers and neural networks will be used. Finally, although the prototypes will be trained and tested first in the Spanish language, the ones with the best performance will later be improved and retrained following the same strategy so that they are capable of detecting anti-immigration hate speech also in Greek and Italian (Mediterranean countries that are the main gateway for most immigrants arriving in Europe) and in multiple sources. These detectors will make it possible to identify increases in this type of cyberhate and to develop tailored programs to combat and counter it, but also to acquire empirical knowledge about these violent speeches, about the groups to which they are addressed, about the sources or profiles propagating hate, and lastly, about how these types of messages could be triggering hate crimes in the physical environment. In short, this prototype will allow us to understand the spread of online hate aimed towards displaced people and to devise strategies to counteract and prevent its possible effects, including physical hate crimes.

2. Defining Online Hate Speech

Hate speech is not a new concern; in 1997 Calvert already pointed to this type of discourse as a problem to analyse, understand, and combat with communicational approaches, necessarily involving all elements of the communication transmission models. However, hate speech has become a bigger concern today due to the rapid growth of digital media, especially social media, in which former readers and audiences have become prosumers [12], with more and more followers to whom they can launch their messages and content. In addition, nowadays, it seems that the more sensationalist this content is, the more followers it gets. Without a doubt, in this new dimension of immediacy and freedom, it is much easier for hateful messages to spread quickly and without any kind of control. For this reason, hate speech has not stopped increasing in recent years, and that is why it has become something so complex and difficult to detect and combat, as well as necessary and urgent. This is what the present work tries to solve through a novel computational strategy that seeks to automatically detect latent hate in the messages spread through Twitter, primarily in Spanish and, secondarily, in Greek and Italian. However, before tackling any online hate speech detection strategy, hate speech first needs to be defined.

Thus, from a theoretical approach, hate speech is understood as the promotion of messages that imply rejection, contempt, humiliation, harassment, discrediting, and stigmatization of people or social groups based on attributes, such as nationality or colour of skin. Thus, for speech to be considered hateful, one of the main conditions is that the discriminatory message is directed towards one of the vulnerable groups typified in the European framework, or towards an individual who can be identified as part of one of those collectives, whose rejection is motivated by their apparent belonging to the group. In this sense, the Council of Europe [13] adds that for speech to be understood as a hate crime, it must propagate, incite, promote, or justify racism, xenophobia, anti-Semitism, and other forms of intolerance. The European Commission against Racism and Intolerance, in its General Recommendation No. 15 [14], also specifies that hatred can be motivated by reasons of race, colour, ancestry, national or ethnic origin among many other characteristics or personal conditions. As can be seen, the official definitions of hate speech pay special attention to racist and/or xenophobic discrimination as the main cause of all types of rejection and hate. For its part, the Ministry of the Interior of Spain, in its latest evaluation report on hate crimes in Spain [15], collected a total of 11 categories of discrimination into which crimes committed against vulnerable audiences can be classified, where racism and/or xenophobia are the first, in which more crimes are registered every year.

In the academic sphere, some authors such as Miró Llinares [16] have also studied hate speech offering, in addition to a broad definition, a taxonomy with different levels of hate online. Thus, according to this author, it is possible and necessary to differentiate between the type of hate speech that could constitute a crime and the speech that, even expressing rejection and intolerance of certain vulnerable groups, can be framed within the margins of freedom of expression. The latter type of message would include slight insults, criticism, and offenses to individual or collective sensitivity, which in some cases, could be an attack on people’s dignity, but not a hate crime. Regarding illegal hate speech, these types of messages would include all those that are spread in a public and massive context and that more directly and explicitly incite violence, intimidation, hostility, or discrimination against a vulnerable group or an individual belonging to a vulnerable group. In the case of racist and/or xenophobic hate, they would be migrants, refugees, asylum seekers, and all kinds of stigmatized races, ethnicities, and nationalities. With these premises, the present work covers all the typified levels of hate, trying to extend hate speech detection as much as possible, considering that the most explicit hate (which could be considered a crime) is considerably reduced in European contexts. However, it is expected that in the training process and in generating the machine learning models, the final prototype will be refined, and detection will finally be limited to the most explicit levels of hate.

3. Detecting Racist and Xenophobic Hate Speech Online

In recent years, many authors have studied hate speech online from very different perspectives. Chetty and Alathur [17] analysed it from the jurisprudential basis, concluding that appropriate political measures as well as the actions of social platforms are essential to effectively counteract hate speech. Other authors, such as ElSherief et al. [18], analysed it using a data-based linguistic and psycholinguistic perspective, offering a framework of understanding from which to identify the hate that is spread on social media. With a more automated and massive detection approach, Mondal, Silva, and Benevenuto [19] proposed a system for measuring and monitoring hate speech propagated on the social networks Twitter and Whisper based on specific keywords and expressions, focusing on the recognition of the main targets to which hate is directed massively. For their part, Malmasi and Zampieri [20] as well as Salminen et al. [21] are some of the few authors who proposed methods to automatically detect hate spread on social media based on NLP and supervised classification techniques.

However, all these works have something in common: they all deal with hate speech from a generic and international point of view. Trying to identify hate speech spread just in English, motivated by all kinds of discriminatory reasons, aimed towards all types of vulnerable audiences, and at any time and context, is an approach that is too ambitious and could pose a problem of internal validity, especially in large-scale strategies. Even the prototype recently developed by Salminen et al. [21], one of the most innovative and advanced prototypes using deep learning and including detection in various online sources, is based on this same type of approach. Trying to detect online hate in a general way can be reductionist because it obviates the complexity of how hate speech is spread, trying to cover all its forms in a single classifier trained with general examples. This could be a limitation because the resulting models may not be as effective, reliable, and, paradoxically, generalizable as those that are trained with real examples of a specific context, a specific type of hate, and a specific discriminatory category, separating and differentiating concepts, characteristics, and linguistic nuances.

In this sense, it should be noted that on the international scene there are already some examples of strategies and tools for detecting cyberhate that take into account the different levels of hate speech, as well as some of the different categories of prejudice that can motivate it or the different vulnerable groups who may be victims. We can highlight works such as the one developed by Davidson et al. [22], which differentiates between messages that express explicit hate and messages that are just offensive, or the one developed by Badjatiya et al. [23], which aims to specifically identify messages with racist or sexist content and also uses deep modelling. However, most of the cited studies that offer automatic hate speech detection methods based on machine learning have another limitation in common: they do not use ad hoc generated training corpus. Most of the prototypes developed so far base their detection on previously developed lexicon dictionaries, or, in the case of using a corpus of examples to train the classification algorithms, they use already available datasets developed in previous works, such as the prototype developed by Salminen et al. [21]. This approach also influences the internal validity of the prototype and its final reliability. In the Spanish context, one of the few studies that attempted to address the detection of online hate speech in Spanish is the one developed by Pereira Kohatsu et al. [24]. This prototype presents the same limitations as most of those developed internationally since it also addresses hate speech in a generic way without distinguishing audiences or types. In addition, although Pereira Kohatsu did develop an ad hoc training corpus to generate predictive models, this corpus was generated by a single coder, which also poses an internal validity problem due to its potential subjectivity. Similarly, there are recent projects that have created corpora of hateful speech in the Greek [25,26] and Italian languages [27] for the training of hate speech detection models.

From a more technical standpoint, recurrent neural networks (RNN) have become a popular choice for hate-speech detection and classification in short micro-blogging texts [28,29]. Duwairi, Hayajneh, and Quwaider [30] investigated the ability of convolutional neural networks (CNN), CNN-LSTM (long short-term memory), and bidirectional LSTM-CNN models to detect hateful content from social media in the Arabic language, with the last two architectures combining CNN and RNN achieving the best scores. In the work of Al-Hassan and Al-Dossari [31], a support-vector machine (SVM) classifier is compared against LSTM, CNN with LSTM, a gated recurrent unit (GRU), and CNN with GRU models. All deep learning models outperform the baseline, with the combined architecture achieving better performance in this case as well. Al-Makhadmeh and Tolba [32] propose an ensemble of deep classifiers combined with a natural language processing (NLP)-based semantic feature extraction layer. Prasad and Mishra [33] explore the feasibility of bidirectional encoder representations from transformers (BERT)-based models for multilingual hate and abusive speech detection, one of the most advanced techniques in this line which has also been tested in this work. In [34], transfer learning is investigated, introducing the BERT-based transformer, AraBERT, that shows an improved performance in Algerian dialectal Arabic. In [35], multitask learning (MTL) is proposed for the adaptation of a pretrained hate speech detection model in the Arabic language in cross-corpora tasks. Hate speech detection models are adapted to the target domain, and their performance deteriorates significantly when applied in different domains [36]. This is also shown in the work of Bashar, Nayak, Luong, and Balasubramaniam [37], who trained models for hate speech detection in the context of the COVID-19 pandemic. The importance of the dataset size and quality is highlighted also by Kovács, Alonso, and Saini [38] in both classic machine learning and deep learning approaches. Text preprocessing can also significantly improve performance [39].

Considering these premises, the general objective of the present work is to develop and validate a more advanced computational strategy that allows for the detection of hate speech online based on racism or xenophobia following the lines of research that the authors have already been developing specifically in the Spanish context, which could be considered as pilot studies on which this project is based [10,11,40,41]. In most of these studies, the authors treat racism and xenophobia as a single category in the same way that the Ministry of the Interior of Spain does when it records hate crimes. This is based on the fact that both types of discrimination are parallel and present difficulties to differentiate. According to authors such as Díez Nicolás [42] or Cortina [43], on many occasions, even with help of measurement tools, it is too difficult to distinguish between one and the other type of prejudice as the main reason for rejection and hate, since in most cases, the categories are concatenated, intertwined, and one is intrinsically linked to the other. For this reason, they are usually studied together.

In a more particular way, the present work aims to solve and overcome the limitations of previously developed prototypes based on a series of differentiating elements. On the one hand, we exclusively focus on detecting hate speech motivated by racism and xenophobia, which allows for the elaboration of more specific, complete, and precise corpora to generate more reliable predictive models. In the same way, we generate our own datasets of real tweets, at first, only in Spanish, but later, also in Greek and Italian, to train the predictive models. In this sense, since the creation of these corpora requires the manual annotation of previously downloaded and filtered messages from the Twitter APIs, we pose the following research question: What frequency and percentage of hate tweets due to racism/xenophobia are detected through manual annotation in a sample of previously filtered tweets about migration? (RQ1).

On the other hand, another innovative element that this work presents is the use of deep learning in the generation of predictive models that will allow for the classification of hate in Twitter messages automatically and on a large scale. Specifically, recurrent neural networks will be used (in an ad hoc model), together with a BERT-based model, an approach that, a priori, should present significant advantages over traditional classification algorithms, offering better performance, especially when applied to text classification, as is the case. However, there is not enough empirical knowledge to affirm that deep modelling will offer a higher reliability than shallow algorithms. For this reason, we also pose the following questions: Which machine learning algorithm presents the best performance when generating a predictive model capable of detecting hate speech spread on Twitter in Spanish, based on racist/xenophobic reasons (RQ2)? Does deep modelling perform better than shallow modelling for generating a prototype capable of detecting racist/xenophobic hate speech on Twitter in Spanish (RQ2A)?

In addition, this work includes an external validation phase as another innovative element in which the first developed classifier is tested with a new sample of tweets. This stage will check, beyond the internal evaluation of the prototype, how reliable the model with the best evaluation metrics is when it comes to detecting new messages in Spanish about migrants and refugees posted on Twitter. Moreover, regarding this stage of validation, we pose the following research question: Will the best performing algorithm reliably detect hate speech in a new sample of tweets about migration in Spanish (RQ3)?

Finally, the research also includes the evaluation of the detector in other languages and other sources as well, not only Twitter. Specifically, the models with the highest score validated in Spanish will be trained in additional languages using the same machine learning architecture, that is, messages in Greek and Italian promoting hate speech about migrants and refugees found online. Therefore, one more research question can be posed: Can the best performing machine learning models in Spanish be retrained and applied to other languages as well, keeping the same level of performance (RQ4)?

4. Method

As indicated, the detector of racist and xenophobic online hate speech has been developed following a large-scale detection strategy based on the intensive computation of data at the Supercomputing Centre of Castilla y León (Scayle), using NLP and machine learning. For this, the methodological work was developed over five stages: the initial exploration and theoretical approach, the generation of the datasets, the generation of the predictive models, the external validation of the prototype, and the adaptation of this prototype to Greek and Italian and its evaluation on other sources.

4.1. Theoretical Phase

In this phase, we carried out an in-depth qualitative exploration of hate speech that spreads on social media such as Twitter and, specifically, that which is motivated by racist and/or xenophobic reasons. A literature review related to this field of study was also carried out, which served as a theoretical approach. In addition, we identified profiles and hashtags on Twitter through which a greater number of messages containing racist and xenophobic hate are published. Exploring these potential sources of hate on Twitter helped us to better understand and narrow down the different ways in which racist and xenophobic hate is expressed, as well as the different contexts in which it spreads, the most common victims, and the most commonly used terms and expressions. This, in turn, helped us to subsequently generate the linguistic filters that would allow us to download the first sample of potential racist hate tweets for manual classification in order to generate the ad hoc dataset.

4.2. Dataset Generation Phase

After the exploratory phase, the dataset had to be created from real and validated examples of short messages containing the type of hate to detect. The objective was that the developed dataset could be used as a corpus to train the hate speech detectors. The generation of the dataset was carried out in a series of sub-phases that are explained below.

4.2.1. Definition and Typology of Hate Speech to Detect

Firstly, criteria were established to define the type of speech to be detected to generate a customized dataset. In accordance with the possibilities that had been identified in the previous qualitative exploration and taking into account both the definitions provided by the different authors and institutions and the European legal framework itself, the definition of hate speech was broadened, encompassing the different meanings and types offered by academia, public institutions, and the Spanish penal code, as well as the three levels of online hate provided by Miró Llinares [16]. Thus, all types of hateful speech were included for the generation of the dataset, from the most explicit and violent to the most subtle, since in the previous phase, only a small minority of directly racist/xenophobic hate had been detected, even in the most polarized profiles. Since the intention was to be able to detect as many messages as possible with racist and/or xenophobic content, it was necessary to cover more types of hate messages, including the most implicit ones. In addition, in the validation process of the manual classification, which would be carried out following the basis of content analysis, and in the subsequent training of the models, it was expected that the results would be refined, leaving only the clearer examples and filtering and rejecting the most doubtful or ambiguous for not having intercoder agreement. For this reason, it was also interesting to cover the widest possible range of types of hate. On the other hand, what would be considered hate speech due to racist and/or xenophobic discrimination was also defined, compiling all the derogatory terms, expressions, and targets collected in the exploratory phase.

4.2.2. Elaboration of Dictionary Filters and Downloading the First Sample of Tweets

Subsequently, a dictionary of terms and combinations of words was created to serve as a filter for an initial download of potential tweets with racist and/or xenophobic hate. To do this, we started from the qualitative exploration of Twitter accounts, profiles, and hashtags through which a greater number of racist and xenophobic messages are spread in Spain. Thus, tweets were located using keywords identified in the exploratory phase in which potential victims of this kind of hate were mentioned. They are mainly forced migrants, refugees, and asylum seekers, but also regular immigrants and all types of non-western ethnicities and foreign cultures, racialized people, sub-Saharan Africans, gypsies, Latinos, Asians, Muslims, etc.

Secondly, based on these first examples of messages with hate speech extracted from potential hate perpetrator accounts on Twitter, we made the final selection of the search words [44] to create the final filter dictionary that would be used for the download. Specifically, a list of words, roots, or word combinations that could be representative or indicative of racist and/or xenophobic hate was drawn up. This filter dictionary was developed ad hoc with the aim of accessing tweets most likely to contain the type of hate sought, thus streamlining the tagging and optimizing the dataset creation process. In this way, the filtered tweets were downloaded, which would later be classified manually to generate the training corpus.

We finally carried out the download between October and December 2019 using the generated dictionary. Although we downloaded a larger number of tweets, we finally collected a sample of 24,000 messages for later manual classification.
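The filtering step described above can be illustrated with a minimal sketch. It assumes that the filter dictionary is a list of word roots and multi-word phrases matched case-insensitively against the tweet text. The entries in `FILTER_ROOTS` and `FILTER_PHRASES` are hypothetical placeholders (neutral target terms), not the dictionary used in the study, and the function names are illustrative only.

```python
import re

# Hypothetical filter entries (assumption): neutral target terms and roots.
# The study's actual filter dictionary is not reproduced here.
FILTER_ROOTS = ["inmigra", "refugiad", "extranjer"]
FILTER_PHRASES = ["solicitantes de asilo"]


def build_filter(roots, phrases):
    """Compile one case-insensitive pattern matching any root or phrase."""
    # A root matches any word that starts with it (e.g. "inmigra" -> "inmigrantes").
    root_parts = [r"\b" + re.escape(root) + r"\w*" for root in roots]
    # A phrase must appear as a whole word sequence.
    phrase_parts = [r"\b" + re.escape(phrase) + r"\b" for phrase in phrases]
    return re.compile("|".join(root_parts + phrase_parts), re.IGNORECASE)


def matches_filter(text, pattern):
    """Return True if the text contains at least one dictionary entry."""
    return pattern.search(text) is not None


def filter_tweets(tweets, pattern):
    """Keep only the tweets that match the filter dictionary."""
    return [tweet for tweet in tweets if matches_filter(tweet, pattern)]


PATTERN = build_filter(FILTER_ROOTS, FILTER_PHRASES)
```

A filter of this kind only narrows the sample to messages that mention potential targets; deciding whether a message is hateful is left to the manual annotation.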

4.2.3. Manual Pair Classification and Clean-Up of the Final Dataset

After downloading, we proceeded to manually classify the potential anti-immigration hate tweets, for which the Doccano platform was used. All tweets were classified by two trained judges using a binary scheme (hate or non-hate). Simultaneously, the tweets that were not interesting to include in the dataset were discarded, such as those from other contexts, for example. The dataset was generated only with the messages in which there was agreement between both coders, discarding those without agreement. With this step, we intended to ensure the reliability and quality of the resulting dataset, thus overcoming the limitations of some previously developed prototypes, e.g., [24]. After compiling and cleaning the final dataset, it was made up of a total of 3751 racist/xenophobic hate tweets (15.6% of the 24,000-message sample) and 7892 non-racist/xenophobic hate tweets (32.9% of the sample).
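The agreement rule described above can be sketched as follows. The example assumes each tweet carries one label per coder and that a tweet is kept only when both coders assign the same substantive label; the label names and function names are illustrative assumptions, not identifiers from the study or from Doccano.

```python
from collections import Counter

# Illustrative label names (assumption).
HATE, NO_HATE, DISCARD = "hate", "no_hate", "discard"


def consensus_dataset(annotations):
    """Keep (text, label) pairs on which both coders agree.

    `annotations` is a list of (text, label_coder_a, label_coder_b).
    Tweets marked for discarding, or labelled differently by the two
    coders, are left out of the resulting dataset.
    """
    kept = []
    for text, label_a, label_b in annotations:
        if label_a == label_b and label_a in (HATE, NO_HATE):
            kept.append((text, label_a))
    return kept


def label_shares(dataset, sample_size):
    """Percentage of the original sample that each kept label represents."""
    counts = Counter(label for _, label in dataset)
    return {label: round(100 * n / sample_size, 1) for label, n in counts.items()}
```

Applied to the reported counts, the same arithmetic gives 3751 / 24,000 = 15.6% and 7892 / 24,000 = 32.9%, so roughly half of the downloaded sample did not reach the final dataset.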

4.3. Generation of Predictive Models Phase

In this phase, we used the dataset developed to generate the classifiers that would later allow us to identify the anti-immigration hate messages spread on Twitter in Spanish. Specifically, we generated a total of nine predictive models: six of them using traditional algorithms, another model from the votes of those shallow models, and two final models using deep learning, specifically recurrent neural networks and transformers.

4.3.1. Shallow Modelling

For the development of the shallow models, the scikit-learn libraries and the Natural Language Toolkit (NLTK) were used. Specifically, 6 models were generated with the following conventional algorithms: original Naïve Bayes, Naïve Bayes for multinomial models, Naïve Bayes for Bernoulli’s multivariate models, logistic regression, linear classifiers with stochastic gradient descent training, and a support vector classifier. In all of them, we used the default parameter settings from the scikit-learn library [45] and bag-of-words as the text representation. In addition, the dataset was randomly divided into two subsets, one with 70% of the messages for training, and another with 30% for testing the models. After training the shallow models, we generated a final summary classifier which based its prediction on the votes of the previous 6 classifiers. In this case, a confidence threshold of 80% was included so the summary detector would choose the category predicted by at least 5 of the 6 shallow classifiers.
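The random 70/30 split and the voting rule of the summary classifier can be sketched in plain Python. This is an illustration under stated assumptions, not the study's scikit-learn and NLTK code: the function names are hypothetical, and returning `None` when the threshold is not reached is an assumption, since the text does not specify what the summary detector does in that case.

```python
import random
from collections import Counter


def split_70_30(items, seed=0):
    """Randomly divide items into a 70% training and a 30% test subset."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(0.7 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def summary_vote(predictions, threshold=0.8):
    """Combine the labels predicted by several classifiers for one message.

    Returns (label, confidence), where confidence is the share of
    classifiers voting for the most common label. If that share is
    below the threshold, the label is None (assumed behaviour).
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    label, votes = Counter(predictions).most_common(1)[0]
    confidence = votes / len(predictions)
    if confidence >= threshold:
        return label, confidence
    return None, confidence
```

With six classifiers, five matching votes give a confidence of 5/6 (about 83%), which passes the 80% threshold, whereas four matching votes give 4/6 (about 67%) and do not. This is why the threshold is equivalent to requiring at least 5 of the 6 classifiers to agree.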

4.3.2. Deep Modelling

Finally, a deep learning architecture was used to generate the final prototype. Specifically, an ad hoc recurrent neural network was generated using embeddings as the text representation. To do this, we used the Keras library with TensorFlow as the backend to create a sequential model with four layers. The input layer was used to create the embeddings, which were trained using the 10,000 most common words of the created vocabulary plus 1000 out-of-vocabulary buckets, as suggested by Géron [46]. Thus, the embedding matrix included one row for each of these 11,000 words and one column for each of the 6 embedding dimensions. The second and the third were hidden layers that consisted of GRUs (a simplified version of the conventional LSTM cells) with 128 neurons each [47]. Finally, the output layer was a dense layer with one neuron and used the sigmoid activation to estimate the probability that a particular message contained racist/xenophobic hate. We compiled this model with binary cross-entropy loss and an Adam optimizer. Subsequently, we trained the model on the training corpus for 10 epochs, and we used the test set for validation (30 steps).
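To make the input representation more concrete, the following sketch shows how tokens could be mapped to embedding row indices when the vocabulary is limited to the most common words and the remaining words are hashed into out-of-vocabulary buckets. It is a plain-Python analogue written for illustration, not the Keras/TensorFlow code of the prototype; the function names and the use of a CRC32 hash for bucket assignment are assumptions.

```python
import zlib
from collections import Counter


def build_vocab(token_lists, vocab_size=10_000):
    """Map the `vocab_size` most common tokens to indices 0..vocab_size-1."""
    counts = Counter(tok for tokens in token_lists for tok in tokens)
    return {tok: idx for idx, (tok, _) in enumerate(counts.most_common(vocab_size))}


def encode(tokens, vocab, vocab_size=10_000, num_oov_buckets=1_000):
    """Convert tokens to embedding row indices.

    Known tokens use their vocabulary index. Unknown tokens are hashed
    into one of `num_oov_buckets` extra rows placed after the vocabulary
    (the hashing scheme here is an assumption made for the sketch).
    """
    ids = []
    for tok in tokens:
        if tok in vocab:
            ids.append(vocab[tok])
        else:
            bucket = zlib.crc32(tok.encode("utf-8")) % num_oov_buckets
            ids.append(vocab_size + bucket)
    return ids


def embedding_weights(vocab_size=10_000, num_oov_buckets=1_000, dims=6):
    """Number of trainable weights in the embedding matrix."""
    return (vocab_size + num_oov_buckets) * dims
```

With the values given in the text, the embedding matrix has 11,000 rows and 6 columns, that is, 66,000 trainable weights; the remaining parameters of the recurrent model belong to the two GRU layers and the output neuron.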

In addition, we developed a final deep learning classifier using bidirectional encoder representations from transformers (BERT) [48], a pre-trained large language model that uses 177,854,978 parameters and that was fine-tuned with our annotated hateful and non-hateful messages. This model was generated using an Adam optimizer with learning rate = 3 × 10⁻⁵ and epsilon = 1 × 10⁻⁸, sparse categorical cross entropy for loss, and 3 epochs. The number of parameters of both models can be consulted in Table 1.

Table 1. Deep learning algorithms’ complexity in terms of models’ number of parameters.

Model    Number of parameters
RNN      216,798
BERT     177,854,978

4.4. External Validation Phase

Finally, once the predictive models had been generated and evaluated, the classifier with the best performance was validated with new samples. The objective of this stage was to test how accurate and reliable the classifier with the best evaluation metrics is when putting it into practice with new data. This new dataset contained 10,285 tweets retrieved in November and December 2020. The messages were manually classified by two new human coders. In this case, the sample was carefully reviewed before the manual annotation, eliminating before tagging all the tweets that we wanted to discard because they came from other contexts or belonged to other categories of prejudice, for example. Thus, after the classification process, only the tweets that did not have an agreement were rejected, which made it possible to considerably increase the number of valid tweets with inter-judge agreement. At the end of the classification process, the intercoder reliability was checked again, seeking full agreement. Thus, this manual classification resulted in 83.5% of the tweets annotated with agreement (n = 8588), of which 2781 were messages of racist and/or xenophobic hate (27.04% of the total, 32.38% of those with agreement), and 5807 were messages that did not contain racist and/or xenophobic hate (56.46% of the total, 67.62% of those with agreement). A total of 16.5% of the sample was rejected for not having inter-judge agreement (n = 1697). Subsequently, the agreement of the manual classification with the predictions of the detector with the best performance was checked, and new evaluation metrics were extracted from the detector with that new sample to know to what extent it had coincided with the human classifiers, and thus check how accurate and reliable the detection of the classifier had been with the new tweets.
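The comparison between the manual classification and the detector's predictions can be sketched as follows. The example assumes binary labels (1 for hate, 0 for no hate) and computes accuracy, precision, recall, and F1, which are common choices for this kind of evaluation; the exact set of metrics and the function names are assumptions made for illustration.

```python
def confusion_counts(gold, predicted, positive=1):
    """Count true/false positives and negatives for one positive class."""
    tp = fp = fn = tn = 0
    for g, p in zip(gold, predicted):
        if p == positive and g == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif g == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def evaluation_metrics(gold, predicted, positive=1):
    """Agreement between human labels (gold) and detector predictions."""
    tp, fp, fn, tn = confusion_counts(gold, predicted, positive)
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}
```

In the external validation described above, `gold` would hold the labels of the 8588 tweets with inter-judge agreement and `predicted` the detector's output for the same tweets.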

4.5. Evaluation of the Prototype on Multisource Content and in Other Languages

An important objective of this research was to create a detector that could be capable of detecting hate speech in other similar contexts and languages. The ability of the models with the best performances to isolate hate speech coming from multiple online sources had to be investigated. Sources include Twitter, YouTube, Facebook, web articles, and comments from digital media, association websites, and blogs in which the type of hate speech analysed could be potentially spread. In addition to this, it is very important for the effectiveness of the proposed approach to be validated in other languages as well by simply retraining the model in the new target languages. For evaluating these desirable characteristics, we made use of the PHARM datasets [49]. The Preventing Hate Against Refugees and Migrants (PHARM) project concerns a multi-source platform for the analysis of unstructured news and social media messages. In addition to a web interface for scraping and analysing hate speech, PHARM offers a multilingual dataset containing multisource racist/xenophobic hate speech records. The sources include websites in Greek, Italian, and Spanish, as well as Twitter, YouTube, and Facebook, and concern news articles, comments, tweets, Facebook posts, and YouTube comments. Currently (September 2022), the PHARM dataset has about 35k records. This dataset has been annotated by human coders, while the results have been checked for acceptable inter-coder reliability following the strategy previously developed in the Spanish prototype. In addition to the PHARM repository, the final datasets were augmented with additional data provided by the PHARM development team. Table 2 depicts the basic descriptive characteristics of the formed datasets.

Table 2. Dataset details for the supplementary evaluation in Spanish, Greek, and Italian.

Language | PHARM: Hate | PHARM: No Hate | Other Sources: Hate | Other Sources: No Hate
ES | 1390 | 11,108 | 9727 | 23,787
EL | 4359 | 6040 | - | 5362
IT | 5848 | 4904 | - | 18,451

As will be further analysed in the following section, the most promising classifiers for the prototype hate speech detector proved to be the deep learning models. Therefore, this architecture was adopted for this experimental scenario as well, after making some slight modifications. The most notable modification was the addition of instance weighting in the training phase: due to the imbalance between hateful and non-hateful records, the initial classifier tended to develop a bias towards the overrepresented class, the non-hateful class. In addition, the number of units for each GRU layer was reduced to 64, as this setup offered the same performance at a lower computational cost. The rest of the parameters remained the same.

5. Results

In the first place, before answering the research questions raised, we will explore the results of the manual annotation carried out to create the datasets. In this sense, the first thing to point out is the high percentage of tweets that were finally rejected, which is greater than 50%. This indicates the initial complexity of identifying this type of hate speech reliably and in a particular linguistic context. On the other hand, it is observed that the percentage of hate tweets with full agreement is considerably reduced despite the messages having been previously filtered. Specifically, and responding to RQ1, 15.6% of the tweets were validated as racist and/or xenophobic hate (N = 3751), compared to 32.9% of messages labelled as non-racist/xenophobic hate (N = 7892). These percentages show that linguistic filter dictionaries, no matter how complete and complex, cannot be an effective method to identify these types of hate messages, something that was already assumed. However, they served to optimize the process, since without these filters, the procedure of finding hate speech examples throughout Twitter would have been endless. In addition, considering that the datasets can always be enriched and updated with new data, the most important thing, especially in this initial prototype, was to establish a strategy to generate corpora of quality rather than quantity.

Regarding the evaluation of the generated models, three metrics were used: accuracy, F1-score, and AUC-ROC. The accuracy was calculated as the total number of correct predictions across all classes divided by the total number of predictions; the F1-score was calculated as the arithmetic mean of the per-class F1-scores, which are the harmonic means of the precision and recall metrics. Finally, the AUC-ROC summarizes the efficiency of the classifiers across all decision thresholds. Therefore, both micro- and macro-averaged metrics are present (accuracy and F1-score, respectively), with the latter being insensitive to a possible imbalance between classes. When macro-averaging, all classes are treated as equal, which exposes low scores on classes with few instances. As can be seen in Table 3, classification performance is at least acceptable in most cases, as scores higher than 0.75 have been recorded. Except for the simple Naïve Bayes algorithm, all models showed similar performances. Moreover, the accuracy and AUC-ROC scores were higher for the ad hoc RNN model, which confirms the comparative advantage of deep learning in this type of task. Responding to RQ2, we can affirm that logistic regression and support vector machines are the shallow algorithms that present the best performance in this case. Responding to RQ2A, we can confirm that the deep learning approach offers the highest performance. The results are also visualized in Figure 1.
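As an illustration of how the three metrics described above relate to each other, the following is a minimal sketch in plain Python. It is not the evaluation code used in the study; the function names and the toy labels and scores are assumptions introduced only for this example, and the AUC-ROC is computed through its pairwise-ranking interpretation (the probability that a randomly chosen hateful message receives a higher score than a randomly chosen non-hateful one).

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Share of correct predictions over all instances (micro-averaged)."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def f1_for_class(y_true: Sequence[int], y_pred: Sequence[int], label: int) -> float:
    """Harmonic mean of precision and recall for a single class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Arithmetic mean of the per-class F1-scores (every class counts equally)."""
    labels = sorted(set(y_true))
    return sum(f1_for_class(y_true, y_pred, c) for c in labels) / len(labels)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC-ROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. This pairwise form is quadratic in the
    sample size, which is acceptable for a small illustration.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 1 = hate, 0 = no hate; scores are model probabilities.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.2, 0.6, 0.4, 0.3, 0.7, 0.8, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print("accuracy:", accuracy(y_true, y_pred))
print("macro F1:", macro_f1(y_true, y_pred))
print("AUC-ROC :", roc_auc(y_true, scores))
```

Accuracy and macro F1 depend on the 0.5 decision threshold chosen in the toy example, whereas the AUC-ROC is computed from the raw scores and therefore does not depend on any single threshold.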

Table 3. Evaluation metrics of the models generated with each of the algorithms.

Classification Algorithm | Accuracy | F1-Score | AUC
NB | 0.67 | 0.73 | 0.65
MNB | 0.75 | 0.83 | 0.64
BNB | 0.68 | 0.81 | 0.50
LR | 0.78 | 0.84 | 0.71
SGD | 0.75 | 0.82 | 0.69
SVC | 0.76 | 0.83 | 0.71
MVE (Majority Voting Ensemble) | 0.76 | 0.82 | 0.68
RNN | 0.86 | 0.78 | 0.92

Figure 1. Performance ratings for all tested algorithms.

Next, the external validation phase was carried out. At this stage, the aim was to assess how capable the deep learning classifier generated ad hoc is on new data, i.e., new tweets from a different temporal context. For this, only the tweets with agreement resulting from the manual classification of the new sample were used (n = 8588). After running the model on that sample, Krippendorff's Alpha was first used to check the intercoder reliability between the manually coded tweets with agreement and the predictions offered by the prototype. The result of this reliability test was α = 0.6, an acceptable figure, but not too high. Subsequently, the evaluation metrics of the deep model were extracted when it was run and tested on the new sample, with more promising results: Accuracy = 0.85, F1-Score = 0.74, and AUC-ROC = 0.88. Taking these metrics into account and responding to RQ3, we can confirm that the classification prototype shows an acceptable performance when tested with new, unseen, real data.
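To make the reliability check above more concrete, the sketch below computes Krippendorff's Alpha for nominal data in the simplest setting: two "coders" (here, the human annotation and the model prediction), one label per unit, and no missing values. This is an illustrative simplification and not the tool used in the study; the function name and the toy label lists are assumptions.

```python
from collections import Counter
from typing import Hashable, Sequence


def krippendorff_alpha_nominal(
    coder_a: Sequence[Hashable], coder_b: Sequence[Hashable]
) -> float:
    """Krippendorff's alpha for two coders, nominal labels, no missing data.

    alpha = 1 - D_o / D_e, where D_o is the observed disagreement and D_e the
    disagreement expected by chance, both derived from the coincidence matrix.
    """
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same non-empty set of units")

    n_values = 2 * len(coder_a)  # total number of pairable values

    # Each disagreeing unit adds two off-diagonal coincidences: (a, b) and (b, a).
    observed = 2 * sum(1 for a, b in zip(coder_a, coder_b) if a != b)

    # Marginal totals per category over the values of both coders.
    totals = Counter(coder_a) + Counter(coder_b)
    expected = sum(totals[c] * totals[k] for c in totals for k in totals if c != k)
    if expected == 0:
        raise ValueError("alpha is undefined when only one category is used")

    return 1.0 - (n_values - 1) * observed / expected


# Toy example (hypothetical labels): 1 = hate, 0 = no hate.
human = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]
model = [1, 1, 0, 0, 0, 0, 1, 0, 1, 0]
print(round(krippendorff_alpha_nominal(human, model), 3))
```

In this toy example the two label lists agree on 8 of 10 units, yet alpha is noticeably lower than 0.8 because the coefficient corrects for the agreement expected by chance given the class distribution.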

Finally, the deep learning architecture was evaluated on new datasets and languages. As indicated, at this stage, not only was the ad hoc RNN model improved and tested, but so was a new deep learning model based on BERT (bidirectional encoder representations from transformers), a transformer-based machine learning technique that is considered the most advanced language model for natural language processing, and especially for contextualized word embeddings [48,50].

Table 4 depicts the accuracy and F1-score metrics for the two deep models in the Spanish, Greek, and Italian languages. The results indicate that these models can perform well in other languages, not only in Spanish, and on multisource content. Both models achieve consistently high accuracy ratings in the Spanish language (0.86, 0.85, 0.87, and 0.90 in the four tests, respectively) and show similar performances for the Greek and Italian languages as well. It should also be noted that the gap between the micro- and macro-averaged metrics (accuracy and F1-score, respectively) became smaller, indicating that the instance weighting technique in the training process led to a better-balanced classifier without a bias towards the class with more instances. Therefore, responding to RQ4, we can confirm that the proposed deep learning architectures can be applied to detect racist and xenophobic hate speech in other languages and on other platforms as well, retaining their performance.
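The effect of instance weighting on an imbalanced dataset can be sketched as follows. The paper does not state the exact weighting formula, so this example assumes the common inverse-frequency ("balanced") heuristic, in which each class receives the weight N / (K · n_c) for N instances, K classes, and n_c instances of class c. The function names and the toy labels are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Sequence


def balanced_class_weights(labels: Sequence[int]) -> Dict[int, float]:
    """Inverse-frequency class weights: N / (K * n_c) (assumed heuristic)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * m) for c, m in counts.items()}


def weighted_log_loss(
    labels: Sequence[int], probs: Sequence[float], class_weights: Dict[int, float]
) -> float:
    """Binary cross-entropy in which each instance is scaled by its class weight.

    `probs` holds the predicted probability of the positive class (label 1).
    """
    eps = 1e-12
    total, weight_sum = 0.0, 0.0
    for y, p in zip(labels, probs):
        w = class_weights[y]
        p = min(max(p, eps), 1 - eps)  # avoid log(0)
        total += -w * (math.log(p) if y == 1 else math.log(1 - p))
        weight_sum += w
    return total / weight_sum


# Toy imbalanced sample (hypothetical): 8 non-hateful (0) and 2 hateful (1) records.
labels = [0] * 8 + [1] * 2
weights = balanced_class_weights(labels)
instance_weights = [weights[y] for y in labels]

print(weights)                # the minority class gets the larger weight
print(sum(instance_weights))  # both classes contribute equally in total
```

With these weights, each class contributes the same total weight to the loss, so errors on the underrepresented hateful class are no longer outweighed by the much larger non-hateful class. Deep learning frameworks typically accept such per-class or per-instance weights directly in their training routines.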

Table 4. Evaluation metrics of the deep learning models trained on the PHARM datasets.

Model | Language | Accuracy | F1-Score
Ad hoc RNN model | ES | 0.87 | 0.87
Ad hoc RNN model | EL | 0.79 | 0.78
Ad hoc RNN model | IT | 0.91 | 0.89
BERT-based model | ES | 0.90 | 0.86
BERT-based model | EL | 0.81 | 0.76
BERT-based model | IT | 0.91 | 0.88

6. Discussion

This work has generated the first prototypes capable of detecting anti-immigration online hate speech automatically and on a large scale. These classifiers were first tested and validated for messages spread on Twitter in Spanish, and later improved and adapted for their application to more online sources (YouTube, Facebook, and media websites) and more languages (Greek and Italian). For this, we generated ad hoc datasets through manual classification tasks, and we used, firstly, traditional classification algorithms for the generation of the primary models, and secondly, two different deep learning strategies, an innovation with respect to the detectors developed by other authors. Specifically, the development of the different classifiers was based on natural language processing and supervised machine learning techniques. Regarding the machine learning algorithms used, different shallow and deep architectures have been put to the test, from the most traditional ones based on shallow modelling to deep models based on transformers, comparing their performances.

We have confirmed that deep modelling performs considerably better than shallow modelling for detecting hate speech directed towards migrants and refugees in tweets in Spanish, since the models trained with neural networks were the ones that broadly presented the best evaluation metrics, something that had already been evidenced in numerous past studies [23,29,30]. That is why they were the ones used for the model's improvement and adaptation to detect anti-immigration online hate speech in new languages and sources, both offering acceptable performances in all new cases. At this point, it should also be noted that the transformer-based model seems to offer a slightly better performance in general terms than the ad hoc developed RNN model, something that was also expected since it is what the most recent studies showed [48,50]. Our findings are consistent with those of recent studies, such as [51], where machine learning models (support vector machines and logistic regression), deep learning models (LSTM, CNN, and Bi-LSTM), and transformer-based language models (multilingual BERT, XLM, and monolingual Spanish BETO) are compared for hate speech detection in Spanish. The BERT-based models outperformed the ML and DL models, with BETO achieving the highest F1-score of 77.62%.

7. Conclusions

By way of conclusion, this research work shows that it is feasible to generate automatic detectors of online racist/xenophobic hate speech using machine learning with solid performances. In addition, specific validated datasets have been generated ad hoc for the training of the predictive models, something that other authors have not considered so far and that allows the models to overcome the possible weaknesses derived from the internal validity of the detectors previously developed. In this sense, we can point out that, although the number of hate and no-hate messages added to the datasets after cross-coding and data cleaning may seem low, the most important thing in this process is to have quality examples over quantity. This is because, although the evaluation metrics could be acceptable, if the examples are not completely reliable, the internal validity of the prototype could be contaminated with false positives or negatives. For this reason, we also wanted to go further and conduct an external validation to check the performance of our primary prototype more effectively when applied to new real examples. This validation with new data has served to verify that our deep model is reliable and has an acceptable performance when used in practice, with new real cases, and compared with a new validated manual classification. Nevertheless, from the beginning, the focus was on generating a quality and reliable dataset that can be used as a training corpus, since the quantity can always be improved by including new validated examples in the corpus. In sum, we have determined that logistic regression and support vector machines are the shallow algorithms that offered the best performance for this task out of the six tested. However, as indicated, we have confirmed that deep learning performs better than shallow learning for detecting racist and xenophobic online hate speech. The two tested architectures, the ad hoc developed RNN and the one based on transformers, presented considerably better performances than the traditional models, both in the Spanish language and on Twitter, as well as in other languages and sources. Nevertheless, it should be highlighted that the BERT-based model seems to offer a slightly better performance than the ad hoc RNN model, as expected.

In summary, it can be concluded that the results of this study are relevant and significant, and that they represent a novelty in the scientific literature on the study of new computational methods applied to the social sciences and, specifically, to the detection and monitoring of online hate speech. This is so because, until now, anti-immigration online hate detectors had not been tested in Spanish, Greek, and Italian and in multisource using ad hoc developed datasets and deep architectures, including an ad hoc RNN and a BERT-based model. It must therefore be highlighted that this study offers a methodological contribution through the large-scale detection strategy; through the generation of ad hoc datasets of validated real examples of racist and xenophobic hate speech in Spanish, Greek, and Italian, retrieved firstly from Twitter, but also, secondarily, from YouTube, Facebook, and potentially hate-spreading websites; and, lastly, through the models developed with both deep learning techniques. In addition, this research provides a theoretical advance in the study of online hate speech directed towards migrants and refugees. Finally, a practical and social contribution is also presented, since the technology developed here can be applied in diverse public and private spheres. Private companies, research groups, nonprofit organizations, and government agencies can benefit from it, among other things, to map the possible social acceptance or rejection of migrants in different European regions, and thus to aid in executing strategies to improve long-term integration in migration processes. In fact, the prototypes that have been evaluated are already being used to develop new projects, such as the one recently published by Arcila-Calderón et al. [52], which uses the infrastructure and the computational strategy validated in the PHARM Project. This strategy allows for the retrieval of geolocated text messages from online sources using different search queries and criteria, and for their direct and massive processing and classification through the Supercomputing Centre of Castilla y León, Scayle, where the models are executed to later generate datasets of reliably classified messages that can be consulted and analysed.

8. Limitations and Future Lines of Research

A major limitation of the approach comes from the fact that comments and tweets are treated as standalone texts without taking into consideration context information. Context information may rely on network analysis to determine the topic on which a comment or reply is posted and to take into consideration previous comments and replies that may be crucial to complete the meaning of the text that is analysed. However, the metadata that are collected along with the textual information may be useful for a more comprehensive analysis in the future. Furthermore, the detector should be usable by people who do not have computer skills or are not capable of running or manipulating a script without complications. For this purpose, a GUI integration has been presented in the work of Vrysis et al. [49]. This led to the creation of the PHARM web interface, which incorporates the models presented in the current paper, providing a visual and friendly interface that allows nonexperts to use the detectors, universalizing its use and applications. This can also broaden the range of practical possibilities as well as social benefits, since the detector could be implemented by more social actors. There is strong evidence that the robustness and generalization capabilities of deep learning models depend highly on the quality and quantity of available data. The widespread use of the PHARM interface is expected to lead to an extension of the dataset, since it may allow the retraining of models in the future following the presented methodology. Finally, the future possibility of implementing an early-warning system is raised. This could warn about increases in racist/xenophobic hate speech in a geo-localized way and thus help to predict and prevent possible increases in racist/xenophobic hate crimes in certain regions. This warning system could also take advantage of the interface mentioned above to be visual and, in the same way, usable by all types of users.

Author Contributions: Conceptualization, C.A.-C. and J.J.A.; methodology, C.A.-C., J.J.A., L.V. and N.V.; validation, J.J.A., L.V. and N.V.; formal analysis, J.J.A., L.V. and N.V.; investigation, J.J.A., P.S.-H., M.O.A., L.V. and N.V.; resources, C.A.-C.; writing (original draft preparation), C.A.-C. and J.J.A.; project administration, P.S.-H.; funding acquisition, C.A.-C. All authors have read and agreed to the published version of the manuscript.

Funding: This research was funded by the European Union through the Rights, Equality, and Citizenship programme REC-RRAC-RACI-AG-2019 [GA n. 875217].


Institutional Review Board Statement: Not applicable. In this study, the authors have only worked with public data extracted from the Twitter API v2. Therefore, the authors consider that there is no ethical conflict that needs approval.

Informed Consent Statement: Not applicable. No person external to the project participated in this study, so no consent was necessary.

Data Availability Statement: All the material used in this work will be available on the project website: https://pharmproject.usal.es/ (accessed on 8 September 2022).

Acknowledgments: Andreas Veglis and Sergio Splendore.

Conflicts of Interest: The authors declare no conflict of interest in this work.

References

1. Anti-Defamation League. Online Hate and Harassment. The American Experience 2020. The ADL Center for Technology and Society. 2020. Available online: https://www.adl.org/media/14643/download (accessed on 21 February 2022).

2. Anti-Defamation League. Online Hate and Harassment. The American Experience 2021. The ADL Center for Technology and Society. 2021. Available online: https://www.adl.org/media/16033/download (accessed on 21 February 2022).

3. Organization for Security and Cooperation in Europe: OSCE–ODIHR. Hate Crime Reporting. 2020. Available online: https://hatecrime.osce.org (accessed on 14 April 2022).

4. Müller, K.; Schwarz, C. Fanning the flames of hate: Social media and hate crime. J. Eur. Econ. Assoc. 2021, 19, 2131–2167. [CrossRef]

5. Amores, J.J.; Arcila-Calderón, C.; Blanco-Herrero, D. Evolution of negative visual frames of immigrants and refugees in the main media of Southern Europe. Prof. De La Inf. 2020, 29, 6. [CrossRef]

6. Amores, J.J.; Arcila-Calderón, C.A.; Stanek, M. Visual frames of migrants and refugees in the main Western European media. Econ. Sociol. 2019, 12, 147–161. [CrossRef]

7. Pasta, S. Social network conversations with young authors of online hate speech against migrants. In Cyberhate in the Context of Migrations; Palgrave Macmillan: Cham, Switzerland, 2022; pp. 187–214.

8. Movimiento contra la Intolerancia. Informe Raxen: Racismo, Xenofobia, Antisemitismo, Islamofobia, Neofascismo y otras Manifestaciones de Intolerancia a Través de los Hechos. Especial 2016. Discurso de Odio y Tsunami de Xenofobia e Intolerancia; Madrid. 2017. Available online: https://www.informeraxen.es/informe-raxen-especial-2016-2/ (accessed on 28 April 2022).

9. Movimiento contra la Intolerancia. Informe Raxen: Racismo, Xenofobia, Antisemitismo, Islamofobia, Neofascismo y otras Manifestaciones de Intolerancia a través de los Hechos. Especial 2019. Por un Pacto de Estado contra la Xenofobia y la Intolerancia; Madrid. 2020. Available online: https://www.informeraxen.es/informe-raxen-especial-2019-por-un-pacto-deestado-contrala-xenofobia-y-la-intolerancia/ (accessed on 14 April 2022).

10. Valdez-Apolo, M.B.; Arcila-Calderón, C.; Amores, J.J. El discurso del odio hacia migrantes y refugiados a través del tono y los marcos de los mensajes en Twitter. RAEIC Rev. De La Asoc. Española De Investig. De La Comun. 2019, 6, 361–384. [CrossRef]

11. Arcila-Calderón, C.; Blanco-Herrero, D.; Valdez-Apolo, M.B. Rejection and hate speech in Twitter: Content analysis of tweets about migrants and refugees in Spanish. Rev. Española De Investig. Sociológicas (REIS) 2020, 172, 21–40. [CrossRef]

12. Carmona, O.I. Internet 2.0: El territorio digital de los prosumidores. Rev. Estud. Cult. 2010, 5, 43–64.

13. Council of Europe. Recommendation No. R20 of the Committee of Ministers to Member States on “Hate Speech”; Council of Europe: Strasbourg, France, 1997.

14. European Commission against Racism and Intolerance. ECRI General Policy Recommendation No. 15 on Combating Hate Speech; Council of Europe: Strasbourg, France, 2016.

15. Ministry of the Interior of Spain. Informe de Evolución de los Delitos de Odio en España. Madrid. 2020. Available online: http://www.interior.gob.es/documents/642012/3479677/Informe+sobre+la+evolución+de+delitos+de+odio+en+España%2C%20año+2019/344089ef-15e6-4a7b-8925-f2b64c117a0a (accessed on 6 April 2022).

16. Miró-Llinares, F. Taxonomía de la comunicación violenta y el discurso del odio en Internet. IDP. Rev. De Internet Derecho Y Política 2016, 22, 82–107. [CrossRef]

17. Chetty, N.; Alathur, S. Hate speech review in the context of online social networks. Aggress. Violent Behav. 2018, 40, 108–118. [CrossRef]

18. ElSherief, M.; Kulkarni, V.; Nguyen, D.; Wang, W.Y.; Belding, E. Hate lingo: A target-based linguistic analysis of hate speech in social media. arXiv 2018, arXiv:1804.04257.

19. Mondal, M.; Silva, L.A.; Benevenuto, F. A measurement study of hate speech in social media. In Proceedings of the 28th ACM Conference on Hypertext and Social Media, Prague, Czech Republic, 4–7 July 2017; pp. 85–94. [CrossRef]

20. Malmasi, S.; Zampieri, M. Detecting hate speech in social media. arXiv 2017, arXiv:1712.06427.

21. Salminen, J.; Hopf, M.; Chowdhury, S.A.; Jung, S.G.; Almerekhi, H.; Jansen, B.J. Developing an online hate classifier for multiple social media platforms. Hum. Cent. Comput. Inf. Sci. 2020, 10, 1–34. [CrossRef]

22. Davidson, T.; Warmsley, D.; Macy, M.; Weber, I. Automated hate speech detection and the problem of offensive language. arXiv 2017, arXiv:1703.04009.


23. Badjatiya, P.; Gupta, S.; Gupta, M.; Varma, V. Deep learning for hate speech detection in tweets. In Proceedings of the 26th International Conference on World Wide Web Companion, Perth, Australia, 3–7 April 2017; pp. 759–760.

24. Pereira-Kohatsu, J.C.; Quijano-Sánchez, L.; Liberatore, F.; Camacho-Collados, M. Detecting and monitoring hate speech in Twitter. Sensors 2019, 19, 4654. [CrossRef] [PubMed]

25. Mollas, I.; Chrysopoulou, Z.; Karlos, S.; Tsoumakas, G. Ethos: An online hate speech detection dataset. arXiv 2020, arXiv:2006.08328.

26. Mollas, I.; Chrysopoulou, Z.; Karlos, S.; Tsoumakas, G. ETHOS: A multi-label hate speech detection dataset. Complex Intell. Syst. 2022, 1–16. [CrossRef]

27. Sanguinetti, M.; Poletto, F.; Bosco, C.; Patti, V.; Stranisci, M. An Italian Twitter corpus of hate speech against immigrants. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation, Miyazaki, Japan, 7–12 May 2018.

28. Pitsilis, G.K.; Ramampiaro, H.; Langseth, H. Effective hate-speech detection in Twitter data using recurrent neural networks. Appl. Intell. 2018, 48, 4730–4742. [CrossRef]

29. Yenala, H.; Jhanwar, A.; Chinnakotla, M.K.; Goyal, J. Deep learning for detecting inappropriate content in text. Int. J. Data Anal. 2018, 6, 273–286. [CrossRef]

30. Duwairi, R.; Hayajneh, A.; Quwaider, M. A deep learning framework for automatic detection of hate speech embedded in Arabic tweets. Arab. J. Sci. Eng. 2021, 46, 4001–4014. [CrossRef]

31. Al-Hassan, A.; Al-Dossari, H. Detection of hate speech in Arabic tweets using deep learning. Multimed. Syst. 2021, 21, 1–12. [CrossRef]

32. Al-Makhadmeh, Z.; Tolba, A. Automatic hate speech detection using killer natural language processing optimizing ensemble deep learning approach. Computing 2020, 102, 501–522. [CrossRef]

33. Mishra, S.; Prasad, S.; Mishra, S. Exploring multi-task multi-lingual learning of transformer models for hate speech and offensive speech identification in social media. SN Comput. Sci. 2021, 2, 1–19. [CrossRef]

34. Mohdeb, D.; Laifa, M.; Zerargui, F.; Benzaoui, O. Evaluating transfer learning approach for detecting Arabic anti-refugee/migrant speech on social media. Aslib J. Inf. Manag. 2022, 74, 1070–1088. [CrossRef]

35. Aldjanabi, W.; Dahou, A.; Al-qaness, M.A.; Elaziz, M.A.; Helmi, A.M.; Damaševičius, R. Arabic offensive and hate speech detection using a cross-corpora multi-task learning model. Informatics 2021, 8, 69. [CrossRef]

36. Chiril, P.; Pamungkas, E.W.; Benamara, F.; Moriceau, V.; Patti, V. Emotionally informed hate speech detection: A multi-target perspective. Cogn. Comput. 2021, 14, 322–352. [CrossRef] [PubMed]

37. Bashar, M.A.; Nayak, R.; Luong, K.; Balasubramaniam, T. Progressive domain adaptation for detecting hate speech on social media with small training set and its application to COVID-19 concerned posts. Soc. Netw. Anal. Min. 2021, 11, 1–18. [CrossRef] [PubMed]

38. Kovács, G.; Alonso, P.; Saini, R. Challenges of hate speech detection in social media. SN Comput. Sci. 2021, 2, 1–15. [CrossRef]

39. Naseem, U.; Razzak, I.; Eklund, P.W. A survey of pre-processing techniques to improve short-text quality: A case study on hate speech detection on Twitter. Multimed. Tools Appl. 2020, 80, 35239–35266. [CrossRef]

40. Amores, J.J.; Blanco-Herrero, D.; Sánchez-Holgado, P.; Frías-Vázquez, M. Detectando el odio ideológico en Twitter. Desarrollo y evaluación de un detector de discurso de odio por ideología política en tuits en español. Cuadernos.Info. 2021, 49, 98–124. [CrossRef]

41. Arcila-Calderón, C.; Amores, J.J.; Sánchez-Holgado, P.; Blanco-Herrero, D. Using shallow and deep learning to automatically detect hate motivated by gender reasons and sexual orientation on Twitter in Spanish. Multimodal Technol. Interact. (MTI) 2021, 5, 63. [CrossRef]

42. Díez-Nicolás, J. Construcción de un índice de Xenofobia-Racismo. Rev. Del Minist. De Trab. E Inmigr. 2009, 80, 21–38. Available online: http://www.mitramiss.gob.es/es/publica/pub_electronicas/destacadas/revista/numeros/80/est01.pdf (accessed on 12 February 2022).

43. Cortina, A. Aporofobia, el Rechazo al Pobre: Un Desafío Para la Democracia; Paidós: Barcelona, Spain, 2017. [CrossRef]

44. Kalampokis, E.; Tambouris, E.; Tarabanis, K. Understanding the predictive power of social media. Internet Res. 2013, 23, 544–559. [CrossRef]

45. Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Duchesnay, E. Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830.

46. Géron, A. Hands-on Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems; O’Reilly Media, Inc.: Sebastopol, CA, USA, 2019.

47. Cho, K.; Van Merriënboer, B.; Bahdanau, D.; Bengio, Y. On the properties of neural machine translation: Encoder-decoder approaches. arXiv 2014, arXiv:1409.1259.

48. Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv 2018, arXiv:1810.04805.

49. Vrysis, L.; Vryzas, N.; Kotsakis, R.; Saridou, T.; Matsiola, M.; Veglis, A.; Arcila-Calderón, C.; Dimoulas, C. A Web Interface for Analyzing Hate Speech. Future Internet 2021, 13, 80. [CrossRef]

50. Koroteev, M.V. BERT: A review of applications in natural language processing and understanding. arXiv 2021, arXiv:2103.11943.


51. Plaza-del-Arco, F.M.; Molina-González, M.D.; Urena-López, L.A.; Martín-Valdivia, M.T. Comparing pre-trained language models for Spanish hate speech detection. Expert Syst. Appl. 2021, 166, 114120. [CrossRef]

52. Arcila-Calderón, C.; Sánchez-Holgado, P.; Quintana-Moreno, C.; Amores, J.J.; Blanco-Herrero, D. Hate speech and social acceptance of migrants in Europe. Analysis of tweets with geolocation. Comunicar 2022, 71, 21–35. [CrossRef]
