A CNN Classifier for HTML Code Generation from GUIs



International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056

Volume: 11 Issue: 12 | Dec 2024 www.irjet.net p-ISSN: 2395-0072


University of Colombo School of Computing, Colombo 0700, Sri Lanka

Abstract - Internet usage has increased rapidly over the past decades. The World Wide Web (WWW) has emerged as the most important public communication portal for individuals, businesses and organizations. In recent years, web development has relied on website templates and Content Management Systems, which anyone can use to create a website by plugging in their own text and images. Similarly, web designers create a design mock-up for a web page and give it to a developer to implement in code. This process is challenging, time-consuming and costly, since the design and implementation are carried out by separate teams. The overall outcome of such a design depends entirely on the web designer's design skills, and the source code differs when two different developers implement the same UI. What if there was a mechanism or system that could detect HTML elements in GUI images and generate source code automatically? If such a system or tool can generate source code automatically, web developers can focus on functionality rather than spending time on front-end development. Consequently, this paper presents an approach that automatically converts the GUI design of a website into HTML code using image processing and deep learning. This study employed an experimental approach to realize two scenarios that operate as image processing and deep learning modules. The tag tree produced by the approach was compared to the original HTML tag tree, and the HTML code produced for a specific website was compared to the website's original HTML source code.

Key Words: deep learning, image processing, HTML code generation, web designing, automatic website generation

1. INTRODUCTION

Front-end UI development based on GUI design is the primary responsibility of developers in website development. Generally, this process takes longer than achieving system functionality and logic. Therefore, it prevents developers from focusing on implementing key functionality. Also, the methods of implementing such a UI are determined by the developer's experience and skills. As a result, even when two different developers implement the same UI, there are differences in the source code. What if there was a mechanism or system that could detect HTML elements in GUI images and generate source code automatically? If such a system or tool can generate source code automatically, web developers can focus on functionality rather than spending time on front-end development. Furthermore, the system can follow standards and rules. Therefore, the system will produce standards-compliant source code, which can provide additional benefits to the website that uses it.

User interface (UI) design takes into account the needs of end users, ensuring that the system is packed with elements that make it easy for users to access and understand its features. In addition, today's primary focus is on GUI design using various images, effects and animations. GUI design and implementation, however, are challenging and time-consuming [1]. Front-end development for websites or web apps is more complex than GUI implementation. It entails working with a variety of technologies and languages, including HTML, JavaScript, PHP, ASP.NET, MySQL, and AJAX. There may be instances where developers become stuck for a few hours or days [2]. When it comes to front-end layout implementations, web elements are classified into two categories: fixed GUI elements (buttons, text inputs, paragraphs, etc.) and dynamic elements (dropdowns, drawer menus, etc.). In this study, we concentrated solely on converting fixed GUI elements into source code, as the purpose of this research is to develop a solution for converting a GUI into source code using image processing and machine learning.

End users of a system interact with it through the user interface, which is the primary gateway through which users communicate with the system. It is vital, and front-end development therefore receives considerable attention within the Software Development Life Cycle (SDLC). Front-end developers must work hard to build GUI elements that accomplish the unique design provided by graphic designers. Developers spend a lot of time choosing the appropriate HTML tags, planning the proper sequence and structure, and then coding. Furthermore, developers may have to repeat the same sort of code lines numerous times to construct a single GUI page, which is a common issue in markup languages. Therefore, the work is time-consuming and depends on the developer's experience and skills.

The primary aim of the research is to design, develop, and evaluate an approach for converting GUI design graphics into HTML code. The research focused on techniques from image processing and deep learning to design and implement the approach. The study makes the following contributions:


• Constructing a mechanism to extract HTML elements from a GUI design.

• Constructing a mechanism to classify the HTML tags.

• Constructing a CNN to generate source code from the HTML tag hierarchy tree.

Figure 1: Conceptual Design of The Proposed System

Figure 1 illustrates the overview of the proposed approach for HTML code generation from GUIs, which is referred to throughout the description. A detailed description of the approach is given in Section 3. The remainder of this paper is organized as follows: Section 2 conducts a literature review to comprehend the concepts related to the problem domain and scope. Furthermore, similar studies involving the conversion of a GUI into source code are discussed. Section 3 describes the proposed research methodology and the theoretical basis of the design and concept evaluation to arrive at a proper answer. Section 4 thoroughly covers the design concept and the image processing and deep learning techniques used in the solution. Section 5 expands on the discussion of results, and finally, Section 6 concludes the research by looking at the limitations and future directions of the proposed HTML code generation approach.

2. LITERATURE REVIEW

The scope of the study is divided into four sub-scopes, namely graphical user interfaces, code generation, computer vision and deep learning. Accordingly, the literature review section analyses the effectiveness of converting GUI to source code and critically evaluates comparative work in the problem domain. Furthermore, it critically examines the various strategies, technologies, algorithms, methodologies, and tools available that can be used to effectively and efficiently produce source code from GUIs.

2.1 Graphical User Interface (GUI)

The user interface connects end users to the system or program and allows them to interact with the system or software. Myers [1] shows that even small details of the system need to be demonstrated, as they can have a significant impact on the overall system for end users. Furthermore, he has outlined many approaches to determine how well UI design systems impress end users. UI design plays an important role in representing the software and all its purposes. If the system UI is not designed with a sufficient amount of information, the final software system will not be able to interact effectively with the users.

A GUI is an expanded version of the UI described in the previous paragraph. End users can interact with a system or software using visual prompts instead of a text-only UI [3]. The GUI introduced a technique for graphically interacting with software systems. It replaced the majority of Command Line Interfaces (CLIs) [4]. GUIs combine software and hardware to create an interactive platform for end users to collect, produce, and display data and information. As a result, GUI development became a key part of the software development process.

When a system is delivered through a web browser, it is usually referred to as a website. The GUI of the web interface allows users to interact with the respective website. The website's GUI is mainly used to present data and to accept input to the web system.


2.1.1 User Interface Elements

User interface elements are the parts of a GUI. These components represent data in the system in a structured way and help users interact with the software system without in-depth experience or knowledge of technology and computer skills. Most of the visual components that GUIs use to capture data, information, and user input are static, but some are dynamic, meaning they change periodically depending on conditions and interact directly with end users, giving them a sense of real-time communication with computer systems [4]. Generally, GUI elements include the following:

• Informational Components: tooltips, icons, progress bar, message boxes, modal windows, notifications

• Input Controls: text fields, checkboxes, radio buttons, list boxes, buttons, toggles, date field

• Navigational Components: breadcrumb, slider, search field, pagination, tags, icons

• Containers: accordion

Most of these web GUI elements can be classified under two main categories: static UI elements and dynamic UI elements. Faure et al. [5] have briefly described the difference between static element arrangement and dynamic element arrangement in a system UI. Static element arrangement refers to elements that are fixed components of a UI; these elements are mainly used to represent system data to users and to receive user actions as inputs. Note that elements that are dynamically generated with animation, timing, or user interaction can be included in the static element category. Dynamic elements are those carrying real-time information that changes over time, either periodically or as a result of user interaction.

2.2 Markup Language

A structured method of text manipulation in a computer is called a markup language. These methods make text easily recognizable and human readable. A markup language is not a programming language, but a method of creating a structure for text or a document for presentation on an electronic device. It can be used by an operating system, application, or program to present data as text, images, and other visual components. Markup languages [6] are classified into three categories as follows.

• Presentation Markup: Used by traditional WYSIWYG word processing systems. It is hidden from human users. It includes horizontal and vertical spacing such as page breaks, double line spacing, and indentation. It is generated by a mechanical or electronic typewriter.

• Procedural Markup: It integrates with text and provides text processing instructions to programs. Such text is handled visually by the author. Procedural markup systems include programming constructs where macros or subroutines are defined and called by name.

• Descriptive Markup: This category is concerned with the logical structure of a document rather than its appearance. Descriptive symbols are designed to be easy for people and computers to read and understand. User-defined styles in word processors can be used for descriptive markup.

2.2.1 HTML

HTML today is associated with the WWW, a network of information resources called the World Wide Web. Dave Raggett et al. [7] have described three main mechanisms involved in this worldwide network, listed below; HTML falls under the hypertext mechanism.

• A consistent naming convention for accessing web resources directly (URIs).

• Protocols (HTTP, HTTPS) to access resources over the Internet.

• A hypertext method to simply navigate through resources and represent data (HTML).

HTML is the markup language most used by developers to implement web GUIs. HTML explains and defines how text, images and multimedia are displayed in web browsers [8]. There are two types of tags in HTML:

• Paired Tags: A paired tag consists of two tags; the first is called the opening tag and the second is called the closing tag. These tags enclose the text to which the effect of the tag should apply.


• Unpaired Tags: An unpaired tag is a single tag that does not require a companion tag. These tags can be written either as <tag> or as <tag/>; which style to choose is the developer's choice.

Most page structuring tags and formatting tags are paired tags. Therefore, each of them has both opening and closing tags. When a GUI is implemented with HTML tags, it can be represented as a hierarchical tree, as shown in Figure 2.
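The tree view of paired and unpaired tags can be sketched with Python's standard `html.parser` module. This is only an illustration and not part of the proposed system; the `{"tag", "children"}` node layout, the `VOID_TAGS` set, and the function names are assumptions introduced here.

```python
from html.parser import HTMLParser

# Assumed subset of unpaired (void) tags; they never receive a closing tag.
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class TagTreeBuilder(HTMLParser):
    """Builds a nested {tag, children} tree from HTML markup."""

    def __init__(self):
        super().__init__()
        self.root = {"tag": "document", "children": []}
        self._stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = {"tag": tag, "children": []}
        self._stack[-1]["children"].append(node)
        if tag not in VOID_TAGS:
            # Paired tag: later elements are nested inside it until it closes.
            self._stack.append(node)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, if there is one.
        for i in range(len(self._stack) - 1, 0, -1):
            if self._stack[i]["tag"] == tag:
                del self._stack[i:]
                break


def build_tag_tree(html):
    builder = TagTreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def render(node, depth=0):
    """Return the tree as indented lines, one tag per line."""
    lines = ["  " * depth + node["tag"]]
    for child in node["children"]:
        lines.extend(render(child, depth + 1))
    return lines
```

Calling `render(build_tag_tree(markup))` lists each tag indented under its parent, which mirrors the kind of hierarchy shown in Figure 2.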

2.2.2 Scripts

Developers generally use client-side scripting, embedded with HTML, in web application development to reduce the data processing and rendering time of a website or web page. Such a script runs on the client-side computer. Therefore, the user is not blocked while waiting for a server-side process. Many animations, simple logic, and even local storage values are handled with these client-side scripting methods. Similarly, scripts are used to create GUI elements in web design. Furthermore, Dave Raggett et al. [7] have explained how scripts can be used with forms to process input while handling user input.

2.2.3 Style sheets

Dave Raggett et al. [7] have shown how style sheets take over the responsibility of rendering web views from HTML code, which simplifies the HTML used in web development. Style sheets allow users to control alignment, font, color, and layout settings. Developers can use them inside HTML code or as a separate file that is referenced externally. The method of associating the style sheet with the web page is independent of the styling language used.

2.2 Code Generation

Generally, in computer science, the SDLC is the structure imposed on software development by a development method [9]. Synonyms include software development and software process. Similarly, in the UI field, the UI Development Life Cycle (UIDLC) consists of the development path defined by a UI development method for developing a UI [10]. The implementation phase creates an actual website from the site design. As a first step, the elements and relationships highlighted in the design are mapped onto the constructs provided by the chosen implementation technique [11]. Turning the GUI design created by a graphic designer into source code falls under front-end UI development. GUI implementation is one of the main tasks of developers in the website development process.

2.2.1 Human coding and Front-End Development

Traditionally, GUI development and implementation are a human task, and it takes time to fully convert a GUI design into source code. Often, this process takes more time than realizing system functionality and system logic. Moreover, it prevents developers from focusing on implementing key functionality [12]. The time spent on the same implementation may vary depending on the skill level of the developer. In the absence of a disciplined approach to Web-based system development, Web-based applications do not deliver the desired performance and quality, and the development process becomes increasingly complex and challenging to manage and refine, as well as expensive and grossly behind schedule [13].

Figure 2: Hierarchical tree structure of HTML tags

2.2.2 User Interface Tools

Generally, developers use user interface implementation tools for GUI implementations, and they gain several benefits from these tools. The benefits fall into two main categories, as described below [14].

First, the quality of the resulting user interfaces can increase because designs can be rapidly prototyped and implemented even before application code is written. This rapid design iteration is an important part of achieving user interface quality. The code of the UI is automatically created from high-level specifications, so the reliability of the UI will increase. When UIs are created using UI generator tools, most of the end products look alike, even if the applications are different. Those tools are easy to use, so non-programmers can work on UI implementation. People such as graphic artists, usability specialists, and cognitive psychologists can also be involved in this process.

Second, because interface specifications can be represented, validated, and evaluated more efficiently, UI code can be easier and more cost-effective to create and maintain. Since UI tools play a key role in implementing the UI, writing code will be less work. Also, there can be better modularization due to the separation between the UI component and the application implementation. Generating web UIs uses different tools than building typical desktop or mobile software GUIs. Simple web pages are created with consistent text, graphics and links. These simple components can be created by writing HTML code directly. But when UI generation tools are used to design web systems, they can create pages that contain text, buttons, input fields, etc. Also, scripting languages should be used in HTML to implement dynamic pages.

2.2.3 Transforming Graphical Designs into Code

Many of the techniques used in converting graphical images into codes can be categorized into machine learning streams.Morerecently,therehavebeenseveralattemptstousemachinelearningtechniquessuchasdeeplearningneural networks.Amongtheseefforts,somefocusedonstructuredmarkuplanguages[16].

Ali Davody et al. [17] have introduced a Reinforcement Learning (RL) method that generates HTML source code when an image of the web page is provided. The agent in the RL framework is typically trained to maximize expected evaluation metrics instead of maximizing the conditional probability of tokens. The agent generates code that renders a web view that better matches the target image. The model they proposed is an agent that takes a website image as input, generates HTML source code, and then renders it back in a browser. They use an RL approach in which the agent model receives a token at each time step, producing a model that can generate DSL source and is trained to capture DSL tokens. They also explained a method to compare the rendered screen of the generated website with the target image. In their method, only the web browser is used to calculate the reward, making the whole process independent. They also conducted some experiments on a synthetic dataset built with DSL support. It generated simple HTML web pages for a few tokens [17].

Alexander L. et al. [18] have developed another technique based on deep learning to solve the front-end UI generation problem. They used predictions from neural networks to augment search methods, including an SMT-based solver and enumeration searches. From further observations, they have shown that their method improves on non-robust baselines and is an order of magnitude faster than the RNN method. They have proposed two main ideas for generating computer programs using mathematical predictions to augment common search methods. The first idea is how the system learns to produce programs that use a collection of program initialization difficulties to study policies that terminate on difficulty. The second explains how a neural network architecture can be implemented by prioritizing search-based methods over replacement methods. They have shown that machine learning can provide significant value for solving Inductive Program Synthesis (IPS). Learning from the association of input and output samples across different interpreters can enable source code generation. They also used a DSL to reduce the complexity of the programming language, which reduced the search space [19].

Moreover, Ling et al. [20] have recently studied inputs given as a mixture of natural language and structured program specifications and then explained the organization of the program. The most common factor was the use of DSLs designed to target a specific domain. DSLs differ from full-fledged computer languages and are more restrictive, which reduces the complexity of the language used. Generating source code from visual inputs is an unexplored area of research.


Another closely related work using reverse engineering techniques was implemented by Nguyen et al. [21], focusing on Android mobile application development. It was able to regenerate the source code of the UI by taking an Android app screenshot as input. Their method is completely dependent on engineering background and requires good domain knowledge.

pix2code [12] is the first attempt to solve the problem of creating web page UI code from visual input by using machine learning to find latent variables instead of implementing complex methods. It focuses on another CNN-based approach, which generates tokens for the final stage from a single GUI screenshot input. Furthermore, it compares the process of generating source code from an image or screenshot to describing an image in English and writing the description down. Both of these processes require the creation of variable-length sequences of tokens by analyzing the image pixel by pixel. It was tried on a small dataset by scraping scenes and training a few parameters [22, 23].

2.3 Computer Vision

All of the technologies and methodologies mentioned above rely on computer vision-related recognition and visual classification. This subsection discusses the image processing techniques that will be employed in the following steps. Other researchers have conducted several trials and attempts in image processing and edge detection, with the Laplacian of Gaussian (LoG) approach providing superior results [24]. Image edge detection with the LoG operator can mimic the visual properties of human eyes. In practice, noise has a significant impact on an image since the scale factor is incapable of self-adaptive modification [25].

2.4 Deep Learning

Deep learning principles must be considered together with image processing techniques. In terms of image identification and classification, the CNN ranks highly among neural networks. Currently, there are various machine learning approaches to computer vision problems, but the CNN is the method of choice and is used in a wide range of applications. This architecture helps the network to learn sharp and rich latent representations from the training images [26][27].

3. METHODOLOGY

To attain the end goals, the study used an experimental approach. The process of converting a GUI into source code is divided into two modules. The first module is the image processing module, which extracts HTML elements from the GUI image and determines the hierarchy of elements. The second module is the deep learning module, which recognizes the HTML tags derived from a GUI image.

3.1 Image processing module

The MATLAB image processing toolkit was used to construct the image processing module, which was designed to extract HTML elements from a GUI image. The extractor module was designed using image processing techniques including LoG, dilate, and erode. Experiments were conducted by adjusting the variable settings for each stage and comparing the processed output images to the input GUI images. Furthermore, experiments were conducted to increase the number of HTML elements extracted from the GUI image and to compare it to the number of HTML element tags in the original source code, which produced the input GUI for the experiment. Figure 3 summarizes the concept and experiment of the image processing module. The Notepad++ utility was used to compare the counts of the two code segments.

Figure 3: Overview of the Image Processing Module


3.2 Deep learning module

TensorFlow and Python technologies were used to create the deep learning module. The PyCharm IDE was used to implement the Python code for CNN development on a local Mac PC running Linux. Later, the module was hosted on an AWS cloud instance to train the CNN further and improve prediction accuracy. Figure 4 describes the high-level design of the deep learning module.

4. EXPERIMENTAL APPROACH

The proposed solution is organized into three main components. Figure 5 depicts the flow of image processing, markup tag structure, and HTML source code production at a high level. The proposed system's final product will comprise an HTML styling unit, which may include CSS or Sass. However, the scope of this study only included HTML tag production.

Figure 4: Deep Learning Module Architecture
Figure 5: High-Level Architecture of the Approach


4.1 GUI Image Processing and Elements Detection

The first module of the proposed solution is the image processing unit. When a GUI image is given as input, this unit processes the image in several steps and returns an image with the detected HTML elements. Figure 6 illustrates the unit input and the expected processing output. To extract HTML elements from the input GUI image, the edge detection algorithm of MATLAB [28] was used. Furthermore, the unit can be described as a sequence of image processing steps leading to the output of this image processing subunit. Each step is described below.

4.1.1 Laplacian of Gaussian filter (LoG)

The input GUI image was subjected to a LoG filter in the initial stage of the image processing subunit. It uses MATLAB's fspecial function with the 'log' argument to create a 2-D filter, which is then applied to the input image. This derivative filter is used to locate edges in images by identifying areas of fast change [29].
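To make the filtering step concrete, the following Python sketch builds a LoG kernel and applies it to a grayscale image stored as a list of rows. It is not the MATLAB code used in the study: the kernel size of 5 and sigma of 0.5 are assumptions (the paper does not report the values used), and `log_kernel` and `convolve` are hypothetical helper names.

```python
import math


def log_kernel(size=5, sigma=0.5):
    """Return a size x size Laplacian-of-Gaussian kernel whose entries sum to ~0.

    size and sigma are assumed values, not taken from the paper.
    """
    half = (size - 1) / 2.0
    coords = [i - half for i in range(size)]
    # Unnormalized Gaussian, indexed as gauss[row][col].
    gauss = [[math.exp(-(x * x + y * y) / (2.0 * sigma ** 2)) for x in coords]
             for y in coords]
    total = sum(sum(row) for row in gauss)
    kernel = [
        [(x * x + y * y - 2.0 * sigma ** 2) * gauss[r][c] / (sigma ** 4 * total)
         for c, x in enumerate(coords)]
        for r, y in enumerate(coords)
    ]
    # Shift so the kernel sums to zero: flat regions then give no response.
    mean = sum(sum(row) for row in kernel) / (size * size)
    return [[v - mean for v in row] for row in kernel]


def convolve(image, kernel):
    """Filter a 2-D list, repeating border pixels at the image edges."""
    h, w, k = len(image), len(image[0]), len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            acc = 0.0
            for dr in range(-k, k + 1):
                for dc in range(-k, k + 1):
                    rr = min(max(r + dr, 0), h - 1)
                    cc = min(max(c + dc, 0), w - 1)
                    acc += image[rr][cc] * kernel[dr + k][dc + k]
            out[r][c] = acc
    return out
```

Because the kernel sums to zero, uniform regions of a GUI screenshot produce no response, while the borders of buttons, text boxes, and images produce strong positive or negative values.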

4.1.2 Convert Image into Binary Image

The second stage of the image processing unit generates a binary image. The previous step's output is used as input for this operation. MATLAB's im2bw function [30] was used to convert the image data into a binary image. It first converts the input image to grayscale format and then converts it to a binary image. The image generated by this step contains only black and white pixels. The threshold is passed to im2bw as the level argument, which can be specified in the range [0, 1]; the value used is about 80.0/256.0.
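The thresholding step can be sketched in a few lines of Python. The level of 80.0/256.0 comes from the text above; the luminance weights and the `to_binary` name are assumptions made for this illustration, and the sketch takes RGB pixels directly rather than the LoG output.

```python
def to_binary(rgb_image, level=80.0 / 256.0):
    """Convert rows of (r, g, b) pixels in 0..255 to a 0/1 image.

    A pixel becomes 1 (white) when its normalized luminance exceeds `level`.
    """
    binary = []
    for row in rgb_image:
        out_row = []
        for r, g, b in row:
            # Assumed luminance weights for the grayscale conversion.
            gray = 0.299 * r + 0.587 * g + 0.114 * b
            out_row.append(1 if gray / 255.0 > level else 0)
        binary.append(out_row)
    return binary
```

With this level, a mid-dark gray such as (79, 79, 79) maps to black, while (90, 90, 90) already maps to white.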

4.1.3 Dilate & Erode Image

The next phase of the image processing unit dilates the binary image produced by the previous step using MATLAB's imdilate algorithm [31]. This method dilates grayscale, binary, or packed binary images, taking as its second input a structuring element object, or an array of structuring element objects, as returned by MATLAB's strel function. The image processing unit uses square as the structuring element object. Additionally, it makes use of the decomposition of the structuring element object. The resulting dilated image is further processed using another operation called erosion.

This phase erodes the image dilated in the previous phase. It uses the imerode algorithm from the image processing toolbox of MATLAB [32]. This function takes the same arguments as the imdilate method, and it also uses square as the structuring element object. As the next stage, the image processing unit dilates the eroded image from the previous procedure using the previously mentioned method. This stage results in a smoothly expanded image in which the HTML components can be recognized.
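The dilate, erode, dilate sequence described above can be illustrated with a small Python sketch on a 0/1 image. It assumes an odd-sized square structuring element of width 3 (the paper does not state the size used), and the function names are hypothetical.

```python
def _morph(binary, size, pick):
    """Apply `pick` (max or min) over a size x size square neighborhood."""
    h, w, k = len(binary), len(binary[0]), size // 2
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            # Neighbors outside the image are simply ignored.
            window = [
                binary[rr][cc]
                for rr in range(max(r - k, 0), min(r + k + 1, h))
                for cc in range(max(c - k, 0), min(c + k + 1, w))
            ]
            out[r][c] = pick(window)
    return out


def dilate(binary, size=3):
    """A pixel becomes 1 if any pixel under the square is 1."""
    return _morph(binary, size, max)


def erode(binary, size=3):
    """A pixel stays 1 only if every pixel under the square is 1."""
    return _morph(binary, size, min)


def smooth_elements(binary, size=3):
    """Dilate, erode, then dilate again, following the order given in the text."""
    return dilate(erode(dilate(binary, size), size), size)
```

Dilation followed by erosion closes small gaps, so fragments of one GUI element (for example, the separate letters of a label) tend to merge into a single blob before the final dilation expands it.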

4.1.4 Images of Element Highlighted

Finally, the connected components in the binary image produced by the preceding steps are discovered. They are then transferred to a new image as randomly colored squares. The bwconncomp technique from MATLAB's image processing toolbox [33] was utilized for this final stage. The final output of this image processing, made by drawing randomly colored boxes, was used in the following stages of the proposed approach to convert the images into source code.
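As an illustration of this stage, the sketch below labels connected groups of white pixels and returns one bounding box per group. It is a plain Python stand-in for the MATLAB call, assuming 8-connectivity; the `(left, top, right, bottom)` box layout and the function name are choices made here.

```python
from collections import deque


def connected_boxes(binary):
    """Return one (left, top, right, bottom) box per 8-connected group of 1-pixels."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for r in range(h):
        for c in range(w):
            if binary[r][c] != 1 or seen[r][c]:
                continue
            # Flood-fill this component and track its extent.
            seen[r][c] = True
            queue = deque([(r, c)])
            left, top, right, bottom = c, r, c, r
            while queue:
                y, x = queue.popleft()
                left, right = min(left, x), max(right, x)
                top, bottom = min(top, y), max(bottom, y)
                for ny in range(max(y - 1, 0), min(y + 2, h)):
                    for nx in range(max(x - 1, 0), min(x + 2, w)):
                        if binary[ny][nx] == 1 and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((left, top, right, bottom))
    return boxes
```

Each returned box corresponds to one candidate HTML element and could be drawn as a colored square on the output image.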

4.1.5 Locating Detected Squares on Image

Section 2 of Figure 7 depicts the unit that obtains the position of each element as the coordinate values of the detected squares. It generates a JSON file with the coordinate values of each element. Furthermore, the same JSON file is generated based on the hierarchy of the discovered element squares. A Python script was used to generate the structured JSON file, and MATLAB read the processed image containing randomly colored squares from the image processing unit. Figure 7 depicts an example of a simple hierarchy tree created using a simple UI.

Figure 6: Sample Input GUI and Expected Output

The tag hierarchy is created in parallel with the placement of squares. The same JSON file is generated using this structure of elements. This procedure of generating the hierarchical tree of items took advantage of Breadth First Search (BFS) traversal. Figure 7 depicts how BFS uses the hierarchy tree to determine the order of tag items.
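One way to picture the hierarchy construction is the following Python sketch: each detected box is attached to the smallest box that fully contains it, the result is serialized as JSON, and BFS lists the nodes level by level. The containment rule, the JSON field names (`tag`, `box`, `children`), and all function names are assumptions for illustration; the paper does not publish its file format.

```python
import json
from collections import deque


def contains(outer, inner):
    """True if box `inner` lies inside box `outer`; boxes are (left, top, right, bottom)."""
    return (outer != inner and outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3])


def area(box):
    return (box[2] - box[0] + 1) * (box[3] - box[1] + 1)


def build_hierarchy(boxes, page_box):
    """Attach every box to the smallest already-placed box that contains it."""
    root = {"tag": "unknown", "box": list(page_box), "children": []}
    nodes = [{"tag": "unknown", "box": list(b), "children": []}
             for b in sorted(boxes, key=area, reverse=True)]
    placed = [root]
    for node in nodes:
        parents = [p for p in placed
                   if contains(tuple(p["box"]), tuple(node["box"]))]
        parent = min(parents, key=lambda p: area(p["box"])) if parents else root
        parent["children"].append(node)
        placed.append(node)
    for node in placed:
        # Reading order: top to bottom, then left to right.
        node["children"].sort(key=lambda n: (n["box"][1], n["box"][0]))
    return root


def bfs_order(root):
    """Return the boxes in breadth-first (level by level) order."""
    order, queue = [], deque([root])
    while queue:
        node = queue.popleft()
        order.append(node["box"])
        queue.extend(node["children"])
    return order


def to_json(root):
    return json.dumps(root, indent=2)
```

Every node starts with the tag `unknown`, matching the unknown tag hierarchy of Figure 7; the tags are filled in later by the classifier.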

Furthermore, this section takes the overall GUI image as input and assigns coordinate values to each highlighted element. Figure 8 illustrates the HTML element detection flow in detail. In the first stage, it selects a specific element from the entire image based on the coordinate values and determines the name of the HTML tag corresponding to that element. The output JSON file is then updated with the actual HTML tag value rather than the unknown tag. Finally, the hierarchy tree of unknown tags is updated to an HTML tag hierarchy tree. Figure 9 summarizes the overall picture.
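The update from unknown tags to HTML tags, followed by code generation, might look like the sketch below. The `classify` callable is a stand-in for the CNN prediction on the cropped element, and the node layout, the `VOID_TAGS` set, and the function names are assumptions rather than the implementation used in the study.

```python
from collections import deque

# Assumed unpaired tags that are emitted without a closing tag.
VOID_TAGS = {"img", "input"}


def label_tree(root, classify):
    """Visit nodes breadth-first and replace each 'unknown' tag with classify(box)."""
    queue = deque([root])
    while queue:
        node = queue.popleft()
        if node["tag"] == "unknown":
            node["tag"] = classify(node["box"])
        queue.extend(node["children"])
    return root


def to_html(node, depth=0):
    """Serialize the labeled tree as indented HTML lines."""
    pad = "  " * depth
    tag = node["tag"]
    if tag in VOID_TAGS:
        return [f"{pad}<{tag}>"]
    if not node["children"]:
        return [f"{pad}<{tag}></{tag}>"]
    lines = [f"{pad}<{tag}>"]
    for child in node["children"]:
        lines.extend(to_html(child, depth + 1))
    lines.append(f"{pad}</{tag}>")
    return lines
```

In use, `classify` would crop the GUI image at the given coordinates and return the tag predicted by the CNN; here any function from a box to a tag name works.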

Figure 7: Unknown Tag Hierarchy of a Simple Input
Figure 8: Unknown Tags Identified by CNN Module
Figure 9: Use of Convolutional Neural Network


4.2. CNN Module

We implemented the HTML tag recognition module using TensorFlow [34] and Python. The module was trained in a TensorFlow environment hosted locally on a Mac (Linux-based) system, using Python scripts implemented with the PyCharm tool [35]. The training dataset for HTML tags was created using internet resources and manual collection. Later, the module was hosted in a cloud environment to train the CNN further and improve the prediction accuracy.
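The paper does not specify the network architecture, so the following is only a toy forward pass showing how a CNN turns an element crop into one of the ten element classes named in Section 6. It is written in plain Python rather than TensorFlow, uses a single convolution layer, and expects weights supplied by the caller (untrained weights give arbitrary predictions); the layer sizes and all function names are assumptions.

```python
import math

# The ten element classes the paper reports training on.
CLASSES = ["heading", "paragraph", "image", "hyperlink", "button",
           "text field", "textarea", "icon", "search bar", "label"]


def conv2d_relu(image, kernel):
    """Valid 2-D cross-correlation followed by ReLU."""
    k = len(kernel)
    rows, cols = len(image) - k + 1, len(image[0]) - k + 1
    return [[max(0.0, sum(image[r + i][c + j] * kernel[i][j]
                          for i in range(k) for j in range(k)))
             for c in range(cols)] for r in range(rows)]


def max_pool(fmap, size=2):
    """Keep the largest activation in each non-overlapping size x size block."""
    return [[max(fmap[r + i][c + j] for i in range(size) for j in range(size))
             for c in range(0, len(fmap[0]) - size + 1, size)]
            for r in range(0, len(fmap) - size + 1, size)]


def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(image, kernels, dense):
    """One conv layer -> pooling -> flatten -> dense softmax over CLASSES."""
    features = []
    for kernel in kernels:
        for row in max_pool(conv2d_relu(image, kernel)):
            features.extend(row)
    scores = [sum(w * f for w, f in zip(weights, features)) for weights in dense]
    probs = softmax(scores)
    best = max(range(len(CLASSES)), key=probs.__getitem__)
    return CLASSES[best], probs
```

For an 8 x 8 crop with two 3 x 3 kernels, the pooled feature vector has 18 values, so `dense` needs ten rows of 18 weights, one row per class.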

5. RESEARCH EVALUATION

In this study, two different modules are evaluated: an image processing module and a deep learning module. They are tested independently. The following scenarios are considered in evaluating the modules.

5.1 Image Processing Module

The image processing module was developed using the MATLAB Image Processing Toolkit. Overall, this module can output a source code file containing only unknown tags from the intermediate hierarchy tree. For each input GUI image, the module was evaluated by comparing the number of unknown tags found by the image processing module to the number of actual HTML tags in the source code. By comparing the input GUI image with the output image, which consists of randomly colored squares, the human eye can detect significant HTML element recognition errors. Figure 11 illustrates the results for the input GUI image in Figure 10.

Figure 10: A Sample Input to Image Processing Module
Figure 11: Sample of Generated Output Image


To evaluate the image processing module, we used multiple GUI images for which the real HTML source code was available. The unknown tag counts were obtained from the output image of the image processing module and were then compared with the tag counts in the available source code created by a developer to determine the number of missing tags.
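A count comparison of this kind can be sketched with Python's standard `html.parser`. The study reports using Notepad++ for the comparison, so this script is only an illustrative alternative; the ratio it returns and the function names are assumptions introduced here.

```python
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Counts every opening (or self-closing) tag in a document."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        self.count += 1


def count_html_tags(html):
    counter = TagCounter()
    counter.feed(html)
    counter.close()
    return counter.count


def detection_ratio(html, detected_boxes):
    """Share of the original tags matched by detected squares (count-based only)."""
    expected = count_html_tags(html)
    return detected_boxes / expected if expected else 0.0
```

Note that a count-based ratio says nothing about whether each square covers the right element; that still requires the visual comparison described above.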

5.2. Deep Learning Module

To evaluate the deep learning module, we used a comparative method using the same HTML source code. We compared the tag names in the original HTML source code and the modified one generated by the CNN module. Since the CNN was only trained to recognize a few types of components, we had to extract the recognized elements from the CNN module and match them to the corresponding elements provided by the developer for the input GUI image. We concentrated solely on the trained elements to logically evaluate the deep learning module.
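The paper does not define a numeric metric for this comparison. As one possible formalization, the sketch below computes the share of original tags, restricted to the trained tag types, that also appear in the CNN output; the function name and the multiset-based matching are assumptions.

```python
from collections import Counter


def tag_match_rate(original_tags, predicted_tags, trained_tags):
    """Fraction of original tags (trained types only) matched by the predictions."""
    original = Counter(t for t in original_tags if t in trained_tags)
    predicted = Counter(t for t in predicted_tags if t in trained_tags)
    # Multiset intersection: each original tag can be matched at most once.
    matched = sum((original & predicted).values())
    total = sum(original.values())
    return matched / total if total else 0.0
```

Tags outside the trained set, such as a `nav` element in the original source, are ignored so that the CNN is judged only on what it was trained to recognize.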

To evaluate the results, a demonstration was conducted as described here. A simple one-page web page was created by hand using HTML, and a screenshot of it was captured, as shown in Figure 12.

The screen capture was then passed through the image processing module to extract the corresponding HTML elements. Figures 13 and 14 illustrate the results of the first and last stages of the image processing module.

Figure 12: Sample Basic Web Page Created from Code
Figure 13: Image After First Step of Image Processing Module
Figure 14: Image After Completing All the Steps in Image Processing Module

The detected element regions are then fed to the trained CNN hosted on AWS, which predicts the HTML code as shown in Figure 15.

Figure 15: Output code from the CNN


6. FUTURE DIRECTIONS AND CONCLUSIONS

The aim of the study is to implement a method to automatically generate source code from GUI images. Based on the qualitative analysis of the finally detected HTML tags, it can be concluded that the accuracy of the output of the image processing module affects the final result of the deep learning module. After analyzing the results of comparing the number of unknown tags with the number of expected real HTML tags, we conclude that the image processing step should be further improved to extract HTML elements from the GUI image.

The CNN was trained for only ten main HTML elements (heading, paragraph, image, hyperlink, button, text field, textarea, icon, search bar, and label). According to the comparison between the results generated by the CNN module and the HTML source code that generated the experimental GUI images, the CNN is more accurate with the training dataset. To improve the overall design of the proposed solution, attention should be paid to improving the image processing module rather than the deep learning module. To detect the highest number of HTML elements from the input GUI image, it was concluded that the image processing module needs to be improved, as the CNN only classifies elements that have already been extracted from the GUI image.

The design and evaluation approach of the proposed solution has been able to initiate an open discussion. The approach of first extracting HTML elements from the GUI, then classifying them through deep learning, and finally generating source code by running BFS on a hierarchical tree should receive more attention from the research community. As future work to improve this design, researching and extracting website taxonomy can provide source code to generate UIs, improve the prediction of HTML tags and even give suggestions for evolving GUI implementations. Since the research only focused on HTML tags and not on styling, improvements are needed to integrate styles into the identified elements.

ACKNOWLEDGEMENT

This research received no specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

BIOGRAPHIES

Thisaranie Kaluarachchi received a B.Sc. degree in Computer Science from the Faculty of Science at the University of Peradeniya in Kandy, Sri Lanka. She is pursuing her PhD in Computing at the University of Colombo School of Computing in Colombo, Sri Lanka. Her research interests include Artificial Intelligence, Machine Learning, Deep Learning, and Computer Vision.
