
             

GRANT AGREEMENT NO.: 608775
PROJECT ACRONYM: INDICATE
PROJECT TITLE: Indicator-based Interactive Decision Support and Information Exchange Platform for Smart Cities
FUNDING SCHEME: STREP
THEMATIC PRIORITY: EeB.ICT.2013.6.4
PROJECT START DATE: 1st October 2013
DURATION: 36 Months

DELIVERABLE 6.1: Evaluation Methodology and Benchmarking Framework

Review History

Date     | Version | Submitted By                       | Reviewed By
24-02-15 | 1       | John Loane on behalf of DKIT team  |
02-03-15 | 1       |                                    | First Review: Aidan Melia, IES
06-03-15 | 1       |                                    | Second Review: Stephen Purcell, FAC
09-03-15 | 2       | John Loane on behalf of DKIT team  |

Dissemination Level

PU | Public                                                                                 | X
PP | Restricted to other programme participants (including the Commission Services)        |
RE | Restricted to a group specified by the consortium (including the Commission Services) |
CO | Confidential, only for members of the consortium (including the Commission Services)  |

This project has received funding from the European Union's Seventh Framework Programme for research, technological development and demonstration under grant agreement no. 608775.


Table of Contents

EXECUTIVE SUMMARY
1 METHODOLOGY
  1.1 Introduction
  1.2 The User, User Categories and User Expectations
    1.2.1 Urban Planners
    1.2.2 Public Authorities
    1.2.3 Developers (Architects/Engineers/Designers)
    1.2.4 Main Contractors
    1.2.5 Technology Providers (ICT and RET)
    1.2.6 Material and Solution Manufacturers
    1.2.7 Energy Utility Companies
    1.2.8 R&D
  1.3 Overview of Evaluation Methodology and Benchmarking Framework
2 HEURISTIC EVALUATION
  2.1 Introduction
  2.2 Heuristics and Experts who will evaluate INDICATE
3 METRICS FOR VALIDATING INDICATE
  3.1 Introduction
  3.2 Usability Metrics
    3.2.1 Performance Metrics
      3.2.1.1 Task Success
      3.2.1.2 Time-on-task
      3.2.1.3 Errors
      3.2.1.4 Efficiency
      3.2.1.5 Learnability
    3.2.2 Issue-Based Metrics
    3.2.3 Self-Reported Metrics
  3.3 Ethics
  3.4 Number of Participants
  3.5 Combining Metrics to Give Single Usability Score
4 QUESTIONNAIRES
  4.1 Introduction
  4.2 System Usability Scale (SUS) applied to INDICATE
  4.3 Intuitiveness
  4.4 Microsoft Desirability
  4.5 INDICATE Survey Management and Processing Application
5 INTERVIEWS
  5.1 Introduction
  5.2 Interview Data Analysis
6 BENCHMARKING
  6.1 Introduction
  6.2 Predicted versus Real
  6.3 INDICATE vs Other Tools
7 ORGANIZATION OF EVALUATION AND BENCHMARKING ACTIVITIES
8 CONCLUSIONS
REFERENCES

 



EXECUTIVE SUMMARY

This document presents the intended evaluation methodology and benchmarking framework that will be used to evaluate the software tools produced in the INDICATE project. As the project evolves it is intended that this document will also evolve to better evaluate the actual tool produced in the project.

In chapter one we summarize results from D1.1 and D1.3, which identified stakeholders and their expectations of the INDICATE project. These users and their expectations are a crucial part of evaluating the project, as they are the people that the tool is being designed for. If the stakeholders are not able to use the tool or are not satisfied with its performance then the tool has failed. They also serve a very important purpose in giving formative feedback on the prototype tool during the development process. Finally in chapter one, we give an overview of the evaluation methodology and benchmarking framework. The evaluation methodology involves defining the metrics that will be used to assess INDICATE, expert heuristic evaluation of the tool, and user evaluation of the tool implemented using task-based assessment, questionnaires and interviews.

Chapter two details the heuristic evaluation. The goal is to have experts identify any serious usability problems before end-user testing. Using heuristic evaluation prior to user testing will reduce the number and severity of usability problems discovered by users. However, the issues found in a heuristic evaluation are usually different to those identified by user testing, so one cannot replace the other.

Chapter three details the metrics and user evaluation of the system. The metrics are broken down into three broad categories: performance metrics, issue-based metrics and satisfaction metrics. Performance metrics include task success, time-on-task, errors, efficiency and learnability, and are measured by observing the user trying to complete a series of tasks with the tool. Issue-based metrics identify anything that causes the user problems in completing the tasks. These issues will be identified by observing the user performing the tasks and asking the user at the end of each task about any difficulties encountered. Each issue will be categorised as low, medium or high severity depending on its effect on user performance of the task. Satisfaction metrics measure how users feel about the system and will be measured using a number of questionnaires, which produce both qualitative and quantitative data. The System Usability Scale produces a single number, which measures the usability of the system. The INTUI scale produces four numbers, which measure Gut Feeling, Verbalizability, Effortlessness and Magical Experience.
The Microsoft Desirability toolkit asks the user to associate words with their experience of the software and rank them, and then leads to a follow-up interview where qualitative data about the experience is gathered. Finally in this chapter we detail the ethics that will be followed in interacting with users, the number of test users needed and how we will combine metrics to produce a single usability score for the tool.

Chapters four and five detail the questionnaires and interview formats that will be used and how the data will be analysed.

Chapter six details how the tool will be benchmarked. We approach this from two sides. We will use real-time energy usage data gathered in the Living Lab based in DKIT to assess the Virtual City Model, Dynamic Simulation Model and algorithms developed in the INDICATE project. We will also ask the test users during the interviews about their experience of the INDICATE tool versus other tools that they use to carry out similar tasks.

Finally, chapter seven details the organization of the evaluation and benchmarking activities.



1 METHODOLOGY

In this chapter we describe the methodology that will be used to evaluate and benchmark the INDICATE tool. In section 1.1 we give an introduction to usability testing and note the importance of choosing the right test users. In section 1.2 we give a summary of work presented in D1.1 and D1.3, which detail the users that will test the INDICATE tool and their expectations of the tool. These individuals will be the test users who will carry out the tasks detailed in chapters 3, 4 and 5. Finally, in section 1.3 we give an overview of the evaluation methodology and benchmarking framework that will be used to assess the INDICATE tool.

1.1 Introduction

Usability testing is an essential part of the software development process. Usability has been defined by the ISO as the 'effectiveness, efficiency and satisfaction with which a specified set of users can achieve a specified set of tasks in a particular environment'. Usability is an essential quality of any application and it is recognised that an iterative, user-centred usability testing process, whereby designs are iteratively evaluated and improved, is essential for developing usable applications (Hartson, Andre & Williges, 2003). Research has considered usability experts, intended end users, novice evaluators and domain experts as participants in usability testing. Each participant brings a certain level of technical expertise, domain knowledge and motivation (Tullis & Albert, 2008). As such, the participant plays a vital role in determining which usability problems are discovered. There are other factors to consider which may influence the usability issues identified, including the tasks participants are asked to complete, the test environment, the observers and so on. While some research suggests five users is the 'magic number' for identifying roughly 80% of the usability problems in web applications (Nielsen, 2000), others consider this view naïve and argue that a higher number of users is desirable (Spool & Schroeder, 2001; Woolrych & Cockton, 2001). There is still no agreement among usability practitioners on how many users is enough.

A number of usability testing methods exist and there are numerous varied opinions on both their practicality and their effectiveness. Usability inspection methods (UIMs), such as heuristic evaluations and cognitive walkthroughs, are relatively cheap to carry out (Bias, 1994; Nielsen, 1994b; Wharton et al., 1994). While heuristic evaluations are thorough, providing lots of 'hits' in terms of problems, there can also be many false alarms, as it is difficult to determine which problems will actually impede a user's ability to successfully complete a given task. More recently, it has been argued that UIMs are not as effective as traditional user testing with real users, as UIMs 'predict' problems rather than report on problems observed with real users (Liljegren, 2006). However, using heuristic evaluation prior to user testing will reduce the number and severity of usability problems discovered by users. Furthermore, the issues found in a heuristic evaluation are usually different to those identified by user testing, so one cannot replace the other. Thus, both heuristic evaluation with expert users and usability testing with potential INDICATE users will constitute the INDICATE evaluation framework.

1.2 The User, User Categories and User Expectations

In this section we summarize work carried out in work packages 1.1 and 1.3 to detail the users and their expectations of the INDICATE tool. These users and their expectations of the tool will be crucial to the testing detailed in chapters 3, 4 and 5. As part of the evaluation process we will ask these users to classify themselves along two axes: knowledge of the domain and extent of computer experience. This classification will be used in the analysis of the results to see if there is a difference between novice and expert users' experience of the INDICATE tool. The users fall into eight different categories, which we summarize below.



1.2.1 Urban Planners

At the Genoa test site, Pier Paolo Tomiolo is an architect who is responsible for urban planning in the regional administration. Dr. Gabriella Minervini works for the Department of Environment in the Regione Liguria; she is responsible for the environmental impact assessment of the New Galliera project. Maurizio Sinigaglia and Silvia Capurro are architects who are responsible for the development plan of the city of Genoa. Rita Pizzone is an architect with the Architectural Heritage and Landscape office of Liguria who is involved because the New Galliera project includes listed buildings and lies in an area of landscape protection. Simon Brun is an engineer who is responsible for new hospitals in Liguria.

At the Dundalk test site, Catherine Duff is an executive town planner at Louth County Council and has extensive experience in planning, including projects such as smarter planning.

Table 1 below summarizes the expectations that these users have of the INDICATE tool. These expectations will be used to inform the tasks carried out to evaluate the INDICATE tool in chapter 3.

Table 1: Urban Planners Target Group

Urban Planners

Needs | Expectations for INDICATE
Sustainable urban planning; optimize the community's land use and infrastructure. | Holistic vision of a city to enhance urban sustainability.
Optimise efficiency and minimize energy use. | Simulations to balance load and demand in real time.
Understand interactions between buildings, RET, local distribution networks and the grid. | Dynamic Simulation Modelling, which allows the user to model the interactions between the city and its subsystems.

1.2.2 Public Authorities

At the Genoa test site, Franco Giodice is an architect who is vice director of sector investments in the health and social services department of the Liguria region.

At the Dundalk test site, Louth County Council are partners in the INDICATE project. Padraig O'Hora is a senior executive engineer in the European and Energy office.

Table 2 below summarizes the expectations that these users have of the INDICATE tool. These expectations will be used to inform the tasks carried out to evaluate the INDICATE tool in chapter 3.

Table 2: Public Authorities Target Group

Public Authorities

Needs | Expectations for INDICATE
Plan a sustainable Smart city (Smart environment, Smart mobility, Smart living). | Holistic vision of a city to enhance urban sustainability.
Reduce energy consumption and carbon emissions. | Dynamic Simulation Modelling.
Integrate Renewable Energy Technologies (RET). | 3D urban modelling to assess the impact of the integrated technologies.
Optimise existing systems and increase energy efficiency. | Solutions to model the interactions between buildings, installed systems and the grids.
Support to define and validate regulations and directives in the urban environment. | Tools able to understand how the regulatory requirements, the policies and the standards influence the approach taken to scheme development and the selection of any methodology.


1.2.3 Developers (Architects/Engineers/Designers)

At the New Galliera project in Genoa, Paola Brecia (OBR) has designed the urban scale and building envelope, Riccardo Curci (Steam) has designed the energy systems and their integration, and Lisiero (D'Appalonia) is the author of the environmental impact analysis.

In Dundalk, David McDonnell is Chief Executive of the Smart Eco Hub and is responsible for creating business opportunities, living labs and stimulating innovation in the Dundalk region.

Table 3 below summarizes the expectations that these users have of the INDICATE tool. These expectations will be used to inform the tasks carried out to evaluate the INDICATE tool in chapter 3.

Table 3: Developers Target Group

Developers

Needs | Expectations for INDICATE
Support to optimise existing buildings and integrate new technologies in the city environment. | Tool to analyse the buildings' and the districts' environment and understand how the different infrastructures of the city are related to one another; 3D urban modelling to assess the impact of the integrated technologies.
Centralized technology portfolio to evaluate different solutions. | Dynamic Simulation Modelling, which allows evaluation of different solutions.

1.2.4 Main Contractors

In Dundalk, Kingspan and Glen Dimplex have been involved in carrying out many upgrades to local council houses.

Table 4 below summarizes the expectations that these users have of the INDICATE tool. These expectations will be used to inform the tasks carried out to evaluate the INDICATE tool in chapter 3.

Table 4: Main Contractors Target Group

Main Contractors

Needs | Expectations for INDICATE
Efficient project coordination. | Simulation and energy-based DSM taking account of buildings and their interactions with the urban environment.
Clear and global view of the process and actors involved. | Solutions to connect decision makers and experts to enable the exchange of experience and best practice.

1.2.5 Technology Providers (ICT and RET)

In Genoa, engineer Borgiorni of SIRAM, the energy management company of the hospital, has installed a combined engine for electricity and heating and is developing a program of centralized HVAC control.

In Dundalk, Derek Roddy is CEO of Climote, a company who provide remote control for home heating. Damian McCann is a corporate account manager with Viatel, Digiweb group. He has been involved with the Louth County Council Broadband Forum and is assisting with developing Dundalk as a smart town.

Table 5 below summarizes the expectations that these users have of the INDICATE tool. These expectations will be used to inform the tasks carried out to evaluate the INDICATE tool in chapter 3.


Table 5:  Technology  Providers  Target  Group  

Technology Providers

Needs | Expectations for INDICATE
Increase market share of their technologies. | Software to simulate and demonstrate the increase in energy efficiency with the integration of new technologies.
Support for the development of new technologies and solutions. | Solutions to analyse and compare the efficiency of different technologies and estimate ROI; software that provides a holistic vision of a city.

1.2.6 Material and Solution Manufacturers

In Dundalk, Derek Roddy is CEO of Climote, a company who provide remote control for home heating. Table 6 below summarizes the expectations that these users have of the INDICATE tool. These expectations will be used to inform the tasks carried out to evaluate the INDICATE tool in chapter 3.

Table 6: Material and Solution Manufacturers Target Group

Material and Solution Manufacturers

Needs | Expectations for INDICATE
Increase market share of their products. | Software to simulate and demonstrate the increase in energy efficiency with the integration of new products.
New market opportunities. | Solutions to analyse and compare the efficiency of different technologies and estimate ROI.
Support to test products and solutions in the city environment. | 3D urban modelling to assess the impact of the installed solution.

1.2.7 Energy Utility Companies

In Genoa, engineer Borgiorni of SIRAM, the energy management company of the hospital, has installed a combined engine for electricity and heating and is developing a program of centralized HVAC control.

At the Dundalk site, Declan Meally has worked with SEAI for 10 years and has been head of the Department of Emerging Sectors since 2012.

Table 7 below summarizes the expectations that these users have of the INDICATE tool. These expectations will be used to inform the tasks carried out to evaluate the INDICATE tool in chapter 3.

Table 7: Energy Utilities Target Group

Energy Utility Companies

Needs | Expectations for INDICATE
Support to develop new energy-related solutions and to test and implement solutions. | Tool to estimate the revenues and ROI for each infrastructure improvement.
More competitive prices and tariff plans. | Tool able to simulate the balance of load and demand in real time and to evaluate different tariff plans.



1.2.8 R&D

Barry Grennan (Xerox) has been the Business Centre Manager with Xerox since 2002, with responsibility for the colour toner manufacturing operation in Xerox's Dundalk plant. Barry has a particular interest in Demand Side Management and is currently implementing it at the Xerox plant in Dundalk.

Table 8 below summarizes the expectations that these users have of the INDICATE tool. These expectations will be used to inform the tasks carried out to evaluate the INDICATE tool in chapter 3.

Table 8: R&D Target Group

R&D

Needs | Expectations for INDICATE
Support to test new research and new solutions in the city environment. | Dynamic Simulation Modelling, which allows evaluation of different solutions.

 



1.3 Overview of Evaluation Methodology and Benchmarking Framework

Figure 1 below details the evaluation methodology. First we started with a review of the literature and the ISO 9241-11 usability guidelines to come up with the metrics that we will use to evaluate the INDICATE tool. Of particular use here were (Tullis & Albert, 2008) and (Nielsen, 1993), both of which give pragmatic, common-sense approaches to evaluating software. Having identified the metrics we wish to measure, we then detail how we go about measuring them. First we carry out a heuristic evaluation, where a usability expert assesses the tool before it is used in user testing. In order to carry out user testing we first identify appropriate tasks and then ask the user to use the think-aloud technique while carrying out the tasks. When all tasks and questionnaires are completed we then interview the user to further investigate the user experience.

 

Figure 1: Evaluation Methodology (flow from the literature, usability guidelines and the ISO 9241-11 usability goals through heuristic evaluation (chapter 2), performance and satisfaction metrics and task design with the think-aloud technique (chapter 3), questionnaires (chapter 4) and interviews with thematic analysis (chapter 5), to an analysis of results feeding back to the prototype (D6.2) and the final tool (D6.3, D6.4) as a list of fixes and improvements required)



Figure 2 below details the benchmarking framework. Here we aim to benchmark the algorithms developed as part of the INDICATE tool by comparing their predictions against real-time energy usage data gathered in the Living Lab located in DKIT. Access to the real energy usage data will allow us to benchmark the prediction algorithms and models, including the Virtual City Model and the Dynamic Simulation Model. We will benchmark INDICATE against other tools by asking the users during the interview to compare their experiences with INDICATE against tools that they have previously used. We will ask users to compare them in terms of speed, user interface, data requirements and portability.

   

Figure 2: Benchmarking Framework (Predicted vs Real; INDICATE vs Other Tools)
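To make the "predicted versus real" comparison concrete, the sketch below shows one way predicted and measured energy series could be aligned and scored. It is illustrative only: the file names, column names and the choice of RMSE and MAPE as error measures are our assumptions, not requirements fixed by the INDICATE benchmarking framework.

```python
import pandas as pd
import numpy as np

# Illustrative sketch: assumes both CSV files have "timestamp" and "kwh" columns
# and that measured values are non-zero; paths and metrics are hypothetical.

def benchmark_predictions(predicted_csv: str, measured_csv: str) -> dict:
    """Compare model-predicted energy use against Living Lab measurements."""
    pred = pd.read_csv(predicted_csv, parse_dates=["timestamp"])
    real = pd.read_csv(measured_csv, parse_dates=["timestamp"])

    # Align the two series on timestamp so each prediction is compared
    # with the measurement for the same interval.
    merged = pred.merge(real, on="timestamp", suffixes=("_pred", "_real"))
    err = merged["kwh_pred"] - merged["kwh_real"]

    return {
        "rmse_kwh": float(np.sqrt(np.mean(err ** 2))),
        "mape_pct": float(np.mean(np.abs(err / merged["kwh_real"])) * 100),
        "n_intervals": len(merged),
    }

# Example call (file names hypothetical):
# benchmark_predictions("dsm_predicted.csv", "living_lab_measured.csv")
```

In practice the same comparison would be run per building, per subsystem and for the city as a whole, using whichever error measures the consortium agrees on.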

 



2 HEURISTIC EVALUATION

2.1 Introduction

Heuristic evaluation is a usability evaluation method (UEM) typically employed to identify usability problems in interactive systems. It is one of many expert review methodologies. With heuristic evaluation, "the expert reviewers critique an interface to determine conformance with a short list of design heuristics" (Schneiderman and Plaisant, 2005). It is important that the experts are familiar with the heuristics and capable of interpreting and applying them.

Formal expert reviews have proven to be effective as a starting point for evaluating new or revised interfaces (Nielsen and Mack, 1994). The expertise of such reviewers may be in the application area or in user interface design. Typically, expert reviews can happen both at the start and end of a design phase. The output is usually a report highlighting any identified problems and recommending design changes that should be integrated before system deployment.

Heuristics are so called as they are rules of thumb rather than specific usability guidelines. However, they are well reported in the literature and often used in expert evaluation of interactive systems. Heuristic evaluation has a number of advantages, including the ability to provide quick and relatively inexpensive feedback to designers at an early stage of the design process.

2.2 Heuristics and Experts who will evaluate INDICATE

Within the INDICATE project, heuristic evaluation will take place at the beginning of the evaluation phase, once a pilot system is available for testing and before usability testing with end users begins. The goal is to have experts identify any serious usability problems before end-user testing. Using heuristic evaluation prior to user testing will reduce the number and severity of usability problems discovered by users. However, the issues found in a heuristic evaluation are usually different to those identified by user testing, so one cannot replace the other.

Guidelines that INDICATE will be evaluated against

There exist numerous guidelines and heuristics for evaluating interactive systems. However, the most well known and often used are the heuristics developed by Nielsen (Nielsen, 1994) and Schneiderman (Schneiderman and Plaisant, 2005).

Jakob Nielsen outlined 10 general principles of interaction design (Nielsen, 1994):

1. Visibility of system status – The system should always keep users informed about what is going on, through appropriate feedback within a reasonable time.
2. Match between system and the real world – The system should speak the users' language, with words, phrases and concepts familiar to the user, rather than system-oriented terms. Follow real-world conventions, making information appear in a natural and logical order.
3. User control and freedom – Users often choose system functions by mistake and will need a clearly marked "emergency exit" to leave the unwanted state without having to go through an extended dialogue. Support undo and redo.
4. Consistency and standards – Users should not have to wonder whether different words, situations or actions mean the same thing. Follow platform conventions.
5. Error prevention – Even better than good error messages is a careful design that prevents a problem from occurring in the first place. Either eliminate error-prone conditions or check for them and present users with a confirmation option before they commit to the action.



6. Recognition rather than recall – Minimise the user's memory load by making objects, actions and options visible. The user should not have to remember information from one part of the dialogue to another. Instructions for use of the system should be visible or easily retrievable whenever appropriate.
7. Flexibility and efficiency of use – Accelerators, often unseen by the novice user, may speed up the interaction for the expert user such that the system can cater to both inexperienced and experienced users. Allow users to tailor frequent actions.
8. Aesthetic and minimalist design – Dialogues should not contain information which is irrelevant or rarely needed. Every extra unit of information in a dialogue competes with the relevant units of information and diminishes their relative visibility.
9. Help users recognize, diagnose and recover from errors – Error messages should be expressed in plain language (no codes), precisely indicate the problem and constructively suggest a solution.
10. Help and documentation – Even though it is better if the system can be used without documentation, it may be necessary to provide help and documentation. Any such information should be easy to search, focused on the user's task, list concrete steps to be carried out and not be too large.

Schneiderman also outlined 8 golden rules of interface design: rules or principles that are applicable in most interactive systems and which were defined based on over two decades of design and evaluation experience. These rules can be fine-tuned for the specific system being evaluated. They are:

1. Strive for consistency – e.g. identical terminology used throughout, consistent colour, layout, fonts etc.
2. Cater to universal usability – Recognise the needs of diverse users and design for plasticity. Novice-expert differences, age ranges, disabilities and technology diversity should guide the design requirements. Implementing features for novices (such as explanations) and experts (such as shortcuts) can improve the design.
3. Offer informative feedback – For every action the user carries out, there should be feedback from the system. This can be modest for frequent actions, or more substantial for infrequent actions.
4. Design dialogs to yield closure – Ensure that sequences of actions are grouped and have a beginning, a middle and an end.
5. Prevent errors – As much as possible, design the system so that users cannot make serious errors. If a user error occurs, provide simple, constructive and specific instructions to recover from that error.
6. Permit easy reversal of actions – Actions should be reversible, as much as possible. Allowing a user to 'undo' something reduces anxiety and encourages exploration of unfamiliar options.
7. Support internal locus of control – Experienced users like to sense that they are in charge of the system and that the system responds to their actions. Inability to perform a certain action, inability to get the required data, or tedious sequences of actions can cause dissatisfaction and frustration.
8. Reduce short-term memory load – Human short-term memory is limited in terms of its processing power, and research has shown that humans can adequately remember seven plus or minus two chunks of information. This means that interfaces should be kept simple and not overloaded with information.

Conducting the Evaluation

Define the Tasks

A starting point for the heuristic evaluation will be the evaluation of the tasks and actions (nouns and verbs) of the INDICATE system. These tasks will be defined prior to the heuristic evaluation and evaluators will be asked to carry them out and then comment on the corresponding interface objects and actions.


Identify the Users

As with all UEMs, there is no consensus on how many experts are required to carry out a heuristic evaluation. Some research suggests one expert is enough, whereas other research reports that different experts tend to find different problems with interfaces, and so recommends that 3-5 expert reviewers are recruited. While user interface design experts are familiar with the field of evaluating interactive systems, they may not be familiar with the application area. Therefore, we will recruit one interface expert as well as one application domain expert to conduct our heuristic evaluation.

Review the Heuristics

With an expert review, evaluators should know and understand the above heuristics to be able to assign a problem to one of them. Each evaluator will review the INDICATE system individually. INDICATE HCI researchers will combine Nielsen's and Schneiderman's heuristics, removing duplicates. Evaluators will be provided with a template with each heuristic outlined, asking the evaluator to evaluate the system against that particular heuristic and outline to what degree it has been satisfied. Evaluators will be asked to provide comments beside each heuristic and to use screen grabs if they feel this will help to illustrate their point.

Writing the Report

Once the evaluators have completed their evaluation template, all the feedback will be combined, the usability problems will be clustered into thematic areas and categorised into different levels of severity. Usability problems will be categorized as critical, serious, medium or low, using a decision tree (Travis, 2009). The report will rank recommendations by importance and expected effort level for redesign/implementation. The report will also include a summary of all findings, including a table listing all usability problems with their severity ranking, ease of fixing and the heuristic violated. Specific problem areas will be highlighted, including evidence of that problem occurring in the interface and a recommendation on how to resolve the issue. A purely hypothetical sketch of how individual findings might be recorded and combined is given below.
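The sketch below illustrates one possible record structure for a template finding and a simple way of tallying the combined feedback; the field names, severity labels and example finding are ours and are not a prescribed INDICATE format.

```python
from dataclasses import dataclass
from collections import Counter

# Hypothetical record for one finding from an evaluator's completed template.
@dataclass
class Finding:
    evaluator: str        # e.g. "interface expert" or "domain expert"
    heuristic: str        # the Nielsen/Schneiderman heuristic violated
    description: str      # what was observed, optionally with a screen-grab reference
    severity: str         # "critical", "serious", "medium" or "low"
    ease_of_fixing: str   # e.g. "easy", "moderate", "hard"

def summarise(findings: list) -> None:
    """Tally problems per heuristic and per severity for the combined report."""
    by_heuristic = Counter(f.heuristic for f in findings)
    by_severity = Counter(f.severity for f in findings)
    print("Problems per heuristic:", dict(by_heuristic))
    print("Problems per severity:", dict(by_severity))

summarise([
    Finding("interface expert", "Visibility of system status",
            "No progress indicator while a simulation runs", "serious", "easy"),
])
```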



3 METRICS FOR VALIDATING INDICATE

3.1 Introduction

In this chapter we consider the metrics that will be used to assess the INDICATE tool. In section 3.2 we introduce three kinds of metrics: performance metrics, issue-based metrics and self-reported metrics. In sections 3.2.1, 3.2.2 and 3.2.3 we describe the metrics that will be used in the evaluation, how they will be gathered and how the data will be analysed. Finally, we consider how all of the data will be combined to produce an overall usability score for the INDICATE tool. This chapter follows methods detailed in (Tullis & Albert, 2008).

3.2 Usability Metrics

We can break usability metrics down into three broad categories: performance metrics, issue-based metrics and self-reported metrics. Performance is all about what the user actually does in interacting with the product. We will measure this by asking the users to perform specific tasks with the INDICATE tool; the details are given in section 3.2.1. Issue-based metrics will be gathered by questioning users after they have performed each task and are detailed in section 3.2.2. Self-reported, or satisfaction, metrics assess users' overall experience of and feelings about the product and will be gathered by asking users to complete standard questionnaires after all tasks have been completed. The details are given in section 3.2.3 below.

3.2.1 Performance Metrics

Five types of performance metrics will be used in assessing INDICATE:

1. Task success measures how effectively users are able to complete a given set of tasks. We will detail sample tasks and report results as binary success.
2. Time-on-task measures how much time is spent on each task.
3. Errors are mistakes made during a task and will be used to find confusing or misleading parts of the interface.
4. Efficiency will be calculated using the task success and time-on-task measures.
5. Learnability will allow us to measure how performance changes over time.

3.2.1.1 Task Success

The tasks detailed below are informed by the user expectations detailed above in section 1.2. For each task we will give a clear end state and define the criteria for success. Users will be asked to verbally articulate the answer after completing the task. Tasks will be given to the user one at a time to give a clear start condition for each task and to facilitate the timing of each task. These tasks will be refined when a prototype version of the INDICATE tool is available. Table 9 below gives some sample tasks.

Table 9: Tasks for Evaluation

Task 1
Aim: Holistic vision of a city.
Task: What are the total population, energy usage and renewable energy production in the city?
End Condition: Three values.
Success Criteria: Correct three values.

Task 2
Aim: Simulations to balance load and demand in real time.
Task: At what time of the day does the maximum import from the grid happen? What are 3 options to meet this demand?
End Condition: Time, 3 options.
Success Criteria: Time, 3 options from a list of all possible options available before the task starts.

Task 3
Aim: Dynamic simulation modelling to model interactions between the city and its subsystems.
Task: Rank each subsystem on the demand side (buildings, transport, public services) and the supply side (centralized, distributed) in terms of its energy consumption or production.
End Condition: Two sets of rankings.
Success Criteria: 2 sets of rankings to be compared to values worked out before the task.

Task 4
Aim: 3D urban modelling to assess the impact of integrated technologies.
Task: Add 100 kWp of solar PV to buildings and assess their impact on the city – what is the expected output from the panels? How does this affect the city's import from the grid? How will these panels affect the cityscape?
End Condition: PV output, impact on grid, impact on cityscape.
Success Criteria: Correct values for PV production, decrease in import from the grid, and whether the panels will have any impact on the cityscape.

Task 5
Aim: Understand regulatory requirements, policies and standards.
Task: Add a new school to the model and list the restrictions imposed on the model.
End Condition: A list of restrictions.
Success Criteria: List of restrictions from a list of options available before the task starts.

Task 6
Aim: Solutions to connect decision makers and experts to enable the exchange of experience and best practice.
Task: Following on from task 5, list the experts suggested by the software to help with this task.
End Condition: List of experts.
Success Criteria: List of experts from a list of options available before the task starts.

Task 7
Aim: Demonstrate the increase in energy efficiency with the integration of new technologies.
Task: If we retrofit all homes in the city with triple-glazed windows, how would this affect the energy demand of the city?
End Condition: kWh value for decrease in demand.
Success Criteria: kWh value for decrease in demand.

Task 8
Aim: Analyse and compare the efficiency of different technologies and estimate the ROI for infrastructure investment.
Task: Compare task 7 with adding a solar panel for hot water to the roof of all homes in the city. What would the kWh saving be in this case? What is the ROI for both tasks? Which is better value?
End Condition: kWh saving, ROI, which is better value.
Success Criteria: kWh saving, ROI, which is better value.

Task 9
Aim: Use DSM to evaluate different tariff plans.
Task: Given two local tariff plans for public services in the city, use DSM to assess which is better value for the local authority.
End Condition: A statement about which of the tariff plans is better value.
Success Criteria: Statement about which tariff plan is better value for the public authority.

Users will be given either a success (1) or failure (0) for each task. We will use the numeric score to calculate the average as well as confidence intervals for the success of each task, as detailed in Table 10 below.


Table 10: Summary measures for task completion

Participant               | Task 1 | Task 2 | Task 3 | Task 4 | Task 5
P1                        | 1      | 0      | 1      | 0      | 0
P2                        | 1      | 0      | 1      | 0      | 1
P3                        | 1      | 1      | 1      | 1      | 1
P4                        | 1      | 1      | 1      | 1      | 1
P5                        | 0      | 0      | 1      | 1      | 1
Average                   | 80%    | 40%    | 100%   | 60%    | 80%
Confidence Interval (95%) | 39%    | 48%    | 0%     | 48%    | 39%

Results  will  be  presented  on  a  per  task  level  with  average  completion  rate  as  well  as  a  confidence  interval  for  each   measure  reported  as  in  Figure  3  below.  

Figure 3: Task success – presentation of results showing the mean and 95% confidence interval for each task
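The averages and confidence intervals in Table 10 can be reproduced with a short calculation. The sketch below assumes the normal-approximation interval (1.96 times the sample standard deviation divided by the square root of n), which matches the values shown in the table; the function name is ours.

```python
import numpy as np

def success_summary(scores):
    """Mean success rate and 95% confidence interval for one task (binary 0/1 scores)."""
    scores = np.asarray(scores, dtype=float)
    mean = scores.mean()
    # Normal-approximation half-width using the sample standard deviation,
    # which reproduces the figures shown in Table 10 (e.g. 80% +/- 39% for Task 1).
    half_width = 1.96 * scores.std(ddof=1) / np.sqrt(len(scores))
    return mean, half_width

mean, ci = success_summary([1, 1, 1, 1, 0])   # Task 1 scores from Table 10
print(f"Task 1: {mean:.0%} +/- {ci:.0%}")      # -> Task 1: 80% +/- 39%
```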

3.2.1.2 Time-on-task

The time-on-task measure will give us information about the efficiency of the tool: the shorter the time it takes to complete any of the tasks, the better. Time-on-task will be measured as the time from the user receiving the task to the time that they verbalize the answer. A stopwatch will be used to measure the time in seconds. A recording of the user's screen and audio during the test will also be captured and can be used to check timed events.

Time data for each task will be tabulated, and summary data including the average, median, maximum, minimum and confidence intervals will be reported, as in Table 11 below.


Table 11: Summary Results for Time-on-task (seconds)

Task     P1    P2    P3    P4    P5    Average   Median   Upper Bound   Lower Bound   Confidence Interval
Task 1   259   253   42    38    33    125       42       259           33            105
Task 2   112   64    51    108   142   95        108      142           51            33
Task 3   135   278   60    115   66    130       115      278           60            77
Task 4   58    160   57    146   47    93        58       160           47            48

The time-­‐on-­‐task   results   will   be   presented   by   finding   the   average   time   for   each   task   as   well   as   confidence   intervals  and  the  results  will  be  graphed  as  in  Figure  4.  

Figure 4: Time-on-task showing mean times and 95% confidence interval (time in seconds to complete each task)
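For illustration, a short Python sketch that reproduces the Table 11 summary values for a single task (a normal-approximation confidence interval is assumed, which matches the sample figures above):

import statistics as stats

def time_on_task_summary(times, z=1.96):
    """Summary statistics for one task's completion times in seconds."""
    n = len(times)
    ci = z * stats.stdev(times) / n ** 0.5
    return {
        "average": round(stats.mean(times)),
        "median": stats.median(times),
        "upper bound": max(times),
        "lower bound": min(times),
        "confidence interval": round(ci),
    }

# Task 1 times for participants P1-P5 from Table 11.
print(time_on_task_summary([259, 253, 42, 38, 33]))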

3.2.1.3 Errors
Errors are incorrect actions that may lead to task failure or inefficiency. Errors can include entering incorrect data, making the wrong choice in a menu or dropdown list, taking an incorrect sequence of actions or failing to take a key action. Once we have a working prototype of the tool we will make a list of all possible actions a user can take with the tool and then define the different types of errors that can be made with the product.
We will organize errors by task, with each task having multiple opportunities for error. We will record the number of errors each user makes on each task, so the number of errors for a task will be between zero and the maximum number of error opportunities for that task. The errors will be counted while observing the users completing each task and can be verified from the screen recordings.
In order to see the tasks that are producing the most errors we will take the total number of errors per task and divide it by the total number of error opportunities to give an error rate. We will also calculate the average number of errors made by each participant for each task.
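A minimal sketch of the error-rate and average-error calculations described above (the error counts and the number of error opportunities below are hypothetical):

def error_metrics(errors_per_user, opportunities):
    """Error rate and average error count for one task.
    errors_per_user: errors made by each participant on the task.
    opportunities: error opportunities defined for the task (taken here
    per participant, i.e. total opportunities = opportunities * users)."""
    total = sum(errors_per_user)
    error_rate = total / (opportunities * len(errors_per_user))
    average_errors = total / len(errors_per_user)
    return error_rate, average_errors

# Five participants, a task with 8 defined error opportunities.
rate, avg = error_metrics([2, 4, 3, 5, 1], opportunities=8)
print(f"Error rate: {rate:.0%}, average errors per participant: {avg:.1f}")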


3.2.1.4 Efficiency
The Common Industry Format (CIF) for Usability Test Reports (NIST, 2001) specifies that the "core measure of efficiency" is the ratio of the task completion rate to the mean time per task, where time per task is commonly expressed in minutes. The efficiency metric is calculated as the ratio of the task completion rate to the task time in minutes. An example is given in Table 12 below.

Table 12: Calculating Efficiency Metric

Task   Completion Rate (%)   Task Time (mins)   Percent Efficiency
1      80                    1.5                53
2      60                    1.7                35
3      100                   1.0                100
4      40                    2.1                19
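A one-line sketch reproducing the Table 12 calculation:

def percent_efficiency(completion_rate_percent, task_time_minutes):
    """CIF efficiency: task completion rate divided by mean time per task in minutes."""
    return completion_rate_percent / task_time_minutes

# Task 1 from Table 12: 80% completion, 1.5 minutes.
print(f"{percent_efficiency(80, 1.5):.0f}%")  # -> 53%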

The results  can  be  presented  in  graph  form  by  showing  the  average  efficiency  metric  for  each  task  as  shown  in   Figure  5  below.  

Figure 5: Efficiency Metric (average percent efficiency, completion/time, per task)

3.2.1.5 Learnability
Learnability is a measure of how easy it is to learn something and can be measured by examining how much time and effort is required to become proficient at it. We will measure this by carrying out the tasks detailed in Table 3.1 five times with a single user, with a gap of two weeks between each trial. For each trial we will use the average time-on-task, averaged over all tasks. The data will be presented as in Figure 6 below and the slope of the curve will indicate how difficult the system is to learn. We also hope to see the graph flatten out, which indicates that users have no more to learn about the system and have reached maximum performance. We hope that five iterations will be enough to reach this point.



Figure 6: Learnability (average time-on-task in seconds per trial, Trials 1-5)
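As an illustration of how the learnability data could be summarised, a small Python sketch (the trial times are hypothetical):

def trial_averages(times_by_trial):
    """Average time-on-task per trial, averaged over all tasks.
    times_by_trial: one list of per-task times (seconds) per trial."""
    return [sum(trial) / len(trial) for trial in times_by_trial]

# Five trials of the same four tasks, two weeks apart (hypothetical data).
averages = trial_averages([
    [125, 95, 130, 93],   # trial 1
    [90, 70, 100, 75],    # trial 2
    [70, 60, 80, 60],     # trial 3
    [60, 55, 72, 55],     # trial 4
    [58, 54, 70, 54],     # trial 5
])
# Differences between successive trials approximate the slope of the curve;
# values close to zero suggest performance has flattened out.
deltas = [later - earlier for earlier, later in zip(averages, averages[1:])]
print(averages, deltas)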

3.2.2 Issue-Based Metrics
In order to gather issue-based metrics we will use an in-person, task-based study and ask the user to think aloud. Users will be asked to verbalize their thoughts as they work through the tasks, reporting what they are doing, what they are trying to accomplish, how confident they are about their decisions, their expectations and why they performed certain actions.
At the end of each task from Table 3.1, users will be asked to rate the usability of the system for that task. Observers will also look out for verbal expressions of confusion, frustration, dissatisfaction, pleasure or surprise as well as non-verbal behaviours such as facial expressions. If the user provides a low usability score they will be asked to explain what the problem was and why they rated the system that way.
We will use a three-level system to classify the severity of usability issues. Severity ratings will be assigned by the observer based on observation of the user and questioning of the user after each task is complete. The three levels are:
Low: Any issue that annoys or frustrates users but does not play a role in task failure. This issue may only reduce efficiency or satisfaction a small amount.
Medium: Any issue that contributes to but does not directly cause task failure. These issues have an impact on effectiveness, efficiency and satisfaction.
High: Any issue that directly leads to task failure. These issues have a big impact on effectiveness, efficiency and satisfaction.
We will report the number of unique usability issues identified, classified by severity rating. These unique issues will also be documented and will form part of the report for WP6.2, which will feed back into the design of the final tool. We will use graphs such as Figure 7 below to summarize these issues.



Figure 7: Number of unique usability issues per design iteration, ranked by severity (Low, Medium, High)

3.2.3 Self-Reported Metrics
Self-reported metrics give information about users' perception of the tool. They express how users feel about the system and whether they enjoy using it or not. We will use three standard questionnaires to gather these metrics:
System Usability Scale: This consists of ten statements to which users rate their level of agreement. Half the statements are positively worded and half negatively worded, and a 5-point scale is used for each. A technique for combining the ten ratings into an overall score on a scale from 0 to 100 is given, with 100 representing a perfect score. We give full details in section 4.2.
Intuitive Interaction: The INTUI model explores the phenomenon of intuitive interaction from a User Experience (UX) perspective. It combines insights from psychological research on intuitive decision making and user research in HCI as well as insights from interview studies into subjective feelings related to intuitive interaction. This phenomenological approach acknowledges the multi-dimensional nature of the concept and also reveals important influencing factors and starting points for design. The INTUI model suggests four components of intuitive interaction, namely Gut Feeling (G), Verbalizability (V), Effortlessness (E) and Magical Experience (X). We give full details in section 4.3.
Microsoft Desirability Toolkit: This is made up of 118 product reaction cards containing words such as "Useful", "Consistent" and "Sophisticated". On completion of a usability test, users are asked to sort through the cards and pick the top five that most closely match their personal reactions to the system they have just used. These five cards then become the basis of a post-test interview. We give full details in section 4.4.
These three questionnaires will be administered after the user has completed the usability test. Full details of how the results will be processed are given in Chapter 4 below.



3.3 Ethics
We will follow guidelines from (Nielsen, 1993) in conducting the usability tests.
Before the test we will:
• Have everything ready before the user shows up.
• Emphasize that it is the system that is being tested, not the user.
• Acknowledge that the software is new and untested, and may have problems.
• Let users know that they can stop at any time.
• Explain the screen recording and the actions of the observer during the test.
• Tell the user that the test results will be kept completely confidential.
• Make sure that we have answered all of the user's questions before proceeding.
During the test we will:
• Try to give the user an early success experience.
• Hand out the test tasks one at a time.
• Keep a relaxed atmosphere in the test room, serve tea/coffee and take breaks.
• Avoid disruptions: close the door, post a sign and disable the telephone.
• Never indicate in any way that the user is making mistakes or is too slow.
• Minimize the number of observers at the test.
• Not allow the user's management to observe the test.
• If necessary, stop the test if it becomes too unpleasant.
After the test we will:
• End by stating that the user has helped us find areas of improvement.
• Never report results in such a way that individual users can be identified.
• Only show screen recordings outside the usability group with the user's permission.

                           



3.4 Number of Participants
There will be two distinct rounds of testing in order to evaluate the tool. The first round will result in Deliverable 6.2 in month 25 and can be characterized as a formative usability test. For this test we will use 5 users in Dundalk and 5 users in Genoa. This number comes from research showing that about 80% of usability issues will be observed with the first five participants (Lewis, 1994; Nielsen & Landauer, 1993; Virzi, 1992). As shown in Figure 8 below, with 10 users we will have a 90% chance of detecting a problem that affects 31% of users and a 65% chance of detecting a problem that affects 10% of users. We will also use the two groups to assess any differences between users in Dundalk and Genoa and any differences between novice and expert users.

Figure  8:  Difference  in  sample  sizes  needed  to  have  an  85%  chance  of  detecting  a  problem  that  affects  10%  of  users  vs  32%   of  users,  (Source  http://www.measuringu.com/five-­‐users.php).  

For the   final   round   of   testing,   which   will   result   in   Deliverables   6.3   and   6.4   in   month   35   and   is   a   summative   assessment  of  the  tool,  we  will  use  9  users  in  Dundalk  and  9  users  in  Genoa.    This  will  give  us  an  85%  chance  of   detecting  a  problem  that  affects  10%  of  users  and  again  allow  analysis  of  Dundalk  and  Genoa  users  and  novice   and  expert  users.  
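For reference, the sample-size reasoning above appears to follow the standard problem-discovery model, 1 - (1 - p)^n, associated with Nielsen and Landauer (1993); the short Python sketch below reproduces the 18-user figure quoted for the summative test:

import math

def discovery_probability(p, n):
    """Probability that a problem affecting a proportion p of users is
    observed at least once with n test participants: 1 - (1 - p)^n."""
    return 1 - (1 - p) ** n

def users_needed(p, target):
    """Smallest n for which discovery_probability(p, n) reaches the target."""
    return math.ceil(math.log(1 - target) / math.log(1 - p))

print(round(discovery_probability(0.10, 18), 2))  # -> 0.85, as quoted in section 3.4
print(users_needed(0.32, 0.85))                   # -> 5 users for problems affecting 32% of users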

     



3.5 Combining Metrics to Give a Single Usability Score
In order to combine the different metrics into an overall usability score we will compare each data point to a usability goal and present one single metric based on the number of users who achieved a combined set of goals. These goals will be finalized when a prototype version of the software is available. Table 13 below gives a sample of how this will be calculated. For this table the goals for each task are 80% task completion, average time on task of less than 410 seconds, an average of less than 5 errors per task, efficiency of 75%, a SUS score above 66% and gut feeling, verbalizability, effortlessness and magical experience scores of more than 5. With this sample data the overall usability score is 50%, representing the fact that two out of four users have met all of the goals for each task.
Table 13: Combined Metrics

Participant   Task Completion   Time on Task (s)   Errors   Efficiency   SUS   Gut Feeling   Verbalisability   Effortlessness   Magical Experience   Goal Met?
1             85%               300                2        80           80    6             6                 6                6                    1
2             70%               250                4        80           77    5             5                 5                5                    0
3             90%               400                3        85           60    7             6                 7                6                    1
4             82%               450                5        90           66    3             4                 6                6                    0
Average       81.75%            350                3.5      83.75        71    5.25          5.25              6                5.75                 50%

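To make the goal-checking step in Table 13 concrete, a minimal Python sketch using the sample goal thresholds stated above (the final goals will be set once a prototype of the software is available):

GOALS = {
    "completion": lambda v: v >= 80,        # task completion (%)
    "time_on_task": lambda v: v < 410,      # seconds
    "errors": lambda v: v < 5,              # average errors per task
    "efficiency": lambda v: v >= 75,        # percent efficiency
    "sus": lambda v: v > 66,                # SUS score
    "gut_feeling": lambda v: v > 5,
    "verbalisability": lambda v: v > 5,
    "effortlessness": lambda v: v > 5,
    "magical_experience": lambda v: v > 5,
}

def goals_met(participant):
    """Return 1 if the participant meets every goal, otherwise 0."""
    return int(all(check(participant[key]) for key, check in GOALS.items()))

# Participant 1 from Table 13.
p1 = {"completion": 85, "time_on_task": 300, "errors": 2, "efficiency": 80,
      "sus": 80, "gut_feeling": 6, "verbalisability": 6, "effortlessness": 6,
      "magical_experience": 6}
print(goals_met(p1))  # -> 1
# The overall usability score is the percentage of participants for whom
# goals_met() returns 1 (50% for the sample data in Table 13).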


4 QUESTIONNAIRES   4.1  Introduction   We   will   use   three   questionnaires,   the   System   Usability   Scale,   Intuitive   Interaction   and   Microsoft   Desirability   Toolkit   to   gather   users’   personal   feelings   about   the   system   after   they   have   completed   the   usability   test.   These   questionnaires  will  result  in  a  mixture  of  quantitative  and  qualitative  scores  as  detailed  in  the  sections  below.  

4.2 System Usability Scale (SUS) applied to INDICATE
The System Usability Scale is a ‘quick and dirty’ method that allows reliable, low cost assessment of usability. It is a simple 10-item scale giving a global view of subjective assessments of usability. SUS covers a variety of aspects of system usability, such as the need for support, training and complexity. Within INDICATE, SUS will be administered to participants who have just evaluated the INDICATE system, and before any interview or debriefing takes place. SUS yields a single number representing a composite measure of the overall usability of the system being studied. Scores for individual items are meaningless on their own. SUS scores have a range of 0 to 100, where 100 indicates a more usable system. The questions from SUS are listed below; each is rated on a 5-point scale from 1 (Strongly disagree) to 5 (Strongly agree).
1. I think that I would like to use this application frequently.
2. I found the application unnecessarily complex.
3. I thought the application was easy to use.
4. I think that I would need the support of a technical person to be able to use this application.
5. I found the various functions in this system were well integrated.
6. I thought there was too much inconsistency in this system.
7. I would imagine that most people could learn to use this application very quickly.
8. I found the application very cumbersome to use.
9. I felt confident using the application.
10. I needed to learn a lot of things before I could get going with the application.

In order  to  interpret  the  SUS  scores  we  will  average  the  scores  from  all  users  and  generate  confidence  intervals.   These   numbers   will   then   be   compared   to   the   data   in   Figure   9   (Tullis   and   Albert,   2008).   In   a   comprehensive   evaluation   of   50   studies   that   reported   average   SUS   scores   across   a   total   of   129   conditions   they   found   that   the   average  SUS  score  was  66%  with  a  median  of  69%.  The  25th  percentile  was  57%  and  the  75th  percentile  was  77%.   So  we  will  think  of  an  average  SUS  score  of  under  60%  as  poor  while  one  over  80%  will  be  considered  as  good.  

Figure 9: Average SUS Scores (frequency distribution for 129 conditions from 50 studies; Source: measuringuserexperience.com)


4.3 Intuitiveness
The following details of the INTUI model come from the intuitive interaction website, http://intuitiveinteraction.net/model/:

“The INTUI  model  explores  the  phenomenon  of  intuitive  interaction  from  a  User  Experience  (UX)  perspective.  It   combines   insights   from   psychological   research   on   intuitive   decision   making   and   user   research   in   HCI   as   well   as   insights   from   interview   studies   into   subjective   feelings   related   to   intuitive   interaction.   This   phenomenological   approach   acknowledges   the   multi-­‐dimensional   nature   of   the   concept   and   also   reveals   important   influencing   factors  and  starting  points  for  design.  The  INTUI-­‐model  suggests  four  components  of  intuitive  interaction,  namely,   Gut  Feeling  (G),  Verbalizability  (V),  Effortlessness  (E)  and  Magical  Experience  (X).   Intuitive   interaction   is   typically   experienced   as   being   guided   by   feelings.   It   is   an   unconscious,   non-­‐analytical   process.   This   widely   parallels   what   we   know   from   research   in   psychology   about   intuitive   decision   making   in   general.   For   example,   (Hammond,   1996)   describes   intuition   as   a   "cognitive   process   that   somehow   produces   an   answer,   solution,   or   idea   without   the   use   of   a   conscious,   logically   defensible   step-­‐by-­‐step   process."   In   consequence,  the  result  of  this  process,  i.e.,  the  insight  gained  through  intuition  is  difficult  to  explain  and  cannot   be   justified   by   articulating   logical   steps   behind   the   judgment   process.   Despite   the   complex   mental   processes   underlying   intuitive   decisions,   the   decision   maker   is   not   aware   of   this   complexity,   and   the   process   of   decision   making  is  perceived  as  rather  vague,  uncontrolled  and  guided  by  feelings  rather  than  reason.  Intuition  is  simply   perceived   as   a   "gut   feeling".   This   also   became   visible   in   our   user   studies   and   peoples’   reports   on   intuitive   interaction  with  different  kinds  of  products.  Many  participants  based  their  judgment  on  a  product’s  intuitiveness   on  the  fact  that  they  used  it  without  conscious  thinking  and  just  followed  what  felt  right.   Users   may   not   be   able   to   verbalize   the   single   decisions   and   operating   steps   within   intuitive   interaction.   Researchers  in  the  field  of  intuitive  decision  making  discussed  different  mechanisms  that  could  also  be  relevant   for   users'   decisions   while   interacting   with   technology.   For   example,   (Wickens   et   al,   1998)   argue   that   this   is   because  intuitive  decisions  are  based  on  stored  memory  associations  rather  than  reasoning  per  se.  Another  factor   is   implicit   learning.   (Gigerenzer,   2013)   argues   that   especially   persons   with   high   experience   in   a   specific   subject   make   the   best   decisions   but,   nevertheless,   are   the   most   incapable   when   it   is   about   explaining   their   decisions.   They  apply  a  rule  but  are  unaware  of  the  rule  they  follow.  This  is  because  the  rule  was  never  learnt  explicitly  but   relies   on   implicit   learning   and   this   missing   insight   into   the   process   of   knowledge   acquisition   implies   that   it   is   hardly   memorable   or   verbalizable.   
The aspect of decision making without explicit information also becomes visible in the position by (Westcott, 1968), stating that “intuition can be said to occur when an individual reaches a conclusion on the basis of less explicit information that is ordinarily required to reach that conclusion.” Similarly, (Vaughan, 1979) describes the phenomenon of intuition as “knowing without being able to explain how we know”. (Klein, 1998) rather sees the reasons for missing verbalizability of intuitive decisions in the nature of human decision making per se. He claims that people in general have difficulties with observing themselves and their inner processes and, thus, obviously have troubles with explaining the basis of their judgments and decisions.

Intuitive interaction typically appears as quick and effortless. In our studies, many users emphasized that they handled the product without any strains. Before starting conscious thinking, they had already their goal. This is also mirrored in the descriptions of intuitive decision making in psychology. For example, (Hogarth, 2001) claims that “The essence of intuition or intuitive responses is that they are reached with little apparent effort, and typically without conscious awareness.” In general, intuition produces quick answers and tendencies of action; it allows for the extraction of relevant information without making use of slower, analytical processes. On a neuronal basis, the quick decision process may be explained by the much quicker processing of unconscious processing (Baars, 1988; Clark et al., 1997).

Intuitive interaction is often experienced as magical. In our studies in the field of interactive products, this was reflected by enthusiastic reactions where users emphasized that the interaction was something "special", "extraordinary", "stunning", "amazing", "absolutely surprising" or even "magical". Research in the field of intuitive decision making reveals a number of mechanisms that may add to this impression. First of all, most people are not aware of the cognitive processes and their prior knowledge underlying intuition, so that intuition appears to be a supernatural gift (Cappon, 1994). They are not aware that they acquired that knowledge by themselves rather than receiving it by magic or revelation. And even if one knows about intuitive processing and the role of prior knowledge, it is still not directly perceivable. As (Klein, 1998) argues, the access to previously stored memories usually does not activate single, specific elements but rather refers to sets of similar elements. This aggregated form of knowledge makes one’s own contribution to intuition hard to grasp, and people possibly become not aware of the actual source of their intuition. In the field of interactive products, the experience of magical interaction may further be supported by introducing a new technology or interaction concept, so far not applied in this product domain (e.g., introducing the scroll wheel in the domain of mp3 players).”

The questions from INTUI are listed below in Figure 10.

Figure 10: INTUI questionnaire


The INTUI   survey   will   result   in   metrics   for   Gut   Feeling   (G),   Verbalizability   (V),   Effortlessness   (E)   and   Magical   Experience   (X).   For   each   metric   we   will   report   average   and   confidence   intervals.   We   don’t   have   similar   data   to   that  available  for  SUS  so  we  will  use  these  metrics  to  compare  different  iterations  of  the  INDICATE  tool  against   each  other  and  hope  for  an  increase  in  these  values  from  prototype  to  final  system.    

4.4 Microsoft  Desirability   Traditional   usability   testing   is   an   excellent   way   of   measuring   whether   users   can   complete   tasks   efficiently.   However,   it   has   been   less   successful   in   measuring   intangible   aspects   of   user   experience,   such   as   desirability   to   continue  to  use  a  product.  During  the  post-­‐evaluation  interview,  we  will  integrate  a  measure  based  on  Microsoft’s   Desirability   Toolkit   (Benedek   &   Miner,   accessed   2008),   whereby   users   will   be   presented   with   a   list   of   118   adjectives  (both  positive  and  negative),  presented  on  separate  cue  cards.  Participants  will  be  asked  to  choose  all   adjectives  they  feel  applied  to  their  usage  of  INDICATE.  The  evaluator  will  record  the  choices.  From  this  list,  the   participant   will   be   asked   to   choose   those   5   adjectives   that   most   closely   match   their   personal   reactions   to   the   system.  These  five  adjectives  will  then  be  used  by  the  evaluator  as  the  basis  for  a  guided  interview.  This  type  of   usability   measure   is   a   particularly   good   way   to   detect   usability   problems   as   it   can   potentially   uncover   user   reactions   and   opinions   that   might   not   come   to   light   with   solely   a   questionnaire.   Furthermore,   presenting   users   with   both   positive   and   negative   adjectives   encourages   critical   responses,   which   is   important   when   uncovering   usability   problems.   This   method   of   data   capture   will   be   embedded   as   a   20   minute   workshop   session   after   task   completion.   This   technique   results   in   qualitative   data.   The   most   important   data   from   the   tool   comes   from   the   discussion   with   the   participant   to   determine   their   reaction   to   an   item   and   how   they   apply   that   item   to   the   product   being   evaluated.  

4.5 INDICATE Survey Management and Processing Application
In order to administer the tasks and questionnaires we have developed a survey management and processing framework. This is a Ruby on Rails web application using a MySQL database and an HTML5-compatible user interface. An overview of the application architecture is given in Figure 11 below.

Figure 11: Survey Management and Processing Application (components: Surveys, Schedules, Question List, Questions, Answers, Users and a MySQL database)



A survey consists of a set of scheduled questions that are created and managed using a web-based interface which stores entries in a MySQL relational database. Users are added to the survey and are asked questions at intervals determined by the schedule.
As survey questions are answered, the results are stored in the application database and are processed to produce final scores according to standard questionnaire rules; these can be visualised using the web interface. For example, for the SUS questionnaire we first sum the score contributions for each item. Each item's score contribution ranges from 0 to 4. For items 1, 3, 5, 7 and 9, the score contribution is the scale position minus one. For items 2, 4, 6, 8 and 10, the contribution is 5 minus the scale position. We then multiply the sum of the scores by 2.5 to obtain the overall SUS score.
Innovative features of this application are instant evaluation of survey results, removal of transcription errors and inclusion of task timing in the survey results. The schema for the database is shown in Figure 12 below.

Figure 12:  Database  schema  for  Survey  Management  App  
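For illustration, the SUS scoring rule described above can be expressed in a few lines of Python (a sketch of the standard SUS calculation, not the application's actual implementation):

def sus_score(responses):
    """SUS score from the ten item responses, each on a 1-5 scale.
    Odd-numbered items contribute (response - 1), even-numbered items
    contribute (5 - response); the sum is multiplied by 2.5 to give 0-100."""
    assert len(responses) == 10
    total = sum((r - 1) if i % 2 == 1 else (5 - r)
                for i, r in enumerate(responses, start=1))
    return total * 2.5

# Example: a fairly positive set of responses.
print(sus_score([4, 2, 4, 1, 4, 2, 5, 2, 4, 2]))  # -> 80.0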



5 INTERVIEWS
5.1 Introduction
When each participant has completed the usability test and questionnaires, we will hold a short semi-structured interview to gauge their experience of using the INDICATE tool. We will take a semi-structured approach so that the questioning can probe the user on more interesting issues as they arise. We will follow a top-down approach, starting with a general question regarding the overall task and progressing to more leading questions to encourage the user to elaborate on their responses. Interviews are a good evaluation method for eliciting user preferences and attitudes. They may also reveal problems that were not observed during task completion.
The INDICATE evaluation questions will be planned in advance, with a series of questions centred on an overall evaluation question. Interviews are not a controlled experimental technique. We will, however, ensure that interviews with different participants are as consistent as possible; the evaluator may choose to adapt the questions for different participants to get the most benefit.

5.2 Interview Data Analysis
Each INDICATE participant interview will be audio recorded and transcribed verbatim. The interviews will produce qualitative data for analysis. We will perform thematic analysis on the data, using a grounded theory approach. We will use NVivo to manage the data.
The general approach to the analysis of qualitative data involves four stages:
1. Collect the data, organise and prepare it for analysis.
2. Code or 'describe' the data.
3. Classify or categorise the data into themes.
4. Look for connections in the data, interpret and provide explanation or meaning.

Qualitative   data   analysis   is   more   susceptible   to   bias   than   quantitative   data,   as   people   perform   the   coding.   To   control   the   impact   of   individual   researcher   interpretation,   we   will   employ   a   commonly   used   coding   technique   (emergent   coding),   have   two   researchers   (experienced   in   thematic   analysis   and   coding)   perform   coding   on   transcripts  and  employ  statistical  methods  to  evaluate  the  validity  and  reliability.  This  approach  is  recommended   and   outlined   in   the   textbook   “Research   Methods   in   Human-­‐Computer   Interaction   –   Chapter   11   Analyzing   Qualitative  Data”  and  discussed  in  more  detail  below.  

5.2.1 Analysing Text Content
Coding is the term used to denote analysis of text-based content. Coding involves "interacting with data, making comparisons between data and in doing so, deriving concepts to stand for those data, then developing those concepts in terms of their properties and dimensions" (Corbin and Strauss, 2008; p.66). We will employ emergent coding in the analysis of our data. Two researchers will independently examine a subset of the text-based data from the interview transcripts (specifically, one interview transcript), and each will develop a list of coding categories based on their own interpretation of the data. Both researchers will then compare their lists, examine and discuss the differences and then decide on a list that both agree on. Next, the codes of both coders will be compared and reliability measures computed. If a high reliability score is achieved, both researchers can move on to coding the entire data set. Otherwise, the above process needs to be repeated until a satisfactory reliability score is achieved.


The next step is to identify concepts or categories from the codes. We will use a mixed-methods approach to identifying coding categories, including:
• Examining existing theoretical frameworks. A number of taxonomies have been created in the HCI field to help understand data from usability studies, for example Norman's taxonomy of mistakes and slips that includes categories such as 'description errors', 'data-driven errors', 'mode errors' etc. (2002).
• Researcher-denoted concepts (new concepts that arise that might not be covered in existing taxonomies, for example concepts that might be specific to INDICATE).
We will build a code structure, a hierarchy of concepts with each level representing more detail. This will support comparison of the data. Comparisons will be made within each coding category, between different participants (e.g. experts vs novices) and with the literature.

5.2.2 Ensuring  validity  and  reliability   Given   the   possibility   of   researcher   bias   in   performing   thematic   analysis   of   interview   data,   we   must   ensure   the   analysis   is   valid   and   reliable.   Validity   refers   to   using   well-­‐documented   procedures   to   increase   the   accuracy   of   results.   We   will   follow   the   procedure   outlined   in   Lazar   et   al.,   2012.   Reliability   refers   to   consistency.   If   two   researchers  independently  coding  the  data  come  to  the  same  conclusions,  then  the  analysis  is  considered  reliable.   This  can  be  measured  by  calculating  the  percentage  of  agreement  between  coders.  
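A minimal sketch of the percentage-of-agreement calculation (the coded segments below are hypothetical; the unit of analysis is not fixed in this document):

def percent_agreement(coder_a, coder_b):
    """Proportion of coded segments to which both coders assigned the same code."""
    assert len(coder_a) == len(coder_b)
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return matches / len(coder_a)

# Codes assigned by two researchers to eight transcript segments (hypothetical).
coder_a = ["navigation", "navigation", "data", "terminology", "navigation", "data", "terminology", "navigation"]
coder_b = ["navigation", "data", "data", "terminology", "navigation", "data", "terminology", "navigation"]
print(f"Agreement: {percent_agreement(coder_a, coder_b):.0%}")  # -> 88%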

5.2.3 Interview  analysis  report   The   output   of   this   piece   of   evaluation   will   be   a   report   outlining   the   categories   of   findings,   a   description   of   the   relationships  amongst  the  data  and  a  reliability  measure.    



6 BENCHMARKING   6.1  Introduction   In  this  chapter  we  will  detail  the  benchmarking  of  the  INDICATE  tool.  We  will  approach  this  in  two   ways.  First  we   will  use  real  time  energy  usage  data  gathered  in  the  Living  Lab  at  DKIT  to  assess  how  accurate  the  predictions  of   the   INDICATE   tool   are.   Secondly,   while   conducting   the   interview   with   the   users   we   will   ask   them   how   the   INDICATE  tool  compares  to  existing  tools  that  they  already  use.  

6.2 Predicted versus Real
The first stage in benchmarking the INDICATE tool will be to compare its predictions for energy consumption against real time energy usage data gathered in the Living Lab at DKIT.
The energy monitoring in the Living Lab, based at DKIT in Dundalk, consists of real time monitoring of 16 apartments in Great Northern Haven, 10 council houses in Muirhevena Mór, a local school, O'Fiaich College, and the DKIT campus. All of these sources, other than DKIT campus, have monitoring installed which allows real time energy usage data to be sent to a cloud based data aggregation service which gathers the data and stores it in a database in DKIT. The data in this database is then processed automatically to extract daily and hourly energy usage information.


Figure 13: Monitoring kit in Muirhevena Mór homes showing, clockwise from top left: gas meter, temperature and humidity sensor, comms kit and electricity sensors


Figure 13 above shows some of the hardware used to collect energy usage data, including gas, electricity, temperature and humidity, at one of the test sites.
All of the above systems other than DKIT campus send real time energy usage data to a cloud-based energy data aggregator (Figure 14). Data from DKIT campus is currently entered into the system manually, but this is something that we are actively looking to automate. The aggregator is essentially a database where the usage data is stored, along with a number of scripts that process the data and extract hourly and daily usage data. Again, this processed data is stored in a database.


Figure 14:  Community  Energy  Data  Store  

This  data  can  be  graphed  and  presented  as  in  Figure  15  below.  We  will  use  this  data  to  test  the  predictions  and   algorithms  developed  in  the  INDICATE  tool.    

Figure 15:  O’Fiaich  College  solar  PV  production  for  2012  
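The deliverable does not prescribe a particular error measure for the predicted-versus-real comparison; as one possible illustration (an assumption on our part), hourly predictions could be compared with the monitored data using a mean absolute percentage error:

def mape(predicted, measured):
    """Mean absolute percentage error between predicted and measured
    hourly energy use (kWh); zero-valued measurements are skipped."""
    pairs = [(p, m) for p, m in zip(predicted, measured) if m != 0]
    return sum(abs(p - m) / m for p, m in pairs) / len(pairs)

# Hypothetical hourly kWh values for one monitored building.
predicted = [1.2, 1.4, 1.1, 0.9]
measured = [1.0, 1.5, 1.2, 1.0]
print(f"MAPE: {mape(predicted, measured):.1%}")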

6.3 INDICATE versus Other Tools
We will benchmark the INDICATE tool against other tools by asking the users at interview about their experiences of using both and which they would prefer for specific tasks. We will ask them to compare the tools along a range of metrics including speed, user interface, data requirements and portability.



7 ORGANIZATION OF EVALUATION AND BENCHMARKING ACTIVITIES
The evaluation and benchmarking activities will be organized as four sequential stages:
1. Pilot of all tasks, questionnaires and interviews with the prototype GUI. Inputs: D5.1, due in month 24, will deliver the prototype GUI; D4.2, due in month 19, will deliver the DSM model; and D4.3, due in month 24, will deliver the first prototype of the VCM.
2. Heuristic evaluation of the prototype GUI.
3. Formative assessment of the tool using user testing, questionnaires and interviews. The heuristic evaluation and formative assessment will result in D6.2 INDICATE Functional Testing, Usability and Performance Evaluation, due in month 25. This will give a list of improvements needed for the final tool.
4. Summative assessment of the final INDICATE tool. Inputs: D5.2, due in month 34, will deliver the final GUI; D4.4, due in month 33, the final VCM; D3.4, due in month 27, will deliver the CCI; and D3.3, due in month 33, will deliver the Sustainable Urban Indicators. The summative assessment will result in D6.3 and D6.4, the final evaluations of the tool, due in month 35.



8 CONCLUSIONS   In  this  document  we  have  presented  a  pragmatic  evaluation  methodology  and  benchmarking  framework  for  the   INDICATE  tool.  At  the  heart  of  the  evaluation  and  benchmarking  are  the  users  and  their  expectations  of  the  tool.   Users   will   be   classified   based   on   their   experience   using   GIS   tools   as   well   as   their   domain   expertise.   This   will   allow   the   identification   of   groups   within   the   user   base   and   whether   the   INDICATE   tool   is   more   or   less   useful   for   different  user  groups.   We  have  identified  the  metrics  that  will  be  used  in  the  evaluation  and  how  these  will  be  gathered  and  analysed.  A   heuristic   evaluation   will   be   carried   out   by   an   expert,   in   order   to   catch   problems   with   the   interface   before   it   is   presented  to  users.  The  results  of  the  heuristic  evaluation  will  be  fed  back  to  the  developers  so  that  the  prototype   tool  can  correct  problems  highlighted.  Then  we  will  carry  out  a  formative  user  test  in  order  to  further  assess  the   tool.  We  will  use  real  world  tasks,  based  on  users’  expectations  and  aim  for  a  broad  task  base  that  captures  the   full  functionality  of  the  tool.  Then  by  assessing  these  tasks  with  a  limited  number  of  users  we  hope  to  get  very   good  quality  feedback  with  a  manageable  workload  for  those  carrying  out  the  assessments.     We   will   present   the   metrics   both   individually   and   also   combine   all   of   the   metrics   together   to   create   an   overall   usability   measure   for   the   project.   This   will   allow   developers   to   see   in   detail   where   the   problems   are   with   the   project  but  will  also  give  project  managers  an  overall  metric  for  measuring  the  progress  of  the  project.   In   the   assessment   of   the   final   tool   we   will   use   more   test   users   to   help   us   find   less   common   problems   with   the   tool.   As   the   tool   progresses   from   prototype   to   final   version   we   will   need   more   users   to   find   subtle   issues   that   remain.   We  have  developed  a  custom  survey  tool  that  can  be  used  to  administer  all  of  the  tasks  and  questionnaires  in  the   user  evaluation.  This  tool  allows  the  customization  of  the  questions  that  are  asked  and  stores  the  results  directly   into   a   database.   We   can   further   develop   the   tool   to   automate   the   processing   of   the   results   of   the   qualitative   survey  data.   This   is   still   a   working   document   and   once   a   prototype   of   the   software   is   available   we   will   refine   the   tasks   and   goals  for  those  tasks.  

 



REFERENCES
Baars, B. J. (1988). A Cognitive Theory of Consciousness. Cambridge: Cambridge University Press.
Benedek, J. and Miner, T. (accessed 2008). Measuring Desirability: New Methods for Evaluating Desirability in a Usability Lab Setting. Available at: www.microsoft.com/usability/uepostings/desirabilitytoolkit.doc
Bias, R. (1994). The Pluralistic Usability Walkthrough: Coordinated Empathies. In Usability Inspection Methods, J. Nielsen and R. Mack (Eds.), Wiley, 63-76.
Cappon, D. (1994). A new Approach to Intuition. Omni, 16(1), 34-38.
Clark, A. and Boden, M. A. (1997). Being there: putting brain, body, and world together again. Cambridge, MA: MIT Press.
Corbin, J. and Strauss, A. (2008). Basics of qualitative research, 3rd edition. Los Angeles, California: Sage Publications.
Gigerenzer, G. (2013). Interview. HaysWorld Magazine, 1/2013.
Hammond, K. R. (1996). Human judgment and social policy: Irreducible uncertainty, inevitable error, unavoidable injustice. New York, USA: Oxford University Press.
Hartson, H.R., Andre, T.S. and Williges, R.C. (2003). Criteria for Evaluating Usability Evaluation Methods. International Journal of Human-Computer Interaction, 15(1), 145-181.
Hogarth, R. M. (2001). Educating intuition. Chicago: University of Chicago Press.
Klein, G. (1998). Sources of Power: How People Make Decisions. Cambridge, MA: MIT Press.
Lazar, J., Feng, J.H. and Hochheiser, H. (2012). Research Methods in Human-Computer Interaction. Wiley and Sons Ltd.
Lewis, J. R. (1994). Sample sizes for usability studies: Additional considerations. Human Factors, 36, 368-378.
Liljegren, E. (2006). Usability in a Medical Technology Context: Assessment of Methods for Usability Evaluation of Medical Equipment. International Journal of Industrial Ergonomics, 36(4), 345-352.
Nielsen, J. (2000). Why You Only Need to Test with 5 Users. Alertbox, http://www.useit.com/alertbox/20000319.html

Nielsen, J. (1994). Heuristic evaluation. In Nielsen, J. and Mack, R.L. (Eds.), Usability Inspection Methods, John Wiley and Sons, New York, NY.
Nielsen, J. (1994b). Heuristic Evaluation. In Usability Inspection Methods, J. Nielsen and R. Mack (Eds.), Wiley, 25-62.
Nielsen, J. (1993). Usability Engineering. Academic Press, Boston, ISBN 0-12-518405-0 (hardcover), 0-12-518406-9 (softcover).
Nielsen, J. and Landauer, T. K. (1993). A mathematical model of the finding of usability problems. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (pp. 206-213). Amsterdam: ACM.


Nielsen, J. and Mack, R.L. (1994). Usability Inspection Methods, John Wiley and Sons, New York, NY.
Norman, D. (2002). The Design of Everyday Things. New York: Basic Books.
Shneiderman, B. and Plaisant, C. (2005). Designing the User Interface, 4th edition: Strategies for Effective Human-Computer Interaction, Addison-Wesley.
Spool, J. and Schroeder, W. (2001). Testing Websites: Five Users is Nowhere Near Enough. In CHI '01 Extended Abstracts, ACM, 285-286.
Travis, D. (2009). How to Prioritise Usability Problems. Available at http://www.userfocus.co.uk/articles/prioritise.html

Tullis, T. and Albert, B. (2008). Measuring the User Experience. Morgan Kaufmann Series in Interactive Technologies.
Vaughan, F. E. (1979). Awakening Intuition. Garden City, USA: Anchor Press.
Virzi, R. A. (1992). Refining the test phase of usability evaluation: How many subjects is enough? Human Factors, 34, 457-471.
Westcott, M. R. (1968). Toward a Contemporary Psychology of Intuition: A Historical, Theoretical, and Empirical Inquiry. New York, USA: Holt, Rinehart and Winston.
Wharton, C., Bradford, J., Jeffries, R. and Franzke, M. (1992). Applying Cognitive Walkthroughs to More Complex User Interfaces: Experience, Issues and Recommendations. In CHI '92, ACM Press, 381-388.
Wickens, C.D., Gordon, S.E. and Liu, Y. (1998). An Introduction to Human Factors Engineering. New York, USA: Addison-Wesley Educational Publishers Inc.
Woolrych, A. and Cockton, G. (2001). Why and When Five Test Users aren't Enough. In IHM-HCI, 105-108.

