Polish-British Workshops Computer Systems Engineering Theory & Applications


POLISH-BRITISH WORKSHOPS

COMPUTER SYSTEMS ENGINEERING THEORY & APPLICATIONS

Editors: Keith J. BURNHAM Leszek KOSZALKA Radoslaw RUDEK Piotr SKWORCOW Organised jointly by: Control Theory and Applications Centre, Coventry University, UK Department of Systems and Computer Networks, Wroclaw University of Technology, Wroclaw, Poland with support from The IET Control and Automation Professional Network The Institute of Measurement and Control


Reviewers: Keith J. BURNHAM Grzegorz CHMAJ Andrzej KASPRZAK Leszek KOSZALKA Tomasz LARKOWSKI Iwona POZNIAK-KOSZALKA Radoslaw RUDEK Przemyslaw RYBA Henry SELVARAJ Dragan SIMIC Piotr SKWORCOW Ventzeslav VALEV Krzysztof WALKOWIAK Michał WOŹNIAK

Cover page designer Aleksandra de’Ville

Typesetting: Camera-ready by authors
Printed by: Drukarnia Oficyny Wydawniczej Politechniki Wrocławskiej, Wrocław 2014
Wybrzeze Wyspianskiego 27, 50-370 Wroclaw, Poland

ISBN 978-83-933924-0-7


POLISH-BRITISH WORKSHOP was held in:
Szklarska Poreba, Poland, June 2010
Jugow, Poland, June 2011
Zlotniki Lubanskie, Poland, May 2012
Srebrna Gora, Poland, June 2013

International Steering Committee

2010: Keith J. Burnham, Iwona Pozniak-Koszalka, Leszek Koszalka, Andrzej Kasprzak, Tomasz Larkowski, Pawel Podsiadlo, Radu-Emil Precup, Radoslaw Rudek, Henry Selvaraj, Piotr Skworcow, Gwidon Stachowiak, Ventzeslav Valev, Dawid Zydek

2011: Keith J. Burnham, Henry Selvaraj, Iwona Pozniak-Koszalka, Leszek Koszalka, Emilio Corchado, Karol Gega, Andrzej Kasprzak, Tomasz Larkowski, Pawel Podsiadlo, Radoslaw Rudek, Piotr Skworcow, Gwidon Stachowiak, Ventzeslav Valev, Michal Wozniak, Dawid Zydek

2012: Keith J. Burnham, Leszek Koszalka, Iwona Pozniak-Koszalka, Henry Selvaraj, Grzegorz Chmaj, Emilio Corchado, Karol Gega, Andrzej Kasprzak, Tomasz Larkowski, Pawel Podsiadlo, Radoslaw Rudek, Dragan Simic, Piotr Skworcow, Gwidon Stachowiak, Ventzeslav Valev, Michal Wozniak, Marina Yashina, Jan Zarzycki, Dawid Zydek

2013: Keith J. Burnham, Leszek Koszalka, Iwona Pozniak-Koszalka, Henry Selvaraj, Grzegorz Chmaj, Emilio Corchado, Karol Gega, Andrzej Kasprzak, Tomasz Larkowski, Lars Lundberg, Mariusz Nowostawski, Pawel Podsiadlo, Radoslaw Rudek, Przemyslaw Ryba, Dragan Simic, Piotr Skworcow, Vaclav Snasel, Gwidon Stachowiak, Ventzeslav Valev, Krzysztof Walkowiak, Michal Wozniak, Marina Yashina, Jan Zarzycki, Dawid Zydek

Local Organizing Committee 2010 Wojciech KMIECIK Tomasz KUCOFAJ Bartosz DOLATA Michal PLESZKUN

2011 Michal HANS Wojciech KMIECIK Miroslawa GOZDOWSKA Tomasz KUCOFAJ

2012 Wojciech KMIECIK Piotr FRANZ Tomasz KUCOFAJ Miroslawa GOZDOWSKA

2013 Wojciech KMIECIK Tomasz KUCOFAJ Roza GOSCIEN Miroslawa GOZDOWSKA Michal AIBIN Piotr CAL Lukasz GADEK

Conference Proceedings Editors Keith J. BURNHAM – editor Leszek KOSZALKA – editor Piotr SKWORCOW – editor Radoslaw RUDEK – editor Iwona POZNIAK-KOSZALKA – co-editor


IET Control & Automation Professional Network The IET Control and Automation Professional Network is a global network run by, and on behalf of, professionals in the control and automation sector with the specialist and technical backup of the IET. Primarily concerned with the design, implementation, construction, development, analysis and understanding of control and automation systems, the network provides a resource for everyone involved in this area and facilitates the exchange of information on a global scale. This is achieved by undertaking a range of activities including a website with a range of services such as an events calendar and searchable online library, face-to-face networking at events and working relationships with other organisations. For more information on the network and how to join visit http://www.theiet.org/.


Preface

It is with great pleasure that we as Editors write the preface of the Proceedings for the tenth, eleventh, twelfth and thirteenth Polish-British Workshops and the first International Students Workshop on Computer Systems Engineering: Theory and Applications. The Polish-British Workshops have been organized jointly by the Department of Systems and Computer Networks, Wroclaw University of Technology, Wroclaw, Poland and the Control Theory and Applications Centre, Coventry University, Coventry, UK, and have become a traditional and integral part of the long-lasting collaboration between Wroclaw University of Technology and Coventry University, with the Workshops taking place every year since 2001. Over the years we have witnessed a steady growth of the Workshops, both in terms of participant numbers and the diversity of the represented institutions from all over the world. This was reflected by extending the name of the Workshops from the 2013 edition onwards to International Students Workshop. The Workshops bring together young researchers from different backgrounds and at different stages of their career, including undergraduate and MSc students, PhD students and post-doctoral researchers. It is a truly fantastic and quite unique opportunity for early-stage researchers to share their ideas and learn from the experience of others, to become inspired by the work carried out by their elder colleagues and to receive valuable feedback concerning their work from accomplished researchers, all in a pleasant and friendly environment surrounded by the picturesque mountains of Lower Silesia. The Workshops covered by this book took place in Szklarska Poreba (2010), Jugow (2011), Zlotniki Lubanskie (2012) and Srebrna Gora (2013) and, as usual, the theme was focused on solving complex scientific and engineering problems in a wide area encompassing computer science, control engineering, information and communication technologies and operational research. A number of papers were presented by young researchers and engineers; however, only the best papers were chosen for publication in this book. The problems addressed and the solutions proposed in the papers presented at the Workshops and included in the Proceedings are closely linked to the issues currently faced by society, such as efficient utilisation of energy and resources,


design and operation of communication networks, modelling and control of complex dynamical systems and handling the complexity of information. We hope that these Proceedings will be of value to those researching in the relevant areas and that the material will inspire prospective researchers to become interested in seeking solutions to complex scientific and engineering problems. We realise that none of this would be possible without the continued efforts and commitment of the Polish-British Workshop founders: Dr Iwona Poźniak-Koszałka, Dr Leszek Koszałka and Prof. Keith J. Burnham. On behalf of all researchers who have attended the Polish-British Workshop series, including ourselves, we would like to express our sincere gratitude for making the Workshop series such a tremendous success, for sharing with others their extensive knowledge and experience, and for providing valuable guidance to young researchers at this crucial stage of their careers.

Dr Piotr Skworcow, Water Software Systems, De Montfort University, Leicester, UK and Dr Radosław Rudek, Department of Information Technology, Wrocław University of Economics, Poland Editors of the Proceedings and Members of the International Steering Committees for the Polish-British Workshops 2010, 2011, 2012 and 2013, and the International Students Workshop 2013.


Contents

Rafał SEKUŁA
SELF-ADAPTIVE EVOLUTIONARY ALGORITHM IN NETWORK DESIGN PROBLEM ..... 10

Jacek BORKOWSKI, Andrzej KASPRZAK
ALGORITHMS FOR BASE STATION LOCATION IN WIRELESS NETWORKS WITH COST MINIMIZATION ..... 22

Michał HAŁACZKIEWICZ, Piotr FRANZ, Leszek KOSZAŁKA
OPTIMIZED FIRST FIT ALGORITHM COMBINED WITH SIMPLE SCHEDULING FOR SOLVING STATIC ALLOCATION PROBLEM IN MESH NETWORKS ..... 35

Daniel PALUSZCZYSZYN, Piotr SKWORCOW, Bogumił ULANICKI
IMPROVING NUMERICAL EFFICIENCY OF WATER NETWORK MODELS REDUCTION ALGORITHM ..... 46

Grzegorz BOROWIEC, Aleksandra POSTAWKA, Leszek KOSZAŁKA
STATIC TASK ALLOCATION ALGORITHMS AND INFLUENCE OF ARCHITECTURE ON MESH STRUCTURED NETWORKS ..... 66

Łukasz GĄDEK, Keith J. BURNHAM, Iwona POŹNIAK-KOSZAŁKA
CONTROLLING A NON-LINEAR SYSTEM BILINEAR APPROACH VS INNOVATIVE HEURISTIC LINEARISATION METHOD ..... 81

Róża GOŚCIEŃ
NEW APPROACHES FOR CFA AND TCFA PROBLEMS IN WAN ..... 95

Małgorzata MICHAŁUSZKO, Dawid POLAK, Piotr RATAJCZAK, Iwona POŹNIAK-KOSZAŁKA
LINE PLANNING PROBLEM – SOLVING USING EVOLUTIONARY APPROACH ..... 106

Anna STRZELECKA, Piotr SKWORCOW, Bogumil ULANICKI
MODELLING OF UTILITY-SERVICE PROVISION FOR SUSTAINABLE COMMUNITIES ..... 116

Grzegorz CHMAJ
HEURISTIC ALGORITHM FOR CLIENT-SERVER DISTRIBUTED COMPUTING ..... 129

Mariusz HUDZIAK
COMPARING DIFFERENT APPROACHES OF FINDING AN OPTIMAL PATH IN CONTINUOUS AREAS WITH OBSTACLES ..... 137

Justyna KULIŃSKA
COMPARING EFFECTIVENESS OF CLASSIFIERS IN PROBLEM OF BODY GESTURE RECOGNITION ..... 147

Aleksandra OŚWIECIŃSKA
SIGN ORIENTATION PROBLEM IN TRAFFIC SIGN RECOGNITION SYSTEM ..... 167

Roberto Vega RUIZ, Hector QUINTIAN, Vicente VERA, Emilio CORCHADO
INTELLIGENT MULTIPLATFORM APP FOR AN INDUSTRIAL PROCESS OPTIMIZATION ..... 175


Computer Systems Engineering 2010

Keywords: capacity and flow allocation; cost problem; evolutionary computation;

Rafał SEKUŁA*

SELF-ADAPTIVE EVOLUTIONARY ALGORITHM IN NETWORK DESIGN PROBLEM

The paper concerns the capacity and flow assignment problem with the objective of obtaining the widest bottleneck. A self-adaptive evolutionary algorithm was used to achieve this goal. The algorithm is two-level: first the flow assignment problem is solved, and then, based on its solution, the capacity assignment problem. The objective of the paper was to find the dependency between the total cost of building the network and the highest bottleneck value. The results were obtained using an experimentation system implemented in the Java environment with JGAP, a dedicated package for evolutionary computation.

1. INTRODUCTION

The capacity and flow assignment (or network design) problem is probably the most common problem in teleinformatics. The sizes of computer networks keep increasing [1], and, as a natural course of events, so do the costs of these networks. There is a need for more advanced algorithms which can deal with the optimization of large networks. Classical methods such as linear or nonlinear programming are not sufficient, because they become too time consuming for networks with more than 20 nodes [2]. To deal with this situation, metaheuristics [3] such as evolutionary algorithms can be applied.
The aim of this paper is to find how a change of the total cost affects the throughput of a network [4], [5]. This is particularly important to people who are investing in the extension of existing networks. The theoretical considerations can be confirmed by the results of experiments. The relation seems to be polynomial with a degree close to the average number of neighbours. To obtain the results, a kind of evolutionary computation is used. The algorithm consists of two-level evolutionary algorithms using self-adaptation. This means that the parameters of the algorithms are also obtained in an artificial-selection manner – they depend on the fitness of previous solutions.
__________

* Department of Systems and Computer Networks, Wrocław University of Technology, Poland.

10


The two-level architecture forces the use of parallel computation: each individual is related to a process solving a subproblem. To carry out the investigations, an original problem formulation was developed and a Java application was built.
The paper is organized as follows. Section 2 states the terms used and formulates the problem. Section 3 describes the evolutionary algorithm used in the experiments. Section 4 contains the specification of the experimentation system. Section 5 presents the results of some investigations. Finally, Section 6 contains conclusions.

2. PROBLEM FORMULATION

Before formulating the problem, several terms need to be explained for an accurate understanding of the mathematical formulas. A demand is a volume of unicast traffic which is transported from a source node to a destination node. A full description of a demand includes the source node s, the destination node d and the volume of the demand hd. A path is a sequence of nodes such that from each of its nodes there is an edge to the next node in the sequence; we consider paths without loops. Terminal nodes are a pair of start and end nodes – they correspond to the source and destination nodes. For each demand there exists a set of candidate paths which connect the source and the destination. The total cost is the amount of money available for building the network. The capacity of an edge is the maximal sum of all flows assigned to that edge. The bottleneck of an edge is the difference between its capacity and the sum of the allocated flows. The widest bottleneck is the highest value of the bottleneck in a network built for a given total cost.
We consider our problem as a two-level optimization problem, divided into a capacity assignment problem and a flow allocation problem [2]. The problem formulation is original and uses the notation from [2]. The flow of information between the two modules is shown in Fig. 1.

11


Fig. 1. Data flow diagram

Capacity allocation problem (widest bottleneck) formulation

indices
  $d = 1, 2, \ldots, D$ — demands
  $p = 1, 2, \ldots, P_d$ — candidate paths for demand $d$
  $e = 1, 2, \ldots, E$ — network links

constants
  $h_d$ — volume of unicast demand $d$
  $\xi_e$ — cost of unit link capacity for link $e$
  $b$ — network total cost

variables
  $y_e$ — capacity of link $e$

objective
$$W = \max_{y} B \qquad (1)$$

subject to
$$B = s(y, h, p), \qquad y_e \in \mathbb{R} \qquad (2)$$
$$\sum_{e} \xi_e\, y_e \le b \qquad (3)$$

12


Flow allocation problem (network bottleneck) formulation

indices
  $d = 1, 2, \ldots, D$ — demands
  $p = 1, 2, \ldots, P_d$ — candidate paths for demand $d$
  $e = 1, 2, \ldots, E$ — network links

constants
  $\delta_{edp} = 1$ if link $e$ belongs to path $p$ realizing demand $d$; 0 otherwise
  $h_d$ — volume of unicast demand $d$
  $c_e$ — capacity of link $e$

variables
  $x_{dp}$ — variable corresponding to the flow allocated to path $p$ for demand $d$

objective
$$B = \min_{e} \Big( c_e - \sum_{d} \sum_{p} \delta_{edp}\, x_{dp}\, h_d \Big) \qquad (4)$$

subject to
$$\sum_{p} x_{dp} = 1, \qquad d = 1, 2, \ldots, D \qquad (5)$$
$$\sum_{d} \sum_{p} \delta_{edp}\, x_{dp}\, h_d \le c_e, \qquad e = 1, 2, \ldots, E \qquad (6)$$

3. EVOLUTIONARY ALGORITHM

The problem is solved using a self-adaptive evolutionary algorithm. Several terms related to evolutionary computation, and to evolution in general, need to be explained first. An evolutionary algorithm can be described in general as shown in Fig. 2; the evolution procedure varies depending on the algorithm class. An individual is a single organism (in natural processes) or object (in artificial processes). In evolutionary computation, individuals are vectors (called chromosomes) which contain genetic information as real values [6], [7]. A population is a set of individuals. With these definitions in place, we can introduce the evolutionary algorithm procedures shown in Fig. 3.

13


Fig. 2. Evolution diagram

Selection is the process of rejecting individuals which do not fit the environment. Reproduction is the process of generating new individuals. Evaluation is the computation of the fitness function of particular individuals to determine how well they fit the environment; the evaluation procedure is shown in Fig. 4. Roulette selection (proportional selection) is a selection scheme in which the probability of selecting an individual is proportional to the value of its fitness function [8]; the roulette selection procedure is shown in Fig. 5. Crossover is the equivalent of sexual reproduction in natural systems: it is a process where information from two individuals is combined to obtain offspring [9]. Mutation is the equivalent of asexual reproduction in natural systems: it is a process where some information in one individual is changed at random. Parameters are constants characterizing the evolution process; the standard parameters are population size, crossover probability and mutation probability. Parameter tuning consists in finding, before the evolution starts, such evolution parameters that give the best possible convergence of the population.
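A minimal C# sketch of the roulette (proportional) selection described above is given below. It is only illustrative (the authors' implementation used Java with JGAP); the Individual type and its fields are assumptions introduced here for the example.

using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical individual holding a precomputed fitness value.
public class Individual
{
    public double Fitness;    // non-negative; 0 for infeasible solutions
    public double[] Genes;    // feature vector [muC, muM, c1, ..., cE]
}

public static class RouletteSelection
{
    // Spins the roulette wheel once: the chance of picking an individual is
    // proportional to its fitness, so infeasible (fitness 0) individuals are
    // never selected.
    public static Individual Select(IList<Individual> population, Random rng)
    {
        double total = population.Sum(ind => ind.Fitness);
        if (total <= 0.0)                          // no feasible individual yet
            return population[rng.Next(population.Count)];

        double threshold = rng.NextDouble() * total;
        double cumulative = 0.0;
        foreach (var ind in population)
        {
            cumulative += ind.Fitness;
            if (cumulative >= threshold)
                return ind;
        }
        return population[population.Count - 1];   // numerical safety net
    }
}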

14


Fig. 3. One generation diagram

Parameter control [10] is similar to parameter tuning, but instead of being tuned beforehand, the parameters change over time; the change is programmed and occurs without additional information from the output. Parameter adaptation is a change of parameters which depends on the population fitness history, so it uses feedback to obtain new evolution parameters. Parameter self-adaptation [11], [12], [13], [14] is the application of evolutionary algorithms to finding the evolution parameters themselves: the evolution parameters become part of the feature vector of the individual. Here, an individual is an instance of the capacity and flow assignment problem. It is a feature vector of the form $[\mu_c, \mu_m, c_1, c_2, \ldots, c_E]$, where $\mu_c$ is the probability of crossover, $\mu_m$ is the probability of mutation and $c_e$ is the capacity of edge $e$. The population size is a fixed parameter.

15


Fig. 4. Evaluation diagram

The fitness function is the value of the bottleneck divided by the capacity cost, and it is equal to 0 when the capacity cost exceeds the total cost. When the capacities do not provide a feasible solution, the bottleneck is equal to 0, so in effect non-feasible solutions have no chance of being selected in the roulette.

$$f = \begin{cases} 0, & \text{when } \sum_{e} \xi_e\, y_e > b \\[2mm] \dfrac{B}{\sum_{e} \xi_e\, y_e}, & \text{otherwise} \end{cases} \qquad (7)$$
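As an illustration, rule (7) can be written as a small C# helper; this is a hedged sketch (the method and parameter names are placeholders, and the bottleneck B is assumed to come from the inner flow-allocation algorithm), not the authors' Java/JGAP code.

public static class FitnessFunction
{
    // Fitness of a capacity vector y according to Eq. (7): 0 when the
    // capacity cost exceeds the budget b, otherwise the widest bottleneck B
    // divided by the actual capacity cost.
    public static double Evaluate(double[] y, double[] unitCost,
                                  double totalCost, double bottleneck)
    {
        double cost = 0.0;
        for (int e = 0; e < y.Length; e++)
            cost += unitCost[e] * y[e];

        if (cost > totalCost)
            return 0.0;             // infeasible: never wins the roulette

        return bottleneck / cost;   // reward wide bottlenecks obtained cheaply
    }
}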

Crossover consists of generating a weighted-average coefficient and selecting two individuals to obtain an offspring. The coefficient $z$ is drawn from the uniform distribution on $[0,1]$ and the offspring $o$ is the weighted average of its parents $p_1$ and $p_2$; this preserves cost feasibility.

$$z = U[0, 1] \qquad (8)$$
$$o = z\, p_1 + (1 - z)\, p_2 \qquad (9)$$

16
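A minimal sketch of this arithmetic crossover (Eqs. (8)–(9)) is shown below. It assumes the feature-vector layout [µc, µm, c1, …, cE] described earlier and is only illustrative of the operator, not the authors' implementation.

using System;

public static class GeneticOperators
{
    // Arithmetic (weighted-average) crossover, Eqs. (8)-(9): the offspring is
    // a convex combination of the parents' feature vectors, so if both parents
    // respect the cost budget, the offspring does as well.
    public static double[] Crossover(double[] parent1, double[] parent2, Random rng)
    {
        double z = rng.NextDouble();                               // z ~ U[0,1]
        var offspring = new double[parent1.Length];
        for (int k = 0; k < parent1.Length; k++)
            offspring[k] = z * parent1[k] + (1.0 - z) * parent2[k];
        return offspring;
    }
}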


Fig. 5. Roulette selection diagram

Mutation consists of adding a random amount of capacity to one feature and subtracting the same value from another feature of the feature vector; this preserves cost feasibility. The added value m is drawn from a normal distribution whose dispersion depends on the capacity value:

$$m = N(0,\ 0.05\,c) \qquad (10)$$

It is almost impossible to reach a negative capacity because the dispersion is small. Each individual is related to another evolution process, which solves the flow assignment problem to obtain the highest possible bottleneck for the capacities in the feature vector of that individual.

4. EXPERIMENTATION SYSTEM

The experimentation system is designed to obtain the dependency between the total cost and the bottleneck. The block diagram of the system is shown in Fig. 6. The algorithm uses threads, each representing one instance of the problem, which is itself an evolutionary algorithm.
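The threaded, two-level structure mentioned above can be sketched as follows. This is an illustrative C# outline reusing the Individual and FitnessFunction helpers sketched earlier (the authors' system was written in Java with JGAP); solveFlowAllocation stands for the inner evolutionary algorithm that returns the widest bottleneck B for a fixed capacity vector.

using System;
using System.Threading.Tasks;

public static class TwoLevelEvaluation
{
    // Evaluates the whole population in parallel: each individual (a capacity
    // vector plus its own operator probabilities) is handled by its own thread,
    // which runs the inner flow-allocation EA and converts the resulting
    // bottleneck into a fitness value via Eq. (7).
    public static void Evaluate(Individual[] population,
                                double[] unitCost, double totalCost,
                                Func<double[], double> solveFlowAllocation)
    {
        Parallel.ForEach(population, ind =>
        {
            // Genes are laid out as [muC, muM, c1, ..., cE]; skip the two probabilities.
            var capacities = new double[ind.Genes.Length - 2];
            Array.Copy(ind.Genes, 2, capacities, 0, capacities.Length);

            double bottleneck = solveFlowAllocation(capacities);   // inner EA, supplied by the caller
            ind.Fitness = FitnessFunction.Evaluate(capacities, unitCost, totalCost, bottleneck);
        });
    }
}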

17


Fig. 6. Experiment diagram

The GUI application was built using JGAP [15]. The experimentation system was built upon [16].

5. EXAMPLES

Before describing the results of the experiments, we concentrate on theoretical considerations. We assume a network with $N$ nodes and $E$ edges. The maximal number of edges is equal to

$$E_{max} = \frac{1}{2} N (N - 1).$$

Every edge has a capacity which we can increase using some of the total cost. When the solution of the stated problem is not feasible, we assume the bottleneck is equal to 0; otherwise it is positive. As we increase the total cost, the bottleneck becomes positive and from a certain moment it grows linearly; we state that up to that moment the relationship is close to polynomial. To obtain the model, nonlinear regression was applied. We assume that the dependency has the following form:

$$f(x) = a\, x^{n} \qquad (11)$$

where $x$ is the total cost and $a$, $n$ are real constants. Additionally, we compare the polynomial degree with the network degree:

$$d = \frac{\log N}{\log E} \qquad (12)$$
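A simple way to fit the model (11), consistent with the nonlinear-regression step mentioned above, is ordinary least squares in log–log space. The short C# sketch below is an assumption about how such a fit can be carried out, not a description of the authors' tooling.

using System;
using System.Linq;

public static class PowerLawFit
{
    // Fits f(x) = a * x^n by linear least squares on (log x, log f),
    // using only points with a positive cost and a positive bottleneck.
    public static (double a, double n) Fit(double[] x, double[] f)
    {
        var pts = x.Zip(f, (xi, fi) => (xi, fi))
                   .Where(p => p.xi > 0 && p.fi > 0)
                   .Select(p => (lx: Math.Log(p.xi), lf: Math.Log(p.fi)))
                   .ToArray();

        double mx = pts.Average(p => p.lx);
        double my = pts.Average(p => p.lf);
        double n = pts.Sum(p => (p.lx - mx) * (p.lf - my))
                 / pts.Sum(p => (p.lx - mx) * (p.lx - mx));
        double a = Math.Exp(my - n * mx);
        return (a, n);
    }
}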


18


Example 1: Experimentation system parameters are presented in Table 1 and Table 2.

Table 1. Example 1 – problem parameters

  Nodes   Edges   Demands   Unit cost
  16      56      28        1

Table 2. Example 1 – algorithm parameters

  Crossover prob.   Mutation prob.   Population
  0.1               0.01             100

Example 2: Experimentation system parameters are presented in Table 3 and Table 4.

Table 3. Example 2 – problem parameters

  Nodes   Edges   Demands   Unit cost
  18      82      41        1

Table 4. Example 2 – algorithm parameters

  Crossover prob.   Mutation prob.   Population
  0.1               0.01             100


Dependency between total cost and widest bottleneck is shown in Fig. 7. Results of performed experiments are shown in Tables 5 and 6.

Table 5. Example 1 – results

  Model parameters        Network degree
  a         n             d
  0.7621    0.6380        0.6888

Table 6. Example 2 – results

  Model parameters        Network degree
  a         n             d
  1.7564    0.6527        0.6559

19


Fig. 7. Experiments (1) and (2) results

We can notice that in the second case the polynomial degree is close to the network degree (about 1% difference in logarithmic scale), whereas the first experiment gives a large difference (about 20% in logarithmic scale). Because of the complexity of the problem it is hard to determine a general relationship between them.

6. CONCLUSIONS

For designers it can be helpful to know that, for feasible solutions, the bottleneck grows slowly with the allocated total cost when extra capacity is necessary (small values of the widest bottleneck) and grows linearly (quite fast) when the bottleneck is already large (i.e., when it is redundant). The experiments should be repeated for networks with a fixed number of nodes and different numbers of edges to find the relationship between the network degree and the model degree. Another possible direction of future work is to implement a two-level branch & cut method and compare its performance with the self-adaptive genetic algorithms.

20


REFERENCES

[1] NEUMAN I., GAVISH B., A System for Routing and Capacity Assignment in Computer Communication Networks, IEEE Transactions on Communications, 1989.
[2] PIÓRO M., MEDHI D., Routing, Flow, and Capacity Design in Communication and Computer Networks, Morgan Kaufmann Publishers, 2004.
[3] ŁĘSKI J., Neuro-fuzzy systems, WNT, 2008 (in Polish).
[4] KASPRZAK A., Design of Wide Area Networks, Wroclaw University of Technology Press, Wroclaw, 1989 (in Polish).
[5] KASPRZAK A., Joint Optimization of Topology, Capacity and Flows in Teleinformatic Networks, Wroclaw University of Technology Press, Wroclaw, 1989 (in Polish).
[6] ARABAS J., Lectures in evolutionary algorithms, WNT, 2004 (in Polish).
[7] GOLDBERG D. E., Genetic algorithms in search, optimization and machine learning, Addison-Wesley, 1989.
[8] HOLLAND J. H., Adaptation in Natural and Artificial Systems, MIT Press, 1992.
[9] MICHALEWICZ Z., Genetic Algorithms + Data Structures = Evolution Programs, Springer-Verlag, 1996.
[10] EIBEN A. E., HINTERDING R., MICHALEWICZ Z., Parameter Control in Evolutionary Algorithms, IEEE Transactions on Evolutionary Computation, 1999.
[11] EIBEN A. E., SCHUT M. C., DE WILDE A. R., Boosting Genetic Algorithms with Self-Adaptive Selection, IEEE Congress on Evolutionary Computation, 2006.
[12] GLOGER B., Self-adaptive Evolutionary Algorithms, Universitat Paderborn, 2004.
[13] HINTERDING R., MICHALEWICZ Z., PEACHEY T. C., Self-Adaptive Genetic Algorithm for Numeric Functions, Department of Computer and Mathematical Sciences, Victoria University of Technology, 1996.
[14] MURATA Y., SHIBATA N., YASUMOTO K., ITO M., Agent Oriented Self Adaptive Genetic Algorithm, IEEE Transactions on Evolutionary Computation, 1999.
[15] JGAP FAQ, jgap.sourceforge.net (accessed 2011).
[16] OHIA D., Computer experimentation system for making efficiency analysis of algorithms to computer system optimization, Master Thesis, 2008.

21


Computer Systems Engineering 2011

Keywords: wireless LAN, Access Points placement, network deployment

Jacek BORKOWSKI* Andrzej KASPRZAK**

ALGORITHMS FOR BASE STATION LOCATION IN WIRELESS NETWORKS WITH COST MINIMIZATION

Wireless Local Area Networks are becoming increasingly popular for providing high data rate network access to mobile computers. Unlike traditional cellular telephony systems, WLANs are deployed in an ad-hoc fashion, often based on an educated guess by the person installing the base stations. This typically results in coverage gaps or capacity loss due to misplaced Access Points. In this paper the authors examine the problem of base station placement in wireless networks. An analysis of the problem was made, and five algorithms were implemented in order to solve it. The system performance is evaluated using an objective function which aims to minimize the cost of installation and maximize the overall signal quality. The authors examine which of the algorithms gives the best results in different cases.

1. INTRODUCTION

When designing wireless computer networks it is necessary to account for many factors. One of the most important of them is the optimal use of the network infrastructure, namely the Access Points (AP) and their best placement. Connected with this is the problem of the proper choice of equipment. The most important parameter of a base station (BS) is its range; naturally, as the BS range rises, so does its price. This project attempts to model a certain area on which there are users (subscriber stations, SS). The rest of the paper is organized as follows:

*

Department of Systems and Computer Networks, Wrocław University of Technology, Poland, e-mail: jacek.borkowski87@gmail.com ** Department of Systems and Computer Networks, Wrocław University of Technology, Poland, e-mail: andrzej.kasprzak@pwr.wroc.pl

22


• Problem description – problem description with defined input and output parameters, symbols and sizes; description of the processes, taking into account assumptions and restrictions.
• Problem solution – description of methods and implemented algorithms.
• Experimentation system – description of how the research was carried out, description of the product in system terms, and its characteristics.
• Research results – formulated research theses, proposed plans of experiments, exemplary solutions, summary of each thesis, and conclusions of the theses.
• Conclusions and references.

2. PROBLEM DESCRIPTION

In this section, we give a general description of the problem that we are interested in. We need to generate a BS placement in such a way as to cover each user within range while being the cheapest solution to the problem. The model of the problem must include the following input elements:
• research area,
• subscriber stations placement,
• types of BS.
The program's task is to generate the following output elements:
• base stations placement,
• used base station types,
• cost of the proposed installation.
Naturally, the created application is a quite simplified model, due to the high complexity of this problem. The program does not take into account certain parameters such as reliability, disturbances, or physical obstacles (relief). The research area is represented by a 640 x 400 pixel rectangle, modelled in the program as a two-dimensional array. To implement the algorithms we used the following classes:

BS_type {            // type of base station
    double range;    // range
    double cost;     // cost
    int type;        // type
};

The class above represents the types of BS that are defined by the user. It includes information about the type (in practice this is the sequence number of a given type), the cost and the range of the given BS. 23


The next one is the Receiver class, which represents a subscriber station in the program. Objects of this type include information about the SS position and its number (each SS has a unique number). The object also has a group number (not used by every algorithm).

Receiver {           // SS station
    int num;         // number
    int x;           // coordinate x
    int y;           // coordinate y
    int group;       // number of group
};

The task of the Transmitter class is to describe base stations. It includes information about their number, position, and type (class BS_type). An object of this class also includes a list of Receiver objects, which stores data about the subscriber stations that are in the range of the given base station.

Transmitter {                      // Base station
    int num;                       // number
    int x;                         // coordinate x
    int y;                         // coordinate y
    int type;                      // type
    List<Receiver> Receivers;      // SS's in range
};

The objective function can therefore be defined as:

$$\sum_{i \in BS} c_i(k)\, y_i \rightarrow \min \qquad (1)$$

$$\sum_{i \in BS} x_{ij} \ge 1, \qquad j \in SS \qquad (2)$$

$$x_{ij} \le y_i, \qquad i \in BS,\ j \in N(i) \qquad (3)$$

where BS is the set of base stations, SS is the set of subscriber stations, and $c_i(k)$ is the cost of a type-$k$ BS in location $i$. The objective is that the cost should be as low as possible (1), while each receiver is assigned to some base station (2)–(3). 24
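For illustration, the cost-and-coverage criterion (1)–(3) can be evaluated for a candidate placement as sketched below. This is a hedged C# sketch that assumes BS_type, Receiver and Transmitter are classes with the public fields shown above; InRange is an assumed helper based on Euclidean distance and the BS range, not part of the authors' program.

using System;
using System.Collections.Generic;
using System.Linq;

public static class PlacementEvaluation
{
    // Returns the total installation cost of a placement, or null (infeasible)
    // when some receiver is not covered by any installed base station.
    public static double? Evaluate(List<Transmitter> placement,
                                   List<Receiver> receivers,
                                   List<BS_type> types)
    {
        foreach (var r in receivers)
            if (!placement.Any(t => InRange(t, r, types)))
                return null;                              // violates constraint (2)

        return placement.Sum(t => types[t.type].cost);    // objective (1)
    }

    // Assumed helper: a receiver is covered when it lies within the circular
    // range of the transmitter's type (type is used as an index into 'types').
    static bool InRange(Transmitter t, Receiver r, List<BS_type> types)
    {
        double dx = t.x - r.x, dy = t.y - r.y;
        return Math.Sqrt(dx * dx + dy * dy) <= types[t.type].range;
    }
}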


3. PROBLEM SOLUTION

Many algorithms have been developed to solve the problem of optimal base station location. Some of them deal only with optimal coverage of the terrain by the signal [1][2], others additionally touch the problem of choosing the best set of frequencies (without interference) [3]. There are also algorithms that take into consideration the minimization of the installation cost [4][5]. In our work we focused on two factors – designing a network with full radio coverage of the target space, and minimization of the cost of the whole installation. Five algorithms were used to solve this problem.

3.1. REFERENCE INSTALLATION

This is the simplest possible solution. It was described in [2] and it is based on covering the entire map with the signal, regardless of whether there are receivers or not. The algorithm required a small modification – in the original version only one type of transmitter is used, whereas in our experimentation system the user can define any number of them, so it is necessary to select the best transmitter type. It is chosen on the basis of the best ratio of transmitter price to coverage area. The procedure is as follows:
1. First, the best type is chosen.
2. The transmitters are equally spaced on the map.

3.2. PRUNING

Pruning is an algorithm which is strongly based on the reference installation. It consists in "cutting" unnecessary transmitters. Again, small modifications were needed to adapt the algorithm from [2] to the described problem. The algorithm is as follows:
1. Set up transmitters as in the reference installation, but using the transmitter type with the biggest range.
2. For each transmitter position, check whether it is possible to use another type of transmitter or to remove it completely (in the original version only removal was tested).
3. If it is possible to use another type of transmitter or to remove it, the total cost is calculated; if it is smaller than the current one, this solution is saved.

3.3. PRUNING WITH NEIGHBORHOOD SEARCH

The next solution is an improvement of the previous one, aimed at maximizing the strength of the signal. It consists in searching the neighborhood of an initial solution in order to find a local optimum. The result of the Pruning algorithm is used as the initial solution. 25


3.4. ISBA

It is a modified version of the solution from [4]. This algorithm is based on graph theory: all subscriber stations are represented by vertices of a graph, and two vertices are connected by an edge when the distance between them is shorter than half the range of the base station type with the longest range. In graph-theory terms the task corresponds to finding the independent subgraphs (groups). When the program finds all independent groups, it calculates the centre of gravity of each group and chooses the best base station for it. The best base station is chosen as follows: every defined type of BS is tried, and its position is then corrected to find the most favourable one; the quality of a position is evaluated based on its price and the number of users (SS) in range. The stages of the algorithm are as follows:
1. Are there unassigned subscriber stations? If yes, go to 2; else go to 4.
2. Create a new group.
3. Check the neighbourhood of the subscriber station. If the distance to a neighbour is shorter than the radius of the base station type with the highest range, add it to the new group and erase it from the list of unassigned subscriber stations. Else go to 1.
4. For each group: calculate the centre of gravity, choose the best type of BS, and put the BS.
5. Stop.

3.5. BBL

This is an original algorithm and its name is an acronym of Borkowski Base station Location. It works differently from the other algorithms. On the map we write values that reflect the distance to the subscriber stations. These values are generated as follows: the highest value is at the point where an SS is placed, and each subsequent square has a decremented value; therefore, the maximal value equals the range of the BS type with the highest range. In the beginning all squares on the map have the value "0". The values are generated for each SS and added to the map. After the map is created, the program finds the biggest value on the map and places the best BS type in that position. To choose the best BS type, the program uses the same function as the first algorithm. The steps of the algorithm are as follows (see the sketch after this list):
1. Update the map.
2. Are there squares on the map with value > 0? If yes, go to 3; else stop the program.
3. Search for the square on the map with the highest value.
4. Put the proper BS there.
5. Delete from the map the SS's that are in range of existing BS's, and go to 1.

26
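A minimal C# sketch of the BBL value map is given below. It is an illustrative reading of the steps above (the exact decrement scheme is an assumption) and reuses the Receiver class introduced earlier, assuming its fields are public; it is not the authors' implementation.

using System;
using System.Collections.Generic;

public static class BblMap
{
    // Builds the BBL value map: each subscriber station contributes values
    // that are highest at its own square and decrease with distance, down to
    // zero at maxRange (the range of the BS type with the highest range).
    public static int[,] Build(int width, int height,
                               IEnumerable<Receiver> receivers, double maxRange)
    {
        var map = new int[width, height];
        foreach (var r in receivers)
            for (int x = 0; x < width; x++)
                for (int y = 0; y < height; y++)
                {
                    double dist = Math.Sqrt((x - r.x) * (x - r.x) + (y - r.y) * (y - r.y));
                    map[x, y] += (int)Math.Max(0.0, maxRange - dist);   // contributions are summed
                }
        return map;
    }

    // Finds the square with the highest accumulated value (step 3 of BBL).
    public static (int x, int y) BestSquare(int[,] map)
    {
        int bestX = 0, bestY = 0;
        for (int x = 0; x < map.GetLength(0); x++)
            for (int y = 0; y < map.GetLength(1); y++)
                if (map[x, y] > map[bestX, bestY]) { bestX = x; bestY = y; }
        return (bestX, bestY);
    }
}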


4. EXPERIMENTATION SYSTEM

The Windows platform was chosen as the environment for the implementation of our program, an application in C#, a successor to C++ and the largest competitor of the Java platform. The final version was written using the Microsoft Visual Studio 2010 development environment on .NET version 3.5. When creating the application, its authors tried to make it run smoothly and correctly, in addition to being convenient for future users. The main assumption was to create a program that is clear, readable and intuitive. The program window is shown below (Fig. 1), together with a brief description of its main features.

Fig. 1 Program window.

The experimental environment has a graphical interface where users can place the receivers and where the results are presented, i.e. the deployment of the transmitters and their ranges. We can use the default look (gray background), load an image from a file, or use the "online map" tab to download a map of the requested location from the Internet. Users can choose which algorithms they want to test. After pressing the "tests" button, they are asked to choose the place in which to save the test results and to enter the file name. The test results are saved in *.txt files and contain the following data for each of the algorithms: the number of transmitters, the total cost of the installation, the installation cost per receiver, the average, maximum and minimum number of receivers assigned to one transmitter, the average, maximum and minimum received signal power level, the percentage coverage of the map, the cost of covering one square meter, and the duration. The program has a modular structure with the potential to be quickly and easily extended in the future. Any number of transmitter types can be defined, together with their physical characteristics, such as transmission power, sensitivity, antenna gain and frequency. 27


5. RESEARCH RESULTS

The application allows users to check the influence of different factors on the overall cost of creating a wireless network, for instance the number of receivers able to receive the wireless signal and the locations of these receivers on the map (density, quantity and size of the centers). We cannot forget the most important aim, which is the minimal cost of the wireless network installation together with the assurance that all users keep their network connections. The most important experiment was to compare the effectiveness of the base station locations produced by each algorithm. For the scientific inquiries we use the transmitter types defined in the program, listed in Table 1.

Table 1. AP types configuration

  Type    Range    Cost
  0       63       100
  1       73       150
  2       99       230

The examinations were carried out for 3 different methods of receivers layout (Fig. 2,4,6): the regular layout, the layout in small centers and the random layout without distinct centers of receivers. For each layout the different quantity of receivers was defined, beginning from 20 and ending on 100 (with step 20). Each type of spacing was tested for the different number of receivers. We decided to test 5 different numbers of receivers on the map (20,40,60,80,100). This will allow the observation of the efficiency of algorithms, not only of the receivers spacing, but also because of their number. Algorithms have been tested for the 10 test instances. In this paper we will focus on the total cost of the installation, but the program can also investigate the other factors, such as the average power level, the percentage area cover, the number of transmitters, etc.

Fig.2 Regular layout.

28


Table 2 shows the comparative analysis for each algorithm with the regular receivers layout (Fig. 2).

Table 2. Regular layout – comparison

          RefIn      Pruning    Pruning+NS   ISBA       BBL
  20      2760.00    1538.00    1561.00      1368.00    886.00
  40      2760.00    1741.00    1764.00      1758.00    1191.00
  60      2760.00    1818.00    1764.00      1996.00    1338.00
  80      2760.00    1998.00    1998.00      2107.00    1566.00
  100     2760.00    2102.00    2117.00      2197.00    1631.00

Fig. 3 shows that in each case the best algorithm in terms of the returned total cost was BBL. The ISBA, Pruning and Pruning+NS algorithms gave results similar to each other, but looking more closely it should be noted that ISBA behaved worse in most of the cases. Compared with the reference installation, all the algorithms gave very good results, significantly reducing the total cost of the proposed installation. A trend can also be read from the graph: as the number of receivers on the map increases, the total cost of the installations rises more and more slowly and should stabilize at a constant level, lower than the total cost of the reference installation.

Fig. 3 Regular layout – comparison.

The next case of receiver deployment is placement in small centers. The results of the tests on this type of input data are presented in Table 3 and graphically in Fig. 5. An example of the layout is shown in Fig. 4. 29


Fig.4 Layout in small centers.

Fig. 5 and Table 3 show that, in terms of minimizing costs for receivers located in centers (Fig. 4), the best results were again given by the BBL algorithm. For the smaller instances ISBA obtained very similar results; however, for instances containing more than 60 receivers the differences were becoming clearer.

Table 3. Layout in small centers – comparison

          RefIn      Pruning    Pruning+NS   ISBA      BBL
  20      2760.00    918.00     938.00       445.00    415.00
  40      2760.00    1608.00    1628.00      625.00    575.00
  60      2760.00    1525.00    1520.00      754.00    685.00
  80      2760.00    1837.00    1842.00      829.00    920.00
  100     2760.00    1879.00    1884.00      920.00    693.00

Fig. 5 Layout in small centers- comparison.

30


The other algorithms returned higher costs, uncompetitive with respect to the two best in the comparison, but still much better than the reference installation. The Pruning and Pruning+NS algorithms, after a sharp increase in costs for the instance with 40 receivers, stabilized around a single price level, just like the other algorithms. The last way of locating receivers on the map dealt with by us was random placement. An example of this type of deployment is presented in Figure 6. Table 4 and Figure 7 show the total costs returned by each algorithm depending on the number of subscriber stations on the map.

Fig. 6 Random layout.

Table 4. Random layout – comparison

          RefIn      Pruning    Pruning+NS   ISBA       BBL
  20      2760.00    1166.00    1153.00      797.00     728.00
  40      2760.00    1670.00    1680.00      1230.00    1063.00
  60      2760.00    1945.00    1932.00      1373.00    1240.00
  80      2760.00    2162.00    2170.00      1580.00    1420.00
  100     2760.00    2283.00    2288.00      1750.00    1485.00

On the chart (Figure 7) it is easy to see that for each case the best results were generated by the BBL algorithm. For the smaller instances the ISBA algorithm is not much different from BBL, but it should be noted that with an increasing number of receivers the total cost difference between these algorithms was growing. Based on the chart it can also be seen that the curve representing the total cost of the BBL algorithm "flattens" for the bigger instances. The Pruning and Pruning+NS algorithms return much worse results in relation to the two best. The total cost generated by these algorithms for the large instances slowly approached the level of the total cost of the RefIn algorithm. 31


Fig. 7. Random layout – comparison.

Fig. 8. The operational time depending on the receivers number.

Another experiment was to check the operational time of the algorithms depending on the number of receivers on the map. For this purpose, we chose a map size of 600x400 and 6 test sets, each consisting of 10 different instances. The sets contained 100, 150, 200, 250, 300 and 350 receivers spaced randomly on the map.

32


Table 5. Operational time of the algorithms [s] depending on the number of receivers

                100        150        200        250        300        350
  RefIn         9.34E-05   1.38E-04   1.78E-04   2.28E-04   2.67E-04   3.10E-04
  Pruning       5.14E-04   5.46E-04   6.79E-04   8.21E-04   1.06E-03   1.12E-03
  Pruning+NS    6.33E-03   9.21E-03   1.10E-02   1.25E-02   1.83E-02   2.57E-02
  ISBA          1.18E-01   1.88E-01   2.38E-01   3.16E-01   3.84E-01   4.60E-01
  BBL           1.30E+00   1.84E+00   2.52E+00   3.06E+00   3.71E+00   4.44E+00

The results of this experiment are presented in Table 5 and in Fig. 8. For each algorithm the increase is linear, but the highest growth characterizes the BBL algorithm. For this test the times are already becoming relatively large, reaching up to 5 seconds (BBL). In the case of the ISBA algorithm the biggest time does not exceed half a second. For the other algorithms (Reference Installation, Pruning, Pruning+NS) these times are practically negligible – about 1/10000 of a second.

6. CONCLUSIONS

All of our examinations had one target – to create algorithms for the best location of transceivers for a wireless network. We used well-known methods with small modifications (RefIn, Pruning, Pruning+NS and ISBA) and proposed our own solution to the problem (the BBL algorithm). Obviously, we should point out the considerable simplification of the analysed model, as stated in the problem description: clutter and the lay of the land were not considered in this case. Unfortunately, for wireless networks these are unusually important and almost indispensable for designing effective networks of this type. Nevertheless, the authors think that by creating this application they have made a useful tool for wireless network design.

REFERENCES

[1] HILLS A., Large-Scale Wireless LAN Design, IEEE Communications Magazine, November 2001, pp. 98-104.
[2] KAMENETSKY M., UNBEHAUN M., Coverage Planning for Outdoor Wireless LAN Systems, International Zurich Seminar on Broadband Communications Access-Transmission-Networking, February 2002, pp. 49-1 – 49-6.

33


[3] RODRIGUES R. C., MATEUS G. R., LOUREIRO A. A., On the Design and Capacity Planning of a Wireless Local Area Network, Network Operations and Management Symposium (NOMS'00), Apr. 10–14, 2000, pp. 335–348.
[4] CALEGARI P., GUIDEC F., KUONEN P., CHAMARET B., JOSSELIN S., WAGNER D., PIZAROSSO M., Radio Network Planning with Combinational Optimisation Algorithms, ACTS Mobile Communications Summit 96 Conference, Granada (Spain), November 25th, 1996.
[5] LIN P., NGO H., QIAO CH., WANG X., WANG T., QIAN D., Minimum Cost Wireless Broadband Overlay Network Planning, WOWMOM '06: Proceedings of the 2006 International Symposium on a World of Wireless, Mobile and Multimedia Networks, June 2006, pp. 228-236.

34


Computer Systems Engineering 2011

Keywords: mesh structure, task allocation, static allocation

Michał HAŁACZKIEWICZ Piotr FRANZ Leszek KOSZAŁKA∗

OPTIMIZED FIRST FIT ALGORITHM COMBINED WITH SIMPLE SCHEDULING FOR SOLVING STATIC ALLOCATION PROBLEM IN MESH NETWORKS

The paper concerns the task allocation problem for static task allocation in mesh structured systems. Three allocation algorithms are evaluated, including First Fit (FF), Stack Based Algorithm (SBA) and proposed Current Job Based First Fit (CJB FF) algorithm. Numerical analysis for small and medium mesh structures shows superiority of the proposed algorithm for the task latency minimization.

1. INTRODUCTION

Nowadays, modern computer systems are often created by connecting many processing units into one big structure in order to perform complex computations more efficiently. The performance of such structures depends not only on the computing power of the single processing units, but is also influenced by the task allocation and scheduling algorithms used in such structures. Problems of scheduling (task selection) and allocation are important in terms of reducing the cost of computing (saving both time and resources). Scheduling problems are well known from the field of designing operating systems, and most algorithms, for example First In First Out (FIFO), Shortest Job First (SJF) or Round-Robin (RR), are neatly described in the literature, e.g. [3]. In the field of solving the allocation problem new ideas are still being proposed on the basis of, e.g., First Fit, Adaptive Scan and Stack Based algorithms, e.g. [1][2]. The aim of this paper is to examine three implemented allocation algorithms: First Fit (FF), Stack Based Algorithm (SBA) and Current Job Based First Fit (CJB First Fit), the last of which is proposed by one of the authors (M. Hałaczkiewicz). ∗

Department of Systems and Computer Networks, Wrocław University of Technology

35


The static allocation problem considered in this paper assumes a two-dimensional mesh topology with a closed queue of ready tasks (during the simulation no new tasks are added to the queue/system). The tasks from the queue may be picked for allocation using the FIFO or SJF scheme. For the purposes of this paper, an experimentation system was designed and implemented, which allows a multidimensional comparison of the considered algorithms.

2. PROBLEM FORMULATION Let us formulate the considered problem. Mesh is a set of nodes (processors) connected in orderly fashion. The full mesh M (w, h) is a rectangular two-dimensional matrix of size w × h, where w stands for width and h stands for height of the mesh. A single node in a mesh is denoted as (i, j), where i is an index of a column and j is an index of a row in the mesh structure. In such network submeshes can be distinguished. Submesh SM (i, j, w, h) is a rectangular set of w × h nodes that belong to a mesh M (w, h) and node (i, j) is the foothold of submesh SM in mesh M (Fig. 1). Free submesh is a submesh in which every node is free, i.e., it is not occupied by previously allocated tasks, thus a busy submesh is a submesh in which at least one node is already assigned to process a task.

Fig. 1. An example of the meshM (6, 4) and a submesh SM (3, 1, 2, 3).

The task that is processed by a mesh is denoted as J(w, h, t) and it is a rectangular form with known size w × h and processing time t. All the tasks are also additionally described by:
• expected relative task width pw – ratio of expected task width to mesh size,
• expected relative task height ph – ratio of expected task height to mesh size,

36


• expected relative task size p – ratio of the expected task size to the mesh size (when the expected values of the task width and height are equal, then p = pw = ph), assuming that the sizes are normally distributed.
The tasks wait in a queue to be allocated. The task queue Q can be a FIFO or sorted (ascending or descending by execution time or by the number of needed nodes) structure. To allocate a task, a free submesh with a size equal to the task size is needed. Thus, the objective is to find such a sequence of tasks and their allocation to nodes that optimizes the given criteria. As the main criterion an average latency LA is used, which describes the average time that a task waits in the queue Q before it is allocated. Considering that L(i) denotes the time that the ith task was waiting in a queue consisting of n tasks, LA can be described as follows:
$$L_A = \frac{1}{n} \sum_{i} L(i) \qquad (1)$$
The additional criteria are the average processing (allocation) time tA and the fragmentation fA. The tA is given by
$$t_A = \frac{1}{n} \sum_{i} t_{alloc}(i) \qquad (2)$$
considering that talloc(i) is the time needed by the algorithm to allocate the ith of n tasks. The average fragmentation shows the usage of the network resources. It is a ratio of free to busy nodes, calculated according to
$$f_A = \frac{w \cdot h - P - \sum_{i}^{n} w_i \cdot h_i}{w \cdot h - P}, \qquad (3)$$
where w is the mesh width, h is the mesh height, wi and hi denote the width and height of the ith task, and P is the number of nodes in the biggest free submesh. A short helper computing these criteria is sketched below.
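This is a minimal, illustrative C# helper for criteria (1)–(3); the names are assumptions for the example and it is not part of the authors' experimentation system.

using System.Linq;

public static class AllocationMetrics
{
    // Average latency (1): mean waiting time of the n tasks in queue Q.
    public static double AverageLatency(double[] latencies) => latencies.Average();

    // Average allocation time (2): mean time the algorithm spent per task.
    public static double AverageAllocationTime(double[] allocTimes) => allocTimes.Average();

    // Average fragmentation (3): how much of the mesh outside the biggest
    // free submesh (P nodes) is actually occupied by the allocated tasks.
    public static double AverageFragmentation(int meshWidth, int meshHeight,
                                              int biggestFreeSubmeshNodes,
                                              (int w, int h)[] allocatedTasks)
    {
        double denominator = meshWidth * meshHeight - biggestFreeSubmeshNodes;
        double used = allocatedTasks.Sum(t => t.w * t.h);
        return (denominator - used) / denominator;
    }
}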

3. ALLOCATION ALGORITHMS

To evaluate the efficiency of the proposed CBJFF algorithm, it is compared with First Fit (which was the template used to create CBJFF) and the Stack Based Algorithm (as a representative of the most complex and advanced family of allocation algorithms nowadays). The First Fit algorithm is described in detail in [1]. Its implementation is given below (Algorithm 1).

37


Algorithm 1 First Fit algorithm (FF)
1: Start from node (i, j) = (1, 1) for task J from Q
2: Check whether node (i, j) is free
3: Check if (i, j) is the foothold of a free submesh matched to the size of task J; if so, then go to step 5
4: Choose node (i + 1, j), or (1, j + 1) if the mesh's width w is exceeded; go to step 2
5: Allocate the task; its foothold in mesh M is node (i, j)

The detailed description of SBA can be found in [2]. Its main idea is to find base submeshes for a task, reducing the search space and avoiding unnecessary searches (see Algorithm 2).

Algorithm 2 Stack Based Algorithm (SBA)
1: For task J from Q
2: Create prohibited area AP
3: Create coverage areas AC
4: Create base areas AB by spatially subtracting AP and AC from mesh M
5: Search sequentially through the AB stack for the first submesh SM that J fits in; if found, go to step 7
6: Rotate the task by 90° and go to step 2
7: Allocate the task; its foothold in M is the foothold of SM

3.1. CURRENT JOB BASED FIRST FIT ALGORITHM

The proposed Current Job Based First Fit Algorithm is an improved version of First Fit. The main idea is to speed up the process of searching for free nodes in a mesh by omitting busy nodes, as illustrated in the sketch following Algorithm 3.

38


Algorithm 3 Current Job Based First Fit Algorithm (CBJFF)
1: Start from node (i, j) = (1, 1) for task J from Q
2: Create prohibited area AP
3: Spatially subtract AP from M to create base area AB; consider only that area
4: If node (i, j) is free, then go to step 6
5: Choose node (i + wt, j), where wt is the width of the task to which node (i, j) belongs, or (0, j + 1) if i + wt exceeds the width of AB; go to step 3
6: Check if the picked node is the foothold of a free submesh matched to the size of task J; if so, then go to step 8
7: Choose node (i + 1, j), or (1, j + 1) if the width of AB is exceeded; go to step 3
8: Allocate the task; its foothold in mesh M is node (i, j)
4. NUMERICAL ANALYSIS To evaluate the analysed algorithms, an experimentation system was developed (Fig. 2).

Fig. 2. An experimentation system block scheme

It was coded in C# and .NET Framework 3.5 was used. For convenience, it is possible to set the repetition of the experiment certain number of times for each algorithm (see Fig. 3). Moreover, detailed logs are generated that can be saved in well known formats, e.g. HTML. 39


Fig. 3. Experiment options

The aim of the numerical analysis was to compare efficiency of FF, SBA and CBJFF in the environment of small and medium mesh structures according to the given criteria. Moreover, in each experiment the impact of queue sorting on received latency results was examined. 4.1. EXPERIMENT 1: INCREASING NUMBER OF TASKS

In the first experiment, the set of tasks (queue) was increasing significantly with slightly growing mesh. Experiment design (combination of input values) is shown in Table 1. Other inputs (not listed in Table 1) are as follows: • min – max width of task: 3-6, • min – max height of task: 3-6, • min – max execution time of task: 5-20, • sorting: unsorted.

40


Table 1. Combination of input values for Experiment 1. Exp. no. 1 2 3 4 5 6 7

Mesh width height 20 30 40 50 60 70 80

Number of tasks

20 30 40 50 60 70 80

Expected relative task size [%]

60 140 240 380 540 730 840

22.5 15.0 11.3 9.0 7.5 6.4 5.6

Obtained results are presented in Fig. 4–6.

Fig. 4. The average latency – Experiment 1.

Fig. 5. The average allocation time – Experiment 1.

41


The CBJFF provided solutions with the lowest latency (inversely proportional to mesh size). However, for greater mesh structures the difference between the CBJFF and FF started to fade (see Fig. 4). Moreover, CBJFF was characterized by the best allocation time, significantly lower than for the other algorithms (see Fig. 5). The low latencies may be the result of low fragmentation maintained by CBJFF algorithm especially in comparison to SBA (Fig. 6).

Fig. 6. The average fragmentation – Experiment 1.

The impact of chosen queue sorting type on average latency is shown in Fig. 7. For sorting the tasks in queue by processing time in ascending order over 25% of latency decrease was obtained comparing to the situation when no sorting was used. For descending sorting an increase of latency was noticed.

Fig. 7. Average latency depending on queue sorting for CBJFF algorithm – Experiment 1.

42


4.2.

EXPERIMENT 2: INCREASING MESH SIZE

In this experiment, the task generation parameters were chosen so that the expected relative task size p was always constant and equal 15% for increasing mesh size. The combination of input values for Experiment 2 is shown in Table 2. Other inputs are as follows: • number of tasks: 134, • min – max processing time of a task: 5-20, • sorting: unsorted. Table 2. Combination of input values for Experiment 2. Exp. no. 1 2 3 4 5 6 7

Mesh width height 20 30 40 50 60 70 80

20 30 40 50 60 70 80

Task size min max 2 3 3 4 5 6 6

5 7 9 12 14 16 18

Obtained results are presented in Fig. 8–10.

Fig. 8. The average latency – Experiment 2.

43


Fig. 9. The average allocation time – Experiment 2.

The CBJFF algorithm was characterized by the lowest latencies among all analysed algorithms (Fig. 8). However, the time required for CBJFF to allocate tasks grows with mesh size (for medium meshes it is higher than allocation time of SBA, see Fig. 9) and the algorithm may have some problems with using the resources of the mesh, even if the mesh size increases (Fig. 10).

Fig. 10. The average fragmentation – Experiment 2.

The impact of chosen queue sorting type on average latency for this experiment is shown in Fig. 11. Again the best results were obtained for sorting processing times in ascending order and the worst seemed to be the sorting by processing times in descending order.

44


Fig. 11. Average latency depending on queue sorting for CBJFF algorithm – Experiment 2.

5. CONCLUSIONS In this paper, the efficiency of three allocation algorithms was analysed as well as the impact of scheduling task on static allocation performances. The proposed algorithm CBJFF revealed to be fast and it may be recommended to use with small and medium mesh structures, if the objective is to minimize task latencies. CBJFF is faster than well know First Fit and for some instances it is even faster than SBA (recommended for large mesh structures). To decrease further the latency of tasks (regardless of the choice of the allocation algorithm) different sequences of tasks can be taken into consideration. For ascending sorting of processing times, latency can be decreased even about 20% (when comparing to results for unsorted queue).

REFERENCES [1] ZHU Y., Efficient processor allocation strategies for mesh-connected parallel computers. Journal of Parallel & Distributed Computing, vol. 16, 1992, pp. 328-337, . [2] YOO B.S., DAS C., A fast and efficient processor allocation scheme for mesh-connected multicomputers. IEEE Transactions on Computers, vol 51, No. 1, 2002, pp.. [3] TANENBAUM A. S., Modern operating systems. 2nd edition, Prentice Hall, NJ, 2001.

45


Computer Systems Engineering 2011

Keywords: water networks, model reduction, numerical efficiency

Daniel PALUSZCZYSZYN∗ Piotr SKWORCOW Bogumił ULANICKI

IMPROVING NUMERICAL EFFICIENCY OF WATER NETWORK MODELS REDUCTION ALGORITHM

Nowadays, it is common for water distribution network (WDN) models to contain thousands of elements in order to accurately replicate the hydraulic behaviour and topographical layout of real systems. Such models are appropriate for simulation purposes; however, optimisation tasks are much more computationally demanding, hence simplified models are required. Variable elimination is a mathematical method for the reduction of such large-scale models described by non-linear algebraic equations. The approach benefits from preserving the non-linearity of the original WDN model and approximates the original model over a wide range of operating conditions. However, its compute-intensive nature demanded that its implementation take into account developments in programming languages and the recently released libraries allowing optimisation of the executable program for multi-core machines. This ensures that the model reduction application will be able to cope with the complex topologies of large networks. In this paper the process of design and development of the research software is described, with a focus on the computational research aspects that emerged. It is demonstrated that the utilisation of parallel programming techniques and sparse matrix ordering algorithms drastically decreases the computational time of the model simplification.

1. INTRODUCTION The primary aim of a water distribution network (WDN) is to deliver water from water sources to intended end points while meeting the specified requirements in terms of water quantity, quality and pressure. Typically, this is achieved by means of interconnected elements, such as pumps, pipes, control and isolation valves, storage tanks and reservoirs. Each of these elements is interrelated with its neighbours, thus the behaviour of the entire WDN depends on each of its elements.

∗ Water Software Systems, De Montfort University, United Kingdom, e-mail: paluszcol@dmu.ac.uk



In the past, modelling of WDNs involved many steps and was a very tedious and laborious process. With advancements in information and communications technology (ICT) and geographic information systems (GIS), the modelling process has been greatly improved, as information about the topology and components can be derived automatically from a GIS. As a result, it is now common for WDN models to contain thousands of elements to accurately replicate the hydraulic behaviour and topographical layout of real systems; see, e.g., the large-scale models in [23]. Such models are appropriate for simulation purposes; however, optimisation tasks are much more computationally demanding, hence simplified models are required. This is especially true in online optimisation frameworks such as [44], where an optimal solution has to be obtained within a certain time interval. A reduced water network model can be obtained in many ways, but the technique chosen for such a task has to satisfy the following requirements in order to be incorporated into the optimisation scheme described in [44]: (i) the reduced model should accurately replicate the hydraulic behaviour of the original model, (ii) the algorithm should perform in real-time or near real-time to allow an online adaptation to abnormal events and structural changes, (iii) the algorithm should keep a record of how demand was aggregated and distributed in the reduced model, (iv) the algorithm should provide the capability to retain components important from the operator perspective. There are different techniques of WDN model reduction; the outcome of most of these methods is a hydraulic model with a smaller number of components than the prototype. The main aim of the reduced model is to preserve the nonlinearity of the original network and approximate its operation accurately under different conditions. The accuracy of the simplification depends on the model complexity, the purpose of the simplification, and the selected method, such as skeletonization [40], parameter-fitting [3], graph decomposition [9], the enhanced global gradient algorithm [19], metamodelling [6] and variables elimination [46, 1, 30]. After investigation of the aforementioned model reduction techniques, the recently extended variable elimination algorithm [30] was chosen as the most suitable. It was decided that the algorithm implementation should take into account developments in programming languages and recently released libraries allowing the executable program to be optimised for multi-core machines. The research described in the following sections was mainly driven by the necessity to improve the numerical efficiency of the algorithm. The use of GIS in the water industry has resulted in an increasing amount of information about the actual network topology and service. Hence, to ensure that the model reduction application would be able to cope with the complex topologies of large networks, an investigation was carried out into:



(i) an efficient way to manage large sparse matrices representing WDN topologies, (ii) the exploitation of multi-threaded computing to distribute the computational load over multi-core processors, and (iii) analyses of water networks aimed at improving the understanding of network functioning and, eventually, reducing the computational effort while maintaining or even improving the accuracy of the extended model reduction algorithm. It was apparent that none of these research directions prevails over the others; rather, their combined development would provide the means to create a practical and reliable tool. The remainder of the paper is organised as follows. Section 2 briefly introduces the model reduction algorithm. Sections 3 and 4 gather the requirements and tools necessary to carry out the implementation. Section 5 outlines an initial design of the implementation process and reveals some computational issues to be investigated. Section 6 focuses on the computational aspects that arose throughout the software development. The last section provides conclusions.

2. WDS MODEL REDUCTION ALGORITHM The variables elimination approach is based on a mathematical formalism initially presented in [46] and recently updated in [1] and [30]. This mathematical method allows a reduction of water network models described by a large-scale system of nonlinear differential algebraic equations. The approach is illustrated in Fig. 1 and proceeds by the following steps: full nonlinear model formulation, model linearisation at a specified operation time, linear model reduction using Gaussian elimination and nonlinear reduced model reconstruction. The method was successfully implemented and tested on many water networks [25, 5, 33, 34, 42, 32, 36, 44, 37, 43]; see Table 1. In particular, the real-time oriented studies in [42] and [36] indicate that variable elimination can be effectively applied for real-time applications. Due to the length of the simplification algorithm, its description is omitted here; however, a detailed illustration of variable elimination on example water networks can be found in [25] and [1].
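To make the linear-reduction step more concrete, the following is a generic block-elimination sketch; the partitioning and the symbols used are illustrative and are not taken verbatim from [46, 1, 30]. After linearising the network equations around the chosen operating point, the variables selected for removal, collected in $\Delta x_e$, can be eliminated from the linearised system by Gaussian elimination:

\[
\begin{bmatrix} J_{11} & J_{12} \\ J_{21} & J_{22} \end{bmatrix}
\begin{bmatrix} \Delta x_r \\ \Delta x_e \end{bmatrix}
=
\begin{bmatrix} b_r \\ b_e \end{bmatrix}
\quad\Longrightarrow\quad
\bigl(J_{11} - J_{12} J_{22}^{-1} J_{21}\bigr)\,\Delta x_r = b_r - J_{12} J_{22}^{-1} b_e ,
\]

where $\Delta x_r$ collects the retained variables and the reduced linear model on the right is the starting point for reconstructing a nonlinear reduced model.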

3. APPLICATION REQUIREMENTS Prior to implementation, it was vital to determine the requirements for the developed application. The overall software development process was carried out in a way that is known in the software development literature as the waterfall model.


Fig. 1. The variable elimination algorithm.

Table 1. Applications of variable elimination in water resources literature.

Work  Application                  Original nodes  Original pipes  Reduced nodes  Reduced pipes  Scope of reduction [%]  Framework
[25]  WDN model reduction          1091            1295            134(∗)         300(∗)         81.81                   Off-line
[25]  WDN model reduction          500             450             15(∗)          20(∗)          96.32                   Off-line
[5]   Optimal WDN operation        4248            4388            252            414            92.29                   Off-line
[33]  Contamination detection      16              34              6              16             56.00                   Off-line
[33]  Contamination detection      16              34              8              28             28.00                   Off-line
[34]  Water quality analysis       36              41              11             13             68.83                   Off-line
[34]  Water quality analysis       9               12              7              9              23.81                   Off-line
[34]  Water quality analysis       92              117             36             56             55.98                   Off-line
[42]  Optimal WDN operation        867             987             77             92             90.88                   On-line
[32]  Water quality simulations    14945           16056           8173           8995           44.62                   Off-line
[36]  Hydraulic state estimation   12523           14822           347            1100           94.71                   Off-line
[37]  Hydraulic state estimation   12523           14822           347(∗)         1100(∗)        94.71                   Off-line
[44]  Optimal WDN operation        2074            2212            44             42             97.99                   On-line
[1]   WDN model reduction          3535            3279            1023           1340           65.32                   Off-line
[43]  Optimal WDN operation        12363           12923           164            336            98.02                   Off-line

(∗) One of many obtained solutions.

The waterfall model is the classical model of software development and has been widely adopted in many organisations and companies [29]. It fits especially well with implementations of engineering algorithms, as it is simple to apply and the amount of resources required is minimal. The waterfall model also offers high visibility and transparency of the whole process during development. At this stage all the requirements already outlined in Section 1 were gathered to formulate a foundation upon which the development would be carried out. Such a synopsis allowed the identification not only of the necessary tools in the development process but also of areas where further research was needed. The following requirements were identified for the model reduction application:
Type of application. Initially, the model reduction application was intended to be embedded in the control scheme presented in [44]. However, other research projects conducted by the Water Software Systems (WSS) group required WDN model reduction software that could be used both as a module and as a standalone application.


Software level compatibility. As mentioned above, the final application could be embedded in other projects; therefore, to ease future integration, the software should be coded in the same programming language. Recently, most projects in the WSS group have been coded in the C# programming language along with the supporting .NET Framework 3.5/4.0 libraries.
Real-time or near real-time model reduction. Online optimisation techniques employed in WSS research projects required that the process of WDN reduction be performed in real-time or near real-time.
Demand distribution log. During the simplification process, nodes are removed and the associated demands are redistributed based on the removed pipes’ conductance. For optimal scheduling purposes it was necessary to log the demand reallocation.
Energy distribution. Operational optimisation techniques usually aim to calculate optimal control schedules for pumps; therefore, it is crucial to preserve the energy distribution of the original water network.
Interaction with a hydraulic simulator. A hydraulic simulator is an essential tool, especially at the initial stage of the simplification process, as it provides the results of the extended period simulation. It was decided to use the Epanet2 Toolkit software as the hydraulic simulator. Hence, an interface between the Epanet2 Toolkit and the created application is required to automatically read hydraulic data results from the model simulation.
User interface. The user interface should be plain, transparent and intuitive to ease the whole process of model reduction. It should also allow the scope of the simplification to be defined, i.e. the user should be able to select the WDN elements to be retained in the reduced model.

4. TOOLS AND SOFTWARE EMPLOYED The software, tools and devices employed were as follows:
High-performance multi-core CPU workstation. It was demonstrated in many studies (see Table 1) that the simplification algorithm performed with sufficient accuracy. However, because the model reduction algorithm involves a number of matrix operations with time complexity of order O(n^3) for an n × n matrix, the calculation time for large-scale networks (more than 10000 elements) could take up to several hours, which is too long for utilisation in real-time applications.


Nowadays, modern computers have two or more CPU cores that allow multiple threads to be executed simultaneously, and computers in the near future are expected to have significantly more cores. To take advantage of this advance in IT hardware, it was decided to parallelise the sections of the module algorithm code with a large number of matrix operations. The workload for these compute-intensive operations was distributed among multiple processors. For this purpose a high-performance workstation powered by the six-core Intel® Core™ i7-980X processor was provided as a host to perform all the necessary calculations.
Hydraulic simulator. The input data for the model reduction algorithm are the water network topology and the simulated hydraulic behaviour of the considered water distribution network. For this purpose the open-source Epanet2 Toolkit [38] was used as a hydraulic simulator to perform an extended period simulation of the WDN hydraulic behaviour. The library consists of a set of procedures that allow a simulation to be run or stopped, simulation and network parameters to be modified, and the simulation data to be read or saved. The Epanet2 Toolkit also provides compatibility with the ".inp" (INP) format, which is a commonly recognised file format for storing water network models. Unfortunately, the functionality of this library is limited, and a number of additional C# scripts were written to allow dynamic hydraulic data export (an illustrative interop sketch is given at the end of this section).
Sparse matrices. The structure of a matrix representing a water distribution network is naturally sparse. Therefore, in order to exploit this sparsity, the additional open-source library Math.NET Numerics (http://mathnetnumerics.codeplex.com/) was investigated to provide sparse matrix operations and storage implementations.
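The sketch below illustrates the kind of interface meant above: an extended period simulation driven from C# by importing functions of the classic EPANET 2 toolkit from epanet2.dll. The wrapper class, the file names and the choice of exported functions are this example's own, not the authors' scripts, and the parameter code should be verified against the epanet2.h header of the toolkit version in use.

// Minimal P/Invoke sketch for driving an EPANET 2 extended period simulation from C#.
// Toolkit error codes are ignored here for brevity.
using System;
using System.Runtime.InteropServices;

static class Epanet2
{
    [DllImport("epanet2.dll")] public static extern int ENopen(string inpFile, string rptFile, string outFile);
    [DllImport("epanet2.dll")] public static extern int ENopenH();
    [DllImport("epanet2.dll")] public static extern int ENinitH(int saveFlag);
    [DllImport("epanet2.dll")] public static extern int ENrunH(ref int currentTime);
    [DllImport("epanet2.dll")] public static extern int ENnextH(ref int timeStep);
    [DllImport("epanet2.dll")] public static extern int ENgetnodevalue(int index, int code, ref float value);
    [DllImport("epanet2.dll")] public static extern int ENcloseH();
    [DllImport("epanet2.dll")] public static extern int ENclose();

    public const int EN_PRESSURE = 11;   // node parameter code (check against epanet2.h)
}

class SimulationExport
{
    // Runs the hydraulic simulation and prints the pressure at one node for every hydraulic time step.
    static void Main()
    {
        Epanet2.ENopen("network.inp", "network.rpt", "");   // hypothetical file names
        Epanet2.ENopenH();
        Epanet2.ENinitH(0);

        int t = 0, tStep = 1, nodeIndex = 1;
        do
        {
            Epanet2.ENrunH(ref t);
            float pressure = 0f;
            Epanet2.ENgetnodevalue(nodeIndex, Epanet2.EN_PRESSURE, ref pressure);
            Console.WriteLine("t = {0} s, pressure = {1}", t, pressure);
            Epanet2.ENnextH(ref tStep);                     // tStep becomes 0 after the last period
        } while (tStep > 0);

        Epanet2.ENcloseH();
        Epanet2.ENclose();
    }
}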

5. IMPLEMENTATION PROCESS The first approach to implementation was carried out based on the process illustrated in Fig. 2. First, a water network model stored in the INP file format is simulated with the aid of the Epanet2 Toolkit to obtain the hydraulic results. Next, the water network model is inspected to locate any rules or controls associated with the water network elements. Complex and large water networks modelled in Epanet2 often contain rules and controls that can decrease the accuracy of the simplification. It is highly recommended to eliminate the controls and rules and instead use time patterns resulting from the simulation of the original model (with controls and rules) and associate the patterns with the water network elements. Such an approach serves as a hydraulic benchmark when the original and reduced models are compared.


Fig. 2. The initial approach to the implementation process of the model reduction algorithm.

Note that in Epanet2 a user can associate rules or controls with pipes, transforming them in fact into valves. Since no time patterns can be assigned to a pipe, such rules or controls cannot be automatically eliminated. All components with controls or rules that could not be replaced with a time pattern are automatically selected for retention. The model preparation stage also involves a selection of other important hydraulic elements to be retained. Initially, it was assumed that the operator, based on their knowledge of the particular WDN, would choose network elements of significant importance in order to preserve the hydraulic characteristics over a wide range of operating conditions. However, even though this operation needs to be done only once for a particular model, it could be a difficult and time-consuming task; hence, the model reduction process would not be fully automatic, as required. A typical hydraulic simulation model contains thousands of pipes but only a few tanks, pumps or control valves. Therefore, the strategy adopted here is to reduce the number of pipes and nodes only and to retain all other important elements. The identified non-pipe components of a WDN are listed in Table 2. The default is to retain all these elements; alternatively, one can define a list of additional elements not to be removed.

Table 2. Important elements in a water distribution network.
Water distribution network elements:
Tanks (variable head)
Reservoirs (fixed head)
Pumps
Valves
Pipes with associated controls or rules
Nodes connected to any of the above

To support the user in deciding which elements should additionally be retained to replicate more accurately the hydraulic behaviour and layout of the original water network, a few tools were introduced at the preparation stage. They allow nodes to be selected based on their degree, i.e. the number of neighbours, or pipes to be selected based on their diameter, an approach adopted inter alia in [36]. Nodes with many neighbours are often selected as pressure logger locations and are crucial in water quality analysis. In turn, large-diameter pipes often form the skeleton of the network. Hence, the option to preserve critical nodes and large-diameter pipes allows the layout of the network to be retained, which is important in WDN design optimisation. Taking into account the aforementioned considerations, the implementation process was adjusted. The major adjustment was made to the model preparation stage, with the result that the input model is split into two sub-models. Sub-model A, containing pipes and nodes, is subjected to the reduction and afterwards reunited with the other part containing the non-pipe elements (Sub-model B) to form the complete reduced model, which is saved in the INP file format.

6. NUMERICAL EFFICIENCY OF THE SIMPLIFICATION ALGORITHM Once the implementation process was established, attention was directed to the following computational aspects of the implementation.

6.1. MATRIX STORAGE AND PARALLEL PROGRAMMING

The topology of a water network can be represented as an incidence matrix that describes the connectivity between pipes and nodes. Such a representation is also useful for implementation purposes, as the network topology can be explicitly stored in one of the data structures available in the C# specification. The considered data structures were single-dimensional arrays, multi-dimensional arrays and jagged arrays (arrays of arrays). A single-dimensional array is a list of variables whose elements are accessed through an index. A multi-dimensional array has two or more dimensions, and an individual element is accessed through the combination of two or more indices. A jagged array is an array of arrays in which the length of each inner array can differ; jagged array elements are also accessed with two or more indices [41]. As incidence matrices for water network topologies are usually sparse, the potential of the external C# library Math.NET Numerics was examined. This free library aims to provide methods and algorithms for numerical computations in science and supports both dense and sparse matrices [26]. Sparse matrices in Math.NET Numerics are represented in the 3-array compressed-sparse-row (CSR) format [11]. All the matrices used in the model reduction algorithm are rectangular, hence multi-dimensional arrays were the first natural choice for a matrix representation. Using multi-dimensional arrays, a real matrix data structure can be constructed with a two-dimensional array, one dimension for rows and another for columns.
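The three storage options just described can be sketched as follows; the sizes are arbitrary and the flattened index uses the usual row-major convention rather than anything prescribed here.

// Illustrative C# declarations of the three array types considered for storing an incidence matrix.
int rows = 166, cols = 166;

double[,] multiDim = new double[rows, cols];     // multi-dimensional (2-D) array: multiDim[i, j]
double[] flat = new double[rows * cols];         // single-dimensional (flattened) array: flat[i * cols + j]
double[][] jagged = new double[rows][];          // jagged array (array of arrays): jagged[i][j]
for (int i = 0; i < rows; i++)
    jagged[i] = new double[cols];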



However, the overall poor performance of the multi-dimensional arrays forced a search for a more effective way to store and multiply sparse matrices. First, the focus was put on the performance of matrix operations. The model reduction algorithm involves a number of matrix multiplications, so the speed of these calculations is a factor with a profound influence on the total algorithm calculation time. It was decided to investigate parallel programming to exploit the potential of recent multi-core CPUs. Parallel programming is often employed for highly compute-intensive algorithms. It follows the basic idea of decomposition, or division, of the data to be computed asynchronously by each processor. The process of decomposition depends on the algorithm to be parallelised and on the type of parallel computing architecture. A number of concurrent programming models have been developed over the years, e.g. the message passing interface or multi-threading. In general, all of them have a static or dynamic period for partitioning or dividing the quantity of data to be computed by each processor and, eventually, a subsequent period in which the intermediate computations are combined into the final result. There is universal agreement that writing multi-threaded code is difficult [45]. Fortunately, the .NET 4.0 Framework enhanced parallel programming by providing a new runtime, new class library types and new diagnostic tools [27]. These features allowed scalable parallel C# code to be implemented without having to work directly with threads or the thread pool. The inclusion of parallel programming techniques drastically reduced the algorithm calculation time; a minimal sketch of this approach is given after Table 3. Table 3 contains calculation times obtained on the workstation described in Section 4 for a medium-sized network which contained 3535 nodes, 3279 pipes, 12 tanks, 5 reservoirs, 19 pumps and 418 valves.

Table 3. Time taken to complete the simplification process for a medium-sized water network. The benchmark network contained 3535 nodes, 3279 pipes, 12 tanks, 5 reservoirs, 19 pumps and 418 valves.

CPU threads    Simplification process time
1              1h 36min 01s
2              1h 13min 37s
4              0h 36min 57s
12             0h 12min 38s
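As an illustration of the kind of parallelisation meant above, the sketch below distributes the outer row loop of a matrix product over the available cores with Parallel.For from System.Threading.Tasks (available since .NET 4.0); the method and matrix names are illustrative only.

// Parallelising the outer loop of C = A * B over 2-D arrays with the .NET Task Parallel Library.
using System.Threading.Tasks;

static class ParallelMatrix
{
    public static double[,] Multiply(double[,] A, double[,] B)
    {
        int n = A.GetLength(0), k = A.GetLength(1), m = B.GetLength(1);
        var C = new double[n, m];
        // Each row of the result is independent of the others, so rows are computed concurrently.
        Parallel.For(0, n, i =>
        {
            for (int j = 0; j < m; j++)
            {
                double sum = 0.0;
                for (int p = 0; p < k; p++)
                    sum += A[i, p] * B[p, j];
                C[i, j] = sum;
            }
        });
        return C;
    }
}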
The obtained model reduction times were satisfactory for the requirements of the control strategy described in [44]. The models to be considered in that study would be significantly smaller than the model used as a benchmark in Table 3. The calculation time to perform all optimisation computations was under 4 minutes on average and was never longer than 5 minutes [44]. Hence, the achieved time of less than 15 minutes to reduce a much more complex model is more than adequate to perform all computations (model reduction and optimisation) within the desired time interval of 1 hour. However, during development of the model reduction module it was decided to transform it into a fully standalone application that could be employed in other research projects carried out in WSS; models in these projects often contain more than 25000 elements. Hence, further research was conducted aimed at an extra reduction of the total calculation time. One of the techniques often used by programmers to speed up matrix operations is flattening, i.e. the representation of multi-dimensional arrays using single-dimensional arrays. Flattening a multi-dimensional array into a single-dimensional array can yield better performance, as in the .NET Framework single-dimensional arrays have faster access to their elements due to optimisations in the Common Language Runtime (CLR). Also, the use of jagged arrays instead of multi-dimensional arrays can improve matrix computations, as jagged arrays are made of single-dimensional arrays. The results of this examination are presented in Table 4, which shows the average times taken to multiply sparse matrices constructed from real water network topologies.

Table 4. Performance of different C# implementations of matrix multiplication. Column A: 166 × 166 sparse matrix with 566 non-zero elements, average time from 1000 runs [s]. Column B: 3552 × 3552 sparse matrix with 10005 non-zero elements, average time from 10 runs [s].

Matrix storage and multiplication implementation                          A         B
1-D Arrays                                                                0.0239    386.751
1-D Arrays (parallel programming)                                         0.0066    71.958
2-D Arrays                                                                0.0493    395.336
2-D Arrays (parallel programming)                                         0.0132    74.770
Dense Matrix (Math.NET Numerics)                                          0.0095    60.879
Sparse Matrix (Math.NET Numerics)                                         0.2201    1467.963
Jagged Arrays                                                             0.0289    896.050
Jagged Arrays (parallel programming)                                      0.0065    151.934
Jagged Arrays (single-indexing access, ijk order, parallel programming)   0.0050    20.247

It can be seen that in all cases the introduction of parallel programming sped up the calculations, in some cases even five-fold. For the small matrix (166 × 166) the single-dimensional (1-D) arrays outperform the two-dimensional (2-D) arrays, but for the large matrix the difference between these approaches is not so evident. The potential of the external C# library [26] was also examined. Unfortunately, while the Math.NET dense matrices were fast, especially for larger matrices, the Math.NET sparse matrices, due to their CSR format, were the slowest among the tested approaches. On the other hand, the storage space of the Math.NET sparse matrix was the smallest. The jagged arrays performed similarly to the 1-D arrays in the case of the 166 × 166 sparse matrix, but for the 3552 × 3552 sparse matrix the jagged arrays performed more slowly. Nonetheless, the jagged arrays were considered to be a replacement for the multi-dimensional arrays rather than for the 1-D arrays. The main reason behind this is the maximum storage space for C# data structures. The maximum object size in the CLR in the .NET 4.0 Framework is limited to 2 GB for a 32-bit application [28]. Moreover, due to CLR memory overheads the actual memory limit is around 1.3 GB. Tests performed on the host machine revealed the memory allocation limits for C# data structures (see Table 5).

Table 5. Memory allocation limits for C# data structures. Tests were performed on the workstation with 4 GB RAM. Note that in C# the size of the double type is 8 bytes.

Data structure    Maximum allocated memory [Megabytes]    Maximum size of n × n matrix of double elements
2-D array         1001                                    11185
Flattened array   1183                                    12160
Jagged arrays     1530                                    13829
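The two columns of Table 5 can be cross-checked directly: for a square matrix of double elements, the largest feasible dimension is roughly the square root of the allocatable memory divided by 8 bytes per element. For example, for the flattened array (taking a megabyte as 10^6 bytes):

\[
n_{\max} \approx \sqrt{\frac{M}{8}} , \qquad
\sqrt{\frac{1183 \times 10^{6}}{8}} \approx 12\,160 ,
\]

which agrees with the measured value in Table 5; the same calculation closely reproduces the 2-D array and jagged array limits.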
As can be seen in Table 5 the jagged arrays allowed to allocate the biggest amount of memory. It is because due to memory fragmentation it is easier to find available memory for jagged arrays, i.e. it is more likely that there will be number of blocks of smaller size available than a single, continuous block of the full size of the array which is required to allocated single and multi-dimensional arrays. Moreover, the performance of jagged arrays can be improved even more by employing techniques of single-indexing and ordering indices ijk for matrix multiplication (see e.g. [20] for details). These modifications allowed to reduce even more calculation times for matrices multiplication when using jagged arrays (see the bottom row in Table 4). Combination of parallel programming and jagged arrays (single-indexing access, ijkorder) resulted that the overall simplification time of the benchmark water network used in Table 3 was reduced to 1 min 35 seconds. More methods or tools such as cache optimisation [22], Compute Unified Device Architecture(CUDA) [47] or native libraries can be introduced for further improvement of jagged arrays performance. However, it was decided that further research in this direction would not provide enough gain for the effort needed. Therefore, it was decided to seek for other numerical techniques that can increase speed of calculation in the most compute-intensive algorithm of the model reduction procedure; i.e. Gaussian elimination.



6.2. NODE REMOVAL ORDERING

Gaussian elimination is the most compute-intensive procedure involved in the model reduction algorithm. When dense matrices are considered, one iteration of Gaussian elimination uses O(n^2) arithmetic operations and, as n iterations must be performed, the procedure needs O(n^3) arithmetic operations to complete [24]. Since its introduction, Gaussian elimination and its performance have been of strong interest to researchers from many disciplines, especially in areas where Gaussian elimination is applied to a sparse matrix, and many variations have been developed over the years, often designed for a particular application. As noted in [39], Gaussian elimination on the original matrix can result in disastrous fill-ins. Fill-ins are additional non-zeros generated during the elimination. To illustrate this, consider the simple network in Fig. 3, in which nodes b, d and e are to be deleted. When a node is removed, additional links are created between any two of its neighbours that were not already adjacent. When the removal starts from node b and proceeds in the order b, d, e, five links (fill-ins) are created, see Fig. 3a, whereas when the removal starts from node e and proceeds in the order e, d, b, only one fill-in (between a and c) is added, see Fig. 3b. A short sketch that reproduces this kind of counting is given after Fig. 3.

Fig. 3. Change in the number of fill-ins generated due to the order of node removal: (a) order of removal bde, #fill-ins = 5; (b) order of removal edb, #fill-ins = 1.
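The effect of the elimination order can be reproduced with a few lines of code: eliminating a node connects all pairs of its remaining neighbours that are not yet adjacent, and every such new edge is a fill-in. The sketch below counts fill-ins for a given removal order on an adjacency-set representation; the small example graph is hypothetical and not necessarily the network of Fig. 3.

// Counting the fill-ins produced by eliminating graph nodes in a given order.
using System;
using System.Collections.Generic;
using System.Linq;

class FillInDemo
{
    static int CountFillIns(Dictionary<char, HashSet<char>> adj, char[] order)
    {
        int fillIns = 0;
        foreach (char v in order)
        {
            char[] nb = adj[v].ToArray();
            for (int i = 0; i < nb.Length; i++)
                for (int j = i + 1; j < nb.Length; j++)
                    if (adj[nb[i]].Add(nb[j]))          // new edge between two neighbours of v
                    {
                        adj[nb[j]].Add(nb[i]);
                        fillIns++;
                    }
            foreach (char u in nb) adj[u].Remove(v);    // delete v from the graph
            adj.Remove(v);
        }
        return fillIns;
    }

    static Dictionary<char, HashSet<char>> Graph()
    {
        // A hypothetical 5-node, 5-edge example.
        string[] edges = { "ab", "bc", "bd", "be", "de" };
        var adj = new Dictionary<char, HashSet<char>>();
        foreach (string e in edges)
        {
            if (!adj.ContainsKey(e[0])) adj[e[0]] = new HashSet<char>();
            if (!adj.ContainsKey(e[1])) adj[e[1]] = new HashSet<char>();
            adj[e[0]].Add(e[1]);
            adj[e[1]].Add(e[0]);
        }
        return adj;
    }

    static void Main()
    {
        Console.WriteLine(CountFillIns(Graph(), new[] { 'b', 'd', 'e' }));   // 5 fill-ins for this graph
        Console.WriteLine(CountFillIns(Graph(), new[] { 'e', 'd', 'b' }));   // 1 fill-in for this graph
    }
}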

Therefore, the aim of most research is to produce far fewer fill-ins during Gaussian elimination and thereby reduce the computation time and storage space. To address the problem of fill-ins, a commonly used technique called reordering can be applied to sparse matrices [35]. The idea is to permute the rows, the columns, or both the rows and columns of a sparse matrix. By applying reordering algorithms, the zero and non-zero elements of a sparse matrix are rearranged such that Gaussian elimination deals with it much more efficiently. The amount of fill-in depends on the chosen ordering [35]. However, fill-in minimisation is impractical to solve exactly, hence heuristics are used [8]. The most widely recognised and applied ordering algorithms are Cuthill-McKee (CMK) [7], reversed Cuthill-McKee (RCMK) [15], minimum degree (MD) [14], Gibbs-Poole-Stockmeyer (GPS) [17] and nested dissection (ND) [16]. More details about the ordering algorithms can be found in [18, 35, 12, 8]. Following the observation in [15] that reversed Cuthill-McKee ordering yields a better scheme for sparse Gaussian elimination, it was decided to use RCMK to reorder the Jacobian matrix prior to Gaussian elimination. The goal of the original RCMK algorithm is to order nodes locally so that adjacent nodes are numbered as closely as possible. Another algorithm chosen for closer investigation was MD, as it is perhaps the most popular strategy for reducing the amount of fill-in during sparse Gaussian elimination [39]. This strategy selects the node with the smallest degree as the next pivot row, i.e. the row that introduces the fewest non-zeros at the corresponding step of Gaussian elimination [35]. Note that Gaussian elimination is applied from the bottom in the simplification algorithm, hence the obtained RCMK and MD orderings were reversed accordingly (RCMK becomes CMK). The CMK algorithm used to reorder the nodes prior to the calculation of the Jacobian matrix was based on its queue-based version, subsequently adapted for water network model reduction, i.e. only the nodes to be removed are ordered; a sketch of this queue-based ordering is given below. The MD algorithm, given in [2], was modified in the same way. The effectiveness of the CMK algorithm depends critically on the choice of the starting node. The starting node may be one of minimum degree [35] or a pseudo-peripheral node, as proposed in [12]. Here, the latter heuristic was implemented to determine the best starting node for the CMK ordering. A comparison of the original and ordered Jacobian matrices is shown in Fig. 4. The original Jacobian matrix J used in this illustration was obtained from a real water network comprising 166 nodes. The CMK ordering transformed the structure of the original sparse matrix J shown in Fig. 4a into a band diagonal form, as depicted in Fig. 4c. While CMK is oriented towards sparse matrix profile reduction, MD aims at reducing the number of fill-ins. Hence, the ordered structures are clearly distinct; compare Fig. 4c and Fig. 4e. The right-hand plots depict the respective Jacobian matrices after Gaussian elimination. It can be clearly seen that the number of fill-ins, 1212, in the reduced unordered matrix J^S is much higher than for the ordered versions: 755 for CMK and 544 for MD, respectively.
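For reference, a minimal queue-based Cuthill-McKee ordering over an adjacency-list graph is sketched below; the reversed variant simply reverses the returned list. The restriction to the nodes selected for removal and the pseudo-peripheral start-node heuristic used in the actual implementation are not shown.

// Queue-based Cuthill-McKee ordering: a breadth-first traversal in which the unvisited
// neighbours of each dequeued node are enqueued in order of increasing degree.
// Only the connected component of the start node is ordered; disconnected parts would
// require restarting from an unvisited node.
using System.Collections.Generic;
using System.Linq;

static class Ordering
{
    public static List<int> CuthillMcKee(List<int>[] adj, int start)
    {
        var visited = new bool[adj.Length];
        var order = new List<int>(adj.Length);
        var queue = new Queue<int>();
        visited[start] = true;
        queue.Enqueue(start);
        while (queue.Count > 0)
        {
            int v = queue.Dequeue();
            order.Add(v);
            foreach (int u in adj[v].Where(x => !visited[x]).OrderBy(x => adj[x].Count))
            {
                visited[u] = true;
                queue.Enqueue(u);
            }
        }
        // The reversed variant (RCMK) simply reverses this list: order.Reverse();
        return order;
    }
}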



Fig. 4. Illustrating original and ordered Jacobian matrices for a water network with 166 nodes: (a) unordered Jacobian matrix J (nz = 566); (b) matrix J^S after Gaussian elimination (nz = 1212); (c) reversed Cuthill-McKee ordering of the Jacobian matrix, J_RCM (nz = 566); (d) matrix J^S_RCM after Gaussian elimination (nz = 755); (e) minimum degree ordering of the Jacobian matrix, J_MD (nz = 566); (f) matrix J^S_MD after Gaussian elimination (nz = 544).



The reduction in fill-ins was confirmed when the number of calculations (multiplications and divisions) was tracked. The MD ordering resulted in the smallest number of calculations needed during Gaussian elimination, see Table 6.

Table 6. Number of calculations needed to complete Gaussian elimination for unordered and ordered matrices.

Ordering algorithm         Number of calculations
unordered                  25704
reversed Cuthill-McKee     17794
minimum degree             15318
Although the post-elimination matrices have a completely different structure there is no difference between the reduced networks. The obtained numerical results were the same in the parts of “reduced� Jacobian matrices that will be used in the next step to recreate the simplified nonlinear water networks. Ultimately, the choice whether to use of ordering algorithm is determined by size of the water network to be simplified. For water network models with the number of nodes n < 500 no ordering is applied. For larger problems the CMK ordering is chosen; despite it is not optimal it is very fast and easy to implement. Time complexity of CMK for a dense matrix is O(qmax m) where qmax is the maximum degree of any node and m is the number of links (edges) [13]. However, for sparse matrices the CMK time complexity is reduced to O(n) [31]. Whereas in case of MD its worst-case requires a O(n2 m) runtime [21]; for sparse problems MD has also much better runtime but as observed by [4] and [10] MD does not always succeed and can produce ordering worse than original while CMK ordering is found to be equivalent or slightly better than the natural ordering. However, the biggest profit from the ordering was reduction of time needed for Gaussian elimination. When the model reduction algorithm was applied to the benchmark network used in Table 3, but preceded with CMK or MD ordering of Jacobian matrix, the computational time was reduced to less than 5 seconds. Of course, much more research can be done in this area and investigate other reordering techniques such GPS, ND, approximate minimum degree [2], and their variations. But, such research would be beyond scope of this work. Nevertheless, it might form an interesting study to investigate effects of different orderings on performance of processing water network graphs represented by sparse matrices.



7. CONCLUSIONS This study has dealt with the implementation of the extended model reduction algorithm described in [30]. The process of design and development of the research software has been described, with a focus on the computational research aspects that emerged. Different implementation approaches and their limitations have been investigated. The implementation and graphical user interface were coded in the C# programming language. The Epanet2 Toolkit was used as a hydraulic simulator to perform an extended period simulation of the WDN hydraulic behaviour. Parallel programming techniques were employed to distribute the workload of the algorithm across multiple CPU cores, which nowadays are present in the majority of PCs. The limitations of the available data structures for storing a matrix representation of a water network, along with the benefits of sparse matrix reordering prior to Gaussian elimination, have been examined. The utilisation of parallel programming techniques and sparse matrix ordering algorithms drastically increased the speed of model simplification. The final implementation took into account the outcomes of the investigations of parallel programming, storage structures for sparse matrices and node pre-ordering. The developed software is able to simplify a water network model consisting of several thousand elements within seconds of calculation time. The advantage of this near real-time model reduction is that it can be used to manage abnormal situations and structural changes to a network, e.g. the isolation of part of a network due to a pipe burst. In such a case an operator can change the full hydraulic model and run the model reduction software to automatically produce an updated simplified model. The WDN model reduction software could be integrated with other concepts applied to the WDN, or it can be used as a standalone tool for the purpose of model simplification only. The present tool is aimed at returning a simplified WDN topology which can still be used to perform hydraulic simulation.

REFERENCES
[1] ALZAMORA, F., ULANICKI, B., AND SALOMONS, E. A fast and practical method for model reduction of large scale water distribution networks. Journal of Water Resources Planning and Management (2012).
[2] AMESTOY, P. R., DAVIS, T. A., AND DUFF, I. S. An approximate minimum degree ordering algorithm. SIAM Journal on Matrix Analysis and Applications 17, 4 (1996), 886–905.


[3] ANDERSON, E., AND AL-JAMAL, K. Hydraulic-network simplification. Journal of Water Resources Planning and Management 121, 3 (1995), 235–240.
[4] BENZI, M. Preconditioning techniques for large linear systems: a survey. Journal of Computational Physics 182, 2 (2002), 418–477.
[5] BOUNDS, P., KAHLER, J., AND ULANICKI, B. Efficient energy management of a large-scale water supply system. Civil Engineering and Environmental Systems 23, 3 (2006), 209–220.
[6] BROAD, D., MAIER, H., AND DANDY, G. Optimal operation of complex water distribution systems using metamodels. Journal of Water Resources Planning and Management 136, 4 (2010), 433–443.
[7] CUTHILL, E., AND MCKEE, J. Reducing the bandwidth of sparse symmetric matrices. In Proceedings of the 24th National Conference (New York, NY, USA, 1969), ACM, pp. 157–172.
[8] DAVIS, T. A. Direct methods for sparse linear systems. SIAM, 2006.
[9] DEUERLEIN, J. Decomposition model of a general water supply network graph. Journal of Hydraulic Engineering 134, 6 (2008), 822–832.
[10] DUFF, I. S., AND MEURANT, G. A. The effect of ordering on preconditioned conjugate gradients. BIT 29, 4 (Dec. 1989), 635–657.
[11] FARZANEH, A., KHEIRI, H., AND SHAHMERSI, M. A. An efficient storage format for large sparse matrices. Communications Series A1 Mathematics & Statistics 58 (2009), 1–10.
[12] GEORGE, A., LIU, J., AND NG, E. Computer Solution of Sparse Linear Systems. Academic Press, Orlando, Florida, 1994.
[13] GEORGE, A., AND LIU, J. W. Computer Solution of Large Sparse Positive Definite Systems. Prentice Hall Professional Technical Reference, 1981.
[14] GEORGE, A., AND LIU, J. W. The evolution of the minimum degree ordering algorithm. SIAM Review 31, 1 (1989), 1–19.
[15] GEORGE, J. A. Computer implementation of the finite element method. PhD thesis, Department of Computer Science, Stanford University, 1971.

62


[16] GEORGE, J. A. Nested dissection of a regular finite element mesh. SIAM Journal on Numerical Analysis 10, 2 (1973), 345–363.
[17] GIBBS, N. E., POOLE, JR., W. G., AND STOCKMEYER, P. K. An algorithm for reducing the bandwidth and profile of a sparse matrix. SIAM Journal on Numerical Analysis 13, 2 (1976), 236–250.
[18] GIBBS, N. E., POOLE, JR., W. G., AND STOCKMEYER, P. K. A comparison of several bandwidth and profile reduction algorithms. ACM Transactions on Mathematical Software (TOMS) 2, 4 (1976), 322–330.
[19] GIUSTOLISI, O., AND TODINI, E. Pipe hydraulic resistance correction in WDN analysis. Urban Water Journal 6, 1 (2009), 39–52.
[20] GOLUB, G. H., AND VAN LOAN, C. F. Matrix computations, vol. 3. JHU Press, 2012.
[21] HEGGERNES, P., EISESTAT, S., KUMFERT, G., AND POTHEN, A. The computational complexity of the minimum degree algorithm. Tech. rep., DTIC Document, 2001.
[22] LAM, M. D., ROTHBERG, E. E., AND WOLF, M. E. The cache performance and optimizations of blocked algorithms. In Proceedings of the Fourth International Conference on Architectural Support for Programming Languages and Operating Systems (1991), ACM, pp. 63–74.
[23] LIPPAI, I. Water system design by optimization: Colorado Springs utilities case studies. In Pipelines 2005 (2005), pp. 1058–1070.
[24] LOVÁSZ, L., AND GÁCS, P. Complexity of algorithms. Lecture Notes, Boston University, Yale University (1999).
[25] MASCHLER, T., AND SAVIC, D. A. Simplification of water supply network models through linearisation. Tech. Rep. 9901, Centre for Water Systems, School of Engineering, University of Exeter, Exeter, UK, 1999.
[26] MATH.NET NUMERICS. http://mathnetnumerics.codeplex.com/, 2013.
[27] MICROSOFT. Parallel programming in the .NET framework. http://msdn.microsoft.com/en-us/library/dd460693.aspx [Accessed May 2011], May 2011.


[28] MICROSOFT. Memory limits for Windows releases. http://msdn.microsoft.com/en-us/library/aa366778 [Accessed May 2013], May 2013.
[29] MUNASSAR, N., AND GOVARDHAN, A. A comparison between five models of software engineering. International Journal of Computer Science Issues 7, 5 (2010), 94–101.
[30] PALUSZCZYSZYN, D., SKWORCOW, P., AND ULANICKI, B. Online simplification of water distribution network models for optimal scheduling. Journal of Hydroinformatics 15, 3 (2013), 652–665.
[31] PEDROCHE, F., REBOLLO, M., CARRASCOSA, C., AND PALOMARES, A. LRCM: a method to detect connected components in undirected graphs by using the Laplacian matrix and the RCM algorithm. arXiv preprint arXiv:1206.5726 (2012).
[32] PERELMAN, L., MASLIA, M. L., OSTFELD, A., AND SAUTNER, J. B. Using aggregation/skeletonization network models for water quality simulations in epidemiologic studies. Journal of American Water Works Association 100, 6 (2008), 122–133.
[33] PERELMAN, L., AND OSTFELD, A. Aggregation of water distribution systems for contamination detection. In Water Distribution Systems Analysis Symposium (Cincinnati, Ohio, USA, 2006), pp. 1–13.
[34] PERELMAN, L., AND OSTFELD, A. Water distribution system aggregation for water quality analysis. Journal of Water Resources Planning and Management 134, 3 (2008), 303–309.
[35] PISSANETZKY, S. Sparse matrix technology. Academic Press, London, 1984.
[36] PREIS, A., WHITTLE, A., OSTFELD, A., AND PERELMAN, L. On-line hydraulic state estimation in urban water networks using reduced models. In Computing and Control in the Water Industry (CCWI 2009) Integrating Water Systems (Sheffield, UK, 2009), J. Boxall and C. Maksimovic, Eds., CRC Press, pp. 319–324.
[37] PREIS, A., WHITTLE, A., OSTFELD, A., AND PERELMAN, L. Efficient hydraulic state estimation technique using reduced models of urban water networks. Journal of Water Resources Planning and Management 137, 4 (2011), 343–351.
[38] ROSSMAN, L. EPANET 2 Programmers Toolkit. Risk Reduction Engineering Laboratory, Office of Research & Development, US Environmental Protection Agency, Cincinnati, Ohio, USA, 2000.


[39] SAAD, Y. Iterative methods for sparse linear systems. SIAM, 2003.
[40] SALDARRIAGA, J., OCHOA, S., RODRIGUEZ, D., AND ARBELEZ, J. Water distribution network skeletonization using the resilience concept. In 10th Annual Water Distribution Systems Analysis Conference WDSA2008 (Kruger National Park, South Africa, 2008), pp. 852–864.
[41] SCHILDT, H. C# 4.0: The Complete Reference. McGraw-Hill, 2010.
[42] SHAMIR, U., AND SALOMONS, E. Optimal real-time operation of urban water distribution systems using reduced models. Journal of Water Resources Planning and Management 134, 2 (2008), 181–185.
[43] SKWORCOW, P., PALUSZCZYSZYN, D., ULANICKI, B., RUDEK, R., AND BELRAIN, T. Optimisation of pump and valve schedules in complex large-scale water distribution systems using GAMS modelling language. In 12th International Conference on Computing and Control for the Water Industry, CCWI2013 (Perugia, Italy, 2013).
[44] SKWORCOW, P., ULANICKI, B., ABDEL MEGUID, H., AND PALUSZCZYSZYN, D. Model predictive control for energy and leakage management in water distribution systems. In UKACC International Conference on Control (Coventry, UK, 2010), pp. 990–995.
[45] SUTTER, H., AND LARUS, J. Software and the concurrency revolution. ACM Queue 3, 7 (2005), 54–62.
[46] ULANICKI, B., ZEHNPFUND, A., AND MARTINEZ, F. Simplification of water network models. In Hydroinformatics 1996: Proceedings of the 2nd International Conference on Hydroinformatics (Zurich, Switzerland, 1996), A. Muller, Ed., vol. 2, ETH, International Association for Hydraulic Research, pp. 493–500.
[47] VAZQUEZ, F., FERNANDEZ, J. J., AND GARZON, E. M. A new approach for sparse matrix vector product on NVIDIA GPUs. Concurrency and Computation: Practice and Experience 23, 8 (2011), 815–826.



Computer Systems Engineering 2012 Keywords: mesh, torus, scheduling, task allocation, First Fit, SBA, SBAT, RCFF

Grzegorz BOROWIEC Aleksandra POSTAWKA∗ Leszek KOSZAŁKA†

STATIC TASKS ALLOCATION ALGORITHMS AND INFLUENCE OF ARCHITECTURE ON MESH STRUCTURED NETWORKS PERFORMANCE

This paper considers the efficiency of modified Stack-Based Allocation and First Fit static task allocation algorithms in mesh structured networks. The main idea is to add some new connections to the standard rectangular mesh architecture and to analyse its performance. Two task allocation algorithms are implemented for three architectures: rectangle, cylinder and torus. They are evaluated from the perspective of minimizing the total task processing times. The influence of different task scheduling approaches on the mesh properties is also examined.

1. INTRODUCTION

The motivation for this paper, i.e., the comparison of mesh structures, was the observation of free nodes at the edges of a grid (mesh). The rectangle is the standard two-dimensional architecture of Mesh Structured Networks (MSN). Adding a few additional connections can create new, more advanced architectures like the cylinder or torus, which are characterized by greater usage due to smaller fragmentation. In this paper simple modifications of the rectangle structure, such as the cylinder and torus, are considered, and the discussed task allocation problem is similar to [1]. For some architectures a task can be allocated only if there are enough free nodes corresponding to the dimensions of the task, i.e., so-called contiguous tasks. On the other hand, for others a task can be split into subtasks (see [7], [8]), but this increases the transmission overhead.

∗ Computer Architecture Group, Wrocław University of Technology, Poland, e-mail: aleksandra.postawka@pwr.wroc.pl
† Department of Systems and Computer Networks, Wrocław University of Technology, Poland, e-mail: leszek.koszalka@pwr.wroc.pl



Therefore, in this paper, we focus on the allocation of contiguous tasks. We also use modifications of the First Fit (FF) and Stack-Based Allocation (SBA) algorithms for the torus and cylinder architectures. The description of the FF algorithm is given in [2], [3] or [4], whereas more details about the SBA algorithm can be found in [1], [2], [5], [6]. For some other related papers see [9], [10], [11]. The rest of the paper is organized as follows. Section 2 contains the problem formulation. In Section 3 descriptions of the considered algorithms are given, and the numerical experiments are presented subsequently. The last section concludes the paper.

2. PROBLEM FORMULATION

The problem considered in this paper is formulated as follows. There are given a queue, which contains a set of tasks with known sizes and processing times, and a rectangular mesh network M(W, H) which consists of W · H processors arranged in a two-dimensional grid: M(W, H) = {(x, y) : x, y ∈ Z ∧ x ∈ [0, W) ∧ y ∈ [0, H)}. Each node represents a single processor and is identified by its (x, y) coordinates, where the (0, 0) coordinate is the bottom-left corner. Moreover, a subgrid SM(<i, j>, <w, h>) is a rectangular grid which completely belongs to M(W, H), where <i, j> indicates the bottom-left corner and <w, h> denotes the width and height of the subgrid. For the mesh network M(W, H) its subgrid SM(<i, j>, <w, h>) can be defined as follows: SM(<i, j>, <w, h>) = {(x, y) : x, y ∈ Z ∧ x ∈ [i, i + w) ∧ y ∈ [j, j + h)}, where i + w ≤ W, j + h ≤ H. A free subgrid is a subgrid in which all the processors are free, and an allocated subgrid is a subgrid in which all the processors are allocated to tasks [3]. It is also assumed that tasks cannot overlap each other. Let J(w, h, t) denote a rectangular task, where w, h and t are its width, height and processing time, respectively. A task can be allocated starting from (i, j); thus a submesh SM(<i, j>, <w, h>) is allocated. The task completely occupies the set of busy nodes until it is finished. The objective is to find a feasible allocation of tasks which minimizes their processing times. Let us define some useful terms. A coverage CJ of a task J (i.e., one waiting in the queue) is the set of nodes where J cannot be allocated without overlapping another task. A base subgrid BJ of task J is a set of nodes where J can be allocated (without overlapping other tasks). A candidate area CAJ for the task J is a set of nodes which is considered to become the base subgrid. A reject area RJ of task J is the set of nodes whose use as the base subgrid origin would cause the task to cross the boundary of the mesh; a simple sketch of this allocation condition is given below. In the remainder of this paper it is assumed that queues are organized as FCFS (first-come first-served) queues, so if the next task cannot be allocated, the algorithm waits until some task completes and a suitable free subgrid can be found.
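As a concrete reading of these definitions, the sketch below (with illustrative names only) tests whether the subgrid SM(<i, j>, <w, h>) lies inside M(W, H) and is entirely free, which is the condition for allocating a task J(w, h, t) at (i, j) on the basic rectangular mesh.

// Occupancy-grid test for allocating a w x h task at the bottom-left corner (i, j) of a W x H mesh.
static class MeshAllocation
{
    public static bool CanAllocate(bool[,] busy, int i, int j, int w, int h)
    {
        int W = busy.GetLength(0), H = busy.GetLength(1);
        if (i + w > W || j + h > H)       // the subgrid would cross the mesh boundary (reject area)
            return false;
        for (int x = i; x < i + w; x++)
            for (int y = j; y < j + h; y++)
                if (busy[x, y])           // some processor inside the subgrid is already allocated
                    return false;
        return true;                      // SM(<i, j>, <w, h>) is a free subgrid for task J(w, h, t)
    }
}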

3. ALGORITHMS

In this section, the considered algorithms are described.

3.1. RECOGNITION COMPLETE FIRST FIT (RCFF)

The Recognition Complete First Fit (RCFF) algorithm is based on the approach used in the First Fit algorithm, but it is additionally modified for the cylinder and torus architectures. A full description of FF is presented in [2], [3] and [4]. Firstly, the coverage set CJ and the reject area RJ for the new task J are prepared. FF sequentially searches the nodes, starting from (0, 0), in order to find the first node which does not belong to CJ or RJ (and thus can be used for the allocation of J). If there is no such node and the width of J differs from its height, then the task is rotated and the algorithm attempts to allocate it again. If the allocation fails, then the algorithm has to wait. This is easy and fast for small networks. However, in larger networks searching all the nodes is expensive due to the complexity of the allocation algorithm, which is O(wh).

3.2. RCFF FOR CYLINDER AND TORUS

In the case of the rectangle, to allocate the task J(w, h, t) in the mesh M(W, H), coordinates 0 ≤ i ≤ W − w and 0 ≤ j ≤ H − h are considered, which do not belong to the reject area RJ. An example of RJ for a rectangular mesh network is shown in Fig. 1(b); the reject area is calculated for the task J(1, 2, 5) shown in Fig. 1(a). In the case of the cylinder, tasks can be allocated not only in the upper-left part of the grid but at one of the edges as well. Thus, tasks are wrapped and placed on the opposite edge, and RJ is calculated only for the OY axis (Fig. 1(c)). In order to allocate the task J(w, h, t) in the cylinder structured network M(W, H), nodes with the coordinates 0 ≤ i ≤ W − 1 and 0 ≤ j ≤ H − h are considered. In the case of the torus structured network there are no borders and the first task can be placed at any node. Consequently, the special feature of the torus architecture is the lack of a reject area RJ (Fig. 1(d)). To allocate the task J(w, h, t) in the grid M(W, H), the nodes with coordinates 0 ≤ i ≤ W − 1 and 0 ≤ j ≤ H − 1 are considered. In comparison with the rectangle architecture, less space is left unused. Despite the fact that in the torus architecture there is no reject area, additional coverage areas have to be taken into consideration (which are inessential in the rectangle architecture). An example is shown in Fig. 2: not only the tasks are wrapped around the edges, but also the coverage. The cylinder is a special case of the torus architecture, thus only the situation shown in Fig. 2(b) has to be considered.

Fig. 1. Reject areas for different architectures: (a) task; (b) rectangle; (c) cylinder; (d) torus.

In practice, the situation shown in Fig. 3(a) may occur: free nodes are located at the edges of the grid, but another task J1(2, 5, 10) cannot be allocated. After extending the connections to the cylinder structure, the same task can be allocated, as shown in Fig. 3(b). To allocate a task on the cylinder or torus grid, some modifications have to be introduced. When the task crosses the edges of the mesh, the pointer exceeds its range and is wrapped to the node on the opposite side; this is the main modification of the First Fit algorithm. Let us consider the situation shown in Fig. 3(c). The task J0(3, 3, 5) is being performed and it occupies the subgrid

Fig. 2. Torus coverage: (a) task; (b) case 1; (c) case 2.



SM(<1, 1>, <3, 3>). The next job in the queue is J1(5, 2, 7), so it will be allocated on the subgrid SM(<0, 4>, <5, 2>). During the allocation the pointer for the OY axis exceeds its maximum value of 4, thus it is brought back into range (to zero) by performing a modulo operation. The next task to allocate is J2(2, 3, 5). The FF algorithm allocates it on the submesh SM(<4, 1>, <2, 3>). We face a similar situation: the pointer for the OX axis exceeds the maximum value of 4 and the modulo operation is performed. The formulas which give the proper pointer values are as follows: x = x mod W, y = y mod H. A short sketch of this wrapped addressing is given below.
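The sketch that follows is a minimal, illustrative rendering of the wrapped addressing described above (names are this example's own; the authors' experimentation environment is written in C++, but the logic is identical): a raw task-cell coordinate is mapped onto the node it actually occupies, with wrapping enabled per axis depending on the architecture.

// Wrapped node addressing for rectangle, cylinder and torus meshes.
static class WrappedMesh
{
    public enum MeshType { Rectangle, Cylinder, Torus }

    // OX wrapping is allowed for the cylinder and torus; OY wrapping only for the torus
    // (the cylinder keeps its reject area along the OY axis).
    public static bool TryWrap(MeshType type, int x, int y, int W, int H, out int wx, out int wy)
    {
        bool wrapX = type != MeshType.Rectangle;
        bool wrapY = type == MeshType.Torus;
        wx = x; wy = y;
        if (x >= W) { if (!wrapX) return false; wx = x % W; }
        if (y >= H) { if (!wrapY) return false; wy = y % H; }
        return true;   // (wx, wy) is the node actually occupied by this task cell
    }
}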

Fig. 3. Task allocation for different structures: (a) rectangle; (b) cylinder; (c) torus.

3.3. SBA, ISBA AND SBAT

The Stack Based Allocation (SBA) algorithm uses the elementary operation of area subtraction; a sketch of this elementary operation is given at the end of this subsection. A full description of SBA can be found in [2] or [5]. In the first step the coverage set CJ for the current task J is calculated. Together with the candidate area (at the beginning this is the entire mesh without the reject area RJ), it is placed on the top of the stack. The coverage is subtracted from the candidate areas and the results (new candidate areas with reduced coverage sets) are placed on the stack again. The algorithm works until the stack is empty or a candidate area without any remaining coverage to subtract (which means a base area BJ) is found; this area is used for the allocation of task J. If the task cannot be allocated, then it is rotated and the algorithm attempts to allocate it again. If the allocation fails, the algorithm waits. Stack Based Allocation for Torus (SBAT) is a modification of the Improved Stack Based Algorithm (ISBA), which is described in [1]. SBAT is more complex because tasks can also be allocated on the borders, so they are wrapped around two edges of the grid. As previously mentioned, this causes CJ to wrap together with them. Consequently, the subtraction algorithms for the torus and cylinder architectures have to take into account more special cases.
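The elementary operation underlying SBA can be sketched as follows: subtracting one axis-aligned rectangle (a coverage element) from another (a candidate area) yields at most four smaller candidate rectangles, which are pushed back onto the stack with the remaining coverage. The representation and names below are illustrative, not the authors' implementation; the cylinder and torus variants additionally have to split rectangles that wrap around the mesh edges.

// Elementary operation used by SBA: a \ b as at most four rectangles (left, right, bottom and top strips).
using System;
using System.Collections.Generic;

static class AreaSubtraction
{
    public struct Rect
    {
        public int X, Y, W, H;
        public Rect(int x, int y, int w, int h) { X = x; Y = y; W = w; H = h; }
        public int Right { get { return X + W; } }
        public int Top { get { return Y + H; } }
    }

    public static List<Rect> Subtract(Rect a, Rect b)
    {
        var parts = new List<Rect>();
        int ix = Math.Max(a.X, b.X), iy = Math.Max(a.Y, b.Y);
        int ir = Math.Min(a.Right, b.Right), it = Math.Min(a.Top, b.Top);
        if (ix >= ir || iy >= it) { parts.Add(a); return parts; }          // no overlap, nothing removed
        if (ix > a.X) parts.Add(new Rect(a.X, a.Y, ix - a.X, a.H));        // left strip
        if (ir < a.Right) parts.Add(new Rect(ir, a.Y, a.Right - ir, a.H)); // right strip
        if (iy > a.Y) parts.Add(new Rect(ix, a.Y, ir - ix, iy - a.Y));     // bottom strip
        if (it < a.Top) parts.Add(new Rect(ix, it, ir - ix, a.Top - it));  // top strip
        return parts;
    }
}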

4. EXPERIMENTS

In this section, the considered approaches are evaluated by means of a numerical analysis based on the experimentation system.

4.1. EXPERIMENTATION SYSTEM

The system consists of the following elements (Fig. 4):
• Controlled input: A – allocation algorithm chosen from the set {RCFF, SBAT}; S – queue scheduling algorithm from the set {unsorted, time descending, time ascending, task size descending, task size ascending}.
• Problem parameters: P1 – the number of tasks in the queue; P2 – the range of the size of each job J(w, h, t) in the queue; P3 – the range of the processing time of each job in the queue; P4 – the size of the torus, cylinder or rectangle shaped mesh.
• Outputs: E1, denoted by ta – allocation time; E2, denoted by tp – processing time; E3, denoted by L – system load; E4, denoted by Fe – external fragmentation; E5 – latency.

Fig. 4. The block diagram of the defined system.

To evaluate the quality of the algorithm, the criteria suggested in [1] have been used:


• Allocation Time ta – defined as the time needed to find a free subgrid and allocate the given job J.
• Processing Time tp – defined as the total time needed by the mesh to process all the given jobs.
• External Fragmentation Fe – defined as the ratio of the number of free processors (Nf) to the total number of processors in the mesh (Na), i.e., Fe = (Nf / Na) · 100%.
• Latency – defined as the number of mesh time ticks between adding a task to the queue and its successful allocation.
The experimentation environment has been written in C++ and compiled with g++ with the -O3 optimization flag enabled. Simulations were performed on a machine with an Intel Pentium Dual Core 3.00 GHz processor and 1 GB of RAM. At the beginning of each simulation the tasks are randomly generated according to the given parameters and inserted into the queue. The task queue can be scheduled in ascending or descending order by such properties as the task processing time or task dimensions. When the static job queue is prepared, the allocation algorithm tries to push the first task into the mesh. The queue is organized in FCFS fashion, so when a task cannot be allocated, the algorithm waits. After all the tasks are allocated, the calculated statistics are returned. During the numerical experiments the influence of different mesh architectures and scheduling algorithms on the allocation and processing times is analysed, and the mesh architectures are also compared in terms of the external fragmentation factor and the latency.

4.2. EXPERIMENT 1

Experiment 1 presents the influence of the architecture on the allocation time. The problem parameters are generated according to the following rules (for the mesh M(W, H)):
• P1 is calculated according to the rule: if W · H < 10000 then P1 = 10000, else P1 = W · H,
• P2 : 1 ÷ ⌈W · 0.4⌉,
• P3 : 1 ÷ 1000 [ticks],
• P4 : 20×20, 50×50, 100×100, 200×200, 300×300, 400×400, 500×500, 600×600, 700×700, 800×800.



Fig. 5 shows the dependency of average allocation time for each of the three investigated mesh architectures for RCFF and SBA(T) algorithms. The computational complexity of RCFF algorithm is O(W H), which can be also observed in Fig. 5(a), i.e., for the same values of width and height. The cylinder and torus architectures are more complex than the rectangle structure, but the allocation time is not significantly greater (for the same dimensions). For example, the allocation time of cylinder architecture for the grid size 800 × 800 is greater by 14% than for rectangle architecture and for torus it is greater by 37% than for the basic structure. SBA(T) has almost the same allocation time for all grids greater than 200 × 200 (Fig. 5(b)). For smaller meshes allocation time decreases with increasing grid dimensions. Although the complexity of mesh architecture affects the time allocation, the dependencies are very similar for analysed mesh. In line with expectations the average time needed to allocate the task in the cylinder architecture is longer than in the basic rectangular mesh and in the case of torus structured network allocation takes the most time. Removing each of the borders (by wrapping the tasks) makes the subtraction operation more complex, because more special cases have to be taken into consideration. In Fig 5(b), it can be observed that task allocation time in torus mesh is nearly twice longer than in rectangular mesh. According to the obtained results, for the grid size 800 × 800 allocation time for cylinder architecture is greater by 36% than for rectangle architecture and in the case of torus shaped network it is greater by 91% than for the basic structure. 4.3.

4.3. EXPERIMENT 2

The purpose of Experiment 2 is to analyse the differences between the mesh architectures from the perspective of the processing time and the task latency. The influence of different task scheduling algorithms is also analysed in this experiment. The following data sets are used:
• Data set 1:
– P1 : 10000,
– P2 : (5 ÷ 15) × (5 ÷ 15), (5 ÷ 30) × (5 ÷ 30), (15 ÷ 30) × (15 ÷ 30),
– P3 : 1 ÷ 1000 [ticks],
– P4 : 50×50,
• Data set 2:
– P1 : 10000,
– P2 : (5 ÷ 15) × (5 ÷ 15), (5 ÷ 30) × (5 ÷ 30), (15 ÷ 30) × (15 ÷ 30),


Fig. 5. Average task allocation time t [ms] for different structures and various grid sizes (20×20 up to 800×800): (a) RCFF for rectangle, cylinder and torus; (b) SBA(T) for rectangle, cylinder and torus.


– P3 : 1 ÷ 1000 [ticks],
– P4 : 100×100.
For each type of mesh architecture, data set and set of parameter values, five different simulations are analysed, each of them using a different task scheduling algorithm. Tasks are scheduled by one of two parameters: the task processing time or the task surface area. The data set is either unsorted or sorted by one of these parameters in ascending or descending order. For the unsorted approach the jobs follow a random sequence in the queue. The simulation results are presented in Fig. 6, 7 and 8.
The results obtained for the processing time are presented in Fig. 6. The vertical axes start from a non-zero value in order to make the dependencies easier to see. It can be noticed that SBA(T) always gives better results than RCFF for a given architecture, so the algorithm selection has a great impact on the average processing time. It can also be observed that for almost all investigated sets the torus architecture achieves the best results, especially for tasks with the greatest surface area (Fig. 6(e) and 6(f)). For example, in the case of grid dimensions 50×50 and SBA(T), the profit from the use of the torus instead of the rectangle architecture for an unsorted job queue is 3.9% (Fig. 6(a)), 15.3% (Fig. 6(c)) and 17.9% (Fig. 6(e)) for task dimensions within the ranges 5 ÷ 15, 5 ÷ 30 and 15 ÷ 30, respectively. For smaller tasks the advantage is not so obvious and in some cases the cylinder architecture returns better results than the torus-structured network. In all cases the standard rectangular grid returns the worst results.
It can be observed that the characteristics in Fig. 6 are very similar to those in Fig. 7. The reason is the close relation between the external fragmentation and the processing time: the greater the average number of unused nodes, the more time the processing of the same tasks takes. Because of this, the two diagrams can be analysed together. Tasks with the smallest surface area, (5 ÷ 15) × (5 ÷ 15), are characterized by the smallest external fragmentation (Fig. 7(a) and 7(b)), which is quite obvious since smaller tasks fit better. As expected, the torus mesh usually achieves the best results, since there is no reject area and tasks which cannot be allocated in the rectangular mesh fit when it is wrapped. Similar observations hold for the cylinder mesh.
Following Fig. 7, scheduling tasks according to their processing times always gives less external fragmentation than the random queue. Moreover, it can be noticed that for FF it is better to schedule tasks by time in ascending order, while for SBA(T) better results are obtained when tasks are scheduled in descending order; however, the greater the task surface area, the better it is for SBA(T) to schedule tasks in ascending order.


For example, in the case of the grid size 50×50 and the torus architecture, the fragmentation obtained for scheduling in ascending order, in comparison to descending order, is higher in Fig. 7(a) (task sizes 5 ÷ 15), lower in Fig. 7(c) (task sizes 5 ÷ 30) and lower in Fig. 7(e) (task sizes 15 ÷ 30). The reason for this is the higher probability of good collocation for smaller tasks. When scheduling in descending order, the first configuration lasts for a long time and may have a large influence on the final average results; the case is different for scheduling by the same parameter in ascending order. Considering the FF algorithm, scheduling tasks by their dimensions often makes the fragmentation greater and consequently gives longer processing times (Fig. 7(b), 7(c), 7(e)). Scheduling in ascending order is even worse: at the beginning of such a simulation tasks fit very well, but later only the tasks with the biggest surface area remain and the gaps between them cannot be filled. A better choice is to schedule tasks in descending order; then the next tasks in the queue are smaller or of the same size, so the probability that they fit is greater.
Completely different conclusions can be drawn from the observation of the average latency (Fig. 8). The best results are achieved when tasks are scheduled in ascending order of processing time or of surface area. In the first case the average latency value for SBA(T) is lower by 34% and in the second case it is lower by 27% (in comparison to the random queue). Allocating the shortest tasks first and leaving the longest jobs to the end shortens the average waiting time. The same effect is gained when smaller tasks, which fit better, are placed earlier: the average is then counted over many tasks with short latency and only a few tasks with greater values. Scheduling tasks by the considered parameters in descending order provides the worst results, since it extends the task waiting time; in such cases the latency value is greater by 30% for scheduling according to time and by 19% for dimensions (considering the SBA(T) algorithm for the 100 × 100 grid). For a selected architecture, better latency results are always achieved with SBA(T) than with FF. Slight improvements can be noticed for the cylinder and torus architectures, i.e., the more wrapped edges, the lower the value of latency.

5. CONCLUSIONS

The best results are obtained for the SBA(T) algorithm. Although its allocation time is longer than for FF, better results are gained for parameters such as the processing time, the external fragmentation and the latency. In most cases the best results are provided by the torus architecture: the algorithms are more complex and the allocation time is longer, but better mesh properties are obtained. The torus architecture achieves the best results for tasks with a greater surface area, because of the wrapping of tasks at the edges. Although there is no reject area in the torus, more coverage areas have to be taken into consideration.


Fig. 6. Processing time t [ticks] for the RCFF and SBA(T) algorithms on rectangle, cylinder and torus architectures, with queues ordered randomly or by time/dimension in ascending or descending order: (a) grid 50×50, tasks (5÷15)×(5÷15); (b) grid 100×100, tasks (5÷15)×(5÷15); (c) grid 50×50, tasks (5÷30)×(5÷30); (d) grid 100×100, tasks (5÷30)×(5÷30); (e) grid 50×50, tasks (15÷30)×(15÷30); (f) grid 100×100, tasks (15÷30)×(15÷30).


Fig. 7. External fragmentation [%] for the same algorithms, architectures, queue orderings, grid and task sizes as in Fig. 6 (panels (a)–(f)).


Fig. 8. Latency [ticks] for the same algorithms, architectures, queue orderings, grid and task sizes as in Fig. 6 (panels (a)–(f)).


For smaller tasks the results for torus architecture are not always better than for others, thus such network is more profitable for tasks requiring more nodes. The application of task scheduling algorithms can significantly improve the performance of task processing in a mesh.



Computer Systems Engineering 2012 Keywords: heuristics, non-linear system, bilinear model, PID, Fuzzy Logic, FARX, linearisation

Lukasz GADEK∗ Keith J. BURNHAM† Iwona POZNIAK-KOSZALKA‡

CONTROLLING A NON-LINEAR SYSTEM - BILINEAR APPROACH VS INNOVATIVE HEURISTIC LINEARISATION METHOD

The article introduces a comparative study of two approaches to handling an unknown non-linear (NL) system. The first approach considers a bilinear estimation procedure and the implementation of a Bilinear Proportional Integral Derivative (BPID) controller. The second approach utilizes an innovative algorithm, namely Heuristic Linearisation (HL), and the Fuzzy Logic weighted PID (FL PID). The concept consists of a simple but efficient structure of multiple linear sub-models weighted with respect to displaced points of interest. The comparison of the two schemes is performed with respect to an arbitrary NL system simulated with Matlab.

1. INTRODUCTION

Classical control theory dates to the beginning of the twentieth century and is based on a simplified understanding of object dynamics. The approach utilizes the Laplace and Z-transforms, rooted in the work of Pierre-Simon Laplace at the end of the eighteenth century. During the second half of the twentieth century, along with the development of measurement devices and the increased demand for accuracy, interest in NL control theory rose. Although the majority of plants can be approximated with linear models, better results were obtained when a physically based NL description was introduced to the model. This observation led to the development of NL theory; due to its complexity, it is utilized only when high performance is required. A popular belief in the control community is that 'all practical

∗ Department of Systems and Computer Networks, Wroclaw University of Technology, Poland; Control Theory and Application Centre, Coventry University, UK, email: gadekl@uni.coventry.ac.ul
† Control Theory and Application Centre, Coventry University, UK
‡ Department of Systems and Computer Networks, Wrocław University of Technology, Poland



systems are non-linear' – as quoted from [1]. However, a non-linearity can be prudently approximated by a linear model with a certain loss of accuracy.
In practical industrial applications the most common controller is the classical Proportional-Integral-Derivative (PID) controller. Despite being established at the beginning of the twentieth century, it is still popular and frequently applied, e.g. as in [2], due to its simplicity and robustness. The approach of describing an object with three actions – proportional, inertial and damping – is common in different disciplines, e.g. the resistor-capacitor-inductor circuit in electronics or the mass-spring-damper in mechanics. The general idea is understandable and explainable to the majority of engineers in industry, hence it is preferred over more complex NL representations.
However, PID is linear. As a linear controller, a PID combined with a NL plant is not efficient in all operational regions: a set of tuning parameters optimal for one region may not be efficient (or may even be unstable) in another region. There exists a variety of well established model-based linear controllers which can be more efficient than the classical PID, e.g.:
- Self Tuning Controllers, self-adjusting in terms of a reference model to compensate long-term wearing-off effects, as defined in [3],
- Minimum Variance controllers, minimizing the deviation (from the optimal value) of the output at a high cost of an often unrealistic control action [4],
- Predictive Controllers, based on an unconstrained linear model (a ubiquitous approach due to its simplicity), whose aim is to minimize a given cost function with respect to the predicted behaviour of the system based on a model [5].
Some classes of NLs can be approximated with a bilinear model [6], which is tractable for a linear controller. However, common phenomena such as saturation, thermal dependencies, delay, dead-zone or hysteresis can still be represented only with more sophisticated models such as NARX, Wiener or Hammerstein [7]. The common approach to NL systems in industry, based on rules of thumb, is to use an empirically developed switching logic (such as clipping the edges of the operating region) or to introduce an array of controllers designed for specific regions. To obtain the array, piecewise linear models, presented in [8] as Fuzzy ARX (FARX), must be employed. The idea of piecewise linearity and switching logic is the basis for the HL approach.
This paper contains a comparative analysis of two approaches to handling a NL system: HL with the Fuzzy Logic PID, and the bilinear approach; the plain BPID is used as a reference. The sections are organised to resemble the respective steps in handling an unknown system. In Section 2 a description of various model types, their specific behaviours and identifiability is given, and the presentation and type assessment of the arbitrary system is conducted. The parameter estimation with respect to the bilinear model and the array of linear models with HL is described in Section 3. Control efficiency and observations are given in Section 4. A summary followed by conclusions is presented in Section 5.

2. MODEL STRUCTURE

An essential step in handling the system is to assess which type of mathematical representation should be used. Different structures and controllers must be used for linear (or weakly NL) and NL systems. A linear model is a basic approximation of a plant and is accompanied by simple controller and estimation methods, as described in Section 2.1. The NL system class is a wider category, consisting of models such as Wiener, Hammerstein or bilinear, as in Section 2.2; hence the selection of a NL model is not a trivial task. The system utilized in the studies is described and assessed in Section 2.3.

2.1. LINEAR STRUCTURE

The linear model class consists of resemblant structures. The most popular one is the Auto-Regressive with eXogenous input (ARX) model, i.e. the current output is correlated with past outputs and the external input. The ARX is applicable in the case of white noise (uncorrelated, zero-mean, Gaussian), which is assumed in the majority of practical systems. The model in (1) is a discrete ARX, which is the basis for the multiple linear structure (FARX).

a_0 y_k = -a_1 y_{k-1} - \dots - a_n y_{k-n} + b_0 u_{k-1} + \dots + b_m u_{k-m} + e_k
y_k = \left[ a_0 - A(q^{-1}) \right] y_k + B(q^{-1}) u_{k-1} + e_k \qquad (1)

where: u is the input, y is the output, e is white noise, a_0 is 1, and A and B are respectively the vectors of the a_i and b_i parameters.
An unknown plant is linear (or close to linear) when the following occur:
- the system fulfils the superposition rule,
- the steady state gain is constant over the feasible operating range,
- the transient properties (rise time, settling time, damping) are constant,
- a sinusoidal output is obtained from a sinusoidal input.
In practice, the presented conditions are fulfilled only in a limited operating area, i.e. in a neighbourhood of the point of interest (the feasible area of a sub-model).


Fig. 1. Fuzzy Logic based weight calculation in FARX. The x can be defined as the current state of the plant, i.e. the coordinates of the operating point based on the input, the output or both.

Therefore, a multiple linear model (FARX) is established as an ensemble of models weighted with respect to the distance from the i-th point of interest, as in (2):

A(q^{-1}) = w_1 A_1(q^{-1}) + w_2 A_2(q^{-1}) + \dots + w_z A_z(q^{-1})
B(q^{-1}) = w_1 B_1(q^{-1}) + w_2 B_2(q^{-1}) + \dots + w_z B_z(q^{-1}) \qquad (2)

The most convenient approach for calculating w_i is to use a single sub-model only, or to utilize the weighted parameters A(q^{-1}) and B(q^{-1}) of the two neighbouring sub-models with respect to the current operating point. The weight calculation can be represented using Fuzzy Logic, as in Figure 1.
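A minimal Python sketch of this weighting idea follows; it is not the authors' implementation. It assumes triangular (linear) membership functions between neighbouring points of interest, as suggested by Fig. 1, so that only the two neighbouring sub-models get non-zero weights summing to one. The example centres and the reuse of the sub-model parameters identified later in Section 3.1 are illustrative assumptions.

def fuzzy_weights(x, centres):
    # Triangular membership around the points of interest: the two sub-models
    # neighbouring the operating point x share the weight, which sums to one.
    if x <= centres[0]:
        return {0: 1.0}
    if x >= centres[-1]:
        return {len(centres) - 1: 1.0}
    for i in range(len(centres) - 1):
        lo, hi = centres[i], centres[i + 1]
        if lo <= x <= hi:
            w_hi = (x - lo) / (hi - lo)
            return {i: 1.0 - w_hi, i + 1: w_hi}

def blend(x, centres, A_sub, B_sub):
    # Weighted FARX polynomials as in (2): A = sum_i w_i A_i, B = sum_i w_i B_i.
    w = fuzzy_weights(x, centres)
    A = [sum(wi * A_sub[i][j] for i, wi in w.items()) for j in range(len(A_sub[0]))]
    B = [sum(wi * B_sub[i][j] for i, wi in w.items()) for j in range(len(B_sub[0]))]
    return A, B

# Assumed example: three second-order sub-models (values from Section 3.1),
# with the points of interest placed at the centres of the output ranges.
A_sub = [[-1.65, 0.74], [-1.54, 0.68], [-1.24, 0.48]]
B_sub = [[0.43, -0.30], [0.66, -0.20], [0.81, 0.14]]
print(blend(20.0, [8, 24, 41], A_sub, B_sub))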

2.2. NON-LINEAR MODELS

The two most frequently used NL models are the Hammerstein (3) and Wiener (4) models, in which the NLs are respectively input or output based.

y_k = \left[ 1 - A(q^{-1}) \right] y_k + f(u) + e_k \qquad (3)
y^*_k = \left[ 1 - A(q^{-1}) \right] y^*_k + B(q^{-1}) u_{k-1} + e_k , \quad y_k = f(y^*) \qquad (4)

As can be concluded, the model properties of (3) and (4) are highly dependent on the NL function f(.), i.e. phenomena such as saturation, a dead zone of the input or varying transient properties can be modelled with an appropriate selection of f(.). The disadvantage of NL modelling is the requirement of a priori knowledge to determine the type of the NL function f(.).



The bilinear approach is capable of modelling NLs, yet it can be interpreted using linear methodology. The structure consists of two separable parts: a linear part and a NL multiplication of input and output, as in (5).

y_k = \left[ 1 - A(q^{-1}) \right] y_k + B(q^{-1}) u_{k-1} + \sum_{i,j=0} \eta_{i,j} \, q^{-i} y_{k-1} \, q^{-j} u_{k-1} \qquad (5)

As can be deduced, the bilinear structure is more specific than (3) or (4), as it does not utilize the f notation. Hence, it does not require as much a priori knowledge, yet the bilinear part η must be defined. The convention is to use η in the form of:
- a lower triangular matrix,
- an upper triangular matrix,
- a diagonal matrix (which produces the least quantity of parameters).

Fig. 2. Steady state characteristic match comparison between an exemplary NL plant (solid line), FARX (dashed line on the left) model and bilinear model (dashed line on the right).

Exemplary steady state gains provided by the FARX, NL and bilinear models are presented in Figure 2. It can be observed that a perfect match is not achieved, as the FARX model is not able to produce a smooth gain and the bending ability of the bilinear model is limited; yet both models provide a sufficient approximation.



Fig. 3. The studied system output (circles) for sinusoidal excitation (line). Distortion from sinusoidal shape is emphasized.

2.3. SYSTEM ANALYSIS

The main assumption of the studies is that the plant properties are unknown, as in the majority of practical cases. The only information is obtained from the response to step and sinusoidal excitation. The sinusoidal response in Figure 3 suggests that the NLs in the plant are not insignificant, as the output is a moderately distorted sinusoid – the NL impact is proportional to the distortion.

3. PARAMETER ESTIMATION

In this section the parameter estimation of the FARX (Section 3.1) and bilinear (Section 3.2) models is performed. Both estimations are conducted with the Least Squares (LS) method, utilizing the criterion in (6), where y_m is the estimated model output.

J_{LS} = \frac{1}{N} \sum_{i=0}^{N} \left[ y(i) - y_m(i) \right]^2 \qquad (6)

From (6) an algorithm is derived, as presented in [9]. The result in (7) is an explicit LS estimation formula for the calculation of the model parameters \hat{\theta}. The measured output is y and X is an observation matrix in which each row consists of past states (inputs and outputs for (1)). The explicit X can be derived by substituting the model – (1), (3), (4), (5), etc. – into (8).

\hat{\theta} = (X^T X)^{-1} X^T y \qquad (7)
y = X \theta \qquad (8)

The main issue in (7) is the knowledge of the system structure, i.e. the definition of the elements in the rows of the observation matrix. The parameter sets \hat{\theta} obtained in this section are utilised in the control task in Section 4.
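The following Python/NumPy sketch illustrates (7)-(8) for the second-order ARX structure of (1); it is not the authors' Matlab code. The toy plant coefficients and the data generation are invented for the example.

import numpy as np

def least_squares(X, y):
    # Explicit LS estimate (7): theta_hat = (X^T X)^(-1) X^T y.
    return np.linalg.inv(X.T @ X) @ X.T @ y

def arx_rows(u, y):
    # Observation rows for a second-order ARX model (1):
    # X_k = [-y_{k-1}, -y_{k-2}, u_{k-1}, u_{k-2}], target y_k.
    X, t = [], []
    for k in range(2, len(y)):
        X.append([-y[k - 1], -y[k - 2], u[k - 1], u[k - 2]])
        t.append(y[k])
    return np.array(X), np.array(t)

# Hypothetical data; in the paper the data come from the simulated NL plant.
rng = np.random.default_rng(0)
u = rng.uniform(0, 1, 200)
y = np.zeros(200)
for k in range(2, 200):
    y[k] = 1.5 * y[k - 1] - 0.6 * y[k - 2] + 0.5 * u[k - 1]
X, t = arx_rows(u, y)
print(least_squares(X, t))   # recovers [-1.5, 0.6, 0.5, 0.0] for this noise-free toy plant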

Fig. 4. HL method – the correlation between the genotype, the parameter estimation and the resulting model.

3.1. HEURISTIC LINEARISATION

The main issue with FARX is to establish the optimal borders of the sub-models, i.e. to distribute the points of interest for the linearisation process. This can be described as an optimisation problem with the goal function defined as the sum of (6) over the respective sub-models. The solution plane is most likely to be highly NL; therefore, a non-deterministic algorithm must be employed in the selection of the interest points. In this article, a Genetic Algorithm (GA) implemented as in [10] is used in the HL. The genotype is defined as a set of borders between the areas encircling the points of interest. An example is presented in Figure 4.


The multiple structure is presumed to consist of second order linear sub-models with an offset C, i.e. the parameter vector is θ = [A(q^{-1}) B(q^{-1}) C], where both A and B consist of two elements. The best achieved result is as follows:
- three sub-models are used to achieve satisfactory performance,
- the first model when y ∈ [0 − 16], with \hat{\theta}_1 = [−1.65  0.74  0.43  −0.30  −0.04],
- the second model when y ∈ [16 − 32], with \hat{\theta}_2 = [−1.54  0.68  0.66  −0.20  −2.69],
- the third model when y ∈ [32 − 50], with \hat{\theta}_3 = [−1.24  0.48  0.81  0.14  −7.64].
The FARX model error with respect to (6) equals 0.1113 and is significantly lower than for a single ARX (J_LS = 29.1850).
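A short Python sketch of how the identified sub-models could be evaluated follows. It assumes the hard-switching variant mentioned in Section 2.1 (a single sub-model selected by the output range), the parameter ordering θ = [a1, a2, b1, b2, C] and the sign convention of (1); these details are assumptions, not stated by the authors.

# The three identified sub-models from above: ((output range), [a1, a2, b1, b2, C]).
SUBMODELS = [
    ((0, 16),  [-1.65, 0.74, 0.43, -0.30, -0.04]),
    ((16, 32), [-1.54, 0.68, 0.66, -0.20, -2.69]),
    ((32, 50), [-1.24, 0.48, 0.81, 0.14, -7.64]),
]

def farx_step(y1, y2, u1, u2):
    # Pick the sub-model whose output range contains y_{k-1} (last one as a fallback)
    # and evaluate the second-order ARX recursion with offset.
    params = next((p for (lo, hi), p in SUBMODELS if lo <= y1 <= hi), SUBMODELS[-1][1])
    a1, a2, b1, b2, c = params
    return -a1 * y1 - a2 * y2 + b1 * u1 + b2 * u2 + c

print(farx_step(y1=20.0, y2=19.5, u1=1.0, u2=1.0))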

3.2. BILINEAR STRUCTURE

In this section the parameter estimation of a second order bilinear model with a single-element bilinearity is performed and the model efficiency is compared. Due to the bilinear structure in (5), the observation matrix row is constructed as in (9).

X_k = [\, -y_{k-1} \;\; -y_{k-2} \;\; u_{k-1} \;\; u_{k-2} \;\; u_{k-1} y_{k-1} \,] \qquad (9)

As a result, a model with the following properties is obtained:
- the parameter set \hat{\theta} = [−1.71  0.78  0.47  −0.39  0.002], where the last element is η,
- the model output mean square error (6) is higher than in the case of HL and equal to 0.5781 for the same data set,
- the steady state gain match is inferior to the HL model, as in Figure 5.
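For completeness, a small Python sketch of building the bilinear regressor rows of (9) is given below; the estimate can then be obtained with the same LS formula (7) as in the earlier sketch. This is illustrative only, not the authors' code.

import numpy as np

def bilinear_rows(u, y):
    # Rows as in (9): X_k = [-y_{k-1}, -y_{k-2}, u_{k-1}, u_{k-2}, u_{k-1}*y_{k-1}].
    X, t = [], []
    for k in range(2, len(y)):
        X.append([-y[k - 1], -y[k - 2], u[k - 1], u[k - 2], u[k - 1] * y[k - 1]])
        t.append(y[k])
    return np.array(X), np.array(t)

# theta_hat = np.linalg.inv(X.T @ X) @ X.T @ t   # LS formula (7), reused from above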



Fig. 5. Comparison of steady state gains for: actual plant, ARX model, FARX model (obtained with HL) and bilinear model.

The FARX model is more accurate as it is more flexible, i.e. each sub-model can be represented by different transient and steady-state properties. In the studied case the superiority of FARX is mostly due to a better fit of steady state gain.

4. CONTROLLERS

In this section a basic description of the applicable controllers with respect to both models is included. At the end of each sub-section an evaluation of the control with respect to the mean square error criterion (10) and the control effort (11) is performed.

J_{MSE} = \frac{1}{N} \sum_{i=0}^{N} \left[ y(i) - r(i) \right]^2 \qquad (10)

J_{CE} = \frac{1}{N-1} \sum_{i=0}^{N-1} \left[ u(i+1) - u(i) \right]^2 \qquad (11)


The FARX model is utilised to obtain an array of PID controllers, as shown in Section 4.1. The bilinear model is used to design an appropriate compensation with a BPID, as presented in Section 4.2. Both controllers are tested in a closed loop with the same step-wise request.

Fig. 6. The concept of FL PID. The operating point is defined by the current input, output or both. The weights in the FL module are: 1 when the operating point is within the bounds of a specific sub-model feasible area, 0 otherwise, and between 0 and 1 when the operating point is on the border of two sub-models.

4.1. FL PID

In this section a description of the array of PID controllers, a simple model based tuning and the result of the application of FL PID to the referenced system is presented. The general idea of FL PID is to:
- generate the control array consisting of as many controllers as there are FARX sub-models,
- adjust each PID with respect to the corresponding sub-model (model based tuning),
- utilize FL to merge the controller outputs with respect to the operating point.
The schematic of FL PID is presented in Figure 6.
The tuning of a single PID is based on a simplified Ziegler-Nichols method. To obtain a zero steady-state error with a stand-alone proportional part, the gain K_p must be the inverse of the system gain K, as in (12). Such tuning results in zero offset in steady state – if the impact of the dynamics is negligible, a simple P controller with the gain as in (12) can be used.

K_p = \frac{1}{K} \qquad (12)

In the genuine Ziegler-Nichols method the I and D gains (K_i and K_d) are tuned with respect to T_u, which is the period of oscillation of the critically damped system [11]. In the simplified method the gains are as in (13) and (14), where C is T_u/2.

K_i = K_p C \qquad (13)
K_d = \frac{K_p}{4C} \qquad (14)

Obtaining C is often not feasible for a discrete system, e.g. when the poles of the critically damped system are free of a complex part. Hence, an empirical approach to the selection of C is assumed.

Table 1. Tuning parameters of FL PID for the researched system.

corresponding with      K        C       Kp       Ki       Kd
first sub-model       1.4445   0.35    0.6923   0.2423   0.4945
second sub-model      3.2857   0.64    0.3043   0.1948   0.1189
third sub-model       2.7917   0.35    0.3582   0.1254   0.2559

The tuning in Table 1 is obtained based on the parameter estimation from Section 3.1 and is used to control the NL system. As a result a stable response is obtained, with J_{MSE} = 0.1494 and J_{CE} = 0.4304.
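A minimal Python check of the simplified Ziegler-Nichols rules (12)-(14) follows; K and C are taken from Table 1 and the printed gains reproduce its rows. The function name is introduced only for this sketch.

def simplified_zn(K, C):
    # Simplified Ziegler-Nichols gains from (12)-(14): Kp = 1/K, Ki = Kp*C, Kd = Kp/(4*C).
    Kp = 1.0 / K
    return Kp, Kp * C, Kp / (4.0 * C)

for K, C in [(1.4445, 0.35), (3.2857, 0.64), (2.7917, 0.35)]:
    print("Kp=%.4f Ki=%.4f Kd=%.4f" % simplified_zn(K, C))
# -> Kp=0.6923 Ki=0.2423 Kd=0.4945 for the first sub-model, and so on.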

4.2. BPID

In this section the BPID is used to control the NL system. The principle of BPID is to 'compensate' the bilinear part of the model (5) and then apply a linear PID. Figure 7 presents the concept of the compensation, i.e. the linearisation of the plant by reducing the bilinear part. Based on [12], the transfer function of the compensator for the considered system is 1/(1+η), where η is a model parameter. The PID tuning is conducted using the same approach as in Section 4.1. From the model K = 1.1429 and empirically C = 0.873, hence the PID parameters are K_p = 0.8730, K_i = 0.2619 and K_d = 0.7275. When the control is applied, a stable response is obtained with the following performance criteria: J_{MSE} = 0.1120 and J_{CE} = 1.2648.


Fig. 7. The concept of the bilinear compensator. As the linear controller (PID) is designated for linear systems only, the NL system must be linearised to be controlled appropriately.

5. SUMMARY

In Section 5.1 a comparison of the results obtained in Sections 3 and 4 is presented. Based on the comparison, a conclusion with a proposition for future works is drawn in Section 5.2.

5.1. COMPARISON OF RESULTS

An assessment of the HL approach by a comparison to the reference bilinear method is conducted in this section. Although the criteria do not provide a full overview of the performance, some conclusions can be drawn. Table 2 presents the comparison criteria for the two approaches to handling the presented NL system.

Table 2. Criteria for respective steps in handling the studied system.

approach      parameter estimation              closed-loop control
              J_LS      no. of parameters       J_MSE     J_CE      comment
HL (FARX)     0.1113    5x3 = 15                0.1494    0.4304    three PIDs
bilinear      0.5781    5                       0.1120    1.2648    compensator

From Table 2 the following observations can be drawn:
- the HL estimation is more accurate than the bilinear one, yet it requires more parameters,
- the bilinear control results in a marginally closer to optimal response,
- FL PID is superior with respect to the control cost.
Based on further experiments, a performance equal to or better in terms of J_{MSE} can be achieved with the HL approach if the quantity of sub-models is increased. On the other hand, a high quantity of sub-models introduces high complexity to the FARX structure.

5.2. CONCLUSION

The HL approach can be considered as a first step towards a generalised algorithm for handling an unknown system. Through the comparative studies it is shown that a high control performance and a high model match are obtained with the introduced approach, yet a relatively simple principle is utilized. To obtain a robust method, the definition of the current operating point, i.e. the description of the sub-model space, should be improved. With an appropriate selection of rules (states) to define sub-model or controller bounds, more sophisticated NLs can be handled, e.g. hysteresis. Based on the satisfactory results of the presented research and the simplicity of the idea, a high potential for industrial application of the FARX and HL approach can be presumed.

REFERENCES [1] ATHERTON, D. P., An Introduction to Nonlinearity in Control System. Ventus Publishing, 2011. [2] ASTROM, K.J. and HAGGLUND, T., Revisiting the Ziegler-Nichols step response method for PID control. Journal of Process Control, vol. 14, 2004, pp. 635 - 650. [3] BOBAL, V. and BOHM, J. and FESSEL, J. and MACHACEK, J., Digital Self-tuning Controllers: Algorithms, Implementation and Applications. Advanced Textbooks in Control and Signal Processing, Springer, 2005. [4] BURNHAM, K. J. and LARKOWSKI, T., Self-Tuning and Adaptive Control. Wroclaw University of Technology, Wroclaw, 2011. [5] MAYNE, D.Q. and SERON, M.M.and RAKOVIC S.V., Robust model predictive control of constrained linear systems with bounded disturbances. Automatica, vol. 41, 2005, pages 219 - 224. [6] LARKOWSKI, T. and BURNHAM, K. J., System Identification, Parameter Estimation and Filtering. Wroclaw University of Technology, Wroclaw,2011. [7] JANCZAK, A., Identification of Nonlinear Systems Using Neural Networks and Polynomial Models: A Block-Oriented Approach. Springer, 2004.



[8] NAKAMORI, Y. and SUZUKI, K. and YAMANAKA, T., Model predictive control of nonlinear processes by multi-model approach. International Conference on Industrial Electronics, Control and Instrumentation, 1991. Proceedings , vol. 3, pages 1902-1907. [9] LJUNG, L., System Identification - Theory for the User. Prentice Hall, New Jersey, 1999. [10] HROMKOVIC, J, Algorithmics for Hard Problems. Springer, Berlin, 2010. [11] HANG, C.C. and ASTROM, K.J. and HO, W.K., Refinements of the Ziegler-Nichols tuning formula. Control Theory and Applications, IEE Proceedings, vol. 138, pages 111-118, 1991. [12] MARTINEAU, S. and BURNHAM, K. J. and MINIHAN, J. A. and MARCROFT, S. and ANDREWS, G. and HEELEY, A. Application of a bilinear PID compensator to an industrial furnace. Proceedings of the 15th IFAC World Congress,vol. 15, 2002.



Computer Systems Engineering 2012

Keywords: WAN, CFA problem, TCFA problem, network design, optimization

Róża GOŚCIEŃ*

NEW APPROACHES FOR CFA AND TCFA PROBLEMS IN WAN

This paper presents the results of an investigation focused on modelling computer networks – a very important issue affecting today's Wide Area Networks (WANs). Modelling computer networks includes two main groups of problems: optimization of existing topologies and new topology design. In this paper we formulated the Capacity and Flow Assignment (CFA) and Topology, Capacity and Flow Assignment (TCFA) problems for WAN and proposed novel heuristic algorithms, CFA Bottom-Up and TCFA Top-Down, to solve them. Moreover, we presented the findings of computational experiments, carried out to compare the proposed algorithms with other CFA/TCFA algorithms (both exact and heuristic) and also to determine the dependences between the processing time and the dimensions of the abovementioned problems.

1. INTRODUCTION
The most important feature of Wide Area Networks (WANs) is their ability to connect Local Area Networks (LANs), Personal Computers (PCs) or terminals over large distances and to fulfil users' requirements. Evidently, the abovementioned tasks are connected with some costs. The necessity of fulfilling these requirements, together with cost savings, makes modelling WANs important and increasingly popular nowadays [4,8]. Modelling WANs includes two main groups of problems: optimization of existing topologies, which are not efficient enough, and the design of new topologies. These problems can be divided into three main groups: FA (Flow Assignment), CFA (Capacity and Flow Assignment) and TCFA (Topology, Capacity and Flow Assignment) [4,7].
__________
* Department of Systems and Computer Networks, Wroclaw University of Technology, Poland, e-mail: roza.goscien@student.pwr.wroc.pl



In this work we formulated the CFA and TCFA problems and proposed heuristic algorithms to solve them. Moreover, we presented the findings of computational experiments, carried out to compare the proposed algorithms with other methods (both exact and heuristic) and to determine the dependences between the processing time and the dimensions of the considered optimization problems.
The paper is organized as follows. In section 2 we formulate the model of the problems considered in the paper. The next section presents all compared algorithms. Findings from the computational experiments are presented in section 4. The last section 5 concludes the whole work.
2. PROBLEM FORMULATION
In this section, we formally define the models of the considered optimization problems.
2.1 MODEL OF COMPUTER NETWORK

The model of a computer network can be created using graph theory. The graph's vertices V={v1, v2, ..., vn} correspond to the network's nodes and the graph's edges E={e1, e2, ..., em} correspond to the links. There are n nodes and m links in the network. The numbers c(e) assigned to the edges determine the links' capacities (in e.g. bits per second) [3,8]. In the modelling tasks we used directed graphs. Note that a link e=(x,y) ∈ E is connected with two nodes: the source node x=e− and the destination (termination) node y=e+. In computer memory, the model can be saved as a weighting matrix N or an adjacency matrix N' together with a vector of capacities c. The weighting matrix N=[nij]nxn contains the capacities of all existing links: nij is equal to 0 if the link between nodes i and j does not exist; otherwise, it is the capacity value of this link. The adjacency matrix N'=[n'ij]nxn informs only whether a specific link exists: n'ij is equal to 0 if the link between nodes i and j does not exist; otherwise, the value is 1. The links' capacities are saved sequentially in the vector c. An example of a computer network model is depicted in Fig. 1.



 0 8 N = 0  0

0 1 N'=  0  0

0 0 4  0 4 0  , 0 0 16   12 16 0

0 0 0 1

0 1 0 1

1 0  , 1  0

c = [4 8 4 16 12 16] Fig. 1. An example of computer network model
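A short Python/NumPy sketch of the Fig. 1 example follows, assuming the reconstruction of N shown above; the adjacency matrix and the capacity vector are then derived from it rather than stored separately.

import numpy as np

# Weighting matrix of the Fig. 1 example: N[i][j] is the capacity of link (i, j), 0 if absent.
N = np.array([[0,  0,  0,  4],
              [8,  0,  4,  0],
              [0,  0,  0, 16],
              [0, 12, 16,  0]])
N_adj = (N > 0).astype(int)   # adjacency matrix N'
c = N[N > 0]                  # capacities of the existing links
print(N_adj)
print(c)                      # -> [ 4  8  4 16 12 16] when read row by row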

2.2 MULTICOMMODITY FLOWS

Multicommodity flow in a WAN describes the average traffic, including the routing rules for all traffic demands. A flow commodity is a set of packets with the same source node i and destination node j. Let r_{ij} denote the average packet rate transmitted from node i to node j, and let R=[r_{ij}]_{n×n} be the demand matrix [2,5]. Note that the k-th flow commodity is connected with a pair of nodes: a source s_k and a termination t_k. The value r_k = r_{ij}, where i = s_k and j = t_k, is known as the value of the k-th commodity. Mathematically, a multicommodity flow is a function f_k : E → R_+ ∪ {0}, k = 1,2,...,q, which assigns to the network links e ∈ E values f_k(e) [b/s] satisfying constraints (1) – (3) [6,7]:

for v ∈ V and k = 1,2,...,q:

\sum_{e: e^- = v} f_k(e) - \sum_{e: e^+ = v} f_k(e) =
  \begin{cases} r_k, & v = s_k \\ -r_k, & v = t_k \\ 0, & v \neq s_k \wedge v \neq t_k \end{cases} \qquad (1)

for e ∈ E and k = 1,2,...,q:

f_k(e) \geq 0 , \qquad (2)

f(e) = \sum_{k=1}^{q} f_k(e) \leq c(e) , \qquad (3)

The value f(e) [b/s] is known as the entire link flow. The abovementioned formulation is based on the node-link notation of the flow [5,8]. We considered bifurcated flows [5,8].
2.3 OPTIMIZATION CRITERIA

We used two different criteria in the optimization tasks: the cost of the network [2] and the average packet delay [2,3]. The cost of the network is the sum of the costs of leasing all the network's links with the specified capacities. Mathematically, the cost of the network is described by formula (4). This goal function was used only as an additional constraint in the CFA problems.

d = \sum_{e \in E} k(e, c(e)) \qquad (4)

where k(e, c(e)) is the cost of leasing link e with capacity c(e). The main goal function in the considered problems was the average packet delay – a nonlinear function of the flow, described by:

T(f) = \frac{1}{\gamma} \sum_{e \in E} \frac{f(e)}{c(e) - f(e)} \qquad (5)

where \gamma = \sum_{k=1}^{q} r_k is the average packet rate transmitted in the network per second.
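A minimal Python sketch of evaluating (5) follows; the link flows, capacities and the demand matrix in the example are hypothetical, and the conversion between bits and packets via packetSize is omitted for simplicity.

def average_packet_delay(f, c, R):
    # Average packet delay (5): T(f) = (1/gamma) * sum_e f(e)/(c(e)-f(e)),
    # where gamma is the total average packet rate from the demand matrix R.
    gamma = sum(sum(row) for row in R)
    return sum(fe / (ce - fe) for fe, ce in zip(f, c)) / gamma

print(average_packet_delay(f=[3.0, 6.0], c=[4.0, 16.0], R=[[0, 5], [4, 0]]))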

2.4 FORMULATION OF OPTIMIZATION PROBLEMS

Problems CFA and TCFA are NP-complete [4,6]. We assumed the following formulations of these problems [3].
CFA
Constants: the network topology (number of nodes, number and locations of links), the sets of candidate links' capacities and the corresponding leasing costs for all existing links, the maximum acceptable cost of the network – costMax, the average packet size in the network – packetSize, the demand matrix R.
Minimize: the average packet delay.
Subject to: links' capacities, multicommodity flow.
TCFA
Constants: the number of nodes in the network – n, the sets of candidate capacities and the corresponding leasing costs for all possible links between the n nodes, the maximum acceptable cost of the network – costMax, the average packet size in the network – packetSize, the demand matrix R.
Minimize: the average packet delay.
Subject to: the network topology, links' capacities, multicommodity flow.
3. ALGORITHMS
In this section, we discuss the examined algorithms for the optimization problems.
3.1 CFA METHODS

We examined three different CFA algorithms:
- an exact algorithm based on complete search,
- a heuristic proposed in [1,3] – in this paper called CFA Top-Down,
- an authorial heuristic: CFA Bottom-Up.
The idea of the Bottom-Up algorithm is to start analyzing topologies from the cheapest one and gradually increase the links' capacities until the first feasible solution is found or the costMax constraint is exceeded. The choice of links for increasing capacity is made using the criterion δ(e) formulated as follows:

\delta(e) = \frac{q_{max} - c(e)}{q_{max}} + \sigma(e) \qquad (6)


where q_{max} is the maximum demand value from the matrix R, σ(e) = 1 for links which are connected to flow commodity sources/destinations, and σ(e) = 0 otherwise.
3.2 TCFA METHODS

We also examined three different TCFA algorithms:
- an exact one: TCFA complete search (TCFA cs),
- a heuristic: TCFA modified-topology (TCFA mt) [3],
- an authorial heuristic: TCFA Top-Down.
TCFA cs and TCFA mt can run with any of the previously discussed CFA algorithms. The premise for creating TCFA Top-Down was the fact that the existence of many different links in the network may give the possibility to allocate demands to different paths, and thereby achieve a small value of the average packet delay. This method starts solving the problem with a full adjacency matrix as a potential final topology and uses a modifiedCFA algorithm. ModifiedCFA returns the final solution if a feasible one exists, otherwise the solution which exceeds the costMax constraint by the smallest value, or no solution when there is no solution at all for this input data, regardless of costMax. In the next step the returned solution is used to calculate the criterion ρ(e); the link with the smallest value of the criterion is the candidate to be removed from the network.

\rho(e) = \frac{f(e)}{c(e) - f(e)} \qquad (7)
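The two selection criteria can be illustrated with the following Python sketch; the link data are hypothetical, and the exact placement of σ(e) inside (6) (added to the fraction rather than to its numerator) is an assumption made when reconstructing the formula.

def delta(e, c, q_max, sigma):
    # CFA Bottom-Up criterion (6) for choosing the link whose capacity is increased
    # (assumed reconstruction: (q_max - c(e)) / q_max + sigma(e)).
    return (q_max - c[e]) / q_max + sigma[e]

def rho(e, f, c):
    # TCFA Top-Down criterion (7); the link with the smallest value is removed first.
    return f[e] / (c[e] - f[e])

# Hypothetical 3-link example:
c = {0: 4, 1: 8, 2: 16}
f = {0: 3, 1: 2, 2: 10}
sigma = {0: 1, 1: 0, 2: 0}   # 1 for links touching a commodity source/destination
q_max = 10
best_to_upgrade = max(c, key=lambda e: delta(e, c, q_max, sigma))
best_to_remove = min(c, key=lambda e: rho(e, f, c))
print(best_to_upgrade, best_to_remove)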

4. EXPERIMENTS AND DISCUSSION
The aim of the computational experiments was to:
- determine the dependences between the processing time and the dimensions of the input data for the CFA and TCFA algorithms (measurement of the processing time for topologies with the same properties of the input data except one parameter, e.g. n, m or q, which was increased during the experiment),
- compare the heuristic CFA/TCFA methods according to the processing time and the number of returned optimal/feasible solutions,
- compare all CFA/TCFA algorithms (both exact and heuristic) and specify their range of applicability.
The input data for the algorithms were sets of random topologies, solvable regardless of costMax. All presented results are averages of 4 – 10 measurements.


4.1 CFA

According to the findings of the CFA experiments, the processing time of the CFA algorithms increases along with the increasing dimensions of the input data m, q, n, and this dependence is nonlinear. Furthermore, the processing time of the exact CFA algorithm also increases with the increasing number of candidate capacities for all links (again nonlinearly), while the processing time of the CFA heuristics is approximately steady as a function of the number of possible capacities for all links. The dependence between the processing time and costMax is a very interesting issue, because it is connected with the structure of the implemented algorithms. Figures 3 and 4 present results for topologies with 4 nodes, 6 links, 3 flow commodities and 2 possible capacities for all links.

Fig. 3. Processing time of exact CFA method as a function of costMax

According to the results presented in figures 3 and 4:
- the processing time of all algorithms is short and approximately steady until the first limit value of costMax – v1 (about 460 [€] in the figures, the cost of the cheapest topology which can be created using the input data) – is reached,
- after the first limit value, the processing time of the exact algorithm increases with increasing costMax, until the second limit value v2 (about 660 [€] in the figures – the cost of the most expensive topology) is reached,
- at the first limit value we can observe a sharp increase of the processing time of the heuristic algorithms, up to the maximum value; after that, the processing time of these methods decreases with increasing costMax until it becomes close to the second limit value v2,
- after reaching the second limit value, the processing time is approximately steady as a function of costMax (for all methods).

Fig. 4. Processing time of CFA heuristic methods as a function of costMax

What is also important to notice: for the same sets of input data, the processing time of Top-Down is shorter than the processing time of Bottom-Up, which is a consequence of the ideas behind both algorithms. Table 1 presents a comparison of the CFA heuristic algorithms; the exact CFA algorithm was used as a reference. The number of optimal solutions returned by the heuristic algorithms was less than 20%. The results are better in terms of the number of returned feasible solutions – here our Bottom-Up reached a result close to 95%.

Table 1. Comparison of CFA algorithms

                                         CFA TOP-DOWN    CFA BOTTOM-UP
Number of returned optimal solutions          17%             11%
Number of returned feasible solutions         71%             95%

When optimal solution is a crucial issue, then the exact method has to be used. When some deviation from optimal solution is allowed but there is restrictive time constraint, Top-Down should be used due to higher number of returned optimal solutions (compared to Bottom-Up). Bottom-Up is a good tool for initial experiments, because of its acceptable processing time and high number of returned feasible solutions.



To sum up the overall comparison of the CFA methods, it is important to notice once again that the processing time of the exact algorithm is always longer than the processing time of a heuristic algorithm for the same set of input data. Moreover, the exact method is not always able to solve the problem in reasonable time. This is the main reason why heuristic algorithms are so necessary.
4.2 TCFA

The findings of the TCFA experiments verified that all the previously presented dependences (in the CFA section) between the processing time and the dimensions of the input data are also true for the TCFA problems. What is very important to notice and remember, the processing time of the TCFA algorithms depends on the CFA algorithm used. The experiments showed that the TCFA algorithms based on CFA Top-Down are the fastest of all. Methods which use CFA Bottom-Up are second best and solve problems a bit longer. The slowest group are the algorithms based on the exact CFA; the authorial TCFA Top-Down also belongs to this group. All TCFA methods were also compared according to the number of returned optimal/feasible solutions. The results for the proposed TCFA algorithm and the methods based on the proposed CFA algorithm are presented in table 2. The connection of TCFA complete search and the exact CFA was used as a reference in the comparison.

Table 2. Comparison of TCFA algorithms

                                         TCFA CS &         TCFA MT &         TCFA TOP-DOWN
                                         CFA BOTTOM-UP     CFA BOTTOM-UP
Number of returned optimal solutions          5%                14%                7%
Number of returned feasible solutions        100%               71%               86%

Results for number of returned optimal solutions were not satisfying for proposed methods - less than 15%. If optimal solution is quite important, TCFA Top-Down is not an appropriate tool. TCFA heuristic method and proposed CFA Bottom-Up can achieve better results. Considered methods were good according to number of returned feasible solutions, especially connection of TCFA complete search and CFA Bottom-Up. If feasible solution is desirable to be found in short time, all algorithms can be used.



The proposed TCFA Top-Down, compared to the other algorithms, is a solution with a high number of returned feasible solutions and acceptable processing time, but the biggest disadvantage of this method is the relatively low number of returned optimal solutions. TCFA Top-Down is a good tool when the crucial issue is to find any feasible solution in short time.
5. CONCLUSIONS AND PERSPECTIVE
In the summary of all experiments it is important to emphasize that the CFA and TCFA optimization problems are tasks with high computational and memory complexity (especially when the goal function is nonlinear), even for relatively small networks. Moreover, the complexity of these problems increases with the increasing number of dimensions of the input data. The growth of the problem complexity is connected with greater demands for memory and time, which are necessary to find the optimal solution by exact algorithms. Because of time and technical constraints, there is a necessity to find ways to solve optimization tasks using fewer resources. This is the main reason and purpose of inventing heuristic methods – algorithms which can find a feasible solution using fewer resources (e.g. time, memory) than the exact method. The selection of a suitable algorithm is a compromise between the accuracy of the solution and the resources required to solve the problem. Depending on the specified requirements, a different algorithm may be the optimal tool. To choose the best one, at least factors like the computer/technical equipment, the time constraint and the allowed deviation from the optimal solution should be considered.
The algorithms proposed in this paper are good heuristic methods with short (CFA Bottom-Up) and acceptable (TCFA Top-Down) processing time, especially in comparison with the exact methods. Their biggest advantage is the high number of returned feasible solutions. These features make the proposed algorithms a very good choice for initial studies, where there is no possibility to achieve results using exact methods (time constraint) and the optimality of the result is not very important – the most important issue is to reach any feasible solution in relatively short time.
There are several interesting issues that might be considered in future work on the problems discussed in this paper. The most important include extending the experiments with other exact/heuristic algorithms or with the path-link notation of the flow, studies with greater topologies (number of nodes ≥ 6), and a detailed analysis of the computational complexity of the algorithms and their memory usage.



REFERENCES [1] GERLA, M., KLEINROCK, L. On the Topological Design of Distributed Computer Networks, IEEE Trans. Commun., Vol. COM-25, 1977, pp. 48-60 [2] GOLA, M., KASPRZAK, A. Topology design problem with combined cost criterion and time varying traffic in wild area networks, Scientific computation, applied mathematics and simulations. 17-th IMACS World Congress, 2005 [3] KASPRZAK, A. Rozlegle sieci komputerowe z komutacja pakietow, chapter 12, Wroclaw University of Technology, Wroclaw, 1999 [4] MIKSA, T., KOSZALKA, L., KASPRZAK, A. Comparison of heuristic methods applied to optimization of computer networks, Proceedings of the 11-th International Conference On Networks, 2012 [5] PIORO, M., MEDHI, D. Routing, flow and capacity design in communication and computer networks, Morgan Kaufman Publishers, San Francisco, 2004 [6] WALKOWIAK, K. Heuristic algorithms for assignment of non-bifurcated multicommodity flows, Proceedings Advanced Simulations Of Systems Asis, Sv Hostyn, Czech Republic, 2003 [7] WALKOWIAK, K. Ant algorithm for flow assignment in connection-oriented networks, Int. J. Appl. Math. Comput. Sci. 2, 2005 [8] WALKOWIAK, K. Modelling and optimization of computer networks, Chapters 3 – 6, Wroclaw University of Technology, 2011



Computer Systems Engineering 2012

Keywords: line planning problem, public transport, simulated annealing

Małgorzata MICHAŁUSZKO* Dawid POLAK Piotr RATAJCZAK Iwona POŹNIAK-KOSZAŁKA*

LINE PLANNING PROBLEM – SOLVING USING EVOLUTIONARY APPROACH

The line planning problem is significant for the strategic planning of public transport. In this paper, a new algorithm designing the line system is proposed, which consists of two phases. At first a deterministic algorithm builds the solution using the information about the start points and the destinations of the passengers' travel. Next, the parameters of the algorithm from the first part are chosen, by means of simulated annealing, to obtain the best value of the criterion function. The performance of the proposed algorithm is compared to the performance of random search and simulated annealing.

1. INTRODUCTION
The line planning problem is a strategic planning process in public transport (see [3, 4]). The passengers can be seen as customers, who require from the transportation system to get them to their destination directly, cheaply, through the shortest path, etc. On the other hand the provider wants to have the greatest possible profit, which means lowering the cost of the line system and increasing the number of travelling passengers or the price of the ticket. A good system should balance the needs of both sides. Line plan systems can generally be divided into those which concentrate on the consumer needs, usually with a fixed cost of the system, and those which reduce the cost of the system. The objectives of the customers can differ: some passengers prefer to travel along the shortest path even with transfers, while
__________
* Department of Systems and Computer Networks, Wrocław University of Technology, Poland.



others prefer a longer but direct travel. According to this, some models have been proposed (see [3]) where the objective is to minimize not only the time of travel but also the number of transfers. Such an approach was used in the algorithm proposed in [2], where the algorithm starts from an empty solution and in every iteration the lines which maximize direct travel are added, until the moment when the budget constraint is reached or all passengers can travel appropriately.
In this paper, we assume that the places of the bus stops and the connections between them are known. The solution of the line planning problem is a pool of lines with the frequency of every line. Each frequency is a single number corresponding to the mean frequency on the line; the change of the frequency depending on the time of the day and the exact times of departure are part of timetable scheduling, which is omitted in this paper.
2. PROBLEM FORMULATION
In this paper, following [1], the objective is the minimization of the cost (understood as the length of the lines multiplied by their frequencies), the total time of travel of the passengers and the total number of transfers. It is assumed that the maximum number of transfers for passengers can be set. The passengers who have to change buses more times than the set threshold are added to the passengers whose nodes are outside the line system. The transfers can also have some fixed cost added to the criterion function. This cost of a transfer can increase when the number of transfers increases, in order to choose a line system with a smaller number of transfers for each passenger. This option does not block the possibility of many transfers, but it increases the value of the criterion function. Beside the minimization of the three criteria simultaneously, we also analyse the problems where only one of them is minimized as the main criterion.
3. LINE PLANNING ALGORITHM
The proposed algorithm optimizes the criterion value taking into consideration the preferences of the majority of passengers. The input of the algorithm is a network, that is the bus stop locations and the connections between them, and also a destination matrix, which contains the numbers of passengers travelling from the bus stop indexed by a row to the bus stop indexed by a column. The process of finding an approximation of the line system which is the best for the majority of passengers is given as follows. At first the network is ranked, where every edge (a connection between bus stops) gets a rank, which is an integer value. This rank is the number of passengers whose shortest path, from start to end point, goes through that edge. Next a line is built, where


the first step is to find the edge with the largest rank. Then, at both ends of the line, the edges with the largest rank among the neighbours are attached. The building of a line is finished when all the neighbours of the ends of the line have a rank below some threshold or the maximum number of vertices in the line is reached. Finally, the ranks of the edges used in the latest built line are decreased by dividing them by a fixed value. If the maximum number of lines is not reached, the algorithm goes back to the previous step and builds the next line.
The algorithm also has an additional parameter: a threshold on the rank below which an edge is not added to the line. If this threshold value is too small, the first line can be long and contain edges with very different rank values. A line containing edges with both large and small ranks is not preferable, because on the part with a large rank value the frequency of the bus should be high, and the same high frequency on an edge with a low rank is an unnecessary cost. A fixed threshold also causes difficulties: when the ranks decrease (in the following iterations), a line built with the same threshold becomes very short. The solution to this is adding a new parameter whose value is used to decrease the threshold after each iteration.
The parameters of the algorithm allow tuning it to the instance of the problem. However, because of their great number, choosing their values is difficult and not comfortable. To avoid this, simulated annealing (SA) is used to set the values of the parameters: the initial values of the parameters are set randomly, the solution is evaluated using the criterion function, and in each iteration the value of one randomly chosen parameter is changed. Adding simulated annealing makes the algorithm easier to use, but it increases the execution time. To improve it, some data structures are used. The procedures that are executed many times in the algorithm are increasing or decreasing the rank of an edge, finding the edge with the highest rank and checking the rank of an edge. To make this faster, the edge ranks are stored in a matrix indexed by the vertices at the edge ends, and if an edge rank is greater than zero, a reference to it is stored in a sorted list.
The frequencies of the routes for both algorithms are counted according to the maximum number of passengers who will travel through a part of the route. That means that, similarly as when the graph was ranked, for every passenger his plan of travel is counted, and for every edge of every route that the passenger travels with, the value of the frequency of this edge is increased. The frequency of the whole route is the maximum frequency of its edges. Furthermore, the algorithm is forced to include more nodes in the line system. This is important because there are no constraints requiring all bus stops to be added to the line system. The added penalty is the number of passengers from nodes which are outside the line system, multiplied by some fixed value.
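A simplified Python sketch of the ranking and line-building loop described above follows. It is illustrative only: the function names, the example graph and demands are hypothetical, the threshold-decay parameter is omitted for brevity, and the networkx library is used for the shortest paths.

import networkx as nx

def rank_edges(G, demand):
    # Rank of an edge = number of passengers whose shortest path uses it.
    rank = {tuple(sorted(e)): 0 for e in G.edges}
    for (s, t), passengers in demand.items():
        path = nx.shortest_path(G, s, t, weight="weight")
        for a, b in zip(path, path[1:]):
            rank[tuple(sorted((a, b)))] += passengers
    return rank

def build_line(G, rank, threshold, max_stops):
    # Grow a line from the highest-ranked edge, attaching at both ends the
    # neighbouring edge with the largest rank, until it drops below the threshold.
    a, b = max(rank, key=rank.get)
    line = [a, b]
    while len(line) < max_stops:
        best = None
        for v, pos in [(line[0], 0), (line[-1], -1)]:
            for n in G.neighbors(v):
                if n in line:
                    continue
                r = rank[tuple(sorted((v, n)))]
                if r >= threshold and (best is None or r > best[0]):
                    best = (r, v, n, pos)
        if best is None:
            break
        _, v, n, pos = best
        if pos == 0:
            line.insert(0, n)
        else:
            line.append(n)
    return line

def plan_lines(G, demand, threshold, max_stops, max_lines, decay):
    rank = rank_edges(G, demand)
    lines = []
    for _ in range(max_lines):
        line = build_line(G, rank, threshold, max_stops)
        lines.append(line)
        for a, b in zip(line, line[1:]):   # used edges become less attractive
            rank[tuple(sorted((a, b)))] /= decay
    return lines

G = nx.Graph()
G.add_weighted_edges_from([(1, 2, 1), (2, 3, 1), (3, 4, 1), (2, 5, 2)])
demand = {(1, 4): 10, (5, 3): 4}
print(plan_lines(G, demand, threshold=3, max_stops=4, max_lines=2, decay=4.0))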


4. NUMERICAL EXPERIMENT

In this section, the efficiency of the proposed algorithm is evaluated against random search and simulated annealing in numerical experiments carried out on the graph presented in Fig. 1. The first part of the experiment consists in choosing the best parameter values of simulated annealing for both algorithms. The algorithms are tested with the temperature reduction function new_temperature = temperature · alpha. The checked values of alpha are 0.75, 0.90 and 0.99, and the initial temperature values are 1000, 1000000 and 1000000000. To check the flexibility of the algorithm, four different parameterizations of the criterion function are used: one which minimizes only the cost of the lines, a second which minimizes the total travel distance, a third which minimizes the total number of transfers, and a last one which takes all the criteria into consideration. Every test for every criterion function is repeated 5 times and the running time is set to 5 minutes. The last test is a long-term one: it is performed just once and its running time is 30 minutes. The results are compared with the results of the random algorithm. The random algorithm randomly chooses the number of routes to build and the length of each route; then for each route it chooses the first node and picks the subsequent nodes randomly. After every execution of an algorithm, not only the value of the criterion function but also the cost of the line system, the total number of transfers and the total length of travel are recorded. An artificial graph is used for the tests. The graph has 39 vertices connected by 55 edges. The destination matrix is constructed in such a way that most passengers want to travel to the centre of the graph, but some of them also travel in other, randomly chosen directions. The parameters of the algorithms were chosen empirically; the best results were obtained for an initial simulated annealing temperature equal to 1000000 and a temperature reduction parameter set to 0.99. First, a preliminary analysis of the values of the parameters of the criterion function was performed; they were tested with respect to their sensitivity to overlapping lines, overly long lines, adding new lines on low-frequency routes and removing an important line. The best results are given by the criterion function in which the cost is three times more important than the total length of travel and the total number of transfers, and in which the weight of the number of passengers who cannot travel with the line system is set to a huge value, which forces the algorithm to build a system containing all nodes. Such a criterion function increases its value significantly when a duplicate route or a route consisting of low-frequency edges is added, yet it still increases its value when a frequently used route is removed. Furthermore, the proposed algorithm searches only part of the solution space. To check whether this is a good approximation, its output was compared with the results provided by simulated annealing alone.



The simulated annealing used for this comparison starts from the best of one hundred randomly generated solutions. In each step the current solution is modified by one randomly chosen move. A significant change, such as adding or removing a whole route, is two times less probable than a minor change, such as adding a node to a route or removing a node, either at the end of a route or inside it.
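For reference, a simulated annealing loop of the kind used here (geometric cooling, new_temperature = temperature · alpha, and randomly weighted neighbourhood moves) might be sketched as follows; the move functions, weights and iteration limit are placeholders, while the default temperature and alpha follow the values that worked best in the experiments.

import math
import random

def anneal(initial, evaluate, moves, weights,
           temperature=1_000_000, alpha=0.99, iterations=100_000):
    """Generic simulated annealing loop with geometric cooling
    (new_temperature = temperature * alpha) and weighted random moves."""
    current, current_cost = initial, evaluate(initial)
    best, best_cost = current, current_cost
    for _ in range(iterations):
        move = random.choices(moves, weights=weights, k=1)[0]
        candidate = move(current)
        cost = evaluate(candidate)
        # Always accept improvements; accept worse solutions with a
        # probability that shrinks as the temperature decreases.
        if cost < current_cost or random.random() < math.exp((current_cost - cost) / temperature):
            current, current_cost = candidate, cost
            if cost < best_cost:
                best, best_cost = candidate, cost
        temperature *= alpha  # geometric cooling, as in the experiments
    return best, best_cost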

Fig. 1. The graph used for the tests. Numbers represent the weight of each edge.

4.1. MINIMIZING COST

In this part, the criterion minimized by the algorithms is the cost of the line system. The results for the analysed algorithms are shown in Fig. 2 and Fig. 3. It can be seen that the proposed algorithm provides better results than random search and simulated annealing. However, the total length of travel of all passengers is approximately the same for simulated annealing and the new algorithm. On the other hand, the number of transfers is significantly larger for simulated annealing.



Fig. 2. The mean values of the cost of line system obtained by the new algorithm, SA and random search (left) and the new algorithm and SA (right)

Fig. 3. The mean values of the total length of travels of passengers (left) and the mean values of the total number of transfers (right) obtained by the new algorithm, SA and RS when the cost of the line system was minimized.

4.2. MINIMIZING TOTAL LENGTH OF TRAVEL

In this experiment, the algorithms minimize the total length of travel of all passengers. The value of this criterion function is similar for the proposed algorithm and simulated annealing. These results do not change when the time of the experiment is extended to 30 minutes. The results of the numerical experiments are shown in Fig. 4 and Fig. 5.



Fig. 4. The mean values of the criterion function, which takes into account only the total length of travels, obtained by the new algorithm, simulated annealing and the random solver.

Fig. 5. The mean values of cost of the system (left) and the mean values of total number of transfers (right) obtained by the new algorithm, SA and RS when the total length of travels was minimized.

4.3. MINIMIZING TOTAL NUMBER OF TRANSFERS

In this experiment, the criterion function is the total number of transfers. The results are shown in Fig. 6 and Fig. 7. The total lengths of travel are almost equal for all solutions obtained by the algorithms, but the cost of the line system is lowest for the solution obtained by the new algorithm.



Fig. 6. The mean values of the total number of transfers obtained by the new algorithm, simulated annealing and random solver.

Fig. 7. The mean values of cost of the system (left) and the mean values of total length of travels (right) obtained by the new algorithm, SA and random solver, when the total number of transfers was minimized.

4.4. MINIMIZING THE MULTI-CRITERIA FUNCTION

In this experiment, the multicriteria function is analysed, i.e., minimization of the cost of the system, the total length of travels and the total number of transfers. The results for the considered algorithms are given in Fig. 8 and Fig. 9.



Fig. 8. The mean values of criterion function obtained by the new algorithm, simulated annealing and random solver (left) and the new algorithm and simulated annealing (right)

Fig. 9. Mean values of cost of the system (left), mean values of total length of travels (in the middle) and mean values of total number of transfers (right) obtained by the new algorithm, SA and RS

5. CONCLUSIONS

Even if the criterion function does not require minimizing the number of transfers, the cost or the travel distance, the results obtained by the algorithm proposed in this work are good. This may mean that the part of the solution space searched by the algorithm is the right one. It gives better or almost the same results as simulated annealing and usually stops at the same result. However, it does not work well for problems in which just one line has to be built and that line should contain as many nodes as possible. The proposed algorithm is not as flexible as simulated annealing, but in most cases it gives better results and the variance of the results is much smaller than for simulated annealing. An additional advantage of the algorithm is that the routes are built according to known rules, not completely randomly. In our further work the algorithm will be analysed and improved further.


REFERENCES
[1] BORNDÖRFER R., GRÖTSCHEL M. and PFETSCH M.E., A column-generation approach to line planning in public transport. Transportation Science, vol. 41, 2007, pp. 123-132.
[2] SCHOLL S., Customer-oriented line planning. PhD thesis, Universität Göttingen, 2005.
[3] SCHÖBEL A., SCHOLL S., Line planning with minimal traveling time. Proceedings of the 5th Workshop on Algorithmic Methods and Models for Optimization of Railways, Palma de Mallorca, Spain, 2005.
[4] BORNDÖRFER R., Discrete optimization in public transportation. 2009, http://www.opus4.kobv.de/opus4-matheon/files/619/6462_ZR_08_56.pdf [accessed: 2012-08-05].



Computer Systems Engineering 2012

Keywords: modelling, utility–service provision, transformations, sustainable development

Anna STRZELECKA∗ Piotr SKWORCOW∗ Bogumil ULANICKI∗

MODELLING OF UTILITY–SERVICE PROVISION FOR SUSTAINABLE COMMUNITIES

Utility–service provision is designed to satisfy basic human needs. The main objective of the research is to investigate mathematical methods for evaluating the feasibility of a more efficient approach to utility–service provision, compared with the current diversity of utility products delivered to households. Possibilities include reducing the number of delivered utility products, on-site recycling and the use of locally available natural resources, which will lead to more sustainable solutions. The core of the approach is a simulation system that carries out a scientific and technological feasibility study of a transformation graph, which includes both direct transformations and indirect transformations of the utility products into defined services.

1. INTRODUCTION

Utility–service provision is designed to satisfy basic human needs, such as an adequate level of hygiene, adequate quality and quantity of drinking water, etc. For many years scientists and engineers have been working on improving the ways utility products are delivered to households, as well as on removing unnecessary and/or unwanted products. Today each utility product, such as water, gas or electricity, is delivered to end-users via a separate infrastructure, see [17]. This leads to problems not only with installation, management or maintenance, but also raises questions such as: which option is the best to heat a house, is there any cheaper solution for waste removal, are there any more sustainable or more environmentally friendly solutions, etc. On the other hand, the utility companies are looking for new solutions to reduce the cost and improve the efficiency of providing services to customers.

∗ Water Software Systems, De Montfort University, Leicester, United Kingdom, e-mail: anna.strzelecka@email.dmu.ac.uk



The main objective of the research reported in this paper is to investigate mathematical methods for evaluating the feasibility of alternative approaches to utility–service provision. Possibilities for such alternative approaches include reducing the number of delivered utility products, on-site recycling and the use of locally available natural resources. In this approach households can be treated as input-output systems, as presented in Fig. 1. The potential for recycling waste products is indicated by green arrows. Red arrows in Fig. 1 show utility products that cannot be recycled or used in any way and therefore have to be removed from the system. Additionally, some utility products can be acquired from local resources, e.g. water from rain, air or ground; energy/electricity from sun, wind or ground. Furthermore, some products can be replaced by other products; for instance, gas can be replaced by electricity, and heat can be provided by gas or electricity. Hence, the complexity of the utility infrastructure could be significantly reduced, but this depends on local conditions and available natural resources.

Fig. 1. Conceptualisation of a household, [17]

The core of the proposed approach is a simulation system that enables carrying out feasibility studies of utility–service provision scenarios, considering both direct transformations (e.g., cooking with gas or cooking with electricity) and indirect transformations (e.g., grey water from baths and showers can be recycled and used in dishwashers or washing machines) of the utility products into defined services.


The purpose of the developed simulation system is to support the decision making process when designing alternative approaches to utility–service provision. The simulation system is composed of the following blocks: an interface to define a service-provision problem, an interface to define candidate solutions (transformation graphs), a computational engine to analyse the feasibility of solutions, and a common XML database. Both interfaces and the computational engine are developed in C# and .NET 3.5, while the XML database is implemented using eXist-db, an open source native XML database system, see [19]. The purpose of the XML database is to store information about all devices, utility products and services, which can be used to define utility–service provision problems and candidate solutions (so-called transformation graphs) using the corresponding interfaces. Utility–service provision problems and candidate solutions are defined in XML format and can also be stored in the database. At the current state of development the system enables the manual definition of a potential solution to a problem, in the form of a transformation graph, followed by simulation and evaluation of the feasibility of the solution. Being part of the All-in-One project [18], this research also considers futuristic scenarios; this is reflected by the fact that some devices in the database are under development and not yet available. The project looks 100 years into the future. There are several questions that the researchers are trying to answer, e.g.: is it possible to satisfy all needs by delivering just one utility product to a household, or by delivering the required products via one infrastructure? Is it possible to achieve this vision by 2111? What are the gaps in science and technology that prevent achieving this vision? How will utility–service provision look in 100 years' time (see [4])?

2. MODELLING UTILITY–SERVICE PROVISION

Utility–service provision is designed to satisfy basic human needs. In order to develop a simulation system that enables the feasibility analysis of a proposed solution, a preliminary literature review was conducted in several fields.

2.1. DEFINITIONS

In this subsection some basic definitions used in this paper are introduced, which at the same time form elements of the proposed approach to model utility service provision. A fundamental need is a need that is necessary for an individual to live a healthy life. Fundamental needs are distinguished from wants. According to Dean, see [6], we can all recognise that there are things in life that we might want that we do not need and things that we might need that we do not want. It is sometimes suggested that needs



are absolute, while wants are relative. Fundamental needs remain the same at all times and are uninfluenced by cultural changes, see [7]. What changes is the way in which these needs are satisfied. In this research only the basic fundamental needs are taken into consideration, because they can be satisfied by the provision of utility products. Thus, we investigate the following needs: access to transportation, adequate level of personal hygiene, adequate quantity and quality of drinking water, clean and safe environment, clothes, entertainment/leisure, sexual activity, adequate level of comfort, adequate nutritional food, physical activity, physical security, provision of adequate sanitation, rest and regeneration, social communication and interaction [16]. A secondary need is derived from a fundamental need (e.g. adequate nutritional food can be split into two secondary needs: hot food and cold food). In contrast to fundamental needs, secondary needs may change in time or vary between cultures, [7]. However, not all fundamental needs can be split into secondary needs. In the context of this research, utility services and products provided by one or several utilities directly satisfy some, but not necessarily all, secondary needs. The fundamental needs are satisfied indirectly by the provision of utility products and utility services. A product is a substance that is delivered by a utility to end users (utility product), produced locally through transformations (by-product) or harvested locally from natural resources (product from nature). By utility products we understand electricity, water, gas, etc. By-products include clean water from recycling, solid waste that can be processed (e.g. in solid waste burners), greywater from showers or washing machines, etc. Electricity obtained from solar irradiation or wind, and water harvested from rain, are examples of products from nature. The products are necessary to satisfy human needs, but some of them can be used to replace others (e.g. water can be harvested from rain and obtained from recycling, thus reducing the need to deliver the utility product drinking water). A service is a process of satisfying a secondary need, e.g. supply of drinking water, nutrition, partial body cleaning, etc. A device is an appliance that uses technologies to transform one or more products into other products (e.g. a three-blade wind turbine transforms wind energy into electrical power, or a diesel generator transforms diesel fuel into electrical energy) and/or into services (e.g. a kitchen tap transforms the utility product drinking water into the service drinking water, or an electric space heater transforms the utility product electricity into the service thermal comfort - heating). A device can have more than one transformation defined; for example the device kitchen tap can transform the utility product drinking water into the service drinking water, or can transform the utility product drinking water into the service washing dishes and the product greywater, see Fig. 2. A scenario consists of a set of requirements and constraints. At the current state of



Fig. 2. Example of alternative input/output configurations for the same device

development it includes the required services (with the number of units of each service) and the year for which the particular scenario is tested. In the future other constraints will be included, e.g. availability of local natural resources (wind, waves, sunlight, etc.), availability of technologies, the subset of utilities that cannot be supplied (e.g. fossil fuels), etc. Technologies are required by devices to transform one or more products into other product(s); for example the device shower with electric water heater uses the technologies water pump and power generation. A transformation graph is an attempt to model utility–service provision at household or community level, using the information about devices stored in the XML database, which will be discussed in the following section. In this graph each node is a device and each edge is a utility product or service carrier.

2.2. NEEDS REQUIREMENT ANALYSIS

A general review of the daily activities of end-users in a domestic environment helps to identify the needs that are being addressed by the services provided by each utility product. The analysis was conducted in order to attempt to predict future demands as well as to determine the factors necessary to satisfy human needs in general. Predicting the future with a high level of certainty is inherently difficult, and there are numerous aspects that can make this task almost impossible. Political and economic aspects are not taken into consideration in this research. However, weather projections are considered to be important, [21], as well as human population, [14]. Weather conditions are important when determining the naturally available resources that can be used within a household. A growing population and personal preferences set a new trend towards smaller flats with fewer people occupying them, [14]. Water and energy consumption in households varies over time. Within a day, there is usually a morning peak around 8 am when people are getting ready for work, moderate mid-day usage lasting till 4 pm, an evening peak and a relatively small late-night peak when people are coming back home, and subdued low night usage until 4 am, see [3]. The consumption also differs between weekdays and the weekend. There are seasonal changes as well; for example, people use more energy in the winter to heat their houses, and in the summer



water consumption might increase due to flower and garden watering, etc. Average water consumption in the UK is estimated at about 150 litres per person per day, see [9]. Approximately 7% of that water is used for drinking and cooking. Therefore not all of the water used within a household has to be treated to potable quality. Some non-potable water needs could be met in an alternative way; for example, water from baths, showers and sinks could be recycled and reused for garden watering. Also, rainwater could be harvested and processed [9]. In 2011 domestic consumption accounted for 26% of total UK final consumption of energy products, see [12]. This energy is used for space and water heating, cooking, lighting and electrical appliances. Household energy demand depends on many factors, e.g. space heating is highly dependent on technical factors such as the type of dwelling, its level of insulation and the efficiency of the heating mechanism; energy use for water heating and wet appliances, such as dishwashers and washing machines, depends on technical factors such as the efficiency of the appliances, as well as on lifestyle choices such as the number of times clothes are worn before being washed, or the frequency of and time taken showering [8]. In the UK, as in other affluent countries, at least one computer is found in most homes, analogue television and radio equipment is being supplemented by digital equipment, and the stocks of mobile telephones, sound systems, videos, DVDs, camcorders, answering machines, digital cameras, printers and scanners are growing rapidly, see [5]. Total household energy use is therefore complex to model, as it should take account of a wide variety of technical and lifestyle factors, see [8].

2.3. SUSTAINABILITY

Sustainability no longer has a single or agreed meaning. The concepts of sustainability and sustainable development have been associated with a great variety of human activities. They are related to the use of naturally available resources, and of non-renewable mineral and energy resources. According to Hasna, see [10], sustainability refers to the development of all aspects of human life affecting sustenance. Sustainable development, according to Allen, see [1], is development that is likely to achieve lasting satisfaction of human needs and improvement of the quality of life under conditions in which ecosystems and/or species are utilized at levels and in ways that allow them to keep renewing themselves. In 1987 the World Commission on Environment and Development introduced the most widely known definition of sustainable development: “... development that meets the needs of the present without compromising the ability of future generations to meet their own needs”, [2]. However, there are many other definitions in the literature, see e.g. [13] or [11]. According to the dictionary, the word sustain means “allow to remain in a place or position or maintain a property or feature”; and “provide with nourishment”;



also “supply with necessities and support”, [20]. By sustainable communities we understand communities that are promoting sustainability and sustainable development. It includes communities that are new-built as well as existing ones that want to improve. Sustainable Communities Plan from 2003, see [15], defines sustainable community as places where “people want to live and work, now and in the future. They meet the diverse needs of existing and future residents, are sensitive to their environment, and contribute to a high quality of life. They are safe and inclusive, well planned, built and run, and offer equality of opportunity and good services for all”.

3. METHODOLOGY

At the current state of development the methodology progresses in the following steps:
i). Determination of the year for which the utility–service provision problem is considered.
ii). Definition of the services to be provided, with the number of units.
iii). Definition of the product(s) that can be supplied (including naturally available resources) and/or removed.
iv). Selection of the devices required to solve the problem.
v). Definition of the transformation graph(s) that satisfies the constraints.
vi). Simulation of the previously defined graph and calculation of the balances of all products and services.
vii). Visualisation of the final results for human decision makers.

3.1. XML DATABASE

The XML database is implemented within the eXist [19] environment. eXist is an open source database management system entirely built on XML technology, also called a native XML database. Unlike most relational database management systems, eXist uses XQuery, which is a World Wide Web Consortium (W3C) Recommendation, to manipulate its data. Information about products, services, devices and technologies is stored in the XML database. All of them can be stored and manipulated in the database using purposely


developed software with a graphical interface. The XML database is searchable and can support constraint satisfaction or objective function optimisation queries. As of July 2012 there are over 70 devices, over 30 products, over 20 services and over 70 technologies stored in the database. Each of them has some compulsory fields; for devices these are: device name, device description, input/output transformation, used technologies and maximum throughput (per hour). The maximum throughput is related to the efficiency of the transformation processes, as it defines how many units of products or services can be produced or satisfied by the device. The ratio between required input and produced output is fixed (i.e. there are no dynamics). For products the compulsory fields are: product name, product description and units. For services they are: service name and service description. Finally, for technologies: technology name, technology description and year available. Most devices currently stored in the database are existing appliances or devices currently emerging/under development. However, some of them are hypothetical devices which may emerge in the future. Therefore, each device is tagged with its year of availability to enable modelling of existing, near-future, and science-fiction approaches to utility–service provision, see [17]. There is also room for additional optional fields, for example cost, dimensions, CO2 emission, etc., which will be used in future analyses.
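To make the data model concrete, a single device entry with the compulsory fields listed above could be represented as shown below. It is given as a Python dictionary that only mirrors the structure of the XML records (the actual schema is not reproduced in this paper); the technology name and throughput value are made up.

# Illustrative record for one device with the compulsory fields listed above.
# The real entries are XML documents stored in eXist-db; this dictionary only
# mirrors their structure, and the technology and throughput values are made up.
kitchen_tap = {
    "device_name": "Kitchen tap",
    "device_description": "Delivers drinking water and supports dish washing",
    "transformations": [
        # one device may define several alternative input/output configurations
        {"inputs": {"drinking water": 1.0},
         "outputs": {"service: drinking water": 1.0}},
        {"inputs": {"drinking water": 1.0},
         "outputs": {"service: washing dishes": 1.0, "greywater": 1.0}},
    ],
    "used_technologies": ["water supply"],
    "max_throughput_per_hour": 120,
    "year_available": 2012,
}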

3.2. TRANSFORMATION GRAPH

One of the main aims of utility–service provision is the best use of delivered utility products, by-products, and naturally available resources. Of course the most important aim is to satisfy human needs by delivering appropriate services. The devices required to achieve this can be connected with each other, giving a transformation graph. A user has to hand-pick from the database all devices that (in his opinion) will deliver the required services and will transform one product into another. At this point the user can decide which by-products will be recycled or re-used. The next step is to create a transformation graph in an XML file; here the user has to specify the connections between supplies, devices and services. The developed simulation system allows the feasibility of that solution to be checked. First it connects to the database; next, the previously defined transformation graph in XML format is loaded, followed by a scenario file in the same format. The last step is checking the feasibility of that particular solution. The simulator calculates the balances of all products and services and provides the amount to be removed and the amount to be supplied for each product.
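A minimal sketch of the balance calculation performed by the simulator might look like the following. It assumes the candidate graph is given as a list of (transformation, activations) pairs built from records like the one above and simply nets inputs against outputs; this is an illustration, not the actual C#/.NET implementation.

from collections import defaultdict

def compute_balances(graph, demanded_services):
    """Net the product and service balances of a candidate transformation graph.

    graph: list of (transformation, activations) pairs, where each transformation
           is a dict with 'inputs' and 'outputs' mapping product/service names to units
    demanded_services: dict mapping service names to the required number of units
    Returns (to_supply, to_remove, unmet_services).
    """
    balance = defaultdict(float)
    for transformation, activations in graph:
        for name, units in transformation["inputs"].items():
            balance[name] -= units * activations
        for name, units in transformation["outputs"].items():
            balance[name] += units * activations

    unmet = {s: need - balance.get(s, 0.0)
             for s, need in demanded_services.items()
             if balance.get(s, 0.0) < need}
    to_supply = {p: -v for p, v in balance.items() if v < 0}   # deficits must be delivered
    to_remove = {p: v for p, v in balance.items()
                 if v > 0 and p not in demanded_services}      # surplus by-products
    return to_supply, to_remove, unmet

Applied to the example of Section 4 (Fig. 3), such a routine would list drinking water, electricity and food among the products to supply and greywater and solid waste among those to remove.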



4. RESULTS

Consider the following example. We want to deliver one unit of each of the following services: full body cleaning, thermal comfort - heating, nutrition and washing dishes. The year for which the scenario is tested is 2012. In the database there are 3 devices that can deliver the service full body cleaning, 6 devices that can deliver the service thermal comfort - heating, 6 devices that can deliver the service nutrition and one that can deliver the service washing dishes. One of the possible solutions is presented in Fig. 3. Natural resources are not used in this example.

Fig. 3. Transformation graph with no feedback loops

To satisfy the demand the following products need to be supplied:
• drinking water - 75 L;
• electricity - 52 kWh;
• food - 1 kg;
and the following need to be removed:
• greywater - 75 L;
• solid waste - 0.1 kg.
All service demands (indicated in blue in Fig. 3) have been satisfied.


As an improvement, greywater is re-used (there are two devices in the database to do so), as well as organic waste (also two devices for that purpose). Based on that information, a transformation graph was created, see Fig. 4. The feasibility of that graph was checked using the simulator and the results are as follows. Products that need to be supplied:
• drinking water - 43 L;
• electrical power - 89 kWh;
• food - 1 kg.
The by-products that are produced and reused are greywater, clean water, drinking water, and organic waste. The devices Greywater recycler, In-situ food-waste composting, and Filtration and UV water purification system were used to process the greywater, organic waste and clean water. Therefore, drinking water could be recovered and reused. Compost needs to be removed from the system.

Fig. 4. Transformation graph with feedback loops

All service demands (indicated in blue in Fig. 4) have been satisfied.



5. FUTURE WORK

The future work is focused on several aspects:
• Introducing a time horizon to the model; this will enable more realistic modelling of the utility–service provision. Water and energy consumption varies significantly over time and modelling just a snapshot does not reflect the complexity of the problem.
• Introducing storage of products in the transformation graph; this is essential to support intermittent supplies in sustainable communities.
• Representation of the entire content of the database in the form of a hypergraph. This will help to facilitate automatic database searches and will be useful for the decision making process, i.e. for deciding whether a product should be removed or whether it can be transformed into another product.

6. CONCLUSIONS

An approach to modelling utility–service provision has been introduced. It is designed to satisfy basic human needs. These needs were identified together with the corresponding services required to satisfy them. The developed simulation system enables a feasibility study of a proposed solution. The methodology and the fundamental elements of the proposed approach to modelling utility–service provision were presented. The XML database that stores information about devices, products, services and technologies was developed. The concept of transformation graphs was introduced, together with a description of how the utility products can be processed to provide services to humans using the information stored in the database.

ACKNOWLEDGMENT This research is a part of and is sponsored by the Engineering and Physical Sciences Research Council (EPSRC) project “All in One: Feasibility Analysis of Supplying All Services Through One Utility Product” (EP/J005592/1).



REFERENCES [1] ALLEN R., How to save the world. Strategy for world conservation. Toronto: PrenticeHall, 1980. [2] BRUNDTLAND G.H., Report of the World Commission on Environment and Development: Our Common Future. United Nations, 1987. [3] BUTLER D. and MEMON F., Water demand management. London: International Water Assn, 2006. [4] CAMCI F., ULANICKI B., BOXALL J., CHITCHYAN R., VARGA L., and KARACA F., Rethinking future of utilities: supplying all services through one sustainable utility infrastructure. Environmental science & technology, vol. 46, 2012, pp. 5271–2. [5] CROSBIE T., Household energy consumption and consumer electronics: The case of television. Energy Policy, vol. 36, no. 6, 2008, pp. 2191-2199. [6] DEAN H., Understanding human need, The Policy Press, 2010. [7] DOYAL L. and GOUGH I. A theory of human need. Basingstoke: Palgrave Macmillan, 1991. [8] DRUCKMAN A. and JACKSON T., Household energy consumption in the UK: A highly geographically and socio-economically disaggregated model. Energy Policy, vol. 36, no. 8, 2008, pp. 3177-3192. [9] GREAT BRITAIN: DEFRA, Future water: the Governments water strategy for England. Stationery Office, 2008. [10] HASNA A.M., Dimensions of sustainability. Journal of Engineering for Sustainable Development: Energy, Environment and Health, vol. 2, no. 1, 2007, pp. 47-57. [11] MORI K. and CHRISTODOULOU A., Review of sustainability indices and indicators: Towards a new City Sustainability Index (CSI). Environmental Impact Assessment Review, vol. 32, no. 1, 2012, pp. 94-106. [12] NATIONAL STATISTICS, Energy Consumption in the United Kingdom, Technical report, Department of Trade and Industry, 2002. [13] PEZZEY J., Sustainable Development Concept: An Economic Analysis. Washington, D.C.: The World Bank, 1992. [14] PALMER J. and COOPER I., Great Britains housing energy fact file 2011.Technical report, United Kingdom Department of Energy and Climate Change, 2011. [15] PRESCOTT J., The Sustainable Communities Plan. Technical Report, Office of the Deputy Prime Minister, 2003. [16] STRZELECKA A., SKWORCOW P., ULANICKI B., and JANUS T., An Approach to Utility Service Provision Modelling and Optimisation. In International Conference on Systems Engineering, 2012, pp. 191-195.



[17] ULANICKI B., STRZELECKA A., SKWORCOW P., and JANUS T., Developing scenarios for future utility provision. In 14th Water Distribution Systems Analysis Conference, 2012, pp. 1424–1430 [18] All In One, http://www.allinone.uk.net/, 2012. [19] eXist-db Open Source Native XML Database, http://exist-db.org/exist/index.xml, 2012. [20] Free Online Dictionary, Thesaurus and Encyclopedia. http://www.thefreedictionary.com/, 2012. [21] UK Climate Projections, http://ukclimateprojections.defra.gov.uk/, 2012.



Computer Systems Engineering 2013

Keywords: distributed computing, unicast, heuristic algorithm

Grzegorz CHMAJ*

HEURISTIC ALGORITHM FOR CLIENT-SERVER DISTRIBUTED COMPUTING

Distributed processing systems contain many nodes connected into one logical structure that is able to perform the processing of a given task. The task is submitted by the system operator – from his point of view the system receives a task as input, the computing layer processes the task and the result is returned by the system. Such a layer-oriented approach requires the design of the operational mechanisms of the lower layers – the computation and communication layers. In this paper the unicast approach is studied, in which the communication between nodes goes through a central node – the server: the operational algorithms are defined and the experimentation results are presented.

1. INTRODUCTION

Distributed processing systems utilize the computational power of multiple devices to gain the significant processing power that is then used to compute a given task. Such systems come in various scales – from systems-on-chip, through multiprocessor local systems, to large-scale ones incorporating thousands of machines spread throughout the whole world. Public distributed processing systems are the most widely known. They are usually based on the BOINC [1] software and are called public because every internet user can install the local client on his home computer, join a selected project and contribute his local processing power. The most famous BOINC-based project is Seti@home, where radio signals captured by a radio telescope are sliced into fragments and processed by the system to look for extra-terrestrial intelligence. In this paper, the unicast approach is studied – the distributed system contains many nodes, but directs
__________
*

Department of Electrical and Computer Engineering, University of Nevada Las Vegas, USA



its communication through the central node (server). The algorithm for such operation is presented, along with the experimentation results.

2. SYSTEM DESCRIPTION

A public distributed computing system is a structure built from many machines (called nodes) connected into one structure, which may be considered as one virtual machine with large computing power. Each node is a machine connected to the computing system through a link. The connection link has upload and download limits, and the node also has a limited processing power. A node contributes its computing power and serves the same role as every other machine in the system, although the internal structure of the system might contain some nodes with special functions. The system receives a computational task, which is then computed using the computing system. This task is divided into uniform fragments called source blocks. These blocks are computed at the system nodes; the results of the computation are called result blocks. One computed source block results in one result block. For the sake of simplicity, the size of each source block is the same, and the same rule holds for result blocks. To compute each source block, the same amount of computing power is required. Moreover, source blocks are computationally independent, i.e. one block may be computed without any knowledge about the rest of the source blocks. These assumptions model the rules used in popular public computing projects such as Seti@home or Climate Prediction. In the Seti@home project the input task is a long-lasting radio signal in the form of a sound “file”. This signal is divided into fragments of the same length (source blocks), thus having the same computational power requirements. Each fragment is analysed independently and the result is sent back to the central node. The equal size of result blocks relates to transport systems such as the popular peer-to-peer protocol BitTorrent, which divides a file into fragments of the same size (typically 256 kB) – an analogy to the equality of our result block sizes. There is a logical direct connection between every two nodes. This connection is created in a lower layer of the overlay network, so it is not visible to the layer at which we place our model. This concept is often used in the literature [2], [3] as well as in daily networking – the Internet being the most popular example. The time scale of the system processing is divided into time slots, which we call iterations. Each time slot may be considered as a time period of a given duration expressed in seconds. Each iteration has the same length. During each iteration, nodes may transfer result blocks between them, but the information about the blocks available at the nodes is updated at the iteration change (i.e. blocks downloaded by node v in iteration t can be sent by v to other nodes not earlier than during iteration t + 1). The length of an iteration may be considered as


follows: if the total size of the data sent by node v is 256 kB and the upload link speed is 128 kb/s, then the duration of the iteration is 16 s (256 kB / 128 kb/s). The idea of using time slots in static modelling was also used in [4], [5], and many others. The system considered in this paper performs the distributed processing of the given task, but the communication between nodes goes through a special node called the server. For this reason the system is called a “client-server distributed processing system”.

3. RELATED LITERATURE

Unicast communication, usually referred to as the client-server architecture, is a widely used communication mechanism. However, it introduces limits in cases when massive amounts of the same data need to be delivered to multiple nodes. [6] studies a routing algorithm with regard to bandwidth limitations. The authors model the unicast approach and consider both single unicast and group unicast communication. Due to unicast limitations, other communication models are also studied and compared. The authors of [7] describe the allocation of bandwidth to unicast and multicast flows. They also provide an evaluation of the available bandwidth and describe proposed policies for the multicast approach. The basis of many distributed processing systems – BOINC – was described in [1]. The author described some details of the system, mentioned the popular projects based on this platform, and compared it to grid computing. Multiple aspects of BOINC-based public distributed computing were evaluated in [8]. The authors covered the system architecture, infrastructure, applicability, security, result validation, client architecture and several other aspects; the problem of motivating the volunteers was described as well. [9] presents a specific application of public computing – finding protein binding sites. The work related to the project development, early-stage challenges and implementation details is described. A real-time variation of the BOINC platform was shown in [10] – elements of a real-time system such as a deadline timer and parameter-based admission control were added. The authors also describe the extensions and their relation to the original BOINC.

4. HEURISTIC ALGORITHMS

In order to solve the stated problem, the following heuristic algorithm was developed. It consists of two main parts: UH1 – the allocation of source blocks, and UH2 – the distribution of results. The algorithms are offline – the input data is known at the time


the algorithm is executed. The algorithm notation uses the following elements:
b = 1, 2, …, B – blocks,
t = 1, 2, …, T – time slots,
v, w = 1, 2, …, V – network nodes,
c_v – cost of block computation at node v,
k_wv – cost of block transfer between nodes w and v,
p_v – computation limit of node v,
d_v – download limit of node v,
u_v – upload limit of node v.
Two binary variables are used:
x_bv = 1 when block b is computed at node v; 0 otherwise,
y_bwvt = 1 when block b is transferred from node v to node w in iteration t; 0 otherwise.
The algorithms are defined as follows.

UH1 – source block allocation
0. Assign a_v blocks to each node v = 1, 2, …, V:

$$a_v = \begin{cases} B - d_v T, & \text{when } B - d_v T > 0,\\ 1, & \text{otherwise.} \end{cases}$$

1. If there are still unallocated blocks ($\sum_v a_v < B$), go to step 2; otherwise exit the algorithm.
2. For each node v = 1, 2, ..., V compute the score e_v using formula (2.14):
$$e_v = c_v + \sum_{w} k_{vw}$$
3. Determine the maximum score among all score values:
$$e_{max} = \max_{v \in \{1, 2, \ldots, V\}} e_v$$

4. Compute the score gap g_v for each node using (2.16):
$$g_v = \begin{cases} 0, & \text{when } p_v - a_v \le 0 \text{ or } a_v (V - 1) \ge u_v T,\\ \dfrac{e_{max} - e_v}{e_v}, & \text{otherwise.} \end{cases}$$
Put all g_v values in an array and sort it in descending order.
5. Allocate blocks to the nodes:
a. Point to the first element of the array (this element identifies the node v having the highest gap g_v).
b. Assign a'_v blocks to node v using formula (2.17):

$$a'_v = \begin{cases} \dfrac{u_v T}{V-1} - a_v, & \text{when } p_v \ge \dfrac{u_v T}{V-1} \text{ and } B - \sum_b \sum_w x_{bw} - \left(\dfrac{u_v T}{V-1} - a_v\right) \ge 0,\\[4pt] p_v - a_v, & \text{when } p_v < \dfrac{u_v T}{V-1} \text{ and } B - \sum_b \sum_w x_{bw} - (p_v - a_v) \ge 0,\\[4pt] B - \sum_b \sum_w x_{bw}, & \text{otherwise.} \end{cases}$$


c. If there are still non-allocated blocks, point to the next element of the g_v array (identifying the node v with the next gap g_v) and go to step 5b. Otherwise finish the algorithm.

UH2 – result block distribution
0. Create a list L_v containing all nodes and a list L_b containing all result blocks.
1. Let f_v denote the pointer to an element on list L_v and f_b the pointer to an element on list L_b. Let l_vn denote the n-th element on list L_v, and l_bn the n-th element on list L_b. Set iteration t = 1.
a. Set the pointers to the first elements of the lists: f_v = l_v1 and f_b = l_b1.
b. Check whether the node v at position f_v on list L_v is able to download ($\sum_b \sum_w y_{bwvt} < d_v$). If yes, go to point 1c), otherwise go to point 1g).
c. Check whether the block b at position f_b is present on the node v identified by f_v. If not, go to 1d), otherwise go to 1f).
d. Check whether a node w satisfying the condition x_bw = 1 is able to upload ($\sum_b \sum_v y_{bwvt} < u_w$). If yes, go to 1e), otherwise go to 1f).
e. Send block b from node w to node v (y_bwvt = 1). Increase the pointer f_b by one (so that it identifies the next element on list L_b). Go to 1g).
f. Increase f_b by one. If the block b at position f_b was already considered in point 1c) for the node v identified by f_v, go to 1g), otherwise go to 1c).
g. Increase f_v by one. If there was no block transfer since the element f_v was previously analysed, go to 1i), otherwise go to 1b).
h. If every node possesses all result blocks, exit the algorithm. Otherwise go to 1i).
i. Set the pointers to the first elements of the lists: f_v = l_v1, f_b = l_b1. Increase t by one (which means switching to the next iteration). If t > T (there are no iterations left), exit the algorithm, otherwise go to step 1b).

The complete algorithm for the unicast flow can be described in the three following steps:
UHA – algorithm for the unicast flow
0. Allocate the blocks using algorithm UH1.
1. Perform the computation of the blocks.
2. Distribute the result blocks to the nodes using algorithm UH2.
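For illustration only, a compact Python sketch of the UH1 allocation phase is given below. It follows the formulas as reconstructed above in a simplified form (scores are recomputed on every pass and blocks are assigned to one node per pass), assumes more than one node, and is not the author's original implementation.

def uh1_allocate(B, T, c, k, p, d, u):
    """Simplified sketch of the UH1 source-block allocation (steps 0-5 above).

    B: number of source blocks, T: number of iterations,
    c[v]: computation cost at node v, k[v][w]: transfer cost between v and w,
    p[v], d[v], u[v]: computation, download and upload limits of node v.
    Returns a[v], the number of blocks allocated to node v for computation.
    """
    V = len(c)
    # Step 0: each node must compute at least what it cannot download within T iterations.
    a = [max(B - d[v] * T, 1) for v in range(V)]

    while sum(a) < B:
        # Steps 2-3: score each node by how costly computing and distributing from it is.
        e = [c[v] + sum(k[v][w] for w in range(V) if w != v) for v in range(V)]
        e_max = max(e)
        # Step 4: nodes with no compute or upload headroom get a zero gap.
        g = [0.0 if p[v] - a[v] <= 0 or a[v] * (V - 1) >= u[v] * T
             else (e_max - e[v]) / e[v] for v in range(V)]
        # Step 5: give remaining blocks to the node with the largest gap first.
        progress = False
        for v in sorted(range(V), key=lambda v: g[v], reverse=True):
            extra = min(p[v] - a[v], u[v] * T // (V - 1) - a[v], B - sum(a))
            if extra > 0:
                a[v] += extra
                progress = True
                break
        if not progress:   # no node can take any more blocks
            break
    return a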



5. EXPERIMENTS

In order to evaluate the quality of the algorithms described above, a comparison between the heuristic and optimal solutions was carried out. The optimal solutions were produced using MIP models implemented in the CPLEX optimizer; the time of a CPLEX experiment was limited to 3600 seconds. Due to the complexity of the researched problems, optimal solutions were achieved only for small networks with a relatively small number of blocks and iterations. Increasing the problem size results in a steep growth of the computation time (for optimal solutions) and of the memory requirements, so even increasing the time limit for CPLEX does not affect the quality of the solutions significantly. The quality of a solution is measured using the function
$$F = \sum_{b} \sum_{v} x_{bv} c_v + \sum_{b} \sum_{w} \sum_{v} \sum_{t} y_{bwvt} k_{wv}$$
Function F sums up all the network costs and processing costs, and models the electrical energy consumption. The 300 researched networks had the following parameters:

parameter               values
number of nodes         3 – 17
number of iterations    2 – 9
number of blocks        3 – 26
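For clarity, the objective F defined above can be evaluated directly from the variables that are set to one; a small illustrative helper is:

def total_cost(x_ones, y_ones, c, k):
    """Objective F = sum_bv x_bv*c_v + sum_bwvt y_bwvt*k_wv (sparse evaluation).

    x_ones: iterable of (b, v) pairs with x_bv = 1 (block b computed at node v)
    y_ones: iterable of (b, w, v, t) tuples with y_bwvt = 1
            (block b transferred from node v to node w in iteration t)
    c[v]:   computation cost at node v, k[w][v]: transfer cost between w and v
    """
    processing = sum(c[v] for _, v in x_ones)
    transfer = sum(k[w][v] for _, w, v, _ in y_ones)
    return processing + transfer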

CPLEX returned 243 optimal solutions, for 51 networks its solution was classified only as feasible, and for 6 networks there was no solution at all. We focus on the networks for which CPLEX provided the optimal solution, which we denote as UOA (Unicast Optimal Algorithm). The investigation showed that, comparing the 243 optimal UOA solutions with the related UHA solutions, the average $D^{UHA}_{UOA}$ (the difference between UHA and UOA) was 0% (242 cases where $F^{UHA} = F^{UOA}$ and one case where CPLEX provided an optimal solution while UHA did not provide any result). Due to the specific nature of unicast – the communication can be done only through the server – the optimization concerns the source block allocation (with regard to the various parameters of the nodes). The experiment results show that the UHA algorithm performs very well; its execution time was also very low (below 1 second, while CPLEX needed much more time for the larger networks). However, the algorithm is not ideal – in one case UHA was unable to provide any result, although we know that a solution exists, as it was produced by CPLEX. The relation between the number of nodes V and the cost of system operation was also evaluated. As presented in Fig. 1, the cost of system operation F increases as the



number of nodes V increases. The relation is almost linear. This is caused by a specific property of the presented system:
$$x_{bv} + \sum_{w} \sum_{t} y_{bwvt} = 1, \qquad b = 1, 2, \ldots, B, \quad v = 1, 2, \ldots, V.$$

[Figure 1: plot of the total cost F (y-axis) against the number of nodes V (x-axis).]

Fig. 1. Relation between V and F

6. CONCLUSIONS

In this paper, a distributed processing system with a client-server network approach is described, along with the operational offline algorithm. The research results show that the algorithm performs very well in terms of the metrics defined and of the time required to obtain a solution. Further work concerns other communication models – peer-to-peer and anycast: the design of the algorithms and their time/cost performance compared with optimal solutions. Finally, the properties and the cost of operation should be evaluated in order to compare the communication models with each other when applied to the same processing problem.



REFERENCES [1] ANDERSON D.P., BOINC: A System for Public-Resource Computing and Storage, 5th IEEE/ACM International Workshop on Grid Computing. November 8, 2004, Pittsburgh, USA. [2] BALDONI R., SCIPIONI S., TUCCI-PIERGIOVANNI S., Communication Channel Management for Maintenance of Strong Overlay Connectivity, Proceedings of 11th IEEE Symposium on Computers and Communications, 2006, pp. 63–68. [3] STEINMETZ R., WEHRLE R., (eds.), Peer-to-Peer Systems and Applications, Lecture Notes in Computer Science, Vol. 3485, 2005. [4] ARTHUR D., PANIGRAHY R., Analyzing BitTorrent and Related Peer-to-Peer Networks, In Proceedings of the seventeenth annual ACM-SIAM symposium on Discrete algorithm, 2006, pp. 961–969. [5] GANESAN P., SESHADRI M., On Cooperative Content Distribution and the Price of Barter, In Proceedings of the 25th IEEE International Conference on Distributed Computing Systems (ICDCS’05), 2005, pp. 81–90. [6] YUNXIAO ZU, YANLIN WANG, YUGENG SUN, A Study on Unicast Routing Algorithm with Network Bandwidth Constraint, The 2000 IEEE Asia-Pacific Conference on Circuits and Systems, 2000, pp. 837-840 [7] LEGOUT A., NONNENMACHER J., BIERSACK E., Bandwidth-Allocation Policies for Unicast and Multicast Flows, IEEE/ACM Transactions on Networking, Vol. 9, no. 4, pp. 464-478 [8] KORPELA E., SETI@home, BOINC, and Volunteer Distributed Computing, Annu. Rev. Earth & Planet. Sci. 40, 2010, pp. 69-87 [9] DESELL T., NEWBERG L., MAGDON-ISMAIL M., SZYMANSKI B., THOMPSON W., Finding Protein Binding Sites Using Volunteer Computing Grids, Proceedings of the 2011 2nd International Congress on Computer Applications and Computational Science Advances in Intelligent and Soft Computing Volume 144, 2012, pp 385-393 [10] YI S., JEANNOT E., KONDO D., ANDERSON D., Towards Real-Time, Volunteer Distributed Computing, 11th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), 2011, pp. 154-163



Computer Systems Engineering 2013

Keywords: pathfinding, Dijkstra’s algorithm, A*, continuous area

Mariusz HUDZIAK∗

COMPARING DIFFERENT APPROACHES OF FINDING AN OPTIMAL PATH IN CONTINUOUS AREAS WITH OBSTACLES

The problem of finding an optimal path plays a fundamental role in path planning. In many real-life problems, e.g. in the game industry or in robot motion, path planning is crucial. Many path-finding algorithms are based on Dijkstra’s algorithm, which provides an optimal path for a given weight system. However, such a solution can be inefficient when the number of nodes in the graph is very large. A solution to this problem may be the A* algorithm, which uses a heuristic method to speed up the search process. In continuous path-finding another issue arises: how to find the optimal path in a continuous area. To address this problem, Quad-trees and the Probabilistic Roadmap Method are implemented, and the effectiveness of Dijkstra’s algorithm and A* for finding paths in a continuous area is compared.

1. INTRODUCTION

The process of finding an optimal path in a continuous area consists of two general phases, i.e., creating a graph from the continuous area and finding an optimal path between the start and end points. There exist polynomial-time exact algorithms for finding the path in a given graph; on the other hand, if the area is continuous, then creating an optimal graph is NP-hard [3]. Even if the problem of finding the best path can be solved optimally, this may not be enough. In some cases, e.g. when one object tracks another or in a game-world environment, the path has to be computed many times. This process consumes a lot of processor time, so the efficiency is related to the created graph [6]. A solution to the stated problem is needed in many areas. In the game industry, finding paths for individual agents is the second most time-consuming process after graphics rendering. An autonomous robot with limited resources, e.g. fuel, moving in a not fully known environment needs an efficient path. Finding the shortest path in an uncertain environment is also used for Unmanned Aerial Vehicles (UAVs) [5]. ∗

Department of Systems and Computer Networks, Wrocław University of Technology, Poland



In this paper, Dijkstra’s algorithm (further also called Dijkstra) [2] and the A* algorithm (first described in [4]) are used to solve the problem of finding an optimal path in a given graph. Furthermore, the Probabilistic Roadmap Method (PRM) [3] and Quad-trees [3] are used to generate a graph from the continuous area. As input, a map with some predefined static obstacles is used. All the mentioned algorithms and the experimental system were implemented by the author (M. Hudziak), who also designed the experiments. The rest of the paper is organized as follows. In Section 2 the problem statement is presented. In Section 3 the methods for building a graph from a continuous area and for finding an optimal path in the graph are given. In Section 4 the experimental system is described, together with a numerical analysis of the graph generation methods and the pathfinding algorithms. In the last section, conclusions are provided and future research is discussed.

2. PROBLEM FORMULATION

The considered problem is formulated as follows. There is given a continuous area A with obstacles O, on which source and destination points are defined. The objective is to find a feasible path p(A) in A which does not cross the obstacles, while the computational cost is minimized. A sample solution for this problem is illustrated in Fig. 1. The first phase of the problem of finding an optimal path in a continuous area is building a graph; the second phase is finding a path in that graph. Both phases are described in Section 3.

Fig. 1. An example path.



3. ALGORITHMS

To find a path between two points, we use Dijkstra’s algorithm and A*. Dijkstra’s algorithm always gives the shortest path in a graph, and the A* algorithm is a modification of Dijkstra’s algorithm which uses a heuristic to reduce the number of visited nodes in the graph. An often used heuristic is the Euclidean distance from the considered node to the destination. If the chosen heuristic does not overestimate the real distance to the goal node, the A* algorithm always finds the best route. However, to apply these algorithms to a continuous area, it has to be expressed as a graph. Furthermore, it is desirable that the graph has a minimal number of vertices: the smaller the number of vertices, the faster the algorithm is in terms of computations. To construct a graph, we analyse the Navigation Mesh and Way-points methods (see Fig. 2). In Navigation Mesh, the area is subdivided into regions and then the centres of neighbouring regions are connected; this variant is applied in this paper. It is also possible to connect other parts of the regions, e.g. their corners. In the Way-points method we put a set of points on the area and connect them. The number of connections for each point can be limited by some value in order to reduce the size of the graph.
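For reference, a standard A* implementation with a Euclidean heuristic, as discussed above, can be written as follows; the graph representation and names are illustrative.

import heapq
import math

def a_star(neighbours, pos, start, goal):
    """A* search with a Euclidean heuristic; with an admissible heuristic the
    returned path is optimal.

    neighbours[v]: iterable of (w, edge_cost) pairs
    pos[v]:        (x, y) coordinates of vertex v, used only by the heuristic
    """
    def h(v):
        (x1, y1), (x2, y2) = pos[v], pos[goal]
        return math.hypot(x1 - x2, y1 - y2)

    open_heap = [(h(start), start)]
    g = {start: 0.0}
    parent = {start: None}
    closed = set()
    while open_heap:
        _, v = heapq.heappop(open_heap)
        if v == goal:                       # reconstruct the path back to the start
            path = []
            while v is not None:
                path.append(v)
                v = parent[v]
            return path[::-1]
        if v in closed:
            continue
        closed.add(v)
        for w, cost in neighbours[v]:
            tentative = g[v] + cost
            if tentative < g.get(w, float("inf")):
                g[w] = tentative
                parent[w] = v
                heapq.heappush(open_heap, (tentative + h(w), w))
    return None                             # no path between start and goal

Replacing the heuristic with zero turns the same routine into Dijkstra’s algorithm, which is how the two compared algorithms are related.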

Fig. 2. Navigation Mesh and Way-points method [7].

An example of an algorithm which generates a navigation mesh is the Quad-trees algorithm. It subdivides the given area recursively into four regions until some minimal size is reached; only regions where an obstacle is present are subdivided. After the subdivision procedure the algorithm connects the centres of neighbouring regions. The result of the Quad-trees algorithm can be seen in Fig. 3a. An example of the Way-points method is the Probabilistic Roadmap Method (PRM). The algorithm randomly spreads a set of points over the given area with a uniform distribution. These points are then connected; in our case, to connect neighbouring regions the nearest


neighbours algorithm is used. The results of PRM can be seen in Fig. 3b. In this paper, these two algorithms are used to construct a graph from a continuous area.
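A minimal PRM construction along the lines described above might look like this; the sampling count, the number of nearest neighbours and the collision test are illustrative choices, not the exact settings used in the experiments.

import math
import random

def build_prm(width, height, obstacles, n_points=200, k_neighbours=4, seed=None):
    """Probabilistic Roadmap: sample free points uniformly and connect each one
    to its nearest neighbours, skipping connections that cross an obstacle.

    obstacles: list of axis-aligned rectangles given as (x, y, w, h)
    Returns (points, edges) with edges as (i, j, length) triples.
    """
    rng = random.Random(seed)

    def inside(p, rect):
        x, y, w, h = rect
        return x <= p[0] <= x + w and y <= p[1] <= y + h

    def blocked(p, q, steps=20):
        # Crude collision test: sample points along the segment p-q.
        return any(inside((p[0] + (q[0] - p[0]) * t / steps,
                           p[1] + (q[1] - p[1]) * t / steps), r)
                   for t in range(steps + 1) for r in obstacles)

    points = []
    while len(points) < n_points:           # rejection-sample points in free space
        p = (rng.uniform(0, width), rng.uniform(0, height))
        if not any(inside(p, r) for r in obstacles):
            points.append(p)

    edges = []
    for i, p in enumerate(points):
        nearest = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        for dist, j in nearest[:k_neighbours]:
            if not blocked(p, points[j]):
                edges.append((i, j, dist))
    return points, edges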

(a) Quad-trees algorithm.

(b) Probabilistic Roadmap Method.

Fig. 3. Quad-trees and Probabilistic Roadmap Method (PRM)

4. EXPERIMENTS

In this section the numerical analysis of the efficiency of the considered approaches is given. First, however, the experimentation system is presented. The proposed experimentation system consists of three parts (an example view of these parts is given in Fig. 4).
• Drawing – in this part we are able to build a map by putting rectangular obstacles on the area. Rectangular obstacles were used for simplicity; such a shape is sufficient for testing the efficiency of the implemented algorithms.
• Graph building – in this part we can use the two implemented algorithms, Quad-trees or the Probabilistic Roadmap Method, to build a graph from the continuous area. We can use the area built in the drawing part or a previously created one loaded from a file. In the left part of the window we can choose the algorithm parameters.



• Path finding – in the last part we can find a path on a given graph using the two implemented algorithms: A* and Dijkstra’s algorithm. In the left part of the window we can choose the algorithm parameters.

(a) Drawing part.

(b) Building a graph part.

(c) Pathfinding part. Fig. 4. Experimental system

The experiments presented in the remainder of this section were carried out on a computer with an Intel Core 2 Duo T8300 2.4 GHz CPU and 4 GB of RAM.

4.1. EXPERIMENT 1: GRAPH CONSTRUCTION

Table 1 reports the computation time of the graph-generating algorithms. The Quad-trees algorithm gives better results than PRM for the smaller maps (Small and Medium), but worse for the Big map. The minimum subdivision size for Quad-trees was chosen so that the resulting graph covers the map well. The same criterion was used for PRM – the numbers of connections and vertices were set so that the graph visually covers all regions of the map.

Table 1. Graph generation – computational time.

Map size             Quad-trees   PRM
Small (640x480)      47 ms        115 ms
Medium (1024x1024)   75 ms        135 ms
Big (4096x4086)      2045 ms      1392 ms

Table 2 shows the numbers of vertices and edges generated by Quad-trees and PRM. Quad-trees generates large graphs with a high number of vertices and edges. The big advantage of PRM is that the number of vertices can be adjusted to give sufficient coverage of the map; the graphs are smaller, so the computation time of the path-searching algorithms will be lower. The advantage of the Quad-trees algorithm over PRM is that it always gives an efficient distribution of vertices, so the graph is connected (provided the minimal size is chosen properly).

Table 2. Graph generation – numbers of vertices and edges.

Map size             Quad-trees (vertices, edges)   PRM (vertices, edges)
Small (640x480)      1365, 3713                     200, 599
Medium (1024x1024)   813, 1864                      200, 600
Big (4096x4086)      4239, 10719                    500, 2058

Fig. 5 compares two paths found on graphs generated by the Quad-trees algorithm and by PRM. The Quad-trees graph was not as efficient as the graph generated by PRM: the PRM graph gives a shorter path on this area, so its performance is better than that of the deterministic graph generation.



(a) Quadtrees path.

(b) Probabilistic Roadmap Method - path.

Fig. 5. Comparison of paths for the Probabilistic Roadmap Method (PRM) and Quad-trees.

4.2. EXPERIMENT 2: PATHFINDING

In this experiment, the results of pathfinding are presented. The first experiments were conducted on the maps shown in Fig. 6.



(a) A* test map.

(b) Small map.

(c) Big map. Fig. 6. Experimental maps.



In the first experiment, the small map was used (see Fig. 6b). Table 3 shows the results of the algorithms searching for the shortest path. This map was created mainly to analyse the efficiency of A*. In such an area A* should give worse results than Dijkstra's algorithm, due to the imprecision of the heuristic applied in A*. This can be seen in Table 3, where the numbers of vertices (steps) visited by A* and by Dijkstra's algorithm are very similar. The lengths of the paths found by the algorithms are compared with the optimal path length provided by Dijkstra's algorithm and are given in percentages.

Table 3. Experiment results – small map.

            tc        Path length   Steps
Dijkstra    2.97 ms   100%          822
A*          4.28 ms   100%          721

Next, the big-map experiments are presented; this map reflects a more realistic environment, similar to an area with many buildings (see Fig. 6c). In this case A* gives better results than Dijkstra's algorithm. As can be seen in Table 4, A* visited less than half of the vertices visited by Dijkstra's algorithm and found a path of the same length. Furthermore, A* is also faster than Dijkstra's algorithm.

Table 4. Experiment results – big map.

            tc         Path length   Steps
Dijkstra    20.79 ms   100%          4241
A*          17.42 ms   100%          2066

Finally, the A* test map was used to test the A* algorithm and its Euclidean heuristic (see Fig. 6a). Table 5 shows that A* gives very good results: it visited only 8 vertices and found a path as good as Dijkstra's, while its running time was almost four times shorter.

Table 5. Dijkstra's algorithm and A* results for the A* map.

            tc        Steps   Path length
Dijkstra    6.10 ms   1367    100%
A*          1.6 ms    8       100%


5. CONCLUSION

In this paper, the continuous pathfinding problem was investigated. The Probabilistic Roadmap Method (PRM) and the Quad-trees algorithm for generating a graph from a continuous area were compared. Next, Dijkstra's and A* algorithms for finding an optimal path on the graph were analysed. The A* algorithm was the fastest pathfinding method in most cases. The Quad-trees algorithm for building a graph was fast for small maps, but it usually generates a high number of graph nodes. On the other hand, the Probabilistic Roadmap Method (PRM) was slower than Quad-trees on small maps but built smaller graphs; its main drawback was the random coverage of the map. Our future research will focus on graph simplification and faster pathfinding algorithms (for example HPA* [1]).

REFERENCES
[1] BOTEA A., MULLER M. and SCHAEFFER J., Near optimal hierarchical path-finding. Department of Computing Science, University of Alberta, 2004.
[2] DIJKSTRA E.W., A note on two problems in connexion with graphs. Numerische Mathematik, Vol. 1, 1959, pp. 269-271.
[3] HALE D.H., A growth-based approach to the automatic generation of navigation meshes. Technical report, The University of North Carolina at Charlotte, 2011, pp. 18-38.
[4] HART P.E., NILSSON N.J. and RAPHAEL B., A formal basis for the heuristic determination of minimum cost paths. IEEE Transactions on Systems Science and Cybernetics, Vol. 4, 1968, pp. 100-107.
[5] JUN M. and D'ANDREA R., Path planning for unmanned aerial vehicles in uncertain and adversarial environments. In: Cooperative Control: Models, Applications and Algorithms, Vol. 1, 2003, pp. 95-110.
[6] TANG J.J., CHEN L. and YAN L., A heuristic pathfinding approach based on precomputing and post-adjusting strategies for online game environment. Games Innovations Conference (ICE-GIC), 2010 International IEEE Consumer Electronics Society, 2010, pp. 1-8.
[7] http://udn.epicgames.com/Three/AIAndNavigationHome.html, [access: 2014-01-15].



Computer Systems Engineering 2013
Keywords: machine learning, gesture recognition, depth sensor

Justyna KULIŃSKA∗

COMPARING EFFECTIVENESS OF CLASSIFIERS IN PROBLEM OF BODY GESTURE RECOGNITION

A problem of gesture recognition is widely known in the scientific literature. Two main applications of such a system are Natural User Interface (NUI) and Human Robot Interaction (HRI) – both of these areas are currently gaining popularity. Previous implementations are principally based on image recognition or on additional devices such as sensor gloves. Thus, in this paper the performance of gesture recognition using a 3D depth sensor and skeleton joints from the NiTE library is examined. The presented comparison of the effectiveness of several classifiers is a basis for creating a real-time, high-performance application.

1. INTRODUCTION

Formerly the only way to communicate with a computer was the keyboard, and nowadays it is hard to imagine everyday work without a mouse. Today a different approach, namely the Natural User Interface (NUI), is one of the main development directions and one of the domains of application of gesture recognition. This also concerns the gaming market, which is very interested in the subject. Another application area is Human Robot Interaction (HRI), which needs gesture recognition so that humans can communicate with robots and help them understand human behaviour. Both of these main fields of application of gesture recognition have recently been growing in popularity. However, the topic has many other applications, for example the detection of interactions between two persons, which can be used in urban monitoring to detect potentially dangerous interactions [19]. Certainly, the future will bring many more applications.

1.1. RELATED WORK

A problem of body gesture recognition can be solved in various ways, and two groups of approaches can be distinguished: first, approaches using some extra devices, and second, vision-based approaches.

∗ Department of Systems and Computer Networks, Wrocław University of Technology, Poland



The first group focuses mostly on hand gestures (because of the natural limitation in creating a device that gathers information from the whole body); in this group one can use, for example, special gloves or acceleration-based devices like the Wiimote [12]. This kind of recognition gives relatively good results but, on the other hand, it is less intuitive than a vision-based system and can also be more expensive because of the extra equipment. Focusing on vision-based identification, the least costly and the most intuitive approach seems to be the use of a simple RGB camera. There is a variety of papers dealing with this subject, see e.g. [7, 18] and references therein. Most published work uses skin colour to detect interesting regions and concentrates only on single-hand gestures [11] or on the relative position of the two hands and the head [3, 4, 9]. This large ratio of hand-based gesture systems to other systems is due to the problematic extraction of whole-body features using only a camera image. Some work on this subject was done by Schmidt, Fritsch and Kwolek [13], who represented the upper body shape as cylinders. Unfortunately, colour-based systems are very susceptible to mistakes because of their dependency on lighting. Another issue is the fact that different people have different skin colour, for instance because of race or the latitude of their place of living. These problems lead to the idea of 3D vision-based recognition. There are several ways to obtain 3D information, using for example a stereo camera system [16] or depth sensors (like the Microsoft Kinect or ASUS Xtion). A relatively new family of such solutions is discussed in [15]; unfortunately, even though the authors presented many uses of depth data in gesture recognition, they did not report their effectiveness, nor a comparison between them or with colour-based recognition systems. However, the authors of another paper [17], who created such a system, claim that its effectiveness increased in comparison with their previous colour-based work. Many depth-based recognition systems deal only with single-hand detection [5]; a different kind of recognition was presented by Biswas and Basu [1], who recognize the upper-body pose using the whole detected body image. Another way is to first obtain a few body joints and then recognize the whole body pose from their positions. Obtaining body joints from depth data is a separate subject (example work on this topic using template matching was done by Shotton [14]), but the joints can also be obtained from already implemented and generally available methods such as the Microsoft Kinect SDK or the open-source OpenNI and NiTE libraries. Several recognition systems using body joints already exist [2, 6, 8, 10]. Because of the complexity of the human gesture recognition problem, all referenced systems were based on machine learning algorithms or statistical methods for the identification task. Gestures can be divided into two separate groups: dynamic gestures and static gestures. Dynamic gestures are those which need more than one camera frame



for recognition (for example hand waving), while static gestures (poses) do not need a series of images (for instance the T-pose – standing with arms outstretched to the sides). These two groups need to be handled in completely different ways and thus require different algorithms. For dynamic gesture recognition the vast majority of works use the Hidden Markov Model (HMM); for static poses there are more alternatives, the typically used methods being Support Vector Machines (SVM) and Neural Networks.

1.2. GOALS

Unlike the previously mentioned survey paper [15], this paper focuses on comparing the effectiveness of a few classification algorithms rather than on creating a fully functional application. The specificity of this problem forces us to divide it into two parts – feature extraction and classification. These two parts must be considered separately because of a kind of conflict of interest: on the one hand we want to use the KNIME workbench for classification, because of its simplicity and wide range of options; on the other hand the posture information is provided by the ASUS Xtion motion sensor (presented in Figure 1), which forces us to use its C++ API. For this reason, the processes of feature extraction and classification are completely separated.

2. EXPERIMENT

2.1. POSTURE INFORMATION

The Xtion sensor (presented in Figure 1) is a device which provides depth information. This kind of data makes it possible to obtain a lot of interesting information, for example the position of the human body. The OpenNI and NiTE libraries are a simple way to get information about the skeleton joints of the human body; a sample detection of skeleton joints is presented in Figure 3. These libraries also provide various other data, for example the centre of mass, which is used in this paper.

Features. The aim of this work is to create a system for gesture representation. A similar system was created by Monir [8], but our system uses a different feature representation. We propose to save the body pose as a set of features represented as vectors from the centre of mass to each joint of the skeleton, normalized by the torso length. This should allow the same body gesture performed by various persons to be recognized. Finally, the feature set contains 45 float values – 15 joints times 3 vector components.
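A minimal sketch of this feature construction; the joint ordering and the way the torso length is measured are assumptions, as the paper does not fix them.

```python
import numpy as np

def pose_features(joints, center_of_mass, torso_length):
    """Build the 45-element feature vector described above.

    joints:         (15, 3) array of joint positions (x, y, z)
    center_of_mass: (3,) array
    torso_length:   scalar used for normalization (its definition is assumed)
    """
    vectors = (np.asarray(joints) - np.asarray(center_of_mass)) / torso_length
    return vectors.reshape(-1).astype(np.float32)  # 15 joints * 3 components = 45 values
```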



Fig. 1. ASUS XtionPro depth sensor (from [20]).

2.2. CLASSES

We want our system to recognize five different classes; this number seems sufficient to examine the influence of different poses. Three of the five classes represent base poses, and the remaining two are combinations of these base poses, introduced to check the system's vulnerability to errors. The three base classes are presented in Figures 4a-4c and the mixed classes in Figures 4d and 4e. Two of the base poses involve only the hands and arms, and the third also involves the legs, to check whether this influences the classification process. As mentioned before, combinations of these base poses are used to check how this affects the effectiveness: the first mixed pose has the upper body from the T-pose and the lower body from the ballerina pose, and the second one the opposite – the upper body from the ballerina pose and the lower body from the T-pose (also the S-pose).

2.3. DATASET

Because of the specificity of the task there are no ready-made datasets available, so we had to implement our own system for creating datasets.

System. In this work we focus only on static gestures; consequently, a single image frame is sufficient to capture a gesture. The system for obtaining datasets is a simple application in which one can see a preview from the depth sensor with the


Fig. 2. Depth image using a sample application from the OpenNI library (from [21]). Fig. 3. Skeleton joints using a sample application from the NiTE library (from [22]).

detected joints drawn on it. When someone performs a gesture, this information can be stored in the format described in Section 2.1. The system also allows a class label to be assigned to every performed gesture. The collected data can be saved as a new dataset or appended to a previously created one. This is the full functionality required to create a dataset for the experiments. A sample screenshot from the application is presented in Figure 5.

Training dataset. A proper acquisition system alone is not sufficient to build a good training dataset. In order to obtain a classifier with a good ability to generalize (i.e. one that is independent of the person), it is important to ensure diversity of the data. Firstly, every gesture sample should be created independently – there is no reason to capture the same performance twice, so another gesture should be recorded only after moving out of the sensor's view. Secondly, the more persons the better, because every person has unique body proportions; a variety of persons makes differences in body proportions irrelevant to the classification process. It is also known that more data is always better in machine learning. Bearing the above in mind, and because of time and resource limitations, the following setup was used. Training dataset – 5 different classes are used (presented further) and


(a) Sample T-pose

(b) Sample S-pose

(c) Sample ballerina pose

(d) Mix of T and ballerina pose. (e) Mix of ballerina and T pose.
Fig. 4. Classes.



Fig. 5. Exemplary screenshot from system for creating datasets.

5 different people were asked to perform each gesture ten times. This gives 50 observations for each class and a total of 250 observations in the training dataset. Test dataset – two other persons were asked to perform each gesture 5 times, which gives 25 observations per person and a total of 50 observations used to test the effectiveness of the algorithms.

2.4. EXPERIMENT DESCRIPTION

Three classifiers are used in this paper – MLP (Multilayer Perceptron), SVM (Support Vector Machine) and kNN (k-Nearest Neighbours). The first two were chosen because of the high frequency of their usage in related experiments, and they are compared with the rather simple kNN classifier. Because SVM is a classifier which works only for two-class problems, the One-Against-All strategy is used to extend it to a multi-class classifier. The whole experiment is divided into two parts: the first is the choice of the best parameters for each classifier, and the second is a comparison of all classifiers with the previously selected parameters. An important issue in this kind of work is to determine what is meant by effectiveness. It could be, for example, time efficiency, which is also an important matter, especially when taking into account the planned implementation in a real-time application. However, in this paper we focus only on correctness, i.e. the number of errors.

Parameter selection. As previously stated, the first part of the experiment involves only tuning the classifiers'


parameters. Because of the small size of the test dataset, a ten-fold cross-validation method is used for comparing the results at this stage. In each step of the cross-validation process every classifier receives exactly the same part of the whole dataset as a validation set. Moreover, to account for the impact of randomness in this method, every experiment is repeated ten times. This process creates 100 values for each parameter setting, where a single value indicates the percentage of misclassified observations. Our aim is to decrease this value as much as possible. To compare different parameters of a classifier, the mean and variance of the output data are taken into consideration.

Classifier comparison. The second stage of the experiments involves the comparison of the classifiers with the previously obtained fixed parameters. In this case the test dataset is used and the obtained accuracy represents the performance rate, which is defined in Equation (1):

A = \frac{\sum_{c} TP(c)}{N}    (1)

where TP(c) is the number of true positives (properly classified samples) for classifier c and N is the size of the test dataset. Of course, the higher the accuracy, the higher the performance of the classifier.
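A sketch of the repeated cross-validation protocol described above, using scikit-learn instead of the KNIME workbench used in the paper; X and y stand for the 45-value feature matrix and the class labels.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier

def repeated_cv_error(make_classifier, X, y, folds=10, repeats=10, seed=0):
    """Return the 100 misclassification percentages (10 repeats x 10 folds)
    used to compare parameter settings.  Using the same seeds for every
    classifier keeps the validation folds identical across classifiers."""
    errors = []
    for r in range(repeats):
        cv = StratifiedKFold(n_splits=folds, shuffle=True, random_state=seed + r)
        for train_idx, val_idx in cv.split(X, y):
            clf = make_classifier()
            clf.fit(X[train_idx], y[train_idx])
            acc = clf.score(X[val_idx], y[val_idx])
            errors.append(100.0 * (1.0 - acc))
    return np.array(errors)

# e.g. mean and variance for kNN with k = 5:
# errs = repeated_cv_error(lambda: KNeighborsClassifier(n_neighbors=5), X, y)
# print(errs.mean(), errs.var())
```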

3. RESULTS

3.1. FEATURES EXAMINATION

The first experiment is related to the evaluation of the features. As previously stated, we propose a novel approach to feature representation. Properly defined features are an important issue, because without them no classifier can perform well. To examine whether our proposed approach works properly in practice, the first thing we do is to compare the variance of the feature values in the entire dataset and within a single class. The results are presented in Table 1.

Table 1. Variance of features within classes and in the entire dataset.

                           Left foot horizontal position   Right foot horizontal position
Mean variance in classes   0.013                           0.011
Variance in dataset        0.22                            0.02
Difference                 0.207                           0.009

From the table we can easily notice that the variance in the whole dataset is significantly


higher for the first feature, whereas the within-class variances are similar. This indicates that our way of extracting features works well, because the right foot has the same location in all classes while the left foot position differs in two classes. We can observe that the difference between the variance in the dataset and the mean variance within classes is much higher for the first feature, which suggests that separating the classes using this feature should be easy. Because raw numbers are sometimes hard to analyse, a better way to examine the features is to use graphs. Two scatter plots of feature values in the classes are presented in Figure 6.

(a) Vertical position of left foot.

(b) Horizontal position of left hand.

Fig. 6. Values of features in classes.

It can easily be noticed in Figure 6a that the two classes which use the left foot are distinguished from the others. A similar situation is presented in Figure 6b, where the different classes can be well separated. The values marked with a red circle are probably the result of a mistake in the NiTE library, which switched the left hand with the right hand.

3.2. PARAMETERS CHOICE

k-Nearest Neighbours. The first classifier is the k-Nearest Neighbours classifier, which has only one parameter k and one option: whether or not to weight the votes by distance. For both options the test is performed using values of k from 1 to 125 (half of the training dataset size). The results are presented in Table 2 and Figure 7. It turns out that the kNN classifier gives surprisingly good results: the non-weighted version gives 100% correct results for up to 13 neighbours and the weighted version for up to 60 neighbours. Beyond those values both the mean and the variance start to increase.

Support Vector Machines. The SVM classifier in its standard version can divide two classes only linearly.


Fig. 7. Mean and variance of the kNN classifier results: (a) non-weighted, (b) weighted.


Table 2. Results of the kNN classifier (mean and variance of the misclassification percentage).

Non-weighted
k          1      2      3      5      7      9      11     13      15
Mean       0      0      0      0      0      0      0      0.8     3.2
Variance   0      0      0      0      0      0      0      5.82    5.82

k          20     25     30     40     50     60     80      100      125
Mean       3.2    3.2    3.2    3.2    3.2    3.6    6       37.6     47.2
Variance   5.82   5.82   5.82   5.82   5.82   7.92   42.83   117.01   67.23

Weighted
k          1      2      3      5      7      9      11     13     15
Mean       0      0      0      0      0      0      0      0      0
Variance   0      0      0      0      0      0      0      0      0

k          20     25     30     40     50     60     80     100    125
Mean       0      0      0      0      0      0      0.8    0.8    1.2
Variance   0      0      0      0      0      0      2.59   2.59   3.39

To divide them in a different manner, kernel functions are used. In this paper we examine three commonly used kernels:

Polynomial: $(\gamma \cdot X + b)^{p}$, with parameters $p$ (power), $\gamma$ and $b$ (bias),
Hyper Tangent: $\tanh(\kappa \cdot X + \delta)$, with parameters $\kappa$ and $\delta$,
Gaussian: $e^{-X^{2}/(2\sigma^{2})}$, with parameter $\sigma$.

Each kernel has its own parameters, so it has to be tested separately.

Polynomial. Because of the large number of parameters of this kernel, it is hard to perform a test covering all possibilities. At the beginning, the range of the most significant parameter, the power, was tested. It turns out that values higher than 5 lead to a situation where the classifier cannot find any boundaries. We performed tests with powers 1, 2, 3, 4 and 5 with various values of the remaining parameters. It turns out that only the value 5 changes the result; for all other settings the mean is around 0.5 and the variance around 1.5. Then, using the fixed value of the power, a test was conducted using two values of γ – 1 and 5 – with different values of the bias. The results are presented in Table 3 and in Figure 8.
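For reference, the three kernels written out as functions; X is interpreted here as the inner product of the two feature vectors, and the default parameter values are only placeholders, not the tuned settings.

```python
import numpy as np

def polynomial_kernel(x, z, gamma=5.0, bias=1.0, power=5):
    # (gamma * <x, z> + bias)^power
    return (gamma * np.dot(x, z) + bias) ** power

def hyper_tangent_kernel(x, z, kappa=1.0, delta=0.0):
    # tanh(kappa * <x, z> + delta)
    return np.tanh(kappa * np.dot(x, z) + delta)

def gaussian_kernel(x, z, sigma=3.5):
    # exp(-||x - z||^2 / (2 * sigma^2))
    diff = np.asarray(x) - np.asarray(z)
    return np.exp(-np.dot(diff, diff) / (2.0 * sigma ** 2))
```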


Table 3. Results of the SVM classifier with the Polynomial kernel (power = 5).

γ = 1
bias       1      5      10     15     25     45     50
Mean       0      0      0      0      0.4    0.4    0.4
Variance   0      0      0      0      1.45   1.45   1.45

γ = 5
bias       1      5      10     15     25
Mean       0      0      0      0      0
Variance   0      0      0      0      0

Fig. 8. Mean and variance of the SVM classifier results with the Polynomial kernel: (a) γ = 1, (b) γ = 5.

It can be observed that better results are obtained for γ = 5, although in both cases it is possible to obtain 100% correct results. The underlined values in Table 3 are used as the best parameters for this kernel; these values are chosen to avoid the situation where no partition can be found, which happened for higher values of the bias.

Hyper Tangent. The results for this kernel are omitted because no choice of parameters yields proper results. The best mean is 66% with a high variance, so this kernel is not suitable for this problem and is not considered further.

Gaussian. This kernel has only one parameter; the range from 0.1 to 50 is considered.


Fig. 9. Mean and variance of SVM classifier results with Gaussian kernel.

Higher and lower values do not give any results. The outcome is presented in Table 4 and in Figure 9.

Table 4. Results of the SVM classifier with the Gaussian kernel.

σ          0.1     0.2    0.3    0.5    0.7    1      1.5
Mean       40      4      0.8    0.4    0.4    0.4    0.4
Variance   261.8   9.7    2.59   1.45   1.45   1.45   1.45

σ          2      5      10     15     20     50
Mean       0      0      0.4    0.4    0.4    0.4
Variance   0      0      1.45   1.45   1.45   1.45

It turns out that for this kernel it is also possible to obtain 100% correct results; two underlined parameter values give that result. Because each kernel divides the space in a completely different manner, we decided to keep both the Polynomial and the Gaussian kernel. For the latter we use the value σ = 3.5, which lies between 2 and 5.

Multilayer Perceptron. The last classifier considered in this paper is the MLP classifier, which is a type of Neural Network. Three parameters can be adjusted: the number of hidden layers, the number of neurons per layer and the number of iterations. These parameters seem to be independent, so they were tested separately. The first parameter to be adjusted is the number of layers; in this test, 100 iterations and 10 neurons per layer were used. As can be seen in Figure 10, a higher number of layers increases the error, so in further considerations only values of 1 and 2 are used. Once the number of layers is known, the next parameter to adjust is the number of neurons per layer.


Fig. 10. Mean and variance of MLP classifier result adjusting number of layers.

For both 1 and 2 layers, tests were conducted with 100 iterations, testing values from 5 to 50. After the tests, 15 and 17 neurons were chosen for one layer and 10 and 15 neurons for two layers. The four configurations obtained in this way were then tested with different numbers of iterations; the results are presented in Table 5. The underlined result has the smallest mean and variance, so its parameters are chosen as the best for this classifier.

3.3. CLASSIFIERS COMPARISON

Simple comparison. As previously stated, in this stage of the experiments we used the best parameter values obtained earlier to classify the samples from the test dataset, with accuracy as the performance measure. Because of the randomness in training the MLP classifier, the process is repeated ten times for MLP. The results of the tests are presented in Table 6. We can see that only the MLP classifier makes any mistakes in this case, although its effectiveness is also high. We can also examine the confusion matrix, which is presented in Table 7; as could be expected, most misclassifications occur for the mixed gestures. In Section 3.1 outlier observations were pointed out. In the next experiment those outliers are removed, to check how this affects the classification process; the results are presented in Table 8. As we can see, when the outliers are removed, the accuracy of the MLP classifier increases slightly and the other classifiers remain at 100% accuracy.



Table 5. Results of the MLP classifier (mean and variance of the misclassification percentage).

1 layer, 15 neurons
iterations   100     150     200     250     300     400     500
Mean         0.52    0.68    0.56    0.52    0.56    0.68    0.64
Variance     1.83    2.28    1.95    1.83    1.95    1.6     2.17

1 layer, 17 neurons
iterations   100     150     200     250     300     400     500
Mean         0.52    0.56    0.6     0.76    0.76    0.52    0.68
Variance     1.83    2.27    2.28    2.81    2.72    1.83    2.93

2 layers, 10 neurons
iterations   100     150     200     250     300     400     500
Mean         0.96    0.84    0.84    0.76    0.88    0.84    0.92
Variance     5.21    2.68    3       2.49    3.1     9.47    3.19

2 layers, 15 neurons
iterations   100     150     200     250     300     400     500
Mean         0.4     0.64    0.64    0.48    0.6     0.56    0.56
Variance     1.45    2.17    2.17    1.71    2.06    1.95    1.95

Table 6. Results of the simple comparison of classifiers.

                  Accuracy
kNN               100%
SVM Polynomial    100%
SVM Gaussian      100%
MLP               94.6%

Table 7. Confusion matrix of the MLP classifier (rows: actual class, columns: predicted class).

           B-pose   T-pose   TB-pose   BT-pose   S-pose   No winner
B-pose     95       0        0         4         0        1
T-pose     0        99       0         0         1        0
TB-pose    4        0        96        0         0        0
BT-pose    11       0        0         89        0        0
S-pose     0        6        0         0         94       0


Table 8. Results of the comparison without outliers.

                  Accuracy
kNN               100%
SVM Polynomial    100%
SVM Gaussian      100%
MLP               98%

3.4. ADDITIONAL TESTS

Because of the surprisingly high effectiveness of all classifiers, additional tests were conducted to check what influences the accuracy. Most of the further tests are based on the training dataset with outliers.

Small training dataset. The first additional test used a smaller training dataset, composed of the observations of a single person only. The results are presented in Table 9.

Table 9. Results of the comparison with a small training dataset.

                  Accuracy
kNN               100%
SVM Polynomial    98%
SVM Gaussian      100%
MLP               60.6%

In this case the accuracy of two of the classifiers remained the same, SVM with the Polynomial kernel worked only slightly worse, but the MLP classifier showed a large decrease in accuracy.

'Messy' test dataset. In this case the test set is composed of a 'messy' set of gestures. The results are presented in Table 10.

Table 10. Results of the comparison with the 'messy' test dataset.

                  Accuracy
kNN               100%
SVM Polynomial    96%
SVM Gaussian      96%
MLP               83.6%



Only the kNN classifier maintained its exactness; both versions of the SVM classifier decreased their performance slightly and MLP showed a significant decrease in effectiveness.

Noise in the training dataset. This test uses a training set with several swapped labels. The results are in Table 11.

Table 11. Results of the comparison with noise in the training dataset.

                  Accuracy
kNN               100%
SVM Polynomial    –
SVM Gaussian      96%
MLP               68.2%

We can observe that this change barely influences the kNN classifier or the SVM classifier with the Gaussian kernel. Unfortunately, for such a training set the SVM classifier with the Polynomial kernel does not find any result. We can also see that the MLP classifier is sensitive to noise.

Decreased number of features. The last additional test is different because it is intended not to decrease but to increase the performance. The classes used in this paper can in fact be distinguished based only on the positions of the hands and feet. In this test the number of features is decreased, so only 12 features remain – those of the two hands and two feet. Another important detail is that in this case the training dataset does not contain outliers. The results are presented in Table 12.

Table 12. Results of the comparison with a decreased number of features.

                  Accuracy
kNN               100%
SVM Polynomial    100%
SVM Gaussian      100%
MLP               99.6%

Our assumption turns out to be true: removing redundant features increases the performance of the MLP classifier.

4. CONCLUSION

After all the tests, some conclusions can be drawn. The most important is the fact that every classifier performed surprisingly well. If we take into consideration the case of


the training dataset without outliers and with the decreased number of features, we can say that almost perfect results were obtained for all classifiers. This may be due to a good method of feature representation, but it should be further investigated using more complicated and more similar gestures. It can be concluded that the k-Nearest Neighbours classifier is the best solution when considering only the accuracy. The Support Vector Machine classifier is a better choice than the Multilayer Perceptron because it is less sensitive to noise, to imperfectly performed gestures and to the training dataset size. We can also conclude that in all cases the Gaussian kernel has an effectiveness higher than or equal to the Polynomial kernel. The small number of observations in the training dataset proved not to be a problem for the kNN and SVM classifiers; on the other hand, it influences the effectiveness of MLP, so perhaps a bigger training dataset would give 100% accuracy for this classifier as well. Another important issue is the number of features. In the case of the gestures presented in this work, it is sufficient to use only the hands and feet. Decreasing the number of features increases the effectiveness of the MLP classifier, so we should keep the number of features as small as possible. Unfortunately, this depends on the classes, but some joints, for instance the neck, will probably not be used in any gesture, so they could be abandoned.

5. FUTURE WORK

Knowing the performance of the different classifiers, we need to consider their practicality. The best results are obtained with the kNN classifier, but it requires the most memory; it should therefore be examined whether a similar Nearest Mean classifier, requiring less memory, would not be sufficient. The second best results are given by the SVM classifier, but it is designed only for two-class problems, and the additional One-Against-All strategy increases the computational time, so it needs to be checked whether this is acceptable for a real-time application. Considering a real-time application, another issue is the lack-of-pose situation, which should also be detected in some way and needs to be taken into consideration. Therefore the next step will be testing the classifiers in real time.

REFERENCES
[1] K.K. Biswas and S.K. Basu. Gesture recognition using Microsoft Kinect. In Automation, Robotics and Applications (ICARA), 2011 5th International Conference on, pages 100-103, 2011.
[2] Junxia Gu, Xiaoqing Ding, Shengjin Wang, and Y. Wu. Action and gait recognition from recovered 3-D human joints. Systems, Man, and Cybernetics, Part B: Cybernetics, IEEE Transactions on, 40(4):1021-1033, 2010.
[3] Yu Huang, T.S. Huang, and H. Niemann. Two-handed gesture tracking incorporating template warping with static segmentation. In Automatic Face and Gesture Recognition, 2002. Proceedings. Fifth IEEE International Conference on, pages 275-280, 2002.
[4] Bogdan Kwolek. Visual system for tracking and interpreting selected human actions. Journal of WSCG, 11(1), 2003.
[5] Yi Li. Multi-scenario gesture recognition using Kinect. In Computer Games (CGAMES), 2012 17th International Conference on, pages 126-130, 2012.
[6] L. Miranda, T. Vieira, D. Martinez, T. Lewiner, A.W. Vieira, and M.F.M. Campos. Real-time gesture recognition from depth data through key poses learning and decision forests. In Graphics, Patterns and Images (SIBGRAPI), 2012 25th SIBGRAPI Conference on, pages 268-275, 2012.
[7] S. Mitra and T. Acharya. Gesture recognition: A survey. Systems, Man, and Cybernetics, Part C: Applications and Reviews, IEEE Transactions on, 37(3):311-324, 2007.
[8] S. Monir, S. Rubya, and H.S. Ferdous. Rotation and scale invariant posture recognition using Microsoft Kinect skeletal tracking feature. In Intelligent Systems Design and Applications (ISDA), 2012 12th International Conference on, pages 404-409, 2012.
[9] A. Oikonomopoulos and M. Pantic. Human body gesture recognition using adapted auxiliary particle filtering. In Advanced Video and Signal Based Surveillance, 2007. AVSS 2007. IEEE Conference on, pages 441-446, 2007.
[10] O. Patsadu, C. Nukoolkit, and B. Watanapa. Human gesture recognition using Kinect camera. In Computer Science and Software Engineering (JCSSE), 2012 International Joint Conference on, pages 28-32, 2012.
[11] S.S. Rautaray and A. Agrawal. Design of gesture recognition system for dynamic user interface. In Technology Enhanced Education (ICTEE), 2012 IEEE International Conference on, pages 1-6, 2012.
[12] Thomas Schlömer, Benjamin Poppinga, Niels Henze, and Susanne Boll. Gesture recognition with a Wii controller. In Proceedings of the 2nd International Conference on Tangible and Embedded Interaction, TEI '08, pages 11-14, New York, NY, USA, 2008. ACM.
[13] J. Schmidt, J. Fritsch, and B. Kwolek. Kernel particle filter for real-time 3D body tracking in monocular color images. In Automatic Face and Gesture Recognition, 2006. FGR 2006. 7th International Conference on, pages 567-572, 2006.
[14] Jamie Shotton, Toby Sharp, Alex Kipman, Andrew Fitzgibbon, Mark Finocchio, Andrew Blake, Mat Cook, and Richard Moore. Real-time human pose recognition in parts from single depth images. Commun. ACM, 56(1):116-124, January 2013.
[15] J. Suarez and R.R. Murphy. Hand gesture recognition with depth images: A review. In RO-MAN, 2012 IEEE, pages 411-417, 2012.
[16] Yong Wang, Tianli Yu, L. Shi, and Zhu Li. Using human body gestures as inputs for gaming via depth analysis. In Multimedia and Expo, 2008 IEEE International Conference on, pages 993-996, 2008.
[17] Youwen Wang, Cheng Yang, Xiaoyu Wu, Shengmiao Xu, and Hui Li. Kinect based dynamic hand gesture recognition algorithm research. In Intelligent Human-Machine Systems and Cybernetics (IHMSC), 2012 4th International Conference on, volume 1, pages 274-279, 2012.
[18] Ying Wu and Thomas S. Huang. Vision-based gesture recognition: A review. Urbana, 51:61801, 1999.
[19] Kiwon Yun, J. Honorio, D. Chattopadhyay, T.L. Berg, and D. Samaras. Two-person interaction detection using body-pose features and multiple instance learning. In Computer Vision and Pattern Recognition Workshops (CVPRW), 2012 IEEE Computer Society Conference on, pages 28-35, 2012.

Sources
[20] ASUS Xtion: http://www.asus.com/Multimedia/Xtion_PRO/
[21] OpenNI: http://www.openni.org/
[22] NiTE: http://www.openni.org/files/nite/#.UWBcfarWA1I



Computer Systems Engineering 2013
Keywords: traffic sign, recognition, detection

Aleksandra OŚWIECIŃSKA

SIGN ORIENTATION PROBLEM IN TRAFFIC SIGN RECOGNITION SYSTEM

Traffic Sign Recognition Systems, ranging from simple driving support and comfort enhancements up to intelligent vehicles, constitute a rapidly developing field of study. This project presents the results of experiments with a system for detecting and recognizing traffic signs using the OpenCV library. The input stream is preprocessed (morphology operators, smoothing) to improve the structure of the further processed image. Color-based segmentation in the HLS color space provides easier color range narrowing as well as thresholding and ROI extraction. The shape segmentation module detects contours and analyses their shape with shape descriptors by approximating polygonal curves. Finally, correlation methods are applied to recognize patterns. The objective of this project is to discover the influence of sign orientation on recognition.

1. INTRODUCTION

The expanding use of automatic systems in both commercial and private areas relieves humans of a large variety of actions. Advanced applications aim to provide intelligent systems which do not need human supervision; this issue is worked on mostly at universities or in military laboratories. Traffic Sign Recognition Systems are currently used in many cars. Such systems consist of real-time detection and recognition of traffic signs, which assists the driver by issuing appropriate warnings or commands. The problem of detection and recognition starts from the way the machine acquires and interprets the input data. Once this problem is dealt with, more complex issues arise, such as decision making and even predicting further objects. The main goal of this development is to create an intelligent vehicle that is able to move autonomously. The difficulty of detection and recognition depends on many different factors, such as weather conditions, illumination changes, shadowing or rotation/skewness of a sign. The last issue, sign orientation, may cause bad results in detection and recognition: a rotated and skewed sign entails an additional distortion of the interpreted data. If the sign is detected in the region of interest (ROI), the pictograph of the sign is flattened or skewed. This may


deepen the deformation, decreasing the probability of successful sign recognition. The problem of road sign recognition and detection is very popular, but it is still being explored in search of the most successful method of image analysis. Reducing memory consumption and processing time is also desirable as a significant factor. The most common approach has two stages: a detection module to extract the sign area and a recognition module to identify the sign. Pre-processing of the image with thresholding, a Gaussian filter, Canny edge detection, contour detection and ellipse fitting, the Hough Transform and morphological filters is used in [1][2][3]. Mateusz Mazur's approach [1] achieves detection with color-based segmentation and shape analysis, while the pictograph is recognised with CiSS (Circular Sampling Space) described by Kim and Araujo [4]; the point of this method is to depict the pictograph in a way independent of rotation, and it requires only a simple database. The Neural Network technique, as in [2], recognizes patterns with satisfying processing time, but requires suitable data sets for training and validation. Information about the RGB colors of the image is the input to the Neural Network; the structure of the network, its nodes and the techniques used in the implementation (e.g. cross-validation) are significant for obtaining suitable error, cost and time performance. Detection may also be performed using a Support Vector Machine (SVM) [3]: the output of color segmentation is classified using the Histogram of Oriented Gradients (HOG) feature and tree classifiers identify the content of the traffic sign. Building a good classifier, or any recognition method, is a complex problem due to the variety of signs, their number and the sometimes small differences between them. The system described in this paper was created using the OpenCV library [5]. It involves two stages of processing, detection and recognition. The detection part is achieved by combining existing methods into color-based segmentation; however, the HSL color space is used instead of RGB in order to obtain a suitable color range for image processing. Regions of interest selected with color segmentation are processed with an Affine Transform, which compensates for distortion. Finally, the actual sign recognition is achieved by template matching. The experiments were conducted with yellow signs.

2. DETECTION

The first step of any pattern recognition is detection of the object. Its efficiency significantly affects the further results. It is important not only to obtain a high recognition ratio, but also to detect objects with satisfying results: the more objects are detected properly, the more of them are subjected to recognition. Satisfactory detection results may be achieved by various methodologies and algorithms. To improve the structure of the processed image, the input stream is first filtered using morphological transformations (eroding, dilating) and smoothing functions.


The proposed method of detection is based on common but successful approaches, namely color thresholding, searching for ROIs and shape analysis [1].

2.1. COLOR SPACE

Image analysis is strongly dependent on the choice of color space. The image color representation is a model which characterizes how the input data is perceived. The color space selected for this project was HSL, widely used in computer graphics; this choice was made because it allows an easier definition of color ranges. HSL stands for Hue, Saturation and Lightness. Compared to the RGB color space, defined by three colors (Red, Green, Blue), HSL in some cases provides better color manipulation.

Fig. 1. HSL color space representation. Source: Wikibooks HSL

Another major advantage of the HSL color space is its insensitivity to illumination changes. An input image taken from the camera is converted from RGB to HSL beforehand, and any further image analysis is done in HSL.

2.2. HISTOGRAM EQUALIZATION

Histogram equalization of the image may be required to improve its contrast: the light intensity range is stretched out, resulting in a different distribution that accomplishes the equalization effect.
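A sketch of the pre-processing and color-space conversion in OpenCV; which channel is equalized and the smoothing parameters are assumptions, since the paper does not state them.

```python
import cv2

def preprocess(frame_bgr):
    """Smooth a camera frame, convert it to HLS and equalize the lightness
    channel.  Kernel sizes are illustrative choices, not the paper's values."""
    blurred = cv2.GaussianBlur(frame_bgr, (5, 5), 0)
    hls = cv2.cvtColor(blurred, cv2.COLOR_BGR2HLS)
    h, l, s = cv2.split(hls)
    l = cv2.equalizeHist(l)          # stretch the lightness histogram
    return cv2.merge((h, l, s))
```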

169


Fig. 2. Histogram equalization effect – before (left) and after (right)

2.3. COLOR-BASED SEGMENTATION

Once the equalization process is finished, the color extraction proceeds. The color ranges are defined in terms of each single channel – hue, saturation and lightness. Selecting an appropriate range ensures that a sign of a particular color will be found; the processed image is then thresholded with the corresponding color mask. This issue is crucial for the detection efficiency: it is difficult to find such ranges of the particular channels that the number of detected signs is maximized while the number of false positives is kept as low as possible.
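A sketch of the color-mask extraction with cv2.inRange; the yellow bounds below are illustrative assumptions, since the tuned channel ranges are not given in the paper.

```python
import cv2
import numpy as np

# Assumed HLS bounds for "yellow" (OpenCV hue range is 0-179); adjust per camera.
YELLOW_LOW = np.array([20, 60, 80], dtype=np.uint8)     # H, L, S lower bounds
YELLOW_HIGH = np.array([35, 220, 255], dtype=np.uint8)  # H, L, S upper bounds

def yellow_mask(hls_image):
    """Binarize the HLS image, keeping only pixels inside the yellow range."""
    mask = cv2.inRange(hls_image, YELLOW_LOW, YELLOW_HIGH)
    # small morphological clean-up of the binary mask
    kernel = np.ones((3, 3), np.uint8)
    return cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
```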

Fig. 3. Yellow color mask – binarization (extracting yellow color)



2.4. ROI AND SHAPE ANALYSIS

According to the defined color range (for example yellow), the search area for a potential sign is thresholded to extract a ROI (Region Of Interest). In the region of interest the shape segmentation module detects contours and analyses their shape with shape descriptors, approximating the polygonal curves using the Douglas-Peucker algorithm: each polygonal curve is approximated by another curve or polygon with fewer vertices whose distance from the original is less than or equal to a specified precision. If the detected shape is adequate, the cropped image is processed by the subsequent methods. A selected ROI must satisfy certain conditions: its resolution should be between 18x18 px and 100x100 px and the ratio between its dimensions should be in the range 0.85 to 1.12. These constraints allow the removal of false positives that might otherwise be recognized as signs.
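A sketch of the contour and ROI filtering with the stated size and aspect-ratio constraints (OpenCV 4.x API); the Douglas-Peucker precision factor is an assumed value.

```python
import cv2

def candidate_rois(mask):
    """Find triangle-like candidate regions in a binary color mask and keep
    only those satisfying the resolution and aspect-ratio constraints."""
    rois = []
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    for cnt in contours:
        # Douglas-Peucker polygonal approximation
        approx = cv2.approxPolyDP(cnt, 0.03 * cv2.arcLength(cnt, True), True)
        x, y, w, h = cv2.boundingRect(approx)
        if len(approx) == 3 and 18 <= w <= 100 and 18 <= h <= 100 \
                and 0.85 <= w / h <= 1.12:
            rois.append((x, y, w, h, approx))
    return rois
```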

Fig. 4. The cropped image of sign (ROI).

2.5. AFFINE TRANSFORMATION

The cropped image is fitted to an appropriate form and then compared with a template; these actions lead to further improvement of the correlation results. An Affine Transformation can be expressed as a linear transformation (rotations, scale operations) followed by a translation [5]. Because it represents a relation between two images, it is used to reshape the obtained (cropped) image of the sign from its posed orientation to a well-defined orientation (in which the sign surface is perpendicular to the camera). The Affine Transformation requires some characteristic points as input, which in the case of a yellow sign are the corners of the triangle. After finding the characteristic points, the Affine Transformation is applied and the pictograph (cropped image) is converted to the pattern form with the appropriate orientation.
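A sketch of the rectification step with cv2.getAffineTransform; the canonical corner layout and the corner ordering are assumptions, not the paper's exact mapping.

```python
import cv2
import numpy as np

def rectify_triangle(cropped, corners, out_w=40, out_h=35):
    """Map the three detected triangle corners onto a canonical upright
    triangle of the template size (40x35 px), assuming the corners are
    ordered apex, bottom-left, bottom-right."""
    src = np.float32(corners)                       # detected corners, shape (3, 2)
    dst = np.float32([[out_w / 2, 0],               # apex
                      [0, out_h - 1],               # bottom-left
                      [out_w - 1, out_h - 1]])      # bottom-right
    m = cv2.getAffineTransform(src, dst)
    return cv2.warpAffine(cropped, m, (out_w, out_h))
```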

3. RECOGNITION

The transformed images are subjected to a technique for finding areas of an image that match (are similar to) a template image. Template matching is used with a


comparison method – normalized correlation. Correlation describes the relation between two sets of data and how strongly they are linked together; a correlation coefficient equal to 1 means perfect correlation and a value of 0 means no relation between the sets of data. The image of the sign subjected to recognition is adjusted to the template image by resizing. The template image is a pictograph of 40 px width and 35 px height. The template image and the sign image are compared to obtain the correlation coefficient. The database of signs includes 11 pictographs of warning signs, each of them with a different meaning; the cropped image of the sign is compared with every sign in the database. The correlation coefficient is the decisive factor in determining the final result of the road sign recognition system.
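A sketch of the matching step; TM_CCORR_NORMED is one normalized-correlation option in OpenCV, and the exact variant used by the authors is not specified. Images and templates are assumed to share the same type (e.g. grayscale).

```python
import cv2

def best_match(sign_image, templates):
    """Compare a rectified sign image against a dict of name -> template image
    using normalized correlation and return the best-scoring sign name."""
    best_name, best_score = None, -1.0
    for name, template in templates.items():
        resized = cv2.resize(sign_image, (template.shape[1], template.shape[0]))
        # equal-size inputs -> matchTemplate returns a single correlation value
        score = cv2.matchTemplate(resized, template, cv2.TM_CCORR_NORMED)[0][0]
        if score > best_score:
            best_name, best_score = name, score
    return best_name, best_score
```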

4. RESULTS OF EXPERIMENT

The results of several tests are presented in this section. The system has not been fully developed yet and only one group of signs is recognized: the system works with yellow triangles, which are the warning signs. The correlation coefficient for a well-recognized sign is significantly higher than for the others (false yellow sign patterns), by about 0.2-0.3. The system is accurate with the give-way (A-7) sign – 100% of them are properly detected and recognized. More details are listed below.

Size of the sign database:             10 different yellow signs
Number of signs under testing:         10
With:                                  7 detected and properly recognized (2 give-way signs and 5 other yellow triangle signs) and 3 not detected
Correlation if properly recognized:    0.75-0.82

It is reasonable that give-way signs are the most easily recognized due to the lack of an inner black pictograph in the triangle: a plain yellow triangle is easy to correlate with the give-way template with a high correlation coefficient. Difficulties occur when dealing with the others. The use of plain correlation does not give satisfying results and should be supported by other methods.


Fig. 5. The images of the yellow signs in the database.

4.1. SIGN ORIENTATION

Additional difficulties are caused by rotation and skewness of the sign. The tested road signs had different deformations (arising when the sign is rotated and/or skewed); these orientation distortions are compensated with the Affine Transformation. The quality of recognition is strongly tied to the level of deformation, which increases as the distortion (rotation, skewness) gets bigger. The system was tested with random signs with different degrees of deformation. The efficiency of the system does not seem to be connected with the sign orientation – its influence is not very significant. This might become an issue with a higher level of distortion, when the rotation and skewness deviate strongly from the pose in which the sign surface is perpendicular to the camera; such situations, when the sign is heavily skewed, happen rarely. To a degree this is compensated by the linear transformation, but the image cannot be fully restored.

5. CONCLUSION

The project presents a new approach built from known methods. The system has real-time performance with a small delay resulting from the computation, which is common for image processing, so it can still be classified as working in real time. Correlation was chosen in order to avoid complicated methods like SVM or Neural Networks (which require much time and data to prepare), even though as an individual method it does not give satisfying results in traffic sign recognition [2][3][6]. Another difficulty faced was the rounded corners of Polish road signs, which impede proper corner detection and weaken contour detection. Thanks to the application of the Affine Transformation, the orientation problem of road signs seems to be solved for now. A very important factor, however, is the position of the camera inside or outside the car: the higher it is (closer to the level of the sign surface), the less transformation must be done, which is desirable because of the distortion introduced by transforming. This can be partly avoided by transforming the pattern (which is not distorted) towards the extracted sign instead. A camera placement outside the car is also preferred.



To improve the correlation results, they can be supported with the morphological skeleton method. Moreover, another of the previously mentioned contour detectors would be helpful. The system could also be equipped with block correlation, which is less sensitive to distortion than plain correlation. Finally, the system should be extended with other sign categories and a GUI.

REFERENCES
[1] MAZUR M., Road sign detection and recognition. AGH, 2009.
[2] LORSAKUL A., SUTHAKORN J., Traffic Sign Recognition Using Neural Network on OpenCV: Toward Intelligent Vehicle/Driver Assistance System. Center for Biomedical and Robotics Technology (BART LAB), http://www.bartlab.org.
[3] ZAKLOUTA F., STANCIULESCU B., Real-Time Sign Recognition in Three Stages. Robotics and Autonomous Systems (2012), doi.org/10.1016/j.robot.2012.07.019.
[4] KIM H., ARAUJO S., Rotation, scale and translation-invariant segmentation-free shape recognition. April 2007.
[5] http://opencv.org/.
[6] TOHKA J., Introduction to Pattern Recognition, Tampere University of Technology, 2013.



Computer Systems Engineering 2013
Keywords: Soft Computing, Identification Systems, Support Vector Machine, Artificial Neural Networks, Industrial Processes, Dental Milling Process

Roberto Vega Ruiz∗ Héctor Quintián† Vicente Vera‡ Emilio Corchado§

INTELLIGENT MULTIPLATFORM APP FOR AN INDUSTRIAL PROCESS OPTIMIZATION

This study presents a novel Intelligent System based on the application of Soft Computing Models and Identification Systems, which makes it possible to optimise industrial processes, such as a dental milling process. The main goal is the optimization of parameters such as time. This novel intelligent procedure is based on two steps. Firstly, statistical techniques and unsupervised learning are used to analyse the internal structure of the data set and to identify the most relevant variables. Secondly, those most relevant variables are used to model the system using supervised learning techniques such as artificial neural networks and support vector machines. The model has been successfully tested using a real industrial data set.

1. Introduction

Soft Computing [16, 6] has been successfully used for industrial process modelling. Recently, different paradigms of Soft Computing have been applied to various industrial problems [4]. System identification [13, 10] has made it possible to model, simulate and predict the behaviour of many industrial applications successfully and in different areas, such as control [22], robotics [17], energy processes [1], milling machines [2], high precision [6], etc. A novel and economically advantageous application is the optimization

∗ University of Salamanca, Spain, e-mail: rvegaru@usal.es
† University of Salamanca, Spain, e-mail: hector.quintian@usal.es
‡ Facultad de Odontología, University Complutense of Madrid, Spain, e-mail: viventevera@odon.ucm.es
§ University of Salamanca, Spain, e-mail: escorchado@usal.es



process in the area of Odonto-Stomatology. Improved processing and the optimization of parameters, such as processing time, for the development of dental pieces are the focus of rigorous studies today. The optimization of machine parameters, such as time and accuracy, permits significant savings due to the high number of dental pieces produced daily by the same high-precision dental milling centre. One way to achieve this optimization is based on the use of emerging methodologies, such as neural networks [15], nature-inspired smart systems [14], fuzzy logic [19], support vector machines [9], data mining [7], visualization [5], genetic algorithms [11] and case-based reasoning [25], among others. This contribution is organised as follows. Section 2 describes the two-step intelligent model for analysing and modelling the dataset. Section 3 introduces the industrial case study. Section 4 describes the experiments and the results obtained. Finally, the conclusions are set out and some comments on future research lines are presented.

2. Two Steps Model

A Two Steps Model (Fig. 1) is presented in this contribution. It is based on the analysis of the internal structure of a data set and the identification of the most relevant variables, followed by the application of system modelling to optimize the process. Firstly, the analysis of the internal structure is done using statistical methods such as Principal Component Analysis (PCA) [21, 18] and unsupervised-learning-based models such as Maximum Likelihood Hebbian Learning (MLHL) [12] and Cooperative Maximum Likelihood Hebbian Learning (CMLHL) [24]; these models are also used to identify the most significant features. Finally, System Identification is applied through Artificial Neural Networks (ANN) [8] and Support Vector Machines (SVM) [20] to create the best model of the industrial process. This model has been built within a visual tool called DataEye.



Fig. 1. Two Intelligent Steps Model diagram.

2.1. Internal Structure Analysis

To begin with, PCA, MLHL and CMLHL are applied to analyse the internal structure of the data set from the case study. If it is possible to identify an internal structure, it means that the data set is informative and valid; otherwise it must be expanded with more data.

2.1.1. Principal Component Analysis (PCA)

PCA [21, 18] is a statistical model which describes the variation in a set of multivariate data in terms of a set of uncorrelated variables, each of which is a linear combination of the original variables. The main goal is to derive new variables in decreasing order of importance with the least loss of information. Applying PCA, it is possible to find a smaller group of underlying variables that describe the data. PCA has been the most frequently reported linear operation involving unsupervised learning for data compression.
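An illustrative PCA projection using scikit-learn rather than the paper's DataEye tool; the standardization step is an added assumption.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

def pca_projection(data, n_components=2):
    """Project a multivariate data set onto its first principal components
    and report how much variance each component explains."""
    scaled = StandardScaler().fit_transform(np.asarray(data))
    pca = PCA(n_components=n_components)
    projected = pca.fit_transform(scaled)
    # a high explained-variance ratio on few components suggests a clear
    # internal structure in the data set
    return projected, pca.explained_variance_ratio_
```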



2.1.2. Maximum Likelihood Hebbian Learning (MLHL)

MLHL [21] is a neural implementation of Exploratory Projection Pursuit (EPP) [3], a recent statistical method aimed at solving the difficult problem of identifying structure in high-dimensional data. However, not all projections will reveal this structure well. MLHL identifies interestingness by maximizing the probability of the residuals under specific non-Gaussian probability density functions. Let x be an N-dimensional input vector and y an M-dimensional output vector, with W_{ij} being the weight linking input j to output i. The learning rate \eta is a small value which is annealed to zero over the course of training the network, and p is a parameter related to the energy function. The activation passing from input to output through the weights is described by Eq. (2). MLHL can be expressed [12] as:

Weight change:
\Delta W_{ij} = \eta \, y_i \, \mathrm{sign}(e_j) \, |e_j|^{p-1}    (1)

Feed-forward step:
y_i = \sum_{j=1}^{N} W_{ij} x_j, \quad \forall i    (2)

Feedback step:
e_j = x_j - \sum_{i=1}^{M} W_{ij} y_i, \quad \forall j    (3)

2.1.3. Cooperative Maximum Likelihood Hebbian Learning (CMLHL)

CMLHL [24] is based on MLHL, adding lateral connections. CMLHL can be expressed [24] as:

Weight change:
\Delta W_{ij} = \eta \, y_i \, \mathrm{sign}(e_j) \, |e_j|^{p-1}    (4)

Feed-forward step:
y_i = \sum_{j=1}^{N} W_{ij} x_j, \quad \forall i    (5)

Feedback step:
e_j = x_j - \sum_{i=1}^{M} W_{ij} y_i, \quad \forall j    (6)

Lateral activation passing:
y_i(t + 1) = [y_i(t) + \tau (b - A y)]^{+}    (7)

Here, η is the learning rate, [ ]⁺ is a rectification necessary to ensure that the y-values remain within the positive quadrant, τ is the "strength" of the lateral connections, b is the bias parameter, p is a parameter related to the energy function and A is a symmetric matrix used to modify the response to the data. The effect of this matrix depends on the distances separating the output neurons.
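A short sketch of the lateral activation passing of Eq. (7) is shown below; it extends the feed-forward step of the MLHL sketch above and is an illustration only, with A and b assumed to follow the description given here (A symmetric, acting on the output neurons).

```python
# Sketch of the CMLHL lateral activation passing, Eq. (7); illustration only.
# A (symmetric) and b are assumptions mimicking the lateral interaction described above.
import numpy as np

def lateral_pass(y, A, b, tau=0.1, n_steps=10):
    """Iterates y(t+1) = [y(t) + tau * (b - A @ y(t))]^+ for a few steps."""
    for _ in range(n_steps):
        y = np.maximum(y + tau * (b - A @ y), 0.0)      # rectification keeps y non-negative
    return y
```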

2.2. System modelling using identification algorithms

The System Identification [13, 10] procedure comprises several steps: the selection of the models and their structures, the learning methods, the identification and optimization criteria, and the validation method. Validation ensures that the selected model meets the necessary conditions for estimation and prediction. A good model is one that makes good predictions and produces small errors [13]; for this reason, Artificial Neural Networks (ANN) such as the Multilayer Perceptron (MLP) [15] and Least Square Support Vector Regression (LS-SVR) [9] are applied in this study to obtain the best model.

2.2.1. Artificial Neural Networks (ANN)

ANN [15] are one of the most interesting soft computing paradigms and are also applied within the framework of System Identification. When an ANN is used, the purpose of the identification process is to determine the weight matrix based on the observations Z^t, so as to obtain the relationships between the network nodes.

2.2.2. Least Square Support Vector Regression (LS-SVR)

LS-SVR [9] is a modification of the Support Vector Machine (SVM) algorithm [20]. The main idea in SVR is to map the data into a high-dimensional feature space F via a nonlinear mapping and to perform a linear regression in this space. The application of LS-SVM for regression purposes is known as Least Square Support Vector Regression (LS-SVR). In LS-SVR only two parameters, γ and σ², are needed, where σ² is the width of the kernel used and γ is the regularisation parameter.
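The sketch below illustrates the basic LS-SVR machinery with an RBF kernel under the usual LS-SVM formulation (training reduces to a single linear system rather than a quadratic programme); it is an illustration only and not the toolbox used by the authors.

```python
# Minimal LS-SVR sketch with an RBF kernel (illustration only, not the authors' toolbox).
# gamma is the regularisation parameter and sigma2 the kernel width, as in the paper.
import numpy as np

def rbf_kernel(A, B, sigma2):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / sigma2)

def lssvr_fit(X, y, gamma, sigma2):
    """Solves the LS-SVM regression dual: one linear system in the bias b and multipliers alpha."""
    n = X.shape[0]
    K = rbf_kernel(X, X, sigma2)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = K + np.eye(n) / gamma
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    return sol[0], sol[1:]                              # bias b, dual coefficients alpha

def lssvr_predict(X_train, b, alpha, X_new, sigma2):
    return rbf_kernel(X_new, X_train, sigma2) @ alpha + b
```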

3. Optimising a Dental Milling Process

Advancements in dental technology offer modern and innovative solutions to traditional dental problems. The driving force behind these advances is the desire to provide leading-edge dental treatment that can be performed in a more efficient, effective and comfortable manner. The development of dental technology is advancing with the application of computer science in this field.
Manufacturing a tooth prosthesis begins with creating an impression of the patient's mouth. To do this, the dentist presses silicone putty onto the patient's teeth, removes the negative mould and fills it with plaster; once the mould has dried, this provides an exact replica of the patient's dentition. The 3-D model of the dentition is then transferred to a CAM system, where an operator generates toolpaths for high-speed, multi-axis machining of the final work piece.
The industrial case scenario is based on real data gathered from a HERMLE type-C 20 U (iTNC 530) machining milling centre (Fig. 2), with a 280 mm swivelling rotary table, 5 axes and a control system using high-precision drills and bits, with the aim of optimising the time error in manufacturing dental metal pieces. The material (CrCo, Ti), the tools and the feed rates all affect the conformation times. Since the processing involves polishing and milling the piece, the time of conformation determines its size. Milling is currently the most modern process for preparing dental prostheses. When milling is finished, the completed crown, bridge or veneer is cut away from the blank, hand-finished to remove coping and runner marks, polished, and attached to the patient's damaged tooth using dental cement or affixed to a previously installed implant.
The advantages of 5-axis machines are numerous: they allow complete multilateral machining in a single cycle, reduce non-productive time, eliminate the lack of precision arising from multiple clampings of the piece and give better access to restricted, difficult-to-reach areas. The angle adjustment can be freely defined and it is possible to use shorter and more rigid tools, which results in an improved surface finish.
The main parameter to estimate and optimise in this research is the time error of manufacturing: the difference between the time estimated by the machine itself and the real working time. Negative values indicate that the real time exceeds the estimated time.


Fig. 2. HERMLE high-precision machine: (a) Milling Centre of HERMLE type-C 20 U; (b) metal pieces manufactured

3.1. Data Set

The real case study is described by an initial data set of 190 samples, obtained with the dental scanner during the manufacturing of dental pieces with different tool types (plane, toric, spherical and drill), characterised by 12 input variables (see Table 1). The input variables are the type of work, the disk thickness, the number of pieces, the radius of the tool, the type of tool, the revolutions per minute of the drill, the feed rates (X, Y and Z), the initial tool diameter, the initial temperature and the estimated duration of the work.
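For illustration, a data set organised as in Table 1 could be loaded as sketched below; the file name and column names are assumptions made for this example, not the authors' actual CSV layout.

```python
# Hypothetical loading sketch for a data set organised as in Table 1; the file name and
# column names are assumptions for illustration, not the authors' actual CSV layout.
import pandas as pd

df = pd.read_csv("dental_milling.csv")
inputs = ["work", "disk_thickness", "pieces", "radius", "tool_type", "rpm",
          "feed_x", "feed_y", "feed_z", "initial_diameter",
          "initial_temperature", "theoretical_time"]
target = "time_error"                     # difference between theoretical and real time
X, y = df[inputs].to_numpy(), df[target].to_numpy()
```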

4. DataEye: A software implementation of the two steps system

The two steps system developed has been implemented in a software tool named DataEye. The aim of this software is to make intelligent systems in general, and the developed two steps system in particular, easy to use for experts who have no experience with these algorithms. DataEye has been developed in PHP, JavaScript and HTML5, so the tool is accessible from any device with an internet connection (Fig. 3), such as a PC, smartphone or tablet. To access the application, the user must enter a username and password.



Table 1. Variables and range of values

Variable                                            Values
Work                                                1 to 7
Disk thickness (mm)                                 8 to 18
Pieces                                              1 to 4
Radius (mm)                                         0.25 to 1.5
Type of tool                                        Plane (1), Toric (2), Spherical (3), Drill (4)
Revolutions per minute (RPM)                        7,500 to 38,000
Feed X (mm)                                         0 to 3,000
Feed Y (mm)                                         0 to 3,000
Feed Z (mm)                                         50 to 2,000
Initial Diameter (mm)                               91.061 to 125.57
Initial Temperature (°C)                            21.6 to 31
Theoretical Working Time (s)                        6 to 2,034
Difference between Theoretical and Real Time (s)    -3 to 565
Weathering (mm)                                     -8 to 22
Temperature Variation (°C)                          -5 to 6.7



Fig. 3. DataEye is accessible from any device: (a) applying CMLHL from a MacBook Pro; (b) applying PCA from an iPhone

Once inside the application, the user loads a CSV file with the data set to be analysed. The application then presents the first step: the unsupervised algorithms (PCA, MLHL and CMLHL) to be applied to the data set must be selected (see Fig. 4). These algorithms are used to analyse the internal structure of the data set and to decide whether it is sufficiently informative. The results obtained by these algorithms are shown in a new window (see Fig. 5), which provides several tools for analysing the results:
• Axis selector: allows the projections shown on each axis (X and Y) to be changed.
• Grouping tools: allow a group of points in the current projection to be created or removed.
These tools allow the user to analyse the results quickly and easily and to determine which variables are the most important for the internal structure of the data set. These variables are used later in the second step.
Then the second step can be accessed. First, the input and output variables must be selected (see Fig. 6). A new window then appears (Fig. 7) in which the modelling algorithms to be applied to the data set are chosen (Artificial Neural Network (ANN), Support Vector Machine (SVM), Auto-Regressive with eXternal input (ARX), Auto-Regressive Moving Average with eXternal input (ARMAX), Box-Jenkins (BJ), etc.).


Fig. 4. Step 1, algorithms selection

Fig. 5. Step 1, results



Fig. 6. Step 2, input/output selector

Fig. 7. Step 2, algorithms selection

Once the algorithms have finished, the results are shown in terms of FIT in a graph that displays the real and the predicted values for each algorithm in a different colour (Fig. 8).
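The FIT value is not defined explicitly in the text; assuming it is the normalised fit percentage commonly reported by system identification tools (e.g. MATLAB's compare function), it could be computed as in the sketch below. This formula is an assumption, not something stated by the authors.

```python
# Assumed definition of the FIT (%) value shown by DataEye: the normalised fit commonly
# used in system identification. This formula is an assumption; the paper does not define FIT.
import numpy as np

def fit_percent(y_true, y_pred):
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return 100.0 * (1.0 - np.linalg.norm(y_true - y_pred)
                    / np.linalg.norm(y_true - y_true.mean()))
```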



Fig. 8. Step 2, results

4.1. Technologies Used

Figure 9 summarises the different tools used for the implementation of DataEye. DataEye has been deployed on a web server to allow access from any type of device with an internet connection. Each tool is explained below.

4.1.1. Linux server

Ubuntu Server is used as the operating system on the server, which hosts the different tools required by the DataEye application.

4.1.2. PHP

PHP is a server-side scripting language designed for web development. PHP code is interpreted by a web server with a PHP processor module which generates the resulting web page: PHP commands can be embedded directly into an HTML source document rather than calling an external file to process data. It is free software and can be deployed on most web servers, and also as a standalone shell, on almost every operating system and platform. In the DataEye application, PHP is used to perform the calls to Matlab through the operating system console, to control the screens that make up the application, to check the data entered into forms and to validate file formats.

Fig. 9. Technologies used to develop the DataEye application

4.1.3. MySQL

MySQL is an open-source relational database management system in which information is stored in the form of related tables. MySQL databases are typically used for web application development. In the DataEye application, MySQL is used to store the user credentials.

4.1.4. MATLAB

MATLAB is a high-level language and interactive environment for numerical computation, visualisation and programming. MATLAB can be used to analyse data, develop algorithms, and create models and applications. The language, tools and built-in mathematical functions make it possible to explore multiple approaches and reach a solution faster than with spreadsheets or traditional programming languages such as C/C++ or Java. In the DataEye application, Matlab is used to execute all the algorithms, both in step 1 (analysis of the internal structure of the data set) and in step 2 (modelling).



4.1.5. HTML5 & CSS3

HTML5 is the latest version of the HTML markup language and is characterised by being compatible with any device, whether tablet, smartphone or PC. In the DataEye application, HTML5 & CSS3 are used to present the graphical interface to the user, allowing the user to interact with the application.

4.1.6. Highcharts

Highcharts is a charting library for use with HTML5 that provides an easy way to present graphs and to interact with them. In the DataEye application, Highcharts is used to present and modify the graphs.

4.1.7. JavaScript

JavaScript is an interpreted programming language. It was originally implemented as part of web browsers so that client-side scripts could interact with the user, control the browser, communicate asynchronously and alter the displayed document content. In this application it is used to interact with the user.

5. Experiments and Results

This multidisciplinary research analysed the data set to identify the variables that are most important for optimising the time error in the manufacturing process. In the first step, PCA and the unsupervised learning algorithms MLHL and CMLHL were applied to identify the internal structure of the data set and to compare the results obtained by each algorithm.
In the case of PCA, the model looks for the directions of largest variance. An internal structure appears in the data set: feed X, feed Y and feed Z increase as the X-axis values grow, and revolutions per minute (RPM) increase as the Y-axis values decrease. In addition, five clusters are identified (Fig. 10). From the analysis of this internal structure it is concluded that the most relevant variables are revolutions per minute (RPM), feed X, feed Y and feed Z.
MLHL identifies five clusters, as in the PCA projection, but in this case the number of relevant variables is higher, as can be seen in Fig. 11: revolutions per minute (RPM), feed X, feed Y, feed Z, temperature, type of tool and the difference between theoretical and real time are the most representative variables in this projection. There is also an internal structure in all clusters, where revolutions per minute decrease as the Y-axis values increase and temperature values increase as the X-axis values increase.
Finally, the CMLHL algorithm was applied to improve on the MLHL results and obtained the best representation, shown in Fig. 12. The results were better than the PCA and MLHL projections and the clusters were well defined. Six clusters were identified, and the most important variables were revolutions per minute (RPM), feed X, feed Y, feed Z, temperature, type of tool, the difference between theoretical and real time and the radius. There is also an internal structure in all clusters, where revolutions per minute decrease as the Y-axis values increase and temperature values decrease as the X-axis values increase.
It was concluded that the eight most relevant variables of the data set are revolutions per minute (RPM), feed X, feed Y, feed Z, temperature, type of tool, theoretical working time and radius. In the second step, these eight variables were selected to model the system, and ANN and SVR were applied and compared.

Fig. 10. PCA projection



Fig. 11. MLHL projections with iters = 10,000, lrate = 0.05, m = 4 and p = 0.02.

Fig. 12. CMLHL projections with iters = 100,000, lrate = 0.008, m = 4, p = 0.25, τ = 0.1.



Table 2. Results for ANN applying all variables and the 8 main variables

Variables   Model   Fit Test (%)   Fit Train (%)   Test Variance   Train Variance
All         ANN     40.9975        55.9809         22.2028         27.8434
8 Main      ANN     67.5723        71.2486         8.2520          21.7662

The models are trained using the initial data set of 190 samples obtained with the dental scanner during the manufacturing of dental pieces. Each sample is formed by 8 input variables (revolutions per minute (RPM), feed X, feed Y, feed Z, temperature, type of tool, theoretical working time and radius) and 1 output variable (the difference between theoretical and real time).
Since the number of samples in the data set is small, a 12-fold cross-validation scheme is used. This technique reduces variability: multiple rounds of cross-validation are performed using different partitions and the validation results are averaged over the rounds. The final model is obtained using the full data set. Cross-validation is applied to both ANN and SVR.
The MLP was tested with several different structures, obtaining the best result with 4 hidden layers: 12 neurons in the first layer, 3 in the second, 10 in the third and 6 in the fourth. The activation function selected for the hidden layers was the tangent sigmoid. The results of applying the MLP using all variables and using the 8 main variables are shown in Table 2; the results using only the most relevant variables (67.5723%) are better than those using all the variables (40.9975%).
LS-SVR was tested through a Matlab toolbox in which the tuning of the parameters is conducted in two steps. First, a state-of-the-art global optimisation technique, Coupled Simulated Annealing (CSA) [23], determines suitable parameters according to a given criterion; these parameters are then passed to a second, simplex optimisation procedure for fine-tuning. A Radial Basis Function (RBF) kernel was selected, so the parameters to adjust are σ² and γ. The results of applying SVM are shown in Table 3: they are similar when applying feature selection (75.0069%) and when selecting all variables (75.1545%). These results confirm that feature selection allows the non-relevant variables to be discarded while reducing the computational cost.
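The sketch below illustrates the evaluation protocol described above (12-fold cross-validation of an MLP with 12, 3, 10 and 6 hidden neurons and tangent-sigmoid activations); scikit-learn is used here purely as a stand-in for the authors' MATLAB implementation, and the FIT definition follows the assumption stated earlier.

```python
# Sketch of the evaluation protocol: 12-fold cross-validation of an MLP with hidden layers
# of 12, 3, 10 and 6 neurons and tanh activations. scikit-learn stands in for the authors'
# MATLAB implementation; the FIT (%) formula is the assumed normalised fit.
import numpy as np
from sklearn.model_selection import KFold
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

def crossvalidate_mlp(X, y, n_folds=12, seed=0):
    fits = []
    for train_idx, test_idx in KFold(n_folds, shuffle=True, random_state=seed).split(X):
        model = make_pipeline(
            StandardScaler(),
            MLPRegressor(hidden_layer_sizes=(12, 3, 10, 6), activation="tanh",
                         max_iter=5000, random_state=seed))
        model.fit(X[train_idx], y[train_idx])
        y_pred = model.predict(X[test_idx])
        err = np.linalg.norm(y[test_idx] - y_pred)
        ref = np.linalg.norm(y[test_idx] - y[test_idx].mean())
        fits.append(100.0 * (1.0 - err / ref))          # FIT (%) on the held-out fold
    return float(np.mean(fits)), float(np.var(fits))    # averaged over the 12 rounds
```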


Table 3. Results for SVM applying all variables and the 8 main variables

Variables   Model   Fit Test (%)   Fit Train (%)   Test Variance   Train Variance   σ²        γ
All         SVM     75.1545        89.9686         1.3554          8.8492           71.5273   193.8523
8 Main      SVM     75.0069        86.0872         2.1611          13.2415          3.94753   51.9611

The results applying SVM are also better: the SVM Fit Test value (75.1545%) is greater than that of the ANN (67.5723%) and the test variance is smaller for SVM (1.3554). For this reason it is concluded that the SVM model is better than the ANN model.

6. Conclusions

This contribution has demonstrated the advantages of combining statistical and connectionist unsupervised models with identification systems, providing a two steps model that indicates the relevance of the data and thereby avoids wasting time attempting to model a system with data that do not represent the system dynamics. Feature selection has reduced the number of variables to be considered in the modelling process with the least loss of information about the system, which reduces the complexity of the model and the computational cost.
The intelligent two steps model presented has been used to analyse a real data set formed by 190 samples, each with 15 variables. In the first step, visualisation techniques such as PCA, MLHL and CMLHL were applied to find the internal structure of the data set: an internal structure was first found with PCA, and MLHL and CMLHL were then applied to identify the most important variables. In the second step, ANN and SVM were applied to model the system. It has been concluded that SVM is better than ANN, since the SVM Fit Test is 7.5822 percentage points greater and the SVM test variance is 6.8966 lower than for the ANN model.
Future research will address other feature selection models such as wrappers, filters and hybrid models, as well as the application of other optimisation models. Finally, this intelligent model will be tested on other industrial use cases.



Acknowledgement. This research is partially supported by the Spanish Ministry of Economy and Competitiveness under project TIN2010-21272-C02-01 (funded by the European Regional Development Fund) and project SA405A12-2 from the Junta de Castilla y León. The authors would also like to thank TARAMI (Madrid, Spain) for their collaboration in this research.

REFERENCES
[1] AFSHAR P., BROWN M., MACIEJOWSKI J.M. and WANG H., Data-Based Robust Multiobjective Optimization of Interconnected Processes: Energy Efficiency Case Study in Papermaking. IEEE Transactions on Neural Networks, 2011, pp. 2324–2328.
[2] ANCUTA A., COMPANY O. and PIERROT F., Modeling and optimization of Quadriglide, a Schönflies motion generator module for 5-axis milling machine-tools. Proceedings of the IEEE International Conference on Robotics and Automation, 2009, pp. 2174–2179.
[3] BERRO A., MARIE-SAINTE S.L. and RUIZ-GAZEN A., Genetic algorithms and particle swarm optimization for exploratory projection pursuit. Ann. Math. Artif. Intell., vol. 60(12), 2010, pp. 153–178.
[4] CORCHADO E., ABRAHAM A. and SNÁŠEL V., New trends on soft computing models in industrial and environmental applications. Neurocomputing, vol. 109, 2013, pp. 1–2.
[5] CORCHADO E. and BARUQUE B., WeVoS-ViSOM: An ensemble summarization algorithm for enhanced data visualization. Neurocomputing, vol. 75(1), 2012, pp. 171–184.
[6] CORCHADO E., SEDANO J., CURIEL L. and VILLAR J.R., Optimizing the operating conditions in a high precision industrial process using soft computing techniques. Expert Systems, vol. 29(3), 2012, pp. 276–299.
[7] DEYPIR M., SADREDDINI M.H. and HASHEMI S., Towards a variable size sliding window model for frequent itemset mining over data streams. Computers & Industrial Engineering, vol. 63(1), 2012, pp. 161–172.
[8] GHIASSI M. and NANGOY S., A dynamic artificial neural network model for forecasting nonlinear processes. Computers & Industrial Engineering, vol. 57(1), 2009, pp. 287–297.
[9] GUO X.C., WU C., MARCHESE M. and LIANG Y., LS-SVR-based solving Volterra integral equations. Applied Mathematics and Computation, vol. 218(23), 2012, pp. 11404–11409.
[10] IRSHAD Y., MOSSBERG M. and SÖDERSTRÖM T., System identification in a networked environment using second order statistical properties. Automatica, vol. 49(2), 2013, pp. 652–659.
[11] KARAKIS R., TEZ M., KILIC Y.A., KURU B. and GÜLER I., A genetic algorithm model based on artificial neural network for prediction of the axillary lymph node status in breast cancer. Eng. Appl. of AI, vol. 26(3), 2013, pp. 945–950.



[12] KRÖMER P., CORCHADO E., SNÁŠEL V., PLATOŠ J. and GARCÍA-HERNÁNDEZ L., Neural PCA and Maximum Likelihood Hebbian Learning on the GPU. Artificial Neural Networks and Machine Learning – ICANN, 2012, pp. 132–139.
[13] LJUNG L., System identification (2nd ed.): theory for the user. Prentice Hall PTR, Upper Saddle River, NJ, USA, 1999.
[14] MARINAKIS Y., MARINAKI M., DOUNIAS G., JANTZEN J. and BJERREGAARD B., Intelligent and nature inspired optimization methods in medicine: the Pap smear cell classification problem. Expert Systems, vol. 26(5), 2009, pp. 433–457.
[15] NASSIF A.B., HO D. and CAPRETZ L.F., Towards an early software estimation using log-linear regression and a multilayer perceptron model. Journal of Systems and Software, vol. 86(1), 2013, pp. 144–160.
[16] NGUYEN T.T., YANG S. and BRANKE J., Evolutionary dynamic optimization: A survey of the state of the art. Swarm and Evolutionary Computation, vol. 6, 2012, pp. 1–24.
[17] PALLI G., BORGHESAN G. and MELCHIORRI C., Modeling, Identification, and Control of Tendon-Based Actuation Systems. IEEE Transactions on Robotics, 2012, pp. 277–290.
[18] PEARSON K., On Lines and Planes of Closest Fit to Systems of Points in Space. Philosophical Magazine, vol. 2, 1901, pp. 559–572.
[19] PRATIHAR T.K. and PRATIHAR D.K., Design of cluster-wise optimal fuzzy logic controllers to model input-output relationships of some manufacturing processes. IJDMMM, vol. 1(2), 2009, pp. 178–205.
[20] ROSALES-PÉREZ A., ESCALANTE H.J., GONZÁLEZ J.A., REYES C.A. and COELLO C.A., Bias and Variance Multi-objective Optimization for Support Vector Machines Model Selection. IbPRIA, 2013, pp. 108–116.
[21] SHEN D., SHEN H. and MARRON J.S., Consistency of sparse PCA in High Dimension, Low Sample Size contexts. J. Multivariate Analysis, vol. 115, 2013, pp. 317–333.
[22] SMYK A. and TUDRUJ M., Genetic Algorithms Hierarchical Execution Control under a Global Application State Monitoring Infrastructure. Proceedings of the 21st Euromicro International Conference on Parallel, Distributed, and Network-Based Processing, 2013, pp. 16–23.
[23] DE SOUZA S.X., SUYKENS J.A.K., VANDEWALLE J. and BOLLÉ D., Coupled Simulated Annealing. IEEE Transactions on Systems, Man, and Cybernetics, Part B, vol. 40(2), 2010, pp. 320–355.
[24] VERA V., CORCHADO E., REDONDO R., SEDANO J. and GARCIA A.E., Applying soft computing techniques to optimise a dental milling process. Neurocomputing, vol. 109, 2013, pp. 94–104.
[25] VILLANUEVA B.S. and SÁNCHEZ-MARRÈ M., Case-Based Reasoning Applied to Textile Industry Processes. Proceedings of Case-Based Reasoning Research and Development, 2012, pp. 428–442.


