Polish-British Workshops Computer Systems Engineering Theory & Applications


POLISH-BRITISH WORKSHOP

COMPUTER SYSTEMS ENGINEERING THEORY & APPLICATIONS

Editors: Keith J. BURNHAM Leszek KOSZALKA Radoslaw RUDEK Piotr SKWORCOW Organised jointly by: • Control Theory and Applications Centre, Coventry University, UK • Chair of Systems and Computer Networks, Wroclaw University of Technology, Poland with support from the IET Control and Automation Professional Network


Reviewers: Keith J. BURNHAM Arkadiusz GRZYBOWSKI Adam JANIAK Andrzej KASPRZAK Leszek KOSZALKA Jens G. LINDEN Marcin MARKOWSKI Iwona POZNIAK-KOSZALKA Przemyslaw RYBA Henry SELVARAJ Ventzeslav VALEV Benoit VINSONNEAU Krzysztof WALKOWIAK

Cover page designer Aleksandra de’Ville

Typesetting: Camera-ready by authors Printed by: Drukarnia Oficyny Wydawniczej Politechniki Wrocławskiej, Wrocław 2011 Wybrzeze Wyspianskiego 27, 50-370 Wroclaw, Poland

ISBN 978-83-911675-9-5


POLISH-BRITISH WORKSHOP was held in Sokołowsko, Poland, May/June 2008 and Czarna Gora, Poland, June 2009

International Steering Committee Keith J. BURNHAM (the United Kingdom) Andrzej KASPRZAK (Poland) Henry SELVARAJ (the United States) Leszek KOSZALKA (Poland) Jens G. LINDEN (Germany) Pawel PODSIADLO (Australia) Iwona POZNIAK-KOSZALKA (Poland) Radu-Emil PRECUP (Romania) Radoslaw RUDEK (Poland) Piotr SKWORCOW (the United Kingdom) Gwidon STACHOWIAK (Australia) Jan STEFAN (Czech Republic) Ventzeslav VALEV (Bulgaria)

Local Organizing Committee 2008: 2009: Dariusz JANKOWSKI Pawel BOGALINSKI Katarzyna WALEWSKA Wojciech KMIECIK Marcin BAZYLUK Daniel KOWALCZYK Krzysztof KWIECIEŃ Urszula MOSKAL Conference Proceedings Editors Keith J. BURNHAM - editor Leszek KOSZALKA - editor Piotr SKWORCOW - editor Radoslaw RUDEK – editor Jens G. LINDEN – co-editor Iwona POZNIAK-KOSZALKA – co-editor


IET Control & Automation Professional Network The IET Control and Automation Professional Network is a global network run by, and on behalf of, professionals in the control and automation sector with the specialist and technical backup of the IET. Primarily concerned with the design, implementation, construction, development, analysis and understanding of control and automation systems, the network provides a resource for everyone involved in this area and facilitates the exchange of information on a global scale. This is achieved by undertaking a range of activities, including a website with a range of services such as an events calendar and searchable online library, face-to-face networking at events, and working relationships with other organisations. For more information on the network and how to join visit http://www.theiet.org/.


Preface

It is with great pleasure that we as Editors write the preface of the Proceedings for the eighth and ninth Polish-British Workshops on Computer Systems Engineering: Theory and Applications, organized jointly by the Department of Systems and Computer Networks, Wroclaw University of Technology, Wroclaw, Poland and the Control Theory and Applications Centre, Coventry University, Coventry, UK. The Workshops took place in Sokolowsko (2008) and Czarna Gora (2009), with a number of papers presented by young researchers and engineers. The theme of the Workshops was focused on solving complex scientific and engineering problems in a wide area encompassing computer science, control engineering, information and communication technologies and operational research.

Due to increasing populations worldwide, and driven by scientific and technological developments, the systems that enable and/or enhance our day-to-day activities are becoming increasingly complex. To ensure sustainability of these systems, and indeed of our modern societies, there is an urgent need to solve many scientific and engineering problems. As a result, dealing with modelling, optimisation and control of complex large-scale systems, uncertain data and computational complexity, as well as the need for high-speed communication, have all become of significant importance. The problems addressed and the solutions proposed in the papers presented at the Workshops and included in the Proceedings are closely linked to the issues currently faced by our society, such as efficient utilisation of energy and resources, design and operation of communication networks, modelling and control of complex dynamical systems and handling the complexity of information. We hope that these Proceedings will be of value to those researching in the relevant areas and that the material will inspire prospective researchers to become interested in seeking solutions to complex scientific and engineering problems.


The Polish-British Workshops have now become a traditional and integral part of the long-lasting collaboration between Wroclaw University of Technology and Coventry University, with the Workshops taking place every year since 2001. The Workshops bring together young researchers from different backgrounds and at different stages of their career, including undergraduate and MSc students, PhD students and post-doctoral researchers. It is a truly fantastic and quite unique opportunity for early-stage researchers to share their ideas and learn from the experience of others, to become inspired by the work carried out by their elder colleagues and to receive valuable feedback concerning their work from accomplished researchers, all in a pleasant and friendly environment surrounded by the picturesque mountains of Lower Silesia. None of this, however, would be possible without the continued efforts and commitment of the Polish-British Workshop founders: Dr Iwona Pozniak-Koszalka, Dr Leszek Koszalka and Prof. Keith J. Burnham. On behalf of all researchers who have attended the Polish-British Workshop series, including ourselves, we would like to express our sincere gratitude for making the Workshop series such a tremendous success, for sharing with others their extensive knowledge and experience, and for providing valuable guidance related to career and life choices faced by young researchers at this crucial stage of their careers.

Dr Piotr Skworcow, Water Software Systems, De Montfort University, Leicester, UK and Dr Radoslaw Rudek, Department of Information Technology, Wroclaw University of Economics, Poland Editors of the Proceedings and Members of the International Steering Committees for the Polish-British Workshops, 2008 and 2009.


Contents

M. BAZYLUK, L. KOSZAŁKA, K. J. BURNHAM, R. RUDEK

DETERMINING THE INPUT BIAS ON EFFICIENCY OF METAHEURISTIC ALGORITHMS FOR A PARALLEL MACHINE SCHEDULING PROBLEM

9

P. BOGALINSKI, I. POŹNIAK-KOSZAŁKA, L. KOSZAŁKA, P. SKWORCOW

THE TWO DIMENSIONAL IRREGULAR-SHAPED NESTING PROBLEM

29

B. CZAJKA, I. POŹNIAK-KOSZAŁKA

SCHEDULING IN MULTI-PROCESSOR COMPUTER SYSTEMS – VARIOUS SCENARIO SIMULATIONS

36

T. DANNE, J. G. LINDEN, D. HILL, K. J. BURNHAM

MODELLING APPROACHES FOR A HEATING, VENTILATION AND AIR CONDITIONING SYSTEM

45

V. ERSANILLI, K. J. BURNHAM

A CONTINUOUS-TIME MODEL-BASED TYRE FAULT DETECTION ALGORITHM UTILISING AN UNKNOWN INPUT OBSERVER

59

J. GŁADYSZ, K. WALKOWIAK

THE HEURISTIC ALGORITHM BASED ON FLOW DEVIATION METHOD FOR SIMULTANEOUSLY UNICAST AND ANYCAST ROUTING IN CFA PROBLEM

73

T. KACPRZAK, L. KOSZAŁKA

COMPARISON OF ACTION CHOOSING SCHEMES FOR Q-LEARNING

81

T. LARKOWSKI, J. G. LINDEN, K. J. BURNHAM

RECURSIVE BIAS-ELIMINATING LEAST SQUARES ALGORITHM FOR BILINEAR SYSTEMS

90

K. LENARSKI, A. KASPRZAK, P. SKWORCOW

ADVANCED TABU SEARCH STRATEGIES FOR TWO-LAYER NETWORK DIMENSIONING PROBLEM

104

M. KUCHARZAK, L. KOSZAŁKA, A. KASPRZAK

OPTIMIZATION ALGORITHMS FOR TWO LAYER NETWORK DIMENSIONING

116


D. PICHEN, I. POŹNIAK-KOSZAŁKA

A NEW TREND IN SOFTWARE DEVELOPMENT PROCESS USING DATABASE SYSTEMS

129

A. SMUTNICKI, K. WALKOWIAK

AN ALGORITHM FOR UNRESTORABLE FLOW OPTIMISATION PROBLEM USING P-CYCLES PROTECTION SCHEME

140

M. SUMISŁAWSKA, M. GARYCKI, L. KOSZAŁKA, K. J. BURNHAM, A. KASPRZAK

EFFICIENCY OF ALLOCATION ALGORITHMS IN MESH ORIENTED STRUCTURES DEPENDING ON PROBABILITY DISTRIBUTION OF THE DIMENSIONS OF INCOMING TASKS

159

B. TOKARSKI, L. KOSZAŁKA, P. SKWORCOW

SIMULATION BASED PERFORMANCE ANALYSIS OF ETHERNET MPI CLUSTER

172

M. ŻACZEK, M. WOŹNIAK

APPLYING DATA MINING TO NETWORK INTRUSION DETECTION

182

I. ZAJIC, K. J. BURNHAM

EXTENSION OF GENERALISED PREDICTIVE CONTROL TO HANDLE SISO BILINEAR SYSTEMS

188

T. CZYŻ, R. RUDEK

SCHEDULING JOBS ON AN ADAPTIVE PROCESSOR

200

W. KMIECIK, M. WÓJCIKOWSKI, A. KASPRZAK, L. KOSZAŁKA

TASK ALLOCATION IN MESH CONNECTED PROCESSORS USING LOCAL SEARCH METAHEURISTIC ALGORITHM

208

R. ŁYSIAK, I. POŹNIAK-KOSZAŁKA, L. KOSZAŁKA

ARTIFICIAL NEURAL NETWORK FOR IMPROVEMENT OF TASK ALLOCATION IN MESH-CONNECTED PROCESSORS

220

M. SUMISLAWSKA, P.J. REEVE, K. J. BURNHAM, I. POŹNIAK-KOSZAŁKA, G. HEARNS

COMPUTER CONTROL ALGORITHM SIMULATION AND DEVELOPMENT WITH INDUSTRIAL APPLICATION

230

I. ZAJIC, K. J. BURNHAM, T. LARKOWSKI, D. HILL

DEHUMIDIFICATION UNIT CONTROL OPTIMISATION

240


Computer Systems Engineering 2008 Keywords: scheduling, parallel machines, total tardiness, simulated annealing, tabu search, genetic algorithm, ant colony optimization

Marcin BAZYLUK∗ Leszek KOSZAŁKA∗ Keith J. BURNHAM† Radosław RUDEK‡

DETERMINING THE INPUT BIAS ON EFFICIENCY OF METAHEURISTIC ALGORITHMS FOR A PARALLEL MACHINE SCHEDULING PROBLEM

The influence of input parameters on the efficiency of metaheuristic algorithms has been considered since their emergence. Calculations show that their calibration becomes an important issue, leading to a noticeable improvement in quality. Nevertheless, this is often a long-lasting process. This paper attempts to determine this dependence and suggests a policy for optimizing the finetuning. Another aspect is the efficiency variation of metaheuristics for different instance shapes and sizes. By the shape we understand the upper bounds of the parameters describing a single instance, and by the size the number of jobs and the number of available machines. This variation is measured.

1. INTRODUCTION

During the last decades, scheduling problems on parallel machines with the earliness-tardiness objective have been extensively analysed both by researchers and practitioners. In general, a scheduling problem on parallel machines focuses on the allocation of jobs to machines and the determination of their starting times such that the given objective is optimized. Scheduling problems on parallel machines are usually NP-hard, and the problem under earliness-tardiness objectives is strongly NP-hard. Namely, Garey et al. [1] showed that the simplified problem on a single machine with symmetric earliness and tardiness penalties is NP-hard.

∗ Department of Systems and Computer Networks, Wrocław University of Technology, Poland.
† Control Theory and Applications Centre, Coventry University, Coventry, UK.
‡ Wrocław University of Economics, Poland.



Yano et al. [2] proved the NP-hardness for the considered objectives with job weights proportional to their processing times. Sun et al. [3] studied the problem with identical parallel machines and a common due date for all jobs and proved it to be ordinarily NP-hard if the number of machines is given. Since the problem considered in this paper is more general, it is not less complex.

2. PROBLEM FORMULATION

There are given the set M of m machines and the set J of n jobs that have to be processed on these machines. Jobs are independent, non-preemptive and available for processing at time 0, and each job can be processed by one machine at a time. Before we define the problem formally, let us introduce the following notation and parameters:

i, j = 1, 2, . . . , n – job indices,
k = 1, 2, . . . , m – machine index,
p_{ik} – processing time of job i on machine k,
w_i – weight of job i,
d_i – due date of job i,
C_i – completion time of job i,
E_i – earliness of job i, E_i = max{0, d_i − C_i},
T_i – tardiness of job i, T_i = max{0, C_i − d_i},
x_{ijk} = 1 if job j immediately follows job i on machine k, 0 otherwise (i = 0, 1, . . . , n, j = 1, 2, . . . , n, k = 1, 2, . . . , m),
y_{jk} = 1 if job j is to be executed on machine k, 0 otherwise (j = 1, 2, . . . , n, k = 1, 2, . . . , m).

For i = 0, x_{0jk} = 1 means that job j is scheduled as the first one on machine k. On this basis, the completion time of job j is defined by the recursive formula:

C_j = Σ_{i=0}^{n} Σ_{k=1}^{m} x_{ijk} (C_i + p_{jk}).        (1)

Following the notation and parameters, we define the problem formally. The objective is to find such an allocation of jobs to machines and such starting times on the machines that minimize the following:

f = Σ_{i=1}^{n} w_i (E_i + T_i)        (2)

under the following constraints:

• each job is processed by one machine and preemption is not allowed:

Σ_{i=1, i≠j}^{n} Σ_{k=1}^{m} x_{ijk} = 1,   j = 1, 2, . . . , n,        (3)

• there are no idle times:

Σ_{i=1, i≠j}^{n} x_{ijk} = y_{jk},   j = 1, 2, . . . , n,  k = 1, 2, . . . , m,        (4)

• each machine can process only one job at a time:

Σ_{j=1, j≠i}^{n} x_{ijk} ≤ y_{ik},   i = 1, 2, . . . , n,  k = 1, 2, . . . , m.        (5)

Since idle times are not allowed, determining the starting times of the jobs reduces to determining their sequences on the machines.
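To make the formulation concrete, the following minimal Python sketch (an illustration under assumed data structures, not the authors' C#/.NET implementation) evaluates the objective (2) for a candidate solution represented as one job sequence per machine; the small data set at the end is hypothetical.

```python
# Minimal sketch: a solution is a list of job sequences, one per machine;
# p[j][k] is the processing time of job j on machine k, w[j] its weight,
# d[j] its due date (all indexed from 0, an assumption of this sketch).
def weighted_earliness_tardiness(sequences, p, w, d):
    total = 0.0
    for k, seq in enumerate(sequences):
        t = 0.0                      # no idle times: jobs follow each other
        for j in seq:
            t += p[j][k]             # completion time C_j on machine k
            earliness = max(0.0, d[j] - t)
            tardiness = max(0.0, t - d[j])
            total += w[j] * (earliness + tardiness)
    return total

# Hypothetical example: 3 jobs, 2 machines
p = [[2, 3], [4, 1], [3, 3]]
w = [1.0, 2.0, 1.0]
d = [2, 5, 4]
print(weighted_earliness_tardiness([[0, 2], [1]], p, w, d))   # -> 9.0
```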

3. METAHEURISTICS

To solve the problem described in the previous section, we use well-known metaheuristic algorithms, which are described in the remainder of this section.

3.1. SIMULATED ANNEALING

The simulated annealing algorithm starts from an initial solution generated by a modified Longest Processing Time algorithm, called LPT-MM (Longest Processing Time – Multi-Machine). Its mechanism is presented in Fig. 1. Starting from a given solution, the algorithm chooses the next solution by a swap or an insert of two randomly chosen jobs. The new solution replaces the current solution with the following probability:

P = exp( (C_g − C_n) / ( T_0 (1 − δ·t / (t_max · 100%)) ) ),        (6)


Step 1 (initialization): Schedule the jobs j ∈ J according to the non-increasing order of w_j / p_j.
Step 2 (iterative scheduling): For each job i ∈ J do: find the first idle machine k; allocate job i to machine k after the last job.

Fig. 1. Mechanism of the LPT-MM algorithm
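For illustration only, a minimal Python sketch of the LPT-MM construction rule of Fig. 1 follows; taking p_j as the average processing time over machines and reading "first idle machine" as the least-loaded machine are assumptions of this sketch, not statements about the authors' code.

```python
# Sketch of LPT-MM: order jobs by non-increasing w_j / p_j and assign each
# job to the machine that becomes idle first (smallest current load).
def lpt_mm(p, w):
    n, m = len(p), len(p[0])
    order = sorted(range(n), key=lambda j: w[j] / (sum(p[j]) / m), reverse=True)
    load = [0.0] * m
    sequences = [[] for _ in range(m)]
    for j in order:
        k = min(range(m), key=lambda i: load[i])   # first idle machine
        sequences[k].append(j)
        load[k] += p[j][k]
    return sequences
```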

where C_g is the total global lowest cost, C_n is the total cost of the neighbor solution, T_0 is the initial temperature, t is the current time and t_max is the fixed time of calculations. The parameter δ is the temperature decrease factor expressed in [%]. On this basis, the parameter α for the linear cooling schedule is calculated as:

α = δ·T_0 / t_max.        (7)

The block diagram of the designed simulated annealing algorithm is presented in Fig. 3a. The algorithm running time is fixed.
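A minimal sketch of the acceptance rule (6) with the linear cooling implied by (7) is given below; it is an illustration in Python rather than the authors' implementation, and always accepting improving moves is an assumption of the sketch.

```python
import math
import random

# Sketch of the acceptance rule of Eq. (6); delta is the decrease factor in
# percent, t the elapsed time and t_max the fixed calculation time.
def accept(c_global_best, c_neighbour, T0, delta, t, t_max):
    temperature = T0 * (1.0 - delta * t / (t_max * 100.0))
    if c_neighbour <= c_global_best:
        return True                    # improving moves accepted (assumption)
    if temperature <= 0.0:
        return False
    prob = math.exp((c_global_best - c_neighbour) / temperature)
    return random.random() < prob
```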

3.2. TABU SEARCH

Now we propose a deterministic tabu search algorithm. It generates the complete neighborhood of the current solution, from which it then chooses the best one. Two neighbor types are proposed: swap and insert. To generate the initial solution, the LPT-MM algorithm (see Fig. 1) is used. The tabu search algorithm uses local search with a short-term memory, called the tabu list, that is organized as FIFO (First In First Out). The tabu list stores arcs (i, j, k, l), which means that job j immediately follows job i on machine k, with jobs i and j in positions l − 1 and l, respectively. When i = 0, job j is in the first position on machine k and l = 0. The block diagram of the proposed tabu search is shown in Fig. 3b.
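The FIFO tabu list of arcs can be sketched, for illustration only, as follows (Python, with the fixed capacity corresponding to the tabu list size Λ; the class name is ours):

```python
from collections import deque

# Sketch of a FIFO tabu list of arcs (i, j, k, l) as described above.
class TabuList:
    def __init__(self, size):
        self.arcs = deque(maxlen=size)   # the oldest arc drops out automatically

    def add(self, i, j, k, l):
        self.arcs.append((i, j, k, l))

    def is_tabu(self, i, j, k, l):
        return (i, j, k, l) in self.arcs
```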

3.3. GENETIC ALGORITHM

For more information on genetic algorithms (GAs), we refer to [5]. Since many different approaches to this algorithm have been presented in the literature, this section states the exact parameters of the implemented one. Each iteration simulates one generation of chromosomes, where we deal with a population of ancestors and a population of descendants. For n ancestors we obtain n(n − 1) descendants, whose set is generated by mating every possible pair in the ancestor set (chromosomes i, j: 1 ≤ i, j ≤ n, i ≠ j).


START
• The given set of parents.
LOOP
• Choose one parent randomly.
• Find the first job on the chosen parent which has not been placed on the descendant yet.
• If the machine assigned to that job is the same on both parents, then make this same selection on the job, too; else choose one of the machines of the parents randomly and assign it to the job.
• Place the job-machine pair on the first empty position of the descendant.
• If all genes of the descendant chromosome are set, STOP; else proceed to the next gene on the descendant and repeat LOOP.

Fig. 2. MCUOX crossover algorithm
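For illustration, a minimal Python sketch of the MCUOX construction of Fig. 2 is given below; representing a chromosome as a list of (job, machine) genes is an assumption of this sketch.

```python
import random

# Sketch of the MCUOX crossover of Fig. 2; a chromosome is a list of
# (job, machine) genes covering all jobs.
def mcuox(parent_a, parent_b):
    machine_a = dict(parent_a)          # job -> machine on each parent
    machine_b = dict(parent_b)
    child, placed = [], set()
    while len(child) < len(parent_a):
        donor = random.choice((parent_a, parent_b))
        job = next(j for j, _ in donor if j not in placed)
        if machine_a[job] == machine_b[job]:
            machine = machine_a[job]    # same machine on both parents: keep it
        else:
            machine = random.choice((machine_a[job], machine_b[job]))
        child.append((job, machine))    # first empty position of the descendant
        placed.add(job)
    return child
```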

As was shown experimentally in [6] for the problem examined here (simplified by setting the due dates of all jobs to 0), using the popular PMX crossover in the GA makes it impossible to find satisfying results. Therefore, we implemented the multi-component uniform order-based crossover operator (MCUOX) proposed in [7], where a single gene accommodates both an object and the associated selection for that object. Thus, in our case each gene corresponds to a job-machine pair. The construction of a descendant from two parent chromosomes is presented in Fig. 2. The crossover is executed with a fixed probability, thus sometimes the descendant chromosome is simply a copy of its first ancestor. For mutation, two mechanisms are used: the swap and the bit mutation. The first randomly chooses two positions on a chromosome and exchanges their contents. The second reassigns a random job to a randomly chosen machine in a chromosome. The probability of selecting a chromosome for crossover from a population of descendants is proportional to the value of its fitness function and for a given chromosome is defined as

P_g = F_g / Σ_{g=1}^{G} F_g,        (8)

where G is the size of the population and F_g = C·T_c(g)^{−1} is the fitness function of chromosome g, inversely proportional to the objective function, where C is a fixed constant.
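The fitness-proportional (roulette-wheel) selection of Eq. (8) can be sketched, for illustration, as follows; the default value of the constant C is arbitrary here.

```python
import random

# Sketch of Eq. (8): costs holds the objective value T_c(g) of each descendant
# chromosome; the returned index is drawn with probability F_g / sum(F).
def select_index(costs, C=1.0):
    fitness = [C / tc for tc in costs]      # F_g inversely proportional to T_c(g)
    return random.choices(range(len(costs)), weights=fitness, k=1)[0]
```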


The block diagram of the GA is presented in Fig. 3c.

3.4. ANT COLONY OPTIMIZATION

Ant algorithms are a new, promising approach to solving optimization problems, based on the behaviour of ant colonies. The first ant systems were developed to attack problems whose structure resembles the behaviour of ant colonies, such as the TSP [8], QAP [9] or VRP [10]. Later they were extended to job scheduling (JS) problems, but the literature on this topic is rather limited. Some metaheuristic methods applied to similar scheduling problems were analysed in [11]. In every generation, each of m ants constructs a solution. It iteratively chooses a random job and places it on a random machine at the first empty position. The probability that the ant makes a step, which means placing job j after job i on machine k, is defined as

P_{ijk} = (τ_{ijk})^α (η_{ijk})^β / Σ_{j∈J, k∈M} (τ_{ijk})^α (η_{ijk})^β,        (9)

where τ_{ijk} is the amount of pheromone on that choice. We also introduce the heuristic value η_{ijk}, as proposed in [13], which in this case is the cost calculated for the execution of job j on machine k. The exponents α and β are constants determining the relative influence of, respectively, the pheromone value and the heuristic on the decision of the ant. After all ants have finished their paths, some of the old pheromone is evaporated by multiplying it by a factor 0 < ρ < 1:

∀ i, j ∈ J, k ∈ M:  τ_{ijk} ← τ_{ijk} · ρ        (10)

This prevents the old pheromone from having too strong an influence on future decisions. Finally, the m_b best ants, where m_b ≤ m, add some pheromone to every step they have made on their tours. The amount of pheromone applied to a single step is equal to Q/T_c, where T_c is the objective function value of the solution found by the ant and Q is the objective function value of the solution found by LPT-MM (described in Fig. 1) multiplied by a constant value λ:

∀ i, j ∈ J, k ∈ M, (x_{ijk} = 1):  τ_{ijk} ← τ_{ijk} + Q / T_c.        (11)

To prevent some steps from being reduced to 0, and the probability of other ones from becoming too large, we define the minimum (τ_min) and maximum (τ_max) pheromone values for each step. The block diagram is presented in Fig. 3d.
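For illustration only, the step rule (9) and the pheromone update (10)–(11), including the clipping to [τ_min, τ_max], can be sketched as follows; storing τ and η in dictionaries keyed by the step (i, j, k) is an assumption of this sketch.

```python
import random

# Sketch of the ant's step rule, Eq. (9), and the pheromone update, Eqs. (10)-(11).
def choose_step(candidates, tau, eta, alpha=1.0, beta=0.5):
    weights = [(tau[c] ** alpha) * (eta[c] ** beta) for c in candidates]
    return random.choices(candidates, weights=weights, k=1)[0]

def update_pheromone(tau, best_ant_steps, Q, Tc, rho, tau_min, tau_max):
    for key in tau:                      # evaporation, Eq. (10)
        tau[key] *= rho
    for key in best_ant_steps:           # deposit Q/Tc on each used step, Eq. (11)
        tau[key] += Q / Tc
    for key in tau:                      # keep pheromone inside [tau_min, tau_max]
        tau[key] = min(tau_max, max(tau_min, tau[key]))
```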



Fig. 3. Block diagrams of the implemented metaheuristics: (a) simulated annealing, (b) tabu search, (c) genetic algorithm, (d) ant colony optimization

4. EXPERIMENTATION SYSTEM

According to the objectives of this paper, this section contains the conception of the analysis, which includes measuring the dependence of the metaheuristic algorithms' efficiency upon the input parameters. The figures presented in this section contain block diagrams compliant with the rules defined by systems theory. The input parameters used by the objects modelled in this section are presented in Table 1. Since no benchmark problems were found in the literature for the particular shape of the problem considered here, random generation of instances seemed to be the only way of preparing the large number required for the experiments. The detailed plan of experimentation is as follows:


i). Perform the calibration of the internal input parameters for all four metaheuristics. Simultaneously measure the sensitivity of the algorithms to the variation of these parameters. First, generate a sufficiently large set of random instances with given parameters (Fig. 4) and hand them to the algorithms. Then solve all instances repeatedly using different values of the internal input parameters.

ii). Inspect the efficiency variation of the algorithms for different instance sizes. Repeatedly generate large sets of instances with fixed upper bounds (wmax, pmax, dmax) and different numbers of jobs and machines (n, m). Then pass each set to the four metaheuristics and engage them with the same calculation times (tC, tI) and the input parameters received from the previous experiment (Fig. 5).

iii). Measure the sensitivity of the four metaheuristics to the variation of the upper bounds of the parameters describing the instances (wmax, pmax, dmax). For this, generate sets of instances with different values of these bounds and fixed numbers of jobs and machines (n, m). Then hand these sets to the four algorithms and engage them with the finetuned internal input parameters (Fig. 6).

Table 1. Input parameters used in figures

Symbol            | Description                                                    | Used by
T0                | initial temperature                                            | simulated annealing
δ                 | temperature decrease factor                                    | simulated annealing
Λ                 | tabu list size                                                 | tabu search
G                 | size of population                                             | genetic algorithm
Ms, Mb            | probability of swap / bit mutation                             | genetic algorithm
X                 | probability of crossover                                       | genetic algorithm
τmin, τmax        | minimal / maximal pheromone value                              | ant system
ρ                 | evaporate factor                                               | ant system
m                 | number of ants                                                 | ant system
β                 | exponent used in Eq. (9) (α = 1)                               | ant system
tC                | time of calculations                                           | all algorithms
tI                | the algorithm is interrupted after no improvement in this time | all algorithms
n, m              | number of jobs / machines                                      | instance generator
wmax, pmax, dmax  | upper bound of job weights / processing times / due dates      | instance generator

Fig. 4. Analysis of algorithms sensitivity to input parameters
Fig. 5. Analysis of algorithms sensitivity to instance size



Fig. 6. Analysis of algorithms sensitivity to instance shape

5. RESEARCH

All tests were run on a PC with an Intel® Celeron® 2.66 GHz CPU, 1024 MB RAM, the Microsoft® Windows® Server 2003 Enterprise Edition operating system and .NET Framework 2.0.

5.1. PARAMETERS CALIBRATION

An external simulated annealing algorithm has been designed to manage the finetuning process used in the application. The finetuned parameters of all four considered algorithms for example instances are presented in Table 2 (for tC = 10 s and tI = 5 s). The calculation time tC is the amount of time after which the algorithm is stopped unconditionally. The interruption time tI is the amount of time after which the algorithm is interrupted only if there is no improvement of the objective function of the best solution found. The symbols in the tables and in the following figures are the same as used in Table 1. The average efficiency evolution for varying selected execution parameters and some instance sizes is presented in Figs. 7–10, which are summarized in Table 3. The average objective function deterioration factor dTc = Tc* / Tc · 100%, presented on the z-axis of the figures, is calculated by dividing the best objective function value in the examined argument space, Tc*, by the current objective function value.


Table 2. Finetuning results for 10 s (5 s) time (SA: T0, δ; TS: Λ; GA: G, Ms, Mb, X; ACO: τmin, τmax, ρ, m, β)

n   m  | T0 δ[%] | Λ  | G  Ms   Mb   X    | τmin τmax ρ    m  β
150 15 | 13 85   | 3  | 28 0.15 0.14 1.00 | 100  1200 0.85 10 0.5
150 10 | 16 94   | 5  | 30 0.09 0.12 1.00 | 100  1000 0.80 10 0.5
150 5  | 17 94   | 8  | 32 0.14 0.08 1.00 | 150  1100 0.85 12 0.5
100 15 | 23 95   | 8  | 32 0.08 0.10 1.00 | 200  800  0.90 24 0.5
100 10 | 18 96   | 12 | 38 0.16 0.13 0.95 | 100  1000 0.81 41 0.5
100 5  | 23 96   | 14 | 36 0.12 0.14 1.00 | 150  1200 0.75 38 0.5
50  15 | 20 97   | 11 | 33 0.10 0.13 1.00 | 150  1200 0.90 37 0.5
50  10 | 21 95   | 12 | 34 0.09 0.13 1.00 | 100  1100 0.80 36 0.5
50  5  | 20 96   | 12 | 39 0.10 0.10 0.97 | 100  1000 0.78 48 0.5

Table 3. Finetuning progress

Alg. | n   | m  | tC [s] | tI [s] | Finetuned parameters                                                          | Constant param.                  | Fig.
TS   | 100 | 10 | 10     | 5      | Λ = {1, 2, 3, . . . , 20}                                                     | –                                | 7a
TS   | 50  | 5  | 10     | 5      | Λ = {1, 2, 3, . . . , 20}                                                     | –                                | 7b
SA   | 100 | 10 | 10     | 5      | T0 = {12, 14, 16, . . . , 28}, δ = {84%, 86%, 88%, . . . , 100%}              | –                                | 8a
SA   | 50  | 5  | 10     | 5      | T0 = {14, 16, 18, . . . , 28}, δ = {86%, 88%, 90%, . . . , 100%}              | –                                | 8b
GA   | 100 | 10 | 10     | 5      | G = {24, 26, 28, . . . , 42}, X = {0.92, 0.93, 0.94, . . . , 1.00}            | Ms = 0.1, Mb = 0.1               | 9a
GA   | 50  | 5  | 10     | 5      | G = {31, 33, 35, . . . , 43}, X = {0.93, 0.94, 0.95, . . . , 1}               | Ms = 0.1, Mb = 0.1               | 9b
GA   | 100 | 10 | 1      | 0.5    | Ms = {0.02, 0.04, 0.06, . . . , 0.20}, Mb = {0.02, 0.04, 0.06, . . . , 0.20}  | G = 30, X = 1                    | 9c
GA   | 50  | 5  | 1      | 0.5    | Ms = {0.02, 0.04, 0.06, . . . , 0.24}, Mb = {0.02, 0.04, 0.06, . . . , 0.2}   | G = 15, X = 0.9                  | 9d
AS   | 100 | 10 | 10     | 5      | ρ = {0.75, 0.77, 0.79, . . . , 0.95}, m = {25, 27, 29, . . . , 45}            | τmin = 100, τmax = 1000, β = 0.5 | 10a
AS   | 50  | 5  | 10     | 5      | ρ = {0.70, 0.72, 0.74, . . . , 0.90}, m = {30, 32, 34, . . . , 50}            | τmin = 100, τmax = 1000, β = 0.5 | 10b
AS   | 100 | 10 | 1      | 0.5    | τmin = {25, 50, 75, . . . , 200}, τmax = {850, 900, 950, . . . , 1200}        | ρ = 0.85, m = 18, β = 0.5        | 10c
AS   | 50  | 5  | 1      | 0.5    | τmin = {25, 50, 75, . . . , 200}, τmax = {850, 900, 950, . . . , 1200}        | ρ = 0.9, m = 15, β = 0.5         | 10d

On the x-axis and y-axis (x-axis only for the tabu search), different values of the finetuned parameters are presented. The average deterioration of the objective function in comparison with the simulated annealing is shown on the z-axis (y-axis for the tabu search).


Fig. 7. Finetuning of TS – average Tc deterioration [%] vs. tabu list size: (a) 100 jobs & 10 machines; (b) 50 jobs & 5 machines

Fig. 8. Finetuning of SA – average Tc deterioration [%] vs. initial temperature and temperature decrease [%]: (a) 100 jobs & 10 machines; (b) 50 jobs & 5 machines

Figures 7–10 prove the strong sensitivity of all implemented algorithms to the input parameters. Going through them, we can also reach the conclusion that approximating this dependence with a mathematical function is infeasible; an external calibration algorithm appears to be the only option. The following conclusions emerge:

• The best value of the temperature decrease factor (SA) equals around 95% (Fig. 8), but using too large a value (≥98%) leads to dramatic deterioration. The influence of the initial temperature is not as strong, but setting it precisely will improve the solution quality by a few percentage points.


Fig. 9. Finetuning of GA – average Tc deterioration [%]: (a) G and X, 100 jobs & 10 machines; (b) G and X, 50 jobs & 5 machines; (c) Ms and Mb, 100 jobs & 10 machines; (d) Ms and Mb, 50 jobs & 5 machines

• The influence of the tabu list size in TS upon the algorithm effectiveness sometimes resembles a normal distribution (Fig. 7a), with only one local optimum which is quite easy to locate, but it can also be monotonically decreasing (Fig. 7b).

• The crossover probability of the GA should be at least 0.95, but if there is no time for finetuning, setting its value to 1 should not cause a strong effectiveness deterioration. On the other hand, the population size G is a crucial aspect; following Fig. 9a we can see three local optima arranged in a line. The area of swap and bit mutation probabilities is strongly diversified; however, keeping 0.1 < Mb, Ms < 0.2 should guarantee that the objective function value will not deteriorate by more than 5%.


Fig. 10. Finetuning of AS – average Tc deterioration [%]: (a) ρ and m, 100 jobs & 10 machines; (b) ρ and m, 50 jobs & 5 machines; (c) τmin and τmax, 100 jobs & 10 machines; (d) τmin and τmax, 50 jobs & 5 machines

• The value of β in AS was in most cases fixed at 0.5. The minimal values of the objective function in the space of the evaporation factor ρ and the number of ants m in AS are often surrounded by maxima (Fig. 10a,b) and are therefore hard to find for the finetuning algorithm based on SA. However, setting them improperly worsens the quality by less than 2%. Most of the local optima are located for 0.82 < ρ < 0.88 (Fig. 10a). While τmin = 50 and τmax = 950 are the pheromone lower and upper bounds for m = 5 and n = 50 (Fig. 10d), these values indicate a local maximum for m = 10 and n = 100 (Fig. 10c).

5.2. INFLUENCE OF INSTANCE SIZE

After the finetuning phase is completed, we can proceed to measure the sensitivity of the four developed algorithms to the size of the input instances, with the calibrated internal parameters guaranteeing their highest possible effectiveness.


The size of an instance is described by two factors: the number of jobs n and the number of machines m. Most of the conducted research involved 100 repetitions of each algorithm with the same input characteristics to calculate the mean value of the objective function. Only for some larger instances, which needed ten minutes or more to finish a single optimization experiment, was that number reduced to 50. The average evolution of the objective function for different instance sizes and tC = 10 s, tI = 5 s is presented in Fig. 11. Decreasing the number of jobs is accompanied by the tabu search quality approaching that of simulated annealing; for n = 50 it even overtakes the SA for some time. Since in all conducted experiments the simulated annealing algorithm was predominant over its rivals for the whole time, we use its current objective function value as a reference point for comparing the three other algorithms. Hence the quality of TS, AS and GA will be expressed as the proportion of their objective function to the objective function of SA.

Fig. 11. Objective function evolution – Tc vs. time [s] for TS, GA, SA and AS: (a) 100 jobs & 10 machines; (b) 100 jobs & 5 machines; (c) 50 jobs & 10 machines; (d) 50 jobs & 5 machines



Fig. 12. Efficiency margin of the metaheuristics – F vs. number of jobs N and number of machines K: (a) SA-TS; (b) SA-GA; (c) SA-AS; (d) AS-GA

Successive research was conducted to compare the quality variation of the four algorithms with the changing size of an instance. The superiority of SA over the three other algorithms is presented in Fig. 12a–c, and the superiority of AS over GA in Fig. 12d, where F is the proportion mentioned above, m is the number of machines and n is the number of jobs. The figures show that the implemented algorithms are strongly sensitive to the instance size, but this sensitivity shows some level of linearity and therefore seems to be predictable. Both local search algorithms prove to be more suitable for the examined range of parameters; the quality of the evolutionary ones is comparable. The following conclusions appear:

• The proportion of tabu search to simulated annealing quality for 10 s of calculations is maintained at the level of approximately 1.3 for all instances except those with 90 jobs and more (Fig. 12a). This is caused by the rapid growth of calculations for the tabu search: for example, for n = 80 jobs the neighborhood size equals (3/2)n(n + 1), which is 9720; after adding 20 more jobs (25%) this size grows to 15150, which is 56% more.

• For the genetic algorithm, effectiveness comparable to the SA is remarkable only for m = 1. However, there is also an interesting increase of quality for m ≥ 2 (Fig. 12b). The low values of F for n = 10 are irrelevant, since 10 s of calculations is definitely enough to find the optimal or a near-optimal solution in this case. The quality of GA starts to rise for m ≥ 5, and this increase is stronger for a longer calculation time.

• The ant colony optimization seems to depend on both the number of jobs and the number of machines, too. It differs from the GA in the location of its weakest point, which is also smoother (Fig. 12c). Finding the explanation for why a strong deterioration of the evolutionary algorithms is observed for some interval of the number of machines could be the topic of a separate paper.

• The margin between the quality of AS and GA shows some continuity (Fig. 12d) with the growth of calculation time. The superiority of the ant system emerges for a decreasing number of machines and an increasing number of jobs. This emergence shows a high level of linearity.

5.3. INFLUENCE OF INSTANCE SHAPE

The following subsection contains results of the comparative study conducted to inspect the influence of the instance parameters' upper bounds on the quality of the four investigated algorithms. These upper bounds are: the maximal job weight wmax, the maximal job processing time on a machine pmax, and the maximal job due date dmax.

Table 4. Influence of instance shape

n   | m  | tC [s] | tI [s] | wmin | wmax    | dmin | dmax      | pmin | pmax      | Fig.
100 | 10 | 10     | 5      | 0    | 5 to 50 | 0    | 50        | 0    | 10        | 13a
50  | 5  | 10     | 5      | 0    | 5 to 50 | 0    | 50        | 0    | 10        | 13b
100 | 10 | 10     | 5      | 0    | 10      | 0    | 10 to 100 | 0    | 10        | 13c
50  | 5  | 10     | 5      | 0    | 10      | 0    | 10 to 100 | 0    | 10        | 13d
100 | 10 | 10     | 5      | 0    | 10      | 0    | 500       | 0    | 10 to 100 | 13e
50  | 5  | 10     | 5      | 0    | 10      | 0    | 500       | 0    | 10 to 100 | 13f


Fig. 13. Objective function dependence upon the instance shape – Tc for SA, TS, GA and AS: (a) wmax, 100 jobs & 10 machines; (b) wmax, 50 jobs & 5 machines; (c) dmax, 100 jobs & 10 machines; (d) dmax, 50 jobs & 5 machines; (e) pmax, 100 jobs & 10 machines; (f) pmax, 50 jobs & 5 machines
The lower bounds were set to default values wmin = 1, pmin = 1, dmin = 0. Variation of efficiency was monitored for different values of upper bounds. That issue could be also comprehended as the shape of instances. Figures illustrating such influence are 26


wmax and wmin are respectively the maximum and minimum values of the job weight, dmax and dmin the maximum and minimum job due date, and pmax and pmin the maximum and minimum job processing time on a selected machine; n and m are respectively the numbers of jobs and machines, and tC and tI the calculation and interruption times. Fig. 13 shows that changing the instance shape by modifying the maximal values of job weights, job due dates and job processing times on the machines does not strongly influence the effectiveness of the four investigated algorithms. The quality margin between the four algorithms remains constant in all experiments.

6. CONCLUSIONS

For the considered problem, four metaheuristic algorithms were implemented. Three of these, the simulated annealing (SA), the tabu search (TS) and the genetic algorithm (GA), have become very popular in recent times, as evidenced by the large body of literature covering wide perspectives of their implementation [4, 5, 7, 12, 14]. The fourth algorithm was the ant colony optimization (ACO) [8, 9, 10, 11]. Complex computational experiments have been conducted on randomly generated instances to compare the performance of the implemented metaheuristics. A comparison of the results obtained by the four developed metaheuristics with the optimal ones was impossible, so the only option was to compare them with each other. The experiments showed a strong sensitivity of the four algorithms to the values of the input parameters, which probably cannot be approximated due to its complicated distribution; therefore, tuning is essential each time one of the algorithms is engaged. The exception is the tabu list size in the tabu search, which can be foreseen as proposed in [14]. No dependence upon the range of parameters describing the jobs and the machines of a single problem instance was observed. A strong influence of the number of jobs and the number of machines on the effectiveness of the used algorithms was found; this influence showed a high level of linearity.

REFERENCES

[1] GAREY M.R., TARJAN R.E. and WILFONG G.T., One-processor scheduling with symmetric earliness and tardiness penalties. Mathematics of Operations Research, vol. 13, 1988, pp. 330–348.
[2] YANO C.A. and KIM Y., Algorithm for a class of single machine weighted tardiness and earliness problems. European Journal of Operational Research, vol. 52, 1991, pp. 167–178.



[3] SUN H. and WANG G., Parallel machine earliness and tardiness scheduling with proportional weights. Computers & Operations Research, vol. 30, 2003, pp. 801–808.
[4] CAO D., CHEN M. and WAN G., Parallel machine selection and job scheduling to minimize machine cost and job tardiness. Computers & Operations Research, vol. 32, 2005, pp. 1995–2012.
[5] DAVIS L., Handbook of genetic algorithms. Van Nostrand Reinhold: New York, NY, 1991.
[6] BAZYLUK M., KOSZALKA L. and BURNHAM K.J., Using heuristic algorithms for parallel machines job scheduling problem. Proceedings of the 6th PBW, 2006, pp. 9–29.
[7] SIVRIKAYA-SERIFOGLU F. and ULUSOY G., Parallel machine scheduling with earliness and tardiness penalties. Computers & Operations Research, vol. 26, 1999, pp. 773–787.
[8] DORIGO M. and GAMBARDELLA L.M., Ant colonies for the travelling salesman problem. BioSystems, vol. 43, 1997, pp. 73–81.
[9] MANIEZZO V. and COLORNI A., The ant system applied to the quadratic assignment problem. Knowledge and Data Engineering, vol. 11, 1999, pp. 769–778.
[10] BULLNHEIMER B., HARTL R.F. and STRAUSS C., An improved ant system algorithm for the vehicle routing problem. Annals of Operations Research, vol. 89, 1999, pp. 319–328.
[11] LIAO C. and JUAN H., An ant colony optimization for single-machine tardiness scheduling with sequence-dependent setups. Computers & Operations Research, vol. 34, 2007, pp. 1899–1909.
[12] RABADI G. and ANAGNOSTOPOULOS G.C., A heuristic algorithm for the just-in-time single machine scheduling problem with setups: a comparison with simulated annealing. International Journal of Advanced Manufacturing Technology, vol. 32, 2007, pp. 326–335.
[13] MIDDENDORF M., REISCHLE F. and SCHMECK H., Multi colony ant algorithms. Journal of Heuristics, vol. 8, 2002, pp. 305–320.
[14] BILGE U., KIRAC F., KURTULAN M. and PEKGUN P., A tabu search algorithm for parallel machine total tardiness problem. Computers & Operations Research, vol. 31, 2004, pp. 397–414.



Computer Systems Engineering 2008 Keywords: nesting, packing problem, cutting problem, metaheuristic algorithm, Tabu Search

Paweł BOGALIŃSKI* Iwona POŹNIAK-KOSZAŁKA* Leszek KOSZAŁKA* Piotr SKWORCOW†

THE TWO DIMENSIONAL IRREGULAR-SHAPED NESTING PROBLEM

In this paper, we analyse the nesting problem that is faced in manufacturing processes where cutting and packing are involved. It concerns a non-overlapping placement of two-dimensional irregular shape objects on a bounded area. We describe new algorithms for solving the nesting problem. The first algorithm is based on Tabu Search and the second algorithm, called the Shaking Algorithm, is proposed by the authors. Both algorithms were implemented and tested using the proposed experimentation system.

1. INTRODUCTION

The nesting problem concerns the placement of a number of irregularly shaped objects on a two-dimensional fixed area, such that the objects do not overlap. The nesting problem is also known as the strip nesting problem or the irregular strip nesting problem [1]. The nesting problem occurs in a variety of industries, particularly in manufacturing processes where cutting and packing operations are involved. Examples of such industries and processes include aircraft and ship construction, textile, clothing and footwear production, furniture manufacturing, etc. Often the cost of the material being cut is a significant component of the total production cost and thus saving material is an important matter. A solution of the nesting problem should therefore attempt to minimize the amount of wasted material, which is equivalent to maximizing the utilization of the material. An example of a 2D nesting problem, related to cutting parts of clothes from a roll of fabric, is illustrated in Figure 1.

* Department of Systems and Computer Networks, Wroclaw University of Technology, Poland.
† Water Software Systems, De Montfort University, Leicester, UK.


In this case, to minimise the production cost, it is necessary to maximize the utilization of the fabric or, equivalently, to minimize the waste of fabric, while cutting all required parts of clothes.

Fig. 1. A nesting problem example – tight packing parts of Shirts

This paper concerns a variant of the nesting problem for irregular shape objects and describes two different methods developed to solve the nesting problem. The paper is organized as follows: Section 2 contains the problem formulation. The solution to the problem and the developed algorithms are presented in Section 3. Section 4 contains a description of the experimentation system developed to test the performance of the algorithms for solving the nesting problem. Section 5 provides final remarks.

2. PROBLEM DESCRIPTION

The term nesting has been used to describe a wide variety of two-dimensional cutting and packing problems. All such problems involve a non-overlapping placement of a set of irregular two-dimensional shapes within a defined region of two-dimensional space. Most problems can be categorized as follows [2]:
• Decision problem. Decide whether a set of shapes fits within a given region.
• Knapsack problem. Given a set of shapes and a region, find a placement of a subset of shapes that maximizes the utilization (area covered) of the region.
• Bin packing problem. Given a set of shapes and a set of regions, minimize the number of regions needed to place all shapes.


• Strip packing problem. Given a set of shapes and a width W, minimize the length of a rectangular region with width W such that all shapes are contained in the region.

This paper concerns a strip packing (nesting) problem, i.e. the objective is to find a non-overlapping placement of shapes within the bounds of the material, such that the length L of required material is minimal. This is equivalent to the maximization of the utilization of the material. The utilization, denoted U, of the material for a given solution can be expressed as

U = S / (W · L),        (1)

where S denotes the sum of the areas of all shapes. The terms length and width are traditionally used in the nesting problem literature. In this paper the term height, denoted h, is used as an equivalent of length, thus the aim of the optimization is the minimization of the height of required material, see Figure 2.

Fig. 2. Strip nesting problem - problem statement

The aim of this work is to formulate algorithms for solving the nesting problem and to develop and implement an application for evaluating the performance of the implemented algorithms, leading to establishing the best approach for solving the nesting problem. The main criterion when evaluating the efficiency of the implemented algorithms is the height of material required to place a defined set of shapes.

3. SOLUTION TO THE PROBLEM

Solving the nesting problem progresses in three stages:


1. Quantization – this stage is the representation of the shapes as maps of pixels. Each shape is characterized by an array of bits.
2. Optimization – this stage is the main and most important step. During this stage an optimal or near-optimal solution is found. The nesting problem is NP-hard even for rectangular shapes (and material) with no rotation allowed [1, 2]. A complete overview is not practical, thus to find a good placement of shapes some alternative or heuristic methods should be used. In this paper two methods are presented: the Tabu Search algorithm and a method called "Shaking".
3. Printing – this stage is the process of conversion of the array-of-bits representation into graphical shapes.

3.1 QUANTIZATION

Quantization is necessary to process images representing real objects (parts of clothes, metal elements etc.). Each image is translated to a raster model, i.e. polygons are represented by matrices [2]. An example of a polygon and its raster model equivalent is illustrated in Figure 3.

Fig. 3. Polygon and its equivalent raster model. Source: Nielsen B.K., Odgaard A., Fast Neighbourhood Search for the Nesting Problem.[2]

In a raster model, each pixel of an image is represented by two coordinates; therefore each shape to be placed on a material is represented by a matrix of coordinates. The resolution of each raster model is fixed.

3.2 TABU SEARCH ALGORITHM

Tabu search is a mathematical optimization method. Tabu search enhances the performance of a local search method by using memory structures: once a potential solution has been determined, it is marked as "taboo" so that the algorithm does not 'visit' that possibility again [3].


For further information about Tabu search see [9-10]. The Tabu search algorithm has been applied to solve a wide range of optimisation problems, such as job-shop scheduling or the Travelling Salesman Problem (TSP). The application of the Tabu search method to the strip nesting problem is based on the idea of expressing the strip nesting problem as a TSP. Each city in the TSP is represented by a shape in the nesting problem, while each route from the starting city to the destination city (a sequence of cities) in the TSP is represented by a sequence of shapes on the material in the nesting problem. Using such a formulation, finding the best solution for the nesting problem involves finding an optimal sequence of shapes on the material. Each shape is identified by a unique number and each placement of shapes is represented by a sequence of numbers. The order of numbers in the sequence corresponds to their position on the material in the following manner: the first number in the sequence means that the shape identified by this number is placed in the right bottom corner of the material, the shape corresponding to the following number in the sequence is placed to the left of the first shape, and so on. For each sequence of shapes the algorithm searches the swap neighbourhood of the permutation, and when a minimal value is found (a local minimum of the height of the material used), the information about the move which resulted in this solution is placed in the taboo list. The moves already placed in the taboo list are not allowed, to avoid oscillations around a local optimum. The Tabu search algorithm starts searching from a random sequence describing a placement of shapes. Subsequently, the algorithm checks all solutions in the neighbourhood of the current sequence. From this neighbourhood the algorithm chooses the best solution, i.e. the one corresponding to the minimal height of the material used. This selected solution becomes the base solution, so at the next step the neighbourhood of this solution is checked, and so on.
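For illustration only, a minimal Python sketch of this permutation-based tabu search is given below; the placement routine height(seq), which returns the material height used by a given shape sequence, is assumed to exist, and the simple swap-move tabu criterion and parameter values are assumptions of the sketch, not the authors' implementation (which is in C#).

```python
import itertools
import random

# Sketch: tabu search over shape sequences with swap moves and a FIFO tabu list.
def tabu_search(shapes, height, iterations=100, tabu_size=20):
    current = random.sample(shapes, len(shapes))      # random starting sequence
    best, best_h = current[:], height(current)
    tabu = []
    for _ in range(iterations):
        candidates = []
        for i, j in itertools.combinations(range(len(current)), 2):
            if (i, j) in tabu:
                continue                               # this swap is taboo
            neighbour = current[:]
            neighbour[i], neighbour[j] = neighbour[j], neighbour[i]
            candidates.append((height(neighbour), (i, j), neighbour))
        if not candidates:
            break
        h, move, current = min(candidates, key=lambda c: c[0])
        tabu.append(move)
        tabu = tabu[-tabu_size:]                       # FIFO tabu list
        if h < best_h:
            best, best_h = current[:], h
    return best, best_h
```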

3.3 SHAKING ALGORITHM

The second algorithm, developed and implemented by the authors, is based on an analogy to our daily lives. Given a set of specific objects which we want to pack into a bag, we usually arrange them according to some scheme (algorithm), e.g. heavier objects on the bottom, lighter objects on top, with all objects packed tightly to maximize the utilization of space. We can also put everything in the bag randomly and, to make some free space, shake the bag energetically. The concept of shaking is applied to the nesting problem in the manner described below. For each shape, physical properties like mass and momentum are considered. An area is defined where the shapes are allowed to move, and a simulated gravitation field is applied. The shapes can change their position and oscillate in relation to their initial location.


Each shape has an assigned speed vector, which changes due to the simulated gravitation field and due to interactions between the shapes. When a shape collides with another shape, the law of conservation of momentum is applied. The algorithm runs until the shapes are settled. This method can be used in an effort to improve the solution obtained from the Tabu search algorithm described in the previous section, or it can be applied on its own, i.e. with a random initial location of shapes.
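A very simplified Python sketch of the shaking idea follows, for illustration only: it only integrates a gravitational pull towards the bottom of the strip and omits the shape-to-shape collision handling (conservation of momentum) described above; the data layout is an assumption.

```python
# Simplified illustration of "shaking": shapes fall under simulated gravity
# until the total movement becomes negligible (collisions between shapes omitted).
def shake(shapes, gravity=9.81, dt=0.01, steps=10000, eps=1e-3):
    # shapes: list of dicts with 'x', 'y', 'vx', 'vy' keys (positions, velocities)
    for _ in range(steps):
        movement = 0.0
        for s in shapes:
            s['vy'] -= gravity * dt              # simulated gravitation field
            s['x'] += s['vx'] * dt
            s['y'] += s['vy'] * dt
            if s['y'] < 0.0:                     # bottom edge of the strip
                s['y'], s['vy'] = 0.0, -0.5 * s['vy']   # damped bounce
            movement += abs(s['vx']) + abs(s['vy'])
        if movement < eps:
            break                                # the shapes have settled
    return shapes
```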

4. EXPERIMENTATION SYSTEM AND EXAMPLES

An experimentation system to test the developed methods has been developed in C# using Microsoft® Visual Studio 2005 Professional Edition programming environment. The developed experimentation system allows to change the main parameters of both developed algorithms. Results of all simulations are presented as a graphical representation of shapes placement and can be saved to a text file. The application interface enables to compare the results of Tabu search and “shaking” algorithms. The main window of the application interface is shown in Figure 4.

Fig. 4. The main window of the application and an example comparison of results obtained for Tabu search and for Tabu search with "shaking"


Fig. 5. Example shapes to be placed on a material (left) and results of Tabu search algorithm (right)

5. CONCLUSIONS AND PERSPECTIVES

The nesting problem is a complex computational problem which occurs in a variety of industries, particularly in manufacturing processes where cutting and packing operations are involved. A solution of the nesting problem attempts to place a set of shapes on a material such that the amount of wasted material is minimal. In this paper two methods for solving the nesting problem were proposed, namely the Tabu search algorithm and the "shaking" algorithm. An experimentation system to assess the performance of both algorithms has been developed and tested. In this paper only example results were presented, and further work on improving the experimentation system is ongoing. It is planned to further develop the algorithms, e.g. by allowing rotation of shapes, and to consider different quality measures.

REFERENCES

[1] NIELSEN B.K., An efficient solution method for relaxed variants of the nesting problem, The Australian Theory Symposium 2007, Ballarat, Australia.
[2] NIELSEN B.K., ODGAARD A., Fast Neighbourhood Search for the Nesting Problem, Technical Report no. 03/02, DIKU, University of Copenhagen, DK-2100 Copenhagen, Denmark, February 14, 2003.
[3] GENDREAU M., An Introduction to Tabu Search, http://www.ifi.uio.no/infheur/Bakgrunn/Intro_to_TS_Gendreau.htm, 15 January 2008.
[4] GLOVER F., Tabu Search — Part I, ORSA Journal on Computing, 1989, 1:3, pp. 190–206.
[5] GLOVER F., Tabu Search — Part II, ORSA Journal on Computing, 1990, 2:1, pp. 4–32.



Computer Systems Engineering 2008 Keywords: Computer simulation, algorithms, scheduling

Bartosz CZAJKA* Iwona POŹNIAK-KOSZAŁKA*

SCHEDULING IN MULTI-PROCESSOR COMPUTER SYSTEMS – VARIOUS SCENARIO SIMULATIONS

In the paper, we analyse some multi-processor scheduling problems. Since such problems are NP-hard, we provide approximation algorithms that are based on well-known metaheuristics. Furthermore, we provide a dedicated experimentation system that allows different solution algorithms to be evaluated.

1. INTRODUCTION

Schedulers are used in almost every scientific discipline, such as mathematics, biology and economics, and especially in computer science. Recently, computer companies have proposed excellent new hardware and software solutions for home or business use. Almost every new machine is based on more than one processor or core. It has become a standard that we use dual-core or quad-core computers; the question is how to operate on many CPUs or cores effectively. Multiprocessor scheduling [2] is a very complex problem [9]. Having many processors and many queues, we wish to answer the question of how to schedule tasks in an efficient way to make operating systems [5] more productive. Choosing the best algorithm for a given category of scheduling problem can be done on the basis of simulation. The first version of the experimentation system considered in this paper was presented in [1]. In this paper, we focus on the developed version of that system, called MESMS2 (Multilevel Experimentation for Simulation of Multiprocessor Scheduling). It is based on the concept of a virtual multiprocessor simulation. Using MESMS2, we can test algorithms on many virtual CPUs. Moreover, during simulation, the user can choose which queue is to be executed (the number of queues is also limited by the user). Furthermore, this paper gives information about architectures used in multiprocessor systems and reveals some tests based on MESMS2.

* Department of Systems and Computer Networks, Wroclaw University of Technology, Poland.


The paper gives some basic information about scheduling [3], [4]. Then, we present the MESMS2 system, in particular its capabilities and environment parameters. In the next part of the paper we show, step by step, how to make an image of a real testing environment. The last part concerns investigations made using the MESMS2 system on different simulation scenarios, including the presentation of some results, discussion and conclusions.

2. PROBLEM DESCRIPTION

The general concept of scheduling is presented in Fig. 1, where each CPU (C) decides which process (P) is picked to be executed. The considered problem can be formulated as follows:

P = 1, 2, . . . , N_PQ – processes in queue Q,
I = 1, 2, . . . , N_IP – instructions in process P,
Q = 1, 2, . . . , N_Q – queues,
A = 1, 2, . . . , N_A – algorithms,
C = 1, 2, . . . , N_CAQ – CPU C using algorithm A picking process P from queue Q.

Fig. 1. The concept idea of scheduling

During simulation, the system determines simulation variables (some of which can be regarded as indices of performance) which need to be optimized. First, we introduce the simulation variables:

T_Q – the total simulation time for each queue Q [ms],
t_P – the waiting time of process P in queue Q [ms],
t_IP – the time of instruction I in process P [ms],
t_ACn – the duration of the scheduling algorithm A for n iterations on CPU C [ms],
t_AQ – the duration of the algorithm A for each queue Q [ms].

Next, the optimized objectives are defined. The main objective is to choose the inner parameters such that the following are minimized:



F1 = T1 + T2 + ... + T_NQ - total simulation time,

F2Q = Σ_{i=1..P1} t_i + Σ_{i=1..P2} t_i + ... + Σ_{i=1..PN} t_i - processes waiting time for one queue,

F3 = Σ_{i=1..NQ} F2_i - total processes waiting time,

F4C = Σ_{i=1..N} t_A - total duration of the scheduling algorithms on CPU C,

F5 = Σ_{i=1..NC} F4_i - total algorithm durations.
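For illustration, these indices can be computed directly from recorded simulation data. The following minimal Python sketch (not part of MESMS2 itself, which is implemented in C#) assumes that per-queue simulation times, per-process waiting times and per-CPU algorithm durations are available as plain lists; all names and example values are illustrative only.

# Minimal sketch (not MESMS2 itself): computing the indices F1-F5 from
# recorded simulation data. All inputs are assumed to be plain lists.

def total_simulation_time(queue_times):
    """F1: sum of the total simulation times T_Q over all queues [ms]."""
    return sum(queue_times)

def queue_waiting_time(waits_per_process):
    """F2_Q: summed waiting times of all processes in one queue [ms]."""
    return sum(sum(w) for w in waits_per_process)

def total_waiting_time(waits_per_queue):
    """F3: total waiting time over all queues [ms]."""
    return sum(queue_waiting_time(q) for q in waits_per_queue)

def cpu_algorithm_duration(algorithm_times):
    """F4_C: summed scheduling-algorithm durations on one CPU [ms]."""
    return sum(algorithm_times)

def total_algorithm_duration(times_per_cpu):
    """F5: total algorithm duration over all CPUs [ms]."""
    return sum(cpu_algorithm_duration(c) for c in times_per_cpu)

if __name__ == "__main__":
    # two queues, each with per-process waiting-time samples [ms]
    waits = [[[5, 7], [3]], [[2, 2, 1]]]
    print(total_simulation_time([120, 95]))               # F1
    print(total_waiting_time(waits))                      # F3
    print(total_algorithm_duration([[0.4, 0.3], [0.5]]))  # F5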

3. SIMULATION - SCENARIOS Two typical computer architectures will be considered, each of them as a separate simulation scenario. SMP (Symmetric Multiprocessing) and MPP (Massively Parallel Processing) are the two most widely used architectures in computer systems; SMP is shown in Fig. 2.

Fig. 2. SMP – architecture

In this approach every CPU shares memory through a common bus, and only one operating system exists. This architecture also applies to multi-core processors, e.g. Dual Core.



Fig. 3. MPP - architecture

The second scenario is less complicated and its concept is shown in Fig. 3. Each CPU has its own memory, which is not shared. Moreover, an operating system is dedicated to a single processor. Using this approach it is very important that the data being processed are prepared for separate processors, e.g. in database searching each CPU searches for a different letter. 4. ALGORITHMS In this paper, we focus on four different groups of scheduling algorithms. The first is based on priority: the scheduler chooses a process by its priority (e.g. processing first a process that is more important for the system). The second depends on arrival time (e.g. First Come First Serve). The third prefers processes with certain parameters or a certain character of the information they carry (e.g. SJF, where processes are handled according to their sizes). The fourth group contains algorithms which do not depend on process information and structure (e.g. Round Robin). Detailed descriptions of these algorithms can be found in the literature, e.g. [1], [2] and [4]. The proposed system MESMS2 offers 10 scheduling algorithms, including Priority (P), First Come - First Served (FCFS), Shortest Job First (SJF) and Round-Robin (R-R). The other available algorithms are hybrids composed of elements belonging to those four groups. It is worth mentioning that, thanks to the program's modularity, it is very easy to add further algorithms to MESMS2 by implementing them appropriately [6].
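To make the four policy families concrete, the following minimal Python sketch (illustrative only; MESMS2 itself is written in C#) shows each family purely as a rule for picking the next process from a ready queue. The Process record and the example data are hypothetical and not taken from the experiments.

# Illustrative sketch of the four policy families (not the MESMS2 C# code).
# A "process" is a simple record; each policy only decides which process
# the CPU picks next from the ready queue.
from collections import deque
from dataclasses import dataclass

@dataclass
class Process:
    pid: int
    priority: int        # higher value = more important
    arrival: float       # arrival time
    remaining: float     # remaining service time (size)

def pick_priority(queue):          # Priority (P)
    return max(queue, key=lambda p: p.priority)

def pick_fcfs(queue):              # First Come - First Served (FCFS)
    return min(queue, key=lambda p: p.arrival)

def pick_sjf(queue):               # Shortest Job First (SJF)
    return min(queue, key=lambda p: p.remaining)

def pick_round_robin(queue):       # Round-Robin (R-R): rotate, ignore process data
    queue.rotate(-1)
    return queue[0]

if __name__ == "__main__":
    ready = deque([Process(1, 2, 0.0, 4.0),
                   Process(2, 9, 1.0, 1.5),
                   Process(3, 5, 2.0, 3.0)])
    print(pick_priority(ready).pid)     # 2 (highest priority)
    print(pick_fcfs(ready).pid)         # 1 (earliest arrival)
    print(pick_sjf(ready).pid)          # 2 (shortest remaining time)
    print(pick_round_robin(ready).pid)  # rotates the queue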



5. EXPERIMENTATION ENVIRONMENT 5.1. APPLICATION

The MESMS2 system was designed and implemented in the C# language [7]. Visual Studio 2005 [8] was chosen as the implementation environment. The application can be run on any computer with an operating system from the Windows family, but Windows 2003, XP Professional and Windows Vista are recommended. We also considered hardware configurations and concluded that multi-core processors would increase simulation efficiency and give more trustworthy results. MESMS2 is organised in six main modules: the experiment design module, CPU generator, queue generator, synchronisation module, results collector and presentation module. At the beginning of a simulation the user sets the inner parameters (every created queue or CPU can have its own specified parameters). CPU speed is given on a percentage scale (0%-200%). Moreover, one needs to define the queue which is the initial working object for a processor. The "Algorithm" setting decides the way in which a process is picked from a queue. Considering queues, we are able to control their capacity by determining the number of processes and their parameters, which are used for generating processes in the Queue Generator module. The obtained results are converted and plots are generated to present the behaviour of the processes. 5.2. EXPERIMENTS

We compare the two presented scenarios (SMP, MPP) by creating virtual experiments in MESMS2. The environments were created by a special application module. Experiment 1 is based on the SMP solution and experiment 2 on MPP; 135 processes were created and their parameters [1] were randomly generated. The simulation inner parameters are as follows: • number of processes: 120, • priority range: 1-10, • instructions range: 2-5, • instruction duration range: 1 s - 2 s, • algorithms: SJF, Round-Robin, Priority, FCFS, • homogeneous processors: 4. Fig. 4 presents the inner parameters; the x axis shows each process and the y axis its size divided into instructions.


Fig. 4. Processes size - Inner Parameters

It can be seen that the processes are of regular size, around 4 s, and that we have only one queue with 130 tasks. Fig. 5 presents the priority assigned to each process.

Fig. 5. Processes priorities - Inner Parameters

Next, we compare the two scenarios: in the first (SMP) only one queue exists, with 120 processes; in the second (MPP) we split those processes into 4 different queues. Each queue had the same number of processes. The simulation time graph is shown in Fig. 6 and presents the simulation events step by step, and Fig. 7 shows the compared scenarios.

Fig.6. SMP - Simulation time



Fig.7. MPP - Simulation time graph

Comparison of the simulation time plots gives a view of the processes during the experiments. In MPP mode the course is more regular than in SMP; furthermore, the plots are sizeable and it is possible to enlarge selected areas. The next parameter compared is the total waiting time, presented in Fig. 8 and Fig. 9. It can be seen that in MPP mode the sequence of processes has less influence on the results than in SMP mode.

Fig.8. SMP - Total waiting time

Fig.9. MPP - Total waiting time



5.3. TASK SCHEDULE – CPU EXPLOIT

There is one more important parameter, the CPU exploit (utilisation), which depends on the processor speed and on the algorithm. When we consider homogeneous processors, the individual exploit values should be comparable. Fig. 10 presents the results for the SMP scenario: CPU 1 performed 25.11% of all the tasks, and the remaining CPUs are very similar, with all values around 25%.

Fig.10. SMP - Tasks schedule

The second scenario, MPP, reveals that the total execution time is longer, equal to 157546 ms, and that the exploit is about 25% per CPU. These values tell us about system stability and functionality: when one processor finishes all the tasks from its dedicated queue, it changes context and picks another task from a different queue.

Fig. 11. MPP - Tasks schedule


6. CONCLUSIONS AND FUTURE WORK We implemented an application to test the properties of algorithms for a class of scheduling problems. The proposed system MESMS2 lets us build a customised environment in which many queues and CPUs can be created and all parameters can be fixed. In our future work, we will add new functions to the system and develop the generation module of MESMS2.

REFERENCES
[1] CZAJKA B., POŹNIAK-KOSZAŁKA I., Comparison of scheduling algorithms in multiprocessor systems using multilevel experimentation system. Proceedings of the 16th International Conference on System Science, Vol. II, Wroclaw, 2007.
[2] KANN V., Multiprocessor Scheduling. http://www.ensta.fr/~diam/ro/online/viggo.html, 2008.
[3] AAS J., Understanding the Linux 2.6.8.1 CPU Scheduler. Silicon Graphics Inc. (SGI), (e-book), 2005, pp. 10-13.
[4] SILBERSCHATZ A., GALVIN P.B., GAGNE G., Foundation of Operating Systems. WNT, Warsaw, 2006 (in Polish).
[5] TANENBAUM A.S., WOODHULL A.S., Operating Systems Design and Implementation, Third Ed., Prentice Hall, 2006.
[6] HEJLSBERG A., WILTAMUTH S., GOLDE P., The C# Programming Language. Addison Wesley Professional, 2004.
[7] FERGUSON J., PATTERSON B., BOUTQUIN P., C# Bible (Paperback), John Wiley & Sons, 2002.
[8] Visual Studio 2005 SDK. Development Environment Model, Microsoft. Retrieved on 01.05.2008.
[9] BRUCKER P., KNUST S., Complexity results for scheduling problems. http://www.mathematik.uni-osnabrueck.de/research/OR/class/, 2008.



Computer Systems Engineering 2008 Keywords: control, dehumidification, HVAC, modelling, optimisation, simulation

Thomas DANNE† Jens G. LINDEN† Dean HILL‡ Keith J. BURNHAM†

MODELLING APPROACHES FOR A HEATING, VENTILATION AND AIR CONDITIONING SYSTEM

In this paper an approach for modelling a heating, ventilation and air conditioning (HVAC) plant is developed. The model is the first step towards model-based control optimisation. As the plant consists of different parts, the model reflects this: each component of the plant is modelled separately within a black, grey or white box approach. The components modelled are: mixing section (non-dynamic, white box), dehumidification unit consisting of a silica-gel desiccant wheel (discrete, black box), cooling coil unit (continuous, grey box) and the dynamics of a clean room which is supplied by the HVAC plant (continuous, grey box). In this paper, the models for the mixing section, dehumidification unit and the clean room are derived.

1. INTRODUCTION Heating, Ventilation and Air Conditioning (HVAC) plants are widely used to condition the air in clean air production areas. Although these systems are commonly used, most HVAC plant controllers operate in a detuned manner, which often leads to inefficient operation. The consequences of poor HVAC control are rarely catastrophic [7], which can give an impression of satisfactory plant operation even though unnecessarily large actuator excursions lead to excessive wear and tear and increased energy consumption. Abbott Diabetes Care (ADC), an industrial collaborator of the Control Theory and Applications Centre (CTAC) at Coventry University, uses over 70 separate HVAC plants to condition the air in clean production areas. The specifications for the room temper-

† Control Theory and Applications Centre, Coventry University, Coventry, UK. ‡ Abbott Diabetes Care, Witney, Oxfordshire, UK.



ature and humidity of the air are tight and must not be violated, as the quality of the product produced in the areas is highly dependent on these specifications. Consequently, considerable effort has been made to install reliable HVAC systems. The HVAC is an important part of the operation of the overall process and the annual energy costs for running this part of the plant alone currently amount to some £2m. However, rather surprisingly, even though the clean rooms have different capacities, a common control strategy is used across all plants (namely PID control with similar gains), which indicates potential for reduction of energy costs. In recognition of this, and based on the assumption that there could be potential for improvement in performance, there is an attendant need to analyse the system from a control point of view and to search for an optimised setup. For this purpose, one such plant is chosen and modelled in order to check the optimisation potential. After describing the details of the plant, the different components are itemised, modelled and validated. The various sections are briefly mentioned here and more detailed descriptions are given in the following sections. A static white box model is established for the mixing section on the basis of energy and mass conservation laws [2]. The most complex component of the plant is the dehumidification unit, consisting of a silica-gel desiccant wheel. Such thermodynamic devices can be modelled using a white box technique, as shown by e.g. [3] and [6]. Also, an artificial neural network could be used to replicate the unit's behaviour [3]. In this investigation, an alternative approach is chosen and the unit is modelled by a linear ARX (Auto-Regressive with eXogenous inputs) model for the humidity behaviour and a bilinear ARMAX (ARX with a Moving Average noise term) representation for modelling the temperature behaviour. The room is modelled within a grey box approach. First, white box models are derived from the conservation of energy and mass laws for temperature and humidity. The model parameters are then estimated by minimising a specified cost function and introducing a bilinear term into the temperature model. Details on the bilinear approach may be found in [5].

2. PLANT DETAILS The HVAC plant considered supplies a room of volumetric dimension 53 m² × 2.75 m with an air flow rate of 1.389 m³/s. A schematic of the plant is given in Figure 1. With reference to Figure 1, the return air is extracted from the room (a) and mixed with fresh air (f). The fresh air is pre-treated by a central fresh air plant. The amount of fresh air added to the return air is regulated by a damper at a fixed level of 12%. The mixed air (b) enters the pre-cooling coil, which is deactivated during normal operation of the plant. It


Fig. 1. Schematic of HVAC plant. Key: a - return air, b - mixed air, c - pre-dehumidification air, d - post-dehumidification air, e - conditioned air, f - fresh air, g - outside air.

is therefore not a subject of this investigation. To remove moisture, the air is passed through the dehumidification unit. The air (post-dehumidification air) leaving the dehumidification unit (d) is dry and warm. This air is required to be cooled by the cooling coil unit (CCU), which is integrated within the air handling unit (AHU). The AHU also contains a heating coil unit, which is only used for cold start-ups, as well as a driving fan; both are not considered in the modelling exercise here. The conditioned air (e) is then progressed to the room, with a portion being drawn off to an air lock. The air supplying the air lock is considered as the exhaust from the plant. The plant is adjusted such that the amount of air progressed to the air lock matches the amount of air delivered by the fresh air plant. The actuators that affect the control of the plant comprise the gas valve at the dehumidification unit and flow valves for both the cool and hot water for the cooling and heating coil units, respectively. The three valves are controlled by three separate PID controllers. The feedback signals for temperature and humidity are measured in the return air (a) duct. The temperature controller includes an interlocking structure such that only one of the two coils is active (either heating or cooling). Measuring the current temperature and humidity in the return air is necessary so that disturbances in the room (heat and humidity loads) are taken into account. This also leads to an unintentional but unavoidable time delay in the closed loop control system. For a consistent nomenclature, the labels used in Figure 1 are also used as subscripts for properties, such as mass flow rate ṁ, volumetric flow rate V̇, temperature ϑ, specific humidity ω, etc. For instance, the quantity ϑa denotes the temperature at the point a.


2.1. ENERGY CONSUMPTION

Since the purpose of the model is primarily controller tuning optimisation, it is necessary to consider how energy is dissipated by the plant. In the case of the dehumidification unit, this is rather straightforward, as the amount of gas combusted is assumed to be linearly related to the gas valve position. Since more cross-coupling is involved in the process of cooling the air, it is considerably more complicated to estimate the energy consumption of the CCU. In fact, the cooling coil is a multiple-input multiple-output (MIMO) system. The three inputs are the temperature of the incoming air, the temperature of the incoming cool water and its mass flow rate. Both the output water and output air temperatures depend on these inputs. The energy consumption can be estimated by taking into account the incoming and outgoing water temperatures and the mass flow rate.

2.2. TYPICAL OPERATING RANGE

The set points of the controllers are a dry bulb temperature* of ϑa = 21.5 °C and a dew point temperature of ϑDP,a = −14 °C, which corresponds to a specific humidity of ωa = 1.1 × 10⁻³ kg/kg(dry air)†. The return air (a) temperature and humidity will typically be in this region. The fresh air (f) is either pre-cooled or heated to 10 °C by an external fresh air plant. Assuming that the outside air is warmer than 10 °C, the cooling process might also dehumidify the air. However, the humidity of the air could, in general, be at any possible value. The fresh air is now required to be mixed with the return air. The conditions of the mixed air are typically ϑb = 20 °C and ωb = 2 × 10⁻³ kg/kg(dry air). As the pre-cooling coil is not in use, the temperature and humidity values of (b) also apply for the pre-dehumidification air (c). The dehumidification unit dries the air to typically ωd = 0.5 × 10⁻³ kg/kg(dry air). At the same time, this leads to a temperature rise, denoted ∆T, of about ∆T = 10 K. To cool the air, the CCU is used. A typical temperature of the conditioned air is ϑe = 16 °C. The humidity is not affected by the cooling coil. The temperature of the cool water entering the cooling coil is approximately 6 °C, which is significantly higher than the dew point temperature of the air (ωd = 0.5 × 10⁻³ kg/kg(dry air) corresponds to ϑDP,d ≈ −22.5 °C). The heat exchange is hence sensible only, which means that no condensation occurs during the cooling process.
* The dry bulb temperature is the temperature measured with a thermometer whose bulb is dry. It is hence the ordinary air temperature.
† Although the unit 'kg/kg(dry air)' is mathematically 'odd', it is commonly used in thermodynamics to clarify that the specific humidity ω is related to the mass of the dry air rather than the mass of the wet air.



Fig. 2. Overall plant model structure.

3. MODELLING The developed model of the HVAC system consists of four sub-models representing different components. Ideal sensors are assumed, i.e. infinite bandwidth, which implies that the dynamics of all sensors can be neglected. The sub-systems considered are: mixing section model, dehumidification unit model, cooling coil model and room model; each is modelled as a MIMO system. The combination of the four sub-models leads to the overall model, which is again a MIMO system. The overall structure of the model is shown in Figure 2, in which cc and cg are the signals controlling the cool water valve and gas valve respectively, Q̇c represents the cooling load at the CCU, whereas Q̇r and Ω̇r represent the heat and moisture load of the room. The models are derived using appropriate tools, using different approaches and different methods for estimation of the model parameters. Table 1 provides an overview of the details of the four models.

Table 1. Details of all sub-models.
Component - Approach - Model Type - Parameter Estimation Method
Mixing section - White box - Static - From physical properties
Dehumidification unit - Black box - Discrete - Recursive Least Squares + Extended Recursive Least Squares
Cooling coil unit - Grey box - Continuous - From physical properties + cost function minimisation
Room - Grey box - Continuous - From physical properties + cost function minimisation



Fig. 3. Mixing Section.

3.1. MIXING AREA MODEL

A schematic of the mixing section is shown in Figure 3. In order to derive a white box model the following assumptions are made: ideal gas behaviour, perfect mixing of the air, constant barometric pressure process, negligible thermal and moisture storage by components, perfect insulation (hence an adiabatic process) and negligible infiltration and exfiltration effects. Furthermore, for modelling purposes, friction and the transient behaviour of the mixing section are neglected, i.e. a constant cross-sectional speed of the air through the duct is assumed. The mixed air flow speed at (b) is about 5.7 m/s. The mixing section is 0.5 m × 0.5 m and has a volume of approximately 0.125 m³; hence it takes approximately 44 ms for the air to progress through this section. Therefore, it is assumed that the flow dynamics due to the volume of the section are negligible. Equations derived from mass and energy conservation laws for adiabatic mixing of air streams are given e.g. by [2]. Applying these to the given case leads to:

ωb = [ωa V̇a ρ(ϑa, ωa) + ωf V̇f ρ(ϑf, ωf)] / [V̇a ρ(ϑa, ωa) + V̇f ρ(ϑf, ωf)] + c1    (1)

hb = [ha(ϑa, ωa) V̇a ρ(ϑa, ωa) + hf(ϑf, ωf) V̇f ρ(ϑf, ωf)] / [V̇a ρ(ϑa, ωa) + V̇f ρ(ϑf, ωf)]    (2)

where, noting the subscripts a, b and f, the dry-bulb temperatures and specific humidities are denoted by ϑ and ω, respectively. The quantity V̇ represents the different volumetric flow rates (which are fixed). The compensating offset term c1 has been added after testing the model. The specific gravity, denoted ρ, is dependent on temperature and the specific humidity. Using standard thermodynamic equations [2] an expression for ρ can be obtained as:


Fig. 4. Mixing section model: simulation error histogram for temperature and humidity.

ρ = [p − pω/(0.622 + ω)] / [Ra (ϑ + 273.15)]    (3)

where Ra denotes the gas constant of air and p the barometric pressure. Note that (3) is only applicable in SI units, with ϑ expressed in °C. The enthalpy of wet air is dependent on both the temperature and the humidity of the air. Using standard thermodynamic equations [2], one can derive an expression for the enthalpy, denoted h(ϑ, ω):

h(ϑ, ω) = ϑ Cp,a + ω (hv(0 °C) + ϑ Cp,v)    (4)

which may be rearranged to give the temperature for a given enthalpy and humidity denoted ϑ(h, ω):

ϑ(h, ω) = [h − ω hv(0 °C)] / [Cp,a + ω Cp,v]    (5)

where Cp,a and Cp,v are the specific heat capacities of air and water vapour, respectively, and hv(0 °C) is the enthalpy of water vapour at 0 °C. Equations (1)-(5) define the non-dynamic model for the mixing section. The above static equations have been implemented in the SIMULINK environment. In order to validate the model it is tested with data obtained from one full day of operation of the HVAC plant. The fixed parameters are given by: Ra = 287 J/(kg K), Cp,a = 1005 J/(kg K), Cp,v = 1830 J/(kg K), p = 101.325 × 10³ Pa, V̇f = 0.173 m³/s, V̇a = 1.216 m³/s, hv(0 °C) = 2501.3 × 10³ J/kg.
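As an illustration, equations (1)-(5) can be transcribed directly into a few lines of code. The following Python sketch (the plant model itself was implemented in SIMULINK) uses the fixed parameters listed above; the offset term c1 is set to zero here and the example operating point is arbitrary.

# Minimal Python transcription of the static mixing-section model (1)-(5),
# using the fixed parameters listed above; the offset c1 is set to zero here.
R_A = 287.0                    # J/(kg K), gas constant of air
P_BAR = 101.325e3              # Pa, barometric pressure
CP_A, CP_V = 1005.0, 1830.0    # J/(kg K)
HV_0 = 2501.3e3                # J/kg, enthalpy of water vapour at 0 degC
VDOT_F, VDOT_A = 0.173, 1.216  # m^3/s

def rho(theta, omega):
    """Equation (3): density of the moist air [kg/m^3], theta in degC."""
    p_v = P_BAR * omega / (0.622 + omega)
    return (P_BAR - p_v) / (R_A * (theta + 273.15))

def enthalpy(theta, omega):
    """Equation (4): specific enthalpy of wet air [J/kg]."""
    return theta * CP_A + omega * (HV_0 + theta * CP_V)

def temperature(h, omega):
    """Equation (5): temperature for a given enthalpy and humidity [degC]."""
    return (h - omega * HV_0) / (CP_A + omega * CP_V)

def mix(theta_a, omega_a, theta_f, omega_f, c1=0.0):
    """Equations (1)-(2): humidity and temperature of the mixed air (b)."""
    m_a = VDOT_A * rho(theta_a, omega_a)   # return-air mass flow [kg/s]
    m_f = VDOT_F * rho(theta_f, omega_f)   # fresh-air mass flow [kg/s]
    omega_b = (omega_a * m_a + omega_f * m_f) / (m_a + m_f) + c1
    h_b = (enthalpy(theta_a, omega_a) * m_a + enthalpy(theta_f, omega_f) * m_f) / (m_a + m_f)
    return temperature(h_b, omega_b), omega_b

print(mix(21.5, 1.1e-3, 10.0, 4.0e-3))  # illustrative operating point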


Fig. 5. Schematic of dehumidification unit.

The model is simulated with a sampling interval of 5 s. The histograms of the simulation error for both models are shown in Figure 4. These results show that there are only minor simulation errors. The simulation mean squared error (MSE) is 5.79 × 10⁻³ K² for the temperature and 10.8 × 10⁻⁹ (kg/kg(dry air))² for the humidity.

3.2. DEHUMIDIFICATION UNIT MODEL

The main component of the dehumidification unit is an adsorbent-carrying wheel. It exhibits an aluminium honeycomb structure. The large aluminium surface is coated with silica-gel. The silica-gel itself has a large internal surface and is, therefore, able to adsorb large amounts of water. As the removal of moisture is an exothermal process, adsorption heat occurs [1]. Consequently, to remove the water from the silica-gel, heat is required to be applied to reverse the process. To realise this repeated process, a design as shown in Figure 5 is used. The wheel rotates at a constant angular velocity of about 0.2 rpm. The process air is blown through the main part (about 3/4) of the wheel, during which time the air is dried. Furthermore, the temperature of the air rises due to adsorption heat as well as radiated heat from the gas burner. The latter is mounted in the same housing and is not perfectly insulated. As the wheel rotates, the wet silica-gel enters the reactivation area, which covers about 1/4 of the wheel. To dry the silica-gel, outside air is heated with a gas burner and passed through the wheel. The air used for drying the silica-gel is then discharged as exhaust. The valve which regulates the gas flow to the burner is the actuator of this sub-system. The process is continually repeated until the silica-gel loses its humidity absorb-


ing properties. This typically happens after of the order of six years of constant operation, and is noticeable as a significant reduction in efficiency. An attempt has been made to model this complex process using the laws of physics and with a neural network [3, 6]. However, a more satisfying result has been obtained in this investigation by making use of a multiple-input single-output (MISO) linear ARX model for the dried air humidity and a MISO bilinear ARMAX model for the dried air temperature behaviour. In order to find suitable structures for these models, different (bi)linear ARX/ARMAX models have been investigated. Hence, the order of the system, the number of bilinear terms and the number of noise terms are systematically varied and the model performance assessed using the simulation MSE criterion. The two models presented in this paper give sufficient performance whilst remaining parsimonious, i.e. having limited complexity. The models are implemented and simulated in MATLAB, with a sampling interval of 5 s. The model parameters are estimated using the Recursive Least Squares (RLS) algorithm for the ARX system, and the Extended Recursive Least Squares (ERLS) algorithm for the ARMAX system. The RLS algorithm [4] is given as:

θ̂k = θ̂k−1 + Lk (yk − ϕkᵀ θ̂k−1)    (6)

Lk = Pk−1 ϕk / (1 + ϕkᵀ Pk−1 ϕk)    (7)

Pk = Pk−1 − (Pk−1 ϕk ϕkᵀ Pk−1) / (1 + ϕkᵀ Pk−1 ϕk)    (8)

where θ̂ k denotes the estimated parameter vector, yk is the measured output and ϕk is the observation vector at the sampling instance k. The inputs for both systems can be combined into a single input vector, denoted uk and given by:

uk = [cg,k  ϑg,k  ωg,k  ϑb,k  ωb,k]ᵀ    (9)
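For illustration, the RLS update (6)-(8) can be sketched in a few lines. The following Python/NumPy example (the original identification was carried out in MATLAB) applies the update to a generic second-order ARX regressor with synthetic data; it is not the plant data set, and the model orders and signals are placeholders only.

# Minimal NumPy sketch of the RLS update (6)-(8) for an ARX regressor.
# The data below are synthetic; in the paper the regressor is built from
# the dehumidification-unit signals as in (10)-(12).
import numpy as np

def rls_update(theta, P, phi, y):
    """One RLS step: returns the updated parameter vector and covariance."""
    phi = phi.reshape(-1, 1)
    L = P @ phi / (1.0 + phi.T @ P @ phi)                     # gain, eq. (7)
    theta = theta + (L * (y - phi.T @ theta)).ravel()         # eq. (6)
    P = P - (P @ phi @ phi.T @ P) / (1.0 + phi.T @ P @ phi)   # eq. (8)
    return theta, P

rng = np.random.default_rng(0)
true_theta = np.array([0.6, -0.2, 0.8])   # y_k = 0.6*y_{k-1} - 0.2*y_{k-2} + 0.8*u_k
y = np.zeros(300)
u = rng.normal(size=300)
theta = np.zeros(3)
P = 1e3 * np.eye(3)
for k in range(2, 300):
    y[k] = true_theta @ np.array([y[k-1], y[k-2], u[k]]) + 0.01 * rng.normal()
    phi = np.array([y[k-1], y[k-2], u[k]])
    theta, P = rls_update(theta, P, phi, y[k])
print(theta)   # should approach [0.6, -0.2, 0.8]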

For modelling the humidity behaviour, a second order, 5 input, 1 output ARX model



Fig. 6. Simulation of dehumidification unit (upper plot: temperature ϑd in °C; lower plot: humidity ωd in 10⁻³ kg/kg; measurement vs. simulation over time in h).

structure is given by:

θω = [a1 a2 b1 b2 b3 b4 b5]ᵀ    (10)

ϕω,k = [−ωd,k−1  −ωd,k−2  ukᵀ]ᵀ    (11)

ωd,k = ϕω,kᵀ θω + ek    (12)

which is found to have satisfactory simulation performance in terms of MSE. The parameters are estimated with the standard RLS algorithm (6)-(8). The simulation MSE for the data set used for parameter estimation is 1.93 × 10⁻⁸ (kg/kg)². For an unseen test data set, the performance index is 3.59 × 10⁻⁸ (kg/kg)². The measured and simulated output of the test data set is shown in the lower plot of Figure 6. The estimated parameters are given in Table 2. To model the temperature behaviour of the dehumidification unit, a more complex structure is required. A second order, 5 input ARMAX model with 1 bilinear term and a noise model order of 1 is used. The bilinear term is used to accommodate the product

Table 2. Estimated parameters of humidity model.
a1 = −0.439, a2 = −0.412, b1 = −1.53 × 10⁻⁴, b2 = −6.66 × 10⁻⁶, b3 = 7.60 × 10⁻³, b4 = −4.04 × 10⁻⁷, b5 = 0.133


Table 3. Estimated parameters of temperature model.
a1 = −1.96, a2 = 0.963, b1 = 9.66 × 10⁻⁴, b2 = −3.26 × 10⁻⁴, b3 = 1.07, b4 = 1.76 × 10⁻³, b5 = −2.38, n1 = −5.11 × 10⁻⁵, c1 = 0.883

of the gas valve position cg, normalised to a scale from 0 to 1, and the output temperature ϑd. Hence, the structure is given by:

θϑ = [a1 a2 b1 b2 b3 b4 b5 n1 c1]ᵀ    (13)

ϕϑ,k = [−ϑd,k−1  −ϑd,k−2  uk−1ᵀ  (ϑd,k−1 × cg,k−1)  êk−1]ᵀ    (14)

ϑd,k = ϕϑ,kᵀ θϑ + ek    (15)

Due to the estimated prediction error êk−1 in the observation vector ϕϑ,k, the observation vector is dependent on θ. The prediction error ek is required to be estimated at every time step using the latest estimated parameter vector θ̂k−1 in order to obtain unbiased results. This is termed ERLS, see [4]. The estimated values of the parameters obtained are given in Table 3. The simulation MSE obtained with this method is 0.464 K² for the estimation data set and 0.288 K² for an unseen data set. It is surprising that the performance with an unseen data set in this particular instance is even better than for the data set used for estimation, but this is a single observation and further work would need to be carried out in order to make any conclusive statement on this. The simulated output is compared with the measured output in the upper plot of Figure 6.

Fig. 7. Schematic of room model.


3.3. ROOM MODEL

The room is also modelled using two separate yet coupled models for return temperature and humidity. Here, both temperature and humidity are modelled using a grey box approach. The bases for both models are white box models which make use of the mass and energy conservation laws. The assumptions and conditions for both models are: ideal gas behaviour, perfect mixing, constant barometric pressure process and negligible infiltration and exfiltration effects. Furthermore, the influence of humidity on the enthalpy of the air is neglected. The moisture flow rate is denoted Ω̇ (kg/s) and Q̇ denotes enthalpy flow (J/s, i.e. W). Considering the schematic of the room (Figure 7), one can derive the following equation to model the humidity of the room by applying the conservation of mass law for the water vapour inside the room. The humidity in the room, denoted ωr, is assumed to be equal to the return air humidity, denoted ωa, i.e.

dωa/dt = ch,1 (ωe − ωa) + ch,1 ch,2 Ω̇l    (16)

with ch,1 = ṁ/mr and ch,2 = 1/ṁ. In order to optimise the model with respect to simulation performance, the value of ch,1 is judiciously varied. Furthermore, a time delay of duration Td is assumed to be present in the model output. The optimised values of ch,1 and Td are found by use of a cost function optimisation method. The MSE between simulation and measured data is defined as the cost. The minimum of this is searched for by utilising the inbuilt MATLAB function fminsearch. The system has been simulated with a sampling interval of 5 s within the SIMULINK environment. The estimated parameters for the humidity model are ch,1 = 8.61 × 10⁻³ (1/s) (white box model value: 8.41 × 10⁻³ (1/s)), ch,2 = 6.92 × 10⁻¹ (s/kg) and Td = 60 s. The room temperature model is derived in a similar way. Again, the room air temperature ϑr is assumed to be equal to the return air temperature ϑa:

dϑa/dt = (ṁ/mr)(ϑe − ϑa) + Q̇l/(mr Cp,a)    (17)

This model is derived from the conservation of energy law. An optimal parametrisation for this model is found by introducing a bilinear term and, in a similar manner to the humidity model, applying the same approach, yielding:

dϑa/dt = ct,1 (ϑe − ϑa) + ct,2 Q̇l + ct,3 (ϑe × ϑa)    (18)
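The cost-function fitting described above can be sketched as follows. The Python example below (the original work used SIMULINK and the MATLAB function fminsearch) simulates (16) by forward Euler for candidate parameters and minimises the MSE with SciPy's Nelder-Mead method as a stand-in for fminsearch; the time delay Td is omitted and all data are synthetic placeholders, not plant records.

# Sketch of the grey-box fit: simulate (16) for candidate parameters and
# minimise the MSE against measured humidity, here with SciPy's Nelder-Mead
# (a stand-in for MATLAB's fminsearch). Data arrays are synthetic placeholders.
import numpy as np
from scipy.optimize import minimize

DT = 5.0  # s, sampling interval

def simulate_room_humidity(ch1, ch2, omega_e, moisture_load, omega0):
    """Forward-Euler simulation of d(omega_a)/dt = ch1*(omega_e - omega_a) + ch1*ch2*load."""
    omega = np.empty_like(omega_e)
    omega[0] = omega0
    for k in range(1, len(omega_e)):
        d = ch1 * (omega_e[k-1] - omega[k-1]) + ch1 * ch2 * moisture_load[k-1]
        omega[k] = omega[k-1] + DT * d
    return omega

def cost(params, omega_e, load, omega_meas):
    ch1, ch2 = params
    sim = simulate_room_humidity(ch1, ch2, omega_e, load, omega_meas[0])
    return np.mean((sim - omega_meas) ** 2)   # MSE between simulation and data

# synthetic "measurements" generated with known parameters, for illustration only
n = 2000
omega_e = 0.5e-3 + 0.1e-3 * np.sin(np.arange(n) * DT / 600.0)
load = 1e-5 * np.ones(n)
omega_meas = simulate_room_humidity(8.4e-3, 0.7, omega_e, load, 1.1e-3)

res = minimize(cost, x0=[5e-3, 0.5], args=(omega_e, load, omega_meas),
               method="Nelder-Mead")
print(res.x)   # estimated (ch1, ch2)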


Fig. 8. Room model: performance with test data (upper plot: return temperature ϑa in °C; lower plot: return humidity ωa in 10⁻³ kg/kg(dry air); measurement vs. simulation over time in h).

The final model structure (18) has been implemented in SIMULINK. The optimised parameters are ct,1 = 6.97 × 10⁻⁴ (1/s), ct,2 = 6.18 × 10⁻⁷ (°C/J) and ct,3 = −4.5 × 10⁻⁵ (1/(°C s)). The performance of both models (upper plot: return temperature, lower plot: return humidity) with unseen test data is displayed in Figure 8. The MSE between simulated and measured data is 9.78 × 10⁻⁹ (kg/kg(dry air))² and 0.0519 K² for the humidity and temperature models, respectively.

4. CONCLUSIONS AND FURTHER WORK Models for the mixing section, the dehumidification unit and the clean room have been derived and validated for a Heating, Ventilation and Air Conditioning control system. After deriving a model for the cooling coil unit, the models are required to be combined to assess the overall simulation performance. On the basis of this model, different control strategies can be tested. As well as achieving an improved PID controller tuning, which is straightforward to implement, a four-term bilinear PID controller [5] which could deal with the discovered non-linearities is to be tested. Finally, the new control strategies are to be applied to the plant to evaluate the performance on the actual system.

57


REFERENCES
[1] BRUNDERETT G.W., Handbook of dehumidification technology. Butterworths, London, 1987.
[2] CENGEL Y.A., BOLES M.A., Thermodynamics: An engineering approach. McGraw-Hill, London, 1994.
[3] CENJUDO J.M., MORENO R., CARRILLO A., Physical and neural network models of a silica-gel desiccant wheel. Energy and Buildings, vol. 34, 2002, pp. 837-844.
[4] LJUNG L., System identification: Theory for the user. Prentice Hall PTR, Upper Saddle River, NJ, 1999.
[5] MARTINEAU S., BURNHAM K.J., HAAS O.C.L., ANDREWS G., HEELEY A., Four-term bilinear PID controller applied to an industrial furnace. Control Engineering Practice, vol. 12, 2004, pp. 457-464.
[6] NIA F.E., VAN PAASSEN D., SAIDI M.H., Modeling and simulation of a desiccant wheel for air conditioning. Energy and Buildings, vol. 38, 2006, pp. 1230-1239.
[7] UNDERWOOD C.P., HVAC control systems: Modelling, analysis and design. E. & F.N. Spon, London, 1999.



Computer Systems Engineering 2008 Keywords: continuous, estimation, unknown input observer, fault-detection, recursive least squares, suspension

Vincent ERSANILLI* Keith BURNHAM*

A CONTINUOUS-TIME MODEL-BASED TYRE FAULT DETECTION ALGORITHM UTILISING AN UNKNOWN INPUT OBSERVER

This paper investigates a continuous-time model-based approach to fault diagnosis in vehicle tyres. An unknown input observer is used to overcome the problem of the unknown input to the system, namely the road disturbance. A suspension model of a quarter car is constructed from first principles and state space and transfer function models are obtained. The coefficients of the transfer function are estimated in continuous-time using a standard recursive least squares scheme, which provides the basis of the fault detection mechanism.

1. INTRODUCTION The motivation for the work in this paper arises from an investigation into fault detection schemes for vehicle suspension systems which avoid the direct measurement of tyre pressure. Measuring tyre pressure directly from a rotating wheel whilst the vehicle is in motion is problematic and necessitates the use of radio frequency transmitters and receivers and battery operated sensors [8]. The proposed system estimates tyre pressure based on chassis mounted acceleration sensor measurements. Fault detection in suspension systems via discrete-time (DT) methods has been investigated in [9]. It was reported that under certain conditions it was not always possible to isolate particular faults. In an attempt to increase the sensitivity whilst reducing the number of false alarms a combination of recursive least squares (RLS) and cautious least squares (CLS) was proposed [2]. Studies in [5] and [3] have shown *

Control Theory and Applications Centre, Coventry, UK


that it is theoretically possible to isolate faults using continuous-time (CT) model approaches with a state variable filter and RLS for parameter estimation. A problem with the model-based parameter estimation approach is that the input to the system is unknown, i.e. the road surface is not known to the algorithm in advance. The solution to this problem within this work is the inclusion of an unknown input observer which estimates the road surface input from the chassis acceleration, based on knowledge of the suspension system. The design of the observer is based on the work in [1], where the idea was developed from a reduced order observer perspective. This paper is organised as follows. Section 2 deals with the vehicle suspension model and issues surrounding the selection of sampling interval. Section 3 shows how the unknown input observer is designed. Section 4 details the CT model and the estimation scheme. Section 5 outlines the simulation method. Section 6 gives detailed results and an analysis of the simulation studies. The conclusions are presented in Section 7. 2. VEHICLE SUSPENSION MODEL Fig. 1 represents the vehicle suspension model for this work, in which a quarter car is considered, consisting of a quarter of the chassis (sprung mass ms), wheel assembly (un-sprung mass mu), suspension spring ks, suspension damper cs and tyre spring kt. The input stimulus to the system is essentially a displacement, denoted zr, from the road surface. Using Newton's law of motion the system may be expressed as

ms ẍs + cs (ẋs − ẋu) + ks (xs − xu) = 0    (1)

mu ẍu − cs (ẋs − ẋu) − ks (xs − xu) + kt (xu − zr) = 0    (2)

where xs and xu denote the displacements of the sprung and un-sprung mass, respectively (ẋ and ẍ denote the velocity and acceleration in both cases). A convenient state space representation, given by ẋ = Ax + Bzr and y = Cx (3), with state vector x = [xs  ẋs  xu  ẋu]ᵀ, leads to

ẋ1 = x2    (4a)

ẋ2 = −(ks/ms)(x1 − x3) − (cs/ms)(x2 − x4)    (4b)

ẋ3 = x4    (4c)

ẋ4 = (ks/mu)(x1 − x3) + (cs/mu)(x2 − x4) − (kt/mu)(x3 − zr)    (4d)

Fig. 1. Vehicle suspension schematic

Having defined the state vector, the representation takes the state space vector-matrix form of (3), with the matrices A, B and C in (5a) and (5b) assembled directly from the state equations (4a)-(4d).

The output corresponding to the first row represents the suspension deflection, which is the relative displacement between xs and xu, and the output corresponding to the second row represents the chassis acceleration ẍs. Values of the vehicle suspension components are given in Table 1.



Table 1. Vehicle suspension component values
Parameter - Symbol - Value
Sprung mass - ms - 350 kg
Un-sprung mass - mu - 45 kg
Suspension stiffness - ks - 15000 N/m
Tyre stiffness - kt - 200000 N/m
Damper value - cs - 1100 Ns/m
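As a cross-check of the component values, the following short Python sketch assembles the quarter-car state-space matrices, assuming the state ordering used in (4a)-(4d) above, and computes the poles of A; the values come out close to those quoted below. It is an illustrative sketch, not the authors' implementation.

# Sketch: build the quarter-car state-space model from the Table 1 values
# (using the notation introduced above) and compute its poles.
import numpy as np

m_s, m_u = 350.0, 45.0                      # sprung / un-sprung mass [kg]
k_s, k_t, c_s = 15000.0, 200000.0, 1100.0   # stiffnesses [N/m], damper [Ns/m]

# state x = [x_s, x_s_dot, x_u, x_u_dot]^T, input z_r (road displacement)
A = np.array([
    [0.0, 1.0, 0.0, 0.0],
    [-k_s/m_s, -c_s/m_s,  k_s/m_s,  c_s/m_s],
    [0.0, 0.0, 0.0, 1.0],
    [ k_s/m_u,  c_s/m_u, -(k_s + k_t)/m_u, -c_s/m_u],
])
B = np.array([[0.0], [0.0], [0.0], [k_t/m_u]])
C = np.array([
    [1.0, 0.0, -1.0, 0.0],                      # suspension deflection x_s - x_u
    [-k_s/m_s, -c_s/m_s, k_s/m_s, c_s/m_s],     # chassis acceleration
])

poles = np.linalg.eigvals(A)
print(np.round(poles, 2))    # approx. -12.4 +/- 67.5j and -1.4 +/- 6.2j
print(1.0 / abs(poles.real).max(), 1.0 / abs(poles.real).min())  # time constants [s]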

The vertical acceleration of the chassis is the main output of interest for this system. This is the variable measured on the vehicle. In terms of the model this quantity is given by (4b). The secondary output of interest, corresponding to the fast mode, is that of the un-sprung mass, comprising the wheel, tyre, brake and axle assembly, given by (4d). Other measured outputs will not be considered in this paper, with the exception of suspension deflection, as this has an impact on the sampling frequency used in the model. The outputs of the system can be expressed in terms of their transfer functions by applying

Gi(s) = Ci (sI − A)⁻¹ B    (6)

where Ci is a particular row of the output matrix C. For the un-sprung mass this leads to an acceleration transfer function, given by (7); similarly, for the chassis this leads to an acceleration transfer function, given by (8).

The poles of these transfer functions are identical and are given by two pairs of complex poles, namely

s1,2 = −12.42 ± 67.51j,    s3,4 = −1.38 ± 6.21j


Taking the reciprocal of the real part indicates that the time constant of the fastest mode (associated with the wheel dynamics) is 80.5 ms and the slowest (associated with the chassis) is 0.725 s. These represent typical results for a vehicle suspension configuration, such as in Fig. 1. The ratio of the two dynamic modes is typically of the order 10:1 for the un-sprung and sprung mass respectively, see for example [6]. 2.1 SAMPLING INTERVAL

Measurements of the chassis acceleration are sampled at an interval Q . This interval must be selected carefully to capture the dynamics of the dominant modes in the system. Ideally the sampling interval should be one tenth of the time constant of the fastest mode of the system [7] to capture the dynamics of the system. This leads to a sampling interval Q of 8.05 ms and a theoretically ideal sampling frequency of 124 Hz. 3. UNKNOWN INPUT OBSERVER This approach to observer design divides the state vector in two parts, one part not depending on unknown inputs and the second part depending on the unknown input. The system (3) is equivalent to =8 + T=: Q= UR

+ RS

(9a) (9b) (10)

∈ ℜX , S ∈ where Q is a non singular matrix and U ∈ ℜX Y X; and ∈ ℜX , X X ℜ , T ∈ ℜ , are the state, known input, unknown input and output vector, respectively. Since p≥m, rank(D)=m, rank(C)=p and the pair (C, A) are observable, one can proceed. Suppose = Q = Q[ \ with

∈ ℜX; ,

∈ ℜ and

8 8 = Q ; 8Q = 1 8


(11)

8 2 8

(12)


= Q;

=1

2

(13)

0 2 7 : = :Q = :U :R R = Q; R = 1

(14) (15)

the relation (9) can be written =8 +8 + +8 + +7 S =8 T=: +: .

(16)

The state is dependent on the unknown input v whereas is not, which makes superior candidate for estimation [1]. The input-free system becomes =8 T=:

+8 +:

+

a

(17a) (17b)

Suppose we create a non-singular matrix ] = :R ^

(18)

] 2 ]

(19)

with ^ ∈ ℜ_ Y _; and denoting ]; = 1 with ] ∈ ℜ

Y_

, ] ∈ ℜ _;

Y_

, verifying

] :R ]; ] = 1 ] :R

7 ]^ 2=10 ]^ 64

0 7_; 2

(20)


pre-multiplying both sides of measurement equation (17) by ] ; leads to ] T = ] :U

+ ] :R

, ] T = ] :U

+ ] :R

(21)

Combining (20) and (21) gives: ] T = ] :U ] T = ] :U The state

+

(22) (23)

is then deduced from (22) such that = ] T − ] :U

(24)

hence substituting (24) into (17) gives = 8̀ Tb = :̀ where 8̀ = 8

− 8 ] :U, a = 8 ] ,

+

+a T (25)

: = ] :U and Tb = ] T.

If the pair c8̀ , :̀ d is observable or detectable, following the conventional Luenberger observer design procedure [7], it is possible to design a reduced order observer for the unknown input free system (25) e = c8̀ − f:̀ d e + where f ∈ ℜ X; Then

Y _;

+ fT

(26)

f = f] + a . g e = Q e = Q [] T − ] :Ug\

(27)

and e → as i → ∞. Based on the reduced order observer described by (26) and (27), an estimation of unknown inputs can be obtained



Se = ] T + 3# g + 3" T + 3A where

(28)

3# = ] :Uf] :U + ] :U8 ] :UU − ] :U8 − 8 + 8 ] : 3" = −] :Uf] − ] :U8 ] − 8 ] 3A = −] :U −

(29) (30) (31)

4. PARAMETER ESTIMATION RLS is the method used here to estimate the coefficients of the transfer functions of the suspension model. RLS is a straightforward online estimation algorithm, yet it is optimal in the mean square error (MSE) sense when the assumptions on linearity of the model and Gaussian properties of the measurement noise hold. Although auto regressive moving average (ARMA) additive noise has been adopted for the noise models, the estimator is found to perform adequately, as will be demonstrated in the results presented in Section 5. 4.1 THE CONTINUOUS-TIME SYSTEM MODEL

The RLS algorithm is used to estimate the coefficients of a CT differential equation model based on sampled data measurements of the input and output variables obtained in DT. Consider the linear differential equation representation

dⁿy(t)/dtⁿ + a1 dⁿ⁻¹y(t)/dtⁿ⁻¹ + ... + an y(t) = b0 dᵐu(t)/dtᵐ + ... + bm u(t)    (32)

Taking Laplace transforms, and assuming zero initial conditions, the transfer function corresponding to the above differential equation takes the form

Y(s)/U(s) = B(s)/A(s)    (32a)

where Y(s) and U(s) denote the Laplace transforms of the noise free system output y(t) and the available noise free input u(t), respectively. The transfer function numerator and denominator polynomials are given by

B(s) = b0 sᵐ + b1 sᵐ⁻¹ + ... + bm    (33)

A(s) = sⁿ + a1 sⁿ⁻¹ + ... + an    (34)

where s is the Laplace variable. The CT system model input and noise free output, u(t) and y(t) respectively, are sampled at discrete instants t1, ..., tN. In the case of uniformly sampled data (as in the vehicle suspension simulation) at each sampling interval ∆t, where tk = k∆t, the measured output is assumed to be corrupted by an additive measurement noise ξ(tk), i.e.

y(tk) = x(tk) + ξ(tk)    (35)

where x(tk) is the sampled CT deterministic, noise free output of the CT system and, as in the DT case, ξ(tk) is modelled as a DT ARMA process

ξ(tk) = [C(q⁻¹)/D(q⁻¹)] e(tk),    e(tk) = N(0, σ²)    (36)

The problem is to estimate the parameters of the CT differential equation (or transfer function) model (32) from N sampled data pairs comprising the available noise free input and noise corrupted output, denoted Z_N = {u(tk); y(tk)}, k = 1, ..., N. The system at the sampling instant is expressed in the following pseudo linear regression estimation equation

y_f⁽ⁿ⁾(tk) = φ_fᵀ(tk) θ + ξ(tk)    (37)

φ_fᵀ(tk) = [−y_f⁽ⁿ⁻¹⁾(tk) ... −y_f⁽⁰⁾(tk)  u_f⁽ᵐ⁾(tk) ... u_f⁽⁰⁾(tk)]    (38)

θ = [a1 ... an  b0 ... bm]ᵀ    (39)

67


Fig. 2. State variable filter

Ideally the coefficients of the pre-filter match those of the unknown system [11]. In practice these would be initialised with approximate values and iteratively updated with the new estimates as they become available. In this work, however, rounded values close to those of the coefficients corresponding to the nominal CT suspension system are used. Further consideration would need to be given as to updating the coefficients in an application such as fault detection. 4.2 REPLICATING FAULTS IN THE SYSTEM MODEL

There are many ways in which a suspension may degrade but only tyre faults are considered here. In particular, a slow deflation which results in a gradual reduction in tyre stiffness of some 50% is considered. This fault scenario is replicated by creating a matrix of theoretical values of the model parameters starting at sample – with the nominal (no fault) values for the parameters, denoted ‡X… .The parameters are linearly, incrementally changed from the sample where the fault starts, denoted –… up to the sample when the fault ends, denoted –…— with the faulty values of the faulty parameter vector, denoted ‡… . From the sample –…— to the end of the simulation, –— , the values remain fixed at ‡… . 5. SIMULATION STUDIES For the purposes of fault detection a robust diagnosis with no false alarms of faults is required. To achieve this goal a matrix of tests is implemented and a majority voting 68


system is proposed, similar to the type that is used in aircraft [10]. The fault decision algorithm is presented with the result of three tests and a majority verdict decides the diagnosis and hence alerts the driver of a problem with tyre pressure. The tests are detecting changes in the system in three distinct ways. The primary approach is parameter estimation and is carried forward from the work in [4]. This technique is augmented by analysis of the input estimation: variance and the phase portrait. The simulation studies show that the estimated parameters no longer converge to the true model parameters. The cause of this is most likely to be the estimation of the input, which is only an approximation of the road surface. This behaviour is not particularly problematic as the estimations tend to settle to a steady value when the system is in steady state and during a fault are changing in sympathy with the actual model parameters. A persistent change in the parameters can then be deemed to be a fault. For practical applications, bounds and conditions should be placed on the variation of the estimated parameters for a diagnosis to take place. With further testing work the value of the estimated parameters could be linked directly to pressure in the tyre. 6. ESTIMATION RESULTS Fig. 3 shows estimation of parameter ˜ in a typical test run with no fault. Contrast this result with Fig. 4 which shows the fault occurring at 6 minutes and stabilising at 12 minutes. Fig. 5 shows the mean variance of the input estimation as it evolves over time, starting with the fault free condition, with the fault being introduced at the 40% of the total test time and stabilising at around 70%. Fig. 6 compares the phase portrait of the input estimation before and after the fault has occurred and stabilised. 7. CONCLUSIONS A quarter car suspension model and unknown input observer was developed. The parameters of the transfer function model were estimated with no access to the real (road) input. Diagnostics were developed to identify changes in the system relating to tyre pressure decrease and a majority voting system was proposed. The diagnostic tests show that it is possible to distinguish between the system in a nominal state and the faulty condition, by the use of three different tests. During the course of the simulation studies it became clear that tuning the observer made a significant difference to the ability of the diagnostic algorithms to track changes in the system. The observer design is dependent on the system matrix A and so computing the observer with modified values of tyre spring, , moved the poles of the observer. 69


Fig. 3. Parameter estimations of ˜ in the fault free condition

Fig. 4. Parameter estimations of ˜ as a fault occurs at 6 minutes and stabilises at 12 minutes

70


Fig. 5. Mean input estimation variance as a fault occurs at 40% and stabilises 70%

Fig. 6. Phase portrait of the input estimation before and during a fault

This behaviour highlights a property of the fault detector: during a fault, the configuration of the observer is no longer theoretically optimum. The solution to this problem was to start with an observer that is optimally configured for the faulty condition, which happens to work adequately for the fault free condition and is an 71


improvement over the observer which is configured for the fault free case. Further work will include an investigation of the possibility of a multiple model approach, with models for a variety of different system states. With further testing work the value of the estimated parameters could be linked to the pressure in the tyre to give an estimate of the real pressure rather than merely indicating a change in the pressure. Majority voting is the proposed method of defining a fault and this could be further developed into a pattern matching algorithm that can match test outcomes with vehicle states, i.e. change in mass, road surface, vehicle speed, to improve the accuracy over a range of driving scenarios.

REFERENCES
[1] BOUBAKER O., Full order observer design for linear systems with unknown inputs, IEEE International Conference on Industrial Technology, 2004.
[2] BURNHAM K.J., Self-tuning Control for Bilinear Systems, PhD Thesis, Coventry Polytechnic, Coventry, UK, 1991.
[3] ERSANILLI V.E., BURNHAM K.J., KING P.J., Comparison of Continuous-Time and Discrete-Time Vehicle Models as Candidates for Suspension System Fault Detection. IAR Workshop on Advanced Control and Diagnosis, Coventry, UK, 2008.
[4] ERSANILLI V.E., Fault Detection for Vehicle Suspension. MSc Dissertation, Coventry University, UK, 2008.
[5] FRIEDRICH C., Condition Monitoring and Fault Detection, MSc Dissertation, Coventry University, UK, 2006.
[6] GILLESPIE T., Fundamentals of Vehicle Dynamics, Society of Automotive Engineers, Warrendale, USA, 1992.
[7] NISE N., Control Systems Engineering, John Wiley and Sons, Inc., USA, 2004.
[8] VELUPILLAI S., GÜVENÇ L., Tire Pressure Monitoring, IEEE Control Systems Magazine, Dec. 2007, pp. 22-25.
[9] WALKER C.J., A Cautious Fault Detection Algorithm, BEng project dissertation, Coventry University, UK, 1991.
[10] WEIZHONG Y., JAMES L., GOEBEL K.K., A multiple classifier system for aircraft engine fault diagnosis, Proc. of the 60th Meeting of the Society for Machinery Failure Prevention Technology, 2006, pp. 291-300.
[11] YOUNG P.C., The Refined Instrumental Variable Method, Unified Estimation of Discrete and Continuous-Time Transfer Function Models, Journées Identification et Modélisation Expérimentale, Poitiers, France, 2006.



Computer Systems Engineering 2008 Keywords: unicast, anycast, CFA, routing, flow, capacity, Top-Down, Flow Deviation

Jakub GŁADYSZ* Krzysztof WALKOWIAK*

THE HEURISTIC ALGORITHM BASED ON FLOW DEVIATION METHOD FOR SIMULTANEOUSLY UNICAST AND ANYCAST ROUTING IN CFA PROBLEM In this paper, we present heuristic algorithms to solve the capacity and flow assignment (CFA) problem simultaneously for unicast and anycast flows. By introducing anycast (one-to-one-of-many) flows and server replicas we increase reliability and decrease the summary flow in the network. As the criterion function we use the total average delay, with a budget constraint. To obtain an initial selection we use a heuristic algorithm based on the Top-Down method. Next we try to decrease the vector of flows using the CFD_DEL algorithm for unicast and anycast connections. Finally, we present the results of computational tests.

1. INTRODUCTION

Due to the increasing number of people using the Internet and the growing flows of information, the design of computer networks is a matter of great importance. In designing computer networks there exist a few kinds of problems [2]: the flow assignment problem (FA), the capacity assignment problem (CA), the capacity and flow assignment problem (CFA) and the topology, capacity and flow assignment problem (TCFA). In the literature there are many papers addressing FA and CFA problems for unicast (one-to-one) and multicast (one-to-many) connections [2][5]. The anycast paradigm is a newer technique for delivering packets in computer networks, which was implemented in Internet Protocol version 6. It is the point-to-point flow of packets between a single client and the "nearest" destination server. The idea behind anycast is that a client wants to download or send packets to any one of several possible servers offering a particular service or application [4]. This type of flow has become more popular since users started *

Department of Systems and Computer Networks, Wroclaw University of Technology, Poland.


downloading books, movies, music, etc. One technology that applies anycast traffic is the Content Delivery Network (CDN) [6]. In this paper, we consider the Capacity and Flow Assignment problem for multicommodity, non-bifurcated flow. The goal of our model is to minimise a function associated with flow and capacity. That function can be the total average delay, which was formulated by Kleinrock [3]. In this problem the network topology, the location of servers, the sets of capacities and link costs, and the sets of unicast and anycast demands are given. A unicast connection is defined as a triple: origin node, destination node and bandwidth requirement; an anycast connection is defined by a client node and bandwidth requirements to and from a server. In the problem we know a set of candidate routes. We assume that all connections should be established and that the summary flow in each arc cannot be bigger than its capacity. This problem was formulated in [1].

2. PROBLEM FORMULATION

To represent the problem we use the following notation:
Sets:
ℜ - the set of selections,
Xr - the set of variables x_p^k which are equal to one,
Yr - the set of variables y_a^l which are equal to one,
Π_p - the index set of candidate routes (paths) for connection p,
P_AN, P_UN - the sets of anycast and unicast connections,
z(a) - the set of capacities for channel a.
Constants:
δ_pa^k - 1 if arc a belongs to route k realizing connection p, 0 otherwise,
c_a^l - the value of capacity corresponding to y_a^l,
k_a^l - the value of link cost corresponding to y_a^l,
τ(p) - the index of the connection associated with connection p,
o(π_p^k) - the origin node for connection p,
d(π_p^k) - the destination node for connection p,
B - the budget,
κ - the total message arrival rate from external sources.
Variables:
f_a - the summary flow in arc a,
x_p^k - binary variable, 1 if route k is used for connection p, 0 otherwise,
y_a - the capacity of arc a,
y_a^l - 1 if the capacity of arc a is c_a^l.

This problem can be formulated below:

min_{f,y} T(Z) = (1/κ) Σ_{a∈A} f_a / (y_a − f_a)    (1)

Subject to:

Σ_{k∈Π_p} x_p^k = 1,   x_p^k ∈ {0,1}   ∀p ∈ P, ∀k ∈ Π_p    (2)

Σ_{k∈Π_p} x_p^k d(π_p^k) = Σ_{k∈Π_τ(p)} x_τ(p)^k o(π_τ(p)^k)   ∀p ∈ P_AN    (3)

f_a = Σ_{p∈P} Σ_{k∈Π_p} δ_pa^k x_p^k Q_p ≤ y_a   ∀a ∈ A    (4)

Σ_{l∈z(a)} y_a^l = 1,   y_a^l ∈ {0,1}   ∀a ∈ A, ∀l ∈ z(a)    (5)

y_a = Σ_{l∈z(a)} y_a^l c_a^l    (6)

D(Z) = Σ_{a∈A} Σ_{l∈z(a)} k_a^l y_a^l ≤ B    (7)

Z = (X, Y),   X = ∪_{k,p: x_p^k=1} {x_p^k},   Y = ∪_{a,l: y_a^l=1} {y_a^l}    (8)

where Q_p denotes the bandwidth requirement of connection p.

Constraint (2) guarantees that only one route can be chosen for each connection. Equation (3) guarantees that the two routes associated with the same anycast demand connect the same pair of nodes. Condition (4) states that the flow in each arc cannot be bigger than its capacity. Condition (5) ensures that only one value of capacity can be chosen for each arc. Equation (6) gives the value of the capacity of arc a. Constraint (7) states that the summary link cost cannot be bigger than the budget. Point (8) is the definition



of a selection Z including set X and set Y. Set X and set Y include all variables x and y, respectively, which are equal to one.

3. ALGORITHM

We set the initial selection using a heuristic algorithm based on the Top-Down method. The initial selection is found according to the following steps (a sketch of the selection rule used in Step 4 is given after the list):

Step 1. For each channel a in selection Z_1 set the capacity to its maximum value. For each unicast connection find the shortest path using the l_a^T(f) metric for zero flow in the network. Then, for each anycast demand, find the shortest path to the server and from the server so that the anycast condition d(π_i^k) = o(π_τ(p)^j) is satisfied. Then calculate the flow in each channel.

Step 2. Calculate T(Z_1^i). If the capacity condition (4) is not satisfied, the algorithm stops; the problem has no solution. Otherwise go to Step 3.

Step 3. Check the budget constraint. If D(Z_1^i) > B go to Step 4, otherwise (D(Z_1^i) ≤ B) go to Step 5.

Step 4. In some channel a take the capacity value for which ΔKT_a^{lm} = (c_a^l − c_a^m) w_a^T / (k_a^m − k_a^l) is the least possible, where w_a^T is the first partial derivative of the function T(Z) with respect to the arc capacity. Set i = i + 1 and go to Step 2.

Step 5. The initial selection is found. Stop the algorithm and set Z_1 = Z_1^i. Let f and c be the vectors of flows and capacities for the feasible initial selection Z_1.
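A minimal sketch of the Step 4 selection rule follows. It assumes that, for every channel, the currently assigned module and a cheaper candidate module are available together with the derivative w; all names are illustrative assumptions, not the authors' implementation.

# Sketch of the Step 4 rule: among all channels, pick the capacity change for
# which Delta_KT = (c_l - c_m) * w / (k_m - k_l) is smallest, where w is the
# partial derivative of T(Z) with respect to the channel capacity.

def pick_channel_to_cheapen(candidates):
    """candidates: iterable of tuples (channel_id, c_l, k_l, c_m, k_m, w)
       comparing the currently assigned module (c_m, k_m) with a cheaper
       alternative (c_l, k_l)."""
    best_id, best_val = None, None
    for channel_id, c_l, k_l, c_m, k_m, w in candidates:
        if k_m == k_l:
            continue                      # no cost change, nothing to gain
        delta_kt = (c_l - c_m) * w / (k_m - k_l)
        if best_val is None or delta_kt < best_val:
            best_id, best_val = channel_id, delta_kt
    return best_id, best_val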

Then the CFD_DEL algorithm is used for the fixed vector of capacities. This algorithm was proposed in [6] and it jointly optimizes unicast and anycast flows in a network. It decreases the criterion function T(Z) by minimizing the vector of flows.

4. COMPUTATIONAL RESULTS

To solve the CFA problem we split it into the CA and FA subproblems. Using the Top-Down algorithm we obtain feasible values of capacity for zero flow. Using the CFD_DEL algorithm we then try to minimize the vector of flows in the network for fixed values of capacity.



The experiments were conducted with two main purposes in mind. First, we compare the values of the criterion function for a feasible solution obtained by the Top-Down algorithm and for the solution returned by CFD_DEL. Next, we examine the value of the criterion function as a function of the number of server replicas in the network. In the computational experiments we used a network topology with 14 nodes and 56 links and assumed that 30% of the traffic in the network is one-to-one-of-many, while the remaining traffic is one-to-one. Fig. 1 shows the topology of the test network.

Fig. 1. Topology of network

Results of the two algorithms for one, two and three servers are presented in Fig. 2-4.

Fig. 2. Values of T(Z) for one server



Fig. 3. Values of T(Z) for different localizations of two servers

Fig. 4. Values of T(Z) for different localizations of three servers

In Fig. 2-4 we can notice that the CFD_DEL algorithm improves on the feasible initial selection obtained by the Top-Down algorithm. Comparing the results of the experiments, the CFD_DEL algorithm reduces the criterion function by about 34% for one replica server located in the network, by about 37% for two replica servers, and by about 43% for three replica servers.



Table 1. Average values of T(Z) versus the number of servers obtained by the Top-Down and CFD_DEL algorithms

Number of servers   T(Z) for Top-Down   T(Z) for CFD_DEL
1                   0.3078              0.2016
2                   0.2931              0.1828
3                   0.2768              0.1571

Table 2. Minimal values of T(Z) and the nodes for which minimal values are obtained by the Top-Down algorithm

Number of servers   Min T(Z)   Localization of servers in network
1                   0.2307     5
2                   0.1932     5, 10
3                   0.1900     4, 5, 11

Table 3. Minimal values of T(Z) and the nodes for which minimal values are obtained by the CFD_DEL algorithm

Number of servers   Min T(Z)   Localization of servers in network
1                   0.1470     5
2                   0.1265     3, 12
3                   0.1221     2, 5, 11

5. FINAL REMARKS

In this paper, we presented a model and solution methods for the CFA problem with unicast and anycast flows. We used two heuristic algorithms: the first to find an initial feasible solution, the second to minimize the criterion function with respect to the flows in the network. The results of the experiments were presented in Fig. 2-4. In Tables 1-3 we can notice that each additional server in the network decreases the criterion function and the minimal value of T(Z) for both algorithms. It must be remembered that every additional server decreases the total average delay and the flow in the network, but increases the cost of building it. Tables 1-3 also show that the best localization of the server replicas differs between the two algorithms: for the Top-Down algorithm the best localizations are nodes 5, 10 and 4, 5, 11, whereas for the CFD_DEL algorithm they are nodes 3, 12 and 2, 5, 11. To assess the quality of the results obtained by the CFD_DEL algorithm, an optimal solution should be found by an exact algorithm (e.g. the branch-and-bound method [1]).



REFERENCES
[1] GŁADYSZ J., WALKOWIAK K., Branch-and-Bound Algorithm for Simultaneously Unicast and Anycast Routing in CFA Problem, 42nd Spring International Conference Modelling and Simulation of Systems 2008, Czech Republic, MARQ Ostrava, pp. 108-115.
[2] KASPRZAK A., Projektowanie struktur rozległych sieci komputerowych, Monograph, Oficyna Wydawnicza Politechniki Wrocławskiej, Wrocław, Poland, 2001 (in Polish).
[3] KLEINROCK L., GERLA M., FRATTA L., The Flow Deviation Method: An Approach to Store-and-Forward Communication Network Design, Networks – An International Journal, John Wiley & Sons, 1973, pp. 97-132.
[4] METZ CH., IP Anycast Point-to-(Any) Point Communication, IEEE Internet Computing, March-April 2002, pp. 94-98.
[5] PIORO M., MEDHI D., Routing, Flow, and Capacity Design in Communication and Computer Networks, Morgan Kaufmann Publishers, 2004.
[6] WALKOWIAK K., Algorytmy wyznaczania przepływów typu unicast and anycast w przeżywalnych sieciach zorientowanych połączeniowo, Monograph, Oficyna Wydawnicza Politechniki Wrocławskiej, Wrocław 2007 (in Polish).



Computer Systems Engineering 2008 Keywords: Reinforcement Learning, Q–learning, Boltzman Scheme, Counter Scheme, Adaptive Scheme

Tomasz KACPRZAK* Leszek KOSZAŁKA*

COMPARISON OF ACTION CHOOSING SCHEMES FOR Q-LEARNING

A very important issue affecting the performance of the Q-learning algorithm is the action choosing scheme. The Q-learning algorithm does not define how to choose an action to perform in a single step; it is up to the designer to implement the most effective scheme for a particular system. In order to obtain the best result in a specified time, the designer has to balance exploration and exploitation in the action choosing scheme. Well described methods, such as greedy, probability-based or temperature-based schemes, can be applied. This paper studies and compares the effectiveness of different action choosing methods in various environments and introduces a new method, the Adaptive Scheme, which evaluates how profitable exploration is and uses this information in further decision making.

1. REINFORCEMENT LEARNING

Reinforcement Learning (RL) is a method of solving decision making problems in an unknown environment. Almost every environment can be represented with a state graph or matrix and, for each state, a set of actions which are available to perform in this state [1][2][3]. The decision module chooses and performs an action in a particular state. The aim of the module is to make decisions which will move the system to, or maintain it in, a certain state, or prevent the system from moving to a particular state. In RL this matrix is not known to the decision module at the beginning of operation. The decision system performs actions and obtains reinforcement from the environment. This reinforcement is called a "reward" and can be positive or negative with respect to reaching the goal of the decision system. There must be a non-empty set of states which are considered "absorbing"; after reaching one of them the trial is finished and the next trial*

* Department of Systems and Computer Networks, Wrocław University of Technology, Poland.


begins [2]. RL can be considered as learning through "trial and error". RL algorithms can be divided into strategy-based and action-based methods. The former learn by assuming a strategy at the beginning and updating and improving it during learning. The latter learn the best action to perform in a particular state. This work covers action-based RL, whose most popular implementation is Q-learning.

2. Q-LEARNING

The Q-learning algorithm learns to take, in a given state, the action such that the objectives are optimized [1]. The algorithm uses the Q-matrix, which stores the relations between states and actions. These values are calculated according to the current value, the reinforcement (reward) and the discounted value resulting from previous actions. The Q-matrix is updated every time an action is performed. The general idea of the algorithm can be presented as follows [1]:

1: for each time step t do
2:   observe current state xt;
3:   at := choose action (xt, Qt);
4:   perform action at;
5:   observe reinforcement rt and next state xt+1;
6:   ∆ := rt + γ maxa Qt(xt+1, a) − Q(xt, at);
7:   updateβ(Q(xt, at), ∆);
8: end for each.

where γ is a discount factor. If γ is close to 1, the algorithm tends to choose actions which lead to bigger but more time-distant profits, and when it is close to 0, it prefers instant rewards. The Q-learning algorithm does not specify how the action is chosen (step 3). This paper compares and evaluates popular methods of action choosing.

3. ACTION CHOOSING SCHEMES

The problem of exploration is very important in the Q-learning algorithm: it is essential for the learning system to obtain sufficient knowledge of the environment in which it performs, which is realized by proper action choosing [1]. Of course, the main goal of the system is to obtain results (exploit), and exploration is most frequently an action which does not aim to yield immediate profit. Exploration can be considered as experimental behaviour of the system, in which more time is spent getting to know the environment better in order to get greater rewards in the future. A proper balance between exploration and exploitation has to be achieved during the operation of the system. Action choosing schemes aim to achieve this balance. The most popular action choosing schemes are discussed below.
• Greedy – the system does not experiment at all. In every step it uses the first action it performed which is known to lead to the goal. In this work the Greedy Scheme is treated as a reference for comparison purposes.
• ε-Greedy – a probability-based scheme in which a random action is chosen with probability ε > 0 and the greedy action with probability (1 − ε) [1].
• Boltzman Scheme – a scheme which uses the Boltzman distribution, described by:

π(x, a*) = exp(Q(x, a*)/T) / Σ_a exp(Q(x, a)/T)                                         (1)

where T > 0 is called the temperature and regulates the degree of randomness in action choosing. Values close to 0 cause almost deterministic action selection and values close to 1 almost random selection. In the Boltzman Scheme, the more the Q-value of the greedy action exceeds the values of the other actions, the smaller the probability of selecting a non-greedy action. During learning, the temperature is "cooled" by decreasing T.
• Counter Scheme – this scheme is based on assigning one or more parameters to each action in each state, next to the Q-value. The parameters are used in the process of action selection and can contain such information as the time of last selection or the number of times the action has been selected in the current trial. In this work the Counter Scheme chooses the greedy action with probability ε > 0 and, otherwise, the action which has not been performed for the longest time from the set of non-greedy actions; ε is in fact a temperature indicator and is decreased during the trials according to the parameters.
• Adaptive Scheme – this new method is introduced in this paper and is based on the Counter Scheme. The additional feature of this scheme is that it monitors how efficient exploration is by measuring the average number of changes of Q-values when the algorithm chooses a random action to perform (the Efficiency Factor). Next, it sets the temperature according to this exploration efficiency factor. This mechanism resembles the way humans learn: if learning new solutions does not bring any profit, the learner abandons learning and focuses on results. The size of the averaging window affects the sensitivity of the scheme.
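To make the above schemes concrete, the following minimal Python sketch combines one Q-learning update with Boltzmann action selection according to equation (1). The environment interface and all names are assumptions made for this example, not the simulator used in the paper.

import math
import random

def boltzmann_choice(q_row, temperature):
    """Pick an action with probability proportional to exp(Q(x,a)/T)."""
    weights = [math.exp(q / temperature) for q in q_row]
    total = sum(weights)
    r, acc = random.random() * total, 0.0
    for action, w in enumerate(weights):
        acc += w
        if r <= acc:
            return action
    return len(q_row) - 1

def q_learning_step(Q, state, env_step, gamma=0.8, beta=0.1, temperature=0.5):
    """Q: dict state -> list of action values (initialised for every state);
       env_step(state, action) returns (reward, next_state)."""
    action = boltzmann_choice(Q[state], temperature)
    reward, next_state = env_step(state, action)
    delta = reward + gamma * max(Q[next_state]) - Q[state][action]
    Q[state][action] += beta * delta          # update_beta of the pseudocode
    return next_state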



4. EXPERIMENT SETTINGS

The aim of the algorithms is to obtain the maximum result in a time specified by the designer. To do this, the schemes responsible for action choosing have to maintain a proper balance between exploration and exploitation. An initial analysis of the problem shows that for very long trials (considerably longer than the algorithm needs to fully learn the environment) the exploration method is not very significant for the results, so no further study is needed for such "long run" cases. More interesting is the case when the learning time is comparable to, equal to or a bit longer than the experiment time, and this is the scope of this paper. No ending conditions were specified for the algorithm; each trial was performed until further results became highly predictable. The results are measured by the sum of the reward values that the schemes obtained in the given time. The experiments were performed in grid environments (see Fig. 1), which contain absorbing states (grey) that are also rewards, and forbidden states (black).

Fig. 1. Grid environments used for experiments: a) simple case, b) middle-sized complex environment with significantly larger rewards hidden in the corners, which makes them very difficult to reach.

5. EXPERIMENTATION SYSTEM

To perform the experiments, simulation software was created in the Visual C++ environment (Fig. 2). The simulator allows the user to create 2D environments, perform tests of various lengths and compare action choosing schemes. It operates in two modes: a demonstration mode and a comparison mode, in which it performs the simulation for all schemes in the same environment. The reliability of the comparison is assured by averaging the score over a user-defined number of simulations. The program can also present how Q-learning works and serve demonstration purposes. In this environment the designer can specify the following input:
• a rectangular board of arbitrary size with rewards and obstacles,
• the discount factor γ (for all schemes),
• the random action probability (for the ε-greedy algorithm),
• the initial temperature Ti and the temperature decreasing step dT (for the Boltzman and Counter schemes),
• the averaging window size – the number of past results used to calculate the average score in time,
• the averaging window size for the Adaptive Scheme – the number of past profitable random actions taken into consideration when computing the Efficiency Factor,
• the time period of the simulation (for comparison mode only),
• the trial count – the number of simulations (to assure reliability).

Fig. 2. Screenshot of the proposed experimentation system.



6. EXPERIMENTS

The comparison was based on many simulations in different environments. The results for many types of environments were similar; nevertheless, two different cases were defined: the simple and the complex environment case, based on Environments a) and b) (see Fig. 1). The comparison was performed for these cases with varying exploration scheme, environment (presented above) and scheme cooling time. The discount factor was also tested, but the results did not lead to significant conclusions.

6.1. "COOLING" TIME ADJUSTMENT

Fig. 3 presents the average score in time when the cooling time was 1000 time ticks. Fig. 4 presents the average for a cooling time of 400. In both cases the discount rate was set to γ = 0.8. The averaging window for the Adaptive Scheme was set to 5, which gave the best results in simulations. For environment (a) and γ = 0.3 the average results in time were very similar to those in Fig. 3 and Fig. 4.

Fig. 3. Simulation results for environment (a), γ = 0.8, “cooling” time set to 1000.

Fig. 4. Simulation results for environment (a), γ = 0.8, "cooling" time set to 300.


Points marked with squares indicate the time when the total score of a particular scheme exceeded the score of the Greedy Scheme. It can easily be noticed that in both cases the Counter, Boltzman and Adaptive Schemes obtained a better average score than the Greedy Scheme, which indicates that spending more time on learning in this environment is profitable. In fact, learning is profitable in the majority of tested environments. However, it can also be noticed that for the Boltzman and Counter Schemes the "cooling" time is a crucial factor: setting it too long can cause more time to be spent on exploration when the environment is already explored (as seen when comparing Fig. 3 and Fig. 4). Too short a "cooling" time causes the exploration to be incomplete; on the other hand, the system gains score in the early stage of learning (Fig. 4). The Adaptive Scheme, which adjusts the temperature on its own, always reached the highest possible average; however, it reaches the maximum in the long term. In the mid term it obtains an average score comparable to the Counter Scheme.

6.2. COMPLEX ENVIRONMENT

Environment b) has to be considered "complex" because it contains rewards which require much exploration: they are hidden in the corners of the board. Exploring them brings significant profit, in terms of the average score, to the algorithm. Simulation in this environment was performed only with γ = 0.8, because the use of a small discount factor would make these hidden rewards insignificant. Fig. 5 presents the average score for Environment b).

Fig. 5. Simulation results for environment (b), γ = 0.8, “cooling” time set to 5000.

It can be seen that the best average score was obtained by the Boltzman Scheme. This observation was also made in other "tricky" environments. The Counter Scheme average score was significantly lower; on the other hand, it performed well at the beginning of the experiment. The Adaptive Scheme also performed at a satisfying level at the beginning, and kept a constant improvement rate later on. During longer trials it proved to reach the maximum average.

6.3. CONSIDERING DISCOUNT FACTOR

The situation changed slightly when a smaller discount factor was considered at the same "cooling" time. At first, it can be noticed that the overall average is higher when the discount factor is γ = 0.3. That is because of the internal characteristics of the environment: it is rather simple, the rewards are regularly distributed and their values are comparable. Adaptation of γ is not a subject of this work, but seems to be an interesting subject of research. Observation of the average scores in case 2 (Fig. 4) leads to the conclusion that under these conditions the Counter Scheme outperformed the Boltzman Scheme: it had a significantly better average score and also a better total score at all moments of time. Observation of the simulation indicated that the Counter Scheme tends to "travel long distances" in the environment by following one direction and rarely going back. That seems to be the reason why the Counter Scheme is suitable for high discount factors and not low ones, as will be shown in the analysis of the next plot. Fig. 6 presents the case of γ = 0.3 and a "cooling" time of 300. In this case the Boltzman Scheme turned out to be the best in the long term, since it reached the maximum in most cases. Furthermore, it did not stop learning, despite zero temperature. That is because the Boltzman Scheme always uses probability when choosing the action to perform, and decreasing the temperature only diminishes the probability of choosing a "non-greedy" action. In a situation of low discount factor and a regular environment it will still sometimes choose a "non-greedy" action and therefore learn. This phenomenon is significant for a very limited range of specific environments.

Fig. 6. Simulation results for Environment (a), γ = 0.3, "cooling" time set to 300.


7. CONCLUSIONS

In this paper, different exploration schemes were analysed. The experiments showed that the simple Greedy Scheme provided very good results for a short running time. The ε-greedy solution did not obtain either results or an average comparable to the other schemes for the various ε values used. For mean running time applications the Counter Scheme obtained the best results among all schemes (Fig. 4). For long running times the Boltzman Scheme seems to outperform the other schemes. The adjustment of the temperature can be a problem for the Boltzman and Counter Schemes when the designer does not know the length of a trial or the characteristics of the environment. In such cases the Adaptive Scheme can be successfully applied. It performed well for mean running times and is ready to learn until it reaches the maximum average score.

REFERENCES
[1] BOLC L., ZAREMBA P., Wprowadzenie do uczenia się maszyn, Akademicka Oficyna Wydawnicza, 1993 (in Polish).
[2] CICHOSZ P., Systemy uczące się, Warszawa, WNT, 2007 (in Polish).
[3] DE FARIAS D.P., MEGIDDO N., Exploration-Exploitation Tradeoffs for Experts Algorithms in Reactive Environments, http://books.nips.cc/papers/files/nips17/NIPS2004_0071.pdf.
[4] TEKNOMO K., http://people.revoledu.com/kardi/tutorial/ReinforcementLearning, 7 March 2008.
[5] MITCHELL T., Machine Learning, McGraw-Hill Companies, Inc., 1997.
[6] http://wazniak.mimuw.edu.pl/index.php?title=Sztuczna_inteligencja/SI_Modu%C5%82_13__Uczenie_si%C4%99_ze_wzmocnieniem, 2008-03-07 (in Polish).



Computer Systems Engineering 2008 Keywords: identification, parameter estimation, errors-in-variables

Tomasz LARKOWSKI∗ Jens G. LINDEN∗ Keith J. BURNHAM∗

RECURSIVE BIAS-ELIMINATING LEAST SQUARES ALGORITHM FOR BILINEAR SYSTEMS

The paper presents a recursive approach for the identification of single-input single-output discrete-time time-invariant errors-in-variables bilinear system models. The technique is based on the extension of the bias compensated least squares and bias-eliminating least squares methods to bilinear systems. Consequently, since the constituent algorithms are constructed within the least squares framework, the required computational burden is relatively low. A numerical simulation study compares the proposed algorithm to other EIV methods.

1. INTRODUCTION

The errors-in-variables (EIV) framework addresses the identification of systems where all the measured variables are corrupted by noise. This formulation extends the standard approach, which postulates that only the output signals are uncertain and the input is known exactly. The EIV class of problems is found to be most important when the determination of the internal physical laws describing the system is of prime interest, as opposed to the prediction of the system output [17]. The EIV approach has gained increased attention during the last decade in various wide ranging engineering, scientific and socio-economic fields. A detailed description of the existing approaches can be found in [13, 17, 18, 19]. A diagrammatic illustration of the standard EIV system setup is presented in Figure 1. The variables u0k, y0k denote the noise free input/output, uk, yk the measured input/output signals and ũk, ỹk the additive noise sequences corrupting u0k, y0k, respectively. Consequently, the true input and output signals, i.e. u0k and y0k, respectively,∗

∗ Control Theory and Applications Centre, Coventry, UK



Fig. 1. Typical EIV system setup.

are only available via the noisy measurements
uk = u0k + ũk,    yk = y0k + ỹk.                                                        (1)

In the field of modelling of nonlinear systems, bilinear system (BS) models have been exploited to advantage in various practical applications, e.g. control plants, biological and chemical phenomena, earth and sun science, nuclear fission, fault diagnosis and supervision, see [5, 14] or [15]. The fact that BS models are so widely applicable gives rise to the need to extend the EIV approaches developed for linear systems to encompass the BS case. One interesting approach for linear EIV systems is the so-called bias-eliminating least squares (BELS), first proposed in [21, 22, 23]. The BELS method was subsequently analysed in [6] and [7], whereas its extension to handle BS was proposed in [12]. This paper addresses the recursive realisation of the bilinear bias-eliminating least squares (BBELS) technique.

2. NOTATION AND PROBLEM STATEMENT

BS models are characterised by a nonlinearity in the form of a product between the input and the state. In general, regarding the state-space representation, a discrete time-invariant single-input single-output (SISO) BS can be described by, see [2]:
x_{k+1} = A x_k + B u0k + G u0k x_k,    x_0 = x̄_0,                                     (2a)
y0k = C x_k + D u0k,                                                                    (2b)

where x_k ∈ R^nx denotes the state vector and x̄_0 its initial value, with u0k ∈ R and y0k ∈ R being the noise-free input and output sequences, respectively. The time-invariant matrices A, B, C, D and G are of appropriate dimension and characterise the


dynamical behaviour of the system. It is to be noted that an input dependent system matrix can be expressed as A(u0k) = A + u0k G, yielding input dependent steady-state and dynamic characteristics. Different methods for the discretisation of the continuous BS can be found, see [3] or [4]. In this paper, attention is placed on a particular class of the discrete time-invariant SISO BS that can be represented by the following nonlinear autoregressive with exogenous input process, i.e.
A(q^-1) y0k = B(q^-1) u0k + Σ_{i=1}^{nη} ηii u0,k−i y0,k−i,                             (3)
where nη ≤ nb ≤ na and q^-1 is the backward shift operator, defined by x_k q^-1 ≜ x_{k−1}. The polynomials A(q^-1) and B(q^-1) are given as follows
A(q^-1) ≜ 1 + a1 q^-1 + ... + ana q^-na,                                                (4a)
B(q^-1) ≜ b1 q^-1 + ... + bnb q^-nb.                                                    (4b)

A discussion regarding the state-space realisability of the input/output description of BS can be found in [8, 9]. The BS given by (3) belongs to the class of diagonal BS (see [16] for more details). Diagonal BS models are possibly the most commonly utilised class of BS for the purpose of industrial applications, see [1, 5, 20]. Furthermore, the diagonal BS exhibit a crucial property of interest, namely, that there exists no correlation in the bilinear terms between the coupled input and output signals, i.e. E[u0,k−i y0,k−i] = 0, where E[·] denotes the expected value operator, see [16]. Additionally, although not exploited here, the state-space realisability of the diagonal BS is also guaranteed, see [9]. In the remainder of this paper reference will be made to the diagonal BS exclusively. The following assumptions are postulated:
A1. The diagonal BS is time-invariant, asymptotically stable, observable and controllable.
A2. The system structure, i.e. na, nb and nη, is known a priori.
A3. The true input is white, zero mean, bounded and persistently exciting of sufficiently high order.
A4. The corrupting input/output noise sequences are zero mean, ergodic, white signals with unknown variances σũ and σỹ, respectively, mutually uncorrelated and uncorrelated with the noise free signals u0k and y0k, respectively.


With reference to the linear case, the assumptions postulated here are typical in the EIV framework, see [17]. Whilst this property is not true for the general class of BS, A3 implies that E[y0k] = 0, see [16]. The system parameter vector is defined as
θ^T ≜ [a^T b^T η^T] ∈ R^{nθ},                                                           (5)
where
a^T ≜ [a1 ... ana] ∈ R^{na},                                                            (6a)
b^T ≜ [b1 ... bnb] ∈ R^{nb},                                                            (6b)
η^T ≜ [η11 ... ηnηnη] ∈ R^{nη},                                                         (6c)

with nθ = na + nb + nη. The regressor vectors for the measured data are given by
ϕ_k^T ≜ [ϕ_yk^T ϕ_uk^T ϕ_ρk^T] ∈ R^{nθ},                                                (7)
where
ϕ_yk ≜ [−y_{k−1} ... −y_{k−na}] ∈ R^{na},                                               (8a)
ϕ_uk ≜ [u_{k−1} ... u_{k−nb}] ∈ R^{nb},                                                 (8b)
ϕ_ρk ≜ [ρ_{k−1} ... ρ_{k−nη}] ∈ R^{nη},                                                 (8c)
and the notation
ρ_{k−i} ≜ u_{k−i} y_{k−i}                                                               (9)

is used to denote the bilinear product terms. The corresponding noise contributions in the regressor vectors are denoted with a tilde, e.g. ϕ̃_k, whereas the noise free signals are denoted with a zero subscript, e.g. ϕ_0k. The notation Σ_cd is used as a general notion for the covariance matrix of the vectors c_k and d_k, whereas ξ_cf is utilised for a covariance vector with f_k being a scalar, i.e.
Σ_cd ≜ E[c_k d_k^T],    Σ_c ≜ E[c_k c_k^T],    ξ_cf ≜ E[c_k f_k].                       (10)
The corresponding estimates, denoted by [ˆ·], are given as
Σ̂_cd ≜ (1/N) Σ_{k=1}^{N} c_k d_k^T,    Σ̂_c ≜ (1/N) Σ_{k=1}^{N} c_k c_k^T,    ξ̂_cf ≜ (1/N) Σ_{k=1}^{N} c_k f_k,    (11)


where N denotes the number of data samples. In addition, 0_{g×h} and I_g denote the null matrix of arbitrary dimension g × h and the identity matrix of arbitrary dimension g, respectively. The dynamic identification problem for diagonal BS in the EIV framework considered here is formulated as follows:
Problem 1 [Dynamic diagonal BS EIV identification problem]
Given N samples of the measured signals {u_k}_{k=1}^{N} and {y_k}_{k=1}^{N}, determine the vector
ϑ^T ≜ [θ^T σũ σỹ] ∈ R^{nθ+2}.                                                           (12)

3. BIAS COMPENSATED LEAST SQUARES

This section provides a brief review of the bias compensated least squares technique for diagonal BS, see [10, 12] for further details. The bilinear bias compensated least squares (BBCLS) algorithm for the class of diagonal BS comprises equations (13a), (13b) and (13c), see [12]. These correspond to the bilinear bias compensation rule, the noise covariance matrix and the auto-correlation of the noise on the bilinear terms, respectively, i.e.
θ̂_BBCLS ≜ (Σ_ϕ − Σ_ϕ̃)^-1 ξ_ϕy,                                                        (13a)
Σ_ϕ̃ ≜ [ σỹ I_na   0         0
         0         σũ I_nb   0
         0         0         σρ̃ I_nη ],                                                (13b)
σρ̃ ≜ σu σỹ + σy σũ − σũ σỹ,                                                            (13c)
where σu ≜ E[u_k^2] and σy ≜ E[y_k^2] are the variances of the measured system input and output signals, respectively. Equation (13a) can alternatively be re-expressed as
θ̂_BBCLS = θ̂_LS + Σ_ϕ^-1 Σ_ϕ̃ θ̂_BBCLS,                                                (14)

where θ̂_LS denotes the least squares (LS) estimate. A recursive realisation of the BBCLS approach was presented in [11]. It is implied from the BBCLS algorithm that knowledge of the noise variances corrupting the input/output of a system, together with the variances of the measured input/output signals, is sufficient to obtain unbiased estimates of the true system parameters. Whilst the variances of the input/output signals can be estimated directly from the available measurements, two more equations are required to determine the variances of the input/output noise sequences. This issue is addressed in the subsequent section.
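As an illustration of (13)-(14), the NumPy sketch below iterates the bias-compensation rule for given (assumed known) noise variances; it is a hedged example with hypothetical names and data layout, not the authors' implementation.

import numpy as np

def bbcls(Phi, y, n_a, n_b, n_eta, sigma_y_t, sigma_u_t, sigma_rho_t, iters=20):
    """Phi: N x (n_a+n_b+n_eta) regressor matrix, y: output vector;
       sigma_*_t: assumed known noise variances entering (13b)."""
    N = Phi.shape[0]
    Sigma_phi = Phi.T @ Phi / N              # sample covariance of regressors
    xi_phi_y = Phi.T @ y / N                 # sample cross-covariance with output
    Sigma_noise = np.diag(np.concatenate([
        np.full(n_a, sigma_y_t),
        np.full(n_b, sigma_u_t),
        np.full(n_eta, sigma_rho_t)]))
    theta_ls = np.linalg.solve(Sigma_phi, xi_phi_y)
    theta = theta_ls.copy()
    for _ in range(iters):                   # fixed-point iteration of (14)
        theta = theta_ls + np.linalg.solve(Sigma_phi, Sigma_noise @ theta)
    return theta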


4. OFFLINE BILINEAR BIAS ELIMINATING LEAST SQUARES

In this section the offline BBELS algorithm is briefly reviewed. In general, for the BELS based approaches the two additional equations needed to determine the input and output noise variances are formed by considering an overparametrised system, i.e. a system with an augmented parameter in the A(q^-1) or B(q^-1) polynomial. Since in both cases the considerations are analogous, the first option is considered here. The overparametrised system (3) is given by
Ă(q^-1) y0k = B(q^-1) u0k + Σ_{i=1}^{nη} ηii u0,k−i y0,k−i,                             (15)
where
Ă(q^-1) ≜ 1 + a1 q^-1 + ... + ana q^-na + ă_{na+1} q^-(na+1).                           (16)
The additional parameter, denoted with a breve, is null by definition, i.e.
ă_{na+1} ≜ 0,                                                                           (17)
such that the augmented system (15) is formally equivalent to the system (3). The parameter vector corresponding to Ă(q^-1) is
ă^T ≜ [a^T ă_{na+1}] ∈ R^{na+1}.                                                        (18)
The augmented parameter vector, denoted θ̆, is given by
θ̆^T ≜ [ă^T b^T η^T] ∈ R^{nθ+1}.                                                        (19)
The augmented regressor vector for the measurements is defined as
ϕ̆_k^T ≜ [ϕ̆_yk^T ϕ_uk^T ϕ_ρk^T],                                                       (20)
where
ϕ̆_yk^T ≜ [ϕ_yk^T −y_{k−na−1}].                                                         (21)
The BBCLS scheme for the augmented system (15), in accordance with (14), is given by
θ̆̂_BBCLS = θ̆̂_LS + Σ_ϕ̆^-1 Σ_ϕ̆̃ θ̆̂_BBCLS                                             (22a)


with
Σ_ϕ̆̃ ≜ [ σỹ I_{na+1}   0         0
          0             σũ I_nb   0
          0             0         σρ̃ I_nη ].                                           (22b)
The utilisation of (17) implies that the following linear constraint must be satisfied
H^T θ̆ = 0,                                                                             (23)
where
H^T ≜ [h^T 0 ... 0] ∈ R^{nθ+1},                                                         (24a)
h^T ≜ [0 ... 0 1] ∈ R^{na+1}.                                                           (24b)
The following notation for the inverse of the matrix Σ_ϕ̆ is introduced
Σ_ϕ̆^-1 ≜ [ Σ^11       Σ^12       Σ^13
            (Σ^12)^T   Σ^22       Σ^23
            (Σ^13)^T   (Σ^23)^T   Σ^33 ] ∈ R^{(nθ+1)×(nθ+1)},                           (25)
where Σ^11 ∈ R^{(na+1)×(na+1)}, Σ^12 ∈ R^{(na+1)×nb}, Σ^13 ∈ R^{(na+1)×nη}.

Lemma 1. Considering the overparametrised system (15), the following equality holds
−h^T ă̂_LS = σỹ h^T Σ^11 ă + σũ h^T Σ^12 b + σρ̃ h^T Σ^13 η.                            (26)
Proof 1. See [12] for details.

Lemma 2. The asymptotic expression for the expected error of the LS method, denoted (1/L) V(θ̆̂_LS), where L = N − na, with respect to the overparametrised system (15) is given by
lim_{N→∞} (1/L) V(θ̆̂_LS) = σỹ (1 + ă̂_LS^T ă) + σũ b̂_LS^T b + σρ̃ η̂_LS^T η.          (27)
Proof 2. See [12] for details.

Merging the BBCLS rule with Lemmas 1 and 2 allows the identification problem specified by Problem 1 to be solved. Denoting the iteration index by i and the maximum number of iterations by Imax, the offline BBELS algorithm is summarised as:


Algorithm 1 (BBELS algorithm)
1. Compute θ̆̂_LS, σ̂_u, σ̂_y and set i = 0, θ̆̂_BBELS^i = θ̆̂_LS, σ̂_ρ̃^i = 0
while i < Imax do
2. i = i + 1
3. Solve the 2×2 linear system (obtained from (26) and (27)) for σ̂ỹ^i and σ̂ũ^i:
   σ̂ỹ^i h^T Σ^11 ă̂_BBELS^{i−1} + σ̂ũ^i h^T Σ^12 b̂_BBELS^{i−1} = −h^T ă̂_LS − σ̂ρ̃^{i−1} h^T Σ^13 η̂_BBELS^{i−1},
   σ̂ỹ^i (1 + ă̂_LS^T ă̂_BBELS^{i−1}) + σ̂ũ^i b̂_LS^T b̂_BBELS^{i−1} = (1/L) V(θ̆̂_LS) − σ̂ρ̃^{i−1} η̂_LS^T η̂_BBELS^{i−1}
4. Calculate: σ̂_ρ̃^i = σ̂_u σ̂ỹ^i + σ̂_y σ̂ũ^i − σ̂ũ^i σ̂ỹ^i
5. Compute: θ̆̂_BBELS^i = θ̆̂_LS + Σ_ϕ̆^-1 Σ_ϕ̆̃^i θ̆̂_BBELS^{i−1}
end

5. ONLINE BILINEAR BIAS ELIMINATING LEAST SQUARES

Since Algorithm 1 is iterative, it can easily be transformed into a recursive scheme. This requires online updates of the expressions Σ_ϕ̆^-1, V(θ̆̂_LS), σ̂ũ and σ̂ỹ. The update of the matrix Σ_ϕ̆^-1 is carried out utilising the recursive LS (RLS) algorithm [24]. Denoting P̆_k = (Σ_ϕ̆)^-1 and introducing a user chosen k0 > na, the recursive BBELS (RBBELS) algorithm is given by:

Algorithm 2 (RBBELS algorithm)
1. Set k = k0, P̆_k = 10^3 I_{nθ+1}, σ̂_u^k = (1/k) Σ_{i=1}^{k} u_i^2, σ̂_y^k = (1/k) Σ_{i=1}^{k} y_i^2.
   Compute θ̆̂_LS^k and set θ̆̂_BBELS^k = θ̆̂_LS^k, σ̂_ρ̃^k = 0
for k = k0 + 1 ... N
2. Calculate
   L_k = P̆_{k−1} ϕ̆_k (1 + ϕ̆_k^T P̆_{k−1} ϕ̆_k)^-1
   θ̆̂_LS^k = θ̆̂_LS^{k−1} + L_k (y_k − ϕ̆_k^T θ̆̂_LS^{k−1})
   P̆_k = P̆_{k−1} − L_k ϕ̆_k^T P̆_{k−1}
   V(θ̆̂_LS^k) = (y_k − ϕ̆_k^T θ̆̂_LS^k)^2 + (k − na − 2) V(θ̆̂_LS^{k−1})
3. Solve the 2×2 linear system for σ̂ỹ^k and σ̂ũ^k:
   σ̂ỹ^k h^T Σ_k^11 ă̂_BBELS^{k−1} + σ̂ũ^k h^T Σ_k^12 b̂_BBELS^{k−1} = −h^T ă̂_LS^k − σ̂ρ̃^{k−1} h^T Σ_k^13 η̂_BBELS^{k−1},
   σ̂ỹ^k (1 + (ă̂_LS^k)^T ă̂_BBELS^{k−1}) + σ̂ũ^k (b̂_LS^k)^T b̂_BBELS^{k−1} = (1/(k − na − 1)) V(θ̆̂_LS^k) − σ̂ρ̃^{k−1} (η̂_LS^k)^T η̂_BBELS^{k−1}
4. Calculate
   σ̂_u^k = ((k − 1)/k) σ̂_u^{k−1} + (1/k) u_k^2
   σ̂_y^k = ((k − 1)/k) σ̂_y^{k−1} + (1/k) y_k^2
   σ̂_ρ̃^k = σ̂_u^k σ̂ỹ^k + σ̂_y^k σ̂ũ^k − σ̂ũ^k σ̂ỹ^k
5. Compute: θ̆̂_BBELS^k = θ̆̂_LS^k + P̆_k Σ_ϕ̆̃^k θ̆̂_BBELS^{k−1}
end

Note that the recursive computation of the expression V(θ̆̂_LS^k) is based on the present estimate of the parameter vector θ̆_LS^k, i.e. at the time instance k. This introduces a systematic error to the expression V(θ̆̂_LS^k), which is propagated subsequently through the entire identification procedure. Therefore, the value of V(θ̆̂_LS^N) computed at the last time instant will not, in general, be equal to the corresponding value obtained from the offline BBELS algorithm. Consequently, the estimates of the vector ϑ resulting from the online and offline algorithms will also differ. This issue is crucial in the initial period of the identification, as the estimates of θ̆_LS are of very low quality and hence can have a significant effect on the accuracy as well as on the convergence of the entire recursive algorithm. In order to alleviate this problem, the offline expression for the calculation of V(θ̆̂_LS^k) can be used during the first M recursions. Although this will not eliminate the total mismatch, the effect of the first M − 1 imprecise estimates of θ̆_LS is removed and the accuracy of the estimated ϑ is improved. Another possibility could


be to utilise the online update of V(θ̆̂_LS^k) with the offline expression being used every m recursions over a data window of a fixed length l ≪ N. Moreover, it is also noted that, according to Lemmas 1 and 2, the computation of σũ^k and σỹ^k (also in the case of the offline algorithm) involves the term σ̂ρ̃^k, which is approximated by its previous value, i.e. the value at the time instance k − 1. This procedure introduces an additional degree of approximation and hence also uncertainty to the overall RBBELS algorithm. It must be stated that the BBELS algorithm, and hence also the RBBELS approach, is not always convergent, especially for low signal-to-noise ratios. An extensive study regarding the convergence of the BELS method for linear systems was presented in [7]. It was shown that the convergence of the BELS based techniques depends not only on the values of the signal-to-noise ratios on the input and output but also on the particular model structure. Consequently, it is conjectured that the BBELS and thus the RBBELS approach are also subject to this property.
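The core recursion of step 2 is the standard RLS update; a minimal NumPy sketch is given below. The names, data layout and the simplified loss bookkeeping are assumptions made for this example, not the authors' code.

import numpy as np

def rls_step(theta, P, V, phi, y):
    """One RLS recursion: phi is the current (augmented) regressor vector,
       P the inverse information matrix, theta the running LS estimate and
       V a simple running sum of squared residuals."""
    phi = phi.reshape(-1, 1)
    L = P @ phi / (1.0 + float(phi.T @ P @ phi))        # gain vector L_k
    residual = y - (phi.T @ theta).item()
    theta = theta + L.flatten() * residual               # parameter update
    P = P - L @ (phi.T @ P)                              # covariance update
    V = V + residual ** 2                                # loss bookkeeping (simplified)
    return theta, P, V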

6. SIMULATION STUDIES

This section provides a numerical evaluation and comparison of the proposed RBBELS approach with the RLS and the offline BBELS algorithms. A SISO diagonal BS with na = 2, nb = nη = 1 is simulated for N = 10000 samples. The parameter vector to be identified is given by
ϑ^T = [1.200 0.900 0.600 0.100 0.016 0.005].                                            (28)

The input sequence is white and uniformly distributed with |u0k| < 0.354. The selected values of the input and output noise variances correspond to signal-to-noise ratios of approximately 9 dB on both the input and the output. In the case of the BBELS algorithm the iterations are restricted to 10, i.e. Imax = 10. The RBBELS approach is initialised with k0 = 100. The results of the estimation procedure for a single particular realisation of the simulation are depicted in Figure 2 and Figure 3. Considering Figure 2, a significant bias is seen on the estimates corresponding to the RLS. The BBELS approach obtained estimates of the model parameter vector relatively close to their true values. It is noted that the estimates of θ produced by the RBBELS converge to their offline counterparts over the successive recursions. Moreover, the RBBELS algorithm was able to achieve estimates virtually indistinguishable from the BBELS approach at the last recursion step, i.e. for k = N. Analogous findings are noted when considering Figure 3. These observations can be seen as an indication


Fig. 2. The results of the identification procedure using RBBELS, BBELS and RLS algorithms.

supporting the appropriateness of the recursive scheme. Furthermore, the scattered values of the estimates of the vector ϑ in the initial part of the identification procedure can be related to the relatively low precision of the estimated variances of input and output signals, i.e. σ̂u and σ̂y , respectively.

7. CONCLUSIONS

An approach for the online identification of discrete time-invariant single-input single-output errors-in-variables diagonal bilinear systems has been presented. The technique is based on the extension of the bias compensated least squares and bias-eliminating least squares methods to bilinear systems. Since it is based on the least squares principle, the required computational burden is relatively low. The proposed method has been demonstrated via a numerical study. Further work could aim to relax the assumption referring to the whiteness of the input signal.


Fig. 3. The results of the identification procedure using RBBELS, BBELS and RLS algorithms.

REFERENCES
[1] BURNHAM K. J., Self-tuning Control for Bilinear Systems, PhD thesis, Coventry Polytechnic, 1991.
[2] DUNOYER A., Bilinear Self-tuning Control and Bilinearisation of Nonlinear Industrial Systems, PhD thesis, Coventry University, 1996.
[3] DUNOYER A., BALMER L., BURNHAM K. J. and JAMES D. J. G., On the discretisation of single-input single-output bilinear systems, Int. J. of Control, vol. 68(2), pp. 361-372, 1997.
[4] EKMAN M., Identification of linear systems with errors in variables using separable nonlinear least squares, In Proc. of 16th IFAC World Congress, Prague, Czech Republic, 2005.
[5] EKMAN M., Modeling and Control of Bilinear Systems: Applications to the Activated Sludge Process, PhD thesis, Uppsala University, 2005.
[6] HONG M., SÖDERSTRÖM T. and ZHENG W. X., A simplified form of the bias-eliminating least squares method for errors-in-variables identification, IEEE Trans. on Automatic Control, vol. 52(9), pp. 1754-1756, 2007.
[7] HONG M., SÖDERSTRÖM T. and ZHENG W. X., Convergence properties of bias-eliminating algorithms for errors-in-variables identification, Int. J. of Adaptive Control and Signal Proc., vol. 19(9), pp. 703-722, 2005.
[8] KOTTA U. and MULLARI T., Equivalence of realizability conditions for nonlinear control systems, In Proc. of the Estonian Academy of Sciences, Physics, Mathematics, vol. 55(1), pp. 24-42, 2006.
[9] KOTTA U., NOMM S. and ZINOBER A. S. I., On state space realizability of bilinear systems described by higher order difference equations, In Proc. of 42nd IEEE Conf. on Decision and Control, vol. 6, pp. 5685-5690, 2003.
[10] LARKOWSKI T., LINDEN J. G., VINSONNEAU B. and BURNHAM K. J., Identification of dynamic errors-in-variables models for bilinear systems, In Proc. of 7th Int. Conf. on Technical Informatics, Timisoara, Romania, 2008.
[11] LARKOWSKI T., LINDEN J. G., VINSONNEAU B. and BURNHAM K. J., Recursive bias-compensating algorithm for the identification of dynamical bilinear systems in the errors-in-variables framework, In Proc. of 5th Int. Conf. on Informatics in Control, Automation and Robotics, Funchal, Madeira, Portugal, 2008.
[12] LARKOWSKI T., VINSONNEAU B. and BURNHAM K. J., Bilinear model identification in the errors-in-variables framework via the bias-compensating least squares, In IAR and ACD Int. Conf., Grenoble, France, 2007.
[13] MARKOVSKY I., WILLEMS J. C., VAN HUFFEL S. and DE MOOR B., Exact and Approximate Modeling of Linear Systems: A Behavioral Approach, Monographs on Mathematical Modeling and Computation, SIAM, 2006.
[14] MOHLER R. R., Nonlinear Systems: Applications to Bilinear Control, volume 2, Prentice Hall, Englewood Cliffs, NJ, 1991.
[15] MOHLER R. R. and KHAPALOV A. Y., Bilinear control and application to flexible a.c. transmission systems, J. of Optimization Theory and Applications, vol. 105(3), pp. 621-637, 2000.
[16] PEARSON R. K., Discrete-Time Dynamic Models, Oxford University Press, New York, USA, 1999.
[17] SÖDERSTRÖM T., Errors-in-variables methods in system identification, Automatica, vol. 43, pp. 939-958, 2007.
[18] SÖDERSTRÖM T., SOVERINI U. and MAHATA K., Perspectives on errors-in-variables estimation for dynamic systems, Signal Proc., vol. 82(8), pp. 1139-1154, 2002.
[19] VAN HUFFEL S. and LEMMERLING P., Total Least Squares and Errors-in-variables Modeling: Analysis, Algorithms and Applications, Kluwer Academic Publishers, The Netherlands, 2002.
[20] YU D., GOMM J. B., SHIELDS D. N., WILLIAMS D. and DISDELL K., Fault diagnosis for a gas-fired furnace using a bilinear observer method, In Proc. of American Control Conf., vol. 2, pp. 1127-1131, 1995.
[21] ZHENG W. X., On a least-squares-based algorithm for identification of stochastic linear systems, IEEE Trans. on Signal Proc., vol. 46, pp. 1631-1638, 1998.
[22] ZHENG W. X., Transfer function estimation from noisy input and output data, Int. J. of Adaptive Control and Signal Proc., vol. 12, pp. 365-380, 1998.
[23] ZHENG W. X., A bias correction method for identification of linear dynamic errors-in-variables models, IEEE Trans. on Automatic Control, vol. 47, pp. 1142-1147, 2002.
[24] LJUNG L., System Identification - Theory for the User, Prentice Hall PTR, New Jersey, USA, 1999.



Computer Systems Engineering 2008 Keywords: network optimization, heuristic

Krzysztof LENARSKI∗ Andrzej KASPRZAK∗ Piotr SKWORCOW†

ADVANCED TABU SEARCH STRATEGIES FOR TWO-LAYER NETWORK DIMENSIONING PROBLEM

This paper concerns the use of tabu search-based strategies to solve a two-layer network dimensioning problem. A modular case of the problem in networks with non-bifurcated flows is presented. Since this problem is NP-hard (its decision version is NP-complete), it is highly unlikely that it can be solved in a reasonable time (for large networks) using exact methods. The main goal of this paper is to examine advanced tabu search strategies such as long term memory and return jump methods. A computer experimentation system has been developed to carry out simulations and complex experiments. Example results of experiments are demonstrated and discussed.

1. INTRODUCTION

Computer networks are becoming more and more utilized due to the increasing popularity of applications that require high bandwidth. Because of the growing number of network users, it is important for the network to be designed properly and in a reliable manner [1]. Significant attention is therefore given to network design issues. Improving the functional quality of a network by just a few percent may reduce leasing costs by as much as a thousand dollars per month [2]. The resources (links and nodes) of communication and computer networks are configured in a multi-layered fashion, forming a hierarchical structure with each layer being a proper network on its own. The links of an upper layer are formed using paths of the lower layer, and this pattern repeats as one goes down the resources hierarchy. This may result in a hierarchy of network providers, i.e. some providers may own resources only at one or two neighbouring layers of a network [3].

∗ Department of Systems and Computer Networks, Wroclaw University of Technology, Poland.
† Water Software Systems, De Montfort University, Leicester, UK.



Emerging architectures and technologies for Next Generation Internet (NGI) core networks introduce a whole spectrum of new functions and features. The Internet Protocol (IP), enhanced with MPLS traffic engineering capabilities, is being used today and is foreseen to be implemented in NGI networks. Furthermore, recent advances in wavelength division multiplexing (WDM) technology make it a strong candidate for the basic technology in the next generation optical transport network (OTN). Thus one of the possible architectures for NGI is IP/MPLS-over-(D)WDM [4]. In this work a modular case of two-layer dimensioning of networks with non-bifurcated flows is considered. This is an NP-complete problem and the path-link formulation results in a large routing list that has to be predefined [5]. This paper concerns a tabu search (TS) approach to solving a design problem for two-layer connection-oriented networks. The proposed algorithm is a meta-heuristic which guides a local heuristic search procedure to explore the solution space beyond local optimality. Tabu search is based on the concept that problem solving, in order to qualify as intelligent, must incorporate adaptive memory and responsive exploration. Further information about this topic can be found e.g. in [6-9]. The main goal of this work is to examine tabu search and to determine its most efficient parameters. The rest of this paper is organized as follows. Section 2 contains the problem formulation. Section 3 is an overview of the proposed solution. Section 4 presents the development of an experimentation system. In Section 5 the results of investigations are presented and discussed. Finally, Section 6 contains final remarks.

Fig. 1. Two layer network example



2. PROBLEM STATEMENT

A network model is represented as two undirected finite graphs G = (N, L), where N is a set of nodes and L is a set of graph edges. An example of a two resource layer network is shown in Fig. 1. The demands are directed from one node to another and the sum of demands in both directions needs to be lower than or equal to the link capacity. The network is also characterised by constants, such as: volumes of demands, link capacities in the upper and lower layer, and their costs. The problem can be formulated as follows.

Two-layer dimensioning - Link-Path formulation

indices
d = 1, 2, ..., D     demands,
p = 1, 2, ..., Pd    candidate paths in the upper layer for flows realizing demand d,
e = 1, 2, ..., E     links of the upper layer,
q = 1, 2, ..., Qe    candidate paths in the lower layer for flows realizing link e,
g = 1, 2, ..., G     links of the lower layer.

constants
hd       volume of demand d,
δedp     = 1 if link e of the upper layer belongs to path p realizing demand d; 0 otherwise,
M        size of the link capacity module in the upper layer,
ξe       cost of one (M-module) capacity unit of link e of the upper layer,
γgeq     = 1 if link g of the lower layer belongs to path q realizing link e of the upper layer; 0 otherwise,
N        size of the link capacity module in the lower layer,
κg       cost of one (N-module) capacity unit of link g of the lower layer.



variables
xdp      flow allocated to path p realizing the volume of demand d,
udp      binary variable associated with path p realizing demand d,
ye       number of M-module capacity units of link e,
zeq      flow allocated to path q realizing the capacity of link e,
req      binary variable associated with path q realizing link e,
ug       N-module capacity of lower layer link g.

objective
minimize  F = Σ_e ξe ye + Σ_g κg ug                                                     (1)

constraints
Σ_p udp = 1,                          d = 1, 2, ..., D                                  (2)
Σ_d Σ_p δedp xdp ≤ M ye,              e = 1, 2, ..., E                                  (3)
Σ_q zeq = ye,                         e = 1, 2, ..., E                                  (4)
Σ_q req = 1,                          e = 1, 2, ..., E                                  (5)
M Σ_e Σ_q γgeq zeq ≤ N ug,            g = 1, 2, ..., G                                  (6)

Equation (1) is the objective function, defined as the sum of the costs of the lower and upper layer capacity modules. Equations (2)-(6) represent the constraints: equations (2) and (5) state that each demand must be realized on a single path (non-bifurcated flows), equations (3) and (6) state that the flow in each link of the upper and lower layer cannot exceed the link capacity, and equation (4) specifies that all upper layer capacities have to be realized by the lower layer flows.
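As an illustration, for a fixed routing the objective (1) and the module counts implied by (3) and (6) can be evaluated as in the Python sketch below; the data structures and names are assumptions made for this example, not part of the formulation above.

import math

def dimension_cost(upper_flows, lower_flows, M, xi, N, kappa):
    """upper_flows[e]: total demand volume routed over upper-layer link e,
       lower_flows[g]: total upper-layer capacity routed over lower-layer link g,
       xi[e], kappa[g]: module costs; M, N: module sizes."""
    y = [math.ceil(f / M) for f in upper_flows]     # modules per upper link, cf. (3)
    u = [math.ceil(f / N) for f in lower_flows]     # modules per lower link, cf. (6)
    F = sum(xi[e] * y[e] for e in range(len(y))) + \
        sum(kappa[g] * u[g] for g in range(len(u)))
    return F, y, u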



3. PROPOSED SOLUTION METHODS

Having determined candidate paths for all demands, we can try to find a solution to the stated problem using the tabu search algorithm. The following subsections consider the neighborhood structure and moves, the tabu search memories and the tabu structure.

3.1. NEIGHBORHOOD STRUCTURE AND MOVES

The neighborhood in our case is a list of objects (see the example in Table 1) containing information such as:
• start node of the demand (Start N),
• end node of the demand (End N),
• list of specified paths connecting the above nodes (Path),
• tabu tenure parameter (TT).
The paths are divided into two sets: active (A) and inactive (I) paths. The active paths are considered in the current solution and the inactive ones are waiting in a queue to be considered. The moves that define the neighborhood structure consist of transferring a chosen path from one subset to the other.

3.2. TABU SEARCH MEMORIES

The core of the algorithm is the use of TS memory structures to guide the search process. A short term memory is employed principally to prevent the search from being trapped in a local optimum, and also to introduce vigor into the search process. A long term memory is used to handle more advanced issues, such as intensification and diversification.
Short term memory. The short term memory operates by imposing restrictions on the composition of a newly generated solution.

Table 1. Neighborhood object

Start N.   End N.   P.N.   Path              TT
1          3        1      <1, 2, 4>         3
                    2      <1, 5, 4>         0
                    3      <1, 2, 3, 4>      1
                    4      <1, 2, 5, 4>      0
                    5      <1, 5, 2, 4>      0
                    6      <1, 5, 2, 3, 4>   5


For elementary moves, we impose restrictions which ensure that a move cannot be "reversed". For example, if one of the paths is moved from set I to set A, it is noted in the tabu structure that this move cannot be performed for some period of time. This prevents the algorithm from oscillating around a local optimum. A parameter called tabu tenure identifies the number of iterations during which a particular tabu restriction remains in force. The tabu tenure parameter is examined in Section 5.
Long term memory. The long term memory is used to store the frequency of moves in a specified direction. Changing one route of the demand between nodes x and y increases a parameter called "frequency", associated with this demand, by 1. The long term memory can be used in two ways, either to intensify frequent moves or to diversify them. In our implementation we use it for diversification.

3.3. TABU STRUCTURE

Short and long term memory are implemented as a special structure called the Tabu Structure. It stores two parameters: tabu tenure and frequency. Tabu tenure is stored using a field in every neighborhood object (see Table 1). Frequency is stored for every demand, as shown in Fig. 2.

Fig. 2. Example of tabu structure
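A compact way to realise such a structure in code is sketched below; the class and field names are illustrative assumptions rather than the authors' implementation.

class TabuStructure:
    """Per-path tabu tenures (short term memory) and per-demand move
       frequencies (long term memory)."""
    def __init__(self, tenure=5):
        self.tenure = tenure
        self.tabu = {}        # (demand, path) -> remaining tabu iterations
        self.frequency = {}   # demand -> number of route changes so far

    def is_tabu(self, demand, path):
        return self.tabu.get((demand, path), 0) > 0

    def register_move(self, demand, path):
        self.tabu[(demand, path)] = self.tenure           # forbid reversal
        self.frequency[demand] = self.frequency.get(demand, 0) + 1

    def next_iteration(self):
        for key in list(self.tabu):
            self.tabu[key] -= 1
            if self.tabu[key] <= 0:
                del self.tabu[key]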



4. EXPERIMENTATION SYSTEM

In order to carry out investigations of the stated problem, a computer experimentation system (CES) was developed. CES was implemented in the C# language using the Visual Studio 2005 environment with .NET technology [11, 12]. The system consists of several modules, which are responsible for different functionality. The application allows the user to design complex experiments using the experiment design module, to find paths and solutions for the stated problem (including time measurements) using the computational module and, finally, to compare the results on graphs using the presentation module.
Experiment design module. In this module the user can enter the network data and algorithm parameters. The data describing the network structure and the intensity matrix [13] can be entered manually or loaded from a file. The algorithm parameters can only be entered manually.
Computational module. This module is an implementation of the path finding and tabu search algorithms.
Presentation module. This module is responsible for the presentation of all results obtained during experiments. One can summarize the results on a chart and save the results to either a txt or an xls file.

5. INVESTIGATIONS

Four two-layer networks were considered during the investigations; their parameters are summarised in Table 2. All of these networks have some similarity to real, existing networks. The lower layers are problem instances from SNDlib 1.0, which is a library of test instances for fixed telecommunication network design [14]. The upper layers are composed from the lower layers, with some nodes randomly removed. The demands and the sizes of modules M used in the investigations were also taken from SNDlib. For the fourth network there was no problem instance with modular capacity, hence some arbitrary values, given in Table 3, were used. It was assumed that each link has the same module size and cost, both for the upper and the lower layer. In this paper the results of three experiments are presented. In the first experiment the number of iterations was investigated. The second part considered how the results were affected by different tabu tenure values. Finally, in the third one the use of advanced strategies was investigated. All the investigations were conducted on each network and repeated 10 times to make the obtained results more representative. Experiments were conducted on a PC with an AMD Sempron 2500+ (1.8 GHz) processor and 512 MB of RAM.


Table 2. Networks parameters

Network                       Upper lay.   Lower lay.   Dem.
atlanta     Number of nodes   13           15           13
            Number of links   17           22           210
poland      Number of nodes   11           12           11
            Number of links   15           17           66
dfn-bwin    Number of nodes   9            10           9
            Number of links   36           45           90
di-yuan     Number of nodes   10           11           10
            Number of links   28           42           22

5.1. EXPERIMENT 1. NUMBER OF ITERATIONS

Experiments 1 and 2 investigated how the results depend on different tabu search parameters. The number of algorithm iterations was increased while observing what influence it had on the results and on the computation time. Results for the "atlanta" network are shown in Fig. 3, and for the other networks in Table 4.

Table 3. Module capacities and costs

Name       M      Cost M    N      Cost N
atlanta    1000   950000    4000   1090000
poland     155    272       622    816
dfn-bwin   1000   44400     4000   54400
di-yuan    2      4         100    150



Fig. 3. Experiment 1: Cost and computation times for different number of iterations for “atlanta” network.

As expected, a lower cost was obtained when more iterations were performed. In the first part of the chart in Fig. 3 it can be noticed that the cost decreased steeply, but after exceeding 300 iterations further improvements became insignificant.

5.2. EXPERIMENT 2. TABU TENURE

The results of the investigation of how the results are affected by different tabu tenure values are summarised in Table 5.

Table 4. Experiment 1. Results

TT     poland     dfn-bwin   di-yuan
10     8.25E+06   1.86E+08   1.33E+05
20     7.36E+06   1.55E+08   1.10E+05
30     6.13E+06   1.51E+08   9.52E+04
40     5.22E+06   1.33E+08   9.05E+04
50     4.89E+06   1.16E+08   8.08E+04
60     4.44E+06   1.08E+08   7.72E+04
70     3.73E+06   9.87E+07   6.98E+04
80     3.09E+06   8.97E+07   6.64E+04
90     2.88E+06   8.78E+07   6.27E+04
100    2.69E+06   8.50E+07   6.22E+04



Tabu tenure parameter had no influence on the algorithm computation time. It can be noticed in Table 5 that for the considered networks the best results were achieved for tabu tenure equal to 4 or 5. Tabu tenure is a crucial parameter for tabu search methods and we it was observed that in worst cases badly chosen value of tabu tenure increased the cost by 13%. 5.3.

5.3. EXPERIMENT 3. ADVANCED STRATEGIES

This experiment investigated how the use of the advanced strategies influences the results. As can be seen in Fig. 4, using diversification did not improve the results and the cost was higher than in the other cases by approximately 30%. The best results were obtained using intensification, with or without back jump, or when the long term memory was not used at all.

6. CONCLUSIONS

This paper investigated the use of tabu search-based strategies to solve a two-layer network dimensioning problem. The main goal of this paper was to examine the impact of advanced tabu search strategies; a computer experimentation system was developed for this purpose.

Table 5. Experiment 2. Results

TT | atlanta  | poland   | dfn-bwin | di-yuan
 1 | 5,09E+08 | 8,41E+07 | 2,13E+08 | 5,80E+04
 2 | 4,88E+08 | 8,45E+07 | 2,04E+08 | 6,16E+04
 3 | 5,12E+08 | 8,23E+07 | 2,06E+08 | 6,08E+04
 4 | 4,82E+08 | 7,96E+07 | 2,14E+08 | 5,63E+04
 5 | 4,90E+08 | 7,67E+07 | 2,03E+08 | 5,80E+04
 6 | 5,44E+08 | 8,13E+07 | 2,17E+08 | 5,72E+04
 7 | 5,26E+08 | 7,81E+07 | 2,14E+08 | 5,64E+04
 8 | 5,36E+08 | 7,41E+07 | 2,13E+08 | 5,78E+04
 9 | 4,97E+08 | 7,48E+07 | 2,19E+08 | 6,04E+04
10 | 4,88E+08 | 7,87E+07 | 2,12E+08 | 6,06E+04



Fig. 4. Experiment 3. Results

Based on the simulation results it may be observed that:
1) the number of iterations (NoI) has a significant influence on the results: the more iterations, the better the result (i.e. the lower the cost); however, after a certain NoI, depending on the network size, a further increase of NoI does not reduce the cost significantly but increases the computation time;
2) the best value of the tabu tenure parameter depends on the network structure and has no effect on the computation time;
3) when the long term memory was used, the best results were achieved with intensification, and the long jump had an insignificant impact on the results in the cases considered.
Further work will include extending the capabilities of the computer experimentation system by adding new problem instances and implementing other metaheuristic methods.

REFERENCES
[1] LENARSKI K. and KOSZALKA L., Comparison of heuristic methods applied to optimization of computer networks, XVI International Conference on Systems Science, Wroclaw, Poland, 4-6 September, 2007.
[2] TANENBAUM A.S., Computer Networks (in Polish), Warszawa, 1988.
[3] PIORO M. and MEDHI D., Routing, Flow, and Capacity Design in Communication and Computer Networks, San Francisco: Morgan-Kaufmann, 2004.
[4] KUBILINSKAS E. and PIORO M., An IP/MPLS over WDM network design problem, International Network Optimization Conference (INOC) 2005, Lisbon, Portugal, 20-23 March, 2005.
[5] KUBILINSKAS E., Notes on application of Path Generation to Multi-layer network design problems with PF flow allocation, 17th Nordic Teletraffic Seminar (NTS 17), Fornebu, Norway, 25-27 August, 2004.
[6] GLOVER F., Tabu search fundamentals and uses, Colorado, 1995.


[7] GLOVER F., Tabu Search - Part I, ORSA Journal on Computing, Vol. 1, No. 3, pp. 190-206, 1989.
[8] GLOVER F., Tabu Search - Part II, ORSA Journal on Computing, Vol. 2, No. 1, pp. 4-32, 1990.
[9] GLOVER F., XU J. and CHIU S.Y., Probabilistic Tabu Search for Telecommunications Network Design, Combinatorial Optimization: Theory and Practice, Vol. 1, No. 1, 1997, pp. 69-94.
[10] DROZDEK A., C++. Algorithms and data structures (in Polish), Gliwice, 2004.
[11] STEFACZYK A., Secrets of C sharp (in Polish), Gliwice, 2005.
[12] PERRY S. C., Core C sharp and .NET, Gliwice, 2006.
[13] KASPRZAK A., Wide area networks with packet switching (in Polish), Wroclaw, 1999, pp. 173-216.
[14] Survivable Network Design Library [Online, 2008]. http://sndlib.zib.de/home.action



Computer Systems Engineering 2008
Keywords: optimization, flow allocation, dimensioning problem, multilayer network

Michał KUCHARZAK* Leszek KOSZAŁKA* Andrzej KASPRZAK*

OPTIMIZATION ALGORITHMS FOR TWO LAYER NETWORK DIMENSIONING

Modern computer networks utilise an integration of different technologies. This has led to a multilayered network model, raised new questions and challenged most of the existing optimization algorithms developed for a single layer. This paper considers flow allocation in multilayered networks and investigates the dimensioning of networks involving two layers of resources. Two different strategies for dimensioning network link capacities, both based on Dijkstra's shortest path algorithm, are compared. The first strategy performs the dimensioning directly from the mean traffic load in the individual layers, from the top layer to the bottom layer of the multilayered network. The second approach considers the actual flow and the multilayered path, regarding the coordination of the individual network layers. Experimentation results and a comparison of the strategies are illustrated and discussed.

1. INTRODUCTION

Modern communication networks can be composed of more than one layer of resources. Multi-layer technology has evolved with many different multi-layering possibilities, e.g., IP networks can be provided over ATM, MPLS, SONET or WDM. In fact, it is possible to have more than two layers, e.g. an IP over ATM over SONET network [8]. Unlike regular networks, these multilayer networks allow users and other networks to interface on different technology layers. The introduction of multilayered network models has resulted in most of the network optimization problems becoming more computationally demanding. Furthermore, many existing, well-understood optimisation problems on single layer networks

* Department of Systems and Computer Networks, Wroclaw University of Technology, Poland.



have changed to be far from trivial on multilayered networks [3]. Problems defined on multilayered networks are more difficult to solve than those stated for flat networks [4]. The integration of different technologies such as ATM, SDH, and WDM in multilayer transport networks raises many questions regarding the coordination of the individual network layers [1]. In this work the authors compare two deterministic methodologies for the off-line solving of dimensioning problems for multilayered networks involving two layers of resources. The first approach performs flow allocation directly from the mean traffic load in the individual layers, from the top layer to the bottom of the multilayered network; the second approach considers the actual flow and the multilayered path, regarding the coordination of the individual network layers. This paper is divided into six main sections. The main idea and a model of multilayered networks are discussed in Section 2, supported by examples and related illustrations. In the same section a logical model of multilayer networks involving two layers of resources is introduced. Section 3 presents the dimensioning optimization problem statement with particular assumptions and constraints. Section 4 describes two deterministic approaches for solving dimensioning problems. The investigations are described in Section 5. This section illustrates the basic principles and a comparison of the algorithms with evaluation results. Final remarks appear in Section 6.

2. MULTILAYER NETWORK

2.1 CONCEPT OF MULTILAYER NETWORKS

Multilayer networks are networks composed of more than one layer of resources [5, 8]. In Fig. 1 the idea of a two-layer network is illustrated as an example. The example shows a network which consists of two layers: a traffic network and a transport network.

Fig. 1. Two-layer network example


The traffic network represents a logical capacity to carry the traffic (the tagged links d are logical). To route these logical links and the associated capacity it is necessary to introduce and set up a transport network. For example, the logical link d=1, between nodes A and C, in the traffic network illustrated in Fig. 1 can be connected with the transport network route A-B-C (i.e. links e=1, e=2). Similarly, a data unit for the logical link d=6, between nodes A and D, can be connected via the transport route A-B-F-D (i.e. links e=1, e=5, e=4). The logical capacities in the traffic layer are realized by means of flows in the physical transport layer.

2.2 MULTILAYER NETWORKS MODELLING

Consider the network example illustrated in Fig. 2 and the model presented in [8]. The network consists of two layers of resources (Layer 1: the equipment, or transport, layer; Layer 2: the Virtual Capacity layer) and an additional auxiliary layer (the demand layer) used merely to specify the demands (the logical capacity to carry the traffic).

Fig. 2. Three-layer network model

For each demand d its demand volume hd is realized by means of flows xdp assigned to paths Pd of Layer 2 (Fig. 3). In accordance with [8], Pd = (Pd1, Pd2, …, PdPd) is used for denoting the Layer 2 candidate link-path list for demand d, while Qe = (Qe1, Qe2, …, QeQe) is used for denoting the Layer 1 candidate link-path list for link e. Examples of candidate paths Pd and Qe for the network presented in Fig. 2 are shown in Table 1¹.

Table 1. Candidate path lists

Link | Path list
d=1  | P11={1}, P12={3, 4, 2}, P13={5, 2}
d=2  | P21={1}, P22={1, 3, 4}, P23={1, 5}
d=3  | P31={3}, P32={5, 4}
d=4  | P41={3}, P42={3, 5}, P43={3, 1, 2}
d=5  | P51={3, 1}, P52={4, 2}, P53={3, 5, 2}
d=6  | P61={5}, P62={3, 4}, P63={1, 2}

Link | Path list
e=1  | Q11={1}, Q12={2, 6, 5, 4}, Q13={2, 6, 7, 3}
e=2  | Q21={4, 5}, Q22={3, 7}, Q23={1, 2, 6}
e=3  | Q31={2}, Q32={1, 3, 7, 6}, Q33={1, 4, 5, 6}
e=4  | Q41={6}, Q42={2, 1, 3, 7}
e=5  | Q51={2, 6}, Q52={1, 4, 5}

Fig. 3. Realization of demand d=6 of volume h6 by means of bifurcated flow x6 on three paths from path list P6 in the VC layer

The resulting loads of the flows xdp on each link e of Layer 2 determine the link capacity vector of that layer, denoted y (as in the example in Fig. 3). The next step is analogous. The capacity of each link e in Layer 2 is realized by means of flows in Layer 1, and the resulting Layer 1 flows zeq determine the load of each link g of Layer 1, and hence its capacity ug. Unit costs (ξe, κg) define the cost of transporting data on each link in each layer. The considered multilayer network model is based on the following assumption: if a node appears in an upper layer, then it automatically appears in the layers below. In a nutshell, the resources (links and nodes) of communication and computer networks are configured in a multi-layered fashion, forming a hierarchical structure with each layer being a proper network on its own. The links of an upper layer are formed using paths of the lower layer, and this pattern repeats as one goes down the resources hierarchy [8]. The actual flow in a multilayer network is assigned to a multilayered path that describes a path throughout the entire network.

¹ The routing lists do not necessarily contain all possible paths.
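To make the two-layer cost accounting described above concrete, the following is a minimal Python sketch (not part of the original paper) that, for given single-path flows, accumulates link loads layer by layer and evaluates the total cost. The small data set and variable names are hypothetical and only mirror the structure of the model.

# Sketch: demands are routed on one Layer-2 path each; the resulting Layer-2
# link loads become demands for Layer 1, whose loads give the capacities.

# Hypothetical data: demand -> (volume h_d, chosen Layer-2 path as a list of links e)
demands = {1: (10, [1]), 6: (5, [5])}
# Hypothetical data: Layer-2 link e -> chosen Layer-1 path as a list of links g
lower_paths = {1: [1], 5: [2, 6]}
xi = {1: 2.0, 5: 3.0}              # cost of one capacity unit of link e (Layer 2)
kappa = {1: 1.0, 2: 1.5, 6: 0.5}   # cost of one capacity unit of link g (Layer 1)

y = {}  # capacities of Layer-2 links
for h_d, path in demands.values():
    for e in path:
        y[e] = y.get(e, 0) + h_d

u = {}  # capacities of Layer-1 links
for e, y_e in y.items():
    for g in lower_paths[e]:
        u[g] = u.get(g, 0) + y_e

F = sum(xi[e] * v for e, v in y.items()) + sum(kappa[g] * v for g, v in u.items())
print(y, u, F)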


3. NETWORK DIMENSIONING PROBLEM

The network dimensioning problem concerns finding the flow allocations in the upper layer (VC layer) and in the lower layer (equipment), as well as the capacities of links in both layers, given the costs of a capacity unit of a link for data transfer in both layers. According to the multilayer network model described in Section 2 and the considerations in [8], the dimensioning problem formulation for a network containing two layers of resources can be stated as follows.

A two layer network dimensioning problem

indices
d = 1, 2, ..., D    demands
p = 1, 2, ..., Pd   candidate paths in the upper layer for flows realizing demand d
e = 1, 2, ..., E    links of the upper layer
q = 1, 2, ..., Qe   candidate paths in the lower layer for flows realizing link e
g = 1, 2, ..., G    links of the lower layer

constants
hd      volume of demand d
δedp    = 1 if link e belongs to path p realizing demand d, 0 otherwise
ξe      cost of one capacity unit of link e
γgeq    = 1 if link g belongs to path q realizing the capacity of link e, 0 otherwise
κg      cost of one capacity unit of link g

variables
xdp     flow allocated to path p of demand d
ye      capacity of link e in the VC layer
zeq     flow allocated to path q realizing the capacity of link e
ug      capacity of (physical, transport) link g in the lower layer

objective

minimize  F = \sum_{e} \xi_e y_e + \sum_{g} \kappa_g u_g    (1)

constraints

\sum_{p} x_{dp} = h_d,   d = 1, 2, ..., D    (2)

\sum_{d} \sum_{p} \delta_{edp} x_{dp} \le y_e,   e = 1, 2, ..., E    (3)

\sum_{q} z_{eq} = y_e,   e = 1, 2, ..., E    (4)

\sum_{e} \sum_{q} \gamma_{geq} z_{eq} \le u_g,   g = 1, 2, ..., G    (5)

Constraint (2) states that the flows in the VC layer realize the assumed demand volumes, and (3) defines the required capacity of each link e. By analogy, ye determines the demand for the lower layer that must be realized by means of the flows zeq (4), and formula (5) specifies the lower layer capacity constraint.

4. ALGORITHMS

Both presented approaches, the Top-to-Bottom and Flat Cost strategies, are based on the well-understood Dijkstra shortest path algorithm [2].

4.1 TOP-TO-BOTTOM OPTIMIZATION

The Top-to-Bottom (TtB) optimization approach is a simple algorithm based on the assumption that all resource layers (VC and lower) are networks in their own right, each with its own individual demands. Flow allocation is performed separately for each layer, from the uppermost layer down to the bottom one.

Fig. 4a. The defined volume of demand hd is the demand to be routed directly in the VC layer.

Fig. 4b. The capacity ye of all links e determines the demand to be routed in the transport layer

This approach can be considered as separate flow problems for each layer which are combined together in such a way that the upper layer imposes demands on the neighbouring lower layer (Fig. 4a and Fig. 4b). An exact formulation of the TtB algorithm can be stated as follows:



TtB pseudo-code:
1:  FOR EACH d
2:      Pd := {Pd1}, where Pd1 is derived from Dijkstra's shortest path
3:      xd1 := hd   (demand realization in the way of non-bifurcated flow)
4:      FOR EACH e
5:          IF δed1 = 1
6:              ye := ye + xd1
7:          END IF
8:      END FOR
9:  END FOR
10: FOR EACH e
11:     Qe := {Qe1}, where Qe1 is derived from Dijkstra's shortest path
12:     ze1 := ye
13:     FOR EACH g
14:         IF γge1 = 1
15:             ug := ug + ze1
16:         END IF
17:     END FOR
18: END FOR
19: RETURN F
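The TtB procedure can be prototyped directly on top of a plain Dijkstra routine. The following Python sketch is not the authors' CES code; the graph representation, function names and the mapping of each VC link to a pair of transport nodes are illustrative assumptions. The edge costs in vc_adj and transport_adj would correspond to the unit costs ξe and κg respectively.

import heapq

def dijkstra_path(adj, src, dst):
    """Shortest path by additive link cost; adj = {node: {neighbor: (link_id, cost)}}."""
    dist, prev = {src: 0.0}, {}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist.get(u, float("inf")):
            continue
        for v, (link, cost) in adj.get(u, {}).items():
            nd = d + cost
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, (u, link)
                heapq.heappush(heap, (nd, v))
    links, node = [], dst          # reconstruct the list of traversed links
    while node != src:
        node, link = prev[node]
        links.append(link)
    return list(reversed(links))

def ttb(vc_adj, vc_link_ends, transport_adj, demands, xi, kappa):
    """Top-to-Bottom sketch: route each demand in the VC layer first, then route
    the resulting VC link capacities in the transport layer, both with Dijkstra."""
    y = {}                                            # VC link capacities
    for (s, t), h in demands.items():
        for e in dijkstra_path(vc_adj, s, t):
            y[e] = y.get(e, 0.0) + h
    u = {}                                            # transport link capacities
    for e, y_e in y.items():
        a, b = vc_link_ends[e]                        # end nodes of VC link e
        for g in dijkstra_path(transport_adj, a, b):
            u[g] = u.get(g, 0.0) + y_e
    cost = sum(xi[e] * v for e, v in y.items()) + sum(kappa[g] * v for g, v in u.items())
    return y, u, cost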

4.2 FLAT COST OPTIMIZATION

The Flat Cost (FC) optimization approach considers all layers (VC and lower) as one integrated network. Flow allocation is performed not on the cheapest path in each layer separately, but on the cheapest multilayered path throughout the entire multilayered network. Dijkstra's algorithm is well defined for a network composed of a single layer of resources. The FC strategy treats a two-layer network as a flat (single layer) artificial network composed of the nodes of the VC layer and links e represented by the lower layer topology (Fig. 5), with a new cost assigned to each link e. The Flat Cost methodology introduces a Flat Cost Coefficient (FCC), ξe + βe. The FCC describes the total cost of realizing a data unit on the shortest path in the lower layer and in the VC layer. Let q = 1 and let the path list equal Qe = {Qe1}, where Qe1 is Dijkstra's shortest path. The FCC is stated as ξe + βe, where ξe is the cost of one capacity unit of link e and

\beta_e = \sum_{g \,:\, \gamma_{ge1} = 1} \kappa_g    (6)


Fig. 5. Each link e is represented by the network topology of the lower layer (the new artificial network is flat, with new link costs ξe + βe, so Dijkstra's algorithm can be applied easily). In the example the links e=1 and e=7 are depicted. The other links are prepared in the same way.

FC pseudo-code:
1:  FOR EACH e
2:      Qe := {Qe1}, where Qe1 is derived from Dijkstra's shortest path
3:      FOR EACH g
4:          IF γge1 = 1
5:              βe := βe + κg   (realization cost of load ye on the path Qe1)
6:          END IF
7:      END FOR
8:  END FOR
9:  FOR EACH d
10:     Pd := {Pd1}, where Pd1 is derived from Dijkstra's shortest path regarding the link cost ξe + βe
11:     xd1 := hd
12:     FOR EACH e
13:         IF δed1 = 1
14:             ye := ye + xd1
15:         END IF
16:     END FOR
17: END FOR
18: FOR EACH e
19:     ze1 := ye
20:     FOR EACH g
21:         IF γge1 = 1
22:             ug := ug + ze1
23:         END IF
24:     END FOR
25: END FOR
26: RETURN F
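A corresponding Python sketch of the FC strategy is given below. It reuses the dijkstra_path() routine from the TtB sketch above; as before, the data layout and function names are illustrative assumptions rather than the authors' implementation.

def flat_cost(vc_adj, vc_link_ends, transport_adj, demands, xi, kappa):
    """Flat Cost sketch: price each VC link e with the Flat Cost Coefficient
    xi_e + beta_e, where beta_e is the cost of the cheapest transport route
    realizing e, then route every demand once over these flat costs."""
    beta, lower_route = {}, {}
    for e, (a, b) in vc_link_ends.items():
        route = dijkstra_path(transport_adj, a, b)
        lower_route[e] = route
        beta[e] = sum(kappa[g] for g in route)        # corresponds to formula (6)

    # Artificial flat network: same VC topology, link costs xi_e + beta_e.
    flat_adj = {node: {v: (e, xi[e] + beta[e]) for v, (e, _) in nbrs.items()}
                for node, nbrs in vc_adj.items()}

    y, u_cap = {}, {}
    for (s, t), h in demands.items():
        for e in dijkstra_path(flat_adj, s, t):
            y[e] = y.get(e, 0.0) + h
    for e, y_e in y.items():
        for g in lower_route[e]:
            u_cap[g] = u_cap.get(g, 0.0) + y_e
    cost = sum(xi[e] * v for e, v in y.items()) + sum(kappa[g] * v for g, v in u_cap.items())
    return y, u_cap, cost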



5. INVESTIGATIONS

5.1 PROBLEM INSTANCES

The examination of the algorithms is performed for three hypothetical multilayered networks. Each of them contains a demand layer with the same topology (Fig. 6).

Fig. 7. Layer 1: Transport layer scenario

Fig. 6. Layer 3: Demand layer; in the background the transport layer is shown

Transport layer 1, with the costs of links g (Fig. 7), is adapted from the core telecommunication network in Poland originated by Polish Telecom [7]. Fig. 7 illustrates three different topologies of the Virtual Capacity layer with the unit costs of links e.

Fig. 7. Layer 2: Different Virtual Capacity layer scenarios, in the background transport layer is shown

In the problem defined in Section 3 the modularization of links’ capacities ye and ug is not taken into consideration, thus flow allocation assigned to the shortest Dijkstra’s



path (in a non-bifurcated way) ensures that the obtained transportation cost is minimized [6]. The volume of demand hd equals 1 for each d in all instances. In fact, for non-bifurcated flow and a case without modularization of the links' capacities it is sufficient to determine the shortest paths for a single unit of data.

5.2 OPTIMIZATION RESULTS

The objective function F according to formula (1), for the mixed integral case of the dimensioning problem, is composed of two sub-functions: the summation of all flow costs in the VC layer, ∑e ξe ye, and the summation of all flow costs in the lower layer, ∑g κg ug. The objective of the optimization is to find flow allocations xdp in the VC layer and flows zeq in the transport layer such that the total cost F is minimized. The optimised values of the cost function F are shown in Fig. 8a. The best optimization results were obtained using the Flat Cost algorithm for all defined instances. However, the TtB approach achieves lower values of cost in its first step: the realization of the demand volumes hd directly in the VC layer is locally allocated more effectively than in the FC strategy (Fig. 8b). FC, which uses the FCC, finds the globally cheapest route throughout all layers because of a high level of minimization of the costs in the transport layer (Fig. 8c).


Fig. 8a. Dimensioning problem optimization results for three multilayer networks A, B and C




Fig. 8b. Costs in VC layers

Fig. 8c. Costs in transport layers


Fig. 9a. Traffic load visualization in VC and transport layers for instance A



Fig. 9b. Traffic load visualization in VC and transport layers for instance B


Fig. 9c. Traffic load visualization in VC and transport layers for instance C

Dimensioning optimization causes the flow to be allocated in each of the two layers, the VC layer and the transport layer. In Fig. 9a - 9c the traffic load on each link of these layers is visualized. Non-zero capacity links are marked proportionally to the volume of the load, and dashed lines indicate unused links in each layer of resources.

6. FINAL REMARKS

This paper compared two deterministic approaches to the multilayered network dimensioning problem. The main statements can be formulated as follows:



1. The Flat Cost strategy is a better approach than Top-to-Bottom optimization for the dimensioning problem for the considered networks;
2. A multilayer network should not be considered as a set of separate layers but rather as an integrated model;
3. Sometimes it is worth allocating flow in a more expensive way locally, in the VC layer, to obtain better results globally.
The main goals of computer network optimization are to ensure network reliability, efficiency and security, to reduce the utilization of devices, to minimize the use of resources, etc. In general, all of these goals lead to improved financial benefits and reduced expenses; hence, it is possible to formulate the following general statement: the dimensioning optimization result depends on the topology in each layer of resources, and the multilayer network should not be considered as a set of separate layers but as an integrated model. Sometimes it may be worth installing more links, preparing more interfaces and adapting more nodes in order to minimize the total cost of the dimensioning problem in multilayered networks.

REFERENCES
[1] DEMESTER P., GRYSSELS M., et al., Resilience in Multilayer Networks, IEEE Communications Magazine 37/8, 1999, pp. 70-76.
[2] DIJKSTRA E. W., A note on two problems in connexion with graphs, Numerische Mathematik, No. 1, 1959, pp. 269-271.
[3] DIJKSTRA F., ANDREE B., KOYMANS K., HAM J. and LAAT C., A Multi-Layer Network Model Based on ITU-T G.805, In Press, May 2007.
[4] KOHN M., Improving Fairness in Multi Service Multi Layer Networks, Transparent Optical Networks, No. 1, 2005, pp. 53-56.
[5] KUBILINSKAS E. and PIÓRO M., An IP/MPLS over WDM network design problem, International Network Optimization Conference (INOC), 2005, pp. 20-23.
[6] KUCHARZAK M., Bifurcated and non-bifurcated flows in multilayered networks dimensioning issues, 6th Student Scientific Conference (KNS), Wroclaw, Poland, 2008 (in Polish).
[7] ONLINE, The library of test instances for survivable fixed telecommunication network design, http://sndlib.zib.de, 2006.
[8] PIÓRO M. and MEDHI D., Routing, Flow, and Capacity Design in Communication and Computer Networks, Morgan Kaufmann Publishers, 2004.



Computer Systems Engineering 2008
Keywords: Database, embedded system, experimentation, SQLite, PostgreSQL, MySQL

Dawid PICHEN* Iwona POŹNIAK-KOSZAŁKA*

A NEW TREND IN SOFTWARE DEVELOPMENT PROCESS USING DATABASE SYSTEMS

In this paper, we present a new trend in software development processes, which uses database systems instead of ordinary files for the purpose of storing application data. We also present an implemented experimentation system, which gives the opportunity to investigate problems occurring in databases. The objective of the experiments is to evaluate a database management system which may be used as data storage in developed programs. We present and discuss some results of the investigations.

1. INTRODUCTION

Most current computer programs need to store non-volatile data in some kind of memory. There are two popular methods of non-volatile data storing. The first one uses regular files as a data carrier. This method is recommended when the data file has binary content or a larger size. Most UNIX applications use this method to store data. For example, program configuration files are kept in the /etc directory. Usually they are just regular text files and each of them contains one or more data fields with zero, one or more assigned values. The structure of such files varies strongly between programs, so this method is not compact. It also has a noticeable disadvantage: program developers have to write a data file parser or use one from the many already written libraries. The second method is more sophisticated but it can be used only in Microsoft Windows operating systems (OS). Beginning with Windows NT 3.51 there is a centralized database built in. This database is commonly called the registry.

* Department of Systems and Computer Networks, Wroclaw University of Technology, Poland.


The purpose of the registry is to store data from different applications in a compact way. The registry is divided into keys and values. There are 12 different types of values. The great advantage of this method is that almost all data (except huge values) are kept in one place and there is no need to write a parser, since the OS handles all operations with the registry through API functions. Usage of the registry is possible only in Windows systems, so the method is not portable. Another method is to use an external database to store data. In that case, it can be portable, of course, when the database is. Like with the Windows registry, there is no need to write a parser; it is only necessary to use the proper API functions. That makes the program development process faster and easier. There are many database systems available on the market, but not all of them are recommended for the presented purpose. The first requirement for a database system is to have the smallest possible size, so that the main application size does not increase too much. The second aspect is the cost of a database system. For that reason we concentrate on free or open source software, to choose database systems that can be used in developed software without paying a single fee. The third requirement is portability: both the database system and the API libraries should be available on different operating systems and/or hardware architectures. An important requirement is also the performance of a database system. In most cases the requirements presented in [1] apply. The rest of the paper is organized as follows. Section 2 contains a short description of the considered database systems. In Section 3 the main idea and the opportunities of the experimentation system are presented. Section 4 is related to the investigations, including the design of experiments, results of selected experiments and an analysis of the database system effectiveness. Conclusions and final remarks appear in Section 5.

2. SELECTED DATABASE SYSTEMS

In this paper three well known database systems are presented and compared in a simulated environment. These database systems are: MySQL, PostgreSQL and SQLite. Each of them is ported to various operating systems and they are open source projects. Each of them also recognizes the SQL language, but particular statements and supported features may vary depending on the system.



2.1. MYSQL

MySQL is a relational database management system developed as an open source project. Its developers established the MySQL AB company, which develops, distributes, sells and supports the commercial version of the MySQL database. Because of that, there are two license models available: the GNU General Public License and the commercial license. The first license type allows developers to use MySQL for free, as long as the main application is under the GPL license. This means that the MySQL database needs to be purchased if it is going to be used in a commercial product. That can exclude MySQL if it is planned to build a commercial application without paying for a database system. In 2008 Sun Microsystems acquired MySQL AB. The MySQL database management system has been actively developed since 1995, when the first version emerged. Beginning with version 5.0, it is an advanced database system which supports, e.g., unions, subqueries, cursors, stored procedures, triggers, views, query caching, UTF support, etc. Its SQL language conforms to the ANSI SQL 99 standard. MySQL, unlike other database systems, is provided with multiple database storage engines. Although this system is advanced, it is considered to be very fast too. MySQL is a very popular product, mainly among dynamic website developers. It has gained a strong position on the market because it has been used as one of the components of the LAMP (Linux, Apache, MySQL, PHP) software package [2]. The LAMP software package is known as an easy to use and maintain, powerful system to deliver dynamic Web content. Accessing MySQL database systems from programs written in major programming languages is not a problem, because there are many client libraries available with well documented APIs [3].

2.2. POSTGRESQL

PostgreSQL is an open source relational database management system. Unlike MySQL, PostgreSQL is not owned by any company. Its developers belong to the PostgreSQL Global Development Group, which is a community of people and companies supporting this project. A great advantage of this project compared to MySQL is its license type. PostgreSQL is available under the BSD license, which is very liberal. This license type allows PostgreSQL to be used for free, even in commercial, closed source applications. The history of PostgreSQL started in 1986 at the University of California at Berkeley, where prof. Michael Stonebraker started the Postgres database project. In 1995 two of Stonebraker's students replaced the query language used by Postgres with SQL. They also changed the database name to Postgres95. In 1996 Postgres95 was


published to the open source community. Developers from around the world started to modify the database. They have improved its stability and speed, added new features and written documentation. That made the database a powerful and advanced product. Due to those changes, the project changed its name to PostgreSQL and the first version of PostgreSQL was 6.0. The PostgreSQL Global Development Group advertises its product as "The world's most advanced open source database". Indeed, PostgreSQL is an advanced system which supports, e.g., foreign keys, joins, views, triggers, stored procedures in multiple languages, subqueries, asynchronous replication, nested transactions, hot backups, UTF support, etc. Its SQL language conforms to the ANSI SQL 92/99 standards. PostgreSQL is often used as a replacement for MySQL in the LAMP package. As for MySQL, there are also many client libraries available for different languages. More information about PostgreSQL can be found in [4].

2.3. SQLITE

SQLite is also an open source database management system, but it is much different from the two systems mentioned before. The main difference is the model of the system. MySQL and PostgreSQL are based on the client-server model, in which at least one server instance is needed to serve queries from clients. Even if the database needs to be installed only on one computer, running the server is necessary to handle the user queries. These queries need to be sent by using a client library. SQLite is an embedded database system which is contained in a single library. To use it, it is only necessary to link the main program with its library file; then, communication with the database is achieved by calling library functions described in the API documentation [5]. Because of that, there is no separate database process in the system. Instead, the process of the main program changes the database file itself by calling the proper database system functions. SQLite uses just a single file to store the whole database. This system was designed to have the smallest possible size while still being usable. The size of its library is less than 0.5 MB, so it can be used not only in computer programs, but also in stand-alone devices such as mobile phones, set-top boxes, MP3/4 players, etc. [6] SQLite is published under the most liberal license type: public domain. Everyone can use its source code for any purpose without any notice that it was used. The SQLite source code does not contain any information about its author, because the author wanted to emphasize the license. SQLite is not as advanced as the previous database systems, mainly because of its size. It is addressed to other segments of the market, in which the program size matters. It supports most of the ANSI SQL 92 standard, but some of its features are not implemented. Features that are implemented include, e.g., transactions, triggers, complex


queries and UTF support. Features that are not implemented include, e.g., foreign key support, right and full outer joins, table permission statements (since it uses ordinary files and hence file permissions), etc. The large difference between SQLite and other DBMSs is the handling of column data types. SQLite recognizes different data types in the CREATE TABLE statement, but it actually stores all data in only 5 different storage classes. Those are NULL, INTEGER, REAL (floating point values), TEXT and BLOB (blob of data). But the most important fact is that the column data types are not static. That means the same column can hold values of different data types in different records. For example, a text string can be inserted into a column declared as integer. The declaration of column data types is only a hint to the SQLite engine about the recommended data type. This is an unusual behavior, which is sometimes criticized.

3. COMPUTER EXPERIMENTATION SYSTEM

In every experimentation system it is necessary to define the inputs and the outputs. Our experimentation system has the possibility of setting up four different inputs (denoted by I1, …, I4) and observing two different outputs (denoted by O1 and O2). The block scheme of the process of the experiment as an input-output plant is shown in Fig. 1 (the inputs I1-I4 enter the experiment plant, in which the MySQL client library communicates with a MySQL server, the PostgreSQL client library with a PostgreSQL server, and the SQLite library with an ordinary file; the observed outputs are O1 and O2).

Fig. 1. Object of the experiment

Inputs. In Table 1 inputs used in all complex experiments are specified, including the kind of database management system (I1), the number of records (I2), the type of query (I3) and the number of queries for a test (I4).



Outputs. Table 2 contains the outputs, which may be regarded as "local" measures of effectiveness. The outputs are the query execution time (O1) and the processor utilization time (O2).

Table 1. Input data for each experiment

Input | Input name | Possible values of input
I1 | Database Management System | MySQL, PostgreSQL, SQLite
I2 | Number of records (added, updated, deleted or used to select data from) | 10, 20, 50, 75, 100, 200, 500, 750, 1000, 3000, 5000, 7500, 10000
I3 | Type of query | Acquiring data (SELECT) from 1 table; acquiring data (SELECT) from 2 tables (joining); adding data (INSERT); modifying data (UPDATE); deleting data (DELETE)
I4 | Number of queries for a test | 10, 100, 1000

Table 2. Outputs for all experiments

Output symbol | Output name
O1 | Query execution time
O2 | Processor utilization time
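A full experiment sweeps the cross-product of the inputs listed in Table 1. The following is a minimal Python sketch of such a design loop; it is not the authors' DBsim (which is written in C), and run_case is a hypothetical placeholder standing in for one measured run that would return the outputs O1 and O2.

from itertools import product

DBMS = ["MySQL", "PostgreSQL", "SQLite"]                                         # I1
RECORDS = [10, 20, 50, 75, 100, 200, 500, 750, 1000, 3000, 5000, 7500, 10000]    # I2
QUERIES = ["select_1_table", "select_join", "insert", "update", "delete"]        # I3
REPEATS = [10, 100, 1000]                                                        # I4

def run_case(dbms, n_records, query, n_queries):
    """Hypothetical placeholder for one experiment; returns (query time O1, CPU time O2)."""
    return 0.0, 0.0

# One result per combination of the four inputs.
results = {case: run_case(*case) for case in product(DBMS, RECORDS, QUERIES, REPEATS)}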

The experimentation system consists of several modules. The main module of the system is a special program called DBsim, which performs simulations of different database systems. This application was written in the C language and was designed to work in UNIX and UNIX-like operating systems. It requires the MySQL, PostgreSQL and SQLite developer libraries to work, as well as running instances of the MySQL and PostgreSQL servers and suitable users and databases prepared earlier. Each database consists of two tables: files and file_types. The simulator feeds the files table with information about files stored in a real file system in the simulator environment. The file_types table stores MIME types and text descriptions of popular files. The SQL statements used to create the files table are presented in Fig. 2. It can be noticed that these statements are not identical; there are a few differences between the database systems. This


is because of the auto increment feature, which is supported in every database used in the experiment, but in a different way. The statement used to create the file_types table is common to every database and is presented in Fig. 3.

MySQL:
CREATE TABLE files ( id INTEGER NOT NULL AUTO_INCREMENT PRIMARY KEY, name TEXT, path TEXT, filedate TIMESTAMP, chksum_md5 TEXT, size INTEGER, file_type INTEGER NOT NULL );

PostgreSQL:
CREATE TABLE files ( id SERIAL NOT NULL PRIMARY KEY, name TEXT, path TEXT, filedate TIMESTAMP, chksum_md5 TEXT, size INTEGER, file_type INTEGER NOT NULL );

SQLite:
CREATE TABLE files ( id INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT, name TEXT, path TEXT, filedate TIMESTAMP, chksum_md5 TEXT, size INTEGER, file_type INTEGER NOT NULL );

Fig. 2. Statements used to create the files table

MySQL / PostgreSQL / SQLite:
CREATE TABLE file_types ( id INTEGER NOT NULL PRIMARY KEY, mime_type TEXT, description TEXT );

Fig. 3. Statements used to create the file_types table
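The DBsim simulator itself is not reproduced here; the following is a minimal Python sketch, using only the standard sqlite3 and time modules, of the kind of query-time measurement performed in the experiments. The table layout follows Fig. 2 and the inserted record values are hypothetical.

import sqlite3
import time

conn = sqlite3.connect(":memory:")   # an in-memory database keeps the sketch self-contained
conn.execute(
    "CREATE TABLE files ( id INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT, "
    "name TEXT, path TEXT, filedate TIMESTAMP, chksum_md5 TEXT, "
    "size INTEGER, file_type INTEGER NOT NULL )"
)

def timed_inserts(n):
    """Insert n hypothetical file records and return the elapsed wall-clock time."""
    start = time.perf_counter()
    for i in range(n):
        conn.execute(
            "INSERT INTO files (name, path, filedate, chksum_md5, size, file_type) "
            "VALUES (?, ?, datetime('now'), ?, ?, ?)",
            (f"file{i}.txt", "/tmp", "0" * 32, 1024, 1),
        )
    conn.commit()
    return time.perf_counter() - start

for n in (10, 100, 1000):
    print(n, "records:", round(timed_inserts(n), 4), "s")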

4. INVESTIGATIONS

All experiments were executed using the same computer, thus comparing the obtained results is reliable. All experiments were performed on a notebook computer with an Intel Core 2 Duo T5500 processor and 1 GB of RAM. The openSUSE 10.2 operating system was used with version 2.6.18.8 of the Linux kernel. The main program was built


with the gcc 4.1.3 compiler. The following versions of the database management systems were used in the experiments: PostgreSQL 8.1.11, MySQL 5.0.26, SQLite 3.3.8.

4.1. EXPERIMENT FOR DATA INSERTING

For this experiment, from 10 to 10000 simulated records were subsequently added to each database system. The total execution time of this operation was measured. The results are presented in Fig. 4. It can be seen that SQLite was the slowest database system; it needed almost 50 s to add 750 records, while its rivals did it in less than 1 s. The fastest database was MySQL; it added 10000 records in less than 1 s. PostgreSQL did it in 6 s, which is still an acceptable time for most situations.


Fig. 4. Query execution time for records adding (time is on the logarithmic scale)

4.2. EXPERIMENT FOR DATA FETCHING

For this experiment, two different SELECT queries were passed to the databases. The total execution time of these queries was measured. The results are presented in Fig. 5. The first query selected data from the files table only. The second query was a complex one: it performed a left outer join of the files table with the file_types table. It can be noticed that MySQL was the fastest database when the simple query was used. For the complex query, SQLite was the fastest database, although it was the slowest system when the simple query was used. PostgreSQL was about 40% faster than SQLite when the simple query was used and the dataset had more than 4000 records. In the case of the complex query, PostgreSQL was the slowest database, with results much worse than its competitors (as much as 5 times slower than MySQL).



(Left panel: fetching data from a single table. Right panel: joining data from two tables.)

Fig. 5. Query execution time for data selecting

4.3. EXPERIMENT FOR DATA MODIFYING

In this experiment the values in the chksum_md5 column were changed in all table rows. The UPDATE statement execution time was measured. The results are presented in Fig. 6. It can be seen that the fastest was the MySQL database. When there were more than 3500 records in the table, PostgreSQL was the slowest system, but when there were fewer records, SQLite was. However, in all cases, even for a table with a large number of rows, the data modification time was still tolerable.


Fig. 6. Query execution time for data modifying

4.4. EXPERIMENT FOR DATA DELETING

In the last experiment presented in this paper, the whole content of the files table was deleted. The total execution time of the DELETE query was measured. The results of


this experiment are shown in Fig. 7. It can be noticed that the fastest system was MySQL; it clearly beat its rivals with a time of less than 10 ms, even for deleting as many as 10000 rows. When there were more than 1000 records in a table, it can be noticed that the delete query time of SQLite was almost constant (about 0.25 s), probably because it changed only the database file at fixed positions. When there were more than 3500 records, PostgreSQL was the slowest database system, and it can be seen that its delete query execution time depends on the number of rows in a linear way.


Fig. 7. Query execution time for data deleting

5. CONCLUSIONS

On the basis of the simulations performed, the following conclusions may be taken into account in a program development process. Using a database system in a computer program instead of regular files strongly reduces the total development time, because there is no need to write special functions that would fetch, add, find, modify or delete data. The algorithms used in the database systems make those operations usually faster than if they were done by one's own functions. If it is planned to store huge amounts of data, MySQL or PostgreSQL should be chosen, but they also need a separately delivered database server with the application. If the application is going to be a commercial product with a free database, then it is better to choose PostgreSQL. If the database has a constant huge table that is delivered with the application and does not gain many rows, SQLite is preferred. If the size of the environment in which the desired application will run is small, SQLite is also preferred, because it does not need a running server and its size is smaller than 0.5 MB.



Although SQLite is a very small system, it has a very good overall performance, except for data inserting. Most programs do not store a lot of data, thus SQLite is a good solution. It does not require any configuration either, so its integration with the developed program is the easiest. Finding data by using a database system is very fast. In all examined database systems it took less than 200 ms to fetch data from a table that had 10000 records, both when the query was simple and when it was complex. That processing time is acceptable in most cases. The processor utilization time for all compared databases was relatively low, even for a large number of rows in the table. However, these results are not completely certain for MySQL and PostgreSQL, because they work as another process and their client libraries communicate with them through a socket. This causes the simulator to measure only the client library processor time, which is always low, because the client only passes queries to the main server and processes the obtained data.

REFERENCES
[1] POZNIAK-KOSZALKA I., PICHEN D.: Simulation Based Evaluation of Algorithms for Improving Efficiency of Database Systems, pp. 211-218, Proc. of the MOSIS '07 Conference, Rožnov pod Radhoštěm, Czech Republic, 2007.
[2] LANE D., WILLIAMS H. E.: Web Database Application with PHP and MySQL, 2nd Edition, O'Reilly, 2004.
[3] MySQL 5.0 Reference Manual, http://dev.mysql.com/doc/refman/5.0/en/, last modified: 2008-03-11 (revision: 10190).
[4] PostgreSQL 8.3.0 Documentation, http://www.postgresql.org/docs/8.3/interactive/index.html, last modified: 2008-03-06.
[5] SQLite Version 3 C/C++ API Reference, http://www.sqlite.org/c3ref/intro.html, last modified: 2008-02-21.
[6] HUDSON P.: Interview: Richard Hipp, Linux Format magazine, Issue 73, United Kingdom, 2005.
[7] POZNIAK-KOSZALKA I.: Relational Data Bases in Sybase Environment – Modeling, Designing, Applications, WPWR, Wroclaw, 2004 (in Polish).
[8] KLINE K. E.: SQL in a Nutshell, 2nd Edition, O'Reilly, 2004.
[9] POZNIAK-KOSZALKA I., PICHEN D.: Simulation Based Evaluation of Hardware Methods for Improving Efficiency of Database Systems, pp. 131-136, Proc. of the ASIS 2007 Colloquium, Hostýn, Czech Republic, 2007.
[10] KING K., JAMSA K.: SQL Tips and Techniques, Premier Press, 2002.


Computer Systems Engineering 2008
Keywords: network, protection, p-cycles

Adam SMUTNICKI∗ Krzysztof WALKOWIAK∗

AN ALGORITHM FOR UNRESTORABLE FLOW OPTIMISATION PROBLEM USING P-CYCLES PROTECTION SCHEME

This paper deals with the Unrestorable Flow Optimisation (UFO) problem in networks protected by p-cycles. This novel protection technique is used as an efficient tool for ensuring the survivability of computer networks. In this paper, the mathematical model of the UFO problem and an original solution algorithm based on metaheuristics are formulated. The proposed algorithm combines the k-shortest paths method, the multi-knapsack problem, a p-cycles generator, a linear programming method and a tabu search approach.

1. INTRODUCTION

Survivability of computer networks and systems is among the most important subjects in modern computer engineering and science. This research topic embraces a wide spectrum of particular technological and theoretical problems derived from the computer architecture area, network topology, communication protocols, transmission, coding, cryptography, etc. The topology of the computer network has a crucial meaning for its survivability, since the physical creation of network links is much more time-consuming and troublesome than producing a new (or spare) device; furthermore, faults of network links and nodes are still a common problem. Traditionally, for years, ring and mesh topologies have been used to increase network survivability, with all their advantages and disadvantages. Quite recently, p-cycles appeared as a competitive and useful tool for providing survivability of real computer networks. The approach was born less than a decade ago and has made a great career in a short time. The fundamental usage of p-cycles assumes

* Department of Systems and Computer Networks, Wroclaw University of Technology, Poland.



that the network configuration can be updated, namely the capacity of some links can be increased, to achieve a desired level of protection. The cost of such a capacity modification constitutes the goal function of the suitable optimization problem. This case has been considered commonly in the scientific literature. Nevertheless, p-cycles also offer the possibility of achieving a higher reliability of the network without any additional cost of increasing link capacities. This case ensures clear benefits in comparison to the classical cycle approach, but it has only been mentioned in the literature and practically not researched; such a problem is considered in the present paper. The remainder of the paper is organized as follows. In Section 2 we provide a brief introduction to the p-cycles idea. Section 3 presents the mathematical model of the originally formulated UFO (Unrestorable Flow Optimization) problem, whereas Section 4 describes the solution method and its component algorithms.

2. FUNDAMENTALS OF P-CYCLES

Traditionally, either ring or mesh topologies have been used in the construction of survivable computer networks. A ring offers a short restoration time as well as a simple restoration scheme; however, its design and operation are rather complex and the usage of the total transport bandwidth is inefficient. A mesh is easy to design, optimize and operate, but has a longer restoration time than a ring. Mesh networks do not require as much spare capacity as rings, because in the restoration process the capacity demand can be split between different links. On the other hand, rings are so efficient in the restoration process because there is no need to search for a restoration path. Obviously, there is a great need to find a topology which aggregates all the advantageous properties of mesh and ring networks. This idea was fully realized in the concept of p-cycles, which means "fast as ring", "efficient as mesh", preconfigurable and protected.

2.1. BASIC NOTIONS AND PROPERTIES

We perceive a computer network protected by p-cycles as a mesh with working paths realizing flow demands between specified nodes, using one of the shortest direct routes (k-shortest paths). A collection of p-cycles is formed in advance while configuring the network, to be ready for use in case of any failure and to perform real-time recovery. p-Cycles are not ordinary cycles. Let us consider a mesh network and choose some cycle (Fig. 1 a and b). In classical cycle protection


approach, this cycle protects all spans being "on-cycle". In the paper [9] it is shown that a cycle established on a mesh network also protects "straddling spans", i.e. spans between cycle nodes that do not belong to the cycle (Fig. 1 c). Observe that in case of a failure of a "straddling span" an arc of the cycle can be used to transfer the whole flow from this failed span. This property allows one to extend the protection provided by p-cycles to straddling spans as well.

Fig. 1. p-Cycles in the mesh network.

In case of a failure of an "on cycle" span, there is one path which can be used to transfer the flow (Fig. 2 b). But for a failure of a "straddling span" there are two different paths which can be used in the recovery process. One arc of the cycle can be used as a path, or both arcs can be used to achieve a lower load on the links (Fig. 2 c and d). Because without using any additional links and spare capacity we achieve a much


higher level of protection, not only cycle spans are protected but also "straddling spans". "Straddling spans" have twice the leverage of an on-cycle span in terms of efficiency, because when they fail, the cycle itself remains intact and can thereby offer two protection paths for each unit of protection capacity. These spans are not limited to being inside a cycle; each span between two nodes of the cycle is protected in this way.

Fig. 2. p-Cycles protection schemes for various types of failure.

Notice that we do not need any additional spare capacity to protect "straddling spans", because the spare capacity from the ring spans is used to protect those spans. This means that we can protect many more spans and much more link capacity using the same amount of spare capacity as in the ring model. Thus, at the same cost we can achieve a higher level of network survivability.


2.2. OPTIMISATION PROBLEMS AND EXTENSIONS

Traditional p-cycles approaches consider two optimization tasks [8], each of them formulated for a pre-generated set of p-cycles. The former one tends to achieve maximum restorability using the existing spare capacity on links. The overall aim is to find a configuration of p-cycles in the network, in which a specified amount of spare capacity is available. The problem can be defined as:

\min \sum_{i \in S} u_i    (1)

where S is the set of network spans and ui is the number of unrestorable working channels on span i. Formula (1) minimizes the number of unprotected channels, and thus maximizes restorability. In the latter problem we are looking for the minimum spare capacity which has to be provided to the network in order to achieve 100% restorability. The optimization task can be defined as:

\min \sum_{i \in S} c_i s_i    (2)

where ci is the cost or length of span i and si is the number of spare channels placed on span i. Both optimization tasks rely on the assumption that a candidate set of p-cycles has already been generated and we only have to choose the best one (or a subset) among them to achieve the desired reliability. In practice, the number of such candidates grows exponentially with increasing network size. Therefore, efficient algorithms for p-cycles generation and a philosophy of performing this generation are especially desirable, see e.g. [8, 13, 23]. Since 1998 many interesting ideas improving p-cycles reliability and functionality have been published. The first, actually one of the most popular, extends the scope of p-cycles protection by considering not only links, but also nodes, or whole paths. In [6] a new interpretation of a straddling span is presented, not as a physical connection but as a logical link (so more than one physical span can be protected). This approach is called path protection because it protects a path from one node lying on a cycle to another node on the cycle, including the links and nodes on the path. Using this method not only links can be protected but also nodes. The idea of protection of a straddling path consisting of more than one physical link has been proposed in [8]. In the basic p-cycles model we have a guarantee that 100% recoverability is obtainable for any single failure. However, the assumption about a lone failure cannot be sustained in networks of current size and scope. Networks must be prepared to


deal with multiple failures at the same time, e.g. a node failure (all links to this node become disconnected simultaneously). In [12] a dual-failure protection model is discussed, and an integer linear programming model for designing a minimum-cost p-cycles network with a specified minimum dual-failure restorability is proposed; some level of dual-failure restorability can be achieved without any additional spare capacity. An alternative approach, [16], is based on the assumption that it is unlikely to observe a dual (multiple) failure at exactly the same time moment. Then, some action (dynamic reconfiguration of p-cycles) can be taken in the time period between the failures.

3. MATHEMATICAL MODEL AND UFO PROBLEM

In this section we present an optimization model of flow allocation (as well as the routing pre-selection) in a network protected by a fixed configuration of p-cycles. An existing backbone network is considered; in many cases such a network is in an operational phase and augmenting of its resources (links, capacity, replica servers) is not considered. The network survivability, indispensable in modern computer networks, is provided by the concept of p-cycles described in the previous section. In the problem we are given the network topology, link capacities, the traffic demand matrix, candidate paths for demands and the p-cycles configuration. We optimize the working flows in the normal, non-failure state of the network for protection in the case of a single link failure. The objective is to minimize the unrestored flow, i.e. the flow that due to the limited link capacity resources cannot be restored using the concept of p-cycles. This problem was first described in [19]. Notation based on [17] and [2] will be used.

Indices
e, l = 1, 2, ..., E    network links (spans)
d = 1, 2, ..., D       demands
p = 1, 2, ..., Pd      candidate paths for flows realizing demand d
q = 1, 2, ..., Q       p-cycles
s = 1, 2, ..., S       failure states

Constants
δedp = 1, if link e belongs to path p realizing demand d; 0 otherwise
hd — volume of demand d
ce — capacity of link e
βeq = 1, if link e belongs to p-cycle q; 0 otherwise
ǫeq = 1, if p-cycle q can be used for restoration of link e; 0 otherwise (i.e. link e either belongs to p-cycle q or is a straddling span of q)
γeq — coefficient of restoration paths provided for failed link e by an instance of p-cycle q (= 1 for an on-cycle link; = 0.5 for a straddling span; = 0 otherwise)

Variables
xdp = 1 if demand d uses path p; 0 otherwise (binary)
fe — load of link e associated with working demands
ydeq = 1 if demand d uses p-cycle q for restoration in the case of failure of link e; 0 otherwise (binary)
zde = 1 if demand d is not restored in the case of failure of link e; 0 otherwise (binary)
gel — load of link e associated with p-cycles in the case of failure of link l

Objective

\min U = \sum_{d} \sum_{e} z_{de} h_d    (3)

Constraints

\sum_{p} x_{dp} = 1,   d = 1, 2, ..., D    (4)

f_e = \sum_{d} \sum_{p} \delta_{edp} x_{dp} h_d,   e = 1, 2, ..., E    (5)

z_{de} + \sum_{q} \epsilon_{eq} y_{deq} = \sum_{p} \delta_{edp} x_{dp},   d = 1, 2, ..., D,  e = 1, 2, ..., E    (6)

g_{el} = \sum_{d} \sum_{q} \sum_{p} \beta_{eq} \delta_{ldp} x_{dp} \gamma_{lq} y_{dlq} h_d,   e = 1, 2, ..., E,  l = 1, 2, ..., E    (7)

f_e + g_{el} \le c_e,   e = 1, 2, ..., E,  l = 1, 2, ..., E    (8)

Constraint (4) imposes the non-bifurcated flow, i.e. one of the candidate paths must be selected for each demand. (5) is the definition of the link load, calculated as a sum over all demands and candidate paths. Constraint (6) assures that if demand d uses link e then, in the case of failure of link e, either the demand d is not restored, or one of the p-cycles is selected for restoration. The right-hand side of


(6) is 1 only if link e belongs to the path p selected for demand d. Consequently, if the path p used by demand d does not use link e, there is no need to decide on restoration of d in the case of a failure of link e. Since we include the constant ǫeq (= 1 if p-cycle q protects link e) in the sum over all p-cycles, (6) guarantees that the p-cycle q selected for restoration of demand d using link e in the case of a failure of link e (ydeq = 1) can restore the flow of d. In other words, in the sum \sum_q ǫ_{eq} y_{deq} we take into account only those p-cycles q which can protect link e. Let us have a closer look at definition (7), which enables calculating the overall flow of p-cycles allocated to link e in the case of a failure of link l. Notice that γlq ydlq hd denotes how much flow of demand d is allocated to p-cycle q for restoration in the case of a failure of link l. Recall that γlq is the restoration coefficient of the paths provided for the failed link l by p-cycle q. Due to the construction of a p-cycle, γlq = 1 if l is an on-cycle link; γlq = 0.5 if l is a straddling link of p-cycle q (restoration is run on both parts of the p-cycle, therefore each part carries half of the demand); and finally γlq = 0 otherwise. Moreover, the term δldp xdp enables checking whether or not the path selected for demand d uses link l. Combining both terms (δldp xdp γlq ydlq hd) we obtain the flow (if any) of demand d carried on link e in the case of a failure of link l. Finally, to compute gel we must check whether p-cycle q uses link e (βeq = 1). The next constraint (8) is a capacity constraint and assures that the flow allocated to link e in the normal, failure-free state of the network plus the flow associated with the p-cycles using link e in the restoration process does not exceed the link capacity. The UFO problem is the special case of (3)–(8) obtained by fixing the paths realizing the demands (i.e. by fixing the variables xdp).
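As a concrete illustration of the model, the following is a minimal Python sketch (not the authors' algorithm) that evaluates the unrestored flow of objective (3) for one fixed routing and one fixed assignment of demands to p-cycles, while checking the capacity constraint (8); the data layout and parameter names are hypothetical assumptions.

def evaluate_ufo(h, link_cap, pcycle_links, gamma, restore, working_path):
    """Evaluate a fixed UFO solution.

    working_path[d] -- list of links used by demand d
    restore[(d, l)] -- p-cycle q assigned to restore demand d when link l fails,
                       or None if the demand is left unrestored
    pcycle_links[q] -- iterable of links belonging to p-cycle q
    gamma[(l, q)]   -- restoration coefficient (1, 0.5 or 0)
    """
    f = {e: 0.0 for e in link_cap}                    # working load, as in (5)
    for d, links in working_path.items():
        for e in links:
            f[e] += h[d]

    unrestored = 0.0                                  # objective (3)
    for l in link_cap:                                # every single-link failure state
        g = {e: 0.0 for e in link_cap}                # p-cycle load, as in (7)
        for d, links in working_path.items():
            if l not in links:
                continue
            q = restore.get((d, l))
            if q is None:
                unrestored += h[d]
                continue
            for e in pcycle_links[q]:
                g[e] += gamma[(l, q)] * h[d]
        # capacity constraint (8): f_e + g_el <= c_e for every link e
        assert all(f[e] + g[e] <= link_cap[e] + 1e-9 for e in link_cap)
    return unrestored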

4. SOLUTION METHOD

In the context of the optimization problem formulated in Section 3 one can propose a quite natural decomposition into several sub-problems, as follows. At the beginning we have the network with a detailed description of its topology, span capacities and costs. Then, for this network a set of demands is given. Each demand determines a flow transfer between a pair of nodes (from source to destination). In order to satisfy these demands, routing paths have to be found, taking into account the cost of each path. This problem is known in the literature as the multicommodity flow (MCF) problem, and will be described briefly in Section 4.1. The MCF problem requires a set of shortest paths (k-SP) between specified pairs of nodes, among which MCF selects the set of the best ones (see [17] for details). Observe that the configuration of paths


satisfying demands need be advantageous for p-cycles configuration. Our overall aim is to find the optimal set of p-cycles; this will be done by tabu search (TS) algorithm, which goes through the solution space by certain search trajectory verifying candidates of p-cycles. Successive p-cycles in this space can be generated on demand by using a reasonable generator. One among described in literature can be used to this order, [1, 4, 10, 13, 14, 18, 18, 21, 22, 22–24]. For fixed set o p-cycles and fixed routing paths realizing demands we have to solve the UFO problem. Its optimal solution is created by independent checking of restorability for each span with nonzero flow. If this span transfers single commodity flow, its restorability case can be simply evaluated. Otherwise, if the span transfers multicommodity flow, the multi-knapsack (MK) problem is used to find optimal restoration scheme. Both cases require the evaluation of so called spare capacity of the cycle. For disjoin cycles inside current p-cycles configuration such evaluation can be made independently and the problem is not troublesome. If cycles inside current p-cycles configuration are not disjoint, the problem of finding real spare capacity of cycles will be solved by special linear programming (LP) problem. Before the construction of final (aggregated) solution algorithm one should selects the optimization strategy, either non-joint or more complex joint optimization. More elaborate discussion of this subject is presented in Section 4.2, where examples and description of efficiency are shown. The solution method of UFO problem is described in Section 4.3. Section 4.4 introduces MK Problem and its solution methods. Section 4.5 provides the LP problem used for distribution of spare capacities in cycles in current p-cycles configuration. 4.1.

MULTICOMMODITY FLOW PROBLEM

In computer networks the term commodity means a set of packets having the same source and target nodes. In general, spans carry a multicommodity flow. The multicommodity flow problem can model the average flow of data packets in a network in a chosen time unit. Following [20], a multicommodity flow problem I = (G, c, K) is defined for an undirected graph G = (V, E), where V is the set of vertices, E is the set of edges and c is the vector of span capacities. For G a specification K with k commodities, numbered 1, . . . , k, can be defined, where the specification of commodity i consists of a source–target pair si, ti ∈ V and a non-negative demand value di. The number of different sources is denoted by k∗, the number of vertices is n and the number of edges is m, m ≥ n. Graph G is connected and has no parallel edges. The symbol uw denotes the (unique) edge between vertices u and w; each edge is given a fixed reference direction.


A multicommodity flow f consists of k vectors fi, i = 1, . . . , k. The value fi(uw) stands for the flow of commodity i through edge uw. If the flow of commodity i has the same direction as the reference direction of edge uw, it has a positive sign; otherwise it has a negative sign. This convention is needed only to determine flow directions. For each commodity i the following constraints are defined:

Σwu∈E fi(wu) − Σuw∈E fi(uw) = 0,    ∀u ∉ {si, ti}    (9)

Σuw∈E fi(uw) = di,    for u = si    (10)

Σwu∈E fi(wu) = di,    for u = ti    (11)

The total value of flow on edge uw is defined as:

f(uw) = Σi |fi(uw)|    (12)

The flow can be realised when:

f(uw) ≤ c(uw)    (13)
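For illustration, a small sketch (hypothetical Python data structures, not part of the formulation) that verifies whether a given multicommodity flow satisfies constraints (9)–(13):

def check_flow(edges, capacity, commodities, flows, tol=1e-9):
    # edges: list of (u, w); capacity[(u, w)]; commodities: list of (s, t, d);
    # flows[i][(u, w)]: signed flow of commodity i on edge (u, w).
    nodes = {u for u, w in edges} | {w for u, w in edges}
    for i, (s, t, d) in enumerate(commodities):
        for u in nodes:
            # net inflow at node u for commodity i
            net_in = sum(flows[i][(a, b)] for (a, b) in edges if b == u) \
                   - sum(flows[i][(a, b)] for (a, b) in edges if a == u)
            if u == s:
                ok = abs(-net_in - d) < tol   # (10): net outflow at the source equals d_i
            elif u == t:
                ok = abs(net_in - d) < tol    # (11): net inflow at the target equals d_i
            else:
                ok = abs(net_in) < tol        # (9): flow conservation
            if not ok:
                return False
    # (12)-(13): total flow on every edge must stay within its capacity
    return all(sum(abs(flows[i][e]) for i in range(len(commodities)))
               <= capacity[e] + tol for e in edges)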

According to [5], the multicommodity flow problem with integer flow values is NP-complete even for two commodities. Since the MCF problem has been known for years, any algorithm from the literature can be used to solve it. 4.2.

OPTIMISATION OF P -CYCLES

In Section 2.2 (following [19]) the optimisation tasks associated with p-cycles have been presented. Most solutions rely on the assumption that routing (multicommodity flow optimisation, described in Section 4.1) has already been performed. Then the set of p-cycles is generated and searched to find the optimal p-cycles configuration, see Fig. 3a. Notice that in our problem the spare capacity and the p-cycles configuration depend on each other. Therefore one can propose an alternative approach, see Fig. 3b, called joint optimisation — flow allocation and p-cycles optimisation are performed with a feedback (by using a two-level algorithm). While analysing the literature one can observe that most authors avoid joint optimisation and solve the problem with the non-joint approach, frequently justifying this choice with the argument that joint optimisation is "too complex". Indeed, joint optimisation is much more complex, but, as has been shown, it may provide much better results.


(a)

(b)

Fig. 3. Comparison of two p-cycles optimisation ideas. MCF — Multicomodity Flow, CS — Cycle Selection, SCO — Spare Capacity Optimisation.

So the main idea is to design an appropriate algorithm which can give, maybe not optimal, but approximate solutions within a reasonable time. This can be achieved using metaheuristics (Genetic Algorithm, Ant Colony Optimization, etc.) or local search algorithms (Tabu Search, Simulated Annealing, Simulated Allocation). In [8] a comparison of results obtained with joint and non-joint optimisation has been presented. Tests were done for four networks: three with 32 nodes and 51, 47 and 44 spans, and the COST-239 network. For the networks with 32 nodes 10 000 p-cycle candidates were generated; for COST-239, 500 p-cycle candidates were generated. For joint optimisation 5 shortest paths were chosen (for the realisation of each demand), and for COST-239 the number of chosen paths was 10. Table 1 presents the results from [8]. According to the results presented in Tab. 1 it is obvious that joint optimisation produces significantly better solutions than non-joint optimisation — the difference is about 20–25%. The quite low redundancy (39%) for the COST-239 network is a very good result. There is also a relationship between the achieved redundancy and the network degree: lower values of the network degree mean a larger redundancy. For this reason COST-239 has the best redundancy value (because of the highest network degree). For the described problem the joint optimisation approach has been chosen, with a Tabu Search [7] algorithm as the direct solution method. The main idea of the Tabu


Table 1. Comparison of results of joint and non-joint optimisation (from [8]). 32n51s means a network with 32 nodes and 51 spans.

Non-joint
Network     network degree   total load   total spare   redundancy   number of cycles
32n51s      1.18             394 877      352 559       89.3%        16
32n47s      2.94             414 984      347 285       90.2%        36
32n44s      2.75             423 596      403 982       95.4%        30
COST-239    4.72             137 170       81 790       59.6%         7

Joint
Network     network degree   total load   total spare   redundancy   number of cycles
32n51s      1.18             405 539      246 943       65.0%        28
32n47s      2.94             424 267      300 400       74.6%        33
32n44s      2.75             413 853      341 269       82.5%        23
COST-239    4.72             143 745       46 910       39.0%         4

Search algorithm is not described in this paper, because a standard version is used and its implementation is in progress. The main purpose of this section is to describe the non-typical sub-problems associated with the analysed optimisation problem. 4.3.

ALGORITHM FOR UFO PROBLEM

In Section 3 the Unrestorable Flow Optimisation (UFO) problem has been formulated. We would like to find the criterion value described in equation (3) and to calculate the matrix zde; the values of the demands are given as input. At this stage of consideration we assume that all paths realizing the demands are known and all p-cycles are configured with their capacities. The algorithm is presented in Listing 1; all terms are compatible with those used in Section 3, and additionally: E — set of edges in the network; De — set of flow demands realised by span e; Qe — set of p-cycles protecting span e (e can belong to the cycle or be a straddling span); Re — set of demands which can be restored in the case of failure of span e; Ue — set of demands which cannot be restored in the case of failure of span e. Listing 1. Algorithm for the evaluation of the zde values, needed for the calculation of the final value of the criterion function.

for e ∈ E do
begin
  (Re, Ue) = checkRestorationPossibility(e, De, Qe);
  for d ∈ Re do zde = 0;
  for d ∈ Ue do zde = 1;
end;
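For illustration, a minimal executable sketch of Listing 1 (hypothetical Python data structures; the restoration check is only stubbed here with a simple greedy rule, whereas the following discussion and Sections 4.4–4.5 describe how it is actually resolved):

def check_restoration_possibility(e, demands_on_e, cycles_on_e, h, capacity):
    # Greedy placeholder: restore demands while the total protecting capacity
    # of the p-cycles covering span e suffices (a stand-in for the knapsack step).
    budget = sum(capacity[q] for q in cycles_on_e)
    restored, unrestored = set(), set()
    for d in demands_on_e:
        if h[d] <= budget:
            restored.add(d)
            budget -= h[d]
        else:
            unrestored.add(d)
    return restored, unrestored

def evaluate_z(edges, demands_per_edge, cycles_per_edge, h, capacity):
    # Computes z[(d, e)] (1 = demand d not restorable after a failure of e), as in Listing 1.
    z = {}
    for e in edges:
        restored, unrestored = check_restoration_possibility(
            e, demands_per_edge[e], cycles_per_edge[e], h, capacity)
        for d in restored:
            z[(d, e)] = 0
        for d in unrestored:
            z[(d, e)] = 1
    return z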


The function checkRestorationPossibility(e, De, Qe) is crucial to this algorithm — its aim is to decide whether all demands realized using span e can be restored in the case of a failure of span e. If not all demands can be restored, the function should optimize this decision, so that the amount of unrestorable flow is minimal. The function returns two sets — the restorable (Re) and unrestorable (Ue) demands for span e. This looks quite simple, but it is not, as will be shown below. Several scenarios can appear: (i) span e is protected by a single p-cycle with capacity q and only one demand d (with flow value h) is realised using span e; (ii) span e is protected by a single p-cycle with capacity q and more than one demand (each with its own flow value hd) is realised using span e; (iii) span e is protected by more than one p-cycle, each with its own capacity qi, and more than one demand d (each with flow value hd) is realised using span e. Case (i) is very simple — the demand can be restored if h ≤ q. Case (ii) is more complicated if Σd∈De hd > q. If this situation appears, a decision has to be made which demands to restore and which not. This is an optimisation problem: minimise the unrestorable flow for the given capacity q. This task can be modelled using the well-known classical knapsack problem [3]. Case (iii) is the most complicated situation: it has to be decided which demands can be restored as well as by which p-cycle. This can also be modelled with a knapsack problem, but formulated a little differently — the so-called multiple knapsack (MK) problem. A detailed description of the MK problem as well as possible solution algorithms is given in Section 4.4, see [11] for details. 4.4.

MULTIPLE KNAPSACK PROBLEM

The Multiple Knapsack Problem (MK) is a generalization of the well-known knapsack problem. In MK there exist several knapsacks with, in general, different capacities. According to [11], MK can be formulated as follows:


• there exists a set of items N := {1, . . . , n};
• each item has its value pj and weight wj, j = 1, . . . , n;
• there exists a set of knapsacks M := {1, . . . , m};
• each knapsack has a positive capacity ci, i = 1, . . . , m.
A subset N̂ ⊆ N is called fitable if the elements of N̂ can be assigned to the knapsacks without exceeding their capacities. In other words, there has to exist a partition of N̂ into m disjoint sets Ni such that w(Ni) ≤ ci, i = 1, . . . , m. The main aim is to choose a subset N̂ whose value is maximised:

max Σi=1,...,m Σj=1,...,n pj xij    (14)

taking into account the constraints:

Σj=1,...,n wj xij ≤ ci,    i = 1, . . . , m    (15)

Σi=1,...,m xij ≤ 1,    j = 1, . . . , n    (16)

where xij ∈ {0, 1}, i = 1, . . . , m, j = 1, . . . , n.

xij = 1 when item j is assigned to knapsack i, and zero otherwise. In the case when all knapsacks have the same capacity c, the problem is called the multiple knapsack problem with identical capacities (MK-I). When pj = wj, j = 1, . . . , n, the problem is called the multiple subset sum problem (MSS). MSS with identical knapsack capacities is called the multiple subset sum problem with identical capacities (MSS-I). MK is a particular case of the generalized assignment problem (GA). In the GA problem each item has a value pij (instead of pj) when assigned to knapsack i — the value of an item depends on the chosen knapsack. According to [11], even the simplified MSS-I problem is strongly NP-hard, because it is an optimisation version of the 3-partitioning problem, which is strongly NP-complete. In [11, 15] several algorithms for MK are presented. For our needs the greedy approach has been chosen. In both works the presented algorithms are based on assigning the sequence of items si−1 + 1, . . . , si − 1 to knapsack i (i = 1, . . . , m) under the assumption that

Σj=si−1+1,...,si−1 wj ≤ ci


This means that the items are packed one after another into the knapsacks. If an item does not fit into the currently packed knapsack, the item has to be rejected, or the current knapsack is considered full and one starts packing the next knapsack. Of course, the items in the queue for packing are sorted in descending order of value per weight unit (pj/wj). This order assures that the most valuable items are packed first. However, an algorithm like this has some disadvantages:
• when an item does not fit into the current knapsack, moving on to packing the next knapsack wastes some space in the previous knapsack, where smaller items would probably still fit;
• in the case of rejecting an item which does not fit into the current knapsack, it can happen that this item could have been packed into another knapsack, so an item with a higher value per weight unit than other items is rejected.
This behaviour comes from the fact that these algorithms were designed under the assumption that the complexity has to be linear, O(n), which matters greatly for large values of n and m. For small n and m another greedy algorithm can be proposed. The items for packing are sorted in descending order of value per weight unit (pj/wj). Each item is packed into a knapsack according to the "best fit" rule — the item is packed into the knapsack for which the value (ci − wj) is the largest. Packing like this ensures that the most valuable items are packed first, and when an item is rejected, we are sure that it cannot be packed into any knapsack. This algorithm has complexity O(nm), which is acceptable for small problem sizes. Of course, the presented algorithm, as well as the (greedy) algorithms from [11, 15], are approximate algorithms and will not give an optimal solution in general.
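As an illustration, a minimal sketch of the O(nm) "best fit" greedy rule described above (hypothetical Python data, not an implementation from [11, 15]):

def greedy_multiple_knapsack(values, weights, capacities):
    # Returns assignment[j] = knapsack index for item j, or None if the item is rejected.
    remaining = list(capacities)
    order = sorted(range(len(values)),
                   key=lambda j: values[j] / weights[j], reverse=True)
    assignment = [None] * len(values)
    for j in order:
        # "best fit" rule from the text: pick the knapsack with the largest (c_i - w_j)
        best = max(range(len(remaining)), key=lambda i: remaining[i] - weights[j])
        if remaining[best] >= weights[j]:
            assignment[j] = best
            remaining[best] -= weights[j]
    return assignment

4.5.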

P -CYCLES CAPACITY CALCULATION

In most analyses of optimisation problems connected with p-cycles, the authors concentrate on the minimisation of the total required amount of spare capacity (Section 2.2). In those problems additional spare capacity is simply added to a span, which allows the desired p-cycle protection scheme to be built. In the problem described in Section 3 there is no possibility to add any spare capacity; only the existing spare capacity of the working spans can be used. This assumption generates several additional constraints. The main one is the decision problem of how much spare capacity has to be assigned to each of the used p-cycles in a situation when at least two cycles have a common span — Fig. 4. In this situation the sum of the p-cycle capacities cannot exceed the spare capacity of the span. Additionally, maximisation of the sum of all p-cycle spare capacities is desired. The mathematical formulation of this problem is given below, using the following symbols:

Fig. 4. Example of configuration where two p-cycles have common span.

rq — capacity of p-cycle q; se — available spare capacity on span e; rqmax — maximum potential capacity of p-cycle q (defined in equation (17)); βeq = 1 if span e belongs to p-cycle q. The maximum potential capacity of each p-cycle is bounded by the value:

rqmax = min{se : e = 1, . . . , E, e ∈ q}    (17)

For each span we have the constraint:

Σq rq βeq ≤ se,    e = 1, . . . , E    (18)

The total amount of p-cycle capacity should be maximised:

max Σq=1,...,Q rq    (19)

taking into account the constraints:

0 ≤ rq ≤ rqmax,    q = 1, . . . , Q    (20)

The problem (18) – (20) is a typical linear programming task, so the simplex method can be recommended to solve it.
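As an illustration, the LP (18)–(20) can be handed to any off-the-shelf solver; a minimal sketch assuming SciPy is available (beta, spare and r_max are hypothetical inputs, not data from the paper):

import numpy as np
from scipy.optimize import linprog

def pcycle_capacities(beta, spare, r_max):
    E, Q = len(spare), len(r_max)
    c = -np.ones(Q)                       # maximise sum_q r_q  ->  minimise -sum_q r_q
    A_ub = np.array([[beta[e][q] for q in range(Q)] for e in range(E)])
    b_ub = np.array(spare)                # (18): sum_q beta_eq * r_q <= s_e for every span e
    bounds = [(0, r_max[q]) for q in range(Q)]   # (20): 0 <= r_q <= r_q^max
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
    return res.x                          # optimal p-cycle capacities r_q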


5. CONCLUSIONS In this paper we have formulated a new optimization problem of flow allocation in a network protected by p-cycles. The presented model can be employed to develop exact as well as heuristic algorithms, depending on their purpose and numerical properties. The optimization problem (3)–(8), considered for each fixed set of p-cycles, is a mathematical programming case with a linear goal function, linear as well as non-linear constraints, and 0/1 and continuous decision variables (so-called non-linear mixed mathematical programming). The possibility of solving this problem (e.g. through linearization, relaxation and/or a dual approach) is an open task at this stage of the research. In this paper we propose the following intuitive two-level solution algorithm: (1) the upper level searches for the optimal set of p-cycles, (2) the lower level — for each fixed set of p-cycles the optimization problem (3)–(8) is solved. We propose the algorithmic components used on the lower level, namely multiple knapsack, linear programming and multicommodity flow. Our future research will concentrate on the implementation and testing of the proposed algorithm.

REFERENCES [1] CHANG L. and LU R. Finding good candidate cycles for efficient p-cycle network design. In Proc. of 13th International Conference on Computer Communications and Networks, pp. 321–326, October 2004. [2] CHOLDA P., JAJSZCZYK A. and WAJDA K. The evolution of the recovery based on p-cycles. In Proc. of the 12th Polish Teletraffic Symposium PSRT2005, pp. 49–58, Poznan, Poland, September 2005. [3] CORMEN T.H., LEISERSON C.E., RIVEST R. L., and STEIN C. Introduction to Algorithms, Second Edition. The MIT Press, 2001. [4] DOUCETTE J., HE D., GROVER W. D. and YANG O. Algorithmic approaches for efficient enumeration of candidate p-cycles and capacitated pcycle network design. In Proc. of Fourth International Workshop on Design of Reliable Communication Networks, pp. 212–220, October 2003.



[5] EVEN S., ITAI A. and SHAMIR A. On the complexity of timetable and multicommodity flow problems. SIAM Journal on Computing, vol. 5(4), pp. 691–703, 1976. [6] GANGXIANG S. and GROVER W.D. Extending the p-cycle concept to path segment protection for span and node failure recovery. IEEE Journal on Selected Areas in Communications, vol. 21(8), pp. 1306–1319, October 2003. [7] GLOVER F. and LAGUNA M. Tabu search. Kluwer Academic Pusblishers, Massachusetts, USA, 1997. [8] GROVER W. D. Mesh-Based Survivable Networks: Options and Strategies for Optical, MPLS, SONET, and ATM Networking. Prentice Hall PTR, New Jersey, USA, August 2003. [9] GROVER W. D. and STAMATELAKIS D. Cycle-oriented distributed preconfiguration: Ring-like speed with mesh-like capacity for self-planning network restoration. In Proc. of ICC 1998 IEEE International Conference on Communications, pp. 537–543, Atlanta, Georgia, USA, June 1998. [10] KANG B., HABIBI D., LO K., PHUNG Q.V., NGUYEN H.N. and RASSAU A. An approach to generate an efficient set of candidate p-cycles in wdm mesh networks. In Proc. of APCC ’06. Asia-Pacific Conference on Communications, 2006, pp. 1–5, Busan, South Korea, August 2006. [11] KELLERER H., PFERSCHY U. and PISINGER D. Knapsack Problems. Springer Verlag, 2004. [12] LI W., DOUCETTE J. and ZUO M. p-cycle network design for specified minimum dual-failure restorability. In Proc. of ICC 2007. IEEE International Conference on Communications, pp. 2204–2210, Glasgow, Scotland, June 2007. [13] LO K., HABIBI D., RASSAN A., PHUNG Q., and NGUYEN H. N. Heuristic p-cycle selection design in survivable wdm mesh networks. In Proc. of ICON ’06. 14th IEEE International Conference on Networks, pp. 1–6, Singapore, September 2006. [14] LO K., HABIBI D., RASSAN A., PHUNG Q., and NGUYEN H. N. and KANG B. A hybrid p-cycle search algorithm for protection in wdm mesh 157


networks. In Proc. of ICON ’06. 14th IEEE International Conference on Networks, pp. 1–6, September 2006. [15] MARTELLO S. and TOTH P. Knapsack Problems: Algorithms and Computer Implementations. John Wiley & Sons, West Sussex, England, 1990. [16] MUKHERJEE D. S., ASSI C., and AGARWAL A. Alternate strategies for dual failure restoration using p-cycles. In Proc. IEEE International Conference on Communications, 2006, pp. 2477–2482, June 2006. [17] PIORO M. and MEDHI D. Routing, Flow, and Capacity Design in Communication and Computer Networks. Elsevier Inc., San Francisco, USA, 2004. [18] SCHUPKE D.A. An ilp for optimal p-cycle selection without cycle enumeration. In Proc. of Eighth IFIP Working Conference on Optical Network Design and Modelling (ONDM 2004), Ghent, Belgium, February 2004. [19] SMUTNICKI A. and WALKOWIAK K. Modeling flow allocation problem in mpls network protected by p-cycles. In Proc. of 42nd Spring International Conference on Modelling and Simulation of Systems, pp. 35–42, Hradec nad Moravici, Czech Republic, April 2008. [20] STEIN C. Approximation Algorithms for Multicommodity Flow and Shop Scheduling Problems. PhD thesis, S. M., Electrical Engineering and Computer Science Massachusetts Institute of Technology, 1992. [21] WU B., YEUNG K. L., LUI K.S. and XU S. A new ilp-based p-cycle construction algorithm without candidate cycle enumeration. In Proc. of ICC 2007. IEEE International Conference on Communications, pp. 2236–2241, Glasgow, Scotland, 2007. [22] WU B., YEUNG K.L. and XU S. Ilp formulation for p-cycle construction based on flow conservation. In Proc. of Global Telecommunications Conference, 2007. GLOBECOM ’07, pp. 2310–2314, Washington, DC, USA, 2007. [23] XUE G. and GOTTAPU R. Efficient construction of virtual p-cycles protecting all cycle-protectible working links. In Proc. of 2003 Workshop on High Performance Switching and Routing, pp. 305–309, Torino, Italy, 2003. [24] ZHANG H. and YANG O. Finding protection cycles in dwdm networks. In Proc. of IEEE International Conference on Communications, pp. 2756–2760, 2002. 158


Computer Systems Engineering 2008 Keywords: Mesh, task allocation, simulation

Małgorzata SUMISŁAWSKA* Maciej GARYCKI* Leszek KOSZAŁKA* Keith J. BURNHAM** Andrzej KASPRZAK*

EFFICIENCY OF ALLOCATION ALGORITHMS IN MESH ORIENTED STRUCTURES DEPENDING ON PROBABILITY DISTRIBUTION OF THE DIMENSIONS OF INCOMING TASKS

For the efficient utilization of multi-processing units and computers connected in clusters a proper task allocation is needed. In this paper, four algorithms for task allocation are compared: First Fit (FF), Stack Based Algorithm (SBA), Better Fit Stack Based Algorithm (BFSBA) and a new one, the Probability Distribution Stack Based Algorithm (PDSBA), which is based on SBA. The comparison of the algorithms is carried out using a dedicated experimentation environment, designed and implemented by the authors.

1. INTRODUCTION Nowadays the computational power of a single processor unit is insufficient for advanced calculations or data management in corporations. Thus, multi-computers and computer networks are utilized. Advanced processing requires CPUs in multiprocessing units or clusters of computers connected using a mesh topology [5]. The mesh topology in this context is commonly used in data processing centres and research facilities of Internet Service Providers. The objective of this paper is to compare several mechanisms of the allocation process. The second section explains basic terms and issues related to the subject. In Section 3 we discuss commonly used allocation algorithms. The new proposed *

Department of Systems and Computer Networks, Wroclaw University of Technology, Poland. Control Theory and Applications Centre, Coventry University, UK.

**



algorithm is presented subsequently. Section 5 describes designed and implemented experimentation system. Section 6 reports the results of the investigation. The final remarks are given in the last section. 2. PROBLEM FORMULATION In this section, the discussed problem is formulated. At first, we define a twodimensional mesh (Fig. 1) that is a structure of nodes connected in the following way: (i.) most of the nodes are connected with four neighbours, (ii.) border nodes have three adjacent executors, (iii.) four nodes are situated in the corner having two neighbours. The mesh M(W, H) is a rectangular structure containing W×H nodes, where W denotes the width and H describes the height of the grid. A submesh S(i, j, k, l) is a rectangular subgrid of the mesh, where (i, j) refers to the left top corner of the submesh and k and l are its width and height respectively (Fig 1).

Fig. 1. Submesh S(3,2,4,3) of a mesh M(7,5)

Each task is represented by a rectangular grid (wj, hj), which is allocated on the mesh. The newly allocated task cannot overlap any other task in the mesh. Each task has its own Time To Live (duration time) after which is removed from the grid and the released nodes can be reused. Allocation is a process of placing an incoming task into a mesh. There are several efficiency indexes of allocation algorithms defined: the runtime of an algorithm, the fragmentation factor of the mesh or the percentage of allocated tasks. In case of tasks with a short duration time (comparable to the runtime of an algorithm) it is essential to minimize the runtime of an algorithm, in contrast to the tasks which Time To Live (TTL) is much longer than runtime of an algorithm. In this paper, we focused on the second case. Further, we analyse some well known methods that are based on Stack Based Algorithms (SBA) (see [1], [2], [3]) and the new proposed algorithm called Probability Distribution Stack Based Algorithm (PDSBA).



The terms used in the description of the SBA-family algorithms are as follows. The Base Node (BN) of each task is its top left node. For each task in the mesh a Forbidden Area (FA) is created. It is a submesh whose nodes cannot be used as Base Nodes, because the newly allocated task would overlap another task in the grid. A Rejected Area (RA) refers to an L-shaped area covering the right and bottom nodes of the mesh. If a Base Node were placed in the RA, the task would cross the boundary of the mesh. A Base Block (BB) refers to a submesh where the Base Node of an incoming task can be placed so that the task does not overlap any other task allocated in the mesh. The BB is selected from the group of Candidate Blocks (CBs). The initial CB is obtained by subtracting the RA from the whole mesh. Then, by subtracting a Forbidden Area from one Candidate Block, we create one to four new CBs. 3. ALGORITHMS The allocation process of the SBA algorithm begins when a new task arrives. After the RA is subtracted from the mesh, creating the Initial Candidate Block (ICB), the remaining area of the mesh is analysed. The ICB is placed on the stack together with the first FA from the list. If there are no FAs, the whole ICB becomes the BB. Otherwise the algorithm starts to run in a loop. The FA and CB are taken from the stack and one to four new CBs are created by subtracting the FA from the CB. The new CBs are placed on the top of the stack together with the next FA from the list. The whole process recurs until the end of the list of FAs. If there are no FAs left in the list to put on the stack, one of the CBs created with the last FA becomes the BB. The Better Fit Stack Based Algorithm (BFSBA), which is a modification of SBA, selects the CB with minimal height and minimal horizontal position. In comparison to SBA, BFSBA processes all CBs that are on the stack to choose the appropriate CB. The proposed PDSBA is described in more detail in the next section. The discussed algorithms are compared with the First Fit (FF) algorithm, which allocates a task in the first discovered submesh which does not overlap any allocated process. FF starts from the top left corner node and continues to check the executors one after another, row by row, until it discovers the one which might become the BN. 4. PDSBA ALGORITHM The proposed algorithm PDSBA is similar to BFSBA, but the probability distribution of the incoming task sizes is recognized as the new tasks arrive. The difference between BFSBA and PDSBA is that, while in BFSBA an incoming task


is allocated in the CB with the minimal height, the proposed algorithm selects the CB whose height is a multiple of the size of the task which is most likely to arrive. The scheme of PDSBA is illustrated in Fig. 2.

Fig. 2. The scheme of PDSBA
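For illustration only, a compact sketch of the First Fit strategy described in Section 3 (assuming a plain boolean occupancy grid; this is not the authors' Borland C++ implementation):

def first_fit(mesh, w, h):
    # mesh: 2D list of booleans (True = busy executor); task of size w x h.
    # Returns the Base Node (top-left corner) of the first free submesh, or None.
    H, W = len(mesh), len(mesh[0])
    for i in range(H - h + 1):            # row by row, as FF does
        for j in range(W - w + 1):
            if all(not mesh[i + di][j + dj]
                   for di in range(h) for dj in range(w)):
                return (i, j)
    return None

def allocate(mesh, w, h):
    bn = first_fit(mesh, w, h)
    if bn is not None:                    # mark the allocated submesh as busy
        i, j = bn
        for di in range(h):
            for dj in range(w):
                mesh[i + di][j + dj] = True
    return bn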


5. THE EXPERIMENTATION SYSTEM The algorithms are evaluated using the experimentation system developed in Borland C++ Builder 6.0 Personal Edition. The simulation environment works under Windows XP. In the proposed system, a new task appears in each iteration and it has to be allocated. The main idea of the system is presented in Fig. 3.

Fig. 3. Scheme of the application

Now, we describe the steps of a single iteration of the simulator (the experimentation system) (Fig. 3): Step 1 Arrival of a new task. Step 2 Searching for a BB. Step 3 If a BB is discovered, the structure is allocated in the upper left corner of the BB. Step 4 If allocation is impossible, it is examined whether the queue is full. Step 5 If the queue is not full, the task is placed at the end of the queue. Step 6 If the queue is full, the task is rejected. Step 7 The TTL of each allocated task is decreased.



Step 8 If any task is completed (its TTL equals zero) it is removed from the structure. If not, go to Step 11. Step 9 The queue is explored with the aim of discovering task, which fits to the space left by executed task. Step 10 The task selected from the queue is allocated in the mesh using the algorithm chosen before the simulation started. Step 11 When the new task arrives, go to Step 2. Step 12 The simulation stops when all planned tasks are processed. The system has a possibility to simulate the situation where several executors are inactive, thus, there are discontinuities in the mesh structure. Input parameters of the simulator are (Fig. 4): • the selected algorithm, • the mean and variance of the task size, • the width and height of the mesh structure, • the number of incoming tasks. The size of the queue is one fifth of the whole number of the planned tasks. The indexes of performance of the algorithms are: • the number of tasks allocated as soon as they appear, • the number of all allocated tasks, • the number of rejected tasks, • the average time the task spends in the queue, • the fragmentation index in the function of the time. If there are no tasks in the queue, the fragmentation index equals zero. If the queue is not empty, the fragmentation index equals the percentage of idle executors within the entire mesh.

Fig. 4. Input and output parameters of the system
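As a small illustration of the fragmentation index defined above, a sketch using the same hypothetical boolean-grid representation as in the earlier First Fit example:

def fragmentation_index(mesh, queue):
    # Zero when the queue is empty, otherwise the percentage of idle executors
    # within the entire mesh (as defined in the list of performance indexes).
    if not queue:
        return 0.0
    total = sum(len(row) for row in mesh)
    idle = sum(1 for row in mesh for busy in row if not busy)
    return 100.0 * idle / total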



6. EXPERIMENTS The purpose of the experiments is to determine the utility of each algorithm for various settings. Two stages of the experiment were performed. In the first one, we examine a full mesh (all the nodes active). The investigations in the stage #2 checked the efficiency of the algorithms in case when several nodes are inactive what simulated shutdown or collapsed executors. We provide 4 experiments in Stage #1 and 2 in Stage #2. Each simulation was 25 times for each allocation algorithm that is 100 simulations in total. Stage #1, Full mesh: Experiment #1. Full mesh (all the nodes active). Grid size: 20×20. 100 tasks to appear. Mean size of a task equals 7, standard deviation = 2 in the first case and 5 in the second case. Examining the whole number of allocated and rejected tasks, and the number of tasks allocated as soon as they arrived. Experiment #2. The same input parameters as in Experiment #1 (full mesh, grid size: 20×20, 100 tasks to appear, mean dimension of a task = 7, standard deviation = 2 in the first case and 5 in the second case). Examining the average time, the task spends in the queue before it is allocated. Experiment #3. Full mesh. Grid size: 20×20. 100 tasks to appear. Mean size of a task equals 7, standard deviation = 2. The dependence between the number of queued tasks and the fragmentation index of the mesh in function of the iteration number is examined. Experiment #4. The same environment parameters as in Experiment #1, case 1 (grid size: 20×20, 100 tasks to appear, the mean task size = 7, standard deviation = 2). The mesh fragmentation indexes in function of the iteration number for each algorithm are compared. Stage #2, Partial mesh: Experiment #5. Partial mesh (several nodes inactive). Grid size: 20×20. 100 tasks to appear. Mean size of a task equals 7, standard deviation = 2 in the first case and 7 in the second case. Examining the whole number of allocated and rejected tasks, and the number of tasks allocated as soon as they arrived. Experiment #6. The same parameters as in Experiment #4 (partial mesh, grid size: 20×20, 100 tasks to appear, mean dimension of a task = 7, standard deviation = 2



in the first case and 7 in the second case. Examining the average time a task spends in the queue before it is allocated. Next, we provide the results of the described experiments. Experiment #1. The aim of the first simulation was to show that PDSBA, in comparison to the other algorithms, provides the best results when the task sizes are more predictable. In relation to the mean task size of 7, the standard deviation of 2 is low, thus the probability density function is concentrated around the mean value.

[Two bar charts comparing BFSBA, PDSBA, FF and SBA: the numbers of tasks allocated as soon as they appeared, allocated in total, and rejected.]

Fig. 5. Efficiency of the algorithms according to the standard deviation of the incoming task sizes. The left chart illustrates the situation when the standard deviation equals 2 (rather predictable task sizes). The chart on the right depicts the case when the task size is less predictable (standard deviation equals 5).

When the task dimensions were more predictable, PDSBA provided the best results. Note that although in the second case the percentage of tasks allocated by PDSBA is lower than by SBA, the PDSBA allocated more tasks as soon as they come (without queuing them). It is worthy noticing that although FF is least efficient in comparison to all SBA, it provides better results when the scatter of task dimensions is high. Experiment #2. Investigating the mean queuing time was the reason this experiment was performed. The shorter the queuing time is, the higher probability, that the queuing cache will not become full and reject incoming processes, thus, more tasks will be allocated.




Fig. 6. Mean queuing time for each of the examined algorithms according to the standard deviation of the incoming task sizes. The left chart illustrates the situation when the standard deviation equals 2 (rather predictable task sizes). The chart on the right depicts the case when the task size is less predictable (standard deviation equals 5).

This experiment exhibits the differences between PDSBA, BFSBA and SBA. The chart on the left depicts the situation, when the standard deviation of incoming task sizes is relatively small, thus, the dimensions of incoming processes are more predictable. In this case the PDSBA returns the best results. On the second chart (relatively high standard deviation of task sizes), one can notice that although all algorithms based on stack behave very similar, the classic SBA is the most efficient. The behaviour of the FF algorithm shows that its efficiency is far less than the other mentioned algorithms. In both cases almost the same queuing time of FF can be observed, so the conditions when the task sizes varies significantly does not influence on the behaviour of the FF algorithm. Experiment #3. The goal of the experiment is to examine the dependence between the number of queued tasks and the fragmentation index of the mesh in function of time. 35


Fig. 7. The number of queued tasks and the fragmentation index for SBA (left) and FF (right)



Fig. 8. The number of queued tasks and the fragmentation index for BF SBA (left) and PDSBA (right)

Due to the definition of the mesh fragmentation used in this paper, the value of that indicator reliably determines the efficiency of an algorithm only after a certain number of tasks have appeared. After the first rejection of a task, the percentage of idle executors is high, which increases the fragmentation index, although numerous tasks may still be allocated in the free space. The examined performance indicator became stable after about 60–70 tasks had arrived. The dependence between the fragmentation index and the number of queued tasks may be derived from Fig. 7 and Fig. 8: as long as an incoming task is not allocated, the percentage of idle nodes does not decrease. Experiment #4. The goal was to compare the mesh fragmentation as a function of the iteration number for each examined algorithm. The environment has been chosen with the same parameters as in Experiment #1, case 1 (grid size: 20×20, 100 tasks to appear, mean task size = 7, standard deviation = 2).


Fig. 9. Mesh fragmentation for each of the examined algorithms as a function of the iteration number


The shape of the fragmentation index function is similar for all examined algorithms. After the arrival of about 60 tasks, the fragmentation index became stable. The percentage of vacant nodes was lowest for BFSBA, although in the first phase of the simulation (before 30 tasks had appeared) PDSBA provided the lowest fragmentation index. Experiment #5. The purpose of this experiment was to compare the efficiency of PDSBA with the other mentioned algorithms (SBA and its derivatives, and FF) in the environment of a mesh structure with randomly removed nodes. The other conditions (the mean size of the incoming tasks and their dimensions) remain the same as in the first experiment. As one can notice, the efficiency of each algorithm in that kind of environment is rather different from the one observed in Experiment #1, where the mesh was continuous (without inactive nodes). In this case PDSBA and SBA are almost equally efficient. When the variance of the task dimensions is low in comparison to their mean size, BFSBA gives worse results than the other stack-based algorithms, but its efficiency increases with the variance of the task sizes, and in the second case BFSBA appears to be very efficient.

[Two bar charts comparing PDSBA, BFSBA, SBA and FF on the partial mesh: the numbers of tasks allocated as soon as they appeared, allocated in total, and rejected.]

Fig. 10. Efficiency of all investigated algorithms according to the standard deviation of the incoming task size for environment of a mesh structure with randomly removed nodes. The chart on the left illustrates a situation when the standard deviation of the task dimensions is relatively small in comparison with their mean value. The chart on the right depicts a case, when the incoming task sizes differ significantly from their mean size.

The FF algorithm gives unacceptable results in both cases. Although in the environment with a continuous mesh its efficiency was rather low compared to the performance of the rest of the investigated algorithms, it is totally inapplicable to the environment with a partial mesh topology (most of the tasks were rejected).



Experiment #6. The aim of this experiment was to examine the average time a process spends in the queue before it is allocated in the environment with a partial mesh (some nodes removed or inactive), and to compare the results for relatively predictable task sizes and significantly varying task sizes. The results of this experiment show that the queuing time of tasks waiting to be allocated is comparable for all algorithms when the mesh is not full and the sizes of the incoming tasks are predictable (small variance of the task dimensions). In that case SBA exhibited the longest queuing time.


Fig. 11. Mean queuing time for each of the examined algorithms according to the standard deviation of the incoming task sizes for the environment with partial mesh. The left chart illustrates the situation when the standard deviation equals 2 (rather predictable task sizes). The chart on the right depicts the case when the task size is less predictable (standard deviation equals 5).

When the sizes of the incoming processes were less predictable (higher variance of the task sizes), the results were significantly different. The shortest queuing time is a feature of the PDSBA algorithm, while the FF algorithm gives the least satisfactory queuing time. 7. CONCLUSIONS AND PERSPECTIVES There is no single allocation algorithm among the analysed ones that is the best in all cases. This research resulted in specifying the fields of usage in which the examined algorithms give the best results. Table 1 summarises the results of the experiments. The conditions for which a specific algorithm is most effective have been defined roughly. A formula to determine the best algorithm for given input parameters is to be established in the future.



Table 1. Application of the examined algorithms

Full mesh   Variance of task sizes   Recommended algorithm
yes         low                      PDSBA
yes         high                     SBA
no          low                      BFSBA
no          high                     PDSBA, SBA

REFERENCES [1] LISOWSKI D., Analiza efektywności wykorzystania algorytmów alokacji zadań w sieciach o topologii siatki, Wrocław, Politechnika Wrocławska, 2004. [2] KUBIAK M., Analiza wydajności dynamicznych algorytmów alokacji zadań w systemach zorientowanych siatkowo, Wrocław, Politechnika Wrocławska, 2006. [3] KOSZAŁKA L., LISOWSKI D., PÓŹNIAK-KOSZAŁKA I., Comparison of allocation algorithms for mesh structured networks with using multistage simulation. [4] BANI-MOHAMMAD S., OULD-KHAOUA M., ABABNEH I., MACKENZIE L., noncontiguous processor allocation strategy for 2D mesh connected multicomputers based on sub-meshes available for allocation, In, Proceedings of the 12th International Conference on Parallel and Distributed Systems, 12-15 July 2006. [5] Blue Gene Project, http://www.research.ibm.com/bluegene/index.html, 2005. [6] YOO B.-S., DAS C.-R., A fast and efficient processor allocation scheme for meshconnected multicomputers, IEEE Transactions on Parallel & Distributed Systems. [7] ABABNEH I., An efficient free-list submesh allocation scheme for two-dimensional meshconnected multicomputers, Journal of Systems and Software.



Computer Systems Engineering 2008 Keywords: MPI, parallel processing, benchmark, implementation

Bolesław TOKARSKI∗ Leszek KOSZAŁKA∗ Piotr SKWORCOW†

SIMULATION BASED PERFORMANCE ANALYSIS OF ETHERNET MPI CLUSTER

This paper considers the influence of network aspects and MPI implementation on the performance of an Ethernet-based computer cluster. Following factors are considered: message size, number of nodes, node heterogeneity, network speed, switching technology and MPI implementation. The primary index of performance is throughput measured with Intel MPI Benchmark. It was found that there is a specific message size that is optimal for each cluster configuration. The most important factors were found to be the network speed and MPI implementation.

1. INTRODUCTION Nowadays CPUs use a multi-core architecture. This means that single-threaded, single-process applications can not make use of more than one core at a time. As a result, multithreaded applications are commonly used in desktop PCs. However, large data computing centres use multiple server computers interconnected typically with Ethernet. More information about Ethernet, its design and protocols can be found in [1]. Multithreading alone cannot overcome the obstacle of network interconnection. This is the field of Message Passing Interface (MPI). MPI is de facto standard library used for large scale computation. It is used in supercomputers to provide fast and reliable data exchange between nodes. All supercomputers listed on Top500 Supercomputing Sites [2] use MPI as their communication library. Sun Microsystems in its book about Multithreading for Solaris [3] mentions a combination of multithreading and RPC (Remote Procedure Call – see [4]) as a method of extending ∗ †

Department of Systems and Computer Networks, Wrocław University of Technology, Poland. Water Software Systems, De Montfort University, Leicester, UK.



a multithreaded program to run over the network. The book also points Sun’s ClusterTools, based on OpenMPI as a more effective approach for distributed systems. The MPI standard is defined on the MPI Forum in [5] and [6]. It consists of a set of function calls needed to ensure a consistent communication between nodes in a multi-computer cluster. The MPI library was designed to address supercomputers consisting of thousands of CPUs. The objective of this paper is to find the factors influencing the performance of Ethernet-based MPI network cluster. Performance tests were conducted on low-cost PCs. In this paper the tests were focused mainly on comparison of MPI implementations from Sun, LAM and MPIch. More investigations on tuning a network-based cluster can be found in e.g. [8]. Along with optimisation methods a comparison of MPIch2, GridMPI, MPIch-Madeleine and OpenMPI can be found in that work. Other MPI tests were conducted by Brain VanVoorst and Steven Seidel [7] with comparison of MPI implementations on a shared memory machine. Efficiency is measured using the SendRecv test from Intel’s MPI Benchmarks [9]. Following factors are evaluated: • cluster heterogeneity and the presence of a master node, • number of nodes in the cluster, • interconnection speed, • switching algorithm, • MPI implementation. The paper is organised as follows. In section 2 the system used for experiments is discussed, including hardware and software details. Section 3 contains the results of the tests mentioned above. Section 4 lists other factors that might influence the cluster. Section 5 contains conclusions from the experiments. Section 6 shows perspectives on future work with MPI.

2. EXPERIMENTATION SYSTEM 2.1. HARDWARE

The computer cluster used in this paper consists of a set of 10 computers, 8 of which are IBM NetVista 6579-PCG – Pentium III-based, 866 MHz with 128 MB of RAM and


1 Fast Ethernet card. The detailed specs of the nodes are available from IBM [10]. The master node is a brandless Pentium IV 3 GHz with 1.5 GB of RAM and 3 Fast Ethernet cards (bonded). The slow node is a brandless Pentium 166 MHz with 64 MB of RAM and 1 Fast Ethernet card. The slow node is used to test the efficiency of a heterogeneous cluster. Two different devices were used for the interconnection. The first one is a UB Networks 10 Mbps Half-duplex hub, the second is a 3com SuperStack II 3300 – 100 Mbps Full-duplex, intelligent switch, used in 10 mbps mode. 2.2.

SOFTWARE

The following software is used in the cluster: Debian Linux 4.0 and the MPI libraries OpenMPI 1.2.4 and Intel MPI 3.1. As the test suite, Intel MPI Benchmarks 3.1 was used; the graphs in the further part of the paper illustrate the results of the SendRecv test from the suite. The mechanism of this test is thoroughly discussed in the documentation [9]. In summary, the test uses the MPI function call MPI_Sendrecv. Each node sends a message of a given size to its "right" neighbour and receives a message from its "left" neighbour. The time measured is the time between the start of sending the message and receiving it by the last node in the cluster. Throughput is expressed by Equation (1):

throughput = 2 · message size / time    (1)

To obtain meaningful results each test was repeated 1000 times. The charts contain average results. It was assumed that during the tests no unnecessary services were running on the nodes.
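The measurements in this paper use the C-based Intel MPI Benchmarks; purely as an illustration of the SendRecv pattern and of Equation (1), a rough sketch in Python (assuming the mpi4py and NumPy packages, which are not used in the paper, and an illustrative message size) could look like this:

from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()
right, left = (rank + 1) % size, (rank - 1) % size

message_size = 128 * 1024                      # bytes (illustrative value)
sendbuf = np.zeros(message_size, dtype=np.uint8)
recvbuf = np.empty_like(sendbuf)

comm.Barrier()
start = MPI.Wtime()
repetitions = 1000
for _ in range(repetitions):
    # each node sends to its "right" neighbour and receives from its "left" one
    comm.Sendrecv(sendbuf, dest=right, recvbuf=recvbuf, source=left)
elapsed = (MPI.Wtime() - start) / repetitions

if rank == 0:
    # Equation (1): every node both sends and receives message_size bytes
    print("throughput [MB/s]:", 2 * message_size / elapsed / 1e6)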

3. TEST RESULTS 3.1.

EXPERIMENT #1 – CLUSTER HETEROGENEITY

Design of the experiment The following hardware configurations were used for this test: • 8 IBM NetVista computers (Pentium III 866 MHz), • 8 IBM NetVista computers + master node (Pentium IV 3 GHz), • 8 IBM NetVista computers + master node + slow node (Pentium 166 MHz).


Other conditions: Intel MPI library, interconnected with 100 mbps switch in intelligent mode, master node connected with 3 bonded links. Experiment results


Fig. 1. Measured throughput of clusters consisting of a) 8 standard nodes b) 8 standard + 1 master node c) 8 standard + 1 master + 1 slow node

Figure 1 illustrates the throughput of the tested set. It shows that for message sizes up to 32 KB the best results are obtained for the homogeneous cluster. However, it seems that there is an additional burden of synchronising the nodes when the message size exceeds 32 KB. Without the master node, the performance of the cluster drops from 21 MB/s to 13 MB/s for a message size of 128 KB. Adding a slow node leads to half the throughput of a homogeneous cluster with a master node. This would result in a huge slowdown in applications that require node synchronisation or assume equal computing power of the nodes. However, one needs to note that the slowdown seen here does not affect certain types of computation, i.e. so-called "embarrassingly parallel" tasks.


3.2.

EXPERIMENT #2 – NUMBER OF CLUSTER NODES

Design of the experiment In this test we used a master node plus 1, 4 or 8 standard nodes interconnected with one of the following: • UB Networks hub in 10 mbps half duplex mode • 3Com switch in 100 mbps full duplex mode • 3Com switch in 10 mbps half duplex mode Other conditions: Intel MPI library, switch used in intelligent mode, master node connected with 1 link. Experiment results Figure 2 illustrates how the cluster behaves for a different number of nodes and different network interconnection. As one can see, the 10 mbps hub easily gets overloaded with data. While throughput can reach 1 MB/s with 2 nodes, it drops to 0.4 MB/s with 5 nodes and to 0.2 MB/s on 9 nodes. 100 mbps used with full duplex serves well for the cluster; the throughput is nearly the same for any number of nodes. On the 5 and 9 node graph one can observe a drop of throughput when the message size exceeds 128 KB. One can notice such drop also in Figure 1. The reason for this has not been determined; it may be related to the network stack or the library implementation. 3com switch used in 10 mbps half duplex mode resulted in improved throughput, compared to the hub. Cluster throughput did no depend on the amount of nodes. However, two anomalies have been observed. One is in the a part of the graph – the hub seems to give better results than the switch. The second can be seen in parts b and c – there is a drop of throughput when the message size equals 8 KB or 16 KB. Comparing the results seen on part c of Figure 2 one can notice that using a 10 mbps switch resulted in four times higher throughput compared to a 10 mbps hub, while using a 100 mbps full duplex switch gives over 20 times better throughput than a 10 mbps half duplex switch.




Fig. 2. Measured throughput of cluster with 10 mbps hub/switch and 100 mbps switch consisting of a) 2 nodes b) 5 nodes c) 9 nodes

3.3.

EXPERIMENT #3 – SWITCHING ALGORITHMS

Design of the experiment In this test the switch interconnecting the nodes was using one of the following switching algorithms: • Store-and-forward • Fragment-free • Fast-forward • Fast-forward with 3 bonded links on the master node Conditions: 8 standard and 1 master node, OpenMPI library, interconnected with 100 mbps switch, master node connected with 1 link.


Experiment results


Fig. 3. Comparison of clusters interconnected with the switch using different switching algorithms (Store-and-forward, Fragment-free, Fast-forward, and Fast-forward with 3 bonded links on the master node)

In this experiment the OpenMPI library was used, hence the curve shown in Figure 3 is different from the one in Figure 1. There is, however, a similar drop of throughput, with the worst value at the 64 KB message size. Figure 3 also shows that it is better to have a faster connection for the master node. One would expect the Fast-forward algorithm to give the best results. This, however, was found not to be true for the considered cluster, as shown in Figure 3. Moreover, Store-and-forward shows better results than the other switching techniques considered. 3.4.

EXPERIMENT #4 – MPI IMPLEMENTATION

Design of the experiment In this test the MPI implementation used was one of the following:


• Intel MPI, • OpenMPI, • MPIch, • LAM-MPI. Other conditions: a constant number of 8 standard nodes and 1 master node interconnected with the 100 mbps switch in intelligent mode, master node connected with 3 bonded links.


Fig. 4. Comparison of clusters based on different MPI implementations

Experiment results This final experiment investigated how different MPI implementations impact the performance of the cluster. Figure 4 shows that LAM-MPI and Intel MPI achieved the highest throughput, with LAM-MPI being marginally better than Intel MPI for all


message sizes considered except 256 KB. OpenMPI produced average results, with 25% loss compared to its rivals in its weakest point, i.e. for message size 64 KB. On the other hand, it was the best performer when used with 512 KB message size. MPIch produced the worst results for all message sizes considered. It is worth mentioning that during the tests none of Intel MPI, OpenMPI or MPIch needed to use swap file or other HDD data. Only LAM-MPI, when testing 512 KB message size, needed to use the PC’s HDD; it did not seem to affect the test result, though.

4. CONCLUSIONS In this paper, the most important factors influencing the performance of Ethernetbased network cluster were evaluated experimentally and discussed. The findings can be summarised as follows: • The most important factor was found to be the linking device used for the interconnection. Upgrading to a better switch can result in 20 times higher throughput. If a high-end cluster is needed, one should consider InfiniBand, MyriNet or other proprietary solution. • The second factor is the heterogeneity of the cluster. One needs to assume that the speed of the cluster depends on the slowest machine. Heterogeneous cluster is ineffective, hence it is better to upgrade all CPUs or memory in the cluster rather than upgrade just a part of it. • It was observed that using an efficient MPI implementation speeds up calculations. MPI implementations are targeted and optimised for specific areas. For example, MPIch performed poorly in the tests considered in this paper, but performed much better on a shared-memory system considered in [7]. • When writing an application using MPI, it is essential to choose the right message size for the cluster. For the considered cluster, this was found to be 16 KB for MPIch, 1MB-4MB for OpenMPI and 128 KB for LAM-MPI and Intel MPI. When the application’s target MPI implementation is not known, it is recommended not to use message sizes over 16k.



REFERENCES [1] SPURGEON C.E., Ethernet: The Definitive Guide, O’Reilly Media, Inc., Sebastopol, CA, first edition, 2000. [2] TOP500.Org, Top 500 supercomputing sites., Technical report, http://www.top500.org/, XI 2007. [3] Sun Microsystems, Multithreaded Programming Guide, Sun Microsystems, Inc., Santa Clara, CA, first edition, 2008. [4] BLOOMER J., Power Programming with RPC, O’Reilly and Associates, Inc., Sebastopol, CA, first edition, 1992. [5] Message Passing Interface Forum, Mpi: A message-passing interface standard (version 1.1), Technical report, http://www.mpi-forum.org/, 1995. [6] Message Passing Interface Forum, Mpi-2: Extensions to the message-passing interface, Technical report, http://www.mpi-forum.org/, 2003. [7] VanVOORST B. and SEIDEL S., Comparison of mpi implementations on a shared memory machine, Technical report, 2000. [8] MIGNOT J.C., PASCALE S.G., PRIMET V.B., GLÜCK L.H.O.,Comparison and tuning of mpi implementations in a grid context, Technical report, Université de Lyon, INRIA, LIP (Laboratoire de l’Informatique du Parallélisme), France, 2007. [9] Intel GmbH, Bruehl, Germany, Intel MPI Benchmarks, Users Guide and Methodology Description, 3.1 edition, 2007. [10] International Business Machines Corporation, Technical Information Manual, A20 Type 6269, A40 Types 6568, 6578, 6648, A40p Types 6569, 6579, 6649, 1st edition, 2000.



Computer Systems Engineering 2008 Keywords: Intrusion Detection Systems, Data Mining, KDD-CUP 99

Michał ŻACZEK∗ Michał WOŹNIAK∗

APPLYING DATA MINING TO NETWORK INTRUSION DETECTION This paper presents the results of a software implementation of a classifying data mining algorithm – Naive Bayes – for the detection of unwanted network traffic. The dataset created for the KDD-CUP 99 competition was used as a substitute for real network traffic. A series of tests showed a significant dependence of the algorithm's accuracy on the size and contents of the training dataset. The paper points out difficulties in implementation and the potential of data mining for network intrusion detection applications.

1. INTRODUCTION Safety is the key issue for the growth of computer networks and the Internet. Every application that uses a network connection may constitute a weakness in the safety of a computer system. Comprehensive analysis of all ISO/OSI model layers [1] can be an effective protection against the most recent attacks. This task is carried out by dedicated Intrusion Detection Systems (IDS). The use of data mining algorithms [2] is a relatively new approach to intrusion detection. Data mining is a process of automatically unveiling previously unknown and potentially useful patterns or schemes from large data repositories [4]. Such a repository can be network traffic, e.g., IP packets, analysed in real time or intercepted and recorded in logs. Classification is one of the data mining methods commonly used for intrusion detection. In classification, a mapping of data into a set of predefined classes is found. The classification process contains three phases: building a model, testing, and prediction of unknown values. In the first phase a training set of examples is used as input data. An example is a list of descriptive attributes, i.e. descriptors, and a decisive attribute selected to define a class. The classifier ∗

Department of Systems and Computer Networks, Wroclaw University of Technology, Poland.



is the result; based on the values of the descriptors, it assigns a value of the decisive attribute to each new example. Testing means using the classifier to classify known data and evaluating its performance, whereas prediction involves using the classifier in practice. The system can be used in practice when the percentage of correctly classified packets, i.e. the classification accuracy, is satisfactory. When the classification accuracy is too low, an attempt can be made to create the classifier with different training datasets. The purpose of this work is a software implementation of a classifying algorithm and an evaluation of its accuracy for the detection of network intrusions. 2. TRAINING DATA AND TEST DATA In the paper, we use the existing dataset prepared for the KDD-CUP 99 data mining competition [3]. The set was intended especially for training and testing IDS systems that utilize data mining. The data were divided into a training portion with decisive attributes defined, and a test portion with decisive attributes used to evaluate the accuracy of the classifier. The data contain both common network traffic and typical attacks. The convention accepted by KDD-CUP assumes that a single text line (example) is a set of attributes that describe one packet. Each packet has a decisive attribute, whose standard value is "normal", while in the case of an attack it is described by the name of the attack. An example of a single packet from the dataset is shown below: 0,tcp,http,SF,334,1684,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,9,0.00,0.00,0.00,0.00,1.00,0.00,0.33,0,0,0.00,0.00,0.00,0.00,0.00,0.00, normal. Each comma-separated number (or character string) is a single descriptor – a descriptive attribute. The attributes relate to the protocol, ports, services, amount of transferred data, etc. The specific meaning of each descriptor is not relevant at this point. To simplify the analysis, and to avoid problems with a too small number of examples, we decided to remove classes with fewer than 3000 examples from the training set. In the end we obtained a training set that contained 682 462 examples in 5 classes, and a test set containing 284 614 examples of the same classes. Table 1 and Table 2 show the configuration of the sets. 3. CLASSIFICATION ALGORITHM Having certain knowledge of the possibilities of classification algorithms [2][3][4], we decided to implement Naive Bayes in the research as potentially accurate for IDS applications. Mathematically, the algorithm computes the probability that a certain example belongs to a specified class. Once the probability is known for all classes, the class with



the greatest probability is selected and the example is assigned to that class. To calculate the probability of an example belonging to a certain class, Naive Bayes:
• analyses the training data to calculate the probability of occurrence of each descriptor value of the example in that class,
• assumes ("naively") that the occurrences of the given descriptor values are independent events,
• treats the occurrence of the given example in the class as the joint occurrence of the values of all descriptors from the example,
• uses the independence assumption to calculate the probability that the example belongs to the class: this probability is the product of the probabilities of occurrence of all descriptor values of the example in that class (see the formula below).
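In standard notation (ours, not used in the paper), for an example with descriptor values x_1, ..., x_m the selected class is

\hat{C} = \arg\max_{C} \prod_{i=1}^{m} P(x_i \mid C),

where P(x_i | C) is the probability of the i-th descriptor value occurring in class C; many implementations additionally multiply by the class prior P(C), which the description above omits.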

Table 1. Configuration of the training set.

CLASS        NUMBER OF EXAMPLES
"ipsweep"              7579
"neptune"            204815
"normal"             250000
"satan"                5389
"smurf"              212363

Table 2. Configuration of the test set.

CLASS        NUMBER OF EXAMPLES
"ipsweep"               300
"neptune"             58001
"normal"              60593
"satan"                1630
"smurf"              164090
If a given descriptor takes nominal values, the probability is defined by the frequency of occurrence of the given descriptor value in the class. For example, if the training data contain 100 examples of the "normal" class, 50 of which have the value "tcp" for the given descriptor, then the occurrence probability for this descriptor in the "normal" class is 50/100 = ½. However, if a given descriptor can take values from a continuous domain, the probability is calculated with a normal distribution (a value derived from the Gauss function). The parameters of the normal distribution – the mean value and the standard deviation – must be calculated earlier, based on the training data, for each numerical descriptor of a given class. 3. IMPLEMENTATION The Naive Bayes algorithm is implemented in C++ and operates in the Windows Vista/XP operating system environment. The classification throughput decreases as the training set grows – for the biggest set (682 462 examples), approximately 4000 examples per hour can be processed.
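A minimal sketch of the scoring step described above (our illustration, independent of the authors' C++ implementation; the per-class statistics are assumed to have been computed from the training set, and the small smoothing constants for unseen values are our own assumption):

#include <cmath>
#include <map>
#include <string>
#include <utility>

const double kPi = 3.141592653589793;

struct ClassModel {
    // nominal descriptor index -> (value -> relative frequency in this class)
    std::map<int, std::map<std::string, double>> freq;
    // numerical descriptor index -> (mean, standard deviation) in this class
    std::map<int, std::pair<double, double>> gauss;
};

// Gaussian density used for descriptors with continuous values.
double normalPdf(double x, double mean, double stdDev) {
    const double z = (x - mean) / stdDev;
    return std::exp(-0.5 * z * z) / (stdDev * std::sqrt(2.0 * kPi));
}

// Naive Bayes score of one example for one class: the product of the
// per-descriptor probabilities, accumulated in log space to avoid underflow.
double classScore(const ClassModel& m,
                  const std::map<int, std::string>& nominal,
                  const std::map<int, double>& numerical) {
    double logScore = 0.0;
    for (const auto& [idx, value] : nominal) {
        const auto& table = m.freq.at(idx);
        auto it = table.find(value);
        double p = (it != table.end()) ? it->second : 1e-9;  // unseen value
        logScore += std::log(p);
    }
    for (const auto& [idx, x] : numerical) {
        const auto& [mean, sd] = m.gauss.at(idx);
        logScore += std::log(normalPdf(x, mean, sd) + 1e-12);
    }
    return logScore;  // the class with the greatest score is selected
}

Working with log-probabilities is a common practical choice here, since the product of several dozen small probabilities would otherwise underflow to zero.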


4. RESEARCH At the start, we had the whole training and test sets. For such a configuration a single test would last over 71 hours, which was unacceptable given the number of tests to be carried out. Moreover, such a big training set does not necessarily give the best results [4]. The most important purpose of these experiments was to find an optimal training set, i.e. a set with the smallest possible size that produces the greatest classification accuracy. Experiment #1 For the first test series we prepared 5 training sets of different sizes, and the whole set max. Tables 3 and 4 show the configuration of the sets. Table 3. Configuration of the training sets – Experiment #1

SET NAME      NUMBER OF EXAMPLES
tren_100                   500
tren_1000                 5000
tren_2000                10000
tren_5000                25000
tren_10000               50000
max                     682462

Table 4. Configuration of the test file – Experiment #1

CLASS        NUMBER OF EXAMPLES
"ipsweep"               300
"neptune"              1000
"normal"               1000
"satan"                1000
"smurf"                1000

Table 5. Test results – Experiment #1

TRAINING FILE    OVERALL CLASSIFICATION ACCURACY
tren_100                  45 %
tren_1000                 67 %
tren_2000                 76 %
tren_5000                 76.23 %
tren_10000                76.28 %
max                       68.5 %

The classification accuracy increases with the amount of training data, but only up to a certain size; for the whole set it is lower than the maximum. It should be noted, however, that the quantitative ratios of the specific example classes in the "max" set are different, which probably affects the accuracy. The accuracy for the "ipsweep", "normal", "satan" and "smurf" classes was above 90%, and it improves only slightly when bigger training files are used. The classification accuracy for the "neptune" class reached 3% for the smallest set and remained unchanged. Analysis of the results file showed that the classifier mistakes "neptune" examples for the "satan" class.



Experiment #2 We made an attempt to correct the "neptune" class by modifying the tren_5000 set. We changed the number of examples for the "neptune" and "satan" classes. Table 6. Configuration of the training sets – Experiment #2

TRAINING FILE    NUMBER OF EXAMPLES, "neptune"    NUMBER OF EXAMPLES, "satan"
tren_A                      10000                          5000
tren_B                       5000                         10000
tren_C                      10000                         10000
tren_D                      10000                          2500
tren_E                       2500                         10000

Using the abovementioned sets does not bring any improvement – the results are practically the same as those for the unchanged set. The last series of tests investigated the accuracy of the classifier for the whole test file (Table 7). Table 7. Results for the tren_5000 training file and the whole test set

OVERALL RESULT    "ipsweep"    "neptune"    "normal"    "satan"    "smurf"
    85.4 %           98 %        29.8 %      99.26 %     73.8 %     100 %

Even though the numbers are higher, this should not be taken as an improvement, but rather as a more precise examination. These results are the most conclusive evaluation of the classifier that was obtained. Accuracy tests for the modifications of the tren_5000 file do not introduce any significant change. The difficulties in correctly distinguishing the "neptune" and "satan" classes may result from a high correlation of descriptor values in the examples of those classes. 5. CONCLUSIONS AND PERSPECTIVES Classification using our own Naive Bayes implementation gives very promising results. Three out of the five examined classes obtained results close to 100%. We emphasise the excellent results for the "normal" class. This is particularly important, since effectiveness in recognising normal network traffic is a significant determinant of IDS quality. Weaknesses in this field may cause the system to generate false alarms and heavily reduce the practical importance of the solution. Additional tests with large datasets and other classes (types of attacks) are necessary before the suitability of the algorithm for intrusion detection can be finally evaluated. In the future, Naive Bayes can be utilized for operation in a computer network.



REFERENCES
[1] BABBIN J., GRAHAM C., OREBAUGH A., PINKARD B., RASH M., IPS. Zapobieganie i aktywne przeciwdziałanie intruzom. Mikom, 2003.
[2] FRANK E., WITTEN I. H., Data Mining. Practical Machine Learning Tools and Techniques (second edition). Elsevier, 2005.
[3] http://kdd.ics.uci.edu/databases/kddcup99/task.html, October 2007.
[4] http://wazniak.mimuw.edu.pl/index.php?title=Eksploracja_danych, October 2007.



Computer Systems Engineering 2008 Keywords: linear systems, bilinear systems, generalized predictive control

Ivan Zajı́c† Keith J. BURNHAM†

EXTENSION OF GENERALISED PREDICTIVE CONTROL TO HANDLE SISO BILINEAR SYSTEMS

The paper reviews the existing generalized predictive controller and its extension to handle a class of bilinear systems. The investigation also makes a case for the need for bilinear controllers in contrast to the alternative adaptive linear model-based control approach for nonlinear systems. The paper highlights this justification and demonstrates the potential of adopting the bilinear control approach on a particular example. A Monte Carlo simulation is carried out to evaluate and compare the differences in performance between the investigated approaches.

1. INTRODUCTION The generalized predictive control (GPC) algorithm has had a significant impact on recent developments in control, comparable to that of the now widely adopted three-term PID controller when it became a popular choice as an industry standard. It is a popular model-based predictive control (MBPC) method and is being used in industry. The approach was proposed and developed by Clarke et al. during the 1980's, see [2]. The current well-known GPC utilises a linearised model of the real-world nonlinear system such that it allows the future values of the system output to be predicted over a prediction horizon. Linear systems represent a small, but important subset of bilinear systems, and bilinear systems represent an important subset of the wider class of non-linear systems. Many real-world processes can be described using bilinear models. Bilinear systems are characterised by linear behaviour in both state and control when considered separately, with the nonlinearity arising as a product of system state and control [9]. Such processes may be found in areas such as engineering, ecology, medicine and socioeconomics. Thus the adoption of bilinear models represents a significant step †

Control Theory and Applications Centre, Coventry University, Coventry, UK



towards dealing with practical real-world systems. Prompted by the fact that many processes may be more appropriately modelled as bilinear systems, as opposed to linear systems, the development of bilinear MBPC strategies is justified. The use of bilinear GPC (BGPC) should yield better performance over the use of linear GPC when applied to systems for which a bilinear model is more appropriate. The paper considers the extension of linear GPC to handle a class of single-input single-output (SISO) time invariant bilinear systems, and the development of BGPC algorithm. Several approaches for accommodation of the bilinearity within the BGPC scheme are considered [4, 11]. The focus of the paper will also be directed to the comparison of the GPC within the self-tuning framework, i.e. self-tuning GPC (STGPC), where the linear model is updated at each time step, and BGPC. Monte Carlo simulation studies are presented and comparisons are made.

2. GENERALIZED PREDICTIVE CONTROLLER The idea of GPC is to minimise the variance of the future error between the output and set point by predicting the long range output of the system and separating the known contributions to future output from the unknown contributions. In this way a vector of future predicted errors can be used to generate a vector of future incremental controls, where only the current control input is applied to the plant. The aim is to minimise the multi-stage quadratic cost function defined as, see e.g. [1],

J = \sum_{j=H_m}^{H_p} [y(t+j) - r(t+j)]^2 + \sum_{j=1}^{H_c} \lambda [\Delta u(t+j-1)]^2     (1)

with respect to current and future values of the differenced control action \Delta u(t+j-1) over a maximum prediction horizon H_p. The differencing operator is defined such that \Delta = 1 - q^{-1}, where q^{-1} denotes the backward shift operator defined by q^{-1} x(t) = x(t-1) and t is the discrete time index. The minimum prediction horizon is denoted by H_m and is defined such that d ≤ H_m ≤ H_p, where d is a normalised integer valued time delay which relates to the system time delay expressed as an integer multiple of the sampling interval. H_c denotes the maximum control horizon and in general H_c ≤ H_p. During the derivation of the GPC predictive control law H_c = H_p, and the rather practical case of H_c < H_p will be considered at the end. The term y(t+j) denotes the future values of the system output and r(t+j) is a sequence of future reference trajectory, where in [1] a weighted sequence of future reference trajectory is assumed instead. The cost weighting parameter \lambda is the energy constraint.


The first term in the quadratic cost function (1) corresponds to the requirement of tracking the reference signal, hence minimising the sum of future squared tracking errors. The second term in (1) is the weighted sum of the squared differenced control actions, i.e. the energy constraint in order to achieve a realizable control. Note that the future values of the system output are required in (1). The exact values of the future outputs are unknown and only an optimal prediction of these can be obtained utilising the predictor design. In the case of the GPC of Clarke [2], the predictor is based on a locally linearised CARIMA (Controlled Auto-Regressive Integrated Moving-Average) model of the system having the following form

y(t) = \frac{B(q^{-1})}{A(q^{-1})} u(t-d) + \frac{C(q^{-1})}{\Delta A(q^{-1})} e(t),     (2)

where e(t) denotes white normally distributed measurement noise and the polynomials A(q^{-1}), B(q^{-1}) and C(q^{-1}) are defined as follows

A(q^{-1}) = 1 + a_1 q^{-1} + \ldots + a_{n_a} q^{-n_a},     (3a)
B(q^{-1}) = b_0 + b_1 q^{-1} + \ldots + b_{n_b} q^{-n_b},     (3b)
C(q^{-1}) = 1 + c_1 q^{-1} + \ldots + c_{n_c} q^{-n_c}.     (3c)

For simplicity only the case C(q^{-1}) = 1 is considered; the general case is investigated in [3], where the inclusion of the C(q^{-1}) polynomial increases the robustness of the controller through the better filtering of the measured system output. Note that the CARIMA model structure is chosen for convenience since it allows for elegant inclusion of the integral action to the GPC so that zero steady state error is guaranteed. In order to minimise the multi-stage cost function (1) the future values of the system output need to be computed. The model (2) cannot be used directly for computing the future system output since the future values of the noise are unknown, hence a predictor is required. Consider the Diophantine equation

1 = \tilde{A}(q^{-1}) E_j(q^{-1}) + q^{-j} F_j(q^{-1}),     (4)

where j = 1, \ldots, H_p denotes the prediction, the polynomial \tilde{A}(q^{-1}) = \Delta A(q^{-1}) and the polynomials E_j(q^{-1}) and F_j(q^{-1}) are defined as

E_j(q^{-1}) = e_{j,0} + e_{j,1} q^{-1} + \ldots + e_{j,n_{Ej}} q^{-n_{Ej}},     (5)
F_j(q^{-1}) = f_{j,0} + f_{j,1} q^{-1} + \ldots + f_{j,n_{Fj}} q^{-n_{Fj}},     (6)

where n_{Ej} = j - 1 and n_{Fj} = n_{\tilde{a}} - 1. The prediction j = 1, \ldots, H_m - 1, where H_m = d here, corresponds to a so-called simple prediction and is not considered in the
design of the GPC control law. The Diophantine equation can be computed recursively, see e.g. [2, 10]. Multiplying both sides of (2) by \tilde{A}(q^{-1}) E_j(q^{-1}) q^{j} leads to

\tilde{A}(q^{-1}) E_j(q^{-1}) y(t+j) = E_j(q^{-1}) B(q^{-1}) \Delta u(t+j-d) + E_j(q^{-1}) e(t+j).     (7)

Making use of (4) the known (current and past) and unknown (future) values of the system output can be separated, so that

(1 - q^{-j} F_j(q^{-1})) y(t+j) = E_j(q^{-1}) B(q^{-1}) \Delta u(t+j-d) + E_j(q^{-1}) e(t+j).     (8)

Since the best prediction (in the sense of minimising the squared prediction error) of the value of the white noise e(t+j) is null, the predictor of the system output is then given by

\hat{y}(t+j|t) = F_j(q^{-1}) y(t) + E_j(q^{-1}) B(q^{-1}) \Delta u(t+j-d),     (9)

where \hat{y}(t+j|t) denotes the predicted system output based on the information available up to and including time t. Note that the second term in the predictor (9) consists of known (past) values, i.e. j = 1, \ldots, H_m - 1, and unknown (current and future) values, i.e. j = H_m - 1, \ldots, H_p, of the differenced control actions which are yet to be determined. These can be separated by utilising a Diophantine equation having the following form

E_j(q^{-1}) B(q^{-1}) = G_j(q^{-1}) + q^{-j} \bar{G}_j(q^{-1}),     (10)

where the polynomial orders are n_g = j - d and n_{\bar{g}} = d - 2 + n_b, respectively, and j = H_m, \ldots, H_p so that the simple prediction is not assumed. The predictor (9) then takes the form

\hat{y}(t+j|t) = F_j(q^{-1}) y(t) + \bar{G}_j(q^{-1}) \Delta u(t-d) + G_j(q^{-1}) \Delta u(t+j-d).     (11)

The predictor (11) can be rewritten in the more convenient matrix form

\hat{y} = f + G u,     (12)

where the vectors of predicted outputs and control actions are given by

\hat{y} = [\hat{y}(t+d|t), \hat{y}(t+d+1|t), \ldots, \hat{y}(t+H_p|t)]^T     (13)

u = [\Delta u(t), \Delta u(t+1), \ldots, \Delta u(t+H_p-d)]^T,     (14)

and


respectively. The vector of known contributions to \hat{y}, which forms the free response of the system [8], is given by

f = F_j(q^{-1}) y(t) + \bar{G}_j(q^{-1}) \Delta u(t-d)     (15)

and the Toeplitz lower triangular matrix G is defined as

G = \begin{bmatrix} g_0 & 0 & \ldots & 0 \\ g_1 & g_0 & \ldots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ g_{H_p-d} & g_{(H_p-d)-1} & \ldots & g_0 \end{bmatrix},     (16)

where the leading j subscripts on the elements in G are omitted, since the diagonal (main and minor) elements are the same and not dependent on j. The cost function (1) can be written in matrix form as

J_{GPC} = (\hat{y} - r)^T (\hat{y} - r) + u^T \lambda u,     (17)

where the vector of future set points (or reference signal) is defined as

r = [r(t+d), r(t+d+1), \ldots, r(t+H_p)]^T.     (18)

The next step in deriving the GPC algorithm is to differentiate the cost function (17) with respect to u, i.e.

\frac{\partial J}{\partial u} = 2 \left[\frac{\partial (\hat{y}-r)}{\partial u}\right]^T (\hat{y}-r) + 2 \left[\frac{\partial u}{\partial u}\right]^T \lambda u = 2 G^T (\hat{y} - r) + 2 \lambda u.     (19)

Substituting (12) for the vector of the predicted outputs in (19) leads to

\frac{\partial J}{\partial u} = 2 G^T (f + G u - r) + 2 \lambda u = 2 G^T (f - r) + 2 (G^T G + \lambda I) u,     (20)

where I denotes an identity matrix of appropriate dimension. The analytical solution of the cost function minimisation is obtained by setting \partial J / \partial u = 0, hence

G^T (f - r) + (G^T G + \lambda I) u = 0.     (21)

Rearranging the expression (21) to solve for the vector u leads to the GPC algorithm

u = (G^T G + \lambda I)^{-1} G^T [r - f],     (22)

where only the first element of the vector u is applied to the plant, so that

u(t) = u(t-1) + \Delta u(t).     (23)

Throughout the derivation of the GPC algorithm the control horizon has been set such that H_c = H_p. However, the use of H_c < H_p is common in practice, which decreases the computational load. The control horizon is relatively simply implemented by reducing the dimension of the lower triangular matrix G, considering only the first H_c columns of G; the dimension of u is then H_c × 1. The corresponding weighting matrix \lambda I is also required to be suitably truncated. The matrix inversion in (22), for the special case of H_c = 1, reduces to a division by a scalar, which is often used in practice due to ease of computation.
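For the special case H_c = 1 mentioned above, the control law (22) collapses to a scalar expression. A minimal sketch (ours, not the authors' code; the vectors g, f and r are assumed to have been built from the model coefficients as in (15)-(16) and (18)):

#include <cstddef>
#include <vector>

// One GPC step for Hc = 1: delta_u = g^T (r - f) / (g^T g + lambda).
// g: first column of G, f: free response, r: future set points (equal lengths).
double gpcIncrement(const std::vector<double>& g,
                    const std::vector<double>& f,
                    const std::vector<double>& r,
                    double lambda) {
    double num = 0.0;
    double den = lambda;
    for (std::size_t j = 0; j < g.size(); ++j) {
        num += g[j] * (r[j] - f[j]);
        den += g[j] * g[j];
    }
    return num / den;   // delta u(t)
}
// The applied input is then u(t) = u(t-1) + delta u(t), as in (23).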

3. SELF-TUNING GPC CONTROLLER Recognition that real-world nonlinear systems exhibit different behaviour over the operating range, and that locally linearised models are valid only for small regions about a single operating point, has prompted the desire to extend MBPC utilising a fixed linearised model to a self-tuning framework. In the self-tuning concept the linear models upon which the MBPC is based are required to be repeatedly updated as the system is driven over the operational range of interest. The STGPC algorithm has virtually the same structure as the GPC algorithm given in (22). However, note that the matrix G and the vector f in (22) comprise the model coefficients a_i and b_i, which are repeatedly updated, hence these must also be recomputed at each time step. The linear model of the system on which the STGPC is based is commonly estimated utilising the recursive least squares (RLS) estimation technique, see e.g. [7], although any other appropriate on-line estimation technique might be used instead.



4. BILINEAR GPC CONTROLLER The use of bilinear GPC increases the operational range of the controller over the use of the linear model-based GPC when applied to systems for which a bilinear model is more appropriate. It is conjectured that the bilinear controller makes better use of the dynamics of the system itself compared with its linear counterpart. A general single-input single-output bilinear system can be modelled using a nonlinear auto-regressive moving average with exogenous inputs (NARMAX) model representation, i.e.

y(t) = \sum_{i=1}^{n_a} -a_i y(t-i) + \sum_{i=0}^{n_b} b_i u(t-d-i) + \sum_{i=0}^{n_b} \sum_{l=1}^{n_a} \eta_{i,l} y(t-d-i) u(t-d-i-l+1) + \sum_{i=1}^{n_c} c_i e(t-i),     (24)

where the a_i and b_i are assumed to correspond to the linear CARIMA model (2), with the \eta_{i,l} being the discrete bilinear coefficients, which are required to be identified either on-line or off-line along with the a_i and b_i [4]. The predictive control law is based on a bilinear model (24), which for the purpose of obtaining an explicit solution to the multi-stage quadratic cost function (1) is interpreted as a time-step quasi-linear model such that the bilinear coefficients are combined with either the a_i or b_i parameters. The combined parameters are either given by

\tilde{a}_i(t) = a_i - u(t-d-i) \eta(i-1)     (25)

or by

\tilde{b}_i(t) = b_i + y(t-i) \eta(i).     (26)

The decision to accommodate the bilinearity with the a_i or b_i coefficients depends on the particular control situation and, to some extent, user choice. As a consequence of utilising the bilinear (bilinearised) model for the purpose of predicting the system output, the prediction error decreases, hence the BGPC is more effective than the standard GPC. The BGPC algorithm retains the same structure as in the case of GPC (22). However, since the \tilde{a}_i(t) or \tilde{b}_i(t) coefficients are time varying and input or output dependent, respectively, the Toeplitz lower triangular matrix G and the vector f, which comprise these coefficients, are required to be updated at each time step. Naturally, the BGPC can also be implemented in the self-tuning framework, where linearisation at a point is replaced by that of bilinearisation over a range. This gives rise to benefits such that the estimated model parameters vary less over time. In some cases


the BGPC utilising a fixed bilinearised model is sufficient for describing the system, leading to a less complex controller compared with the standard STGPC. In fact, since the \tilde{a}_i(t) or \tilde{b}_i(t) coefficients are time varying, the BGPC can also be interpreted in the self-tuning framework to some extent.
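As a small illustration of the quasi-linear update (25) (our sketch under assumed indexing conventions, not the authors' code; uPast[k] is assumed to hold u(t-k) and eta[i-1] the coefficient eta(i-1)):

#include <cstddef>
#include <vector>

// Combine the bilinear terms with the a_i parameters at each time step:
// a~_i(t) = a_i - u(t - d - i) * eta(i - 1), for i = 1..na.
// Requires uPast.size() > d + a.size() and eta.size() >= a.size().
std::vector<double> combineWithA(const std::vector<double>& a,
                                 const std::vector<double>& eta,
                                 const std::vector<double>& uPast,
                                 int d) {
    std::vector<double> aTilde(a.size());
    for (std::size_t i = 1; i <= a.size(); ++i) {
        aTilde[i - 1] = a[i - 1] - uPast[d + i] * eta[i - 1];
    }
    return aTilde;
}

The matrix G and the vector f are then rebuilt from aTilde (or from the analogous b~ coefficients) before each control computation.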

5. SIMULATION STUDIES The system (plant) is represented by a second order single-input single-output autoregressive with external input (ARX) model having additional Hammerstein and bilinear nonlinearities, which takes the form

y(t) = -1.56 y(t-1) + 0.607 y(t-2) + 0.042 u(t-1) + 0.036 u(t-2) - 0.01 y(t-1) u(t-1) + 0.01 u^2(t-1) + e(t),     (27)

where the noise variance is \sigma_e^2 = 0.002. The coefficient of the bilinear term is \eta_0 = -0.01 and the coefficient of the Hammerstein term is 0.01. The negative bilinear term is indicative of a system with saturation. Similarly structured nonlinear models have been assumed previously for replicating the characteristics of high temperature industrial furnaces, see [6, 5]. The nonlinear system has been chosen to show that bilinear controllers can be used to control nonlinear systems without using the adaptive control approach, leading to the use of less complex and more robust controllers.

5.1. PERFORMANCE CRITERIA

In order to evaluate the control performance of the benchmark GPC and the investigated control schemes, several quality performance criteria are introduced. Firstly, the mean square error (MSE) is defined as

MSE = \frac{1}{M} \sum_{m=1}^{M} \sum_{t=t_0}^{N} \frac{[r(t) - y(t)]^2}{N - t_0},     (28)

where M represents the number of Monte-Carlo runs, N denotes the number of data samples and t_0 denotes the start time of the evaluation. Secondly, the mean square control (MSC) performance criterion, representing a control effort, is defined as

MSC = \frac{1}{M} \sum_{m=1}^{M} \sum_{t=t_0}^{N} \frac{u^2(t)}{N - t_0}.     (29)

The third performance criterion is the activity of the control action, denoted AC, given by

AC = \frac{1}{M} \sum_{m=1}^{M} \sum_{t=t_0}^{N} \frac{|u(t) - u(t-1)|}{N - t_0} \times 100.     (30)
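A minimal sketch of how (28)-(30) can be evaluated for a single run (our illustration, not the authors' code; the averaging over the M Monte-Carlo runs is assumed to be done by the caller):

#include <cmath>
#include <cstddef>
#include <vector>

struct Criteria { double mse, msc, ac; };

// r, y and u are assumed to have the same length N, and t0 >= 1 so that u[t-1] exists.
Criteria evaluateRun(const std::vector<double>& r,   // reference r(t)
                     const std::vector<double>& y,   // output y(t)
                     const std::vector<double>& u,   // control u(t)
                     std::size_t t0) {
    Criteria c{0.0, 0.0, 0.0};
    const std::size_t N = y.size();
    for (std::size_t t = t0; t < N; ++t) {
        c.mse += (r[t] - y[t]) * (r[t] - y[t]);
        c.msc += u[t] * u[t];
        c.ac  += std::fabs(u[t] - u[t - 1]);
    }
    const double span = static_cast<double>(N - t0);
    c.mse /= span;
    c.msc /= span;
    c.ac   = c.ac / span * 100.0;
    return c;
}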

5.2. SYSTEM IDENTIFICATION

Three controllers are investigated and compared, which are namely: GPC, STGPC and BGPC. The GPC is based on a second order linearised ARX model of the system (27) given by y(t) = −1.552y(t − 1) + 0.600y(t − 2) + 0.043u(t − 1) + 0.037u(t − 2).

(31)

The linearised model (31) of the system has been estimated off-line using linear least squares (LLS) estimation technique, see e.g. [7], applied to recorded data obtained when the system was simulated in an open-loop setting spanning the expected working points in the operational range. Note the difference between the identified linearised ARX model (31) and CARIMA model (2) used for GPC derivation. This incongruity arises from the estimation technique used and may worsen controller performance. The BGPC is based on the bilinearised model of the system, which is given by y(t) = − 1.552y(t − 1) + 0.600y(t − 2) + 0.043u(t − 1) + 0.037u(t − 2) − 0.006y(t − 1)u(t − 1).

(32)

This has been similarly obtained using LLS as described for the linearised model (31). The STGPC is based on the linear second order ARX model having na = 2, nb = 2 and unity time delay, where the model parameters are estimated on-line utilising the RLS method. 5.3. MONTE CARLO SIMULATION STUDY

A Monte-Carlo simulation study with M = 100 runs, N = 200 samples and t_0 = 30 is performed. For all three controllers the tuning parameters are the same: H_p = 5, H_c = 1 and the cost weighting parameter λ = 0.1. The BGPC is based on the model (32), where the bilinearity is combined with the a_i parameters (25). The system is subjected to a reference signal switching between ±1 with a period of 50 samples. The results are given in Table 1, where the mean values of MSE, MSC and AC for each controller are presented along with the benchmark comparison expressed in normalised form with respect to the GPC (whose indices are all normalised to 100%). The results of a single simulation for a particular noise realisation corresponding to the benchmark


GPC and the STGPC are shown in Figure 1 and the benchmark GPC and the BGPC are shown in Figure 2. Table 1. Results of a numerical simulation along with a benchmark comparison between the GPC controller (where its values represent 100%) and the two investigated control schemes.

Controller      MSE       MSC       AC        MSE [%]    MSC [%]    AC [%]
GPC            0.0586    1.0021    0.1262     100.00     100.00     100.00
STGPC          0.0564    0.9361    0.1276      96.136     93.415    101.09
BGPC           0.0523    0.8134    0.1161      89.271     81.168     92.025

[Figure 1 shows two panels: the output signal y(t) of the GPC and STGPC controllers against the reference r(t), and the corresponding control action u(t), plotted over samples 40-200.]
Fig. 1. Simulation of GPC and STGPC controller

5.4. OBSERVATIONS

The results given in Table 1 show the superior performance of the BGPC over the standard GPC for this particular case. The tracking ability improves by 11% and the


[Figure 2 shows two panels: the output signal y(t) of the GPC and BGPC controllers against the reference r(t), and the corresponding control action u(t), plotted over samples 40-200.]
Fig. 2. Simulation of GPC and BGPC controller

control effort decreases by 19%. The STGPC provides a moderate improvement over the GPC. It is noted, however, that for a lower demand on the tracking accuracy (slow control), e.g. H_p = 10, H_c = 1 and λ = 0.2, the three investigated controllers perform in an almost indistinguishable manner. It is anticipated here that the high control activity of the STGPC is caused by the interaction between the parameter estimation and the control part of the controller algorithm, as well as by the noise sensitivity of the parameter estimator.

6. CONCLUSIONS The results obtained highlight the benefits of adopting a bilinear MBPC approach over standard linear MBPC approaches. The BGPC is able to achieve its objective through effective automatic gain scheduling via the nonlinear (bilinear) controller model structure, so that the complexity of the controller decreases compared with the self-tuning schemes. It is postulated here that in cases where fast and tight control


is required, the bilinear controller makes better use of the system dynamics compared with its linear counterpart. It is conjectured that, in the case when the set point is required to change over a wide operational range, and/or where the system may change over time, a self-tuning form of the BGPC should be beneficial.

REFERENCES
[1] CAMACHO E. F. and BORDONS C., Model Predictive Control. Springer-Verlag, London, 1998.
[2] CLARKE D. W., MOHTADI C. and TUFFS P. C., Generalized Predictive Control: Parts I and II. Automatica, vol. 23, 1987, pp. 137–160.
[3] CLARKE D. W. and MOHTADI C., Properties of generalized predictive control. Automatica, vol. 25, 1989, pp. 859–875.
[4] DUNOYER A., Bilinear self-tuning control and bilinearisation with application to nonlinear industrial systems. PhD thesis, Coventry University, UK, 1996.
[5] DUNOYER A., BURNHAM K. J. and MCALPINE T. S., Self-tuning control of an industrial pilot-scale reheating furnace: Design principles and application of a bilinear approach. Proceedings of the IEE Control Theory and Applications, vol. 144(1), 1997, pp. 25–31.
[6] GOOTHART S. G., BURNHAM K. J. and JAMES D. J. G., Bilinear self-tuning control of a high temperature heat treatment plant. Proceedings of the IEE Control Theory and Applications, vol. 141(1), 1994, pp. 12–18.
[7] LJUNG L., System Identification – Theory for the User. Prentice Hall PTR, New Jersey, 1999.
[8] MACIEJOWSKI J. M., Predictive Control with Constraints. Pearson Education Limited, Edinburgh Gate, 2002.
[9] MOHLER R. R., Bilinear Control Processes: with Applications to Engineering, Ecology and Medicine. Academic Press Inc., Orlando, FL, USA, 1974.
[10] NAJIM K. and IKONEN E., Advanced Process Identification and Control. Marcel Dekker, Inc., 2002.
[11] VINSONNEAU B., Development of errors-in-variables filtering and identification techniques: towards nonlinear models for real-world systems incorporating a priori knowledge. PhD thesis, Coventry University, UK, 2007.



Computer Systems Engineering 2009 Keywords: scheduling, learning effect, heuristic

Tomasz CZYŻ∗ Radosław RUDEK†

SCHEDULING JOBS ON AN ADAPTIVE PROCESSOR

This paper is devoted to a scheduling problem where the efficiency of a processor increases due to its learning. Such problems model real-life settings that occur in the presence of human learning (industry, manufacturing, management). However, the growing number of significant achievements in the field of artificial intelligence and machine learning suggests that human-like learning will also be present in mechanized industrial processes that are controlled or performed by machines, as well as in multi-agent computer systems. Therefore, the optimization algorithms dedicated in this paper to scheduling problems with learning are not only an answer to present day scheduling problems (where a human plays an important role), but also a step towards the improvement of the self-learning and adapting systems that will undeniably appear in the near future.

1. INTRODUCTION The classical scheduling problems assume that job parameters such as processing times are constant. However, in many industrial and even multi-agent systems, the efficiency of a machine or a processor increases due to learning. Therefore, to solve efficiently the scheduling problems that occur in such environments, it is required to model this phenomenon and design algorithms on this basis. In the scientific literature there are two main approaches to modelling the learning effect in the context of scheduling theory. The first one assumes that the processing time of each job is described by a non-increasing function dependent on the number of performed products [1]. It follows from many observations and analyses carried out in economy and industry during the last few decades (see [5], [6], [10], [11]). The second approach to modelling the learning phenomenon in scheduling problems is based on the observation that the more time a human spends on performing a job, the more he learns. Therefore, the job processing time is a non-increasing function of the sum of the normal processing times of previously ∗ †

Wrocław University of Technology, Poland. Wrocław University of Economics, Poland.



performed jobs (see [8]). Whereas the first approach better models problems where production is dominated by machines and human activity is limited (e.g., to setting up a machine), the second approach better models human learning. For a survey see [2]. In this paper, we analyse the total weighted completion time scheduling problem, where the learning effect (i.e., the job processing time) is modelled according to the second approach. Since the computational complexity of the considered problem has not been determined, we propose approximation algorithms that are based on metaheuristic methods, namely simulated annealing [7] and tabu search [3]. The remainder of this paper is organized as follows. The problem formulation is presented in the next section. The description of the proposed algorithms and the numerical verification of their efficiency are given subsequently. The last section concludes the paper.

2. PROBLEM FORMULATION There is given a single processor and a set J = {1, . . . , n} of n independent and non-preemptive jobs (e.g., products, packets, calculations) that are available for processing at time 0. The processor can perform one job at a time, and there are no precedence constraints between jobs. Each job j is described by its weight w_j and by the processing time p_j(v) of job j if it is scheduled as the v-th in a sequence; this parameter models the learning effect. If π = (π(1), ..., π(i), ..., π(n)) denotes the sequence of jobs (a permutation of the elements of the set J), where π(i) is the job processed in position i in this sequence, then the processing time p_{π(i)}(i) of a job scheduled in the i-th position in the sequence π is described by the following non-increasing function:

p_{π(i)}(i) = a_{π(i),1} if g_{π(i),0} = 0 ≤ e(i) < g_{π(i),1}; a_{π(i),2} if g_{π(i),1} ≤ e(i) < g_{π(i),2}; ...; a_{π(i),k} if g_{π(i),k-1} ≤ e(i) < g_{π(i),k},

where a_{π(i),l} is the processing time of job π(i) if the sum of the normal processing times of the previous jobs, e(i) = \sum_{l=1}^{i-1} a_{π(l),1}, satisfies g_{π(i),l-1} ≤ e(i) < g_{π(i),l}, and g_{π(i),l} is the l-th threshold of job π(i). For the given schedule π, we can determine the completion time C_{π(i)} of a job placed



in the i-th position in π as follows:

C_{π(i)} = \sum_{l=1}^{i} p_{π(l)}(l),     (1)

where C_{π(0)} = 0. The objective is to find such a schedule π of jobs on the adaptive processor which minimizes the total weighted completion time criterion:

TWC(π) = \sum_{i=1}^{n} w_{π(i)} C_{π(i)}.     (2)

To denote the problem, we use the three-field notation scheme X | Y | Z (see [4]), where X describes the processors, Y contains the job characteristics and Z is the criterion. The considered problem will be denoted as 1|LE| \sum w_j C_j.

3. ALGORITHMS The considered problem seems to be NP-hard. Based on this observation, we use metaheuristic algorithms to solve it. Before we describe the algorithms, we present an approach to calculate job processing times in O(1). Note that for each possible value of e(i) ∈ [0, \sum_{j=1}^{n} a_{j,1}], we can calculate the corresponding job processing times a_{j,l} only once for the given instance, and store them in the following array (Table 1).

Table 1. Array of job processing time values

job \ e(i)     0     1     · · ·     \sum_{i=1}^{n} a_{π(i),1}
1
2
...
n
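A possible way to build this array (our sketch, not the authors' code; a[j] is assumed to hold the plateau values a_{j,1}, ..., a_{j,k} and g[j] the thresholds g_{j,1}, ..., g_{j,k-1} in increasing order):

#include <cstddef>
#include <vector>

// p[j][e] = processing time of job j when the sum of normal processing times
// of previously completed jobs equals e, for e = 0 .. sumNormal.
std::vector<std::vector<int>> buildLookup(const std::vector<std::vector<int>>& a,
                                          const std::vector<std::vector<int>>& g,
                                          int sumNormal) {
    const std::size_t n = a.size();
    std::vector<std::vector<int>> p(n, std::vector<int>(sumNormal + 1));
    for (std::size_t j = 0; j < n; ++j) {
        std::size_t level = 0;                       // current plateau of job j
        for (int e = 0; e <= sumNormal; ++e) {
            while (level + 1 < a[j].size() && e >= g[j][level]) ++level;
            p[j][e] = a[j][level];
        }
    }
    return p;
}
// A schedule evaluation then reads p[job][e] in O(1) per position.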

Hence, the job processing times are calculated in O(1). On this basis, we propose two metaheuristic algorithms: simulated annealing (SA) and tabu search (TS).

3.1. SIMULATED ANNEALING

The general idea of the implemented simulated annealing algorithm (SA), following [7], is given below. The algorithm starts from an initial solution and, based on the current solution π, it chooses (in each iteration) the next solution π′ by swapping two randomly chosen


jobs. The new solution replaces the current solution with the following probability ( ) P (π, π ′ , T ) = min 1, exp T W C(π ′ ) − T W C(π ′ ) /T , where T is the current temperature. The initial value of the temperature is T0 and it decreases till it reaches the stop temperature TN . When the temperature T reaches TN it is reset to T0 and the temperature decreasing process starts aging. Two temperature T and logarithmic T = λT , decreasing models are considered: geometrical T = 1+λT where λ is the temperature step. During the initial experiments geometrical model was rejected because of great relative errors comparing to the logarithmic decreasing. The algorithm stops after the given number of iterations N . Therefore, the complexity of SA is O(nN ). The formal description of SA is given below. Algorithm 1 SA 1: T = T0 , π = π ∗ = πinitial , T W C ∗ = T W C(πinitial ) 2: F O R i = 1 T O N 3: C H O O S E π′ B Y A R A N D O M S W A P O F T W O J O B S I N π 4: C A L C U L A T E T W C(π) A N D T W C(π ′ ) A C C O R D I N G T O (2) 5: A S S I G N π = π′ W I T H P R O B A B I L I T Y ) ( ′ W C(π) P (T, π ′ , π) = min 1, exp − T W C(π )−T T 6: I F T W C(π) < T W C ∗ T H E N π ∗ = π A N D T W C ∗ = T W C(π) T 7: T = 1+λT 8: I F T ≤ TN T H E N T = T0 9: T H E P E R M U T A T I O N π ∗ I S T H E G I V E N S O L U T I O N

3.2.

TABU SEARCH

The proposed algorithm is based on the tabu search [3]. Its computational complexity is O(n3 N ), where N is the number of iterations. The algorithm uses local search with a short term memory, called tabu list, that stores forbidden moves. In the implemented algorithm move is defined as the swap of two jobs. The applied tabu list stores pairs of forbidden moves or permutations. If the move or permutation is in the tabu list then it is forbidden and not considered further. The tabu list is organized as FIFO (First In First Out), thereby, if the list is full then the new move or permutation is added at its beginning. The size of tabu list is denoted as |T abuList|. We also use a random diversification that 203


chooses a random solution a counter reach the value of a diversification parameter D. The counter is increased when the next move gives worse criterion than the current, and decreased when the new is better. The formal description of TS is given below. Algorithm 2 TS 1: T abuList = ∅, π = πbest = π ∗ = πinitial , counter = 0 T W Cprevious = T W Cbest = T W C ∗ = T W C(π ∗ ) 2: F O R i = 1 T O N FO R j = n T O 1 3: 4: FO R v = n T O 1 5: π ′ = π, S W A P π ′ (j) A N D π ′ (v) I N π ′ 6: I F j 6= v A N D T W C(π ′ ) < T W Cbest 7: I F (j, v) I S N O T I N T abuList 8: πbest = π ′ , T W Cbest = T W C(π ′ ), jbest = j, vbest = v 9: A S S I G N π = πbest 10: A D D (jbest , vbest ) T O T abuList 11: I F T W Cbest < T W C ∗ 12: π ∗ = πbest , T W C ∗ = T W Cbest 12: I F T W Cbest < T W Cprevious T H E N counter = counter − 1 13: EL S E counter = counter + 1 12: I F counter == D T H E N C H O O S E π R A N D O M L Y 12: T W Cprevious = T W Cbest 13: T H E P E R M U T A T I O N π ∗ I S T H E B E S T F O U N D S O L U T I O N

3.3.

EXPERIMENT

In this section, we will verify numerically the efficiency of the proposed algorithms. For each n ∈ {10, 25, 50, 75, 100} parameters of jobs were generated from the uniform distribution in the following ranges: w, k ∈ [1, . . . , 10], aπ(i),1 ∈P[1, . . . , 500], aπ(i),k ∈ [1, . . . , 10] and aπ(i),l > aπ(i),j for l < j and gπ(i),k ∈ [1 · · · nx=1 aπ(i),1 ], gπ(i),0 = 0 and gπ(i),l > gπ(i),j for l < j. Algorithms were tested for 100 instances for each n. After testing several combinations of parameters the following parameters of the al204


algorithms were chosen (see Table 2 and Table 3). The initial solution for each algorithm was a random permutation.

Table 2. Simulated Annealing variants

                                SA1              SA2
start temperature T_0      50 000 000    1 000 000 000
stop temperature T_N            0.001            0.001
temperature step λ              0.999            0.999
iterations N                   10 000           10 000

Table 3. Tabu Search variants

                                   TS1       TS2
tabu list block type               move      permutation
tabu list length |TabuList|        20        20
diversification parameter D        6         10
iterations N                       100       100

Each algorithm was evaluated according to the relative error δ = (TWC_A - TWC_min)/TWC_min, where TWC_A is the criterion value provided by algorithm A ∈ {SA1, SA2, TS1, TS2} and TWC_min is the best criterion value found for the given instance among the tested algorithms (for n = 10 it is the optimal solution provided by an exhaustive search algorithm). The minimum δ_min, mean δ and maximum δ_max relative errors and the mean running times t are presented in Table 4. The proposed algorithms have short running times and provide solutions with low relative errors. However, SA is more efficient for the considered problem: it provides results with lower mean and maximum relative errors in a shorter time than TS. On the other hand, TS can give better results, but the computational effort would be much greater than for SA.

4. CONCLUSIONS In this paper, the single processor scheduling problem with the learning effect was considered. Since the problem seems to be NP-hard, we proposed two metaheuristic algorithms that are based on the simulated annealing and tabu search methods. The experiments showed that they provide results with low relative errors in a short time. Thus, they can be applied to the considered problem. Our future work will be devoted to


Table 4. The minimum δ_min, mean δ and maximum δ_max relative errors and mean running times t of the algorithms

            SA1                          SA2                          TS1                           TS2
  n    t[s]  δ[%] δmax[%] δmin[%]   t[s]  δ[%] δmax[%] δmin[%]   t[s]   δ[%] δmax[%] δmin[%]   t[s]   δ[%] δmax[%] δmin[%]
 10    0.14  0.00   0.00    0.00    0.14  0.00   0.00    0.00     0.01  0.45  16.87    0.00     0.01  0.22   8.42    0.00
 25    0.51  0.07   1.31    0.00    0.50  0.04   1.26    0.00     0.28  0.60  11.37    0.00     0.29  0.82   7.48    0.00
 50    1.90  0.25   1.37    0.00    1.87  0.20   2.13    0.00     4.43  1.68  13.02    0.00     4.50  1.61  13.64    0.00
 75    4.90  0.64   3.75    0.00    4.87  0.70   4.16    0.00    26.43  1.83  15.08    0.00    26.72  1.84   9.85    0.00
100    8.70  0.64   2.64    0.00    8.47  0.92   3.08    0.00    83.35  3.43  13.82    0.00    83.51  2.38  18.72    0.00

minimizing the computational complexity of tabu search by decreasing the complexity of the neighborhood search.

REFERENCES
[1] BISKUP D., Single-machine scheduling with learning considerations. European Journal of Operational Research, vol.115, pp.173–178, 1999.
[2] BISKUP D., A state-of-the-art review on scheduling with learning effects. European Journal of Operational Research, vol.188, pp.315–329, 2008.
[3] GLOVER F., Tabu Search – Part I. ORSA Journal on Computing, vol.1, no.3, pp.190–206, 1989.
[4] GRAHAM R. L., LAWLER E. L., LENSTRA J. K. and RINNOOY KAN A. H. G., Optimization and approximation in deterministic sequencing and scheduling: a survey. Annals of Discrete Mathematics, vol.5, pp.287–326, 1979.
[5] JABER Y. M. and BONNEY M., The economic manufacture/order quantity (EMQ/EOQ) and the learning curve: Past, present, and future. International Journal of Production Economics, vol.59, pp.93–102, 1999.
[6] KERZNER H., Project management: a system approach to planning, scheduling, and controlling. John Wiley & Sons, Inc., New York, 1998.
[7] KIRKPATRICK S., GELATT C. D., and VECCHI M. P., Optimization by simulated annealing. Science, vol.220, pp.671–680, 1983.
[8] KUO W.H. and YANG D.L., Single-machine group scheduling with a time-dependent learning effect. Computers & Operations Research, vol.33, pp.2099–2112, 2006.
[9] NAWAZ M., ENSCORE JR E. E., and HAM I. A., A heuristic algorithm for the m-machine, n-job flow-shop sequencing problem. OMEGA International Journal of Management Science, vol.11, pp.91–95, 1983.
[10] WRIGHT T. P., Factors affecting the cost of airplanes. Journal of Aeronautical Sciences, vol.3, pp.122–128, 1936.
[11] YELLE L. E., The learning curve: historical review and comprehensive study. Decision Science, vol.10, pp.302–328, 1979.



Computer Systems Engineering 2009 Keywords:metaheuristics, task allocation, simulation, efficiency, Tabu Search, Simulated Annealing

Wojciech KMIECIK* Marek WÓJCIKOWSKI* Andrzej KASPRZAK* Leszek KOSZAŁKA*

TASK ALLOCATION IN MESH CONNECTED PROCESSORS USING LOCAL SEARCH METAHEURISTIC ALGORITHM This article contains a short analysis of applying three metaheuristic local search algorithms to solve the problem of allocating two-dimensional tasks on a two-dimensional processor mesh. The primary goal is to maximize the level of mesh utilization. To achieve this task we adapted three algorithms: Tabu Search, Simulated Annealing and Random Search, as well as created an auxiliary algorithm Dumb Fit and adapted another auxiliary algorithm First Fit. To measure the algorithms’ efficiency we introduced two evaluating criteria called Cumulative Effectiveness and Utilization Factor. Finally, we implemented an experimentation system to test these algorithms on different sets of tasks to allocate.

1.

INTRODUCTION

Recently, processing with many parallel units has gained in popularity. Parallel processing is applied in various environments, ranging from multimedia home devices to very complex machine clusters used in research institutions. In all these cases, efficiency depends on wise task allocation [1], enabling the user to utilize the power of a highly parallel system. Research has shown that, in most cases, parallel processing units deliver only a fraction of their theoretical computing power [2] (which is a multiple of the potential of a single unit used in the system). One of the reasons for this is the high complexity of task allocation on parallel units. Metaheuristic algorithms have been invented to solve a subset of problems for which finding an optimal solution is impossible or far too complex for contemporary computers. Algorithms like Tabu Search [3], invented by Fred Glover, or Simulated

Department of Systems and Computer Networks, Wroclaw University of Technology, Poland.

208


Annealing [4], [5], [6] by S. Kirkpatrick are among the most popular. They are capable of finding near-optimum solutions for a very wide range of problems in a time incomparably shorter than the time that it would take to find the best solution [7]. It was decided to adapt three algorithms for solving the allocation problem: Tabu Search, Simulated Annealing and a simplified local search metaheuristic – Random Search used for comparison. In our approach, for comparison purposes, we use also an existing solution for task allocation on processor meshes – the First Fit algorithm. We also designed a modification of First Fit, called Dumb Fit which fits better for this role. First Fit is also used to generate results that we use as a reference when examining the efficiency of the main three algorithms. We propose a new evaluating function called Cumulative Effectiveness. The function and its derivative called Utilization Factor are further explained in following sections of the article. To examine our solutions’ efficiency in different conditions (mesh sizes, task sizes, task processing times etc.) we implemented an experimentation system. Next sections of the article contain what follows: Section II specifies the problem to be solved, Section III describes used algorithms and their roles, Section IV describes the experimentation system. Section V contains an analysis of results of series of experiments on three task classes: small tasks, mixed tasks and large tasks. Finally, Section VI contains conclusions and sums up the article. 2.

PROBLEM STATEMENT

A. Definitions

1. A node is the most basic element which represents a processor in a processor mesh. It is a unit of the dimensions of a mesh, submesh or task. A node can be busy or free. 2. A processor mesh, which thereafter will be simply referred to as ‘mesh’, is a 2-D rectangular structure of nodes distributed regularly on a grid. It can be denoted as M (w, h, t), where w and h are the width and height of the mesh and t is the time of mesh’s life. The value of t may be zero or non-zero. A zero value means that the mesh will be active until the last task from the queue is processed. This value also determines the choice of evaluating function, which will be further explained later in this article. 3. A position (x, y) within a mesh M refers to the node positioned in the column x and row y of the mesh, counting from left to right and top to bottom, starting with 1. 4. A submesh S is a rectangular segment of a mesh M, i.e. a group of nodes, defined in a certain moment of time, denoted as S(a, b, e, j) with its top left node in the position (a, b) in the mesh M, and of width e and height j. This entity, has only symbolic value; it is used in this article to describe various conditions but is not anyhow separately 209


implemented in the software product. If a submesh is occupied, it means that all its nodes are busy. A mesh in a certain moment of time – M(w, h, t1), can be depicted as a matrix of integers, where each number corresponds to a node. Zero can be denoted as a dot (.) and it means a free node. Non-zero numbers (same for a submesh processing one allocated task) indicate a busy node, their value is the time left to process the task. Such depiction is portrayed in fig. 1. There, we can see four various tasks allocated on a small mesh.

Fig.1. A sample depiction of a mesh with 4 allocated tasks in a moment of time

5. Tasks, denoted T(p, q, s), are stored in a list. The entire content of the list is known before the allocation. Tasks are taken from the list and allocated on a mesh. There, they occupy a submesh S of width p height q for s units of time (thus s is their processing time). B. Evaluating functions and lifetime of mesh
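For illustration, the depiction described in the definitions above can be held directly as a matrix of remaining processing times, with zero marking a free node (our sketch, not the authors' data structure; the fits() helper is a hypothetical extra):

#include <vector>

// Mesh state as in Fig. 1: cell value 0 = free node, value k > 0 = node busy
// for k more time units by the task allocated there.
struct MeshState {
    int w, h;
    std::vector<std::vector<int>> node;   // node[y][x]
    MeshState(int width, int height)
        : w(width), h(height), node(height, std::vector<int>(width, 0)) {}

    // Check whether a p x q task fits with its top-left node at (x, y), 1-based.
    bool fits(int x, int y, int p, int q) const {
        if (x + p - 1 > w || y + q - 1 > h) return false;
        for (int dy = 0; dy < q; ++dy)
            for (int dx = 0; dx < p; ++dx)
                if (node[y - 1 + dy][x - 1 + dx] != 0) return false;
        return true;
    }
};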

The main evaluating function proposed in this paper is Cumulative Effectiveness (CE) and is given in Equation (1). It is used when there is a non-zero time of life defined for a mesh. Knowing it and the parameters of the used mesh we can count a more self-descriptive factor, i.e. the Usage Factor given in Equation (2). In (1) pi, qi and si denote width, height and processing time, respectively, of the i-th of n processed tasks. In (2) w, h, t denote width, height and time of life, respectively, of the used mesh. n

CE = ∑ ( p i ⋅ q i ⋅ si )

(1)

CE ⋅ 100% w⋅h ⋅t

(2)

i =1

U =

A task, as well as a mesh can be treated as 3-D entities when we assume that time is the third dimension. Then CE function is the cumulative volume of all allocated tasks and U is the percentage of mesh’s volume used by the processed tasks. It allows us to

210


easily determine how much of the mesh’s potential was “wasted” and how much was utilized. The creation of the CE function and derivative U factor is based on assumption that a company using a processor mesh has a set of tasks to process on their equipment, which exceeds the number of tasks possible to process in one atomic period of time (mesh’s lifetime, e.g. a day), in the beginning of which a single allocation process is conducted. In such case it is essential to utilize as much of the mesh’s power as possible. However, there is also another approach in which it is assumed that the time of life of the mesh is unlimited. In such case it is desired to process all tasks in the list as soon as possible. In such case mesh’s lifetime is set to zero (which here means infinity). Then the evaluating function is the Time of Completion given in Equation (3). In (3) tfin denotes the moment of time, since the start of processing, when the last of all tasks has been processed.

T = t fin

(3)

This factor can only be used for comparing algorithms, not for objectively evaluating their efficiency. The main advantage of T factor is shorter simulation time. Nevertheless, the Usage Factor is recommended, because of its objectiveness, and is mainly used in our research, described in further sections of this article. 3.

ALGORITHMS

A. General information

In our simulation software, we implemented 3 main metaheuristic local search algorithms: SA – Simulated Annealing explained in [4] [5] [6] [7], TS – Tabu Search explained in [3] [7], RS – Random Search (not to be confused with simple evaluating a random solution), explained in [7]. All of them work for a number of iterations. In each iteration they operate on a single solution and its neighbourhood and evaluate the results. A solution is defined here as a permutation of tasks to be allocated, stored in a list. Such permutation is evaluated by performing a simulation, using one of two atomic algorithms (First Fit and Dumb Fit) and, based on simulation result, computing one of the evaluation functions, explained above. There are also various kinds of neighbourhood to be explored by the algorithms. We implemented two of them: insert and swap. In case of the first one, a neighbouring solution is found by taking one element of the permutation and putting it in some other position. In case of swap, two elements are taken and their positions are swapped (hence the name). Performance of each of the three main algorithms highly depends on 211


the instance of the problem (mesh’s and task’s dimensions and life/processing times) as well as on algorithms’ specific parameters, and the atomic algorithms used.

Fig. 2. Block diagrams for the Dumb Fit and First Fit algorithms respectively

212


B. Random Search (RS)

RS is the simplest local search algorithm. The algorithm starts from a solution and in each iteration, it finds and evaluates a new solution from the neighbourhood of the current one. In the next iteration, the new solution becomes the current one and the process continues. In RS there are no additional parameters except for the number of iterations. This algorithm is highly resistant to local minima. C. Simulated Annealing (SA)

The main parameters of SA are the starting and ending temperatures. During the course of its operation the temperature drops (logarithmically or geometrically). In each iteration a random solution from the neighbourhood of the current one is generated and evaluated. When the temperature is high there is a high probability of accepting the new solution as the current one, even if it is worse. When the temperature is low only those solutions are accepted which are better than the current one. Such an approach makes the algorithm resistant to local minima in the beginning and, towards the end, lets it improve the current solution by descending to the nearest minimum. A sketch of this acceptance rule is given below.
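The following Python sketch (ours, illustrative only; the parameter defaults are hypothetical and the evaluation and neighbourhood functions are assumed to be supplied by the caller) shows the core of an SA run with a geometric temperature profile, written for a minimisation objective such as T:

```python
import math, random

# Illustrative SA core (ours): geometric cooling and the acceptance rule
# described in the text, for a minimisation objective.
def simulated_annealing(initial, evaluate, neighbour,
                        t_start=300.0, t_end=0.01, iterations=20000):
    current, current_cost = initial, evaluate(initial)
    best, best_cost = current, current_cost
    alpha = (t_end / t_start) ** (1.0 / iterations)   # geometric profile
    temp = t_start
    for _ in range(iterations):
        cand = neighbour(current)
        cand_cost = evaluate(cand)
        delta = cand_cost - current_cost
        # Always accept improvements; accept worse solutions with a
        # probability that shrinks as the temperature drops.
        if delta <= 0 or random.random() < math.exp(-delta / temp):
            current, current_cost = cand, cand_cost
            if current_cost < best_cost:
                best, best_cost = current, current_cost
        temp *= alpha
    return best, best_cost
```

When maximising the Usage Factor instead, the sign of delta is simply reversed (or the negated U value is used as the cost).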

D. Tabu Search

Our implementation of the TS algorithm is similar to SA with a low temperature, except that it does not accept a new solution as the current one if that solution is found in the taboo list. Whenever a new current solution is set, it is added to the taboo list. The taboo list has a limited length, which is the main parameter of the algorithm. This forces the algorithm to leave the vicinity of a local minimum, with the size of that vicinity limited by the length of the taboo list. At the same time TS tries to improve the current solution precisely. It also returns the best solution found during its operation.

E. Atomic functions

The atomic algorithms that we use are Dumb Fit (DF) and First Fit (FF); their block diagrams are shown in Fig. 2. During the operation of the DF or FF algorithm, the appropriate evaluating function value is calculated and passed to whichever of the three main algorithms is currently running, allowing it to proceed to the next iteration.

4. EXPERIMENTATION SYSTEM

We aimed to design as versatile a simulation environment as possible, able to evaluate all combinations of parameters for various problem instances. As a result, we developed a console application, written in C++, with various functionalities for reading experiment parameters and writing the results. The program has been developed for the Microsoft Windows OS and has two main modes of operation: command line mode and menu mode.

A. Input

Generally, in all modes of operation, the software allows the user to set certain input parameters. The first group of parameters defines the problem: the user can choose ranges of dimensions (p, q) and processing times (s) for the tasks, the task-list length, and the size and lifetime of the mesh (w, h, t). The parameters from the first group allow the program to randomly create a task-list and define a mesh, which together form a problem instance. The other group of parameters varies and consists of specific parameters of the chosen algorithm, such as the number of iterations, the starting and ending temperatures and temperature profile for SA, the tabu-list length for TS, etc. Specifying both groups of parameters makes it possible to solve a predefined problem with a chosen, custom-configured algorithm.

B. Menu mode

When the software is run without any command line parameters, it enters the menu mode. The main menu offers three modes of operation. The first involves tracing a single simulation for a given task-list and a given mesh, using a chosen atomic algorithm; the simulation is shown graphically, step by step, together with the value of the evaluation function, which helps to understand how a solution is evaluated with a given algorithm. The second way of using the program in menu mode is to provide a file with a predefined test series, a method described in the command line mode subsection. The third is to perform a single experiment with a chosen algorithm: the user can specify all the parameters without using an input file and watch the algorithm operate, i.e. its progress and the current evaluation function value are shown. The menu mode is not suitable for performing large series of experiments, but it is quite convenient for calibrating parameters and designing a test series.

C. Command line mode

This mode is the preferred one for running a series of experiments. It allows the user to specify, as a command line parameter, a file with a predesigned test series. Such a file begins with a set of parameters defining the problem instance. The task-list is generated only once and the same problem instance is used for the whole test series defined in the file. The number of repetitions for each test can also be specified, and any number and kind of tests for a given problem instance can be defined in an input file. When using the command line mode, the user can create a batch file (.bat) for running a series of test series (a series of program executions for more than one input file).

D. Output files

For each experiment (single execution of a main algorithm), in both execution modes, an output file is generated. It contains some specific input parameters set for the chosen algorithm and, most importantly, lines showing the current and best evaluation function value for each iteration. Such data can subsequently be analysed with appropriate software.

5. INVESTIGATIONS AND DISCUSSION

Using our simulation software we conducted a series of experiments and, in the course of the process, noticed that the analysed problem instances should be categorised into three groups: tasks relatively small (compared with the mesh size), tasks relatively large, and mixed tasks (small and large). For each of these groups the algorithms behaved differently, so we designed three corresponding test series, analysed in the following subsections. In each test, FF's result is used as a reference. The evaluating algorithm is the algorithm used by the main algorithms to evaluate each solution; the initiating algorithm is the algorithm used to generate the initial solution. Table 1 summarises the input values for all three test series.

A. Mixed tasks and general observations

In this case almost all tests were performed for 20000 iterations for all main algorithms on the same task set (except for a few with 5000 iterations). For tests in which FF was the evaluating algorithm, which significantly increases the evaluation time, 5000 iterations were used. The aim was to keep all algorithms running for about 100 seconds. Each test was repeated 3 times and mean values are used below, unless specified otherwise. The evaluating function used was CE, which allowed the Usage Factor to be calculated. Several observations emerged from the analysis of the results:
1. On average, the SA algorithm was the best performer (Fig. 3).
2. The main factor affecting the effectiveness of SA was the starting temperature. This is illustrated in Fig. 3.
3. It is a good idea to use FF as an initialisation algorithm for the main algorithms, since the increase of the optimisation time is marginal. This, however, does not apply to SA: the mean U value for SA (T0 = 300, swap neighbourhood) starting from a random permutation was 83.68%, while for the same settings but starting from the permutation generated by FF it was 82.61%. This is probably because the FF algorithm can place the SA starting point in a wide local minimum that the algorithm is unable to leave.
4. For all algorithms, the swap neighbourhood gave better results than the insert neighbourhood (about 3% difference on average in terms of the U factor).
5. For the TS algorithm, taboo lists of length around 1000 performed marginally better than lengths around 100 (U was 79.8% and 78.8%, respectively).
6. If the FF algorithm is used for evaluation during the work of the main algorithms, it performs well but increases the processing time significantly. After lowering the iteration count it gives results comparable to the configurations with 20000 iterations and the DF algorithm.

Table 1. Input for all test series

Parameter                          Mixed tasks     Small tasks        Large tasks
p                                  2÷12            2÷10               4÷12
q                                  2÷12            2÷10               4÷12
s                                  2÷12            2÷10               4÷12
w                                  12              50                 12
h                                  12              50                 12
t (task-list length when t=0)      1000            0 (1000 tasks)     1000
tested algorithms                  SA, TS, RS      SA, TS, RS         SA, TS, RS
evaluating algorithms              DF, FF          DF, FF             DF, FF
initiating algorithms              DF, FF          DF, FF             DF, FF
evaluating function                CE              T                  CE
neighbourhood                      swap, insert    swap, insert       swap, insert

Fig. 3 shows SA results for various starting temperatures for the best found configuration, i.e. the starting permutation being random, DF as the evaluating algorithm, swap neighbourhood, 0.01 final temperature and geometrical temperature profile. The chart in Figure 3 shows that, for this configuration and problem instance, the best starting temperature of SA is around 300. The best value of the U factor achieved by the SA algorithm for 20000 iterations was 83.68%.



Fig. 3. Performance of the SA algorithm for different starting temp vs. other algorithms

Fig. 4. Values of current and best results for SA through iterations (small tasks)

The best algorithm outperformed the FF result in its 4522nd iteration. The values of the current and the best results for each iteration, taken from the result file of the best SA run, are shown in Fig. 4. The chart shows that in the beginning the current result tends to be lower than the best one, but later it starts to "stick" to the best result because, as the temperature falls, the algorithm acts more like Descending Search and less like Random Search. The moment when the best and current results surpass the value found by FF is also visible. Results obtained:
• best result: SA, swap neighbourhood, evaluation with DF, starting temp. 300, geometrical profile: U = 83.68%,
• difference between the best result and FF: 10% of the FF's result.

B. Small tasks

In this case we decided to use the second evaluating factor, T. It is less objective than the first one but still allows the algorithms to be compared, and it saves a great deal of experimentation time: for a large mesh and small tasks, using CE would require processing a very large list of tasks, which would make a series of experiments unreasonably long to conduct. These experiments led to the conclusion that using metaheuristic algorithms for allocating small tasks is not needed and does not improve the system's efficiency. The tasks are small enough, in comparison with the size of the mesh, that the FF algorithm manages to fit a task from the list into almost every free submesh. Therefore, even metaheuristics based on FF's result cannot achieve a better result; an example plot is shown in Fig. 5. Furthermore, due to the semi-random character of the tested metaheuristics, those that started from a random solution and did not use FF for evaluation gave even worse results, e.g. SA in such a case gave T = 124. Results obtained:
• best result: SA/TS/RS: T = 98,
• difference between the best result and FF: 0.

Fig. 5. Values of current and best results for SA through iterations, for small tasks

C. Large tasks

In this case the achieved results and the algorithms' behaviour were very similar to the general case of mixed tasks (the same testing scheme was used). The result achieved by the best algorithm was even better, though only by a small margin. As in the case of mixed tasks, the SA algorithm was the best performer and the same parameters as previously led to the maximum performance. Results obtained:
• best result: SA, swap neighbourhood, evaluation with DF, starting temp. 300, geometrical profile: U = 83.98%,
• difference between the best result and FF: 13.3% of the FF's result.


6. CONCLUSIONS AND PERSPECTIVES

This paper considered three metaheuristic local search algorithms adapted for the problem of task allocation on a processor mesh, and an experimentation system has been developed. The experiments showed that, in general, local search metaheuristic algorithms perform well in solving the considered problem. Only for allocating small tasks on a large mesh is it needless to use these algorithms, since they do not achieve better results than the basic FF algorithm, which itself performs well due to the ease of fitting small tasks into free submeshes. On average, the best performer in all tests was the SA algorithm. It outperformed all other algorithms for mixed and large tasks and achieved reasonable results of over 83% mesh usage. In the course of our work it was noticed that it is very important to design the experimentation environment well: it should give as many options as possible and, at the same time, allow the user to easily design whole series of experiments. Despite these conclusions, it cannot be said that the problem has been fully explored. There are other combinations of problem instances and testing parameters to be tested with our software. Moreover, it is still possible to construct a more thorough and versatile testing environment and to implement further algorithms, e.g. a Genetic Algorithm.

REFERENCES

[1] GOH L. K. and VEERAVALLI B., Design and performance evaluation of combined first-fit task allocation and migration strategies in mesh multiprocessor systems. Parallel Computing, vol. 34, issue 9, September 2008, pp. 508-520.
[2] BUZBEE B. L., The efficiency of parallel processing. Frontiers of Supercomputing, Los Alamos, 1983.
[3] GLOVER F., Tabu Search – part I. ORSA Journal on Computing, vol. 1, no. 3, Summer 1989.
[4] KIRKPATRICK S., GELATT C. D., VECCHI M. P., Optimization by Simulated Annealing. Science, New Series, vol. 220, no. 4598, May 13, 1983, pp. 671-680.
[5] GRANVILLE V., KRIVANEK M., RASSON J. P., Simulated Annealing: a proof of convergence. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 16, issue 6, June 1994, pp. 652-656.
[6] VAN LAARHOVEN P. J. M., AARTS E. H. L., Simulated Annealing: Theory and Applications. Springer, 1987.
[7] GLOVER F., KOCHENBERGER G. A., Handbook of Metaheuristics. Springer, 2002.



Computer Systems Engineering 2009 Keywords: mesh network, task allocation, artificial neural network *

Rafał ŁYSIAK Iwona POŹNIAK-KOSZAŁKA* Leszek KOSZAŁKA*

ARTIFICIAL NEURAL NETWORK FOR IMPROVEMENT OF TASK ALLOCATION IN MESH-CONNECTED PROCESSORS

An efficient allocation of processors to incoming tasks is especially important in achieving a high level of performance. A good allocation algorithm should identify a task and find the best position for it in the mesh. Depending on the type of the system, the algorithm should minimise the time necessary to find a place and keep the fragmentation at the lowest possible level. This paper shows the results of the Modified Best Fit Algorithm and the idea of using an Artificial Neural Network to optimise the task allocation time.

1. INTRODUCTION

Nowadays, multicomputer systems with many processors connected by high-speed networks are important instruments for scientists all over the world. Among the many problems related to sharing such systems, task allocation in a two-dimensional (2D) mesh is one of the most widely studied. Efficient management of the system can increase its computational throughput and general performance. The requirement here is to allocate incoming tasks to submeshes of appropriate size in the 2D mesh-based system. The size of a submesh can range from one node to the entire mesh. The allocation scheme needs to maximise processor utilisation while minimising allocation time. This paper presents the results of the Modified Best Fit Algorithm (MBF) and the general idea of using an Artificial Neural Network (ANN) to decrease the allocation time. The primary target of MBF was to achieve the highest possible level of allocation; the allocation time was a secondary target, to be addressed by the ANN. The MBF algorithm was compared with the First Fit Algorithm.

2. DEFINITIONS AND NOTATIONS

In this section we present the definitions and notation used throughout the paper.

* Department of Systems and Computer Networks, Wroclaw University of Technology, Poland.


Definition 1. A mesh is a set of nodes representing the CPUs. A two-dimensional mesh network is denoted as M(W,H), where W is the width and H is the height of the mesh network, and where Wmax and Hmax are the maximum sizes of the mesh. An example of M(8,7) is shown in Fig. 1.
Definition 2. A node is described by N(x,y), where x is the column number and y is the row number, with 0 ≤ x ≤ W−1 and 0 ≤ y ≤ H−1.

Fig. 1. The example mesh network

Definition 3. A submesh is a rectangular subset of the main mesh M. A submesh is denoted as SM(x1,y1,x2,y2), where the pair x1,y1 defines the top left corner of the submesh and x2,y2 defines the bottom right corner. A busy submesh is a submesh with all nodes busy. A free submesh is a submesh in which all nodes are free to use.
Definition 4. A task is denoted as T(w,h,t), where w is the width, h is the height and t is the execution time. Knowledge of these three parameters is sufficient to allocate the task T in the mesh M.
Definition 5. Fragmentation is an index that expresses, in percent [%], the efficiency of the allocation process. The fragmentation fM of mesh M is defined by Equation (1), in which W and H represent the mesh width and height, P represents the number of nodes in the biggest free submesh (see the explanation in Fig. 2), and the remaining sum represents the number of all busy nodes.

(1)



Fig. 2. Mesh M(8,7) with fM=12.5%

Definition 6. Allocation time is the period between the appearance of a task in the system and its final allocation in the mesh M. It includes the time necessary to find the proper coordinates for the task.
Definition 7. Execution time is the time a task has to stay in the mesh M to be fully executed. After this period the task releases its submesh, converting it into a free submesh ready to use.
Definition 8. Time horizon is the length of the simulation in the dynamic model, expressed in seconds or minutes.

3. PROBLEM DESCRIPTION

The mesh M(W,H) is given, where W and H are known. There is also a set of tasks {T1(w1,h1), T2(w2,h2), …, Tn(wn,hn)}, where wn and hn are known. The main purpose of the allocation algorithm is to return a location for the incoming task. The location is represented by <xl,yl>, where xl (column number) and yl (row number) are the coordinates. The top left corner of the mesh is denoted as <0,0>. The main criteria of the allocation process are:
• minimise the fragmentation f,
• minimise the task allocation time.
The constraints are:
• a task, once allocated, cannot be moved,
• an allocated task stays in its location until it is fully executed,
• tasks arrive in the system every one second,
• if it is impossible to allocate a task, it is rejected and a failure is reported,
• tasks are allocated in the order of appearance.

A dynamic allocation model was used. A general overview of the allocation process is shown in Fig. 3.

Fig. 3. Dynamic allocation process scheme

4. MBF ALGORITHM

MBF is the Modified Best Fit Algorithm. The main target of the algorithm is to keep the fragmentation (1) as low as possible. MBF uses three indexes to find the best possible location.
Index #1, an, represents the number of neighbouring busy nodes. It has the same functionality as in the original Best Fit (BF) [1], but MBF adds a further restriction: it also checks the neighbours which are two nodes away, which gives better fragmentation results. This was introduced because it was noticed that submeshes of width 1 or height 1 are the main cause of higher fragmentation. An example is shown in Fig. 4a: in the original BF the "quality" of locations 1 and 2 is the same for task T(2,3), whereas in MBF location 1 is better than location 2 because of the free submesh SM(7,1,7,6) which would remain in the case of location 2.
Index #2, bn, represents the ratio between the perimeter and the area of the busy submesh resulting from the allocation. It is better to keep this ratio as low as possible. An example of the situation is shown in Fig. 4b.


Fig. 4a. Index #1 example

Fig. 4b. Index #2 example

It was noticed that it is good practice to leave a free submesh whose size would be sufficient to allocate the most probable incoming task. The size of the most probable task is estimated from the tasks that have already appeared in the system.
Index #3, cn, represents the number of nodes that would be used from the submesh SMl, where SMl is the submesh that should stay free for the most probable incoming task.
The next step is the normalisation of the sets A, B and C, given by

(2)

It is then possible to represent every possible location by

(3)

The final step is to find the minimal value of wi, defined by

(4)
where L represents the best possible coordinates according to indexes #1, #2 and #3. A short illustrative sketch of this selection step is given below.
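Since the normalisation formulas (2)-(4) were not reproduced legibly in the source, the following sketch (ours) assumes a simple min-max normalisation and an unweighted sum of the three indexes, and then picks the location with the minimal combined score:

```python
# Illustrative sketch (ours): combining the three MBF indexes into one score
# per candidate location. Min-max normalisation and an unweighted sum are
# assumptions, standing in for Eqs. (2)-(3); Eq. (4) is the final argmin.

def min_max(values):
    lo, hi = min(values), max(values)
    return [0.0 if hi == lo else (v - lo) / (hi - lo) for v in values]

def best_location(locations, a, b, c):
    """locations: candidate coordinates; a, b, c: index values per candidate."""
    an, bn, cn = min_max(a), min_max(b), min_max(c)
    w = [an[i] + bn[i] + cn[i] for i in range(len(locations))]
    i_best = min(range(len(w)), key=w.__getitem__)
    return locations[i_best]
```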


5. EXPERIMENTATION ENVIRONMENT

The MBF algorithm was compared with the First Fit algorithm [2] using a Java application developed for this experiment. The application makes it possible to run the simulation using a predefined time horizon, average task size, average execution time and mesh size. Both algorithms ran under the same conditions, i.e. the time horizon and the task set were the same. Furthermore, the application gathers information about the allocation process and, when the simulation is finished, this data is saved to a CSV file, which is easy to analyse in MS Excel. The application was written in Java using Eclipse. An example screenshot is shown in Fig. 5.

Fig. 5. Screenshot from the application

6. COMPARISON OF MBF AND FF

Two experiments were run with different input values. In experiment #1 tasks were small and the execution time was long. In experiment #2 tasks were big and the execution time was short.

6.1. EXPERIMENT #1

The input values for experiment #1 are shown in Table 1.

Table 1. The input values for experiment #1

Parameter                        Value
Average task width               10 points
Average task height              10 points
Average task execution time      1000 ms

The results for experiment #1 are shown in the graphs below.


Fig. 6a. Mesh fragmentation in experiment #1 for MBF

Fig. 6b. Mesh fragmentation in experiment #1 for FF

Fig. 6c. Allocation time in experiment #1 for MBF

Fig. 6d. Allocation time in experiment #1 for FF

In Fig. 6a and Fig. 6b the mesh fragmentation is shown for the MBF and FF algorithms. The values are expressed in percent [%]. The simulator checked the fragmentation every 1 second. In Fig. 6c and Fig. 6d the allocation time is shown. The values are expressed in milliseconds.

6.2. EXPERIMENT #2

The input values for experiment #2 are shown in Table 2.

Table 2. The input values for experiment #2

Parameter                        Value
Average task width               20 points
Average task height              20 points
Average task execution time      100 ms


The results for experiment #2 are shown in graphs below.

Fig. 7a. Mesh fragmentation in experiment #2 for MBF

Fig. 7b. Mesh fragmentation in experiment #2 for FF

Fig. 7c. Allocation time in experiment #2 for MBF

Fig. 7d. Allocation time in experiment #2 for FF

In Fig. 7a and Fig. 7b the mesh fragmentation is shown for the MBF and FF algorithms. The values are expressed in percent [%]. The simulator checked the fragmentation every 1 second. In Fig. 7c and Fig. 7d the allocation time is shown. The values are expressed in milliseconds.

7. ARTIFICIAL NEURAL NETWORK

As shown in Section 6, the MBF algorithm gave better fragmentation results than FF. The only problem was the allocation time: MBF was much slower than FF.


This paper presents the idea of using an Artificial Neural Network (ANN) [3] to decrease the allocation time of the MBF algorithm. To this end, an ANN was created in Matlab. The topology and main configuration of the chosen ANN are shown in Table 3.

Table 3. Artificial Neural Network configuration

Parameter                          Value
Number of hidden layers            1
Number of input layer neurons      27
Number of output layer neurons     25
Number of hidden layer neurons     10
Training function name             trainlm* [4]
Activation function                Sigmoid

A 200-element set was used to train the ANN. The whole training process took about 2 minutes (for the 200-element training set; 25 inputs; 27 outputs). During training it is possible to watch the progress on a real-time graph, which is shown in Fig. 8.

Fig. 8. Training process graph
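For readers wishing to reproduce a comparable setup outside Matlab, the sketch below (ours, not the authors' code) builds a network with the topology of Table 3 using scikit-learn; the L-BFGS solver stands in for the Levenberg-Marquardt routine trainlm, and the data arrays are placeholders:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Illustrative sketch (ours): an MLP with one hidden layer of 10 sigmoid
# (logistic) neurons, as in Table 3. X and Y are placeholder arrays standing
# in for the 200-element training set.
rng = np.random.default_rng(0)
X = rng.random((200, 27))   # inputs  (27 features)
Y = rng.random((200, 25))   # targets (25 outputs)

net = MLPRegressor(hidden_layer_sizes=(10,), activation='logistic',
                   solver='lbfgs', max_iter=2000)
net.fit(X, Y)
print(net.score(X, Y))      # fit on the training data
```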

After training, the ANN performance was 10^-3 on the training data and 10^-2 on other data, i.e. datasets that were not used during the training process.

* trainlm is a training function from Matlab. It is very fast but needs a lot of memory.

8. CONCLUSIONS AND PERSPECTIVES

The results show that MBF is better than FF in both experiments in terms of the fragmentation level. In experiment #1, where tasks were small and the execution time was long, the fragmentation of the mesh was 10% lower using MBF. In experiment #2, where tasks were bigger and the execution time was short, the difference was even bigger, about 20%. This shows that the performance of MBF is better than that of FF. The problem with MBF was the time necessary to find a location for an incoming task. In experiment #1, MBF was much slower than FF at the beginning of the simulation (when the mesh was empty); when the mesh was mostly busy the allocation times of both algorithms were quite similar, although MBF remained slower. In experiment #2 MBF was much worse, with a difference in allocation time of about 500 ms. It was also shown that an ANN can be used to learn how to allocate tasks in a mesh-connected processor network using the MBF algorithm. It is also possible to try teaching the ANN to allocate tasks using other allocation algorithms. The main advantage of the ANN is its speed of operation, so there is a chance that it can be used to decrease the allocation time of the MBF algorithm remarkably. In this paper it was shown that an ANN can be taught how to allocate tasks in a mesh network; no simulations were run to check the real performance of such a solution. In our future work we will try to use the ANN as an effective and fast way to allocate tasks in the mesh.

REFERENCES

[1] RUSSELL J. J., A Simulation of First and Best Fit Allocation Algorithms in a Modern Simulation Environment. Sixth Annual CCEC Symposium, 2008.
[2] TENENBAUM A., WIDDER E., A comparison of first-fit allocation strategies. ACM Annual Conference/Annual Meeting, 2000, pp. 875-883.
[3] HAYKIN S., Neural Networks: A Comprehensive Foundation. Prentice Hall, 1998.
[4] MATLAB Help.



Computer Systems Engineering 2009 Keywords: friction, limit cycles, rolling mill gauge control

Malgorzata SUMISLAWSKA∗ † Peter J. REEVE ∗ Keith J. BURNHAM ∗ Iwona POZNIAK-KOSZALKA† Gerald HEARNS‡

COMPUTER CONTROL ALGORITHM SIMULATION AND DEVELOPMENT WITH INDUSTRIAL APPLICATION

The paper addresses the prediction of limit cycles in a hot strip mill gauge control system. The effects of a strongly nonlinear friction force in the control loop are investigated. Use is made of the sinusoidal input describing function method in order to determine the limit cycle frequency and amplitude. The oscillations measured on the plant are reproduced by the model. Two methods are proposed in order to avoid the effects of the nonlinearity in the plant, namely, dither and friction compensation.

1. INTRODUCTION

Rolling is a process of shaping a metal piece by a reduction of its thickness. The metal is compressed by passing it between rollers rotating with the same velocity in opposite directions. The final stage of the rolling process is a finishing mill, where the main goal is to maintain the exit gauge, i.e. the thickness of the steel strip emerging from the mill, within tight specifications and to control it to a tolerance of 20 µm. The finishing mill consists of several stands, which consecutively reduce the thickness of the steel strip. Each of the finishing mill stands is controlled separately. It has been observed that the exit gauge oscillates with a frequency of 0.36 Hz. It is believed that such behaviour is due to limit cycles caused by a strongly nonlinear friction. Thus a model of a single finishing mill stand is developed. Then, limit cycles are predicted making use of a sinusoidal input describing function (SIDF)

∗ Control Theory and Applications Centre, Coventry University, Coventry, UK
† Wroclaw University of Technology, Wroclaw, PL
‡ Converteam, Rugby, UK


Fig. 1. Details of controlled plant

and the frequency domain results are compared with the outcomes of a simulation. In order to eliminate the oscillations two methods are proposed and a simulation study is performed.

2. PLANT DETAILS

A schematic view of the plant is presented in Fig. 1. The steel strip remains in constant contact with a pair of working rolls, which are supported by the backup rolls. The entry gauge of the strip is denoted as H, whilst h corresponds to the exit gauge. The hydraulic actuator at the top of the stack changes the position of the backup and work rolls, controlling the exit gauge [2, 4, 3]. The change of the exit thickness of the strip depends on the force acting on the metal piece, P_{roll}, further named the roll force, and on the position of the hydraulic piston z [2, 3]:

\Delta h = \Delta z + \frac{P_{roll}}{M}    (1)

where M is the mill sensitivity to force (mill modulus) [2]. A change of the roll force at the point of operation can be described by the following linear equation:

\Delta P_{roll} = -Q \Delta h + R \Delta H    (2)

where Q and R are the moduli of the exit and entry gauge, respectively.


Harsh temperature conditions close to the rolling mill stand make a direct measurement of the exit gauge impossible [2, 4, 3], hence a need arises to estimate the exit gauge change from the measured value of the roll force and the mill modulus:

\Delta h_e = \Delta z + C \frac{P_{meas}}{\tilde{M}}    (3)

where h_e refers to the estimated exit gauge, \tilde{M} is the estimated value of the mill modulus and P_{meas} corresponds to the measured value of the roll force. In order to improve the robustness of the control loop a compensation variable C < 1 is introduced [2]. There are two possibilities for measuring the roll force: using the loadcell sensor at the bottom of the stack, or from the pressure in the hydraulic chamber (cf. Fig. 1). For economic reasons the latter method is utilised. However, the force measured from the hydraulic pressure is affected by the mill housing friction P_{fric}:

P_{meas} = P_{roll} + P_{fric}    (4)

It is believed that the friction force, due to its nonlinear character, leads to limit cycles, and hence to the oscillatory behaviour of the plant.
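The relations (1)-(4) can be illustrated with a short numerical sketch (ours; all numbers are placeholder values, not plant data), showing how the estimated exit gauge change of Eq. (3) is obtained from a measured force:

```python
# Illustrative sketch (ours): the gaugemeter estimate of Eq. (3).
# All numerical values are placeholders, not plant data.
M_est = 500.0      # estimated mill modulus (force per unit deflection)
C = 0.9            # compensation variable, C < 1
delta_z = 0.02     # change of hydraulic piston position
P_meas = 35.0      # measured force (roll force plus friction, Eq. (4))

delta_h_est = delta_z + C * P_meas / M_est   # Eq. (3)
print(delta_h_est)
```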

3. PLANT MODELLING

3.1. STACK MODEL

The stack is modelled making use of a classical mass-spring-damper model (cf. Fig. 2). Due to the symmetrical construction of the stack, only the upper backup and work rolls are taken into consideration. In the further analysis the damper d1 is replaced by a friction model, which introduces a nonlinear dependency between the piston velocity and the friction force.

3.2. ACTUATOR MODEL

The hydraulic actuator model is presented in Fig. 3. The term K_p refers to the proportional position controller gain and defines the relation between the position error and the fluid flow to the capsule, q. p corresponds to the capsule pressure acting on the piston of area A_p. The dependency between the fluid flow into the capsule and the pressure acting on the piston area is represented by the following linear relation [5, 7, 9]:

p = K_c \frac{\int q \, dt - A_p l}{A_p l}    (5)


Fig. 2. Mass-spring-damper representation of stack

Fig. 3. Model of hydraulic servo system

where l refers to the stroke length and the term K_c corresponds to the hydraulic oil compressibility coefficient. The force acting on the hydraulic piston is given by:

F_h = A_p p    (6)

The overall model of the rolling mill stand, containing the stack and actuator models, is presented in Fig. 4.

3.3. FRICTION MODEL

The friction is modelled as a sum of three components: a Coulomb friction, a viscous friction and a Stribeck friction (also named stiction):

P_{fric} = P_C + P_S + P_V    (7)

where the term P_{fric} denotes the total frictional force, whilst P_V, P_C and P_S refer to the viscous, Coulomb and Stribeck friction, respectively.


Fig. 4. Overall plant model

The Coulomb friction is modelled as follows:

P_C = -F_C \, \mathrm{sign}(\dot{z}) \left(1 - e^{-|\dot{z}|/V_C}\right)    (8)

The term F_C denotes the Coulomb friction level, whilst the exponential term is introduced in order to avoid a zero-crossing discontinuity. The element related to the viscous friction is modelled as a linear function of velocity:

P_V = -m_V \dot{z}    (9)

where the term m_V is the viscous damping of the frictional force. The Stribeck friction (stiction) model is given by:

P_S = -F_S \, \mathrm{sign}(\dot{z}) \left(1 - e^{-|\dot{z}|/V_1}\right) e^{-|\dot{z}|/V_2}    (10)

The term F_S determines the magnitude of the static friction, whilst V_1 and V_2 are utilised to shape the stiction model.
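A compact numerical sketch of the composite friction model (7)-(10) is given below (ours; the parameter values are arbitrary placeholders chosen only to illustrate the shape of the curve, and decaying exponents are assumed as discussed above):

```python
import numpy as np

# Illustrative sketch (ours): total friction force of Eqs. (7)-(10) as a
# function of piston velocity. Parameter values are arbitrary placeholders.
F_C, V_C = 2.0e4, 1.0e-4        # Coulomb friction level and shaping constant
m_V = 1.0e5                     # viscous damping
F_S, V1, V2 = 1.5e4, 5e-5, 2e-4 # stiction magnitude and shaping constants

def friction(z_dot):
    p_c = -F_C * np.sign(z_dot) * (1 - np.exp(-np.abs(z_dot) / V_C))   # Eq. (8)
    p_v = -m_V * z_dot                                                  # Eq. (9)
    p_s = -F_S * np.sign(z_dot) * (1 - np.exp(-np.abs(z_dot) / V1)) \
          * np.exp(-np.abs(z_dot) / V2)                                 # Eq. (10)
    return p_c + p_s + p_v                                              # Eq. (7)

v = np.linspace(-1e-3, 1e-3, 11)
print(friction(v))
```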

4. INVESTIGATIONS

4.1. LIMIT CYCLE PREDICTION

The amplitude and frequency of the limit cycles are predicted using the sinusoidal input describing function (SIDF) method [1, 6]. The results of the frequency domain analysis are presented in Fig. 5. One can notice a strong dependence of the amplitude of the limit cycles on the strip modulus. The amplitude of the oscillations increases with an increase of


Fig. 5. Prediction of limit cycles; left: impact of compensation variable, right: impact of strip modulus

the strip sensitivity to force. The influence of the strip modulus on the frequency of the limit cycles is negligible. The compensation variable also has an impact on the amplitude of the limit cycles, whilst it has no influence on the oscillation frequency.

4.2. SIMULATION STUDY

The results obtained in the frequency domain are confronted with the outcomes of the simulation study. Fig. 6(a) shows a strong dependence of the limit cycle amplitude on the compensation variable and the strip modulus. All the same, the influence of the above-mentioned parameters on the frequency of the oscillations is negligible. The basic assumption of the SIDF technique is a sinusoidal input to the nonlinearity [1, 6, 8]. The simulation study shows that if the input to the nonlinear element strongly deviates from a sinusoid, the frequency and amplitude of the oscillations differ significantly from those obtained using the SIDF method (cf. Fig. 6(b)). Furthermore, a dependency between the friction model coefficients and the shape of the input to the nonlinearity is observed. This fact is utilised for the reproduction of the oscillations measured on the plant (cf. Fig. 7).

5. PROPOSED CONTROL

In order to suppress the limit cycles two solutions are proposed, namely, dither and a model based friction compensation.


Fig. 6. Time domain simulation of limit cycles: (a) influence of C and Q on limit cycle frequency and amplitude; (b) dependence of limit cycle frequency and amplitude on shape of input to nonlinearity

5.1. DITHER

The simulation study shows that adding a high frequency signal to the piston position reference signal results in the elimination of the limit cycles (cf. Fig. 8). The dither is in the form of a square wave with a frequency of 25 Hz (half of the servo-system controller sampling frequency) and an amplitude of 15 µm. Although dither gives very promising results in simulation, it may be difficult to apply on the real plant. Long-term vibrations of the hydraulic chamber may lead to increased wear and tear of mechanical parts. This would shorten the lifetime of the actuator and, consequently, raise the maintenance costs of the plant. A short sketch of such a dither signal is given below.
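The following sketch (ours) generates the described dither and adds it to a placeholder position reference; a 50 Hz sampling rate is an assumption based on the stated controller frequency:

```python
import numpy as np

# Illustrative sketch (ours): a 25 Hz square-wave dither of 15 micrometre
# amplitude added to the piston position reference.
fs = 50.0                          # assumed servo controller sampling frequency [Hz]
t = np.arange(0, 2.0, 1.0 / fs)
amplitude = 15e-6                  # 15 micrometres, in metres
dither = amplitude * (-1.0) ** np.arange(t.size)   # alternates each sample -> 25 Hz square wave

z_ref = 2.0e-3 * np.ones_like(t)   # placeholder constant position reference [m]
z_ref_dithered = z_ref + dither
```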



Fig. 7. Reproduction of measured data

5.2. FRICTION COMPENSATION

Based on a friction model, the friction force is estimated making use of the piston velocity. The estimated friction force is then subtracted from the measured hydraulic force. Thus the approximated roll force, which is the difference between the measured force and the estimated friction, is the input to the gaugemeter. Variable environmental conditions of the plant, such as temperature, properties of the lubricating agent and wear of the contacting surfaces, make the friction difficult to estimate. Hence, the efficiency of a friction compensator in the presence of a model mismatch is investigated. The dependencies of the 'true' friction model and of the models used by the compensator on the piston velocity are presented in Fig. 9 (left). One can notice a significant discrepancy between the friction acting on the piston and the models used for compensation. Simulation results of the friction compensation are presented in Fig. 9 (right). Although the friction model mismatch is significant, the compensator performs very well. Making use of a linear viscous friction model, the amplitude of the limit cycles is reduced by a factor of about five, whilst the second model (viscous plus Coulomb) virtually eliminates the unwanted oscillations.



Fig. 8. Elimination of limit cycles by applying dither. Grey dashed line: before application of dither, black solid line: with dither

6. CONCLUSIONS

The rolling mill is modelled making use of the well-known mass-spring-damper representation. A nonlinear model of the friction is developed. The sinusoidal-input describing function (SIDF) technique is utilised in order to predict limit cycles. The investigation in the frequency domain shows a strong influence of the compensation variable and the strip modulus on the amplitude of the oscillations, but no impact on the limit cycle frequency. The predicted frequency of the oscillations is 0.9 Hz, whilst the frequency of the registered signals is 0.36 Hz. The simulation study shows a significant influence of the shape of the input to the nonlinearity on the frequency of the limit cycles, which is not a surprise since the basic assumption of the SIDF method is a sinusoidal input to the nonlinear element [1, 6, 8]. The developed model is capable of reproducing the measured data if the input to the nonlinearity deviates from a sinusoid. Two methods of limit cycle elimination are proposed: dither and model based friction compensation. A simulation of the former gives promising results; however, there is a difficulty in applying dither in practice, hence a need for a more sophisticated limit cycle suppression method arises. The friction compensation virtually eliminates the oscillations, even in the presence of a significant discrepancy between the friction acting on the hydraulic piston and the friction model used for compensation.


Fig. 9. Left: friction acting on the piston (denoted as ‘true’) and models used for compensation. Right: simulation of friction compensation

REFERENCES

[1] ATHERTON D. P., Nonlinear Control Engineering. Van Nostrand Reinhold Company Ltd., Wokingham, 1975.
[2] ALSTOM Power Conversion, Gaugemeter Control, 2003.
[3] YILDIZ S. K. et al., Dynamic modelling and simulation of a hot strip finishing mill. Applied Mathematical Modelling, vol. 33, pp. 3208-3225, 2009.
[4] HEARNS G., Hot Strip Mill Gauge Control: Part 1. Converteam, 2009.
[5] JELALI M. and KROLL A., Hydraulic Servo-systems: Modelling, Identification and Control. Springer-Verlag, London, 2003.
[6] KHALIL H. K., Nonlinear Systems. Prentice Hall, Upper Saddle River, New Jersey, 3rd edition, 2002.
[7] MERRITT H., Hydraulic Control Systems. John Wiley and Sons Ltd., New York, 1967.
[8] VAN DE VEGTE J., Feedback Control Systems. Prentice Hall Inc., Englewood Cliffs, New Jersey, 3rd edition, 1994.
[9] VIERSMA T. J., Analysis, Synthesis and Design of Hydraulic Servosystems and Pipelines. Elsevier Scientific Publishing Company, New York, 1980.



Computer Systems Engineering 2009 Keywords: nonlinear systems, model based predictive control, PI controller, HVAC, reduced energy consumption

Ivan ZAJÍC† Keith J. BURNHAM† Tomasz LARKOWSKI† Dean HILL‡

DEHUMIDIFICATION UNIT CONTROL OPTIMISATION

The paper focuses on heating, ventilation and air conditioning (HVAC) systems dedicated to clean room production plants. The aim is to increase the energy efficiency of HVAC systems via control parameter optimisation. There is much scope for improvement within the humidity control, where the dehumidification units (DU) are employed. The current control of the DU utilises a proportional plus integral (PI) controller, which is sufficient for maintaining the specified levels of humidity. However, since the dehumidification process exhibits nonlinear characteristics and large transport delays are present, the tuning of the PI controller is a non-trivial task. The research focus is on applying a control optimisation technique based on a model predictive controller in order to achieve tight specifications and energy efficient control performance.

1. INTRODUCTION

Abbott Diabetes Care (ADC) UK, an industrial collaborator of the Control Theory and Applications Centre, develops and manufactures glucose and ketone test strips, which are designed to help people with diabetes. One of the production quality requirements is that the environmental conditions during production are stable, where the air relative humidity has to be lower than 20% and the corresponding temperature is 20.5 ± 2 °C. Heating, ventilation and air conditioning (HVAC) systems are commonly used to maintain environmental conditions in industrial (and office) buildings. The HVAC system provides the manufacturing areas with conditioned fresh air such that the air

† Control Theory and Applications Centre, Coventry University, Coventry, UK
‡ Abbott Diabetes Care, Witney, Oxfordshire, UK



temperature and the relative humidity are regulated within specified limits. In some cases the air CO2 level is also required to be regulated; however, this is not the case here. HVAC systems are highly energy demanding: in ADC UK alone the estimated annual energy expenditure for 2009 is £3 million. The increased energy costs and the awareness of environmental issues, such as CO2 emissions, prompt the need for increased energy efficiency of these systems. It is estimated in [4] that 15% of an HVAC system's overall energy usage can be saved by good control. The paper is focused on increasing the energy efficiency of the HVAC systems located in ADC UK via control parameter optimisation. There are around 70 HVAC systems in ADC UK, of which only one is chosen for testing and optimisation purposes. The typical HVAC plant comprises two basic components, i.e. the dehumidification unit (DU) and the air handling unit (AHU). Both units utilise a proportional plus integral (PI) controller for adjusting the gas valve and the cooling/heating valve, respectively. To date the largest scope for improvement has been found within the humidity control, where the DU is employed, hence the research focus is narrowed to the control optimisation of the DU. The optimisation of the temperature control is treated here as further work. Some work on PI gain tuning and PI controller enhancement has already been done in [8], where a cost function minimisation technique based on the derived dehumidification process model is employed. A tuning method which utilises an unconstrained model predictive controller (MPC) is applied here and the results are compared with those from [8]. This method provides a more intuitive way of PI gain tuning, while the problem of cost function selection is avoided. Since the MPC controller is based on quadratic cost function minimisation and can guarantee optimal performance at each successive time instance, the appropriate PI gains can be assigned online, i.e. gain scheduling. Moreover, physical constraints such as the speed of valve modulation, valve operational limits and the production environment limits can be inherently handled within a constrained MPC.

2. PLANT DESCRIPTION

The chosen production area is a room with the designation CCSS2. The schematic of the HVAC plant is given in Figure 1. The return air is extracted by suction from the manufacturing area (the environmentally controlled room), denoted 1, and passed through the main duct to the mixing section, denoted 2. The return air is mixed with fresh air from the fresh air plant and progresses to the DU, where the mixed air is dehumidified. The dehumidified air then progresses from section 3 to the AHU, where the air is heated or cooled depending on the operating requirements. The conditioned air then continues to section 4, where a pre-configured amount of air is conducted to the air lock and back to the manufacturing area. A detailed description of the DU functionality follows in the subsequent section, and a description of the other HVAC components can be found in e.g. [1].

Fig. 1. Schematic diagram of the HVAC system.

2.1. DEHUMIDIFICATION UNIT DESCRIPTION

The DU comprises a large wheel with a honeycomb structure coated with a moisture-absorbent desiccant, in this case silica gel. The wheel rotates with a constant angular velocity of 0.17 rpm. The process air is driven through the lower part of the wheel, which is approximately 3/4 of its overall surface. The silica gel absorbs the moisture from the air; however, this process is exothermic and the air is warmed as well. Consequently, to remove the absorbed water from the silica gel, the inverse process has to be applied, hence heat needs to be provided. The hot outdoor air is blown through the upper part of the dehumidification wheel, approximately 1/4 of its surface. The outdoor air is heated with a gas burner; as the hot outside air is driven through the dehumidification wheel it dries the silica gel and then goes to the exhaust. The more the outdoor air is heated, the more absorbed water is removed from the silica gel. The gas burner is coupled with the gas valve, so that by adjusting the gas valve position the temperature of the outdoor air blown through the dehumidification wheel can be regulated. Consequently, the capability of the DU to absorb water vapour from the process air is modulated as well. Due to the cross-coupling between the air temperature (dry bulb temperature) and the relative humidity, the humidity level within the manufacturing area is measured in terms of the dew point temperature.

3. DEHUMIDIFICATION PROCESS IDENTIFICATION

The modelling of the relatively complex MIMO HVAC system has been reduced here to consideration of the dehumidification process only, since this process has the greatest influence on the energy consumption. The humidity control system in the closed-loop (CL) setup is depicted in Figure 2, where P denotes the dehumidification process and D denotes the dynamics of the disturbances caused by personnel within the clean room production area. The signal y(t) = ȳ(t) + h(t) denotes the measured system output


Fig. 2. Schematic of the closed loop system with existing PI controller.

(humidity measured in terms of the dew point temperature), u_1(t) denotes the control action, u_2(t) the (measurable) second input, being the outdoor air relative humidity, r(t) is the set-point, e(t) denotes the error signal and h(t) denotes an assumed (not measured) load disturbance. Since u_2(t) has a direct undesirable impact on the humidity level within the production area, the relative humidity of the outdoor air can also be interpreted as a disturbance. For system identification purposes assume a single-input single-output discrete-time system expressed in difference equation form, i.e.

y(t) + a_1(t) y(t-1) + \ldots + a_n(t) y(t-n) = b_1(t) u(t-1) + \ldots + b_m(t) u(t-m),    (1)

where t denotes the discrete time index. The state dependent parameters are denoted a_i(t), i = 1, \ldots, n, and b_i(t), i = 1, \ldots, m, respectively. It is assumed that the individual state dependent parameters are functions of one of the variables in a state variable vector x(t), see e.g. [7]. The pure time delay, denoted d and given in sampling intervals, is introduced such that the appropriate number of leading b_i(t) parameters are zero, i.e. b_i(t) = 0 for i = 0, \ldots, d-1. The dehumidification process P is modelled as a first order system with d = 5 samples, i.e.

y(t) = -a_1(t) y(t-1) + b_5(t) u_1(t-d) + o + \xi(t).    (2)

The individual state dependent parameters are expressed as

a_1(t) = \alpha_1 - \eta_1 u_1(t-d),    (3a)
b_5(t) = \eta_2 u_2(t-d) + \alpha_2    (3b)

and \xi(t) denotes coloured noise given by

\xi(t) = e_1(t) + c_1 e_1(t-1).    (4)

The constant offset is denoted o, and e_1(t) is a white zero mean process noise with variance \sigma_1^2. Note that the coefficients \eta_1 and \eta_2 can be interpreted as bilinear terms. The load disturbance h(t) is assumed to be caused by personnel in the production area. However, there are also other disturbances which are not modelled here, e.g. moisture from cleaning the floor and work surfaces and moisture produced by machinery. For the tuning purposes the signal h(t) is assumed to have a staircase shape. A single stair represents a single person, who generates moisture corresponding to a 2 °C increase of the dew point temperature; two stairs represent two persons within the room. The maximum number of people within the room at the same time is 2. The width of a stair represents the time of occupancy (its duration). It is expected that D can be described by a first order process

D(s) = \frac{K}{s + a},    (5)

where the constant a (pole) is inversely proportional to the settling time, denoted T_{set}, i.e. a = 4/T_{set}. The settling time is chosen to be T_{set} = 5 min. The gain, denoted K, is chosen to be unity.

3.1. PARAMETER ESTIMATION

The chosen production area had no personnel (h(t) = 0) nor operating machinery inside at the time of data acquisition. Consequently, y(t) ≈ ȳ(t) and the dehumidification process can be estimated directly. The HVAC system was operating in an open-loop (OL) setup, which allows OL estimation techniques to be used, such as the considered method of extended least squares (ELS), see [3]. In practice, however, the production process requires that the specified humidity level is maintained at all times, hence only CL estimation techniques can be considered in normal operation. Two data sets were collected from the clean room production area: the first, collected on 27/05/2009, is used for model estimation, and the second, collected on 28/05/2009, is used for model validation. Both data sets contain the signals y(t), u1(t) and u2(t), which were acquired with a sampling time of 1 s. The estimation data set, comprising 35,000 data samples, is plotted in Figure 3. The measured signals were re-sampled with a new sampling time Ts = 32 s and the corresponding time delay of 5 samples was estimated, i.e. d = 5. The coefficients of model (2) have been estimated by adopting the method of ELS. The results are given in Table 1.

Table 1. Estimated coefficients of the dehumidification process model.

α1        α2        η1              η2        o        c1
-1.018    -0.034    6.728 × 10⁻⁶    -0.001    0.838    -4.930 × 10⁻⁶

The dehumidification process model (2) is simulated in the OL setting using the inputs u1(t) and u2(t) acquired on 27/05/2009 and 28/05/2009, respectively. The fit (given in [%]) between the simulated system outputs and the measured outputs is assessed by

fit = \left( 1 - \frac{\| y_s(t) - y(t) \|_2}{\| y(t) - E[y(t)] \|_2} \right) \times 100,    (6)


Fig. 3. The estimation data set acquired on 27/05/2009 in OL setting with Ts = 1s.

where E[·] denotes the mathematical expectation and y_s(t) is the simulated system output. The model fit is 97.22% for the estimation data set and 85.82% for the validation data set. Note that the parsimonious model structure (2) assures a relatively high model fit and is suitable for the assumed MPC.
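A compact sketch of how model (2)-(3b) can be simulated and the fit (6) computed is given below (ours; the input and output arrays are placeholders and the simulation is a plain noise-free open-loop recursion using the coefficients of Table 1):

```python
import numpy as np

# Illustrative sketch (ours): open-loop simulation of the state dependent
# parameter model (2)-(3b) and the fit measure (6). u1, u2, y are placeholder
# arrays; the coefficient values are those reported in Table 1.
alpha1, alpha2 = -1.018, -0.034
eta1, eta2 = 6.728e-6, -0.001
o, d = 0.838, 5

def simulate(u1, u2, y0=0.0):
    ys = np.full(len(u1), y0)
    for t in range(d, len(u1)):
        a1 = alpha1 - eta1 * u1[t - d]                  # Eq. (3a)
        b5 = eta2 * u2[t - d] + alpha2                  # Eq. (3b)
        ys[t] = -a1 * ys[t - 1] + b5 * u1[t - d] + o    # Eq. (2), noise-free
    return ys

def fit_percent(ys, y):
    # Eq. (6): percentage fit between simulated and measured outputs
    return (1 - np.linalg.norm(ys - y) / np.linalg.norm(y - y.mean())) * 100
```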

4. CONTROL OPTIMISATION

The optimisation scheme is based on the unconstrained non-minimal state space MPC, see [6], and consists of two stages. Firstly, the modelled process P is simulated in the CL setting with the MPC controller. During such a simulation the load disturbances caused by personnel are imposed. The tracking error, i.e. e(t) = r(t) − y(t), is acquired together with the corresponding optimal control action computed by the MPC controller. Secondly, the optimal PI gains are found such that the squared error between the control action obtained by the MPC controller in the first stage and the control action obtained by the considered PI controller is minimised. Thus the tuning task reduces from tuning of the PI controller to tuning of the MPC controller.

4.1. MODEL PREDICTIVE CONTROLLER

The MPC design is based on the mathematical model of the plant, which is assumed to be a non-minimal state space model. Defining the state variable vector as

x(t) = [\Delta y(t) \; \Delta y(t-1) \; \ldots \; \Delta y(t-n+1) \; \Delta u(t-1) \; \ldots \; \Delta u(t-m+1) \; y(t)]^T,    (7)

where ∆ is the differencing operator defined as ∆ = 1 − q⁻¹, q⁻¹ is the backward shift operator defined as q⁻¹y(t) = y(t−1), and u(t) denotes the controllable input, i.e. u(t) = u_1(t). Note that the states in such a state variable vector are current and past measurements of the system output and past measurements of the system input, hence state estimation is avoided here. The corresponding non-minimal state space model is defined as

x(t+1) = A x(t) + B \Delta u(t)    (8)

and the output equation is given by

y(t) = C x(t),    (9)

where the state transition matrix A, input vector B and output vector C are defined as

A = \begin{bmatrix}
-a_1 & -a_2 & \cdots & -a_n & b_2 & b_3 & \cdots & b_m & 0 \\
1 & 0 & \cdots & 0 & 0 & 0 & \cdots & 0 & 0 \\
 & \ddots & & & & & & & \vdots \\
0 & \cdots & 1 & 0 & 0 & 0 & \cdots & 0 & 0 \\
0 & \cdots & 0 & 0 & 0 & 0 & \cdots & 0 & 0 \\
0 & \cdots & 0 & 0 & 1 & 0 & \cdots & 0 & 0 \\
 & & & & & \ddots & & & \vdots \\
0 & \cdots & 0 & 0 & 0 & \cdots & 1 & 0 & 0 \\
-a_1 & -a_2 & \cdots & -a_n & b_2 & b_3 & \cdots & b_m & 1
\end{bmatrix},    (10)

B = [b_1 \; 0 \; \cdots \; 0 \; 1 \; 0 \; \cdots \; 0 \; 0]^T,    (11)

C = [0 \; \cdots \; 0 \; 0 \; 0 \; 0 \; \cdots \; 0 \; 1].    (12)

Note that the time indexes of the state variable parameters are relaxed here, i.e. a_1(t+1) = a_1. Since the MPC controller predicts the future values of the system output, y(t+1), …, y(t+Hp), at time t based on the mathematical model of the plant, the future values of the state variable parameters, a_1(t+1), …, a_1(t+Hp), are also required. However, these are in general unavailable and, for simplicity, are considered here to be constant over the prediction horizon, hence a_1(t) ≈ a_1(t+1) ≈ … ≈ a_1(t+Hp). The aim of the MPC is to minimise the variance of the future error between the system output and the set-point by predicting the output and knowing the set-point in advance, hence minimising the cost function, see [5],

J_{MPC} = (Y - R)^T (Y - R) + \Delta U^T \lambda I \Delta U,    (13)

with respect to the current and future values of the differenced control action. The vector of future set-points (or reference signal) is defined as

R^T = \mathcal{R} \, r(t)    (14)

and \mathcal{R} is the 1 × Hp unit vector given by

\mathcal{R} = [1 \; 1 \; \ldots \; 1].    (15)

The vector of the predicted system outputs is given by

Y = [\hat{y}(t+1|t) \; \hat{y}(t+2|t) \; \ldots \; \hat{y}(t+H_p|t)]^T,    (16)

where \hat{y}(t+j|t), j = 1, …, Hp, denotes the predicted system output based on the information up to and including time t, and the vector of incremental control actions is defined as

\Delta U = [\Delta u(t) \; \Delta u(t+1) \; \ldots \; \Delta u(t+H_c-1)]^T.    (17)

The user-specific tuning parameters are the prediction horizon Hp, the control horizon Hc ≤ Hp and the scalar cost weighting parameter λ. The analytical solution of the cost function minimisation is obtained by setting

\frac{\partial J_{MPC}}{\partial \Delta U} = 0    (18)


and rearranging the solution with respect to ∆U, which leads to the (unconstrained) MPC algorithm

∆U = (Φ^T Φ + λI)^{−1} Φ^T [R − F x(t)],   (19)

where only the first element of ∆U is applied to the plant, hence

u_MPC(t) = u_MPC(t−1) + ∆u(t).   (20)

The matrices F and Φ are defined as

F = \begin{bmatrix} CA \\ CA^2 \\ \vdots \\ CA^{H_p} \end{bmatrix}   (21)

and

Φ = \begin{bmatrix}
CB & 0 & \cdots & 0 \\
CAB & CB & \cdots & 0 \\
\vdots & & \ddots & \vdots \\
CA^{H_p-1}B & CA^{H_p-2}B & \cdots & CA^{H_p-H_c}B
\end{bmatrix}.   (22)
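As an illustration of how (19)-(22) can be assembled numerically, a minimal Python sketch follows. The function names, the explicit loops and the use of a pre-computed gain matrix K = (Φ^TΦ + λI)^{−1}Φ^T are implementation choices of this sketch, not prescriptions from the paper.

```python
import numpy as np

def mpc_gain(A, B, C, Hp, Hc, lam):
    """Build F (21) and Phi (22) for the NMSS model and return them together
    with the matrix K such that dU = K (R - F x(t)), cf. (19)."""
    F = np.vstack([C @ np.linalg.matrix_power(A, i) for i in range(1, Hp + 1)])
    Phi = np.zeros((Hp, Hc))
    for i in range(Hp):                      # row i corresponds to prediction t+i+1
        for j in range(Hc):                  # column j corresponds to du(t+j)
            if i - j >= 0:
                Phi[i, j] = (C @ np.linalg.matrix_power(A, i - j) @ B).item()
    K = np.linalg.solve(Phi.T @ Phi + lam * np.eye(Hc), Phi.T)
    return F, Phi, K

def mpc_step(K, F, x, r, Hp):
    """One receding-horizon step: return du(t), the first element of dU in (19)-(20)."""
    R = np.full((Hp, 1), r)                  # future set-point vector, cf. (14)-(15)
    dU = K @ (R - F @ x)
    return float(dU[0, 0])
```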

4.2. PI GAIN OPTIMISATION

The optimisation scheme is based on a so-called control signal matching method [2]. In this case the PI gains are chosen to yield a close match between the PI and MPC control signals, so that the tuning task of finding the pair of PI gains moves towards tuning of the MPC controller instead, which, in this case, is more straightforward. The PI controller algorithm is given by

u_PI(t) = K_p e(t) + (K_p T_s / T_I) ∑_{i=1}^{t} e(i),   (23)

where the only unknown parameters in equation (23) are the proportional gain K_p and the integral time T_I, hence the PI control gains. In order to obtain the PI control gains, the following cost function is minimised with respect to K_p and T_I:

J_PI = ∑_{t=1}^{N} [u_PI(t|K_p, T_I) − u_MPC(t)]^2.   (24)
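A minimal Python sketch of the control signal matching step (23)-(24) is given below, assuming the error sequence e(t) and the MPC control signal u_MPC(t) have been logged in stage one. The use of SciPy's Nelder-Mead search and the initial guess are illustrative choices only.

```python
import numpy as np
from scipy.optimize import minimize

def pi_signal(e, Kp, TI, Ts):
    """Discrete PI control signal (23) for a given error sequence e.
    Ts and TI are in the same time units (here seconds)."""
    e = np.asarray(e, float)
    return Kp * e + (Kp * Ts / TI) * np.cumsum(e)

def match_pi_gains(e, u_mpc, Ts, x0=(-1.0, 600.0)):
    """Control signal matching (24): find (Kp, TI) minimising
    sum_t [u_PI(t|Kp,TI) - u_MPC(t)]^2 over the recorded MPC run."""
    u_mpc = np.asarray(u_mpc, float)
    cost = lambda p: np.sum((pi_signal(e, p[0], p[1], Ts) - u_mpc) ** 2)
    res = minimize(cost, x0, method="Nelder-Mead")
    Kp, TI = res.x                            # divide TI by 60 to compare with (29) in minutes
    return Kp, TI
```

Because (23) is linear in K_p and K_p T_s/T_I, the same fit could equally be obtained in closed form by linear least squares on the regressors e(t) and its cumulative sum.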


5. SIMULATION STUDY

The environmental conditions are such that the air relative humidity has to be lower than 20% and the corresponding temperature is 20.5 ± 2 °C. Considering the safety margins on the relative humidity, the targeted relative humidity in the manufacturing area is 10% and the corresponding set-point is then r(t) = −11 °C (measured in terms of dew point temperature). The main goals of the PI controller, in the considered case, are, firstly, to maintain the humidity level at the required set-point and, secondly, to reject the load disturbances. Since the load disturbances, i.e. personnel in the production area, may cause violation of the relative humidity limit (20%), the PI controller is tuned such that fast disturbance rejection is achieved.

5.1. SIMULATION SETUP

The MPC controller (19) is simulated in the CL setting with the humidity process model (2). During such a simulation the disturbances are introduced. The disturbance signal h is shown in the bottom part of Figure 4 and the filter D (5) has been discretised using a zero order hold with sampling time Ts = 32 s. The tracking error e(t) and the corresponding optimal control action u_MPC(t), i.e. u1(t), are acquired during such a simulation run. The setting of the MPC controller is chosen such that Hp = 100, Hc = 1 and λ = 2, which corresponds to a rather active setting. For completeness, the state variable vector and the corresponding triplet {A, B, C} are shown, hence

x(t) = [∆y(t) ∆u1(t−1) ∆u1(t−2) ∆u1(t−3) ∆u1(t−4) y(t)]^T   (25)

and the matrices {A, B, C} are

A = \begin{bmatrix}
-a_1 & 0 & 0 & 0 & b_5 & 0 \\
 0   & 0 & 0 & 0 & 0   & 0 \\
 0   & 1 & 0 & 0 & 0   & 0 \\
 0   & 0 & 1 & 0 & 0   & 0 \\
 0   & 0 & 0 & 1 & 0   & 0 \\
-a_1 & 0 & 0 & 0 & b_5 & 1
\end{bmatrix},   (26)

B = \begin{bmatrix} 0 & 1 & 0 & 0 & 0 & 0 \end{bmatrix}^T,   (27)

C = \begin{bmatrix} 0 & 0 & 0 & 0 & 0 & 1 \end{bmatrix}.   (28)
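Putting the pieces together, the following sketch reproduces the structure of the stage-one closed-loop run using the helper functions sketched above. The coefficient values a1 and b5 are placeholders (the identified values of model (2) are not repeated here) and the personnel disturbance acting through the filter D (5) is omitted for brevity.

```python
import numpy as np

# Illustrative stage-one run: simulate the NMSS model (26)-(28) in closed loop with
# the unconstrained MPC and log e(t) and u_MPC(t) for the subsequent PI matching.
a1, b5 = -0.98, 0.05                         # hypothetical first-order-plus-delay coefficients
A, B, C = nmss_matrices([a1], [0.0, 0.0, 0.0, 0.0, b5])
Hp, Hc, lam = 100, 1, 2.0                    # MPC setting used in the study
F, Phi, K = mpc_gain(A, B, C, Hp, Hc, lam)

r = -11.0                                    # dew-point set-point [degC]
x = np.zeros((A.shape[0], 1))
u_prev, log_e, log_u = 0.0, [], []
for t in range(2500):                        # number of samples, cf. Figure 4 (Ts = 32 s)
    y = (C @ x).item()
    log_e.append(r - y)                      # tracking error e(t) = r(t) - y(t)
    du = mpc_step(K, F, x, r, Hp)            # first element of dU, cf. (19)
    u = u_prev + du                          # receding-horizon update (20)
    log_u.append(u)
    x = A @ x + B * du                       # propagate the NMSS model (8)
    u_prev = u
```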


5.2. SIMULATION RESULTS

The optimised PI gains obtained with the proposed optimisation procedure are

Kp = −3.03 (−4.69), TI = 23.52 (27.89) min,   (29)

where the values given in brackets are those from [8]. Both pairs of PI gains (29) were subsequently implemented at ADC UK, yielding similar responses. This is attributed to the insensitivity of the plant to the precise values of the PI gains, caused by, e.g., gas valve stiction and limited sensor resolution. The graphical results of the optimisation procedure are shown in Figure 4, where u_MPC(t) obtained from the MPC and u_PI(t) obtained via the optimisation procedure are compared. It is evident that the PI controller cannot achieve the same performance as the MPC.

6. CONCLUSIONS

The PI controller which maintains the humidity level within the production area has been optimised. The optimisation procedure is based on a control signal matching method, where the PI gains are chosen such that the control signal from the PI controller matches the control signal from the model predictive controller as closely as possible. Since no measurement noise is present during such an optimisation procedure, a non-minimal state space model based controller has been chosen, whose states are current and past measurements of the system output and past measurements of the system input. The optimised PI gains were compared to those obtained by a cost function minimisation technique based on the identified state dependent model of the dehumidification process. Subsequently, the two pairs of PI gains were applied at Abbott Diabetes Care, yielding indistinguishable plant behaviour. It is anticipated that this is caused by the insensitivity of the plant to the exact values of the PI gains.

Fig. 4. Comparison of the control action computed utilising the unconstrained MPC, denoted u_MPC, and the estimated control action computed by the PI, denoted u_PI, where the PI gains are optimised based on the u_MPC signal (Ts = 32 s). The upper panel shows u1(t) [%] and the lower panel shows the disturbance h(t) [°C], both plotted against samples.

REFERENCES

[1] HILL D., DANNE T., BURNHAM K. J., Modelling and control optimisation of desiccant rotor dehumidification plant within the heating ventilation and air conditioning systems of a medical device manufacturer. Proceedings of the International Conference on Systems Engineering ICSE, 2009, pp. 207-212.
[2] JOHNSON M. A., MORADI M. H., PID control: New identification and design methods. Springer-Verlag, London, 2005.

[3] LJUNG L., System identification - Theory for the user. Prentice Hall PTR, New Jersey, 1999.
[4] LEVERMORE G., Building control systems - CIBSE Guide H. Oxford, UK, 2000.
[5] WANG L., Model predictive control system design and implementation using MATLAB. Springer, 2009.
[6] WANG L., YOUNG P. C., An improved structure for model predictive control using non-minimal state space realisation. Journal of Process Control, vol. 16, 2006, pp. 355-371.
[7] YOUNG P. C., MCKENNA P., BRUUN J., Identification of non-linear stochastic systems by state dependent parameter estimation. Int. J. Control, vol. 74(18), 2001, pp. 1837-1857.
[8] ZAJIC I., LARKOWSKI T., HILL D., BURNHAM K. J., Nonlinear compensator design for HVAC systems: PI control strategy. Proceedings of the International Conference on Systems Engineering ICSE, 2009, pp. 580-584.

