Jamris 2014 Vol 8 No 2


pISSN 1897-8649 (PRINT) / eISSN 2080-2145 (ONLINE)

VOLUME  8

N° 2

2014

www.jamris.org


Journal of Automation, Mobile Robotics & Intelligent Systems

Editor-in-Chief Janusz Kacprzyk (Systems Research Institute, Polish Academy of Sciences; PIAP, Poland)

Associate Editors: Jacek Salach (Warsaw University of Technology, Poland); Maciej Trojnacki (Warsaw University of Technology and PIAP, Poland)

Co-Editors: Oscar Castillo (Tijuana Institute of Technology, Mexico); Dimitar Filev (Research & Advanced Engineering, Ford Motor Company, USA); Kaoru Hirota (Interdisciplinary Graduate School of Science and Engineering, Tokyo Institute of Technology, Japan); Witold Pedrycz (ECERF, University of Alberta, Canada); Roman Szewczyk (PIAP, Warsaw University of Technology, Poland)

Statistical Editor: Małgorzata Kaliczynska (PIAP, Poland)

Executive Editor: Anna Ladan aladan@piap.pl

The reference version of the journal is e-version.

Editorial Board:

Chairman: Janusz Kacprzyk (Polish Academy of Sciences; PIAP, Poland)

Mariusz Andrzejczak (BUMAR, Poland)
Plamen Angelov (Lancaster University, UK)
Zenn Bien (Korea Advanced Institute of Science and Technology, Korea)
Adam Borkowski (Polish Academy of Sciences, Poland)
Wolfgang Borutzky (Fachhochschule Bonn-Rhein-Sieg, Germany)
Chin Chen Chang (Feng Chia University, Taiwan)
Jorge Manuel Miranda Dias (University of Coimbra, Portugal)
Bogdan Gabrys (Bournemouth University, UK)
Jan Jablkowski (PIAP, Poland)
Stanisław Kaczanowski (PIAP, Poland)
Tadeusz Kaczorek (Warsaw University of Technology, Poland)
Marian P. Kazmierkowski (Warsaw University of Technology, Poland)
Józef Korbicz (University of Zielona Góra, Poland)
Krzysztof Kozłowski (Poznan University of Technology, Poland)
Eckart Kramer (Fachhochschule Eberswalde, Germany)
Piotr Kulczycki (Cracow University of Technology, Poland)
Andrew Kusiak (University of Iowa, USA)
Mark Last (Ben–Gurion University of the Negev, Israel)
Patricia Melin (Tijuana Institute of Technology, Mexico)
Tadeusz Missala (PIAP, Poland)
Fazel Naghdy (University of Wollongong, Australia)
Zbigniew Nahorski (Polish Academy of Science, Poland)
Antoni Niederlinski (Silesian University of Technology, Poland)
Witold Pedrycz (University of Alberta, Canada)
Duc Truong Pham (Cardiff University, UK)
Lech Polkowski (Polish-Japanese Institute of Information Technology)
Alain Pruski (University of Metz, France)
Leszek Rutkowski (Częstochowa University of Technology, Poland)
Klaus Schilling (Julius-Maximilians-University Würzburg, Germany)
Ryszard Tadeusiewicz (AGH Univ. of Science and Technology in Cracow, Poland)
Stanisław Tarasiewicz (University of Laval, Canada)
Piotr Tatjewski (Warsaw University of Technology, Poland)
Władysław Torbicz (Polish Academy of Sciences, Poland)
Leszek Trybus (Rzeszów University of Technology, Poland)
René Wamkeue (University of Québec, Canada)
Janusz Zalewski (Florida Gulf Coast University, USA)
Marek Zaremba (University of Québec, Canada)
Teresa Zielinska (Warsaw University of Technology, Poland)

Editorial Office: Industrial Research Institute for Automation and Measurements PIAP, Al. Jerozolimskie 202, 02-486 Warsaw, POLAND, Tel. +48-22-8740109, office@jamris.org

Copyright and reprint permissions: Executive Editor

Publisher: Industrial Research Institute for Automation and Measurements PIAP

If in doubt about the proper edition of contributions, please contact the Executive Editor. Articles are reviewed, excluding advertisements and descriptions of products. All rights reserved ©


JOURNAL of AUTOMATION, MOBILE ROBOTICS & INTELLIGENT SYSTEMS VOLUME 8, N° 2, 2014 DOI: 10.14313/JAMRIS_2-2014

CONTENTS

Computer aided methods for stability analysis of 2D linear systems described by the first Fornasini-Marchesini model
Mikolaj Buslowicz, Andrzej Ruszewski
DOI 10.14313/JAMRIS_2-2014/12

An adequate mathematical model of four-rotor flying robot in the context of control simulations
Stanislaw Gardecki, Wojciech Giernacki, Jaroslaw Goslinski, Andrzej Kasinski
DOI 10.14313/JAMRIS_2-2014/13

Mathematical modeling and computer aided planning of communal sewage networks
Lucyna Bogdan, Grazyna Petriczek, Jan Studzinski
DOI 10.14313/JAMRIS_2-2014/14

Failures location within water-supply systems by means of neural networks
Izabela Rojek, Jan Studzinski
DOI 10.14313/JAMRIS_2-2014/15

Outside the box: an alternative data analytics framework
Plamen Angelov
DOI 10.14313/JAMRIS_2-2014/16

Arm Manipulator Position Control Based On Multi-Input Multi-Output PID Strategy
Fatima Zahra Baghli, Larbi El Bakkali, Yassine Lakhal, Abdelfatah Nasri, Brahim Gasbaoui
DOI 10.14313/JAMRIS_2-2014/17

A new heuristic possibilistic clustering algorithm for feature selection
Janusz Kacprzyk, Jan W. Owsinski, Dimitri A. Viattchenin
DOI 10.14313/JAMRIS_2-2014/18

Extracting fuzzy classifications rules from three-way data
Janusz Kacprzyk, Jan W. Owsinski, Dimitri A. Viattchenin
DOI 10.14313/JAMRIS_2-2014/19

A simple and efficient implementation of EKF-based SLAM relying on laser scanner in complex indoor environment
Thomas Genevois, Teresa Zielinska
DOI 10.14313/JAMRIS_2-2014/20


Computer Aided Methods for Stability Analysis of 2D Linear Systems Described by the First Fornasini-Marchesini Model

Submitted: 2nd July 2013; accepted: 31st July 2013

Andrzej Ruszewski, Mikołaj Busłowicz

DOI: 10.14313/JAMRIS_2-2014/12

Abstract: Computer aided methods for investigation of the asymptotic stability of 2D discrete linear systems described by the first Fornasini-Marchesini model are given. The methods require computation of eigenvalues of complex matrices or values of complex functions. Effectiveness of the stability tests is demonstrated on numerical examples.

Keywords: 2D system, linear, discrete, stability, computational method

1. Introduction

There are several models of 2D discrete linear systems [9, 11, 12]. The most popular is the Fornasini-Marchesini model introduced in [9]. The problem of asymptotic stability of linear 2D systems has received considerable attention for about 40 years. For the stability analysis of these systems various methods can be applied: analytical (similar to the Schur stability test of 1D systems) [1], based on Lyapunov stability theory [21, 22], based on LMI [8, 13, 23, 24], based on spectral radius [10, 17, 20, 25, 26], frequency domain methods [23] or algebraic methods for positive systems [12, 13, 14, 15, 16, 19]. The analytical methods require symbolic computations, whereas the methods based on Lyapunov stability theory, LMI or spectral radius give only sufficient but not necessary conditions for stability of standard systems.

The main purpose of this paper is to present new frequency domain necessary and sufficient conditions for investigation of asymptotic stability of the first Fornasini-Marchesini model of 2D standard linear systems.

The following notation will be used: $Z_+$ - the set of non-negative integers; $\Re^{n \times m}$ - the set of $n \times m$ real matrices; $\lambda_i\{X\}$ - the $i$-th eigenvalue of $X$.

2. Problem Formulation

Consider the state equation of the first Fornasini-Marchesini model of 2D linear system [9, 11, 12]

$$x(i+1, j+1) = A_0 x(i,j) + A_1 x(i+1,j) + A_2 x(i,j+1) + B u(i,j), \quad i, j \in Z_+, \tag{1}$$

where $x(i,j) \in \Re^{n}$, $u(i,j) \in \Re^{m}$ and $A_0, A_1, A_2 \in \Re^{n \times n}$, $B \in \Re^{n \times m}$. The boundary conditions for (1) are as follows

$$x(i,0) = x_{i0}, \quad x(0,j) = x_{0j}, \quad i, j \in Z_+. \tag{2}$$

The characteristic matrix of the model (1) has the form

$$H(z_1, z_2) = z_1 z_2 I - A_0 - z_1 A_1 - z_2 A_2, \tag{3}$$

where $z_1$ and $z_2$ are complex variables. The characteristic function

$$w(z_1, z_2) = \det H(z_1, z_2) = \det[z_1 z_2 I - A_0 - z_1 A_1 - z_2 A_2] \tag{4}$$

of the model (1) is a polynomial in two independent variables $z_1$ and $z_2$, of the form

$$w(z_1, z_2) = \sum_{i=0}^{n} \sum_{k=0}^{n} a_{ik} z_1^{i} z_2^{k}, \quad a_{nn} = 1. \tag{5}$$

The model (1) is called asymptotically stable (Schur stable) if for $u(i,j) \equiv 0$ and bounded boundary conditions (2) the condition $x(i,j) \to 0$ holds for $i, j \to \infty$. From [1, 11] we have the following theorem.

Theorem 1. The model (1) is asymptotically stable if and only if

$$w(z_1, z_2) \neq 0, \quad \forall |z_1| \geq 1 \ \text{and} \ \forall |z_2| \geq 1. \tag{6}$$

The polynomial (5) satisfying the condition (6) is called discrete stable or Schur stable. Several algebraic methods for checking asymptotic stability of such bivariate polynomials were given in [1]. A computational method for investigation of asymptotic stability of the Fornasini-Marchesini model (1) has been given in [2]. This method requires computation of eigenvalue-loci of complex matrices.

The main purpose of this paper is to present new computational methods for checking the condition (6) of asymptotic stability of the model (1) which do not require a priori knowledge of the characteristic bivariate polynomial (5).

3. Solution of the Problem

Theorem 2. The model (1) is asymptotically stable if and only if the following two conditions hold:

$$w(e^{jy}, z_2) \neq 0, \quad |z_2| \geq 1, \quad \forall y \in [0, 2\pi], \quad j^2 = -1, \tag{7}$$

$$w(z_1, e^{j\omega}) \neq 0, \quad |z_1| \geq 1, \quad \forall \omega \in [0, 2\pi]. \tag{8}$$

Proof. From [1, 2] it follows that (6) is equivalent to the conditions

$$w(z_1, z_2) \neq 0, \quad |z_1| = 1, \quad |z_2| \geq 1, \tag{9}$$

$$w(z_1, z_2) \neq 0, \quad |z_1| \geq 1, \quad |z_2| = 1. \tag{10}$$

It is easy to see that conditions (9) and (10) can be written in the forms (7) and (8), respectively.

Lemma 1. If the model (1) is asymptotically stable then

$$|\lambda_i(A_1)| < 1, \quad i = 1, 2, ..., n, \tag{11}$$

and

$$|\lambda_i(A_2)| < 1, \quad i = 1, 2, ..., n. \tag{12}$$

Proof. From (1) for $A_0 \equiv 0$, $A_2 \equiv 0$ and $B \equiv 0$ one obtains the homogeneous state equation of the discrete-time linear system

$$x(i+1, j+1) = A_1 x(i+1, j). \tag{13}$$

The system (13) is asymptotically stable if and only if the condition (11) holds, i.e. the matrix $A_1$ is Schur stable (is a Schur matrix). Similarly, substitution of $A_0 \equiv 0$, $A_1 \equiv 0$ and $B \equiv 0$ in (1) gives the homogeneous state equation of the discrete-time linear system

$$x(i+1, j+1) = A_2 x(i, j+1), \tag{14}$$

which is asymptotically stable if and only if the condition (12) holds, i.e. the matrix $A_2$ is Schur stable (is a Schur matrix).

Asymptotic stability of the model (1) with any fixed triple of matrices $A_0$, $A_1$ and $A_2$ means that the condition (6) holds for this triple. In particular, asymptotic stability of the system with $A_0 \equiv 0$ and $A_2 \equiv 0$ (or $A_0 \equiv 0$ and $A_1 \equiv 0$) is equivalent to satisfaction of the condition (6) for $A_0 \equiv 0$ and $A_2 \equiv 0$ (or $A_0 \equiv 0$ and $A_1 \equiv 0$). Hence, the conditions (11) and (12) are necessary for asymptotic stability of the model (1).

To show that the conditions (11) and (12) are not sufficient, we consider the scalar system (1) with $A_1 = 0$, $A_2 = 0.5$ ((11) and (12) hold) and $A_0 = 1$. In this case the characteristic equation has the form $z_1 z_2 - 0.5 z_2 - 1 = 0$. From this equation we have that if, for example, $z_1 = 0$ then $z_2 = -2$ and if $z_2 = 0.5$ then $z_1 = 2.5$. Moreover, $z_1 = 1$ gives $z_2 = 2$, i.e. a root with $|z_1| \geq 1$ and $|z_2| \geq 1$. This means that there exist roots of the characteristic equation which do not satisfy the condition (6) and the system is unstable.

Using the rules for computing the determinant of block matrices, we obtain that the characteristic matrix (3) of the model (1) can be computed from one of the following equivalent formulae

$$H(z_1, z_2) = [z_1 I - A_2][z_2 I - S_1(z_1)], \tag{15}$$

$$H(z_1, z_2) = [z_2 I - A_1][z_1 I - S_2(z_2)], \tag{16}$$

where

$$S_1(z_1) = (z_1 I - A_2)^{-1}(A_0 + z_1 A_1), \tag{17}$$

$$S_2(z_2) = (z_2 I - A_1)^{-1}(A_0 + z_2 A_2). \tag{18}$$

Using (4) and (15), (16) we can write

$$w(z_1, z_2) = \det[z_1 I - A_2] \det[z_2 I - S_1(z_1)], \tag{19}$$

$$w(z_1, z_2) = \det[z_2 I - A_1] \det[z_1 I - S_2(z_2)]. \tag{20}$$

From (15) for $z_1 = e^{jy}$ we have

$$H(e^{jy}, z_2) = [e^{jy} I - A_2][z_2 I - S_1(e^{jy})], \tag{21}$$

where

$$S_1(e^{jy}) = (e^{jy} I - A_2)^{-1}(A_0 + e^{jy} A_1). \tag{22}$$
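The factorisations (15) and (16) can be spot-checked numerically. The following is a minimal sketch, assuming NumPy is available; the function names (`char_matrix`, `S1`, `S2`), the random test matrices and the test point $(z_1, z_2)$ are illustrative choices and are not taken from the paper. The last lines evaluate the scalar characteristic function with $A_0 = 1$, $A_1 = 0$, $A_2 = 0.5$ at $(z_1, z_2) = (1, 2)$, where it vanishes.

```python
import numpy as np

def char_matrix(A0, A1, A2, z1, z2):
    """Characteristic matrix (3): H = z1*z2*I - A0 - z1*A1 - z2*A2."""
    n = A0.shape[0]
    return z1 * z2 * np.eye(n) - A0 - z1 * A1 - z2 * A2

def S1(A0, A1, A2, z1):
    """Matrix (17): (z1*I - A2)^{-1} (A0 + z1*A1)."""
    n = A0.shape[0]
    return np.linalg.solve(z1 * np.eye(n) - A2, A0 + z1 * A1)

def S2(A0, A1, A2, z2):
    """Matrix (18): (z2*I - A1)^{-1} (A0 + z2*A2)."""
    n = A0.shape[0]
    return np.linalg.solve(z2 * np.eye(n) - A1, A0 + z2 * A2)

# Hypothetical test data: three random 3x3 matrices and one test point.
rng = np.random.default_rng(0)
n = 3
A0, A1, A2 = (0.3 * rng.standard_normal((n, n)) for _ in range(3))
z1, z2 = np.exp(0.7j), 1.3 * np.exp(-0.2j)
I = np.eye(n)

H = char_matrix(A0, A1, A2, z1, z2)
H15 = (z1 * I - A2) @ (z2 * I - S1(A0, A1, A2, z1))  # right-hand side of (15)
H16 = (z2 * I - A1) @ (z1 * I - S2(A0, A1, A2, z2))  # right-hand side of (16)

# Scalar case a0 = 1, a1 = 0, a2 = 0.5: w(1, 2) = 1*2 - 1 - 0 - 0.5*2 = 0.
one = lambda a: np.array([[a]])
w_scalar = char_matrix(one(1.0), one(0.0), one(0.5), 1.0, 2.0)[0, 0]
```

Both products should agree with `H` up to rounding error, and `w_scalar` should be zero, which confirms that the scalar system has a root of the characteristic equation with $|z_1| \geq 1$ and $|z_2| \geq 1$.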

Lemma 2. Let the necessary condition (12) be satisfied. The condition (7) holds if and only if all eigenvalues of the complex matrix (22) have absolute values less than one for all $y \in Y = [0, 2\pi]$.

Proof. From (21) we have

$$w(e^{jy}, z_2) = \det[e^{jy} I - A_2] \det[z_2 I - S_1(e^{jy})]. \tag{23}$$

If (12) holds then the matrix $e^{jy} I - A_2$ is nonsingular for all $y \in Y$. Hence, from (23) it follows that the condition (7) is satisfied if and only if

$$\det[z_2 I - S_1(e^{jy})] \neq 0, \quad |z_2| \geq 1, \quad \forall y \in Y. \tag{24}$$

Satisfaction of (24) means that all eigenvalues of the complex matrix (22) have absolute values less than one for all $y \in Y$.

Eigenvalue-loci of $S_1(e^{jy})$ for $y \in [0, \pi]$ and for $y \in [\pi, 2\pi]$ are symmetric with respect to the real axis of the complex plane. Therefore, we can equivalently consider in (24) the interval $Y = [0, \pi]$ instead of the interval $Y = [0, 2\pi]$.

From (16) for $z_2 = e^{j\omega}$ we have

$$H(z_1, e^{j\omega}) = [e^{j\omega} I - A_1][z_1 I - S_2(e^{j\omega})] \tag{25}$$

and

$$w(z_1, e^{j\omega}) = \det[e^{j\omega} I - A_1] \det[z_1 I - S_2(e^{j\omega})], \tag{26}$$

where

$$S_2(e^{j\omega}) = (e^{j\omega} I - A_1)^{-1}(A_0 + e^{j\omega} A_2). \tag{27}$$

Lemma 3. Let the necessary condition (11) be satisfied. The condition (8) holds if and only if all eigenvalues of the complex matrix (27) have absolute values less than one for all $\omega \in \Omega = [0, 2\pi]$.

Proof. If (11) holds then the matrix $e^{j\omega} I - A_1$ is nonsingular for all $\omega \in \Omega$. From (26) we have that the condition (8) is satisfied if and only if

$$\det[z_1 I - S_2(e^{j\omega})] \neq 0, \quad |z_1| \geq 1, \quad \forall \omega \in \Omega, \tag{28}$$

i.e. all eigenvalues of the matrix (27) have absolute values less than one for all $\omega \in \Omega$.

Similarly as in Lemma 2, we can equivalently consider the interval $\Omega = [0, \pi]$ instead of the interval $\Omega = [0, 2\pi]$. The conditions of Lemmas 2 and 3 can be written in the following forms

$$|\lambda_i\{S_1(e^{jy})\}| < 1, \quad \forall y \in Y, \quad i = 1, 2, ..., n \tag{29}$$

and

$$|\lambda_i\{S_2(e^{j\omega})\}| < 1, \quad \forall \omega \in \Omega, \quad i = 1, 2, ..., n, \tag{30}$$

respectively.

Theorem 3. The model (1) is asymptotically stable if and only if the conditions (11), (12), (29) and (30) are satisfied.

Proof. The proof follows directly from Theorem 2 and Lemmas 1, 2 and 3.

Computational methods for checking the conditions (29) and (30) for the Fornasini-Marchesini model (1), based on the eigenvalue-loci of the matrices (22) and (27), are given in [2]. It is easy to see that the conditions (29) and (30) can be written in the forms: $\eta(y) > 0$ for all $y \in Y$ and $\mu(\omega) > 0$ for all $\omega \in \Omega$, where

$$\eta(y) = 1 - \max_{i=1,...,n} |\lambda_i\{S_1(e^{jy})\}|, \tag{31}$$

$$\mu(\omega) = 1 - \max_{i=1,...,n} |\lambda_i\{S_2(e^{j\omega})\}|. \tag{32}$$
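The functions (31) and (32) are straightforward to evaluate on a grid. The sketch below is one possible implementation, assuming NumPy; the helper names (`eta`, `mu`, `margins`, `as_1x1`) and the two scalar test systems are illustrative assumptions rather than material from the paper. The first system ($a_0 = 0.1$, $a_1 = 0.2$, $a_2 = 0.3$) is a hypothetical example; the second ($a_0 = 1$, $a_1 = 0$, $a_2 = 0.5$) satisfies (11) and (12) but is unstable.

```python
import numpy as np

def eta(A0, A1, A2, y):
    """Function (31): 1 - max_i |lambda_i{S1(e^{jy})}|, with S1 as in (22)."""
    z = np.exp(1j * y)
    n = A0.shape[0]
    S1 = np.linalg.solve(z * np.eye(n) - A2, A0 + z * A1)
    return 1.0 - np.abs(np.linalg.eigvals(S1)).max()

def mu(A0, A1, A2, omega):
    """Function (32): 1 - max_i |lambda_i{S2(e^{j*omega})}|, with S2 as in (27)."""
    z = np.exp(1j * omega)
    n = A0.shape[0]
    S2 = np.linalg.solve(z * np.eye(n) - A1, A0 + z * A2)
    return 1.0 - np.abs(np.linalg.eigvals(S2)).max()

def margins(A0, A1, A2, num=201):
    """Minima of eta and mu over a uniform grid on [0, 2*pi] (step 0.01*pi by default)."""
    grid = np.linspace(0.0, 2.0 * np.pi, num)
    eta_min = min(eta(A0, A1, A2, y) for y in grid)
    mu_min = min(mu(A0, A1, A2, w) for w in grid)
    return eta_min, mu_min

def as_1x1(a):
    """Wrap a scalar coefficient as a 1x1 matrix."""
    return np.array([[float(a)]])

# Hypothetical stable scalar system and the unstable scalar counterexample.
stable = margins(as_1x1(0.1), as_1x1(0.2), as_1x1(0.3))
unstable = margins(as_1x1(1.0), as_1x1(0.0), as_1x1(0.5))
```

For the first system both minima are positive, so (29) and (30) hold on the grid; for the second both are negative. A grid check of this kind can only suggest stability, since it samples $Y$ and $\Omega$ at finitely many points.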

Hence, from Theorem 3 one obtains the following lemma.

Lemma 4. Let the necessary conditions (11), (12) hold. The model (1) is asymptotically stable if and only if $\eta(y) > 0$ for all $y \in Y$ and $\mu(\omega) > 0$ for all $\omega \in \Omega$ or, equivalently, the conditions

$$\eta_{\min} = \min_{y \in Y} \eta(y) > 0, \quad \mu_{\min} = \min_{\omega \in \Omega} \mu(\omega) > 0, \tag{33}$$

are satisfied.

Example 1. Consider the model (1) with the matrices

$$A_0 = \begin{bmatrix} -0.3 & 0.1 & -0.4 \\ 0.4 & -0.1 & 0 \\ 0 & 0.3 & -0.2 \end{bmatrix}, \quad A_1 = \begin{bmatrix} 0.1 & -0.2 & 0 \\ 0 & 0.4 & 0.3 \\ 0.1 & 0.3 & 0.1 \end{bmatrix}, \quad A_2 = \begin{bmatrix} 0.3 & 0.1 & -0.2 \\ 0 & 0.2 & 0.1 \\ -0.3 & -0.2 & 0.4 \end{bmatrix}. \tag{34}$$

Computing eigenvalues of $A_1$ and $A_2$ we obtain
- eigenvalues of $A_1$: -0.1233; 0.1577; 0.5656,
- eigenvalues of $A_2$: 0.1166; 0.2343; 0.5491.

Hence, the necessary conditions (11) and (12) hold, i.e. the matrices $A_1$ and $A_2$ are Schur stable. Plots of the functions $\eta(y)$ ($y \in Y$) and $\mu(\omega)$ ($\omega \in \Omega$) are shown in Figure 1. The endpoints of the plots are denoted by 'o'. The ranges $Y = [0, 2\pi]$ and $\Omega = [0, 2\pi]$ were discretized with the steps $\Delta y = 0.01\pi$ and $\Delta\omega = 0.01\pi$. From Figure 1 and also from the fact that $\eta_{\min} = 0.3012 > 0$ and $\mu_{\min} = 0.2737 > 0$ it follows that the conditions of Lemma 4 are satisfied and the model is asymptotically stable.

Fig. 1. Plots of the functions (31) (curve 1) and (32) (curve 2) for $y = \omega \in [0, 2\pi]$ (horizontal axis: $y$, $\omega$; vertical axis: $\eta$, $\mu$)

Checking the conditions of Theorem 3 and Lemma 4 requires computation of eigenvalues of the matrices (22) and (27). This may be inconvenient because of computational problems, particularly in the case of ill-conditioned matrices. Therefore, we present a new method for investigation of asymptotic stability of the model (1) which requires computation only of determinants of some matrices, not eigenvalues.

Consider the polynomial

$$w_1(e^{jy}, z_2) = \det(z_2 I - S_1(e^{jy})), \tag{35}$$

where the matrix $S_1(e^{jy})$ is defined by (22). From the classical Mikhailov theorem (see for example [18]) it follows that the condition (24) holds if and only if for any fixed $y \in Y$ the plot of $w_1(e^{jy}, e^{j\omega})$ for $\omega \in \Omega$ encircles the origin of the complex plane $n$ times in the positive direction. Direct application of the Mikhailov theorem to checking the condition (24) is not practically reliable for large values of $n$. Therefore, we introduce the rational function

$$\varphi_1(e^{jy}, e^{j\omega}) = \frac{w_1(e^{jy}, e^{j\omega})}{w_{r1}(e^{j\omega})}, \quad y \in Y, \tag{36}$$

instead of $w_1(e^{jy}, e^{j\omega})$, where $w_{r1}(z_2)$ is any Schur stable polynomial of degree $n$.

Lemma 5. The condition (29) holds if and only if for all fixed $\omega \in \Omega$ the plot of the function (36) does not encircle or cross the origin of the complex plane.

Proof. If the reference polynomial $w_{r1}(z_2)$ is Schur stable then from the Argument Principle we have

$$\Delta \arg_{\omega \in \Omega} w_{r1}(e^{j\omega}) = 2\pi n. \tag{37}$$

From (36) it follows that for any fixed $y \in Y$

$$\Delta \arg_{\omega \in \Omega} \varphi_1(e^{jy}, e^{j\omega}) = \Delta \arg_{\omega \in \Omega} w_1(e^{jy}, e^{j\omega}) - \Delta \arg_{\omega \in \Omega} w_{r1}(e^{j\omega}). \tag{38}$$

The matrix (22) for any fixed $y \in Y$ is Schur stable if and only if

$$\Delta \arg_{\omega \in [0, 2\pi]} w_1(e^{jy}, e^{j\omega}) = \Delta \arg_{\omega \in [0, 2\pi]} w_{r1}(e^{j\omega}) = 2\pi n,$$


which holds if and only if $\Delta \arg_{\omega \in \Omega} \varphi_1(e^{jy}, e^{j\omega}) = 0$. Taking into account all $y \in Y$, we obtain that the above holds if and only if for all fixed $\omega \in \Omega$ the plot of (36) as a function of $y \in Y$ does not encircle or cross the origin of the complex plane.

The reference polynomial $w_{r1}(z_2)$ can be chosen in the form

$$w_1(1, z_2) = \det(z_2 I - S_1(1)), \tag{39}$$

where $S_1(1) = (I - A_2)^{-1}(A_0 + A_1)$, which we get from (35) and (22) substituting $y = 0$. Schur stability of (39) is necessary for Schur stability of the complex polynomial (35) for all $y \in Y$. If $w_{r1}(z_2) = w_1(1, z_2)$, then

$$\varphi_1(e^{jy}, e^{j\omega}) = \frac{w_1(e^{jy}, e^{j\omega})}{w_1(1, e^{j\omega})}, \quad y \in Y. \tag{40}$$

The plot of (40) as a function of $y \in Y$ (with any fixed $\omega \in \Omega$) is a closed curve. It begins with $y = 0$ and ends with $y = 2\pi$ in the point $\varphi_1(1, e^{j\omega}) = 1$.

Now, we consider the complex polynomial

$$w_2(z_1, e^{j\omega}) = \det(z_1 I - S_2(e^{j\omega})), \tag{41}$$

where the matrix $S_2(e^{j\omega})$ is defined by (27). Let $w_{r2}(z_1)$ be any Schur stable polynomial of degree $n$. Proceeding similarly as in the case of Lemma 5, we obtain the following lemma.

Lemma 6. The condition (30) holds if and only if for all fixed $y \in Y$ the plot of the function

$$\varphi_2(e^{jy}, e^{j\omega}) = \frac{w_2(e^{jy}, e^{j\omega})}{w_{r2}(e^{jy})}, \quad \omega \in \Omega, \tag{42}$$

does not encircle or cross the origin of the complex plane, where $w_2(e^{jy}, e^{j\omega})$ has the form (41) for $z_1 = e^{jy}$.

The reference polynomial $w_{r2}(z_1)$ can be chosen in the form

$$w_2(z_1, 1) = \det(z_1 I - S_2(1)), \tag{43}$$

where $S_2(1) = (I - A_1)^{-1}(A_0 + A_2)$. Schur stability of (43) is necessary for Schur stability of the complex polynomial (41) for all $\omega \in \Omega$. If $w_{r2}(z_1) = w_2(z_1, 1)$, then

$$\varphi_2(e^{jy}, e^{j\omega}) = \frac{w_2(e^{jy}, e^{j\omega})}{w_2(e^{jy}, 1)}, \quad \omega \in \Omega. \tag{44}$$

The plot of (44) as a function of $\omega \in \Omega$ with fixed $y \in Y$ is a closed curve. It begins with $\omega = 0$ and ends with $\omega = 2\pi$ in the point $\varphi_2(e^{jy}, 1) = 1$.

From Theorem 3 and Lemmas 5 and 6 we have the following theorem.

Theorem 4. Assume that the necessary conditions (11) and (12) are satisfied and the polynomials (39) and (43) are Schur stable. The model (1) is asymptotically stable if and only if the following two conditions hold:

1) plots of the function (40) do not encircle or cross the origin of the complex plane for all fixed $\omega \in \Omega$,

2) plots of the function (44) do not encircle or cross the origin of the complex plane for all fixed $y \in Y$.

When applying the computational method given in Theorem 4, the following remark should be taken into consideration.

Remark. For point 1) of Theorem 4, one should fix $\omega \in \Omega$ on a grid with an appropriately small step $\Delta\omega$ and, for each fixed value, draw the plot of the function (40), discretizing the range $Y$ with a sufficiently small step $\Delta y$. For point 2) of Theorem 4 one should fix $y \in Y$ on a grid with an appropriately small step $\Delta y$ and, for each fixed value, draw the plot of the function (44), discretizing the range $\Omega$ with a sufficiently small step $\Delta\omega$. Plots should be smooth, especially near the origin of the complex plane, so that no important part is missed.

Example 2. Using Theorem 4 check asymptotic stability of the model (1) with the matrices (34).

In Example 1 it has been shown that the necessary conditions (11) and (12) hold. Computing eigenvalues of the matrices $S_1(1) = (I - A_2)^{-1}(A_0 + A_1)$ and $S_2(1) = (I - A_1)^{-1}(A_0 + A_2)$ we obtain respectively:

$$\lambda_1 = 0.4201 + j0.2872, \quad \lambda_2 = 0.4201 - j0.2872, \quad \lambda_3 = -0.6204, \tag{45}$$

$$\lambda_1 = 0.4762 + j0.2152, \quad \lambda_2 = 0.4762 - j0.2152, \quad \lambda_3 = -0.5703. \tag{46}$$

Moduli of all eigenvalues (45) and (46) are less than one and the reference polynomials (39) and (43) are Schur stable. Plots of (40) and (44) are shown in Figures 2 and 3, respectively. The ranges $\Omega = [0, 2\pi]$ and $Y = [0, 2\pi]$ for all plots were discretized with the steps $\Delta y = 0.01\pi$ and $\Delta\omega = 0.01\pi$.

Fig. 2. Plots of (40) (axes: Real Axis, Imaginary Axis)


Fig. 3. Plots of (44) (axes: Real Axis, Imaginary Axis)

From Figures 2 and 3 it follows that the plots do not encircle the origin of the complex plane for all $y \in Y$ and $\omega \in \Omega$. This means, according to Theorem 4, that the model (1), (34) is Schur stable.

Now we consider the 1st order Fornasini-Marchesini model described by the equation

$$x(i+1, j+1) = a_0 x(i,j) + a_1 x(i+1,j) + a_2 x(i,j+1) + b u(i,j), \tag{47}$$

where $a_0$, $a_1$, $a_2$ and $b$ are real coefficients. For the system (47) the necessary conditions (11) and (12) take the forms

$$|a_1| < 1, \quad |a_2| < 1. \tag{48}$$

The matrix (22) for the system has the form

$$S_1(e^{jy}) = \frac{a_0 + e^{jy} a_1}{e^{jy} - a_2}. \tag{49}$$

It is easy to check that the plot of (49) for $y \in Y = [0, 2\pi]$ is a circle with the center on the real axis. This circle crosses the real axis in the points

$$S_{11} = S_1(e^{j0}) = \frac{a_0 + a_1}{1 - a_2}, \quad S_{12} = S_1(e^{j\pi}) = \frac{a_1 - a_0}{1 + a_2}.$$

Hence, the first condition (33) holds if and only if

$$\eta_{\min} = 1 - \max\{|S_{11}|, |S_{12}|\} > 0. \tag{50}$$

Similarly, we can show that the second condition (33) holds if and only if

$$\mu_{\min} = 1 - \max\{|S_{21}|, |S_{22}|\} > 0, \tag{51}$$

where

$$S_{21} = S_2(e^{j0}) = \frac{a_0 + a_2}{1 - a_1}, \quad S_{22} = S_2(e^{j\pi}) = \frac{a_2 - a_0}{1 + a_1}.$$

From the above and Theorem 3 we have the following condition.

Lemma 7. The 1st order Fornasini-Marchesini model (47) is asymptotically stable if and only if the conditions (48) and (50), (51) are satisfied.

4. Concluding Remarks

Simple necessary conditions (Lemma 1) and two computational methods for investigation of asymptotic stability of the first Fornasini-Marchesini model (1) of 2D discrete linear systems have been given.

The first method (Theorem 3, Lemma 4) requires computation of eigenvalues of the complex matrices (22) and (27). Similar methods have been applied in [7, 23] to asymptotic stability analysis of the Roesser model of 2D systems and in [3] to the Fornasini-Marchesini and the Roesser type models of 2D continuous-discrete linear systems.

The second method (Theorem 4) requires computation of values of the functions (40) and (44) and therefore is simpler from the computational point of view. Similar methods have been applied in [3], [4], [5] and [6], respectively, to asymptotic stability analysis of 2D continuous-discrete linear systems described by the first and the second Fornasini-Marchesini type models and the Roesser type model.

The proposed methods can be applied to stability checking of the second Fornasini-Marchesini model described by the state equation (1) with $A_0 \equiv 0$.

Acknowledgment

This work was supported by the Ministry of Science and Higher Education of Poland under the grant no. S/WE/1/2011.
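Both tests of the paper can be reproduced numerically along the following lines. This is a hedged sketch, assuming NumPy: the matrices are the entries of (34) as reconstructed here from the extracted text (they reproduce the eigenvalues quoted in Examples 1 and 2), and the helper names `S_family` and `winding_numbers` are illustrative, not from the paper. The determinant test counts encirclements of the origin by summing wrapped phase increments along each closed curve, which is one possible way to automate the visual inspection of Figures 2 and 3.

```python
import numpy as np

# Matrices (34), reconstructed from the extracted entries (assumption).
A0 = np.array([[-0.3, 0.1, -0.4], [0.4, -0.1, 0.0], [0.0, 0.3, -0.2]])
A1 = np.array([[0.1, -0.2, 0.0], [0.0, 0.4, 0.3], [0.1, 0.3, 0.1]])
A2 = np.array([[0.3, 0.1, -0.2], [0.0, 0.2, 0.1], [-0.3, -0.2, 0.4]])
I3 = np.eye(3)

grid = np.linspace(0.0, 2.0 * np.pi, 201)  # step 0.01*pi, endpoints included
z = np.exp(1j * grid)

def S_family(A_inv, A_free, zs):
    """Stack of (z*I - A_inv)^{-1} (A0 + z*A_free), one matrix per z in zs."""
    lhs = zs[:, None, None] * I3 - A_inv
    rhs = A0 + zs[:, None, None] * A_free
    return np.linalg.solve(lhs, rhs)

S1 = S_family(A2, A1, z)  # matrices (22), indexed by y
S2 = S_family(A1, A2, z)  # matrices (27), indexed by omega

# Eigenvalue test (Lemma 4): functions (31) and (32) on the grid.
eta = 1.0 - np.abs(np.linalg.eigvals(S1)).max(axis=1)
mu = 1.0 - np.abs(np.linalg.eigvals(S2)).max(axis=1)

def winding_numbers(S):
    """Encirclements of the origin by the closed curves (40) or (44).

    w[a, b] = det(z_b*I - S[a]); dividing by w[0] (argument 1 of S) gives the
    normalised function; the curve runs along the first axis for each fixed b.
    """
    w = np.linalg.det(z[None, :, None, None] * I3 - S[:, None, :, :])
    phi = w / w[0]
    steps = np.angle(phi[1:] / phi[:-1])  # wrapped phase increments
    return np.rint(steps.sum(axis=0) / (2.0 * np.pi)).astype(int)

wind_40 = winding_numbers(S1)  # curves (40): functions of y, one per omega
wind_44 = winding_numbers(S2)  # curves (44): functions of omega, one per y
```

With these matrices `eta.min()` and `mu.min()` should be positive (the paper reports 0.3012 and 0.2737 for the same grid step), and both `wind_40` and `wind_44` should contain only zeros. The winding count detects encirclement only; a curve passing through the origin would additionally show up as a value of `phi` close to zero.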

AUTHORS

Andrzej Ruszewski* – Bialystok University of Technology, Faculty of Electrical Engineering, Wiejska 45D, 15-351 Białystok, Poland, e-mail: andrusz@pb.edu.pl, www: pb.edu.pl.

Mikołaj Busłowicz – Bialystok University of Technology, Faculty of Electrical Engineering, Wiejska 45D, 15-351 Białystok, Poland, e-mail: busmiko@pb.edu.pl, www: pb.edu.pl.

* Corresponding author

REFERENCES

[1] Y. Bistritz, “On an inviable approach for derivation of 2-D stability tests”, IEEE Trans. Circuit Syst. II, vol. 52, no. 11, 2005, pp. 713–718. DOI: http://dx.doi.org/10.1109/TCSII.2005.852929
[2] M. Buslowicz, “Computer methods for stability investigation of the Fornasini-Marchesini model of linear 2D systems”, Measurement Automation and Robotics, no. 2, 2011, pp. 556–565 (in CD-ROM) (in Polish).
[3] M. Buslowicz, “Computational methods for investigation of stability of models of 2D continuous-discrete linear systems”, Journal of Automation, Mobile Robotics & Intelligent Systems, vol. 5, no. 1, 2011, pp. 3–7.
[4] M. Buslowicz, “Stability of the second Fornasini-Marchesini type model of continuous-discrete linear systems”, Acta Mechanica et Automatica, vol. 5, no. 4, 2011, pp. 1–5.
[5] M. Buslowicz, A. Ruszewski, “Stability investigation of continuous-discrete linear systems”, Measurement Automation and Robotics, no. 2, 2011, pp. 566–575 (in CD-ROM) (in Polish).
[6] M. Buslowicz, A. Ruszewski, “Computer methods for stability analysis of the Roesser type model of 2D continuous-discrete linear systems”, Int. J. Appl. Math. Comput. Sci., vol. 22, no. 2, 2012, pp. 401–408. DOI: http://dx.doi.org/10.2478/v10006-012-0030-9
[7] M. Buslowicz, A.E. Rzepecki, “Computer methods for stability investigation of the Roesser model of 2D linear systems”, Measurement Automation and Robotics, no. 2, 2012, pp. 298–302 (in CD-ROM) (in Polish).
[8] Y. Ebihara, Y. Ito, T. Hagiwara, “Exact stability analysis of 2-D systems using LMIs”, IEEE Trans. Automat. Control, vol. 51, no. 9, 2006, pp. 1509–1513. DOI: http://dx.doi.org/10.1109/TAC.2006.880789
[9] E. Fornasini, G. Marchesini, “State-space realization theory of two-dimensional filters”, IEEE Trans. Automat. Control, vol. AC-21, 1976, pp. 484–492. DOI: http://dx.doi.org/10.1109/TAC.1976.1101305
[10] G.D. Hu, M. Liu, “Simple criteria for stability of two-dimensional linear systems”, IEEE Trans. Signal Processing, vol. 53, 2005, pp. 4720–4723.
[11] T. Kaczorek, Two-Dimensional Linear Systems, Springer, Berlin, 1985. DOI: http://dx.doi.org/10.1007/BFb0005617
[12] T. Kaczorek, Positive 1D and 2D Systems, Springer, London, 2002. DOI: http://dx.doi.org/10.1007/978-1-4471-0221-2
[13] T. Kaczorek, “LMI approach to stability of 2D positive systems with delays”, Multidimensional Systems and Signal Processing, vol. 20, 2009, pp. 39–54.
[14] T. Kaczorek, “Asymptotic stability of positive fractional 2D linear systems”, Bull. Pol. Acad. Sci., Tech. Sci., vol. 57, no. 3, 2009, pp. 289–292. DOI: http://dx.doi.org/10.2478/v10175-010-0131-2
[15] T. Kaczorek, “Practical stability of positive fractional 2D linear systems”, Multidimensional Systems and Signal Processing, vol. 21, 2010, pp. 231–238. DOI: http://dx.doi.org/10.1007/s11045-009-0098-z
[16] T. Kaczorek, Selected Problems of Fractional Systems Theory, Springer, Berlin, 2011. DOI: http://dx.doi.org/10.1007/978-3-642-20502-6
[17] H. Kar, V. Singh, “Stability of 2-D systems described by the Fornasini-Marchesini first model”, IEEE Trans. Signal Processing, vol. 51, 2003, pp. 1675–1676. DOI: http://dx.doi.org/10.1109/TSP.2003.811237
[18] L.H. Keel, S.P. Bhattacharyya, “A generalization of Mikhailov’s criterion with applications”. In: Proc. of the American Control Conference, Chicago, USA, vol. 6, 2000, pp. 4311–4315. DOI: http://dx.doi.org/10.1109/ACC.2000.877035
[19] J. Kurek, “Stability of positive 2D systems described by the Roesser model”, IEEE Trans. Circuits Syst. I, Fundam. Theory Appl., vol. 49, no. 4, 2002, pp. 531–533.
[20] T. Liu, “Stability analysis of linear 2-D systems”, Signal Processing, vol. 88, 2008, pp. 2078–2084. DOI: http://dx.doi.org/10.1016/j.sigpro.2008.02.007
[21] W.-S. Lu, “On a Lyapunov approach to stability analysis of 2-D digital filters”, IEEE Trans. Circuits Syst. I, Fundam. Theory Appl., vol. 45, 1994, pp. 665–669. DOI: http://dx.doi.org/10.1109/81.329727
[22] T. Ooba, “On stability analysis of 2-D systems based on 2-D Lyapunov matrix inequalities”, IEEE Trans. Circuit Syst. I, Fundam. Theory Appl., vol. 47, 2000, pp. 1263–1265.
[23] W. Paszke, E. Rogers, P. Rapisarda, K. Gałkowski, A. Kummert, “New frequency domain based stability tests for 2D linear systems”, Proc. of 17 Int. Conf. Methods and Models in Automation and Robotics, 2012 (CD-ROM). DOI: http://dx.doi.org/10.1109/MMAR.2012.6347922
[24] M. Twardy, “An LMI approach to checking stability of 2D positive systems”, Bull. Pol. Acad. Sci., Tech. Sci., vol. 55, no. 4, 2007, pp. 385–395.
[25] X. Xiao, R. Unbehauen, “New stability test algorithm for two-dimensional digital filters”, IEEE Trans. Circuits Syst. I, Fundam. Theory Appl., vol. 45, no. 7, 1998, pp. 739–741.
[26] S.-F. Yang, C. Hwang, “s”, IEEE Trans. Circuits Syst. I, Fundam. Theory Appl., vol. 47, no. 7, 2000, pp. 1120–1123.


An Adequate Mathematical Model of Four-Rotor Flying Robot in the Context of Control Simulations

Submitted: 23rd April 2013; accepted: 26th July 2013

Stanislaw Gardecki, Wojciech Giernacki, Jaroslaw Goslinski, Andrzej Kasinski

DOI: 10.14313/JAMRIS_2-2014/13

Abstract: In this paper a model of the dynamics of a four-rotor flying robot is described in detail. Control design must be preceded by modeling and subsequent analysis of the robot behavior in a simulator. It is therefore necessary to develop a mathematical model that is as accurate as possible. The paper contains a detailed derivation of the mathematical model in the context of the physical laws affecting the quadrocopter. The novelty of the presented notation is the inclusion of Coriolis forces in the linear acceleration and of the gyroscopic effect in the angular acceleration. In the validation phase, the mathematical model was verified with the use of the proposed control algorithms. Simulation studies have demonstrated the adequacy of a MATLAB model to properly reflect the real quadrocopter dynamics. This would allow for its use in the simulator and afterwards for implementing and verifying control laws on the real four-rotor flying robot.

Keywords: unmanned aerial vehicle (UAV), quadrocopter, modeling, mathematical model

1. Introduction
Understanding physical processes and controlling them at an acceptable level is possible thanks to proper modeling. By separating the essential features of a process from the non-essential ones and by using mathematics and computers, it is possible to build simulators that evaluate complex system properties such as dynamics. In the modeling phase of engineering work, in addition to the added value of a new model and the possibility of acquiring new knowledge by experimenting with it, modeling also allows large savings on the cost of experiments. Studies on the model are repeatable and time-saving. They also allow certain extra conditions to be introduced into the simulation. Finally, the results of simulation studies, which are normally available before the experiment, can be more easily established, described, and archived.

To develop efficient and rapid control algorithms for unmanned flying vehicles with a complex structure, good-quality mathematical models that reliably reflect the dynamics of these vehicles are needed. In this article the authors focus on a selected four-rotor flying robot, called a quadrocopter, which requires an appropriate control strategy to operate correctly. Naturally, proper control may be developed either by experimental tests on the real object or by using a simulator, which is a dedicated computer program. The first approach (physical modeling) is expensive, as it involves tests carried out directly on the system hardware. The second approach requires a mathematical model.

There are already many dynamics models of the four-rotor flying robot. They differ in complexity and in the level of the adopted simplifications. It is important that the quadrocopter is a nonlinear dynamical object with multiple inputs and outputs. It is inherently unstable and its parameters vary in time. Of course, it would be beneficial to know all the necessary physical parameters of the quadrocopter's elements a priori [2], but without a wind tunnel experiment it is impossible to directly evaluate the quadrocopter's geometry, the dynamics of its propellers, or other relevant aerodynamic parameters. The literature study in [4] leads to the following conclusion: "To overcome the inflexibility of the complex models, the National Aeronautics and Space Administration (NASA) has developed a so-called Minimum-Complexity Helicopter Simulation Math Model (MCHSMM) [3], which is a math model depending only on the basic data sources with the intention of low cost real-time simulation possibilities (...). One of the additional benefits from a MCHSMM, is the potential for a more clear understanding of the helicopter and its dynamics in general." On the basis of the MCHSMM, further works, e.g. [4], strove to develop methods for robust and optimal control of a robot based on this model. However, in the opinion of the authors of this paper, too far-reaching simplifications narrow the spectrum of control methods that can be used. While robust control by its very idea compensates for modeling errors and guarantees stability [11], such control is not optimal, which matters from the perspective of energy expenditure (robot batteries allow up to tens of minutes of flight).
On the other hand, optimal control is most often associated with a considerable amount of computation [8], which makes it difficult to perform the relevant real-time calculations in the robot's on-board control unit. Thus it would be preferable to use adaptive control methods [8], among which efficient methods of low computational complexity are available. Therefore, the authors postulate a compromise: to create a mathematical model of a certain simplified robot structure (a solid body) that nevertheless takes into account all relevant physical phenomena to which the robot is subjected and which are often not included in other commonly used models, such as the relationships describing the linear acceleration due to the Coriolis forces and the angular acceleration due to the gyroscopic effect. In this way, this article supplements previous approaches [10], [6] and contributes to a more faithful reflection of the dynamics of the robot.

2. Quadrocopter – Physical Object
Quadrocopters differ in technical details but share one common feature: the design is based on four rotors driven by DC motors and powered by a battery (Fig. 1). The layout of the quadrocopter is discussed in detail in [5]. Such a structure reduces instrumentation costs by eliminating the variable-pitch rotor blades of the classic solution used in helicopters. In this way increased stability and a relatively large capacity of the flying robot can be achieved. The robot's avionics consists of the following on-board sensors: an inertial measurement unit (IMU) used to stabilize the robot's position and orientation, an altimeter for height stabilization, a thermometer and a power consumption meter [5]. The IMU contains three gyroscopes and a triaxial accelerometer and communicates with the on-board processor via the SPI bus. The gyroscopes are used to determine the orientation of the robot in space (relative to the base). The accelerometers are used to reset the gyro drift. Currently, quadrocopters mostly use a powertrain consisting of a speed controller, a brushless motor and a propeller. Its task is to generate the lift force and the power needed for steering, which are essential from the perspective of robot motion control. The propeller in each unit generates the lift force and rotates in one direction only, because the blade pitch is constant. For this reason, the controller only modulates the propeller speed and does not change its direction of rotation.

Fig. 1. Quadrocopter Hornet

Fig. 2. Quadrocopter's geometric model

3. Mathematical Model – Reference Frames
During mathematical modeling (for the general geometric model of the quadrocopter from Figure 4), the physical parameters are described in three different reference systems (or with respect to these systems): the Earth system (EF), the robot system (local, BF) and an auxiliary system (SF). The relationship between EF, BF and SF is shown in Figure 3. System SF is a virtual system whose purpose is to adjust the orientation of the global EF system to the local BF system. In the notation of the mathematical model, all physical quantities except the Euler angles are expressed in the local coordinate system BF. The Euler angles are the angles between the BF and SF (or EF) frames.

Fig. 3. Reference systems

4. Mathematical Model, Equations
4.1. Euler angles
Euler angles define the orientation of the robot's frame (BF system) relative to the global system, the Earth EF [9]. The following calculations use the standard rotation matrices (transformations): the matrix of rotation about the $X$ axis by angle $\phi$:

$$C_x(\phi) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\phi & \sin\phi \\ 0 & -\sin\phi & \cos\phi \end{bmatrix}, \tag{1}$$

the matrix of rotation about the $Y$ axis by angle $\theta$:

$$C_y(\theta) = \begin{bmatrix} \cos\theta & 0 & -\sin\theta \\ 0 & 1 & 0 \\ \sin\theta & 0 & \cos\theta \end{bmatrix} \tag{2}$$

and the matrix of rotation about the $Z$ axis by angle $\psi$:

$$C_z(\psi) = \begin{bmatrix} \cos\psi & \sin\psi & 0 \\ -\sin\psi & \cos\psi & 0 \\ 0 & 0 & 1 \end{bmatrix}. \tag{3}$$

While determining the transformation matrix that translates the system SF to BF, the authors follow the rotation sequence "3-2-1" [7]. This sequence assumes that the rotations are performed in the following steps:
- Rotation Yaw: by angle $\psi$ about the $Z$ axis of the local coordinate system (BF)
- Rotation Pitch: by angle $\theta$ about the $Y$ axis of the new local coordinate system (BF')
- Rotation Roll: by angle $\phi$ about the $X$ axis of the new local coordinate system (BF'')
Under this assumption, and taking into account the matrices (1), (2), (3), the total transformation matrix from the SF to the BF system, denoted $R_{SB}(\Theta)$, may be written:

$$R_{SB}(\Theta) = C_x(\phi) \cdot C_y(\theta) \cdot C_z(\psi) \tag{4}$$

$$R_{SB}(\Theta) = \begin{bmatrix} c\theta\, c\psi & c\theta\, s\psi & -s\theta \\ s\phi\, s\theta\, c\psi - c\phi\, s\psi & s\phi\, s\theta\, s\psi + c\phi\, c\psi & s\phi\, c\theta \\ c\phi\, s\theta\, c\psi + s\phi\, s\psi & c\phi\, s\theta\, s\psi - s\phi\, c\psi & c\phi\, c\theta \end{bmatrix}, \tag{5}$$

where $s$ and $c$ stand for $\sin$ and $\cos$, respectively. In the next step, the transformation from BF to SF, denoted $R_{BS}(\Theta)$, is required, so the inverse of the matrix $R_{SB}(\Theta)$ must be calculated. Because this matrix is orthonormal, its inverse is equal to its transpose:

$$R_{BS}(\Theta) = \left(R_{SB}(\Theta)\right)^{-1} = \left(R_{SB}(\Theta)\right)^{T} \tag{6}$$

$$R_{BS}(\Theta) = \begin{bmatrix} c\theta\, c\psi & s\phi\, s\theta\, c\psi - c\phi\, s\psi & c\phi\, s\theta\, c\psi + s\phi\, s\psi \\ c\theta\, s\psi & s\phi\, s\theta\, s\psi + c\phi\, c\psi & c\phi\, s\theta\, s\psi - s\phi\, c\psi \\ -s\theta & s\phi\, c\theta & c\phi\, c\theta \end{bmatrix} \tag{7, 8}$$
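The rotation matrices (1)-(3) and their product can be checked numerically. The following is a minimal sketch in plain Python, not the authors' MATLAB implementation; the function names (`c_x`, `c_y`, `c_z`, `r_sb`, `r_bs`) are hypothetical and chosen only to mirror the notation above. It builds the SF to BF matrix in the "3-2-1" sequence and confirms that its transpose acts as its inverse.

```python
import math

def matmul(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def transpose(a):
    """Transpose a 3x3 matrix."""
    return [[a[j][i] for j in range(3)] for i in range(3)]

def c_x(phi):
    """Rotation about the X axis by phi, equation (1)."""
    c, s = math.cos(phi), math.sin(phi)
    return [[1.0, 0.0, 0.0], [0.0, c, s], [0.0, -s, c]]

def c_y(theta):
    """Rotation about the Y axis by theta, equation (2)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, 0.0, -s], [0.0, 1.0, 0.0], [s, 0.0, c]]

def c_z(psi):
    """Rotation about the Z axis by psi, equation (3)."""
    c, s = math.cos(psi), math.sin(psi)
    return [[c, s, 0.0], [-s, c, 0.0], [0.0, 0.0, 1.0]]

def r_sb(phi, theta, psi):
    """SF -> BF transformation, sequence 3-2-1: C_x(phi) C_y(theta) C_z(psi)."""
    return matmul(c_x(phi), matmul(c_y(theta), c_z(psi)))

def r_bs(phi, theta, psi):
    """BF -> SF transformation: the transpose of the orthonormal SF -> BF matrix."""
    return transpose(r_sb(phi, theta, psi))

# Arbitrary test angles in radians (illustrative values only).
R = r_sb(0.1, -0.2, 0.3)
identity_check = matmul(R, r_bs(0.1, -0.2, 0.3))
```

The product `identity_check` should equal the identity matrix up to rounding, which is the orthonormality property used in (6).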

In the notation above, as well as in (5), the vector $\Theta$ consists of the Euler angles:

$$\Theta = \begin{bmatrix} \phi \\ \theta \\ \psi \end{bmatrix} \tag{9}$$

In the further part of this work, the vector of Euler angles is used to define the transformation matrices.

4.2. Angular velocities
In the first part of the description of the mathematical model (Section 4.1), the model assumptions and the Euler notation (Section 3) were presented. The Euler angles also constitute the basis for the notation of the angular velocities of the quadrotor. The Euler rates (vector $\dot{\Theta}$) are projections of the robot's angular velocities (in the local coordinate system BF) onto the axes of the SF coordinate system. This is an alternative to the notation in the BF system (vector $\Omega$). The calculation is necessary because measurements are performed by gyroscopes, which refer to the robot's frame. Thus, to calculate the Euler rates, the angular velocity $\Omega$ of the BF system must be projected onto the axes of the SF coordinate system. The algorithm for calculating the Euler rates is defined in [9]. The first step is the theoretical calculation of the angular velocity in the local coordinate system:

$$\Omega = \begin{bmatrix} \omega_x \\ \omega_y \\ \omega_z \end{bmatrix} = \begin{bmatrix} \dot{\phi} \\ 0 \\ 0 \end{bmatrix} + C_x(\phi) \begin{bmatrix} 0 \\ \dot{\theta} \\ 0 \end{bmatrix} + C_x(\phi)\, C_y(\theta) \begin{bmatrix} 0 \\ 0 \\ \dot{\psi} \end{bmatrix}, \tag{10, 11}$$

which may be written in a matrix form:

$$\Omega = P \cdot \dot{\Theta}, \tag{12}$$

where:

$$P = \begin{bmatrix} 1 & 0 & -\sin\theta \\ 0 & \cos\phi & \sin\phi \cos\theta \\ 0 & -\sin\phi & \cos\phi \cos\theta \end{bmatrix} \tag{13}$$

The next step is to calculate the inverse of the matrix $P$:

$$P^{-1} = \begin{bmatrix} 1 & \sin\phi\, \mathrm{tg}\,\theta & \cos\phi\, \mathrm{tg}\,\theta \\ 0 & \cos\phi & -\sin\phi \\ 0 & \dfrac{\sin\phi}{\cos\theta} & \dfrac{\cos\phi}{\cos\theta} \end{bmatrix} \tag{14}$$

Finally, with the matrix $P^{-1}$ available, the Euler rates may be computed from the angular velocity of the local coordinate system. Hence:

$$\dot{\Theta} = P^{-1} \cdot \Omega \tag{15}$$
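Relations (12) and (15) can be sketched in a few lines of Python. This is an illustrative sketch only; the function names `euler_rates` and `body_rates` are hypothetical. It also makes one property of (14) explicit: the inverse contains division by $\cos\theta$, so it does not exist at a pitch of plus or minus 90 degrees, and the sketch raises an error there.

```python
import math

def euler_rates(phi, theta, omega):
    """Equation (15): map body rates (wx, wy, wz) to (phi_dot, theta_dot, psi_dot)."""
    wx, wy, wz = omega
    cos_theta = math.cos(theta)
    if abs(cos_theta) < 1e-9:
        # The inverse of P contains 1/cos(theta), so it is undefined here.
        raise ValueError("theta is at +/-90 deg: the inverse of P does not exist")
    s, c, t = math.sin(phi), math.cos(phi), math.tan(theta)
    phi_dot = wx + wy * s * t + wz * c * t
    theta_dot = wy * c - wz * s
    psi_dot = (wy * s + wz * c) / cos_theta
    return phi_dot, theta_dot, psi_dot

def body_rates(phi, theta, rates):
    """Equation (12): map Euler rates back to body rates, Omega = P * Theta_dot."""
    phi_dot, theta_dot, psi_dot = rates
    s, c = math.sin(phi), math.cos(phi)
    wx = phi_dot - math.sin(theta) * psi_dot
    wy = c * theta_dot + s * math.cos(theta) * psi_dot
    wz = -s * theta_dot + c * math.cos(theta) * psi_dot
    return wx, wy, wz

# Round trip with arbitrary illustrative values (radians, rad/s).
omega_in = (0.1, -0.2, 0.3)
omega_back = body_rates(0.4, 0.25, euler_rates(0.4, 0.25, omega_in))
```

Applying (15) and then (12) should return the original body rates, and at zero roll and pitch the Euler rates coincide with the body rates.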

4.3. Angular acceleration
One of the simplifications is the assumption that the robot's structure is perfectly stiff. Under this simplification, the angular acceleration of the robot may be described in the same way as for a rigid body in space, using Euler's equation of rigid body motion [1]. The general form of the Euler equation is:

$$\frac{dL}{dt} + \Omega \times L = \tau, \tag{16}$$

where:
$L$ - angular momentum vector
$\tau$ - vector of input torques
$\Omega$ - vector of angular velocity

For a rigid body it also holds that:

$$L = I \cdot \Omega, \tag{17}$$

where $I$ is the matrix of inertia moments of the object [7]:

$$I = \begin{bmatrix} I_{xx} & 0 & 0 \\ 0 & I_{yy} & 0 \\ 0 & 0 & I_{zz} \end{bmatrix} \tag{18}$$

Substituting (17) into (16), and assuming constant moments of inertia, gives:

$$\tau = I \cdot \dot{\Omega} + \Omega \times (I \cdot \Omega) \tag{19}$$

This equation, however, must be modified because of the gyroscopic effect caused by the motors and rotors. The gyroscopic effect is described by the basic equation of the gyroscope:

$$\tau_g = \Omega_p \times L_r, \tag{20}$$

where:
$L_r$ – vector of angular momentum of the rotors
$\tau_g$ – vector of gyroscopic torques
$\Omega_p$ – precession angular velocity vector

The effect of gyroscopic precession manifests itself on an object equipped with a rotating mass. It is therefore true that $\Omega_p = \Omega$, so that:

$$\tau_g = \Omega \times L_r, \tag{21}$$

where $\Omega$ is the angular velocity vector of the robot in the local coordinate system BF. The angular momentum vector is formed by multiplying the moment of inertia of the rotor ($I_r$) by the rotor's angular velocity about the axis where the rotation occurs ($\Omega_r$):

$$L_r = I_r \cdot \Omega_r = \begin{bmatrix} I_r \cdot \omega_{rx} \\ I_r \cdot \omega_{ry} \\ I_r \cdot \omega_{rz} \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ I_r \cdot (\omega_1 + \omega_2 + \omega_3 + \omega_4) \end{bmatrix} \tag{22, 23}$$

It must be noted that the rotor angular velocity vector ($\Omega_r$) contains zeros for the components along the $X$ and $Y$ axes and the sum of the rotor angular velocities $\omega_i$ (where $i = 1, \ldots, 4$) along the $Z$ axis. This is because the rotor axes are parallel to the $Z$ axis of the BF system, which is what this description of the gyroscopic effect assumes. Knowing $L_r$ and $\Omega$, equation (20) may be expanded:

$$\tau_g = \begin{bmatrix} \omega_x \\ \omega_y \\ \omega_z \end{bmatrix} \times \begin{bmatrix} 0 \\ 0 \\ I_r (\omega_1 + \omega_2 + \omega_3 + \omega_4) \end{bmatrix} = \begin{bmatrix} \omega_y I_r (\omega_1 + \omega_2 + \omega_3 + \omega_4) \\ -\omega_x I_r (\omega_1 + \omega_2 + \omega_3 + \omega_4) \\ 0 \end{bmatrix} \tag{24, 25}$$

With the final form of the gyroscopic torque vector, Euler's equation for the object (16) may be extended:

$$\tau = I \cdot \dot{\Omega} + \Omega \times (I \cdot \Omega) + \tau_g \tag{26}$$

In this equation the sought quantity is the angular acceleration, so after rearranging it may be written:

$$\dot{\Omega} = I^{-1} \left( \tau - \Omega \times (I \cdot \Omega) - \tau_g \right), \tag{27}$$

which, after multiplying out all components, leads to the form:

$$\dot{\Omega} = \begin{bmatrix} \dfrac{1}{I_{xx}} \left( \tau_x - \omega_y \omega_z (I_{zz} - I_{yy}) \right) \\ \dfrac{1}{I_{yy}} \left( \tau_y - \omega_x \omega_z (I_{xx} - I_{zz}) \right) \\ \dfrac{1}{I_{zz}} \left( \tau_z - \omega_x \omega_y (I_{yy} - I_{xx}) \right) \end{bmatrix} + \begin{bmatrix} -\dfrac{1}{I_{xx}}\, \omega_y I_r (\omega_1 + \omega_2 + \omega_3 + \omega_4) \\ \dfrac{1}{I_{yy}}\, \omega_x I_r (\omega_1 + \omega_2 + \omega_3 + \omega_4) \\ 0 \end{bmatrix} \tag{28, 29}$$

4.4. Linear acceleration
As in the case of angular accelerations, linear accelerations are calculated in the local coordinate system BF. This is because all forces and torques act in the local coordinate system, directly on the robot. The general equation for linear acceleration may be written in the form of the second law of dynamics:

$$m a = F - F_g - F_c - F_{cr}, \tag{30}$$

where:
$m$ - total mass of the robot,
$a$ - acceleration vector of the robot,
$F$ - active force vector in the local coordinate system,
$F_g$ - gravitational force vector in the local coordinate system,
$F_c$ - Coriolis force vector due to rotations of the entire robot,
$F_{cr}$ - Coriolis force vector due to rotations of the robot's rotors.

As may be seen above, the resultant force is the active force reduced by the gravitational force and the two Coriolis forces. The first Coriolis force results from the rotation of the entire robot moving with linear velocity $v = [v_x\ v_y\ v_z]^T$:

$$F_c = 2m \cdot \Omega \times v = 2m \cdot \begin{bmatrix} v_z \omega_y - v_y \omega_z \\ v_x \omega_z - v_z \omega_x \\ v_y \omega_x - v_x \omega_y \end{bmatrix}, \tag{31}$$

while the second Coriolis force is due to the rotation of the rotors:

$$F_{cr} = 2 m_r \cdot \Omega_r \times v = 2 m_r \cdot \begin{bmatrix} -v_y \cdot (\omega_1 + \omega_2 + \omega_3 + \omega_4) \\ v_x \cdot (\omega_1 + \omega_2 + \omega_3 + \omega_4) \\ 0 \end{bmatrix}, \tag{32, 33}$$

where the mass of the entire object is equal to $m$, while the mass of the propellers and rotors is equal to $m_r$. The force of gravity in equation (30) should also be explained: it is a projection of the force from the system SF to the system BF:

$$F_g = R_{SB}(\Theta) \cdot \begin{bmatrix} 0 \\ 0 \\ mg \end{bmatrix} = \begin{bmatrix} -\sin\theta\, mg \\ \sin\phi \cos\theta\, mg \\ \cos\phi \cos\theta\, mg \end{bmatrix} \tag{34}$$


Having all the components, equation (30) may be expanded to its final form:

$$a = \frac{1}{m} \begin{bmatrix} f_x \\ f_y \\ f_z \end{bmatrix} - \begin{bmatrix} -g \sin\theta \\ g \sin\phi \cos\theta \\ g \cos\phi \cos\theta \end{bmatrix} - 2 \begin{bmatrix} v_z \omega_y - v_y \omega_z \\ v_x \omega_z - v_z \omega_x \\ v_y \omega_x - v_x \omega_y \end{bmatrix} - 2\, \frac{m_r}{m} \begin{bmatrix} -v_y \cdot (\omega_1 + \omega_2 + \omega_3 + \omega_4) \\ v_x \cdot (\omega_1 + \omega_2 + \omega_3 + \omega_4) \\ 0 \end{bmatrix} \tag{35--37}$$
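The expanded linear-acceleration equation can be evaluated directly. The sketch below is a hedged illustration in plain Python, assuming the reading of equation (30) as active force minus gravity minus the two Coriolis terms; the function name `linear_acceleration`, its argument order and the value of `G` are assumptions made for this example.

```python
import math

G = 9.81  # gravitational acceleration [m/s^2]; assumed standard value

def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def linear_acceleration(force, phi, theta, v, omega, rotor_speeds, m, m_r):
    """Body-frame acceleration a = (F - F_g - F_c - F_cr) / m."""
    # Gravity projected from SF into BF, divided by m (third column of R_SB times g).
    gravity = (-G * math.sin(theta),
               G * math.sin(phi) * math.cos(theta),
               G * math.cos(phi) * math.cos(theta))
    # Coriolis term from the rotation of the whole robot: Omega x v.
    coriolis_body = cross(omega, v)
    # Coriolis term from the rotors: Omega_r = (0, 0, sum of rotor speeds).
    omega_r = (0.0, 0.0, sum(rotor_speeds))
    coriolis_rotor = cross(omega_r, v)
    return tuple(force[i] / m
                 - gravity[i]
                 - 2.0 * coriolis_body[i]
                 - 2.0 * (m_r / m) * coriolis_rotor[i]
                 for i in range(3))

# Level robot at rest whose thrust exactly balances its weight.
m, m_r = 2.3, 0.01
a_rest = linear_acceleration((0.0, 0.0, m * G), 0.0, 0.0,
                             (0.0, 0.0, 0.0), (0.0, 0.0, 0.0),
                             (0.0, 0.0, 0.0, 0.0), m, m_r)
# Forward motion at 1 m/s combined with a yaw rate of 1 rad/s, no thrust.
a_turn = linear_acceleration((0.0, 0.0, 0.0), 0.0, 0.0,
                             (1.0, 0.0, 0.0), (0.0, 0.0, 1.0),
                             (0.0, 0.0, 0.0, 0.0), m, m_r)
```

In the first case all acceleration components vanish. In the second case the whole-robot Coriolis term contributes an acceleration of -2 m/s^2 along the Y axis, which is the term that simpler models leave out.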

5. Forces and Forcing Torques
In the mathematical transformations derived above, two forcing vectors were used: the force vector $F$ and the torque vector $\tau$. These physical quantities have not yet been specified. To do so, the lift force of the quadrocopter must be determined. This force is equal to the product of the sum of the squared rotor angular velocities and the gain factor $b$:

$$F = b(\omega_1^2 + \omega_2^2 + \omega_3^2 + \omega_4^2) \tag{38}$$

Because this force acts only in the direction of the $Z$ axis of the BF local system (Figure 4), it may be written as:

$$F = \begin{bmatrix} f_x \\ f_y \\ f_z \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ b(\omega_1^2 + \omega_2^2 + \omega_3^2 + \omega_4^2) \end{bmatrix} \tag{39}$$

Fig. 4. Coordinate systems on the robot

Another issue is the generation of moments of forces, which arise only when the rotation speeds of the rotors are unbalanced, that is when:

$$(\omega_1 + \omega_2 + \omega_3 + \omega_4) \neq 0 \tag{40}$$

The equation of the generated moments may be written as:

$$\tau = \begin{bmatrix} \tau_x \\ \tau_y \\ \tau_z \end{bmatrix} = \begin{bmatrix} l \cdot b(\omega_4^2 - \omega_2^2) \\ l \cdot b(\omega_3^2 - \omega_1^2) \\ d(\omega_2^2 + \omega_4^2 - \omega_1^2 - \omega_3^2) \end{bmatrix}, \tag{41}$$

where $l$ is the arm of the force (the length of the quadrocopter arm), while $d$ is the reaction torque gain coefficient in the $Z$ axis direction.

6. The Complete Model
Having the successive elements of the model and the forces, one may write the complete model of the flying object. The first part refers to the linear acceleration, which is defined in the BF local coordinate system:

$$a = \begin{bmatrix} a_x \\ a_y \\ a_z \end{bmatrix} = \begin{bmatrix} g \sin\theta \\ -g \sin\phi \cos\theta \\ -g \cos\phi \cos\theta \end{bmatrix} + \begin{bmatrix} -2(v_z \omega_y - v_y \omega_z) \\ -2(v_x \omega_z - v_z \omega_x) \\ -2(v_y \omega_x - v_x \omega_y) \end{bmatrix} + \frac{m_r}{m} \begin{bmatrix} 2 (v_y \cdot (\omega_1 + \omega_2 + \omega_3 + \omega_4)) \\ -2 (v_x \cdot (\omega_1 + \omega_2 + \omega_3 + \omega_4)) \\ 0 \end{bmatrix} + \frac{1}{m} \begin{bmatrix} 0 \\ 0 \\ b(\omega_1^2 + \omega_2^2 + \omega_3^2 + \omega_4^2) \end{bmatrix} \tag{42--45}$$

The second part consists of the Euler rates, which are used to determine the changes of the orientation of the robot's BF local system relative to the auxiliary system SF:

$$\dot{\Theta} = \begin{bmatrix} \dot{\phi} \\ \dot{\theta} \\ \dot{\psi} \end{bmatrix} = \begin{bmatrix} \omega_x + \omega_y \sin\phi\, \mathrm{tg}\,\theta + \omega_z \cos\phi\, \mathrm{tg}\,\theta \\ \omega_y \cos\phi - \omega_z \sin\phi \\ \omega_y \dfrac{\sin\phi}{\cos\theta} + \omega_z \dfrac{\cos\phi}{\cos\theta} \end{bmatrix} \tag{46, 47}$$

The last part refers directly to the angular accelerations:

$$\dot{\Omega} = \begin{bmatrix} \dot{\omega}_x \\ \dot{\omega}_y \\ \dot{\omega}_z \end{bmatrix} = I^{-1} \left( \begin{bmatrix} l \cdot b(\omega_4^2 - \omega_2^2) \\ l \cdot b(\omega_3^2 - \omega_1^2) \\ d(\omega_2^2 + \omega_4^2 - \omega_1^2 - \omega_3^2) \end{bmatrix} - \begin{bmatrix} \omega_y \omega_z (I_{zz} - I_{yy}) \\ \omega_x \omega_z (I_{xx} - I_{zz}) \\ \omega_x \omega_y (I_{yy} - I_{xx}) \end{bmatrix} - \begin{bmatrix} \omega_y I_r (\omega_1 + \omega_2 + \omega_3 + \omega_4) \\ -\omega_x I_r (\omega_1 + \omega_2 + \omega_3 + \omega_4) \\ 0 \end{bmatrix} \right) \tag{48--51}$$

In this model, both the vector $a$ and the vector $\dot{\Omega}$ describe relationships that exist in the local coordinate system BF. The Euler rates vector $\dot{\Theta}$ describes the relationship between the systems BF and SF.
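The forcing terms and the angular-acceleration part of the model can be sketched as follows. This is an illustration under stated assumptions, not the authors' MATLAB code: the numerical parameters are the ones listed in Section 7, `G = 9.81` is an assumed value, and the assignment of rotor pairs to axes (rotors 2 and 4 for roll, 1 and 3 for pitch) is an assumption of this sketch, since only the pairing 1-3 and 2-4 is stated in the text. All function names are hypothetical.

```python
import math

# Parameter values quoted in Section 7; G is an assumed standard value.
M, L_ARM = 2.3, 0.2825                      # total mass [kg], arm length [m]
I_XX, I_YY, I_ZZ, I_R = 0.0250, 0.0230, 0.0475, 0.000065  # [kg*m^2]
B, D, G = 0.01149, 0.001, 9.81              # thrust gain, reaction torque gain, gravity

def thrust(w):
    """Lift force: gain b times the sum of squared rotor speeds."""
    return B * sum(wi ** 2 for wi in w)

def torques(w):
    """Forcing torques; the rotor-to-axis assignment is an assumption of this sketch."""
    w1, w2, w3, w4 = w
    return (L_ARM * B * (w4 ** 2 - w2 ** 2),
            L_ARM * B * (w3 ** 2 - w1 ** 2),
            D * (w2 ** 2 + w4 ** 2 - w1 ** 2 - w3 ** 2))

def angular_acceleration(omega, w):
    """Omega_dot = I^-1 (tau - Omega x (I Omega) - tau_g) for a diagonal I."""
    wx, wy, wz = omega
    tx, ty, tz = torques(w)
    w_sum = sum(w)
    # Gyroscopic torque of the rotors, tau_g = Omega x L_r.
    gx, gy = wy * I_R * w_sum, -wx * I_R * w_sum
    return ((tx - wy * wz * (I_ZZ - I_YY) - gx) / I_XX,
            (ty - wx * wz * (I_XX - I_ZZ) - gy) / I_YY,
            (tz - wx * wy * (I_YY - I_XX)) / I_ZZ)

# Rotor speed at which the thrust of four equal rotors balances the weight.
w_hover = math.sqrt(M * G / (4.0 * B))
hover_speeds = [w_hover] * 4
hover_acc = angular_acceleration((0.0, 0.0, 0.0), hover_speeds)
```

With these values `w_hover` comes out at roughly 22 rad/s, which is close to the hover speed of about 21 rad/s reported with the simulation results. With equal rotor speeds all three forcing torques vanish, so a robot that is not rotating stays at zero angular acceleration.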



7. The Results of Simulation Tests
The mathematical model of the quadrocopter described above was implemented in the MATLAB environment in order to test it and to assess the impact of the dynamics on the behavior of the real four-rotor flying robot. Simulation tests made it possible to verify the flight trajectory of the robot at various control signal levels. The most important advantage was the ability to analyze the stability of the object under different external disturbances, which with a real robot equipped with several flight-recording sensors could easily end in a crash. Implementation and simulation tests were preceded by the identification of the physical quantities that parameterize the robot from Figure 2 in the general model (42) - (51). In the notation of (42) - (51), the model is fully specified by: the mass of the propeller-rotor unit ($m_r$), the total mass of the robot ($m$), the moments of inertia about the X, Y and Z axes and of the rotor (respectively $I_{xx}$, $I_{yy}$, $I_{zz}$, $I_r$), the rotor coefficients $b$ and $d$, and the arm length $l$. For the purpose of further simulations, appropriate measurements were made and the following numerical values were obtained:
- total mass of the robot: $m = 2.3$ [kg]
- mass of the rotor: $m_r = 0.01$ [kg]
- arm length of the robot: $l = 0.2825$ [m]
- moment of inertia about the X axis: $I_{xx} = 0.0250$ [kg·m²]
- moment of inertia about the Y axis: $I_{yy} = 0.0230$ [kg·m²]
- moment of inertia about the Z axis: $I_{zz} = 0.0475$ [kg·m²]
- moment of inertia of the rotor: $I_r = 0.000065$ [kg·m²]
- coefficient: $b = 0.01149$
- coefficient: $d = 0.001$

For the quadrocopter mathematical model with the parameters given above, a number of simulations were carried out. During the analysis and verification of the results, particular attention was paid to the Euler angles. They provide the most important information about the behavior of the orientation while maneuvering in the air. This information is essential for the development of an efficient quadrocopter control algorithm. To test the implemented mathematical model of the robot, the authors proposed PID controllers because of their clear, intuitive operation and simple parameter tuning (with a heuristic method). Each controller was dedicated to one of the Euler angles. For properly set controller parameters (selected with a swarm algorithm), the graphs shown in Figures 5, 6, 7, 8 and 9 were generated. Figures 5, 6 and 7 present the change of the Euler angles over time for the given external disturbance. The controller response is directly visible in the courses shown in Figures 8 and 9: every change of angular velocity begins and ends at the speed of 21 radians/s. This value corresponds to the generated thrust at which the drivetrain of the quadrocopter compensates the gravity force. Figures 8 and 9 purposely group the rotor angular velocities in pairs: 1-3 and 2-4. It should be emphasized that this grouping results from the quadrocopter's architecture: each pair of rotors is associated with exactly one axis of the robot's coordinate system (BF, local coordinate system).

Fig. 5. Euler angle $\phi$ during the simulated flight

Fig. 6. Euler angle $\theta$ during the simulated flight

Fig. 7. Euler angle $\psi$ during the simulated flight

Fig. 8. Angular velocities of the robot's rotors (pair of rotors no.2 and no.4) associated with the Euler angles

Fig. 9. Angular velocities of the robot's rotors (pair of rotors no.1 and no.3) associated with the Euler angles

The next part of the experiment was to find the differences that may result from an incomplete physical model. For this purpose, the authors removed the gyroscopic effect from equation (28). This effect refers only to the $X$ axis (angle $\phi$) and is due to the rotation of the entire robot:

$$\tau_x^{\mathrm{gyro}} = -\omega_y \omega_z (I_{zz} - I_{yy}) \tag{52}$$

The results are shown in Figures 10, 11 and 12.

Fig. 10. Euler angle $\phi$ with and without (WGE) the gyroscopic effect

Fig. 11. Euler angle $\theta$ with and without (WGE) the gyroscopic effect

Fig. 12. Euler angle $\psi$ with and without (WGE) the gyroscopic effect

It may be noted that the gyroscopic effect is a result of unbalanced moments of inertia about the $Z$ and $Y$ axes (52). As it turns out, this is a very important effect, which in dynamic situations may strongly influence the orientation of the robot. The omission of this effect is therefore unacceptable.

8. Summary

There are many ways to assess the efficiency and validity of a mathematical model in science and engineering, but the physical forces involved impose the use of certain measures of assessment. For an object as complex as a four-rotor flying robot, a reasonable way to assess the adequacy of its mathematical model is to compare the responses of the real object and of the model to forces set in a deterministic way. It should be remembered that certain factors cannot be measured or predicted in a real system because they have a random character. The measurement method itself may also be unreliable, which affects the dynamics of the flying robot under control. A data acquisition system with sensors, or an on-board vision system, used for experimentation would have to be quick and precise. Such instrumentation significantly increases the cost of experiments in the modeling phase. Besides, not all behaviors of the quadrocopter can be checked on the real object, as the risk of damaging the robot is high (for example with a rapid change of the set value in the control signal). Therefore, in searching for a compromise, it was decided to use the mathematical analysis of the Euler angles while considering the orientation of the robot model in the context of the gravity force. The obtained results confirm that the mathematical model adequately describes the behavior of the simulated robot and allows its implementation in a simulator (computer program) reflecting the physical aspects of the environment. The resulting simulation tool enables a better understanding of those aspects of robot navigation that cannot be tested directly on the real robot. Tests in the simulator will be the next step undertaken by the authors in order to develop and propose further efficient control algorithms for a quadrocopter subjected to external disturbances (wind, turbulence, air temperature variation).

AUTHORS
Stanislaw Gardecki – Poznan University of Technology, Institute of Control and Information Engineering, ul. Piotrowo 3A, 60-965 Poznan, Poland, e-mail: stanislaw.gardecki@put.poznan.pl.
Wojciech Giernacki – Poznan University of Technology, Institute of Control and Information Engineering, ul. Piotrowo 3A, 60-965 Poznan, Poland, e-mail: wojciech.giernacki@put.poznan.pl.
Jaroslaw Goslinski∗ – Poznan University of Technology, Institute of Control and Information Engineering, ul. Piotrowo 3A, 60-965 Poznan, Poland, e-mail: jaroslaw.a.goslinski@doctorate.put.poznan.pl.
Andrzej Kasinski – Poznan University of Technology, Institute of Control and Information Engineering, ul. Piotrowo 3A, 60-965 Poznan, Poland, e-mail: Andrzej.Kasinski@put.poznan.pl.

∗ Corresponding author

REFERENCES
[1] Wie B., Space Vehicle Dynamics and Control, The American Institute of Aeronautics and Astronautics - Educational Series, 1998. DOI: http://dx.doi.org/10.2514/4.860119
[2] Ohanian O.J., Ducted Fan Aerodynamics and Modeling, with Applications of Steady and Synthetic Jet Flow Control, Virginia Polytechnic Institute, 2011.
[3] Heffley R.K., Mnich M.A., Minimum-Complexity Helicopter Simulation Math Model, National Aeronautics and Space Administration, 2003.
[4] Hald U.B., Autonomous Helicopter - Modelling and Control, Aalborg University, 2005.
[5] Gardecki S., Kasinski A., "Testing and selection of electrical actuators for multi-rotor flying robot", PAK, 2012.
[6] Erginer B., Altug E., "Modeling and PD Control of a Quadrotor VTOL Vehicle". In: Proceedings of the 2007 IEEE Intelligent Vehicles Symposium, 2007.
[7] Craig J.J., Introduction to Robotics Mechanics and Control, Addison-Wesley, 1989.
[8] Banka St., Multivariable Control Systems: A Polynomial Approach, ZUT University Publishing, 2007.
[9] Bak T., Modeling of Mechanical Systems, Lecture note – Aalborg University, 2002.
[10] Azzam A., Wang X., "Quad Rotor Arial Robot Dynamic Modeling and Configuration Stabilization". In: 2nd International Asia Conference on Informatics in Control, Automation and Robotics, 2010. DOI: http://dx.doi.org/10.1109/CAR.2010.5456804
[11] Albertos P., Sala A., Multivariable Control Systems: An Engineering Approach, Springer-Verlag London, 2004. DOI: http://dx.doi.org/10.1016/j.automatica.2005.04.003



Mathematical Modeling and Computer Aided Planning of Communal Sewage Networks

Submitted: 20th March; accepted 30th August 2013

Lucyna Bogdan, Grażyna Petriczek, Jan Studziński

DOI: 10.14313/JAMRIS_2-2014/14

Abstract: The paper presents the basic questions connected with the modeling of wastewater networks. Methods of modeling the basic sewage parameters and the corresponding calculation algorithms are described. The problem concerns gravitational networks divided by nodes into branches and sectors. The nodes are the points where several network segments or branches connect, the points where network parameters change, and the locations of sewage inflows to the network. The presented algorithms for the hydraulic calculation of networks concern sanitary or combined sewage nets. It is assumed that segment parameters such as shape, canal dimension, bottom slope or roughness are constant. Because of these assumptions, all relations considered concern steady state conditions of the network. The flow velocities and the filling heights in the segments of the wastewater net are calculated for known slopes and diameters of the canals.

Keywords: mathematical modeling of sewage network, hydraulic parameter of canal

1. Characteristic of Sewage Systems
From the point of view of design and operation, the following kinds of sewage can be distinguished: housekeeping (sanitary) sewage, industrial sewage, rain wastewater, drainage sewage and ground water. Depending on the way the wastewater is discharged, the following sewage systems can be distinguished:
a) combined sewage system
b) separated sewage system
c) semi-separated sewage system.
In a universal (combined) sewage system all kinds of wastewater are carried in common canals. At present separated sewage systems are mostly used, in which two separate nets can be distinguished:
a) a sewage net, used for the housekeeping sewage and for the industrial sewage
b) a rainwater net, used for carrying away the rain wastewater.
The semi-separated sewage system comprises two kinds of nets: the housekeeping net and the rainwater one. In this system the sewage net can receive a part of the rain run-off. In this paper the following basic assumptions are made:
- Only housekeeping or combined sewage nets are considered, divided into branches and segments by nodes.
- The nodes are the points where several network segments or branches connect, the points where network parameters change, and the locations of sewage inflows to the network (sink basins, rain inlets, connecting basins). In the connecting nodes the flow balance equations and the condition of level consistency are satisfied.
- It is assumed that segment parameters such as shape, canal dimension, bottom slope or roughness are constant. Because of these assumptions all relations concern the steady state problem.
- The nets considered are of gravitational type.

2. Basic Problems

The design and analysis of sewage networks involve the following tasks:
1. Hydraulic analysis of the network for known canal cross-sections and known canal slopes. In this case the filling heights of the canals and the flow velocities must be calculated as functions of the sewage flow rates. These calculations are done for the respective net segments using the previously obtained flow values.
2. Design of new segments of the network. This concerns the case when new segments must be added to the existing network. In this situation diameters and slopes must be chosen for the new canals. It is assumed that the sewage inflows are known.

3. Basic Hydraulic Dependences in Sewage Network

According to the Manning formula, the flow velocity of sewage depends on the hydraulic radius R, and R in turn depends on the filling height H. The Manning formula for the velocity v has the form:

$$v = \frac{1}{n} \cdot R^{2/3} \cdot J^{1/2} \tag{1}$$

where: R – hydraulic radius, J – canal slope, n – roughness coefficient, v – flow velocity. The relations presented below concern canals with a circular cross-section. From the Manning formula and the canal geometry one can obtain the following relations:

for H <= 0.5d:

$$A = \frac{d^2}{8} \cdot (\varphi - \sin\varphi) \tag{2a}$$

$$R = \frac{d}{4} \left( 1 - \frac{\sin\varphi}{\varphi} \right) \tag{2b}$$

$$\varphi = 2 \cdot \arccos\left( 1 - \frac{2 \cdot H}{d} \right) \tag{2c}$$

for H > 0.5d:

$$A = \frac{\pi d^2}{4} - \frac{d^2}{8} \cdot (\varphi - \sin\varphi) \tag{3a}$$

$$R = \frac{d}{4} + \frac{d}{8} \cdot \frac{\sin\varphi}{\pi - 0.5\varphi} \tag{3b}$$

$$\varphi = 2 \cdot \arccos\left( \frac{2 \cdot H}{d} - 1 \right) \tag{3c}$$

where: A – cross-section area, H – filling height, r – radius of circular canal, φ – central angle, d – canal inside diameter.

From the above expressions one can see that for circular canals the cross-section area A and the hydraulic radius R depend on the canal filling height H. As a result, the sewage flow velocity also depends on the canal filling height H when the canal slope J and the diameter d are given. In the following, the canal filling degree is defined as the ratio H/d. Figures 1 and 2 show the relations between A and H/d and between R and H/d for different values of the diameter d. The figures show that the section area A increases monotonically with growing canal filling degree H/d. For greater diameters the section area increases faster and its values are greater. The greatest value of A occurs for total canal filling and equals πd²/4. The hydraulic radius R increases from zero, reaches its maximum at the filling ratio of 81.3% and then decreases to the value it has at half of the canal height. For total filling and for half filling the hydraulic radius equals d/4. For greater diameters d the hydraulic radius also grows, but the shape of the curves does not depend on d.

Fig. 1. Dependences between the cross-section area and canal filling degree for different diameter values

Fig. 2. Dependences between the hydraulic radius and canal filling degree for different diameter values

Fig. 3. Dependences between flow velocity v, canal filling degree H/d and canal slope J for roughness coefficient n=0.013 and for canal diameter d=0.6

Fig. 4. Dependences between flow velocity v and canal filling degree H/d for roughness coefficient n=0.013 and for canal slope J=5% by different diameter values d

The sewage velocity depends on the canal parameters (diameter, canal slope and roughness coefficient) and on the canal filling degree (Fig. 3). The sections of the surface from Fig. 3 cut by the planes J=const. are presented in Fig. 4. It shows that the function describing the velocity v in terms of the filling degree H/d has a shape similar to that of the function describing the hydraulic radius R. The sewage velocity increases from zero, reaches its maximum at the filling degree of 81.3% and then decreases to the value it has at half of the canal height. Greater diameters d


Journal of Automation, Mobile Robotics & Intelligent Systems

v

9 8 7 6 5

VOLUME 8,

H/d=0.75

4 3 2 1 0

J 0%

1%

2%

3%

4%

5%

6%

7%

8%

9% 10% 11%

Fig. 5. Dependences between flow velocity v and canal slope J for given canal filling degree H/d

v 9 8 7 6

J=10%

5

J=6%

4

J=4%

3 2

J=0,5%

1 0

J=8%

J=1%

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

1.1

H/d

Fig. 6. Dependences between flow velocity v and canal filling degree H/d for roughness coefficient n= 0.013 and for diameter value d=0.6 for different canal slopes J increase only velocities v but the shape of the curves presented does not depend on d. The section of the surface from Fig. 3 using plane H/d=const is shown in Fig.5. It shows that the flow velocity increases monotonically with the growing canal slope for the given filling degree. Fig. 6 shows the relation between flow velocity v and canal filling degree H/d for different slope values J. One can see from Fig. 6 that greater values of canal slope increase only velocity values and they do not influence the shape of the curves presented.
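The geometric relations (2a)–(3c) and the Manning formula (1) can be sketched in a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, x denotes the filling degree H/d, and the slope J is assumed to be given as a dimensionless fraction (1% = 0.01).

```python
import math


def central_angle(x):
    """Central angle phi for filling degree x = H/d, eq. (2c) or (3c)."""
    if x <= 0.5:
        return 2.0 * math.acos(1.0 - 2.0 * x)
    return 2.0 * math.acos(2.0 * x - 1.0)


def area(x, d):
    """Cross-section area A, eq. (2a) or (3a)."""
    phi = central_angle(x)
    segment = d ** 2 / 8.0 * (phi - math.sin(phi))
    if x <= 0.5:
        return segment
    return math.pi * d ** 2 / 4.0 - segment


def hydraulic_radius(x, d):
    """Hydraulic radius R, eq. (2b) or (3b); R = 0 for an empty canal."""
    if x <= 0.0:
        return 0.0
    phi = central_angle(x)
    if x <= 0.5:
        return d / 4.0 * (1.0 - math.sin(phi) / phi)
    return d / 4.0 + d / 8.0 * math.sin(phi) / (math.pi - 0.5 * phi)


def velocity(x, d, J, n=0.013):
    """Manning velocity, eq. (1); J is assumed to be a fraction."""
    return hydraulic_radius(x, d) ** (2.0 / 3.0) * math.sqrt(J) / n


# Locate the maximum of R on a grid of filling degrees (d = 0.6 as in Fig. 3).
grid = [i / 1000 for i in range(1, 1001)]
x_best = max(grid, key=lambda x: hydraulic_radius(x, 0.6))
```

Scanning the grid places the maximum of R close to H/d = 0.813, and the velocities at half and total filling coincide, which matches the behaviour described for Figs. 2 and 4.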

4. Algorithm for the Calculation of Wastewater Networks

4.1 The Algorithm for Calculation of Canal Filling Heights and Flow Velocities

The algorithm presented requires the following input data:
− type of the network: housekeeping sewage net or combined sewage net,
− structure of the network: numbers of segments and nodes, and types of nodes,
− maximal sewage inflow into the network and the corresponding node number,
− slopes of canal bottoms and the canal dimensions.

The task of the algorithm is to determine the following values for given inflow rates Qi:
− filling height in each wastewater network segment,
− flow velocity in each network segment.

The calculation scheme presented below applies to canals with a circular cross-section. The algorithm consists of the following steps:

Step 1. Entering the network structure and input data, i.e. the number of nodes NW, the number of segments NO, the set of nodes W = {j = 1, …, NW}, the set of segments U = {i = 1, …, NO}, the set of diameters {di}, the slopes Ji and the roughness coefficients ni of the segments, i = 1, …, NO.

Step 2. Calculating the inflow rates for the network input nodes; they are calculated depending on the kind of sewage. For housekeeping and industrial sewage the maximal hourly inflow Q for a given network segment can be calculated according to the relation [1], [4], [7], [8]:

$$Q_{h\,max} = \frac{N_{h\,max} \cdot M \cdot q_{\text{śr}}}{24} \tag{4}$$

where: M – number of residents for the given segment of the net, qśr – average wastewater amount for an average housekeeping unit, Nhmax – coefficient of irregularity over twenty-four hours. For rain, the wastewater inflow can be expressed as follows [1], [4], [7], [8]:

$$Q = q_d \cdot \psi \cdot F \cdot \varphi \tag{5}$$

where: Q – rain wastewater inflow caused by infiltration [dm³/s], F – area of the drainage basin for the canal segment considered [ha], ψ – ratio between the rain wastewater amount passing into the canals and the rain wastewater amount coming from the whole given area, φ – rate of delay between the rain time and the time of the infiltration result, qd – rain intensity.
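Relations (4) and (5) are simple products and can be written down directly. The sketch below uses hypothetical function names and purely illustrative input numbers; the units of qśr and qd are not stated in the text, so they are assumed to be chosen consistently with the desired unit of Q.

```python
def max_hourly_inflow(residents, q_avg, n_hmax):
    """Eq. (4): Q_hmax = N_hmax * M * q_avg / 24."""
    return n_hmax * residents * q_avg / 24.0


def rain_inflow(q_d, psi, area_ha, delay):
    """Eq. (5): Q = q_d * psi * F * phi (phi is the delay rate here)."""
    return q_d * psi * area_ha * delay


# Illustrative values only, not taken from the paper.
q_sewage = max_hourly_inflow(residents=240, q_avg=100.0, n_hmax=2.0)
q_rain = rain_inflow(q_d=100.0, psi=0.5, area_ha=2.0, delay=1.0)
```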

Step 3. For given inflow rates Qi in segments i = 1, …, NO one can determine the following values: filling heights Hi, hydraulic radius values Ri and flow velocities vi.

1. From the Manning formula, and taking the canal geometry into account, one obtains the following relations with x = H/d.

For H/d ≤ 0.5:

$$\beta \cdot F_1(x) - Q = 0 \tag{6a}$$

$$F_1(x) = \frac{\left(\varphi_1(x) - \sin\varphi_1(x)\right)^{5/3}}{\varphi_1(x)^{2/3}} \tag{6b}$$

$$\varphi_1(x) = 2 \cdot \arccos(1 - 2 \cdot x) \tag{6c}$$

For H/d > 0.5:

$$\beta \cdot F_2(x) - Q = 0 \tag{7a}$$

$$F_2(x) = 2 \cdot \frac{\left(\pi - 0.5 \cdot \varphi_2(x) + 0.5 \cdot \sin\varphi_2(x)\right)^{5/3}}{\left(\pi - 0.5 \cdot \varphi_2(x)\right)^{2/3}} \tag{7b}$$

$$\varphi_2(x) = 2 \cdot \arccos(2 \cdot x - 1) \tag{7c}$$

$$\beta = 0.5 \cdot \frac{1}{n} \cdot d^{8/3} \cdot \left(\frac{1}{4}\right)^{5/3} \cdot J^{1/2} \tag{8}$$

where: H – filling height, φ – central angle, d – inside canal diameter, J – canal slope, n – roughness coefficient, Q – inflow rate, H/d – canal filling degree. The parameter β in (8) depends on the canal diameter d, the canal slope J and the roughness coefficient n; for fixed values of these quantities it is constant. Solving equations (6a)–(7c) yields the canal filling degree H/d as a function of the flow rate Q.

2. For the canal filling degree H/d calculated above, the hydraulic radius Ri is determined according to the following formulas.

For H/d <= 0.5:

$$R = \frac{d}{4} \left(1 - \frac{\sin\varphi}{\varphi}\right) \tag{9a}$$

$$\varphi = 2 \cdot \arccos\left(1 - \frac{2 \cdot H}{d}\right) \tag{9b}$$

For H/d > 0.5:

$$R = \frac{d}{4} \cdot \frac{\pi - 0.5\varphi + 0.5\sin\varphi}{\pi - 0.5\varphi} \tag{10a}$$

$$\varphi = 2 \cdot \arccos\left(\frac{2 \cdot H}{d} - 1\right) \tag{10b}$$

3. The flow velocity is calculated from:

$$v = \frac{1}{n} \cdot R^{2/3} \cdot J^{1/2} \tag{11}$$
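Step 3 can be sketched as a small root-finding routine. The sketch below assumes SI units (Q in m³/s, d in m), J as a fraction, and uses plain bisection on x = H/d; the paper only says that standard numerical methods can be used, so bisection is a choice made here, and all function names are hypothetical. The solver is restricted to 0 < Q < 2π·β, where β·F(x) – Q changes sign exactly once on (0, 1].

```python
import math


def F(x):
    """F1(x) from (6b) for x <= 0.5, F2(x) from (7b) for x > 0.5."""
    if x <= 0.0:
        return 0.0
    if x <= 0.5:
        phi = 2.0 * math.acos(1.0 - 2.0 * x)
        wet = max(phi - math.sin(phi), 0.0)  # guard against rounding
        return wet ** (5.0 / 3.0) / phi ** (2.0 / 3.0)
    phi = 2.0 * math.acos(2.0 * x - 1.0)
    top = math.pi - 0.5 * phi + 0.5 * math.sin(phi)
    return 2.0 * top ** (5.0 / 3.0) / (math.pi - 0.5 * phi) ** (2.0 / 3.0)


def beta(d, J, n):
    """Parameter beta from eq. (8)."""
    return 0.5 / n * d ** (8.0 / 3.0) * 0.25 ** (5.0 / 3.0) * math.sqrt(J)


def filling_degree(Q, d, J, n=0.013, tol=1e-10):
    """Solve beta * F(x) - Q = 0 for x = H/d by bisection."""
    b = beta(d, J, n)
    if not 0.0 < Q < 2.0 * math.pi * b:
        raise ValueError("Q outside the single-root range (0, 2*pi*beta)")
    lo, hi = 0.0, 1.0  # beta*F - Q is negative at lo and positive at hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if b * F(mid) - Q < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def hydraulic_radius(x, d):
    """Eqs. (9a)-(10b)."""
    if x <= 0.5:
        phi = 2.0 * math.acos(1.0 - 2.0 * x)
        return d / 4.0 * (1.0 - math.sin(phi) / phi)
    phi = 2.0 * math.acos(2.0 * x - 1.0)
    half = math.pi - 0.5 * phi
    return d / 4.0 * (half + 0.5 * math.sin(phi)) / half


def segment_hydraulics(Q, d, J, n=0.013):
    """Return (H/d, R, v) for one segment; v follows eq. (11)."""
    x = filling_degree(Q, d, J, n)
    R = hydraulic_radius(x, d)
    v = R ** (2.0 / 3.0) * math.sqrt(J) / n
    return x, R, v


# d = 0.2 m, Q = 0.31 dm3/s = 0.00031 m3/s, J = 5 permille = 0.005
x, R, v = segment_hydraulics(0.00031, 0.2, 0.005)
```

With n = 0.013 (the value used in Figs. 3–6, assumed here for the example network as well), this call gives H/d ≈ 0.081 and v ≈ 0.26 m/s, which appears to agree with the first values listed in Table 1 (H/d = 8.09%, v = 0.259 m/s).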

Knowing the network geometry, i.e. the slopes, shapes and diameters of the canals, as well as the wastewater inflows Qi, one can calculate the filling heights and flow velocities for each network canal. The calculations are carried out for each network segment, beginning with the segment farthest from the wastewater treatment plant and proceeding step by step to the nearest one.

Step 4. The flow balance equations

$$\sum_{j \neq i} Q_j = 0$$

and the conditions of equality of surface levels are evaluated in each network node.

Step 5. The whole network is calculated once again with changed wastewater inflows. Under the assumption of constant sewage flows in the network segments, the sewage system simulation can be executed for a sequence of time steps covering a couple of hours or days; in such a calculation the change of the wastewater inflows over time must be considered. Note that the parameters analyzed in the algorithm, i.e. filling heights, hydraulic radius values and flow velocities, depend on the wastewater inflows; for rain wastewater it is important to take the changes of the inflows into account and to repeat the simulation runs according to their frequency. The algorithm presented can be considered a part of a complex model for calculating sewage networks also under unsteady-state conditions.
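The flow balance of Step 4 can be illustrated for a tree-shaped gravity network in which every node except the outlet has exactly one outgoing segment. The data layout (dictionaries keyed by segment and node identifiers) and the function name are assumptions made for this sketch only; the condition of equal surface levels is not modelled.

```python
from collections import defaultdict


def accumulate_flows(segments, inflows):
    """Compute the flow in every segment from the node flow balance.

    segments: {segment_id: (upper_node, lower_node)}
    inflows:  {node: external inflow entering the network at that node}
    Assumes a tree: each node has at most one outgoing segment.
    """
    incoming = defaultdict(list)
    for seg, (_, lower) in segments.items():
        incoming[lower].append(seg)

    flows = {}

    def outflow_of(node):
        # Balance: outflow = external inflow + flows of incoming segments.
        total = inflows.get(node, 0.0)
        for seg in incoming[node]:
            if seg not in flows:
                flows[seg] = outflow_of(segments[seg][0])
            total += flows[seg]
        return total

    for seg, (upper, _) in segments.items():
        if seg not in flows:
            flows[seg] = outflow_of(upper)
    return flows


# Hypothetical three-segment network: A and B drain into C, C into D.
example_segments = {1: ("A", "C"), 2: ("B", "C"), 3: ("C", "D")}
example_flows = accumulate_flows(example_segments, {"A": 1.0, "B": 2.0, "C": 0.5})
```

The recursion visits upstream segments first, which mirrors the order described above: from the farthest segments towards the one nearest to the treatment plant.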

4.2 Analysis of Equations (6a)-(6c) and (7a)-(7b)

Equations (6a)–(7b) for calculating the canal filling degree are nonlinear, and standard numerical methods for nonlinear algebraic equations can be applied to solve them. For the equations to have roots, the parameter β and the sewage flow Q must fulfil some conditions, which are discussed below. The function F(x), composed of F1(x) for x ≤ 0.5 and F2(x) for x > 0.5, is continuous in the range (0, 1]. For x = 1, i.e. for the full canal filling, F = 2π, and for x = 0.5 we get F = π. In the range (0, 0.8] the function F(x) increases monotonically. In the range (0.8, 1] it reaches its maximum Fmax = 6.7588 at x = 0.9381 and then decreases in the range (0.9381, 1]. This analysis has been done for d = 0.6, J = 1% and n = 0.013. For fixed network parameters such as the canal diameter d and the canal slope J, the equation β·F(x) – Q = 0 has solutions depending on the sewage flow Q (Fig. 7).
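The properties of F(x) listed above can be checked numerically. The following sketch (hypothetical function names; ternary search and grid-based sign counting are choices made here, not the authors' method) locates the maximum of F on [0.8, 1] and counts the roots of F(x) – Q/β on (0, 1] for a given ratio Q/β.

```python
import math


def F(x):
    """F1(x) for x <= 0.5 and F2(x) for x > 0.5, eqs. (6b) and (7b)."""
    if x <= 0.0:
        return 0.0
    if x <= 0.5:
        phi = 2.0 * math.acos(1.0 - 2.0 * x)
        wet = max(phi - math.sin(phi), 0.0)
        return wet ** (5.0 / 3.0) / phi ** (2.0 / 3.0)
    phi = 2.0 * math.acos(2.0 * x - 1.0)
    top = math.pi - 0.5 * phi + 0.5 * math.sin(phi)
    return 2.0 * top ** (5.0 / 3.0) / (math.pi - 0.5 * phi) ** (2.0 / 3.0)


def f_max(lo=0.8, hi=1.0, tol=1e-9):
    """Ternary search for the maximum of F, which is unimodal on [0.8, 1]."""
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if F(m1) < F(m2):
            lo = m1
        else:
            hi = m2
    x = 0.5 * (lo + hi)
    return x, F(x)


def count_roots(q_over_beta, samples=10000):
    """Count sign changes of F(x) - Q/beta on a uniform grid over [0, 1]."""
    signs = [F(i / samples) - q_over_beta > 0.0 for i in range(samples + 1)]
    return sum(1 for a, b in zip(signs, signs[1:]) if a != b)


x_peak, f_peak = f_max()
```

For example, `count_roots(1.5 * math.pi)` finds one root, `count_roots(2.05 * math.pi)` finds two, and `count_roots(7.0)` finds none, since 7.0 exceeds the maximum of F.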

Fig. 7. Diagrams of function β·F(x)–Q for different values of Q in the range (0, π·β] (curves for Q = 0.1, Q = 0.2 and Q = π·β)

The equation β·F(x)–Q=0 has the following roots:
1. For x ∈ (0, 0.5] there is only one root, and the following inequality must be fulfilled: 0 < Q ≤ π·β. This inequality defines the range of sewage flows Q for fixed canal diameter d and canal slope J.
2. For x ∈ (0.5, 1] the equation β·F(x)–Q=0 has:
− one root for x ∈ (0.5, 1) and π·β < Q < 2π·β,
− two roots for x ∈ (0.5, 1] and 2π·β ≤ Q < β·Fmax; for Q = 2π·β these are x1 = 1 and x2 = 0.81963.

Fig. 8. Diagrams of function β·F(x)–Q for π·β < Q < 2π·β (curves for Q = 0.45 and Q = 0.58)

Fig. 9. Diagrams of function β·F(x)–Q for 2π·β ≤ Q < β·Fmax (curves for Q = 2π·β and Q = 0.63)

The results of the discussion are shown in Figures 8 and 9; Fig. 9 presents the case with two roots of the equation β·F(x)–Q=0 for Q = 2π·β and Q = 0.63 (Q < β·6.7588). For fixed network parameters such as the canal diameter d and the canal slope J, the above relations make it possible to decide what the solutions are for the given flow Q and whether Q exceeds the upper limit β·Fmax, in which case no solution exists. In such a case a change of one or both of the fixed network parameters d and J must be considered. It follows from the above relations that the admissible flow value Q depends on the parameter β, which in turn depends on the canal diameter d and on the canal slope J. The equation describing the dependence of the canal filling on the flow has exactly one solution for Q in the range (0, 2π·β), and that is why this range is relevant. Fig. 10 shows graphically the relation between the solution of the equation β·F(x)–Q=0 and the flow Q for d = 0.6, J = 2%, n = 0.013 and 0 < Q < 2π·β.

Fig. 10. Relation between the solution of β·F(x)–Q=0 and flow Q

4.3 Calculation of Canal Diameters for Given Flow Values

The calculation procedure shown below concerns the following cases:
− the flow Q exceeds the upper limit β·Fmax; then a change of the given canal diameters d and slopes J has to be considered,
− new segments must be added to the existing network; then the diameters and slopes must be defined for the new canals under the assumption that the sewage inflows Q into the canals have been forecasted and are known.

In both cases, while calculating diameters and slopes for the new canals for given flows Q, the inequality 2π·β – Q > 0 has to be considered. Fulfilling this inequality guarantees that the equation relating the canal filling to the canal flow Q has exactly one solution. The calculation procedure consists of the following steps, which are carried out for the forecasted and fixed flow values Q:

Step 1. Determination of the canal slope J. The value can be determined according to the existing technical standards or calculated from the relations for minimal slopes known from the literature [7], [9], [13], [12]:

$$J = \frac{a}{d} \tag{12a}$$

where a – parameter depending on the kind of sewage system, or

$$J = \frac{\tau_{min}}{\rho \cdot R} = \frac{4 \cdot \tau_{min} \cdot (\pi - 0.5 \cdot \varphi)}{\rho \cdot (\pi - 0.5 \cdot \varphi + 0.5 \cdot \sin\varphi)} \cdot \frac{1}{d} \tag{12b}$$

with $\varphi = 2 \cdot \arccos\left(\frac{2 \cdot H}{d} - 1\right)$,

where J – minimal canal slope that still ensures canal self-purification, τmin – tangential stress [kg/m²], with τmin > 0.225 kg/m² for communal and industrial wastewater, ρ – specific gravity of sewage [kg/m³], R – hydraulic radius. The canal slope shall be calculated for 60%–70% of the canal filling height.

Step 2. Solution of the following equation:

$$\varsigma \cdot d^{8/3} - Q = 0, \qquad \varsigma = \frac{\pi}{n} \cdot \left(\frac{1}{4}\right)^{5/3} \cdot J^{1/2} \tag{13}$$

If a solution d* of the equation exists, then the inequality

$$\varsigma \cdot d^{8/3} - Q > 0$$

is valid for all values d > d*. If the canal slope J has been calculated from relations (12a)–(12b) and a value d greater than d* is now taken into account, then one shall return to Step 1 and calculate the canal slope again. If a solution of equation (13) does not exist, then one shall return to Step 1, change the value of J and solve equation (13) once again. Fig. 11 shows the relations between the solution of equation (13) (the canal diameter d) and the canal flow Q for different canal slopes J.
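For a slope J that does not depend on d, equation (13) has a closed-form solution, d* = (Q/ς)^(3/8). The sketch below assumes exactly that case (a fixed J given as a fraction, Q in m³/s, d in m); the catalogue of available diameters and the function names are hypothetical. When J is taken from (12a) or (12b), J itself depends on d and the iteration between Step 1 and Step 2 described above is needed instead.

```python
import math


def zeta(J, n=0.013):
    """Coefficient from eq. (13): (pi / n) * (1/4)**(5/3) * sqrt(J)."""
    return math.pi / n * 0.25 ** (5.0 / 3.0) * math.sqrt(J)


def limiting_diameter(Q, J, n=0.013):
    """Diameter d* solving zeta * d**(8/3) - Q = 0 for a fixed slope J."""
    return (Q / zeta(J, n)) ** (3.0 / 8.0)


def pick_diameter(Q, J, catalogue, n=0.013):
    """Smallest catalogue diameter strictly greater than d*, or None."""
    d_star = limiting_diameter(Q, J, n)
    for d in sorted(catalogue):
        if d > d_star:
            return d
    return None


# Illustrative values: Q = 0.05 m3/s, J = 1 %, hypothetical catalogue.
d_star = limiting_diameter(0.05, 0.01)
d_chosen = pick_diameter(0.05, 0.01, [0.2, 0.3, 0.4, 0.5, 0.6])
```

Any d greater than d* satisfies ς·d^(8/3) – Q > 0, which is the same condition as 2π·β – Q > 0 because ς·d^(8/3) = 2π·β.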

5. Computational Example

The algorithm considered has been tested on an exemplary housekeeping network consisting of 17 nodes connected by 16 segments. The net has 9 input nodes (W6, W7, W8, W10, W11, W14, W15, W16, W17) and 1 output node W1. The other nodes constitute the connections between different segments of the network [11].

Fig. 11. Relations between canal diameter d and canal flow Q for different canal slopes J (curves for J = J self-purification, J = 0.5%, J = 1% and J = 1/d)

The arrows in Fig. 12 show the sewage flow direction. The sewage flow rates for the input nodes are given. The flow rates in the connection nodes are calculated according to the balance equation. For the respective segments the diameters d and canal slopes J are given. For this structure of the net, the filling degrees H/d and the flow velocities v in the respective segments are calculated. The conclusion is that for these sewage flow rates and for the given geometric parameters (diameters and canal slopes), the filling heights are lower than half of the canal diameters, so the input flows in some sewage nodes can be increased. The calculation results are shown in Table 1.

Table 1. The results of hydraulic computations for the exemplary net shown in Fig. 12

Upper node

Lower node

Segment.

D [m]

Q [dm3/s]

J‰

W7

W5

2

0.2

0.31

5

W10

W9

4

0.2

0.36

6

W6

W5

W11 W9 W4 W8 W3

W4 W9 W4 W3 W3 W2

1

3 5 6 7 8 9

W14

W13

10

W13

W12

12

W15 W16

W12

W17


W5

W2


W13 W12 W2

W1

W1

11 13

14

15

16

0.2

0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2

0.2

0.2

0.2

1.4

0.53

1.14 1.13 2.13 3.91 0.11 4.12 0.11 0.32 0.66 0.24

2.76

6.33

7.61

H/d

v [m/s]

8.09%

0.259

5

10.72%

5

15.08%

9

13.03%

5

27.78%

5 5

8.32%

20.48% 4.98%

5

28.53%

5

8.22%

5

4.98%

0.309 0.383 0.289 0.469 0.460 0.549 0.189 0.557 0.189 0.261

5

11.59%

5

23.29%

0.497

5

39.42%

0.661

5

5

7.18%

35.70%

0.325 0.24

0.629


Fig. 12. Structure of the sewage net used in the computational example

ACKNOWLEDGEMENTS

The paper is a result of the research project No. N N519 6521 40, financed by the Polish National Center of Science (NCN).

AUTHORS

Lucyna Bogdan, Grażyna Petriczek, Jan Studziński – Systems Research Institute Polish Academy of Sciences, Newelska 6, 01–447 Warsaw, Poland, email: Lucyna.Bogdan@ibspan.waw.pl, Grazyna.Petriczek@ibspan.waw.pl, Jan.Studzinski@ibspan.waw.pl *Corresponding author

REFERENCES

[1] Biedugnis S., Metody informatyczne w wodociągach i kanalizacji, Oficyna Wydawnicza Politechniki Warszawskiej, Warsaw 1998, in Polish.
[2] Bogdan L., Petriczek G., Zagadnienia modelowania sieci kanalizacyjnej dla potrzeb zarządzania przedsiębiorstwem wodociągowym, series: Studia i Materiały Polskiego Stowarzyszenia Zarządzania Wiedzą, vol. 22, 2009, pp. 32–42, in Polish.
[3] Błaszczyk W., Stamatello H., Blaszczyk P., „Kanalizacja. Sieci i pompownie”, vol. 1, publ.: Arkady, Warsaw 1983, in Polish.
[4] Chudzicki J., Sosnowski S., „Instalacje kanalizacyjne”, publ.: „Seidel-Przywecki”, Warsaw 2004, in Polish.
[5] Jaromin K., Jlilati A., Borkowski T., Widomski M., Ładóg G., „Rodzaje materiału i sposoby eksploatacji a współczynniki szorstkości w przewodach kanalizacji grawitacyjnej”. In: Proceedings of ECOpole, vol. 2, no. 2, 2008, in Polish.
[6] Karnowski J.M., „Warunki transportu wleczonych części mineralnych w przewodach kołowych o dowolnym nachyleniu”. In: Materiały Konferencji Naukowo-Technicznej PZITS, Poznań 1973, in Polish.
[7] Kwietniewski M., Nowakowska–Błaszczyk A., „Obliczenia hydrauliczne kanałów ściekowych na podstawie krytycznych natężeń stycznych”, Wodociągi i Kanalizacja, 13, 1981, in Polish.
[8] Niedzielski W., „Charakter przepływu w sieci kanalizacji deszczowej”, Ochrona Środowiska, no. 434/3–4, 1984, pp. 20–21, in Polish.
[9] Puchalska E., Sowiński N., „Wymiarowanie kanałów ściekowych metodą krytycznych naprężeń stycznych”, Ochrona Środowiska, no. 3–4, 1984, pp. 53–62, in Polish.
[10] Serek M., „Zastosowanie mikrokomputerów do obliczania sieci kanalizacji deszczowej”, Ochrona Środowiska, no. 488/1–2, 1986, pp. 27–28, in Polish.
[11] Służalec A., Sieć kanalizacji ściekowej – obliczenia hydrauliczne. Raport Badawczy IBS PAN, Warsaw 2010, in Polish.
[12] Wartalski J., „Komputerowe metody projektowania i analizy hydraulicznej sieciowych układów kanalizacyjnych”, Ochrona Środowiska, no. 434/3–4, 1984, pp. 20–21, in Polish.
[13] Wartalski A., Wartalski J., „Projektowanie hydrauliczne rurociągów z tworzyw sztucznych”, Ochrona Środowiska, no. 1/76, 2000, pp. 19–24, in Polish.
[14] WILO Polska – Producent pomp i urządzeń sanitarnych, „Podstawy odprowadzania i pompowania ścieków”. Oferta handlowa, in Polish.


Failures Location within Water Supply Systems by Means of Neural Networks

Submitted: 20th March 2013; accepted: 30th August 2013

Izabela Rojek, Jan Studziński

DOI 10.14313/JAMRIS_2-2014/15

Abstract: The article presents neural networks used for locating failures in water supply networks. To do this, a hydraulic model of the water net and an appropriately developed monitoring system have to be used. The current applications of monitoring systems installed in waterworks do not exploit their full potential. Monitoring systems provided as autonomous programs that collect and record information about water flows and pressures in the source pumping stations, in the pump stations raising the water pressure inside the water net and in the pipes of the water supply network give general knowledge about the state of its operation; if they were used as elements of IT systems supporting water network management, they could also help to solve tasks concerning the detection and localization of water leaks. The models of failure location in water nets described in the paper are created by means of neural networks in the form of MLP nets.

Keywords: water-supply networks, network hydraulic model, detection and location of water leakages, neural networks

1. Introduction

The main goals of operating a municipal water network are to supply water to the water net users and to run the network correctly, which means assuring an appropriate water pressure in the water net end nodes, removing failures efficiently, and planning and executing activities concerning conservation, modernization and extension of the network; the water supplied and distributed to the users has to be of suitable quality and sufficient quantity [1]. The operation and control of a water network is a difficult and complex process. The detection and localization of hidden leakages is one of the most important water net management tasks, because the water losses caused by water net damage can sometimes reach even 30% of the total water production, which has a substantial negative effect on the financial results of the waterworks. Therefore, fast location and elimination of water leaks, especially the hidden ones, can bring measurable economic advantages both for the waterworks and for the water net end users. The stages of the whole process of eliminating water leaks can be defined in the following way:
− failures detection – a failure can be detected by observing an increased water consumption, but the failure location cannot be determined;
− failures location – the place of the failure in the water net can be determined by means of suitable algorithms and with the use of a monitoring system, the water net hydraulic model and, in particular, neural networks;
− failures counteraction – by using historical failure data, developing models to forecast water net emergencies and subsequently planning the network revitalization, the rate of water net unreliability can be essentially reduced.

2. The Algorithm for Water Net Failures Localization

Different approaches and computational algorithms to aid the detection and location of water leaks in water networks have already been presented in the literature [1, 2, 3, 4]. In every case a hydraulic model of the water network and a monitoring system installed on the water net are the basic tools for making the calculations. An appropriate computer infrastructure operated on the water network is needed for the practical realization of these algorithms. A monitoring system, a calibrated hydraulic model of the water net and a GIS system for generating the numerical map of the water net should be the key components of this infrastructure. Such an extended computer infrastructure makes it possible not only to detect and locate water net failures but also to manage the network by executing tasks such as water net control, water quality analysis and improvement, water net optimization and design, etc. [5, 6]. This means that highly developed ICT tools are useful and indispensable for water network management. In the following, an algorithm to detect and locate water leaks in municipal water networks is described. It uses neural nets to create a classifier identifying and locating the water leaks arising in the water net. The algorithm consists of the following steps:
1. Determination of a ranking list of sensitive points in the water net, using an algorithm for planning the monitoring systems.
2. Choice of a suitable number of the most sensitive measuring points for the monitoring system to be installed on the water network.
3. Development of a hydraulic model of the investigated water net and its calibration using the data from the installed monitoring system.
4. Determination of standard distributions of pressure and flow values using the data from the measuring points of the monitoring system; these distributions are calculated for standard loads of the whole water network and of its end nodes.
5. Simulation of leakage events in subsequent nodes of the water net by means of the hydraulic model and recording of the pressure and flow values in the measuring points of the monitoring system.
6. On the basis of the recorded failure data, creation of water leak classifiers in the form of neural networks and choice of the best classifier according to the criterion of the largest sensitivity.
7. On-line measurement of the water flow and pressure values in the water net using the monitoring system and comparison of the current data with the standard ones.
8. In case of an essential difference between the standard and current data recorded by the monitoring system, use of the classifier to find the water net node in which the water leak has possibly occurred.

2.1. Determination of Sensitive Points in the Water Net Investigated

In order to find the best location of sensors for the measuring points of the monitoring system to be installed on the water net, the so-called sensitive points of the water net have to be determined. They are able to collect information about changes in the water network not only at the points where they are installed but also from their remote surroundings. These sensitive points can be called characteristic points of the water net, in contrast to the so-called dead points, in which only local changes of the water network can be noticed. The usual practice while developing monitoring systems consists in increasing the number of monitoring points, which stands in opposition to the procedure shown above. A suitable choice of a comparatively small number of characteristic points in the water net can be equivalent, regarding the quality and quantity of the information collected, to a larger number of points situated in less sensitive places of the network. To determine the sensitive points of the water net the following formulas [7] can be used:

$$S_p^m = \frac{\sum_{k \neq m} (\Delta p_m / p_m) \cdot L_{km}}{\sum_{k \neq m} L_{km}}, \qquad S_q^m = \frac{\sum_{k \neq m} (\Delta q_m / q_m) \cdot L_{km}}{\sum_{k \neq m} L_{km}}$$

where: k – node with the simulated water leak, m – measurement point considered, p – water pressure, q – water flow, Δpm and Δqm – differences between the measurements for the standard and emergency states of operation of the water net, Lkm – distance between the points k and m. The correct measurement points are those with the highest sensitivity values. Fig. 1 shows the circulation of information while planning the monitoring system. The data collected from the water net by the monitoring system are recorded in the data base, which is mostly the branch data base of a GIS system, and then they are used by a hydraulic model to calculate the sensitivity values of the water net nodes. One can see that calculating these values requires a monitoring system installed on the water net as well as a calibrated hydraulic model of the water net, which are not available at the beginning of the procedure. Because of that, the procedure is realized iteratively in the following steps: the first step is the calibration of the hydraulic model using data obtained from a measurement experiment performed on the water net; the second step is the calculation of the sensitive points using the calibrated hydraulic model and then the installation of the monitoring system in the selected measuring points; the third step, if realized, is mostly the recalibration of the hydraulic model with the use of the monitoring system already installed.

Fig. 1. Structure of the procedure for planning the monitoring system

2.2. Development of a Hydraulic Model of the Water Net

To find the sensitive points of the water net for planning the monitoring system, the hydraulic model of a real Polish water net has been used. The hydraulic calculations can be made only with a hydraulic graph of the water net that is topologically correct, i.e. connected and without any discontinuities. Hydraulic graphs can be generated and exported to the hydraulic models by GIS systems; this mechanism is shown in Fig. 2. Such an operation makes the calculation with the hydraulic model much easier and faster than if the hydraulic graph were designed using the software interface of the hydraulic model. Once the data export from the GIS system is completed, the calculation with the hydraulic model can be executed independently of GIS (Fig. 3).

Fig. 2. Export of a water net hydraulic graph from a GIS system to the water net hydraulic model

Fig. 3. Hydraulic model of the water net investigated ready for simulation of water leaks

Fig. 4. Data file with the measured values of flows in the monitoring points of the water net; case with 10 monitoring points
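The sensitivity measures of Section 2.1 can be sketched as a distance-weighted average of relative changes. In the sketch below the function name and data layout are hypothetical, and the absolute value of the difference is used as an assumption, since the text does not state the sign convention of Δp and Δq. The same function serves for pressures and flows.

```python
def sensitivity(reading_std, readings_leak, distances):
    """Sensitivity of one measurement point m.

    reading_std:   reading at m in the standard state (pressure or flow)
    readings_leak: {k: reading at m when a leak is simulated at node k},
                   with k different from m
    distances:     {k: distance L_km between node k and point m}
    """
    numerator = 0.0
    denominator = 0.0
    for k, reading in readings_leak.items():
        relative_change = abs(reading - reading_std) / reading_std
        numerator += relative_change * distances[k]
        denominator += distances[k]
    return numerator / denominator


# Illustrative numbers only: two candidate points, leaks at nodes k1 and k2.
candidates = {
    "m1": sensitivity(50.0, {"k1": 45.0, "k2": 49.0}, {"k1": 100.0, "k2": 300.0}),
    "m2": sensitivity(50.0, {"k1": 49.5, "k2": 49.9}, {"k1": 200.0, "k2": 150.0}),
}
ranking = sorted(candidates, key=candidates.get, reverse=True)
```

Sorting the candidate points by this value gives the ranking list of sensitive points from which the measuring points are chosen.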

2.3. Simulation of Leakage Events in The Subsequent Water Net Nodes

In the research presented, two cases have been investigated: monitoring systems consisting of 10 and of 20 monitoring points located in the most sensitive nodes of the water network. The learning files for the neural networks are prepared by executing the hydraulic calculations of the water net for its standard load without any water leaks, simulating leakages in the subsequent water net nodes using the hydraulic model, and recording the flow values in the monitoring points for both monitoring systems and for both modes of the water net operation, i.e. the standard and the failure mode. Fig. 4 shows the data file obtained from the hydraulic model with the flow values calculated for 10 measuring points of the monitoring system for the standard mode of the water net operation. Similar files can be obtained from the hydraulic model for the water leaks subsequently simulated in all net nodes. While simulating the water net failures the following activities have to be carried out:
− simulation of water leaks in all water net nodes with several changes of the flow values (by means of the hydraulic model),
− computation of the flow differences in the monitoring points between the standard and failure modes of the water net operation,
− determination of the measuring points with the strongest reaction to the water leak in the given water net nodes,
− creation of data files with the recorded flow values from the monitoring points for the standard and failure modes of the water net operation and with the most sensitive monitoring points determined.

With the data files prepared, the neural classifiers of water leaks can be created to signal water net failures and to indicate their possible location.
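The preparation of one learning record from the activities listed above can be sketched as follows. The record layout and function name are assumptions for illustration; the label convention (number of the monitoring point with the strongest reaction, 0 for the standard mode) follows the description of the data file given with Fig. 5.

```python
def make_record(standard_flows, simulated_flows):
    """Build one learning record from flows at the monitoring points.

    Returns (flow_differences, label), where label is the 1-based number
    of the monitoring point with the largest absolute flow change,
    or 0 when the flows equal the standard ones (no leak).
    """
    differences = [f - s for s, f in zip(standard_flows, simulated_flows)]
    if all(d == 0.0 for d in differences):
        return differences, 0
    strongest = max(range(len(differences)), key=lambda i: abs(differences[i]))
    return differences, strongest + 1


# Illustrative flows at three monitoring points.
standard = [10.0, 20.0, 30.0]
leak_case = [10.5, 23.0, 30.2]
diffs, label = make_record(standard, leak_case)
_, no_leak_label = make_record(standard, standard)
```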

2.4. Creation of the Water Leaks Classifier in Form of Neural Network

The models of failure location in the water net have been created using neural networks of the MLP type. These neural networks remain the most widespread and universal networks, currently used for solving a wide variety of scientific and practical problems, e.g. technical, economic and medical ones. In a multi-layer network with the error back-propagation algorithm (called a multi-layer perceptron, MLP) the number of neurons in the input layer is determined by the dimension of the data vector x. The neuron model consists of the sum of the elements x1, x2, ..., xN (forming the input vector x = [x1, x2, ..., xN]T) multiplied by the weight coefficients wi1, wi2, ..., wiN (forming the weight vector wi = [wi1, wi2, ..., wiN]T of neuron i) and of an additional value wi0:

$$u_i = \sum_{j=1}^{N} w_{ij} x_j + w_{i0}$$

The resulting signal ui is passed to a non-linear activation function f(ui), which is mostly a unipolar sigmoid function:

$$f(u_i) = \frac{1}{1 + \exp(-b u_i)}$$
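The neuron model above can be sketched in plain Python. The function names, the weight layout and the example values are illustrative assumptions; the sketch only evaluates the forward pass of a neuron and of a layer and does not implement learning.

```python
import math


def neuron_output(x, weights, w0, b=1.0):
    """u = sum_j w_j * x_j + w0, followed by the unipolar sigmoid f(u)."""
    u = sum(w * xj for w, xj in zip(weights, x)) + w0
    return 1.0 / (1.0 + math.exp(-b * u))


def layer_output(x, weight_rows, biases, b=1.0):
    """Outputs of all neurons of one layer; one weight row per neuron."""
    return [neuron_output(x, row, w0, b) for row, w0 in zip(weight_rows, biases)]


# Illustrative input and weights: u = 0.5*1 - 0.25*2 + 0 = 0, so f(u) = 0.5.
y = neuron_output([1.0, 2.0], [0.5, -0.25], 0.0)
hidden = layer_output([1.0, 2.0], [[0.5, -0.25], [1.0, 1.0]], [0.0, -3.0])
```

An MLP with one hidden layer applies `layer_output` twice: first to the input vector, then to the hidden-layer outputs.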

The error back-propagation algorithm is the basic supervised learning algorithm for multi-layer feed-forward neural networks. While executing the algorithm a gradient method is used to minimize the error function. In the first stage of the investigation the monitoring system with 10 measuring points was used and water leaks were simulated in 36 selected nodes of the water net. The whole water net investigated contains 390 nodes. In the second stage the number of monitoring points was raised to 20 and the number of water net nodes with simulated water leaks to 44. The water leaks classifier has been created according to the methodology developed in [8]. During the calculation runs for creating the classifier in the form of an MLP, two parameters of the neural network were varied: the number of neurons in the hidden layer and the number of learning epochs. The first parameter was varied from 5 to 25 and the second took the values 200, 500 and 1000.



Fig. 5. A fragment of data file for teaching the neural networks

Fig. 5 shows a fragment of the data file used for teaching the neural nets in the case of 10 monitoring points. Columns 24 to 36 of the file show the flow values in the water net nodes in which the water leaks have been simulated. These flow values are calculated using the hydraulic model for both modes of the water net operation, i.e. for the standard load of the water net and for its different failure loads. The last column of the data file shows the numbers of the monitoring points that reacted most strongly to the flow changes caused by the water leaks simulated in the subsequent water net nodes; the number 0 in this column means the standard mode of operation, i.e. the work of the water net without any water leak.

Fig. 6. Structure of the procedure for simulating the water leaks

Table 1. Calculation results for 10 monitoring points and water leak simulations in 36 water net nodes

No.   Neural network name   Learning quality [%]   Testing quality [%]   Validation quality [%]
1     MLP 36-08-11          88.32                  95.56                 88.89
2     MLP 36-15-11          97.66                  97.78                 95.56
3     MLP 36-22-11          94.39                  97.78                 93.33
4     MLP 36-19-11          97.66                  97.78                 95.56
5     MLP 36-21-11          94.39                  97.78                 97.78
6     MLP 36-24-11          97.20                  93.33                 97.78
7     MLP 36-23-11          97.66                  97.78                 95.56

Table 2. Calculation results for 20 monitoring points and water leak simulations in 44 water net nodes

No.   Neural network name   Learning quality [%]   Testing quality [%]   Validation quality [%]
1     MLP 44-25-21          97.22                  100.00                98.15
2     MLP 44-18-21          100.00                 100.00                100.00
3     MLP 44-21-21          98.02                  96.30                 98.15
4     MLP 44-18-21          87.70                  85.19                 83.33
5     MLP 44-19-21          90.08                  88.89                 83.33
6     MLP 44-24-21          80.56                  75.93                 81.49
7     MLP 44-08-21          93.66                  90.74                 94.44

Fig. 6 shows the circulation of information while simulating the water leaks in the water net. In the first case of investigation, with 10 monitoring points and 36 water net nodes with water leak simulations, the number of examples (records) in the data file used for teaching the neural nets amounted to 304. In the second case, with 20 monitoring points and 44 water net nodes with water leak simulations, the number of examples amounted to 360. In each case the teaching data file was divided into 3 sub-files: the learning file containing 70% of the examples, the testing file with 15% and the validation file with 15%. The results of the calculations done with different structures of the MLP neural nets for the 2 cases of investigation are shown in Tables 1 and 2. The quality values of learning, testing and validation of the neural nets are given in % in the tables. For example, the notation MLP 36-08-11 in Table 1 means an MLP neural net with 1 hidden layer, with 36 neurons in the input layer (due to 36 water net loads with water leak simulations), with 8 neurons in the hidden layer and with 11 neurons in the output layer (due to 10 monitoring points and 1 standard mode of the water net operation).
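The network notation and the 70/15/15 division described above can be checked with a short sketch. The helper names `parse_mlp_name` and `split_sizes` are hypothetical, and the rounding rule (testing and validation sizes rounded down, the remainder used for learning) is an assumption, since the paper does not state how fractional counts were handled.

```python
import re

def parse_mlp_name(name):
    """Split a label such as 'MLP 36-08-11' into (inputs, hidden, outputs)."""
    match = re.fullmatch(r"MLP (\d+)-(\d+)-(\d+)", name.strip())
    if match is None:
        raise ValueError(f"unrecognised network label: {name!r}")
    return tuple(int(group) for group in match.groups())

def split_sizes(n_examples, test_percent=15, validation_percent=15):
    """Assumed rule: round testing/validation down, give the rest to learning."""
    n_test = n_examples * test_percent // 100            # integer arithmetic avoids float rounding
    n_validation = n_examples * validation_percent // 100
    n_learning = n_examples - n_test - n_validation
    return n_learning, n_test, n_validation

layers = parse_mlp_name("MLP 36-08-11")   # (36, 8, 11)
case_1 = split_sizes(304)                 # (214, 45, 45)
case_2 = split_sizes(360)                 # (252, 54, 54)
```

Under this assumed rule, one misclassified example in a testing file of 45 records gives 44/45, i.e. about 97.78%, which is a value that occurs repeatedly in Table 1.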


The results show that an MLP neural net can be a good IT tool for finding water leaks in municipal water networks. The quality of calculation depends principally on two variables: the number of monitoring points installed on the water net and, in the case of 3-layer MLP neural nets, the number of neurons in the hidden layer. If the number of monitoring points is relatively small, then better results are obtained with a larger number of neurons in the hidden layer. If the number of monitoring points is relatively large, then an optimal number of hidden-layer neurons must be found to obtain good results, and it should not be especially large. The best neural model calculated for 10 monitoring points is MLP 36-24-11 with 24 neurons in the hidden layer; in the second case, with 20 monitoring points, the model MLP 44-18-21 turned out to be the best one, with only 18 neurons in the hidden layer. It is interesting to see that in both cases the validation results are not worse than those of the learning and testing runs. This means that neural models of the MLP type are well suited for solving the problems of detection and localization of water leaks in municipal water networks.

3. Conclusions

The use of artificial intelligence methods, and especially of neural networks, can be very helpful in solving the problems of computer aided management of municipal waterworks. The application of neural networks to detect and locate water leaks in water networks is only one of several tasks to be solved in waterworks for which neural nets can be used. The solution of the whole set of management tasks concerning the so-called soft management and hard management can be effectively computer supported and automatically executed by integrated ICT systems consisting of many closely cooperating programs, among which the neural net algorithms are only one of the key elements [9]. The calculation results presented here have so far only academic value, because they have not been tested under practical conditions. A practical application of the algorithms described requires an advanced computer infrastructure installed on the water network, and this concerns especially the installation of an adequate monitoring system. Unfortunately, no waterworks in Poland has such an infrastructure so far because of the very high costs of the measurement devices needed. It seems nevertheless that the application of neural networks to some management tasks in waterworks can be very useful and can introduce a new quality into the operation of municipal water networks.

AUTHORS

Izabela Rojek* – Kazimierz Wielki University in Bydgoszcz, Institute of Mechanics and Applied Computer Science, Chodkiewicza 30, 85-064 Bydgoszcz, Poland, e-mail: IzaRojek@ukw.edu.pl

Jan Studziński – Systems Research Institute, Polish Academy of Sciences, Newelska 6, 01-447 Warszawa, Poland, e-mail: Jan.Studzinski@ibspan.waw.pl

*Corresponding author

Acknowledgements

The paper is a result of the project No NR14-001110/2010 financed by the Polish Ministry of Sciences and Higher Education.

REFERENCES

[1] Studziński J., “Decisions making systems for communal water networks and wastewater treatment plants”. In: Studziński J., Hryniewicz O. (eds.), Modeling Concepts and Decision Support in Environmental Systems, Systems Research Institute, Polish Academy of Sciences, Series: System Research 49, Warsaw, 2006.
[2] Rojek I., Studziński J., “Failures location algorithms for water-supply network by use of the neural networks”, Studies and Proceedings of Polish Association for Knowledge Management, Bydgoszcz, vol. 8, 2011, pp. 146–156 (in Polish).
[3] Wyczółkowski R., Moczulski W., “Concept of intelligent monitoring of local water supply systems”. In: Materials of AI-METH 2005 – Artificial Intelligence Methods, 16th–18th November, 2005, Gliwice, Poland.
[4] Wyczółkowski R., Wysogląd B., “An optimization of heuristic model of a water supply network”. In: Computer Assisted Mechanics and Engineering Science, CAMES, no. 14, 2007, pp. 767–776.
[5] Studziński J., “The innovations of XXI age – the modern information techniques as management aids in network enterprises”. In: Straszak A. (ed.), Innovative Mazovia, Publishing House of SWPW, Płock, 2010 (in Polish).
[6] Farmani R., Ingeduld P., Savic D., Walters G., Svitak Z., Berka J., “Real-time modeling of a major water supply system”. In: International Conference on Computing and Control for the Water Industry, no. 8, Exeter, UK, vol. 160, no. 2, 2007, pp. 103–108.
[7] Straubel R., Holznagel B., “Mehrkriteriale Optimierung für Planung und Steuerung von Trink- und Abwasser-Verbundsystemen”. Wasser Abwasser, 140, no. 3, 1999, pp. 191–196.
[8] Rojek I., “Support of decision making processes and control in systems with different scale of complexity using artificial intelligence methods”. Publishing House of Kazimierz Wielki University, Bydgoszcz, 2010 (in Polish).
[9] Studziński J., “IT System for Computer Aided Management of Communal Water Networks by Means of GIS, SCADA, Mathematical Models and Optimization Algorithms”. In: Proceedings of the First International Conference on Information and Communication Technologies for Sustainability ICT4S, L.M. Hilty, B. Aebischer, G. Andersson, W. Lohmann (eds.), ETH Zurich, 14th–16th February, 2013, pp. 123–127.



Outside the Box: An Alternative Data Analytics Framework

Submitted: 13th February 2013; accepted 26th February 2013

Plamen Angelov

DOI: 10.14313/JAMRIS_2-2014/16

Abstract: In this paper, an alternative framework for data analytics is proposed which is based on the spatially-aware concepts of eccentricity and typicality, which represent the density and proximity in the data space. This approach is statistical, but differs from the traditional probability theory, which is frequentist in nature. It also differs from the belief and possibility-based approaches as well as from the deterministic first principles approaches, although it can be seen as deterministic in the sense that it provides exactly the same result for the same data. It also differs from the subjective expert-based approaches such as fuzzy sets. It can be used to detect anomalies and faults and to form clusters, classes, predictive models and controllers. The main motivation for introducing the new typicality- and eccentricity-based data analytics (TEDA) is the fact that real processes which are of interest for data analytics, such as climate, economic and financial, electro-mechanical, biological, social and psychological etc., are often complex, uncertain and poorly known, but not purely random. Unlike purely random processes, such as throwing dice, tossing coins, choosing coloured balls from bowls and other games, real life processes of interest do violate the main assumptions which the traditional probability theory requires. At the same time they are seldom deterministic (more precisely, they always have an uncertainty/noise component which is nondeterministic), and creating expert and belief-based possibilistic models is cumbersome and subjective. Despite this, different groups of researchers and practitioners favour and do use one of the above approaches, with probability theory being (perhaps) the most widely used one.

The proposed new framework TEDA is a systematic methodology which does not require prior assumptions and can be used for the development of a range of methods for anomaly and fault detection, image processing, clustering, classification, prediction, control, filtering, regression, etc. In this paper, due to space limitations, only a few illustrative examples are provided, aiming at a proof of concept.

Keywords: data density, proximity measures, RDE, data analytics, data-driven approaches, machine learning, Bayesian

1. Introduction

Probability theory has been around for over two centuries [1]. It is well established and widely (over)used. Its basis was set up by Thomas Bayes and generalised later by Pierre-Simon Laplace and other researchers based on observations of purely random processes, such as games and gambling. It is perfectly suitable for describing such purely random processes and variables. However, it is also (extremely) widely (over)used to describe real world processes which are not purely random, have inter-sample dependence and non-normal distributions, and may have a small number of observations. For example, climate, economic, physical, biological, social, psychological and many other real processes are complex and difficult to tackle using first principle or expert-based models. The traditional probability theory, on the other hand, is based on several assumptions which do not hold in practice, such as:
a) independence of the individual data samples (observations) from each other;
b) large (theoretically, infinite) number of data samples (observations);
c) prior assumption of the distribution or kernel (most often, normal/Gaussian).

The first assumption is fully satisfied for pure random processes, but not for the real processes which are usually of interest. Therefore, the application of traditional probability theory to pure random processes is justified, but the same is not necessarily the case for the real processes which make up the vast majority of applications of interest. In this paper, a new systematic framework for data analytics is proposed which requires no prior assumptions or kernels and no user- or problem-specific thresholds and parameters to be pre-specified. It is entirely based on the data and their mutual distribution in the data space. It does not require independence of the individual data samples (observations); on the contrary, the proposed approach builds upon their mutual dependence. It also does not require an infinite number of observations and can work with as few as 3 data samples.

The new typicality and eccentricity based data analytics (TEDA) is an alternative statistical framework which can work efficiently with any data except pure random processes, in which individual data samples (observations) are completely independent from each other. For such pure random data the traditional probability theory is the best tool to be used. However, for real data processes (which are the majority of the cases) we argue that TEDA is better justified, because it does not rely on assumptions which are not satisfied by such processes.


The term typicality was used recently [2] to describe “the extent to which objects are ‘good examples’ of a concept”. Unlike [2], where only conceptual, philosophical considerations are made, in this paper a systematic mathematical framework is introduced. Eccentricity can be very useful for anomaly detection, image processing, fault detection, etc. Both typicality and eccentricity can be very useful for the development of new clustering, classification, multi-model prognostic, control, soft sensor and other methods. In the remainder of this paper the proposed new TEDA methodology is described first, in section 2. In section 3, some simple examples are provided, mostly aiming at a proof of concept. Next, in section 4, an anomaly detection approach based on eccentricity is outlined. Clustering is outlined in section 5, classification in section 6, and prediction and control in section 7, all within the TEDA framework. Finally, section 8 concludes the paper.

2. Description of the proposed methodology

Let us start with the data samples (observations) that we may have, $x \in R^n$ (where n is the number of features/characteristics; in Fig. 1, n=2 for illustration purposes; in the rest of the paper we will use n=1 without any limitation to the concept, which is applicable for any positive integer n). If we have a single data sample or just two data samples (observations), there is not much sense in introducing the value of typicality and eccentricity. There will be only one value observed/recorded or, for k=2, only a single distance between the two samples (they will be equally untypical, except in the extreme case when they coincide, when they will be equally typical). For any number of data samples, k>1, we can define the distance between them, d. This distance/proximity measure can be of any form, e.g. Euclidean, Mahalanobis, cosine, Manhattan/city/L1, etc. Let us denote the distance between two data samples, $x_i$ and $x_j$, by $d_{ij}$.

Fig. 1. A 2D data distribution (A is a rather eccentric data point; B is a typical one)

We can also calculate the accumulated proximity/sum of distances, π, to all available data samples from a given, jth (j>1) data sample calculated when k (k>1) data samples are available:

$$\pi^k(x_j) = \pi_j^k = \sum_{i=1}^{k} d_{ij}, \qquad k > 1, \quad j > 1 \qquad (1)$$

The eccentricity of a particular jth (j>1) data sample calculated when k (k>2) data samples are available (and they are not all the same by value) is defined as the relative (normalised) π of that data sample as a fraction of π's of all other data samples:

$$\xi_j^k = \frac{2\pi_j^k}{\sum_{i=1}^{k}\pi_i^k} = \frac{2\sum_{i=1}^{k} d_{ij}}{\sum_{l=1}^{k}\sum_{i=1}^{k} d_{il}}, \qquad k > 2, \quad j > 1, \quad \sum_{i=1}^{k}\pi_i^k > 0 \qquad (2)$$

The coefficient 2 is due to the fact that each distance is counted twice and can be seen as a normalisation coefficient. The typicality of the jth (j>1) data sample calculated when k (k>2) non-identical data samples are available is defined as the complement of the eccentricity, ξ, of that data sample:

$$\tau_j^k = 1 - \xi_j^k, \qquad k > 2, \quad j > 1, \quad \sum_{i=1}^{k}\pi_i^k > 0 \qquad (3)$$

It is easy to check that both eccentricity, ξ, and typicality, τ, are bounded:

$$0 < \xi_j^k < 1, \qquad \sum_{i=1}^{k}\xi_i^k = 2, \qquad \sum_{i=1}^{k}\pi_i^k > 0, \qquad \forall k > 2, \quad \forall j > 1 \qquad (4a)$$

$$0 < \tau_j^k < 1, \qquad \sum_{i=1}^{k}\tau_i^k = k - 2, \qquad \sum_{i=1}^{k}\pi_i^k > 0, \qquad \forall j > 1 \qquad (4b)$$

These definitions of eccentricity and typicality resemble fuzzy set membership functions since being values between 0 and 1, summing up to a value larger than 1 (for τ and k=3 it sums up to 1). We can also introduce normalised eccentricity and typicality which sum up to 1:

$$\zeta_j^k = \frac{\xi_j^k}{2}, \qquad \sum_{i=1}^{k}\zeta_i^k = 1, \qquad 0 < \zeta_i^k < \frac{1}{2}, \qquad \sum_{i=1}^{k}\pi_i^k > 0, \quad k > 2, \quad j > 1 \qquad (5)$$

$$t_j^k = \frac{\tau_j^k}{k-2}, \qquad \sum_{i=1}^{k}t_i^k = 1, \qquad 0 < t_i^k < \frac{1}{k-2}, \qquad \sum_{i=1}^{k}\pi_i^k > 0, \quad k > 2, \quad j > 1 \qquad (6)$$

Normalised eccentricity and typicality resemble probability distribution function (pdf) in that they sum to 1, but they are different as they do not require the prior assumptions that are a must for the probability theory and they represent both the spatial distribution pattern and the frequency of occurrence of a data sample.
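The definitions (1)–(6) can be sketched directly in code. This is a minimal offline illustration, not the author's implementation: the function name `teda_offline`, the use of the plain Euclidean distance (`math.dist`) as d, and the four 2D points are assumptions made for the example.

```python
import math

def teda_offline(samples, distance=math.dist):
    """Compute (xi, tau, zeta, t) for every sample from definitions (1)-(6)."""
    k = len(samples)
    if k < 3:
        raise ValueError("eccentricity and typicality need k > 2 samples")
    # (1) accumulated proximity of each sample to all available samples
    pi = [sum(distance(xj, xi) for xi in samples) for xj in samples]
    total = sum(pi)
    if total <= 0:
        raise ValueError("all samples coincide; the measures are undefined")
    xi = [2.0 * p / total for p in pi]       # (2) eccentricity
    tau = [1.0 - e for e in xi]              # (3) typicality
    zeta = [e / 2.0 for e in xi]             # (5) normalised eccentricity
    t = [v / (k - 2) for v in tau]           # (6) normalised typicality
    return xi, tau, zeta, t

# Hypothetical 2D points: three close together and one far away
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
xi, tau, zeta, t = teda_offline(points)
```

For these points the isolated sample (5, 5) receives the largest eccentricity, and the sums of `xi`, `tau`, `zeta` and `t` agree with (4a), (4b), (5) and (6) up to floating-point error. Any other distance function taking two samples can be passed as `distance`.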



All of the above definitions are applicable to data streams (online, when k is incrementally increased). In the case of data sets (offline, fixed amount of data, k) the upper index, k, can be omitted in all notations because it only indicates how many data samples the respective value has been calculated from. The above definitions are global (defined over all available data); one can also define local eccentricity and typicality. They can be very useful for local regions, groups/clusters/data clouds and classes (then the summation is only over the data samples concerned), see also section 6. The typicality can also be seen as an analogue to the histograms of distributions, but it is in a closed analytical form and does take into account the mutual influence of the neighbouring data samples/observations. When the normalised eccentricity is above 1/k the data sample is rather untypical/eccentric/anomalous. If the value of the normalised typicality is above 1/k then the data sample is rather typical. It can also be proven that both eccentricity and typicality can be calculated recursively by updating only the global or local mean, μ, and scalar product, X, for the cases when the Euclidean square distance [3], [4] is used, and similarly if the cosine [5] or Mahalanobis square distance [6] is used. For example, for the Euclidean square distance, without limiting the scope of applicability of the typicality and eccentricity in general, we have [3], [4]:

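$$\pi_j^k = k\left(\|x_j - \mu^k\|^2 + X^k - \|\mu^k\|^2\right) \qquad (7)$$

$$\mu^k = \frac{k-1}{k}\,\mu^{k-1} + \frac{1}{k}\,x_k, \qquad \mu^1 = x_1 \qquad (8)$$

$$X^k = \frac{k-1}{k}\,X^{k-1} + \frac{1}{k}\,\|x_k\|^2, \qquad X^1 = \|x_1\|^2 \qquad (9)$$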

where μ is the recursively updated (local or global) mean and X is the recursively updated scalar product. Furthermore, we do not need to calculate $\sum_{i=1}^{k}\pi_i^k$ each time, but we can update it recursively by:

$$\sum_{i=1}^{k}\pi_i^k = \sum_{i=1}^{k-1}\pi_i^{k-1} + 2\pi_k^k, \qquad \pi_1^1 = 0 \qquad (10)$$

The coefficient 2 is due to the fact that each distance counts once from the kth point towards the ith point and once from the ith point towards the kth point. It can be proven [12], and it is obvious from (7), that the minimum value of eccentricity (and, respectively, the maximum of the typicality) is obtained for the data points that are closer to (or coincide with) the mean, $\mu^k$, which is quite logical. It has to be stressed that this applies globally as well as locally. Now, we can formulate the following recursive procedure:

Algorithm 1. TEDA (Euclidean square distance used for d)
Initialise k=1; j=1; $x_1$; $X^1 = \|x_1\|^2$; $\mu^1 = x_1$; $\pi_1^1 = 0$;
WHILE data points from the data stream are available (or until interrupted) DO
1. Read the next (k:=k+1) data point $x_k$;
2. Update
   a. $\mu^k$ from equation (8);
   b. $X^k$ from equation (9);
3. For 1≤j≤k compute $\pi_j^k$ from equation (7);
4. Update $\sum_{i=1}^{k}\pi_i^k$ from equation (10);
5. For 1≤j≤k compute:
   a. $\xi_j^k$ from equation (2);
   b. $\tau_j^k$ from equation (3);
   c. $\zeta_j^k$ from equation (5);
   d. $t_j^k$ from equation (6);
End WHILE

The recalculation/update of the values of eccentricity and typicality based on k data samples from the same values based on k–1 data samples (observations) can be seen as similar to the posterior probabilities update in the Bayesian rule. The prior estimation for feasible but not yet observed data points can be done by interpolating the existing ξ, ζ, τ, t. Interpolation can be local or global, linear or more complex.

3. TEDA Primer

Due to the space and time limitations, in this paper only the basic concept is laid down together with several illustrative examples aiming at a proof of concept. Further publications will detail and expand this new theoretical framework for objective data analytics. Let us consider an extremely simple data stream which consists of just three data samples (these may be thought of, without limiting the generality of the concept, as values of the temperature in °C):

$$y = \{20;\ 12;\ 10\} \qquad (11)$$

Obviously, k = 3. We can easily get:

$$\xi^3 = \{0.9;\ 0.5;\ 0.6\}, \qquad \tau^3 = \{0.1;\ 0.5;\ 0.4\}, \qquad \zeta^3 = \{0.45;\ 0.25;\ 0.3\}, \qquad t^3 = \tau^3 \qquad (12)$$

We can see that the sum of the eccentricity values is 2 and that of the typicality values is 1; they are between 0 and 1 as expected. Similarly, the normalized eccentricity and typicality both sum up to 1, with the normalized eccentricity being in the range [0;0.5] and the normalized typicality in the range [0;1], as expected. Moreover, the normalized typicality of $y_2$ is above 1/3, which means it is a typical value of the temperature (based on these three observations). We can also see in Fig. 2 (top line of plots) that the eccentricity of $y_1$=20°C is substantially higher than that of the other data samples and the normalized eccentricity (0.45) is >1/3. The typicality of $y_2$=12°C is the highest; the normalized typicality of $y_3$=10°C is also >1/3, but less obviously so than that of $y_2$=12°C.

The above observations are quite logical and, importantly, we did not make any prior assumptions on the number of data points, their distributions or kernels, and we did not use any expert or first principles knowledge. Yet, we derived objectively the common knowledge fact that $y_1$=20°C is rather eccentric and unusual (for England), thus a candidate for being declared an anomaly, while $y_2$=12°C is the most typical one, thus a candidate for a prototype.

If we have an additional observation of 17°C then we can estimate posterior values of ξ, ζ, τ, t using the procedure described above, and the result will be different because it will be based on four observed data samples, not three (a fact, not a prior estimation, similar to the probability theory's Bayesian rule), $z^4$:

$$z^4 = \{y;\ 17\} = \{20,\ 12,\ 10,\ 17\}$$

We can use the procedure (Algorithm 1) to easily get the following:

$$\xi^4 = \{0.6;\ 0.43;\ 0.54;\ 0.43\}, \qquad \tau^4 = \{0.4;\ 0.57;\ 0.46;\ 0.57\} \qquad (13)$$

We can check that both ξ and τ sum up to 2 and ζ and t sum up to 1, and that the range of ξ and τ is between 0 and 1 while that of ζ and t is between 0 and 1/2.
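The values in (12) and (13) can be recomputed with a few lines of code. In this sketch the distance is taken as the plain absolute difference $|y_i - y_j|$; this is an assumption inferred from recomputation (the printed numbers are reproduced with it, whereas the squared distance gives different values), not a statement made in the text. The function name `eccentricity_typicality` is hypothetical.

```python
def eccentricity_typicality(values):
    """xi and tau for scalar samples, with d_ij = |v_i - v_j| (assumed distance)."""
    # (1) accumulated proximity of each sample
    pi = [sum(abs(vj - vi) for vi in values) for vj in values]
    total = sum(pi)
    xi = [2 * p / total for p in pi]     # (2) eccentricity
    tau = [1 - e for e in xi]            # (3) typicality
    return xi, tau

y = [20, 12, 10]          # data stream (11)
z = y + [17]              # four samples, as used for (13)

xi3, tau3 = eccentricity_typicality(y)
xi4, tau4 = eccentricity_typicality(z)
zeta3 = [e / 2 for e in xi3]                   # (5)
t4 = [v / (len(z) - 2) for v in tau4]          # (6)

print([round(e, 2) for e in xi3])    # [0.9, 0.5, 0.6]
print([round(v, 2) for v in tau3])   # [0.1, 0.5, 0.4]
print([round(e, 2) for e in xi4])    # [0.6, 0.43, 0.54, 0.43]
print([round(v, 2) for v in tau4])   # [0.4, 0.57, 0.46, 0.57]
```

The normalised typicality of the two inner samples (12°C and 17°C) in the four-sample case evaluates to about 0.286 under the same assumption.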

Fig. 2. The left column represents the eccentricity; the right one represents the typicality; the top line of plots corresponds to y and the bottom one to z. The right hand side vertical axes on each plot represent the normalized eccentricity/typicality (ζ/t, resp.)

We can also observe (compare in Fig. 2 the two lines Weplots) can also (compare in Fig. 2 the twotolines of thatobserve by adding just a single point close the of plots) that that by adding just a single to the data sample was eccentric the point wholeclose pattern is data sample that the whole pattern is changing with all was foureccentric data samples becoming more changing in with all four dataeccentricity samples becoming more balanced terms of their and typicality balanced in terms of their eccentricity and (which typicality with higher normalized typicality of 0.286 is with notably higher normalized 0.286 (which is also higher thantypicality 1/k=1/4, of but not so promialso notably 1/k=1/4, but not so prominently now), higher for the than two inner samples, z2 = 12°C and nently now), foristhe twological inner for samples, z2 = 12°C and z4 = 17°C which quite these observations z4 = 17°C which is quite unlike logical iffor thesewere observations and for the UK climate these numbers and for the UK would climatehave unlike if these were numbers of bingo which been completely indepenof bingo which would been independent indeed. Note thathave we do notcompletely need to assume any dent indeed. or Note we do not need to assume distribution to that parameterize it a priori. We any can distribution or todistributions parameterizearound it a priori. We data can derive two local the two derive theabove two data samplestwo thatlocal havedistributions normalized around typicality 1/k, samples typicality above 1/k, namely z2that and have z4 butnormalized this would be knowledge extractnamely z2 and z4 but this would extracted/learned automatically from be theknowledge data. ed/learned automatically from thenow data.we have two This is quite logical because is quite logical because now weofhave veryThis simple groups/data clouds/clusters data two and very simple groups/data clouds/clusters of to data and modes of the distribution. 
It is important stress modes ofifthe It is important to stress that even it isdistribution. not strongly obvious the two modes of that even if it is not strongly obvious thedata twoautomatimodes of the distribution were derived from the the derived from thenot data automaticallydistribution (they both were are above 1/k = 1/2), assumed or cally (they both above 1/k = 1/2), notfor assumed or pre-defined! In are TEDA there is no need prior aspre-defined! Inthe TEDA there is no need prior assumptions. All useful information is for contained in sumptions. All the useful information is contained in the data distribution. the Finally, data distribution. let us consider a more realistic data stream Finally, letamount us consider a more realistic data stream with a larger of data: n14 = {20.2, 3, 6.4, 11.6, 8.2, {20.2, 3, 6.4,represent 11.6, 8.2, with11.2,5.2, a larger amount n14 =3.8} 2.2, 6.2,0.2, of 1,data: 4.8, 2.4, which 2.2, 11.2,5.2, 6.2,0.2, 1, 4.8, measured 2.4, 3.8} which represent the precipitation (rainfall) (in mm) at Filthe (rainfall) mm) at Filton precipitation station near Bristol, UKmeasured in the first(in two weeks of ton station near Bristol, UKthe in the first two weeks of January 2014 [11]. Due to larger amount (k=14) January 2014 [11]. Due the larger amount descri(k=14) of (still 1D, n=1) data wetoused the procedure of 1D,Algorithms n=1) data we used isthe procedure bed(still in the 1 which based on the descrisquare bed in the Algorithms 1 which is based on theand square Euclidean distance reEuclidean distance and recursive calculations. cursive calculations. It is clearly seen from the It isthat clearly the plots the seen high from amount plots that (over the high amount of rainfall 20 mm) on of (overDay 20 is mm) on therainfall New Years rather the New Years rather untypical. It is Day alsoisuntypiuntypical. 
It isfirst alsotwo untypical for these wecal these first weeks for of January 2014two to have eks January 2014 to have low oflevel of precipitation, low of precipitation, closelevel to 0. The most typical close to of 0. The most amount rainfall fortypical these amount of rainfall for these two weeks of January 2014 two weeks was 6.2 mm.of January 2014 wasEven 6.2 mm. with these exEven simplistic with these extremely (handtremely simplistic the (handcrafted) examples difcrafted) examples difference that TEDAthe brings ference that TEDA brings in comparison with the train comparison with the traditional probability theory, ditional probability theory, deterministic, possibilistic, deterministic, fuzzy and otherpossibilistic, representafuzzy and other representations is obvious. For examtions is obvious.probability For example, traditional ple, traditional probability theory would suggest equal theory suggest equal (1/3 orwould 1/4) probability for (1/3 or 1/4)(we probability for all samples also do not all samples (we also do not


Journal of Automation, Mobile Robotics & Intelligent Systems Journal of Automation, Mobile Robotics & Intelligent Systems

Fig. 3 Real rainfall data from Bristol, UK, first two weeks of January, 2014. The notations are same as in Figure 2 need to build histograms which provide no information about (completely ignore) the inter-sample influence. An alternative which is often (over)used is to impose/assume a distribution or a kernel, for example Gaussian/normal or another type, to determine its parameters (where to position it, how much will be the spread). To escape these problems, sometimes, one can also use a mixture of distributions, but then the parameters that need to be determined are even more and the problem is not fully solved as the distributions are approximations of the real/true ones. On the other hand, in comparison with the fuzzy set theory [7] we do not need to ask experts, to build membership functions. What we only need is to calculate the eccentricity and typicality of each observation and we get (recursively/computationally efficiently using (7)–(10)) in a closed analytical form (equations (2)– (3)) the distributions. Moreover, the information that we have in x and t (resp. z and t) is closer to the nature of real processes (not the pure random ones and not subjective ones, for which the traditional probability and fuzzy set theories, respectively are more appropriate). In particular, from Fig. 2 we can see that y2 = 12°C is the most typical data sample, but its degree of typicality is significantly reduced if another sample is added. On the contrary, the data sample/observation y1=20°C is rather eccentric initially but becomes neutral (both typicality and eccentricity are about 1/k) when we add the fourth sample. Now, if we try to answer the question, “What is the most typical or likely temperature (or amount of rainfall) based on the observations we have (y and z for the temperature and the real observations of the rainfall, v)”. TEDA suggests that based on 3 observations, y the most typical/likely temperature is 12°C (t=t=0.5 that is 50%). 
Based on the same limited number of observations we can also conclude that a temperature of 10°C is also not untypical (τ=t=0.4, that is 40%), but this is now much closer to 1/k=1/3. A temperature of 20°C is possible, but not very typical for England (τ=t=0.1, that is 10%, based on these three observations). If we have another observation, z4=17°C, however, the typicality changes significantly (t=0.2, that is 20%) because it is now based on a quite different (balanced) data pattern, but this is still below the 1/k level, which means that it is still untypical (but less so based on these four observations in comparison with the three observations only).

For the same observations the traditional probability theory [1] would suggest p=1/3 (the same for all 3 observations), or we would need to choose and parameterize distribution(s). However, the questions of how many distributions to consider, which type of distributions to use for particular data sets, and what their parameters should be are left for the problem solver to decide. Many prefer to approximate the real/true distributions with some smooth functions (such as Gaussian and others), but these are just approximations. The reason for the difference between TEDA and the traditional probability theory is the spatial awareness, which is ignored in the traditional probability theory but is a fact in real processes. For example, in the very simple example considered above, 2 data samples are quite close and influence each other. TEDA offers instead an automatic mechanism to extract the real/true data distributions in a closed analytical recursive form (which can be differentiated and analyzed); this form is dictated by the data pattern, not pre-defined or assumed. In addition, it takes into account the inter-sample influence which is typical for real processes (not purely random ones).
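The temperature figures quoted above can be reproduced with a few lines of Python. This is a minimal sketch that assumes the batch definitions from the earlier sections, which are not reproduced here: accumulated proximity π_j as the sum of Euclidean distances from sample j to all samples, eccentricity ξ_j = 2π_j/Σπ_i, typicality τ_j = 1 − ξ_j, and normalised typicality t_j = τ_j/(k − 2). The function name `typicality` is hypothetical.

```python
def typicality(samples):
    """Return (tau, t): typicality and normalised typicality per sample.

    Assumed definitions: pi_j = sum_i |x_j - x_i|, xi_j = 2*pi_j / sum(pi),
    tau_j = 1 - xi_j, t_j = tau_j / (k - 2). Needs k >= 3 samples that are
    not all identical.
    """
    k = len(samples)
    pi = [sum(abs(xj - xi) for xi in samples) for xj in samples]
    total = sum(pi)
    tau = [1.0 - 2.0 * p / total for p in pi]
    t = [v / (k - 2) for v in tau]
    return tau, t


three = [20.0, 12.0, 10.0]          # temperatures in deg C
tau3, t3 = typicality(three)
four = three + [17.0]               # add the fourth observation
tau4, t4 = typicality(four)

for temp, value in zip(three, t3):
    print(f"k=3: {temp:4.1f} C -> t = {value:.2f}")
print(f"k=4: 20.0 C -> t = {t4[0]:.2f} (1/k = {1 / len(four):.2f})")
```

Under these assumptions the three samples get normalised typicalities of 0.1, 0.5 and 0.4, and adding 17°C raises the value for 20°C to 0.2, which stays below 1/k = 0.25.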

4. Anomaly Detection Based on Eccentricity

Anomaly detection in TEDA can easily and intuitively be done on the basis of eccentricity. For example, any data sample that has high normalised eccentricity (ξ>1/k) is a suspected anomaly. Different algorithms can be developed and applied to image processing and video analytics, fault detection, user behaviour modelling, etc. One can take into account not only the absolute values of ξ and ζ, but also the context of the problem at hand. Eccentricity offers a new angle of view on these problems in comparison with the traditionally used probability, for the reasons mentioned above. For example, the eccentricity of the data sample/observation y1=20°C is much higher than that of the other data samples, but the probability of all samples is equal (1/3). No distributions or kernels need to be assumed and there is no need for a large amount of data; the distribution of typicality and eccentricity can be extracted from the data in a closed analytical form, (2)–(3), which is exact (not approximated) and recursively updated.

One emerging area of research and interest for society is the study of extreme natural and man-made (anthropogenic) events (including, but not limited to, climate, volcanoes, earthquakes, tsunami, nuclear and other disasters, terrorism, etc.). Traditional probability theory is limited in studying the probability of occurrence of such events, and this is further limited by the amount of available data, the representativeness of the ‘training data’ (which in such problems is a bottleneck), distributions which are not normal, etc. The TEDA framework offers a convenient approach not only to detect anomalies easily, but also to estimate the degree of severity (how much bigger ζ is in comparison with 1/k and, respectively, how much smaller τ is in comparison with 1/k).
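The thresholding rule described in this section can be sketched as follows. The sketch assumes the normalised eccentricity is ζ_j = ξ_j/2 = π_j/Σπ_i (so that it sums to 1 over the data set) with Euclidean distance; the function names and the ratio used as a severity score are illustrative choices, not part of the framework as stated.

```python
def normalised_eccentricity(samples):
    """zeta_j = pi_j / sum(pi), i.e. xi_j / 2 with xi_j = 2*pi_j / sum(pi)."""
    pi = [sum(abs(xj - xi) for xi in samples) for xj in samples]
    total = sum(pi)
    return [p / total for p in pi]


def suspected_anomalies(samples):
    """Return (sample, severity) pairs where zeta exceeds 1/k.

    Severity is reported as the ratio zeta / (1/k); this particular
    severity measure is an illustrative choice.
    """
    k = len(samples)
    zeta = normalised_eccentricity(samples)
    return [(x, z * k) for x, z in zip(samples, zeta) if z > 1.0 / k]


temperatures = [20.0, 12.0, 10.0]
flagged = suspected_anomalies(temperatures)
print(flagged)
```

For the three temperature samples only 20°C is flagged, since its ζ of 0.45 exceeds 1/k ≈ 0.33 while the other two samples stay below it.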

5. Clustering and Data Clouds Based on Typicality

Clustering is an important part of pattern recognition, machine learning, data mining [1] and many other related areas, including autonomous learning systems [3]. The term “data cloud” was introduced in the so-called AnYa framework [8]; data clouds differ from clusters in that they have no specific shape, parameters, or boundaries. In TEDA, data clouds (or clusters, if that is the preferred form of data partitioning) can be formed on the basis of the typicality. For example, it is logical to select the data sample that has the highest τ as the focal point/prototype or centre of the first data cloud (or cluster). There can be different ways to form the other data clouds (or clusters), but their focal points/prototypes (or centres) will also have high typicality (e.g. it is logical to require τ>1/k). For example, a zone of influence/radius can be defined. The data points that are outside the zone of influence/radius of the data point with the maximum τ (τmax) and have τ>1/k can be considered as candidates to be prototypes/focal points of the next data clouds/clusters. Among these candidates only (that is, excluding the data points that fall in the zone of influence associated with the previous focal point(s)), the point with the maximum τ is the obvious new focal point, and so on for as long as there are points that satisfy these conditions.

It is important to stress that within the TEDA framework one can extract automatically and recursively closed-form analytical expressions of the real/true distributions of local typicality and eccentricity, with the former resembling membership functions or pdfs but being conceptually different (arguably richer, because it objectively takes into account both the frequency of occurrence and the spatial distribution and mutual influences). For example, if we have data from two clusters/data clouds, say coloured blue and red, we can automatically and recursively extract from the real data distribution ξred, τred and ξblue, τblue.
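The focal point selection described above can be sketched as a greedy pass over the samples in order of decreasing typicality. This is only one possible reading of the procedure: it uses scalar data with Euclidean distance, compares the normalised typicality t with 1/k (an assumption, so that the threshold and the quantity are on the same scale), and the names `select_focal_points` and `radius` are hypothetical.

```python
def normalised_typicality(samples):
    """t_j = (1 - 2*pi_j / sum(pi)) / (k - 2), with pi_j = sum_i |x_j - x_i|."""
    k = len(samples)
    pi = [sum(abs(xj - xi) for xi in samples) for xj in samples]
    total = sum(pi)
    return [(1.0 - 2.0 * p / total) / (k - 2) for p in pi]


def select_focal_points(samples, radius):
    """Greedily pick focal points by typicality.

    The first focal point is the most typical sample. Each further one is
    the most typical sample with t > 1/k lying outside the zone of
    influence (|x - focal| <= radius) of every focal point chosen so far.
    """
    k = len(samples)
    t = normalised_typicality(samples)
    order = sorted(range(k), key=lambda j: t[j], reverse=True)
    focal = [samples[order[0]]]
    for j in order[1:]:
        if t[j] <= 1.0 / k:
            break  # remaining samples are even less typical
        if all(abs(samples[j] - f) > radius for f in focal):
            focal.append(samples[j])
    return focal


data = [1.0, 1.2, 0.9, 5.0, 5.1, 4.9]   # two visibly separate groups
focal_points = select_focal_points(data, radius=1.0)
print(sorted(focal_points))
```

For this toy data the sketch returns one focal point per group (1.2 and 4.9). Because global typicality is used, the selected points are the members of each group that lie closest to the rest of the data.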
In an online and evolving scenario, only the accumulated values per data cloud/cluster need to be kept in memory (not the values for each data sample; see steps 2 and 4 of the TEDA procedure, Algorithm 1 in section 2). These include $x_{i^*}$, $\mu_{i^*}$, $X_{i^*}$ and $\sum_{i=1}^{k} \pi_i^k$, where i* denotes the index of the data cloud/cluster prototype/focal point. The typicality and eccentricity can be updated for the current, kth data point only, plus for the data cloud/cluster prototypes/centres, not for all past k data (see steps 3 and 5 of the procedure).

An important aspect is the dynamic nature of the data streams and their order dependency. One can choose to have a forgetting factor or mechanism to introduce the importance of the time instances when a particular data sample was read. This is important for data streams.
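The memory argument above can be made concrete for one special case. The sketch below assumes squared Euclidean distance, for which the batch definition ξ_j = 2π_j/Σπ_i reduces algebraically to ξ(x) = 1/k + (μ − x)²/(kσ²) with σ² = X − μ², so that only the running mean μ and the running mean square X have to be stored. The class name `RecursiveTeda` and the scalar data are illustrative assumptions, and the recursions are not claimed to be identical to equations (7)–(10).

```python
class RecursiveTeda:
    """Keeps only k, the mean mu and the mean squared value X (1-D data)."""

    def __init__(self):
        self.k = 0
        self.mu = 0.0
        self.X = 0.0

    def update(self, x):
        """Fold one new sample into the accumulated statistics."""
        self.k += 1
        w = 1.0 / self.k
        self.mu = (1.0 - w) * self.mu + w * x
        self.X = (1.0 - w) * self.X + w * x * x

    def eccentricity(self, x):
        """xi(x) = 1/k + (mu - x)^2 / (k * sigma^2), sigma^2 = X - mu^2."""
        var = self.X - self.mu ** 2
        return 1.0 / self.k + (self.mu - x) ** 2 / (self.k * var)


def batch_eccentricity(samples):
    """Reference: xi_j = 2*pi_j / sum(pi) with squared Euclidean distance."""
    pi = [sum((xj - xi) ** 2 for xi in samples) for xj in samples]
    total = sum(pi)
    return [2.0 * p / total for p in pi]


stream = [20.0, 12.0, 10.0, 17.0]
model = RecursiveTeda()
for value in stream:
    model.update(value)

recursive = [model.eccentricity(v) for v in stream]
batch = batch_eccentricity(stream)
print([round(v, 4) for v in recursive])
```

The recursive and batch values agree up to rounding, while the recursive version never revisits past samples. A forgetting factor could be introduced by replacing the weight `w` with a fixed constant, at the cost of no longer matching the batch result.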

6. Classification Based on Typicality

Classification is another central element of pattern recognition, machine learning, and data mining [1]. Within the TEDA framework classification can be done using local (instead of global) values for ξ, τ, ζ, and t. The main difference between the global and local expressions is the summation limits, i.e. the data samples over which the summation is performed. In the global case, it is performed over all k data samples available at this moment in time. In the local case, the summation is over a group of data samples from a particular class or data cloud/cluster (in general, there may be more than one data cloud/cluster per class [3]). Say we have data for healthy and ill patients, or good and bad examples of something; we can accumulate data samples/observations for each group separately and in this way get ξgood, τgood and ξbad, τbad. Classifiers of zero, first or higher order can then be built similarly to the AutoClass concept described in [3, chapter 8]. A zero order classifier uses the label of the class (singleton) as an output. A first order classifier is a regression-style classifier in which a separate linear regression function over the input features is generated for each data cloud/cluster. Higher order classifiers may have a non-linear output. In all cases the input part of the classifier can be seen as clustering or forming data clouds (or simply data partitioning), as described in the previous section. Zero order classifiers are more attractive in that they are easier to interpret and can be fully unsupervised [9]. First order classifiers, on the other hand, can lead to better performance [10].
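As a rough illustration of the local (per-class) quantities, the sketch below keeps accumulated statistics for each class and assigns a new sample to the class in which it would be most typical. The decision rule, the tentative-insertion trick, the squared Euclidean distance and the names `LocalStats` and `classify` are all assumptions made for this example; it is not the AutoClass algorithm of [3].

```python
class LocalStats:
    """Accumulated mean and mean square for one class (1-D data)."""

    def __init__(self, samples):
        self.k = len(samples)
        self.mu = sum(samples) / self.k
        self.X = sum(x * x for x in samples) / self.k

    def typicality_if_added(self, x):
        """Local typicality tau of x after tentatively adding it to the class."""
        k = self.k + 1
        mu = (self.k * self.mu + x) / k
        X = (self.k * self.X + x * x) / k
        var = X - mu * mu
        xi = 1.0 / k + (mu - x) ** 2 / (k * var)
        return 1.0 - xi


def classify(x, classes):
    """Zero order rule (assumed): return the label with the highest local tau."""
    return max(classes, key=lambda label: classes[label].typicality_if_added(x))


classes = {
    "good": LocalStats([1.0, 1.2, 0.9, 1.1]),
    "bad": LocalStats([5.0, 5.2, 4.8, 5.1]),
}
print(classify(1.05, classes), classify(4.9, classes))
```

The output of the rule is the class label itself, which corresponds to the zero order (singleton) case; a first order variant would replace the label with a per-class linear model.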

7. Prediction (and Control) Based on Typicality

Predictive models and controllers can be built using the multi-model principle, where each local sub-model is quite simple (e.g. linear or even a zero order singleton) [3]. The problem then translates to the decomposition of the data space into (possibly overlapping) local regions, in which the regimes and local behaviours are easier to define and tune. It has been demonstrated in [3] that this decomposition problem can be addressed successfully using clustering (or forming data clouds). Clustering was described earlier in section 5; therefore, due to the lack of space, in this paper we limit ourselves to pointing out the applicability of the TEDA framework to the development and design of new predictors and controllers which are not based on the traditional probability theory and, thus, do not suffer from the limitations and assumptions on which it is based, e.g. normal or known distribution of the variables, (in)dependence of the data samples/observations, their limited (not infinite) amount, etc. Moreover, the proposed TEDA framework makes it possible to extract recursively, in a closed analytical form, the exact distributions from the real data.

8. Conclusions

In this paper, a new systematic framework for data analytics, based on the typicality and eccentricity of the data, is proposed which is spatially-aware, non-frequentist and non-parametric. The proposed new typicality- and eccentricity-based data analytics (TEDA) framework is free from prior assumptions (such as Gaussian or any other specific distribution of the data, the need to have subjectively defined membership functions, kernels, specific proximity/distance measures, availability of an infinite amount of data, independence of the data samples, etc.). Both typicality and eccentricity can be calculated by computationally efficient recursive formulas. Typicality resembles fuzzy membership functions (having maximum 1) but is objectively derived from the data pattern (not due to prior assumptions). Normalised typicality resembles a pdf (having a sum/integral equal to 1) but is spatially-aware. TEDA represents in a more natural manner the real (not purely random) processes, such as climate, economics, and industrial processes. It provides closed analytical expressions and extracts multimodal distributions entirely from the data. It can be used for the development of a range of methods for anomaly and fault detection, image processing, clustering, classification, prediction, control, filtering, regression, etc. In this paper, due to space limitations, only a few illustrative examples are provided, aiming at a proof of concept.

AUTHOR
Plamen Angelov – Intelligent Systems Research Lab, School of Computing and Communications, Lancaster University, LA1 4WA, UK.
E-mail: p.angelov@lancaster.ac.uk

REFERENCES
1. C. Bishop, Machine Learning and Pattern Classification, Springer, 2009.
2. D. Osherson, E. E. Smith, “Discussion: On typicality and vagueness”, Cognition, vol. 64, 1997, pp. 189–206.
3. P. Angelov, Autonomous Learning Systems: From Data Streams to Knowledge in Real time, John Wiley, Dec. 2012, ISBN: 978-1-1199-5152-0. DOI: http://dx.doi.org/10.1002/9781118481769
4. P. Angelov, Anomalous System State Identification, GB1208542.9 patent application, priority date, 15 May 2012.
5. J. Iglesias et al., “Creating evolving user behavior profiles automatically”, IEEE Trans. on Knowledge Data Engineering, vol. 24, no. 5, May 2012, pp. 854–867. DOI: http://dx.doi.org/10.1109/TKDE.2011.17
6. D. Kolev et al., “ARFA: Automated Real-time Flight Data Analysis using Evolving Clustering, Classifiers and Recursive Density Estimation”. In: Proc. IEEE Symposium Series on Computational Intelligence, SSCI’2013, 16–19 April 2013, Singapore, ISBN 978-1-4673-5855-2/13, pp. 91–97. DOI: http://dx.doi.org/10.1109/EAIS.2013.6604110
7. L. A. Zadeh, “Fuzzy sets”, Information and Control, vol. 8, no. 3, 1965, pp. 338–353. DOI: http://dx.doi.org/10.1016/S0019-9958(65)90241-X
8. P. Angelov, R. Yager, “A New Type of Simplified Fuzzy Rule-based Systems”, International Journal of General Systems, vol. 41, no. 2, pp. 163–185, Jan. 2012. DOI: http://dx.doi.org/10.1080/03081079.2011.634807
9. B. S. J. Costa, P. P. Angelov, L. A. Guedes, “Fully Unsupervised Fault Detection and Identification Based on Recursive Density Estimation and Self-evolving Cloud-based Classifier”, Neurocomputing, 2014, to appear.
10. P. Angelov, et al., “Symbol Recognition with a new Autonomously Evolving Classifier AutoClass”, 2014 IEEE Conference on Evolving and Adaptive Intelligent Systems, EAIS-2014, 2–4 June, 2014, Linz, Austria, to appear.
11. http://www.martynhicks.co.uk/weather/data.php?page=m01y2014, accessed 6 February 2014
12. P. Angelov, “Fuzzily Connected Multi-Model Systems Evolving Autonomously from Data Streams”, IEEE Transactions on Systems, Man, and Cybernetics – part B, Cybernetics, vol. 41, no. 4, August 2011, pp. 898–910.



Arm Manipulator Position Control Based On Multi-Input Multi-Output PID Strategy

Submitted: 3rd June 2013; accepted: 27th January 2014

Fatima Zahra Baghli, Larbi El Bakkali, Yassine Lakhal, Abdelfatah Nasri, Brahim Gasbaoui

DOI 10.14313/JAMRIS_2-2014/17

Abstract: A robot manipulator is a multi-articulated mechanical system in which each articulation is driven individually by an electric actuator. As the most widely used robot in industrial applications, this system needs an efficient control strategy. The usual choice is the classical PID control law, by means of which each articulation is controlled independently. This kind of control has several drawbacks; for example, the error of each articulation is not taken into account in the others. In this work we present a Multi-Input Multi-Output (MIMO) PID controller for the robot articulations. The results obtained are satisfactory and clearly show the efficiency of the proposed PID-MIMO controller.

Keywords: robot, articulation, PID, control, MIMO

1. Introduction


The use of robot arm manipulators in industry has grown slightly in recent years. Standard industrial controllers do not account for the non-linearities between the joints of the robot, and the problems of modelling the dynamics and controlling the motion are complicated by the computation of the inertia tensor of a moving rigid body [1, 4]. The purpose of robot arm control is to maintain the dynamic response of a computer-based manipulator in accordance with some pre-specified system performance and goals. Most robot manipulators are driven by electric, hydraulic, or pneumatic actuators, which apply torques (or forces, in the case of linear actuators) at the joints of the robot [1, 4, 12]. Conventional robot control methods depend heavily upon accurate mathematical modelling, analysis, and synthesis [4]. Dynamic modelling of manipulators is a very active field of research; it can be used to investigate the system responses and system properties, such as the stability of the system [11].

Robot control is the spine of robotics. It consists in studying how to make a robot manipulator do automatically what it is desired to do; hence, it includes designing robot controllers. Typically, these take the form of an equation or an algorithm which is realized via specialized computer programs. Controllers then form part of the so-called robot control system, which is physically constituted of a computer, a data acquisition unit, actuators (typically electrical motors), the robot itself and some extra “electronics”. In this work two types of control problems were studied: feed-forward control and computed torque control [11, 13].

In this work, after the system modelling, the simulation and control of a robot manipulator with two articulations were carried out using MATLAB/Simulink, with the proposed MIMO PID controller used to improve the stability of the robot articulations. Two types of control, PID and PID-MIMO, were studied and analysed, and a comparative study was made. The remainder of the paper is structured as follows: the robot modelling is presented in the second part, the PID-MIMO controller is detailed in the third part, the results are discussed in the last part, and finally a conclusion is given.

2. Robot Description

An industrial robot is defined by ISO as an automatically controlled, reprogrammable, multipurpose manipulator programmable in three or more axes. The field of robotics may be more practically defined as the study, design and use of robot systems for manufacturing (a top-level definition relying on the prior definition of robot). Typical applications of robots include welding, painting, assembly, pick and place (such as packaging and palletizing), product inspection, and testing, all accomplished with high endurance, speed, and precision.

Fig. 1. Pick and Place Robot manipulator

Motion control: for some applications, such as simple pick-and-place assembly, the robot needs merely to return repeatedly to a limited number of pre-taught positions. For more sophisticated applications, such as welding and finishing (spray painting), motion must be continuously controlled to follow a path in space, with controlled orientation and velocity. Most robot manipulators employed in industry are controlled by PID algorithms independently at each joint. This kind of control has several drawbacks; for instance, the error of each articulation is not taken into consideration by the others. This can affect the performance of the system, such as the precision, the speed and the quality of the product. In this work we therefore propose a new control approach to resolve this problem.



3. Robot Dynamic Modelling

The dynamical equation of a manipulator robot made of n rigid bodies articulated with one another is obtained by the Lagrange method [12]:

$$\tau = M(q)\ddot{q} + C(q,\dot{q}) + G(q) \tag{1}$$

where $q, \dot{q}, \ddot{q} \in \mathbb{R}^{n}$ denote the joint angle, the joint velocity and the joint acceleration, $M(q) \in \mathbb{R}^{n \times n}$ denotes the inertia matrix, $C(q,\dot{q}) \in \mathbb{R}^{n}$ denotes the centrifugal and Coriolis force vector, $G(q) \in \mathbb{R}^{n}$ denotes the gravitational force vector and $\tau(t)$ is the torque. A schema of a two-degree-of-freedom arm manipulator is given in Figure 2.

Fig. 2. Structure of manipulator robot of two degree of freedom (frames x0, y0 and x1, y1; link lengths a1, a2; joint angles q1, q2; gravity g)

The robot dynamics is defined as:

$$M(q) = \begin{bmatrix} (m_1+m_2)a_1^2 + m_2a_2^2 + 2m_2a_1a_2c_2 & m_2a_2^2 + m_2a_1a_2c_2 \\ m_2a_2^2 + m_2a_1a_2c_2 & m_2a_2^2 \end{bmatrix} \tag{2}$$

$$C(q,\dot{q}) = \begin{bmatrix} -m_2a_1a_2\,(2\dot{q}_1\dot{q}_2 + \dot{q}_1^2)\,s_2 \\ m_2a_1a_2\,\dot{q}_1^2\,s_2 \end{bmatrix} \tag{3}$$

$$G(q) = \begin{bmatrix} (m_1+m_2)\,g\,a_1c_1 + m_2\,g\,a_2c_{12} \\ m_2\,g\,a_2c_{12} \end{bmatrix} \tag{4}$$

$$\tau = \begin{bmatrix} \tau_1 \\ \tau_2 \end{bmatrix} \tag{5}$$

with $c_1 = \cos(q_1)$, $c_2 = \cos(q_2)$, $s_1 = \sin(q_1)$, $s_2 = \sin(q_2)$, $c_{12} = \cos(q_1+q_2)$, $s_{12} = \sin(q_1+q_2)$.

Table 1 presents the robot manipulator parameters used in the simulation.

Tab. 1. Used Robot Parameters

         Mass m_i (kg)    Length a_i (m)
Arm 1    0.432            1.2
Arm 2    0.432            1.5

4. Control Law Used (PID MIMO)

Generally, a classical PID controller with each articulation controlled independently is given by the following formulas [6, 7, 8, 10]. The classical PID control law of the first articulation is given by:

$$\tau_1(t) = K_{p1}\,\varepsilon_1(t) + K_{d1}\,\frac{d\varepsilon_1(t)}{dt} + \frac{1}{K_{i1}}\int \varepsilon_1(t)\,dt \tag{6}$$

The classical PID control law of the second articulation is given by:

$$\tau_2(t) = K_{p2}\,\varepsilon_2(t) + K_{d2}\,\frac{d\varepsilon_2(t)}{dt} + \frac{1}{K_{i2}}\int \varepsilon_2(t)\,dt \tag{7}$$

where $\varepsilon_1$ and $\varepsilon_2$ are the position errors of each articulation controlled independently. The multivariable PID (MIMO-PID) controller of the two motors is given by the following formula:

$$\tau_i(t) = [K_{pi}]\,\varepsilon_i(t) + [K_{di}]\,\frac{d\varepsilon_i(t)}{dt} + [K_{ii}]\int \varepsilon_i(t)\,dt \tag{8}$$

where the terms Kpi, Kii, and Kdi define:
* The proportional term: providing an overall control action proportional to the error signal through the all-pass gain factor [8, 9, 10].
* The integral term: reducing steady state errors through low frequency compensation by an integrator.
* The derivative term: improving transient response through high frequency compensation by a differentiator.

Here $\varepsilon_i = q_{di} - q_i$ (i = 1, 2) represents the error signal, $q_{di}$ is the input reference signal and Kpi, Kii, Kdi are respectively the proportional, integral and derivative gains. The MIMO PID parameters were computed by trial and error, and our controller has two important considerations: the position references of the two articulations, and the fact that the second articulation takes into account the position error of the first articulation. Our PID-MIMO parameters are given in Table 2.

Tab. 2. The PID-MIMO Parameters

      Articulation 1    Articulation 2
Kp    1200              1200
Ki    2500              2500
Kd    400               400
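To make the model in equations (1)–(4) concrete, here is a minimal Python sketch of the two-link dynamics with the parameters of Table 1. The function names, the value g = 9.81 m/s² and the use of plain lists are assumptions for illustration. For the centrifugal/Coriolis vector the sketch uses the textbook two-link form, whose first row contains the square of the second joint velocity where equation (3) as printed has the square of the first; swap that term to follow (3) literally.

```python
import math

M1, M2 = 0.432, 0.432   # link masses [kg], Table 1
A1, A2 = 1.2, 1.5       # link lengths [m], Table 1
G0 = 9.81               # gravitational acceleration [m/s^2] (assumed value)


def inertia(q):
    """Inertia matrix M(q), equation (2)."""
    c2 = math.cos(q[1])
    m12 = M2 * A2**2 + M2 * A1 * A2 * c2
    m11 = (M1 + M2) * A1**2 + M2 * A2**2 + 2 * M2 * A1 * A2 * c2
    return [[m11, m12], [m12, M2 * A2**2]]


def coriolis(q, qd):
    """Centrifugal/Coriolis vector, textbook form (qd[1]**2 in the first row)."""
    s2 = math.sin(q[1])
    h = M2 * A1 * A2 * s2
    return [-h * (2 * qd[0] * qd[1] + qd[1] ** 2), h * qd[0] ** 2]


def gravity(q):
    """Gravity vector G(q), equation (4)."""
    c1, c12 = math.cos(q[0]), math.cos(q[0] + q[1])
    g2 = M2 * G0 * A2 * c12
    return [(M1 + M2) * G0 * A1 * c1 + g2, g2]


def inverse_dynamics(q, qd, qdd):
    """Torque from equation (1): tau = M(q) qdd + C(q, qd) + G(q)."""
    m, c, g = inertia(q), coriolis(q, qd), gravity(q)
    return [m[i][0] * qdd[0] + m[i][1] * qdd[1] + c[i] + g[i] for i in range(2)]


def forward_dynamics(q, qd, tau):
    """Joint accelerations: solve M(q) qdd = tau - C - G for the 2x2 case."""
    m, c, g = inertia(q), coriolis(q, qd), gravity(q)
    r = [tau[i] - c[i] - g[i] for i in range(2)]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [(m[1][1] * r[0] - m[0][1] * r[1]) / det,
            (m[0][0] * r[1] - m[1][0] * r[0]) / det]


q, qd = [0.3, -0.5], [0.4, 0.2]
tau = inverse_dynamics(q, qd, [1.0, -2.0])
acc = forward_dynamics(q, qd, tau)
print(acc)   # approximately [1.0, -2.0], the accelerations passed in
```

The round trip through `inverse_dynamics` and `forward_dynamics` is a simple consistency check; in a closed-loop simulation `forward_dynamics` would be integrated over time with the torque supplied by the controller.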

5. Simulation Results

SISO control based on the classical PID model and MIMO control based on the multi-input multi-output PID were tested on a sinusoidal reference trajectory. The simulation, applied to a two-degree-of-freedom robot arm, was implemented in Matlab/Simulink. Trajectory performance, torque performance and position error are compared for the two controllers.
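The two control structures compared here differ only in whether the gains couple the joints. As a rough illustration, the sketch below implements a discrete-time version of the law in equation (8) with 2×2 gain matrices: diagonal matrices give two independent PID loops, while a non-zero off-diagonal entry lets the second joint react to the error of the first. The diagonal gains are those of Table 2; the cross-coupling value 300.0, the sample time, the rectangular integration and the backward-difference derivative are all assumptions, since the paper does not state them.

```python
class MimoPid:
    """Discrete PID with 2x2 gain matrices acting on the joint error vector."""

    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = [0.0, 0.0]
        self.previous = None  # no derivative action on the first sample

    @staticmethod
    def _matvec(m, v):
        return [m[i][0] * v[0] + m[i][1] * v[1] for i in range(2)]

    def step(self, reference, measured):
        """Return the torque vector for one sample of the error q_d - q."""
        error = [r - y for r, y in zip(reference, measured)]
        self.integral = [s + e * self.dt for s, e in zip(self.integral, error)]
        if self.previous is None:
            derivative = [0.0, 0.0]
        else:
            derivative = [(e - p) / self.dt for e, p in zip(error, self.previous)]
        self.previous = error
        p_term = self._matvec(self.kp, error)
        i_term = self._matvec(self.ki, self.integral)
        d_term = self._matvec(self.kd, derivative)
        return [p + i + d for p, i, d in zip(p_term, i_term, d_term)]


def diag(a, b, cross=0.0):
    """2x2 gain matrix; `cross` feeds the joint-1 signal into joint 2."""
    return [[a, 0.0], [cross, b]]


DT = 0.001  # sample time [s], assumed
siso = MimoPid(diag(1200, 1200), diag(2500, 2500), diag(400, 400), DT)
mimo = MimoPid(diag(1200, 1200, cross=300.0), diag(2500, 2500), diag(400, 400), DT)

reference, measured = [0.1, 0.0], [0.0, 0.0]   # only joint 1 has an error
tau_siso = siso.step(reference, measured)
tau_mimo = mimo.step(reference, measured)
print(tau_siso, tau_mimo)
```

With an error on the first joint only, the independent controller leaves the second torque at zero, whereas the coupled controller already acts on the second joint. This is the behaviour the text describes when it says that the second articulation takes the error of the first into account.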

Fig. 3. Arm manipulator robot classical PID Control (block diagram: references q1*, q2*; errors ε1(t), ε2(t); controllers PID1, PID2; torques τ1(t), τ2(t); arm manipulator outputs q1, q2)

Fig. 4. Arm manipulator robot PID-MIMO Control (block diagram: references q1*, q2*; errors ε1(t), ε2(t); PID MIMO controller; torques τ1MIMO(t), τ2MIMO(t); arm manipulator outputs q1, q2)

The trajectory performances: Figures 5–8 show the tracking performance of the first and second arm (link) with PID and PID-MIMO for sinusoidal trajectories. Comparing the two controllers: for the first link controlled by PID, the output does not coincide with the reference (Fig. 5), but with PID-MIMO they coincide, as shown in Fig. 6. The overshoot of PID (3%) is higher than that of PID-MIMO (0%).


For the second link controlled by PID, the output does not coincide with the reference (Fig. 7), but with PID-MIMO they coincide after t = 2 s, as shown in Fig. 8.

Fig. 5. PID (First link trajectory): angle [rad] versus time [sec], reference and output

Fig. 6. PID-MIMO (First link trajectory): angle [rad] versus time [sec], reference and output

Fig. 7. PID (Second link trajectory): angle [rad] versus time [sec], reference and output

Fig. 8. PID MIMO (Second link trajectory): angle [rad] versus time [sec], reference and output


Journal of Automation, Mobile Robotics & Intelligent Systems

VOLUME 8,

Error comparison: Figures 9 and 10 show the position error of the first and second links. For both links, the error with PID is higher than with PID-MIMO.

Fig. 9. PID and PID-MIMO for the first link compared error (panel title: First link steady state error; error [rad] versus time [sec])

Fig. 10. PID and PID-MIMO for the second link compared error (panel title: Second link steady state error; error [rad] versus time [sec])

All the obtained results are summarized in Table 3:
Tab. 3. PID and PID-MIMO results

Controller             PID                 PID-MIMO
Links                  Link1     Link2     Link1     Link2
Position error [rad]   0.519     0.265     0.045     0.014
Overshoot [%]          3%                  0%        0%
Torque [Nm]            1108      315.5     170.5     238

6. Conclusion
In this work, a two-degree-of-freedom arm manipulator robot was controlled using two control strategies: SISO control based on the classical PID model, and MIMO control based on a multi-input multi-output PID controller (PID-MIMO). In the simulations the PID-MIMO controller was the more efficient of the two, giving better position stability and good dynamic performance with no overshoot. These results suggest that the developed control model deserves consideration in the design of future two-degree-of-freedom robots.

AUTHORS
Fatima Zahra Baghli*, Larbi El Bakkali, Yassine Lakhal – Laboratory Modeling and Simulation of Mechanical Systems, Abdelmalek Essaadi, Faculty of Sciences, BP.2121, M’hannech, 93002, Tétouan, Morocco. E-mail: baghli.fatimazahra@gmail.com. E-mail: larbi_elbakkali_20@hotmail.com.
Abdelfatah Nasri, Brahim Gasbaoui – Bechar University, B.P 417 Bechar, 08000, Algeria. E-mail: nasriab1978@yahoo.fr. E-mail: z.gasbaoui@yahoo.com.
*Corresponding author

References
[1] Kelly R., Santibáñez V., Loría A., Control of Robot Manipulators in Joint Space, 2nd Edition, Springer, 2005.
[2] Johnson M.A., Moradi M.H., PID Control: New Identification and Design Methods, Springer, 2005.
[3] Volos C.K., “Motion direction control of a robot based on chaotic synchronization”, JAMRIS, vol. 7, no. 2, 2013, 64–69.
[4] David I., Robles G., “PID control dynamics of a Robotics arm manipulator with two degrees of Freedom”, Control de Processos y Robotica, 17th August 2012, 3–7.
[5] Nasri A., Gasbaoui B., “A novel multi-drive electric vehicle system control based on multi-input multi-output PID controller”, Serbian Journal of Electrical Engineering, vol. 9, no. 2, June 2012, 275–291.
[6] Ang K.H., Chong G., Li Y., “PID control system analysis, design and technology”, IEEE Transactions on Control Systems Technology, vol. 13, no. 4, 2005, 559–577.
[7] Astrom K.J., Hagglund T., PID controllers: theory, design, and tuning, 2nd ed., Instrument Society of America, 1995.
[8] Wang J.S., Zhang Y., Wang W., “Optimal design of PI/PD controller for non-minimum phase system”, Transactions of the Institute of Measurement and Control, vol. 28, no. 1, 2006, 27–35. DOI: http://dx.doi.org/10.1191/0142331206tm160oa
[9] Bingul Z., “A new PID tuning technique using differential evolution for unstable and integrating processes with time delay”, ICONIP 2004 Proceedings, Lecture Notes in Computer Science, 3316, 254–260.
[10] Allaoua B., Laoufi A., Gasbaoui B., “Multi-Drive Paper System Control Based on Multi-Input Multi-Output PID Controller”, Leonardo Journal of Sciences, Issue 16, January–June 2010, 59–70.
[11] Spong M.W., Hutchinson S., Vidyasagar M., Robot dynamics and Control, 2nd ed., January 2004.
[12] Baghli F., Lakhal Y., El bakkali L., “Contrôle dynamique d’un bras manipulateur à deux dégrés de liberté par un contrôleur PID”. In: 11ème Congrès international de Mécanique, Agadir, 23–26 April 2013.
[13] Turki Hussein M., Simulation of Robot.

A New Heuristic Possibilistic Clustering Algorithm for Feature Selection
Submitted: 14th September 2013; accepted 10th December 2013

Janusz Kacprzyk, Jan W. Owsiński, Dimitri A. Viattchenin
DOI 10.14313/JAMRIS_2-2014/18
Abstract: The paper deals with the problem of selecting the most informative features. A new effective and efficient heuristic possibilistic clustering algorithm for feature selection is proposed. First, a brief description of the basic concepts of the heuristic approach to possibilistic clustering is provided. A technique of initial data preprocessing is described and a fuzzy correlation measure is considered. The new algorithm is described and then illustrated on the well-known Iris data set benchmark, and the results obtained are compared with those obtained using the conventional, well-known and widely employed method of principal component analysis (PCA). Conclusions and suggestions for future research are given.
Keywords: feature selection, fuzzy correlation measure, possibilistic clustering, heuristic possibilistic clustering, fuzzy cluster

1. Introduction

1.1. Preliminary Remarks

The reduction of the dimensionality of the analyzed feature space is a very important problem in data analysis. Feature selection is meant here as the dimensionality reduction of the feature space of data that initially contained a high number of features. The purpose of the feature selection process is to choose a minimal subset of the original set of features which still contains the information essential for discovering the substructure in the data, while reducing the computational complexity implied by using a high number of features in the source problem formulation. Feature selection has been a fertile field of research and has been developed since the 1970s, proving to be effective and efficient in removing irrelevant and/or redundant features, increasing the efficiency of learning, improving the learning performance characterized by, for instance, predictive accuracy, and enhancing the comprehensibility of the results obtained. Many different feature selection methods have been proposed, cf. for example [1], [2], [3], and [4]. Fuzzy clustering methods can well be applied to solve the problem of feature selection. In particular, a combination of feature selection with feature weights and semi-supervised fuzzy clustering in machine learning is proposed by Kong and Wang [5]. On the other hand, a fuzzy feature selection method based on clustering was proposed by Chitsaz, Taheri, and Katebi [6]. In the corresponding FACA-algorithm, each feature is assigned to different fuzzy clusters with different grades of membership. This comes from the basic underlying idea that each feature may belong to more than one cluster, and that it is much better to consider the association of each feature with the other features in each cluster. Precise relations between features are therefore available during the selection of the most relevant features. An extension of the FACA-algorithm is considered by Chitsaz, Taheri, Katebi and Jahromi [7], who have introduced four different techniques for implementing the stage of feature selection. For example, by applying the chi-square test, their approach considers the dependence of each feature on class labels in the process of feature selection.

1.2. A Heuristic Approach to Possibilistic Clustering

The objective function based fuzzy clustering algorithms are the most widely employed methods in fuzzy clustering (cf. Bezdek, Keller, Krishnapuram and Pal [8]). Some heuristic clustering algorithms are based on the definition of the very concept of a cluster, and the purpose of these algorithms is to find clusters according to how they have been defined. Such algorithms are called direct classification (or clustering) algorithms (cf. Mandel [9]). An outline of a new heuristic method of fuzzy clustering is presented by Viattchenin [10], who has considered a basic version of a direct clustering algorithm, while a version of such an algorithm, called the D-AFC(c)-algorithm, is presented in Viattchenin [11]. The D-AFC(c)-algorithm can be considered as a direct possibilistic clustering algorithm, as in [11]. The D-AFC(c)-algorithm has been shown there to be a basis for a family of other heuristic possibilistic clustering algorithms. The direct heuristic possibilistic clustering algorithms can be divided into two types: relational and prototype-based. In particular, the family of direct relational heuristic possibilistic clustering algorithms includes:
– The D-AFC(c)-algorithm, which is based on the construction of an allotment (to be defined later on in this paper) among an a priori given number c of partially separate fuzzy clusters [10];
– The D-AFC-PS(c)-algorithm, which is based on the construction of an allotment among an a priori given number c of partially separate fuzzy clusters in the presence of labeled objects [11];
– The D-PAFC-algorithm, which is based on the construction of an allotment among an unknown number of at least c fully separate fuzzy clusters [12].
It should be noted that the D-PAFC-algorithm can be applied to solve the problem of informative feature selection. The corresponding method was also proposed by Viattchenin [12]. On the other hand, the family of direct prototype-based heuristic possibilistic clustering algorithms, proposed by Viattchenin [13], includes:
– The D-AFC-TC-algorithm, which is based on the construction of an allotment among an unknown number c of fully separate fuzzy clusters;
– The D-PAFC-TC-algorithm, which is based on the construction of a principal allotment among an unknown minimal number of at least c fully separate fuzzy clusters;
– The D-AFC-TC(α)-algorithm, which is based on the construction of an allotment among an unknown number c of fully separate fuzzy clusters with respect to a minimal value α of a tolerance threshold.
The unique allotment among an unknown number c of fuzzy clusters can be selected from the set of allotments depending on a tolerance threshold.
The main goal of this paper is to propose a new effective and efficient heuristic possibilistic clustering algorithm for solving the feature selection problem. The paper is organized as follows. In Section 2 some basic concepts of the heuristic approach to possibilistic clustering are briefly presented. In Section 3 a fuzzy correlation measure is proposed and a suitable method of data preprocessing is shown. In Section 4 the new heuristic possibilistic clustering algorithm is described. In Section 5 the new method is illustrated on the well-known benchmark example of the Iris data set, and a comparison with the results obtained by using the well-known and widely employed method of principal component analysis (PCA) is presented. Conclusions are given in Section 6.

2. Outline of the Heuristic Possibilistic Clustering Algorithm

Let us recall the basic idea of, and the concepts related to, the heuristic approach to possibilistic clustering. The concept of a fuzzy tolerance relation is the basis for the concept of a fuzzy α-cluster, and that is why the definition of a fuzzy tolerance relation must be considered first. Let $X = \{x_1, \ldots, x_n\}$ be the initial set of elements and $T : X \times X \to [0,1]$ be some binary fuzzy relation on $X$ with $\mu_T(x_i, x_j) \in [0,1]$, $\forall x_i, x_j \in X$, being its membership function. A fuzzy tolerance relation is a fuzzy binary relation which is symmetric, i.e.

$$\mu_T(x_i, x_j) = \mu_T(x_j, x_i), \quad \forall x_i, x_j \in X, \tag{1}$$

and reflexive, i.e.

$$\mu_T(x_i, x_i) = 1, \quad \forall x_i \in X. \tag{2}$$

A fuzzy similarity relation $S$ is a fuzzy binary relation which is symmetric (1), reflexive (2), and (max-min)-transitive, i.e.

$$\mu_S(x_i, x_k) \ge \bigvee_{x_j \in X} \big( \mu_S(x_i, x_j) \wedge \mu_S(x_j, x_k) \big), \quad \forall x_i, x_j, x_k \in X. \tag{3}$$
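Conditions (1) to (3) can be checked numerically when a relation is stored as a square matrix. The following is a minimal sketch; the function names and the NumPy representation are assumptions made for illustration, not part of the original method.

```python
import numpy as np


def is_tolerance(r, tol=1e-12):
    """Check symmetry (1) and reflexivity (2) of a fuzzy relation matrix."""
    symmetric = np.allclose(r, r.T, atol=tol)
    reflexive = np.allclose(np.diag(r), 1.0, atol=tol)
    return bool(symmetric and reflexive)


def is_maxmin_transitive(r, tol=1e-12):
    """Check (max-min)-transitivity (3): r[i, k] >= max_j min(r[i, j], r[j, k])."""
    composed = np.max(np.minimum(r[:, :, None], r[None, :, :]), axis=1)
    return bool(np.all(r >= composed - tol))
```

A matrix that passes `is_tolerance` but fails `is_maxmin_transitive` is a tolerance relation that is not a similarity relation.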

Let some fuzzy binary relation be represented by a matrix $R$ of size $n \times n$, and let us define

$$R^1 = R, \quad R^n = R^{n-1} \circ R, \quad n = 2, 3, \ldots \tag{4}$$

Now, the transitive closure of a fuzzy relation $R$ is the fuzzy binary relation $\breve{R}$ defined by

$$\breve{R} = R^1 \cup R^2 \cup \ldots \cup R^n, \tag{5}$$

where the operation $\cup$ for two fuzzy relations $R^d$ and $R^g$ is defined as

$$\mu_{R^d \cup R^g}(x_i, x_j) = \mu_{R^d}(x_i, x_j) \vee \mu_{R^g}(x_i, x_j), \quad \forall x_i, x_j \in X, \tag{6}$$

and the composition $R^d \circ R^g$ of two fuzzy relations $R^d$ and $R^g$ is defined as

$$\mu_{R^d \circ R^g}(x_i, x_k) = \bigvee_{x_j \in X} \big[ \mu_{R^d}(x_i, x_j) \wedge \mu_{R^g}(x_j, x_k) \big], \quad \forall x_i, x_k \in X. \tag{7}$$
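Equations (4) to (7) translate directly into matrix operations. The sketch below is one possible implementation, with assumed function names; it stops early once a further power adds nothing to the union, which is safe because the max-min composition distributes over the union.

```python
import numpy as np


def maxmin_compose(r_d, r_g):
    """Composition (7): result[i, k] = max_j min(r_d[i, j], r_g[j, k])."""
    return np.max(np.minimum(r_d[:, :, None], r_g[None, :, :]), axis=1)


def transitive_closure(r):
    """Union (5) of the powers R^1, ..., R^n defined in (4)."""
    closure = r.copy()
    power = r.copy()
    for _ in range(r.shape[0] - 1):
        power = maxmin_compose(power, r)        # R^k = R^(k-1) o R, equation (4)
        updated = np.maximum(closure, power)    # union, equation (6)
        if np.array_equal(updated, closure):    # no new information: closure reached
            break
        closure = updated
    return closure
```

Applied to a fuzzy tolerance matrix, `transitive_closure` returns a matrix that is still symmetric and reflexive and, in addition, max-min transitive.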

It should be noted that the transitive closure $\breve{T}$ of a fuzzy tolerance relation $T$ is a fuzzy similarity relation $S$. Let $\alpha$ be the α-level value of the fuzzy tolerance relation $T$, $\alpha \in (0,1]$. The columns and rows of the fuzzy tolerance matrix (relation) are fuzzy sets $\{A^1, \ldots, A^n\}$ on $X$. Let $A^l$, $l \in \{1, \ldots, n\}$, be a fuzzy set on $X$ with $\mu_{A^l}(x_i) \in [0,1]$, $\forall x_i \in X$, being its membership function. The α-level fuzzy set

$$A^l_{(\alpha)} = \{ (x_i, \mu_{A^l}(x_i)) \mid \mu_{A^l}(x_i) \ge \alpha, \ x_i \in X \}$$

is a fuzzy α-cluster. So, $A^l_{(\alpha)} \subseteq A^l$, $\alpha \in (0,1]$, $A^l \in \{A^1, \ldots, A^n\}$, and $\mu_{A^l}(x_i)$ is the degree of membership of the element $x_i \in X$ in some fuzzy α-cluster $A^l_{(\alpha)}$, $\alpha \in (0,1]$, $l \in \{1, \ldots, n\}$. This degree of membership will be denoted by $\mu_{li}$, for simplicity. The value of $\alpha$ is a tolerance threshold of the elements of the fuzzy α-cluster. The membership degree of an element $x_i \in X$ in some fuzzy α-cluster $A^l_{(\alpha)}$, $\alpha \in (0,1]$, $l \in \{1, \ldots, n\}$, can be defined as

$$\mu_{li} = \begin{cases} \mu_{A^l}(x_i), & x_i \in A^l_\alpha \\ 0, & \text{otherwise} \end{cases} \tag{8}$$

where the α-level $A^l_\alpha = \{ x_i \in X \mid \mu_{A^l}(x_i) \ge \alpha \}$, $\alpha \in (0,1]$, of the fuzzy set $A^l$ is the support of the fuzzy α-cluster $A^l_{(\alpha)}$. The value of the membership function of each element of the fuzzy α-cluster is the degree of similarity of the object to some typical object (member) of the fuzzy α-cluster. Moreover, this membership degree defines a possibility distribution function for some fuzzy α-cluster $A^l_{(\alpha)}$, $\alpha \in (0,1]$, to be denoted by $\pi_l(x_i)$.

Let $\{A^1_{(\alpha)}, \ldots, A^n_{(\alpha)}\}$ be the family of fuzzy α-clusters for some $\alpha \in (0,1]$. The point $\tau^l_e \in A^l_\alpha$ such that

$$\tau^l_e = \arg\max_{x_i} \mu_{li}, \quad \forall x_i \in A^l_\alpha \tag{9}$$

is called a typical point of the fuzzy α-cluster $A^l_{(\alpha)}$, $\alpha \in (0,1]$, $l \in [1, n]$. Obviously, a fuzzy α-cluster can have several typical points, and therefore the symbol $e$ is the index of a particular typical point.

Let $R^\alpha_z(X) = \{ A^l_{(\alpha)} \mid l = 1, \ldots, c, \ 2 \le c \le n \}$ be the family of fuzzy α-clusters, for some value $\alpha \in (0,1]$ of the tolerance threshold, which are generated by a fuzzy tolerance relation $T$ on the initial set of elements $X = \{x_1, \ldots, x_n\}$. If the condition

$$\sum_{l=1}^{c} \mu_{li} > 0, \quad \forall x_i \in X \tag{10}$$

is met for all $A^l_{(\alpha)}$, $l = 1, \ldots, c$, $c \le n$, then this family is an allotment of the elements of the set $X = \{x_1, \ldots, x_n\}$ among the fuzzy α-clusters $\{ A^l_{(\alpha)}, \ l = 1, \ldots, c, \ 2 \le c \le n \}$, for some value $\alpha \in (0,1]$ of the tolerance threshold. It should be noted that several allotments $R^\alpha_z(X)$ can exist for any tolerance threshold. That is why the symbol $z$ is used as the index of a particular allotment. The allotment among the fuzzy α-clusters can be considered as a possibilistic partition, and the fuzzy α-clusters meant in the sense of (8) are elements of a possibilistic partition (cf. Krishnapuram and Keller [14]). However, our further analyses will proceed using the concept of an allotment as introduced above.

A further relevant concept will now be introduced. An allotment $R^\alpha_I(X) = \{ A^l_{(\alpha)} \mid l = 1, \ldots, n, \ \alpha \in (0,1] \}$ of the set of objects among $n$ fuzzy α-clusters for some threshold $\alpha$ is the initial allotment of the set $X = \{x_1, \ldots, x_n\}$. In other words, if the initial data are represented by a fuzzy tolerance matrix (relation) $T$, then the rows and/or columns of this matrix are fuzzy sets $A^l \subseteq X$, $l = 1, \ldots, n$, and the α-level fuzzy sets $A^l_{(\alpha)}$, $l = 1, \ldots, n$, $\alpha \in (0,1]$, are the fuzzy α-clusters. These fuzzy α-clusters constitute an initial allotment for some tolerance threshold $\alpha \in (0,1]$, and they can be considered as clustering components.

If some allotment $R^\alpha_z(X) = \{ A^l_{(\alpha)} \mid l = 1, \ldots, c, \ c \le n \}$ is considered to be appropriate for the formulation of the specific problem under consideration, then this allotment is an adequate allotment. In particular, if the conditions

$$\sum_{l=1}^{c} \mathrm{card}(A^l_\alpha) \ge \mathrm{card}(X), \quad \forall A^l_{(\alpha)} \in R^\alpha_z(X), \ \alpha \in (0,1], \quad \mathrm{card}(R^\alpha_z(X)) = c, \tag{11}$$

and

$$\mathrm{card}(A^l_\alpha \cap A^m_\alpha) \le w, \quad \forall A^l_{(\alpha)}, A^m_{(\alpha)}, \ l \ne m, \ \alpha \in (0,1] \tag{12}$$

are met for all fuzzy α-clusters $A^l_{(\alpha)}$, $l = 1, \ldots, c$, of some allotment $R^\alpha_z(X) = \{ A^l_{(\alpha)} \mid l = 1, \ldots, c, \ c \le n \}$, then this is the allotment among partially separate fuzzy α-clusters, and $w \in \{0, \ldots, n\}$ is the maximum number of elements in the intersection of different fuzzy α-clusters. If $w = 0$ in the conditions (11) and (12), then this is the allotment among fully separate fuzzy α-clusters.

An adequate allotment $R^\alpha_z(X)$, for some value $\alpha \in (0,1]$ of the tolerance threshold, is a family of fuzzy α-clusters which are elements of the initial allotment $R^\alpha_I(X)$ for that value of $\alpha$, and the family of fuzzy α-clusters satisfies the conditions (11) and (12). Clearly, several adequate allotments can exist. Thus, the problem consists in the selection of the unique adequate allotment $R^*(X)$ from the set $B$ of adequate allotments, $B = \{R^\alpha_z(X)\}$, which is the class of possible solutions of the classification problem considered, and $B = \{R^\alpha_z(X)\}$ depends on the parameters of that classification problem.

On the other hand, the concept of a principal allotment was introduced by Viattchenin [12] and defined as follows: an allotment $R^\alpha_P(X) = \{ A^l_{(\alpha)} \mid l = 1, \ldots, c \}$ of the set of objects among the minimal number $c$, $2 \le c \le n$, of fully separate fuzzy α-clusters, for some tolerance threshold $\alpha \in (0,1]$, is the principal allotment of the set $X = \{x_1, \ldots, x_n\}$.

The selection of the unique adequate allotment from the set $B = \{R^\alpha_z(X)\}$ of adequate allotments is made on the basis of an evaluation of allotments. The criterion employed for the evaluation of allotments is

$$F(R^\alpha_z(X), \alpha) = \sum_{l=1}^{c} \frac{1}{n_l} \sum_{i=1}^{n_l} \mu_{li} - \alpha \cdot c, \tag{13}$$

where $c$ is the number of fuzzy α-clusters in the allotment $R^\alpha_z(X)$ and $n_l = \mathrm{card}(A^l_\alpha)$, $A^l_{(\alpha)} \in R^\alpha_z(X)$, is the number of elements in the support of the fuzzy α-cluster $A^l_{(\alpha)}$. The maximum value of the criterion (13) corresponds to the best allotment of objects among $c$ fuzzy α-clusters. So, the classification problem can be characterized formally as the determination of an optimal solution $R^*(X)$ satisfying

$$R^*(X) = \arg\max_{R^\alpha_z(X) \in B} F(R^\alpha_z(X), \alpha), \tag{14}$$

where $B = \{R^\alpha_z(X)\}$ is the set of adequate allotments corresponding to the formulation of the particular classification problem considered.
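The definitions (8) to (13) can be illustrated with a small sketch in which an allotment is stored as a matrix `mu` with one row per fuzzy α-cluster and one column per element. The function names and this matrix layout are assumptions chosen for illustration; since α > 0, the support of a cluster is simply the set of positive entries in its row.

```python
import numpy as np


def alpha_cluster_memberships(t, alpha):
    """Equation (8): keep memberships that reach alpha, set the rest to 0."""
    return np.where(t >= alpha, t, 0.0)


def typical_points(mu_row):
    """Equation (9): indices of the elements with maximal membership in one cluster."""
    return np.flatnonzero(mu_row == mu_row.max())


def is_allotment(mu):
    """Condition (10): each element has positive membership in at least one cluster."""
    return bool(np.all(mu.sum(axis=0) > 0))


def is_separate(mu, w=0):
    """Conditions (11)-(12): total support size >= card(X), pairwise overlaps <= w."""
    supports = mu > 0
    c, n = mu.shape
    enough = supports.sum() >= n
    overlaps_ok = all(
        np.count_nonzero(supports[l] & supports[m]) <= w
        for l in range(c) for m in range(l + 1, c)
    )
    return bool(enough and overlaps_ok)


def criterion(mu, alpha):
    """Equation (13): F = sum_l (1/n_l) * sum_i mu_li - alpha * c."""
    n_l = np.count_nonzero(mu > 0, axis=1)      # support sizes, assumed non-empty
    return float(np.sum(mu.sum(axis=1) / n_l) - alpha * mu.shape[0])
```

Selecting the solution (14) then amounts to evaluating `criterion` for every adequate allotment and keeping the one with the largest value.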

3. A Fuzzy Correlation Measure

Prototype-based clustering methods can be applied if the objects are represented as points in some multidimensional space $I^m(X)$. The respective data, composed of $n$ objects and $m$ attributes, are then represented as $\hat{X}_{n \times m} = [\hat{x}_i^t]$, $i = 1, \ldots, n$, $t = 1, \ldots, m$. Let $X = \{x_1, \ldots, x_n\}$ be the set of objects. Then, the data matrix can be represented as follows:

$$\hat{X}_{n \times m} = \begin{pmatrix} \hat{x}_1^1 & \hat{x}_1^2 & \ldots & \hat{x}_1^m \\ \hat{x}_2^1 & \hat{x}_2^2 & \ldots & \hat{x}_2^m \\ \ldots & \ldots & \ldots & \ldots \\ \hat{x}_n^1 & \hat{x}_n^2 & \ldots & \hat{x}_n^m \end{pmatrix}, \tag{15}$$

that is, it can be represented as $\hat{X} = (\hat{x}^1, \ldots, \hat{x}^m)$, containing $n$-dimensional column vectors $\hat{x}^t$, $t = 1, \ldots, m$, composed of the elements of the $t$-th column of $\hat{X}$. The data can be normalized as follows [15]:

$$x_i^t = \frac{\hat{x}_i^t - \min_t \hat{x}_i^t}{\max_t \hat{x}_i^t - \min_t \hat{x}_i^t}, \quad i = 1, \ldots, n, \ t = 1, \ldots, m, \tag{16}$$

so that each attribute can be interpreted as a fuzzy set $x^t$, $t = 1, \ldots, m$, with $\mu_{x^t}(x_i)$ being its membership function.

Let $x^t$ and $x^k$, $t, k \in \{1, \ldots, m\}$, be two fuzzy sets with their corresponding membership functions $\mu_{x^t}(x_i)$ and $\mu_{x^k}(x_i)$, respectively. A fuzzy correlation measure is defined by Chaudhuri and Bhattacharya [16] as follows:

$$r(x^t, x^k) = 1 - \left( \sum_{i=1}^{n} \left| \frac{\mu_{x^t}(x_i)}{\left( \sum_{i=1}^{n} \mu_{x^t}^{l}(x_i) \right)^{1/l}} - \frac{\mu_{x^k}(x_i)}{\left( \sum_{i=1}^{n} \mu_{x^k}^{l}(x_i) \right)^{1/l}} \right|^{l} \right)^{1/l}, \tag{17}$$

where $0 < l < \infty$ is a parameter. As noted by Chaudhuri and Bhattacharya [16], the computational complexity clearly increases with the increase of $l$. The matrix of fuzzy correlation coefficients for the data can be normalized as follows [similarly as in (16)]:

$$\tilde{r}(x^t, x^k) = \frac{r(x^t, x^k) - \min_{t,k} r(x^t, x^k)}{\max_{t,k} r(x^t, x^k) - \min_{t,k} r(x^t, x^k)}, \tag{18}$$

so that the matrix of fuzzy correlation coefficients after normalization can be viewed as the matrix of a fuzzy tolerance relation.
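The fuzzy correlation measure (17) and the normalization (18) can be sketched as follows. The function names are illustrative, and the sketch assumes that the absolute value of the difference is taken inside the sum in (17), that every attribute has at least one positive membership value, and that the correlations are not all equal (otherwise the denominators vanish).

```python
import numpy as np


def fuzzy_correlation(mu_t, mu_k, l=1.0):
    """Equation (17) for two membership vectors of equal length."""
    p = mu_t / np.sum(mu_t ** l) ** (1.0 / l)
    q = mu_k / np.sum(mu_k ** l) ** (1.0 / l)
    return float(1.0 - np.sum(np.abs(p - q) ** l) ** (1.0 / l))


def normalized_correlation_matrix(x, l=1.0):
    """Equation (18): pairwise correlations of the columns of x, rescaled to [0, 1]."""
    m = x.shape[1]
    r = np.array([[fuzzy_correlation(x[:, t], x[:, k], l) for k in range(m)]
                  for t in range(m)])
    return (r - r.min()) / (r.max() - r.min())
```

Because the correlation of an attribute with itself equals 1, which is the largest possible value, the normalized matrix has a unit diagonal and is symmetric, as required of a fuzzy tolerance relation.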

4. Description of the New Algorithm

The idea of the proposed new algorithm is that attributes can be classified, and a typical point of each fuzzy α-cluster can be considered to constitute an informative attribute. The proposed algorithm finds an unknown number c of disjoint fuzzy α-clusters and assigns each attribute to one of the clusters. The attributes in each cluster should be highly correlated with each other while being poorly correlated with the attributes in other clusters. This method uses the fuzzy correlation (17) as the similarity measure. The proposed algorithm can be considered as a modification of Viattchenin’s [11], [12] D-PAFC-TC-algorithm used for informative feature selection, and will therefore be called the D-PAFC-TC-FS-algorithm. The new algorithm is basically a classification procedure which involves 10 steps:

1. Construct the matrix of the fuzzy tolerance relation $T_{m \times m} = [\mu_T(x^t, x^k)]$, $t, k = 1, \ldots, m$, by normalizing the initial data $\hat{X}_{n \times m} = [\hat{x}_i^t]$, $i = 1, \ldots, n$, $t = 1, \ldots, m$, according to (16) and (17);
2. Construct the transitive closure $\breve{T}$ of the fuzzy tolerance relation $T$ according to (4) – (7);
3. Construct the ordered sequence of α-levels, $0 < \alpha_0 < \alpha_1 < \ldots < \alpha_l < \ldots < \alpha_Z \le 1$, by the decomposition of the transitive closure $\breve{T}$ of the fuzzy tolerance relation $T$;
4. Construct the fuzzy relation $\breve{T}_{(\alpha_1)}$ for the consecutive values of $\alpha_1$, $\alpha_1 \in (0,1]$;
5. Construct the initial allotment $R^{\alpha_1}_I(X) = \{A^l_{(\alpha_1)}\}$ for the fuzzy relation $\breve{T}_{(\alpha_1)}$; construct the allotments which satisfy the conditions (11) and (12) for $w = 0$;
6. Construct the class of possible solutions of the classification problem $B(\alpha_1) = \{R^{\alpha_1}_z(X)\}$ and calculate the value of the criterion (13) for each allotment $R^{\alpha_1}_z(X) \in B(\alpha_1)$;
7. Check: if for some unique allotment $R^{\alpha_1}_z(X) \in B(\alpha_1)$ the condition (14) is met, then this allotment is the result of classification $R^{\alpha_1}_P(X)$ for the value $\alpha_1$ sought, and STOP; else select the subset of allotments $B'(\alpha_1) \subseteq B(\alpha_1)$ which satisfy the condition (14) and go to step 8;
8. Perform the following operations for each allotment $R^{\alpha_1}_z(X) \in B'(\alpha_1)$:
8.1 Set $l := 1$;
8.2 Find the support $\mathrm{Supp}(A^l_{(\alpha_1)}) = A^l_{\alpha_1}$ of the fuzzy α-cluster $A^l_{(\alpha_1)} \in R^{\alpha_1}_z(X)$ and construct the matrix of attributes $X_{n_l \times m} = [x_i^t]$, $x^t \in A^l_{\alpha_1}$, $i = 1, \ldots, n$, for $A^l_{\alpha_1}$, where $n_l = \mathrm{card}(A^l_{\alpha_1})$;
8.3 Calculate the prototype $\bar{\tau}^l = \{\bar{x}_1, \ldots, \bar{x}_n\}$ of the class $A^l_{\alpha_1}$ according to the formula

$$\bar{x}_i = \frac{1}{n_l} \sum_{x^t \in A^l_{\alpha_1}} x_i^t, \quad i = 1, \ldots, n;$$

8.4 Calculate the fuzzy correlation $r(\tau^l, \bar{\tau}^l)$ between the typical point $\tau^l$ of the fuzzy α-cluster $A^l_{(\alpha_1)}$ and its prototype $\bar{\tau}^l$;
8.5 Check: if not all fuzzy α-clusters $A^l_{(\alpha_1)} \in R^{\alpha_1}_{c(z)}(X)$ have been tested, then set $l := l + 1$ and go to step 8.2; else go to step 9;
9. Compare the fuzzy α-clusters $A^l_{(\alpha_1)}$ which are elements of different allotments $R^{\alpha_1}_z(X) \in B'(\alpha_1)$, and take as the result of the classification $R^{\alpha_1}_P(X)$ the allotment $R^{\alpha_1}_z(X) \in B'(\alpha_1)$ in which the fuzzy correlation $r(\tau^l, \bar{\tau}^l)$ is minimal for all fuzzy α-clusters $A^l_{(\alpha_1)}$;
10. Select as the most informative attributes the typical points of the fuzzy α-clusters of the principal allotment $R^{\alpha_1}_P(X)$ obtained.

The results of the classification sought are the typical points of the obtained principal allotment $R^{\alpha_1}_P(X) = \{ A^l_{(\alpha_1)} \mid l = 1, \ldots, c \}$ among fully separate fuzzy α-clusters and the value of the tolerance threshold $\alpha_1 \in (0,1]$.

Figure 1. Results for the Iris data set
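A few building blocks of the procedure above (steps 3, 4 and 8.3) can be sketched as small helper functions. This is only an illustration under stated assumptions: the function names are hypothetical, the α-levels are taken to be the distinct positive values of the transitive closure, and the α-level relation is assumed to keep the memberships that reach the level and zero the rest. The enumeration of adequate allotments in steps 5 to 9 is not covered.

```python
import numpy as np


def alpha_levels(closure):
    """Step 3: ordered distinct positive membership values of the transitive closure."""
    values = np.unique(closure)             # np.unique returns sorted values
    return values[values > 0]


def level_relation(closure, alpha):
    """Step 4: keep memberships that reach the level alpha, zero the rest (assumed form)."""
    return np.where(closure >= alpha, closure, 0.0)


def prototype(x, support):
    """Step 8.3: mean of the attribute columns of x that belong to the cluster support."""
    return x[:, support].mean(axis=1)
```

Here `x` is the normalized data matrix with objects in rows and attributes in columns, and `support` is the list of column indices of the attributes that form the support of one fuzzy α-cluster.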

5. An Illustrative Example

The application of the D-PAFC-TC-FS-algorithm to feature selection can be illustrated on the well-known Iris data benchmark, which was presumably first given by Anderson [17]. The Iris data set concerns different types of Iris flowers and consists of the values of the sepal length, sepal width, petal length and petal width for 150 Iris specimens. The problem is to classify the plants into three subspecies on the basis of this information. Let us consider the problem of the selection of the most informative features in the setting of this data set. The Iris data set forms the matrix of attributes $\hat{X}_{150 \times 4} = [\hat{x}_i^t]$, $i = 1, \ldots, 150$, $t = 1, \ldots, 4$, in which the sepal length vector is denoted by $\hat{x}^1$, the sepal width vector by $\hat{x}^2$, the petal length vector by $\hat{x}^3$, and the petal width vector by $\hat{x}^4$.

The D-PAFC-TC-FS-algorithm was applied directly to the normalized matrix of attributes for different values of the parameter $l$, $0 < l < \infty$. The principal allotment among two fuzzy α-clusters was obtained in all cases. For example, the results of using the D-PAFC-TC-FS-algorithm for $l = 1$ are presented in Table 1. The corresponding principal allotment $R^{0.6692}_P(X)$ was obtained for the value of the tolerance threshold $\alpha = 0.6692$. By executing the D-PAFC-TC-FS-algorithm for $l = 2$ we also obtain two fuzzy α-clusters in the principal allotment $R^{0.5583}_P(X)$. The matrix of the principal allotment $R^{0.5583}_P(X)$ is presented in Table 2. For $l = 3$ we obtain the principal allotment $R^{0.4917}_P(X)$ among two fuzzy α-clusters. The corresponding matrix is presented in Table 3.

Table 1. The membership values obtained by the D-PAFC-TC-FS-algorithm for l = 1

Class    x̂1        x̂2        x̂3        x̂4
1        0.0000    1.0000    0.0000    0.0000
2        0.6692    0.0000    1.0000    0.8347

Table 2. The membership values obtained by the D-PAFC-TC-FS-algorithm for l = 2

Class    x̂1        x̂2        x̂3        x̂4
1        0.0000    1.0000    0.0000    0.0000
2        0.5583    0.0000    1.0000    0.7635

Table 3. The membership values obtained by the D-PAFC-TC-FS-algorithm for l = 3

Class    x̂1        x̂2        x̂3        x̂4
1        0.0000    1.0000    0.0000    0.0000
2        0.4917    0.0000    1.0000    0.7161

The second feature $\hat{x}^2$ is the typical point of the first fuzzy α-cluster and the third feature $\hat{x}^3$ is the typical point of the second fuzzy α-cluster in each case. Hence, the features $\hat{x}^2$ and $\hat{x}^3$ can be selected as the most informative features, and the two-dimensional projection of the Iris data can be constructed as presented in Figure 1. Two well separated classes can be distinguished and then visualized. The first class corresponds to the Iris Setosa. The second class corresponds to the Iris Versicolor and Iris Virginica. The objects known to be the Iris Setosa are represented by ∎ in Figure 1, those known to be the Iris Versicolor are represented by ○, and those known to be the Iris Virginica are represented by ∎.

It is worth noticing that the result is similar to that obtained by using the conventional method of principal component analysis (cf. Sato-Ilic and Jain [18]). An interpretation of the obtained principal components can be made on the basis of factor loadings. The factor loading is defined as the correlation coefficient between the $v$-th principal component $z_v$ and the $t$-th attribute $\hat{x}^t$, $t = 1, \ldots, m$, as follows [18]:

$$f(z_v, \hat{x}^t) = \frac{\mathrm{cov}\{z_v, \hat{x}^t\}}{\sqrt{V\{z_v\} \, V\{\hat{x}^t\}}}, \tag{19}$$

where $V\{z_v\}$ is the variance of $z_v$, $V\{\hat{x}^t\}$ is the variance of $\hat{x}^t$, and $\mathrm{cov}\{z_v, \hat{x}^t\}$ is the covariance between $z_v$ and $\hat{x}^t$. Table 4 shows the values of the factor loadings (19), which represent the relationship between each principal component and each attribute.

Table 4. Factor loadings in the principal component analysis

            Principal components
Features    z1       z2       z3       z4
x̂1          0.89     0.36     0.28     0.04
x̂2         -0.46     0.88    -0.09    -0.02
x̂3          0.96     0.06    -0.24     0.08
x̂4          0.99     0.02    -0.05    -0.12

From the results obtained we can see how each component is explained by the attributes, which is related to the interpretation of each component. In Table 4 the first principal component is mainly explained by the attributes sepal length, petal length, and petal width. Moreover, we can see a high correlation between the second principal component and the attribute of sepal width. The comparison between the results shown in Tables 1, 2, 3 and 4 reveals similar outcomes. In particular, with reference to Table 4, the values of the membership function of the first fuzzy α-cluster of the principal allotment in each case can be interpreted as the normalized values of the factor loadings $f(z_2, \hat{x}^t)$, $t = 1, \ldots, 4$, and the values of the membership function of the second fuzzy α-cluster of the principal allotment can be considered as the normalized values of the factor loadings $f(z_1, \hat{x}^t)$, $t = 1, \ldots, 4$.
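The factor loadings (19) can be computed with a few lines of NumPy. The sketch below assumes that the principal components are obtained from standardized attributes (that is, from the correlation matrix), which the text does not state explicitly; the function name is illustrative. The sign of each component is arbitrary, so a whole column of loadings may appear with the opposite sign.

```python
import numpy as np


def factor_loadings(x):
    """Equation (19): correlation of each attribute (rows) with each component (columns)."""
    xs = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)     # standardized attributes (assumed)
    eigvals, eigvecs = np.linalg.eigh(np.corrcoef(x, rowvar=False))
    order = np.argsort(eigvals)[::-1]                     # largest variance first
    z = xs @ eigvecs[:, order]                            # principal component scores
    m = x.shape[1]
    full = np.corrcoef(np.hstack([z, xs]), rowvar=False)  # joint (2m x 2m) correlation matrix
    return full[m:, :m]
```

For a data matrix with objects in rows and attributes in columns, the result has the same layout as Table 4. A useful sanity check is that the squared loadings of each attribute sum to one over all components.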

6. Concluding Remarks

First, from a more general point of view, one should mention that the concepts of a fuzzy α-cluster and of an allotment have quite a clear epistemological motivation. That is why the results of applying the possibilistic clustering method based on the concept of an allotment can be interpreted very well. The D-PAFC-TC-FS-algorithm of possibilistic clustering is proposed in this paper. The determination of an unknown number of the most informative features is the main characteristic of the proposed algorithm. The D-PAFC-TC-FS-algorithm can be considered as a version of the method of extremal grouping of features. The algorithm is based on the concept of a principal allotment among the fuzzy α-clusters, and an unknown minimal number of compact and well-separated fuzzy α-clusters is the result of classification. The typical points of the fuzzy α-clusters can be selected as the most informative features. Moreover, the D-PAFC-TC-FS-algorithm does not depend on parameters and can be applied directly to the data given as a matrix of attribute values. The results of applying the proposed algorithm to the Iris data show that the D-PAFC-TC-FS-algorithm is an effective and efficient numerical procedure for solving the informative feature selection problem. As for future extensions, the proposed D-PAFC-TC-FS-algorithm can be extended, for instance, by including different fuzzy correlation measures, exemplified by those proposed by Murthy, Pal and Dutta Majumder [19], and Chiang and Lin [20].

Acknowledgements
The research has been partially supported by, first, the National Centre of Science under Grant No. UMO-2011/01/B/ST6/06908 (J. Kacprzyk and J.W. Owsiński), and second, the Mianowski Fund and the Ministry of Foreign Affairs of the Republic of Poland (D. Viattchenin). The authors are grateful to Mr. Aliaksandr Damaratski for developing the software used for the numerical experiments, and to the anonymous referees for their valuable comments.

AUTHORS

Janusz Kacprzyk*, Jan W. Owsiński – 1Systems Research Institute, Polish Academy of Sciences 6 Newelska St., 01-447 Warsaw, Poland. E-mails: {kacprzyk, owsinski}@ibspan.waw.pl Dimitri A. Viattchenin – United Institute of Informatics Problems, National Academy of Sciences of Belarus 6 Surganov St., 220012 Minsk, Belarus. E-mail: viattchenin@mail.ru *Corresponding author References

46

[1] Blum A. , Langley P., “Selection of relevant features and examples in machine learning”, Artificial Intelligence, vol. 97, no.1–2, 1997, 245–271. DOI: http://dx.doi.org/10.1016/ S0004-3702(97)00063-5 [2] Kohavi R. , John G., “Wrappers for feature subset selection”, Artificial Intelligence, vol. 97, no. 1–2, 1997, 273–324. DOI: http://dx.doi.org/10.1016/ S0004-3702(97)00043-X [3] Ghazavi S.N., Liao T.W., “Medical data mining by fuzzy modeling with selected features”, Artificial Intelligence in Medicine, vol. 43, no. 3, 2008, 195–206. DOI: http://dx.doi.org/10.1016/j.artmed.2008.04.004 [4] Draminski M., Kierczak M., Nowak-Brzezinska A., Koronacki J., and Komorowski J., “The Monte Carlo feature selection and interdependency discovery is unbiased”, Control and Cybernetics, vol. 40, no. 2, 2011, 199–211. [5] Kong Y.-Q., Wang S.-T., “Feature selection and semi-supervised fuzzy clustering”, Fuzzy Information and Engineering, vol. 1, no. 2, 2009, 179–190. [6] Chitsaz E., Taheri M., Katebi S.D., “A fuzzy approach to clustering and selecting features for classification of gene expression data”. In: Proc. World Congress of Engineering (WCE 2008), 2008, 1650–1655. [7] Chitsaz E., Taheri M., Katebi S.D., Jahromi M.Z., “An improved fuzzy feature clustering and selection based on chi-squared-test”. In: Proc. Int. Multiconference of Engineers and Computer Scientists (IMECS 2009), 2009, 35–40. [8] Bezdek J.C., Keller J.M., Krishnapuram R., Pal N.R., Fuzzy Models and Algorithms for Pattern Recognition and Image Processing, Springer Science, New York, 2005. DOI: http://dx.doi.org/10.1007/ b106267. [9] Mandel I.D., Clustering Analysis, Finansy i Statistica, Moscow, 1988. (in Russian) [10] Viattchenin D.A., “A new heuristic algorithm of fuzzy clustering”, Control and Cybernetics, vol. 33, no. 2, 2004, 323–340. Articles


[11] Viattchenin D.A., “A direct algorithm of possibilistic clustering with partial supervision”, Journal of Automation, Mobile Robotics and Intelligent Systems, vol. 1, no. 3, 2007, 29–38.
[12] Viattchenin D.A., “An algorithm for detecting the principal allotment among fuzzy clusters and its application as a technique of reduction of analyzed features space dimensionality”, J. Information and Organizational Sciences, vol. 33, no. 1, 2009, 205–217.
[13] Viattchenin D.A., “Direct algorithms of fuzzy clustering based on the transitive closure operation and their application to outliers detection”, Artificial Intelligence, no. 3, 2007, 205–216. (in Russian)
[14] Krishnapuram R., Keller J.M., “A possibilistic approach to clustering”, IEEE Trans. on Fuzzy Systems, vol. 1, no. 2, 1993, 98–110. DOI: http://dx.doi.org/10.1109/91.227387
[15] Walesiak M., A generalized distance measure in statistical multivariate analysis, Wydawnictwo Akademii Ekonomicznej im. Oskara Langego, Wrocław, 2002. (in Polish)
[16] Chaudhuri B.B., Bhattacharya A., “On correlation between two fuzzy sets”, Fuzzy Sets and Systems, vol. 118, no. 3, 2001, 447–456.
[17] Anderson E., “The irises of the Gaspe Peninsula”, Bulletin of the American Iris Society, vol. 59, no. 1, 1935, 2–5.
[18] Sato-Ilic M., Jain L.C., Innovations in Fuzzy Clustering: Theory and Applications, Springer-Verlag, Heidelberg, 2006.
[19] Murthy C.A., Pal S.K., Dutta Majumder D., “Correlation between two fuzzy membership functions”, Fuzzy Sets and Systems, vol. 17, no. 1, 1985, 23–38. DOI: http://dx.doi.org/10.1016/0165-0114(85)90004-1
[20] Chiang D.-A., Lin N.P., “Correlation of fuzzy sets”, Fuzzy Sets and Systems, vol. 102, no. 2, 1999, 221–226. DOI: http://dx.doi.org/10.1016/S0165-0114(97)00127-9


Journal of Automation, Mobile Robotics & Intelligent Systems

VOLUME 8,

N° 2

2014

Extracting Fuzzy Classification Rules from Three-way Data

Submitted 28th October 2013; accepted 19th January 2013

Janusz Kacprzyk, Jan W. Owsinski, Dmitri A. Viattchenin

DOI: 10.14313/JAMRIS_2-2014/19

Abstract: The paper deals, in a conceptual way, with the problem of extracting fuzzy classification rules from three-way data in the sense of Sato and Sato [7]. A novel technique based on a heuristic method of possibilistic clustering is proposed. The basic concepts of a heuristic method of possibilistic clustering based on the concept of an allotment are described. A preprocessing technique for three-way data is shown. An extended method of constructing fuzzy classification rules from clustering results is proposed. An illustrative example of the method's application to the Sato and Sato data [7] is provided. An analysis of the experimental results is given, together with some conclusions.

Keywords: three-way data, possibilistic clustering, fuzzy cluster, typical point, fuzzy rule

1. Introduction

The first subsection gives some remarks on fuzzy inference systems and a brief review of methods of extracting fuzzy classification rules based on fuzzy clustering. The second subsection discusses the basic types of uncertainty of the data and specifies the purpose of the paper.

1.1. Preliminary Remarks

Fuzzy inference systems are presumably the best known and most popular applications of fuzzy logic and fuzzy set theory. They can be employed for classification, process simulation and diagnosis, online decision support and process control, to name a few areas. The generation of fuzzy classification rules (called fuzzy rules here, for brevity) is therefore one of the most relevant problems in the development of fuzzy inference systems. There are a number of approaches to learning fuzzy rules from data, for instance those based on evolutionary or neural computing, which mostly aim at optimizing the parameters of fuzzy rules. On the other hand, fuzzy clustering seems to be a very appealing and useful method for learning fuzzy rules, since there is a close and canonical connection between fuzzy clusters and fuzzy rules. The idea of deriving fuzzy classification rules from data can be formulated as follows: the training data set is divided into homogeneous groups and a fuzzy rule is associated with each group. Fuzzy clustering procedures pursue exactly this strategy: a fuzzy cluster is represented by a cluster center, and the membership degree of a datum to the cluster decreases with increasing distance to the cluster center. So, each fuzzy rule of a fuzzy inference system can be characterized by a typical point and a membership function that decreases with increasing distance to the typical point.

Let us consider some methods for extracting fuzzy rules from data using fuzzy clustering algorithms. Some basic definitions should first be given. Suppose that the training set contains $n$ data pairs. Each pair is made up of an $m_1$-dimensional input vector and a $c$-dimensional output vector. We assume that the number of rules in the rule base of the fuzzy inference system is $c$. A Mamdani-type [5] rule $l$ within the fuzzy inference system is written as follows:

If $\hat{x}^{1}$ is $B_l^{1}$ and … and $\hat{x}^{m_1}$ is $B_l^{m_1}$ then $y_1$ is $C_1^l$ and … and $y_c$ is $C_c^l$, (1)

where $B_l^{t_1}$, $t_1 \in \{1,\ldots,m_1\}$, and $C_l^l$, $l \in \{1,\ldots,c\}$, are fuzzy sets that define a partitioning of the input and output spaces.

A fuzzy inference system described by a set of fuzzy rules of the form (1) is a multiple inputs, multiple outputs (MIMO) system. Note that any fuzzy rule of the form (1) can be represented by $c$ rules of the multiple inputs, single output (MISO) type:

If $\hat{x}^{1}$ is $B_l^{1}$ and … and $\hat{x}^{m_1}$ is $B_l^{m_1}$ then $y_1$ is $C_1^l$,
…
If $\hat{x}^{1}$ is $B_l^{1}$ and … and $\hat{x}^{m_1}$ is $B_l^{m_1}$ then $y_c$ is $C_c^l$. (2)

Let $B_l^{t_1}$ be characterized by its membership function $\gamma_{B_l^{t_1}}(\hat{x}^{t_1})$. This membership function can be triangular, Gaussian, trapezoidal, or of any other suitable shape. In this paper, we consider the trapezoidal and triangular membership functions, which are of particular relevance for applications.

Fuzzy classification rules can be obtained directly from fuzzy clustering results. In general, a fuzzy clustering algorithm aims at minimizing the objective function [1]

$$Q(P, T) = \sum_{l=1}^{c} \sum_{i=1}^{n} \upsilon_{li}^{\gamma}\, d(x_i, \tau^l), \tag{3}$$

subject to the constraints

$$\sum_{i=1}^{n} \upsilon_{li} > 0, \quad \forall l \in \{1,\ldots,c\}, \tag{4}$$

and

$$\sum_{l=1}^{c} \upsilon_{li} = 1, \quad \forall i \in \{1,\ldots,n\}, \tag{5}$$

where $X = \{x_1,\ldots,x_n\} \subseteq \Re^{m_1}$ is the data set, $c$ is the number of fuzzy clusters $A^l$, $l = 1,\ldots,c$, in the fuzzy $c$-partition $P$, $\upsilon_{li} \in [0,1]$ is the membership degree of object $x_i$ to fuzzy cluster $A^l$, $\tau^l \in \Re^{m_1}$ is a prototype for the fuzzy cluster $A^l$, $d(x_i, \tau^l)$ is a distance between the prototype $\tau^l$ and object $x_i$, and the parameter $\gamma > 1$ is an index of fuzziness. The value of $\gamma$ determines whether the clusters tend to be more crisp or more fuzzy. The membership degrees can be calculated as follows:

$$\upsilon_{li} = \frac{1}{\sum_{a=1}^{c} \left( \dfrac{d(x_i, \tau^l)}{d(x_i, \tau^a)} \right)^{1/(\gamma-1)}}, \tag{6}$$

and the prototypes can be obtained from the formula

$$\tau^l = \frac{\sum_{i=1}^{n} \upsilon_{li}^{\gamma} \cdot x_i}{\sum_{i=1}^{n} \upsilon_{li}^{\gamma}}. \tag{7}$$

The expressions (6) and (7) are necessary conditions for (3) to have a local minimum. However, the condition (5) is hard to satisfy for obvious reasons. So, a possibilistic approach to clustering was proposed by Krishnapuram and Keller [4]. In particular, the objective function (3) is replaced by

$$Q(\Upsilon, T) = \sum_{l=1}^{c} \sum_{i=1}^{n} \left( \mu_{li}^{\psi}\, d(x_i, \tau^l) + \eta_l (1 - \mu_{li})^{\psi} \right), \tag{8}$$

subject to the constraint of a possibilistic partition

$$\sum_{l=1}^{c} \mu_{li} > 1, \quad \forall i \in \{1,\ldots,n\}, \tag{9}$$

where $c$ is the number of fuzzy clusters $A^l$, $l = 1,\ldots,c$, in the possibilistic partition $\Upsilon$, $\mu_{li} \in [0,1]$ are the possibilistic memberships, which are typicality degrees, $\tau^l \in \Re^{m_1}$ is a prototype for the fuzzy cluster $A^l$, $d(x_i, \tau^l)$ is a distance between the prototype $\tau^l$ and object $x_i$, and the parameter $\psi > 1$ plays a role analogous to the index of fuzziness. The degrees of typicality can be calculated as follows:

$$\mu_{li} = \frac{1}{1 + \left( \dfrac{d(x_i, \tau^l)}{\eta_l} \right)^{1/(\psi-1)}}, \tag{10}$$

and the parameters $\eta_l$, $l = 1,\ldots,c$, are derived by

$$\eta_l = \frac{K}{\sum_{i=1}^{n} \upsilon_{li}^{\psi}} \sum_{i=1}^{n} \upsilon_{li}^{\psi}\, d(x_i, \tau^l), \tag{11}$$

where $K = 1$.

The principal idea of extracting fuzzy classification rules based on fuzzy clustering is as follows (cf. Höppner, Klawonn, Kruse and Runkler [2]). Each fuzzy cluster is assumed to be assigned to one class for classification, and the membership degrees of the data to the clusters determine the degrees to which they can be classified as members of the corresponding classes. So, with a fuzzy cluster that is assigned to some class we can associate a linguistic rule. The fuzzy cluster is projected onto each single dimension, leading to a fuzzy set defined on the real line. From a mathematical point of view, the membership degree of the value $\hat{x}^{t_1}$ to the $t_1$-th projection $\gamma_{B_l^{t_1}}(\hat{x}^{t_1})$ of the fuzzy cluster $A^l$, $l \in \{1,\ldots,c\}$, is the supremum over the membership degrees of all vectors with $\hat{x}^{t_1}$ as the $t_1$-th component to the fuzzy cluster, i.e.

$$\gamma_{B_l^{t_1}}(\hat{x}^{t_1}) = \sup \left\{ \frac{1}{\sum_{a=1}^{c} \left( \dfrac{d(x_i, \tau^l)}{d(x_i, \tau^a)} \right)^{1/(\gamma-1)}} \;\middle|\; x_i = (\hat{x}_i^{1},\ldots,\hat{x}_i^{t_1-1}, \hat{x}_i^{t_1}, \hat{x}_i^{t_1+1},\ldots,\hat{x}_i^{m_1}) \in \Re^{m_1} \right\} \tag{12}$$

or

$$\gamma_{B_l^{t_1}}(\hat{x}^{t_1}) = \sup \left\{ \frac{1}{1 + \left( \dfrac{d(x_i, \tau^l)}{\eta_l} \right)^{1/(\psi-1)}} \;\middle|\; x_i = (\hat{x}_i^{1},\ldots,\hat{x}_i^{t_1-1}, \hat{x}_i^{t_1}, \hat{x}_i^{t_1+1},\ldots,\hat{x}_i^{m_1}) \in \Re^{m_1} \right\} \tag{13}$$

in the possibilistic case. For the rules obtained, the fuzzy set is approximated by projecting only the data set and computing the convex hull of this projected fuzzy set, or by fitting a trapezoidal or triangular membership function [2]. Objective function-based algorithms are the most widespread methods in fuzzy clustering. However, they may be sensitive to the selection of an initial partition, and the fuzzy rules sought may depend on the fuzzy clustering method employed. In particular, the GG (Gath-Geva) and GK (Gustafson-Kessel) algorithms of fuzzy clustering are recommended in [2] for the generation of fuzzy rules. All algorithms of possibilistic clustering are also objective function-based algorithms. On the other hand, a heuristic approach to possibilistic clustering was outlined by Viattchenin [8] and then further developed in subsequent publications. Moreover, a method for the automatic generation of fuzzy inference systems using heuristic possibilistic clustering was outlined by Viattchenin [13]. This method was extended to the case of interval-valued data in [14].
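The membership update (6) and the typicality update (10) discussed in this subsection can be illustrated with a minimal sketch. It assumes that the distances $d(x_i, \tau^l)$ are already available as a strictly positive matrix; the function names `fcm_memberships` and `typicalities`, the toy distances and the values of $\eta_l$ are illustrative assumptions, not part of the original method description.

```python
import numpy as np

def fcm_memberships(dist, gamma=2.0):
    """Eq. (6). dist[l, i] = d(x_i, tau^l) > 0; returns an array of shape (c, n)."""
    # ratio[l, a, i] = d(x_i, tau^l) / d(x_i, tau^a)
    ratio = dist[:, None, :] / dist[None, :, :]
    return 1.0 / np.sum(ratio ** (1.0 / (gamma - 1.0)), axis=1)

def typicalities(dist, eta, psi=2.0):
    """Eq. (10). eta[l] is the scale parameter eta_l of cluster l."""
    return 1.0 / (1.0 + (dist / eta[:, None]) ** (1.0 / (psi - 1.0)))

# Toy distances (hypothetical): 2 prototypes, 3 objects.
dist = np.array([[1.0, 4.0, 2.0],
                 [4.0, 1.0, 2.0]])
u = fcm_memberships(dist)                          # columns sum to 1, as in (5)
mu = typicalities(dist, eta=np.array([4.0, 4.0]))  # columns need not sum to 1
```

The contrast between the two outputs shows the point made in the text: the fuzzy memberships are forced to sum to one over the clusters, while the typicality degrees are not.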

1.2. Types of Clustering Structures

Most fuzzy clustering techniques are designed for handling crisp data augmented with their class membership degrees. However, the data can be uncertain. The initial data to be processed by clustering algorithms may be characterized by different types of uncertainty. For example, a brief review of uncertain data clustering methods is given in [11]. An interval uncertainty of the initial data is a basic type of uncertainty in clustering. The interval-valued data are a particular case of the three-way data in the sense of Sato and Sato [7]. The clustering problem for the case of three-way data can be formulated as follows [7, 11]. Let $X = \{x_1,\ldots,x_n\}$ be a set of objects, where objects are indexed by $i$, $i = 1,\ldots,n$; each object $x_i$ is described by $m_1$ attributes, indexed by $t_1$, $t_1 = 1,\ldots,m_1$, so that an object $x_i$ can be represented by a vector $x_i = (x_i^{1},\ldots,x_i^{t_1},\ldots,x_i^{m_1})$; each attribute $\hat{x}^{t_1}$, $t_1 = 1,\ldots,m_1$, can be characterized by $m_2$ values of binary attributes, so that $\hat{x}_i^{t_1} = (\hat{x}_i^{t_1(1)},\ldots,\hat{x}_i^{t_1(t_2)},\ldots,\hat{x}_i^{t_1(m_2)})$. Therefore, the three-way data can be represented as follows:

$$\hat{X}_{n \times m_1 \times m_2} = [\hat{x}_i^{t_1(t_2)}], \quad i = 1,\ldots,n, \quad t_1 = 1,\ldots,m_1, \quad t_2 = 1,\ldots,m_2. \tag{14}$$

In other words, the three-way data are data observed as the values of $m_1$ attributes with respect to $n$ objects for $m_2$ situations. The purpose of the clustering is to classify the set $X = \{x_1,\ldots,x_n\}$ into $c$ fuzzy clusters, and the number of clusters $c$ can be unknown because it can depend on the situation. The initial data matrix (14) can be represented as a set of $m_2$ matrices $\hat{X}^{t_2}_{n \times m_1} = [\hat{x}_i^{t_1}]$, $i = 1,\ldots,n$, $t_1 = 1,\ldots,m_1$, and a “plausible” number $c$ of fuzzy clusters can be different for each matrix $\hat{X}^{t_2}_{n \times m_1} = [\hat{x}_i^{t_1}]$, $t_2 \in \{1,\ldots,m_2\}$.

The structure of clustering of the data set clearly depends on the type of the initial data. Three types of clustering structures were defined by Viattchenin [16]. First, if the number of clusters $c$ is constant for each matrix $\hat{X}^{t_2}_{n \times m_1} = [\hat{x}_i^{t_1}]$, $t_2 \in \{1,\ldots,m_2\}$, and the coordinates of the prototypes $\{\tau^1,\ldots,\tau^c\}$ of the clusters $\{A^1,\ldots,A^c\}$ are constant, then the clustering structure is called stable. Second, if the number of clusters $c$ is constant for each matrix $\hat{X}^{t_2}_{n \times m_1} = [\hat{x}_i^{t_1}]$, $t_2 \in \{1,\ldots,m_2\}$, and the coordinates of the prototypes of the clusters are not constant, then the clustering structure is called quasi-stable. Third, if the number of clusters $c$ is different for the matrices $\hat{X}^{t_2}_{n \times m_1} = [\hat{x}_i^{t_1}]$, $t_2 = 1,\ldots,m_2$, then the clustering structure is called unstable.

The detection of the most plausible fuzzy clusters in the clustering structure sought for the uncertain data set $X$ can be considered the final goal of classification, and the construction of the set of values of the most possible number of fuzzy clusters, with their corresponding possibility degrees, is an important step towards it. A method of discovering a unique clustering structure which corresponds to the most natural allocation of objects among fuzzy clusters for the uncertain data set was proposed by Viattchenin [16].

The main goal of this paper is to present the idea of a novel approach to extracting fuzzy rules from three-way data. The paper is organized as follows: in the second section, the basic concepts of the heuristic approach to possibilistic clustering are presented, a preprocessing technique for three-way data is given, and a technique of prototyping fuzzy classification rules from three-way data based on heuristic possibilistic clustering is proposed. In the third section, an illustrative example of the application of the proposed technique to Sato and Sato's [7] three-way data set is given, and in the fourth section some conclusions are formulated.
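As a minimal sketch of the three types of clustering structure described in this subsection, the three-way data can be held in an array of shape $(n, m_1, m_2)$ and the prototypes found for each situation $t_2$ can be compared. The function name `structure_type`, the tolerance `tol`, and the assumption that prototypes are listed in a matching order across situations are illustrative choices, not taken from [16].

```python
import numpy as np

def structure_type(prototypes_per_situation, tol=1e-9):
    """Classify a clustering structure as stable, quasi-stable or unstable.

    prototypes_per_situation: one array of shape (c_t2, m1) per situation t2.
    Assumes prototypes are listed in a matching order across situations.
    """
    counts = {p.shape[0] for p in prototypes_per_situation}
    if len(counts) > 1:            # number of clusters differs between situations
        return "unstable"
    first = prototypes_per_situation[0]
    same = all(np.allclose(p, first, atol=tol) for p in prototypes_per_situation)
    return "stable" if same else "quasi-stable"

# Three-way data as in (14): n = 4 objects, m1 = 3 attributes, m2 = 2 situations.
X_hat = np.arange(24, dtype=float).reshape(4, 3, 2)
slices = [X_hat[:, :, t2] for t2 in range(X_hat.shape[2])]  # the m2 matrices

p = np.array([[0.0, 0.0], [1.0, 1.0]])   # hypothetical prototypes
kinds = (structure_type([p, p.copy()]),
         structure_type([p, p + 0.1]),
         structure_type([p, p[:1]]))
```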

2. A Novel Approach to Extracting Fuzzy Rules from the Three-way Data

In the first subsection, some basic concepts of the heuristic approach to possibilistic clustering are discussed. The second subsection includes some remarks on the preprocessing of three-way data. A technique of extracting fuzzy rules from three-way data is described in the third subsection.

2.1. Basic Concepts of the Heuristic Method of Possibilistic Clustering

Heuristic algorithms of fuzzy clustering are characterized by a low level of complexity and a high level of essential clarity. Some heuristic clustering algorithms are based on a definition of the concept of a cluster, and the aim of these algorithms is to detect clusters that conform to the given definition. According to Mandel [6], such algorithms can be called algorithms of direct classification, or direct clustering algorithms.

An outline of a new heuristic method of fuzzy clustering was presented by Viattchenin [8], where a basic version of a direct clustering algorithm was described. A version of the algorithm called the D-AFC(c)-algorithm was given in [9]. The D-AFC(c)-algorithm can be considered a direct algorithm of possibilistic clustering, as was demonstrated in [9]. The D-AFC(c)-algorithm is the basis of an entire family of heuristic algorithms of possibilistic clustering. The heuristic approach to possibilistic clustering was further developed in other publications.

The direct heuristic algorithms of possibilistic clustering can be divided into two types: relational and prototype-based. In particular, the family of direct relational heuristic algorithms of possibilistic clustering includes:
- the D-AFC(c)-algorithm, via the construction of an allotment among an a priori given number $c$ of partially separate fuzzy clusters [8];
- the D-AFC-PS(c)-algorithm, via the construction of an allotment among an a priori given number $c$ of partially separate fuzzy clusters in the presence of labeled objects [9];
- the D-PAFC-algorithm, via the construction of an allotment among an unknown number of at least $c$ fully separate fuzzy clusters [12].

On the other hand, the family of direct prototype-based clustering procedures proposed in [10] includes:
- the D-AFC-TC-algorithm, via the construction of an allotment among an unknown number $c$ of fully separate fuzzy clusters;
- the D-PAFC-TC-algorithm, via the construction of a principal allotment among an unknown minimal number of at least $c$ fully separate fuzzy clusters;
- the D-AFC-TC(α)-algorithm, via the construction of an allotment among an unknown number $c$ of fully separate fuzzy clusters with respect to the minimal value $\alpha$ of a tolerance threshold.

Let us recall some basic concepts of the heuristic method of possibilistic clustering in question. The concept of a fuzzy tolerance is the basis for the concept of a fuzzy $\alpha$-cluster. That is why the definition of a fuzzy tolerance must be considered first.

Let $X = \{x_1,\ldots,x_n\}$ be an initial set of elements and $T : X \times X \to [0,1]$ be some binary fuzzy relation on $X$ with $\mu_T(x_i, x_j) \in [0,1]$, $\forall x_i, x_j \in X$, being its membership function. A fuzzy tolerance is a fuzzy binary intransitive relation that is symmetric,

$$\mu_T(x_i, x_j) = \mu_T(x_j, x_i), \quad \forall x_i, x_j \in X, \tag{15}$$

and reflexive,

$$\mu_T(x_i, x_i) = 1, \quad \forall x_i \in X. \tag{16}$$

The notions of a powerful fuzzy tolerance, a feeble fuzzy tolerance and a strict feeble fuzzy tolerance were considered by Viattchenin [8] as well. In this context the classical fuzzy tolerance in the sense of (15)–(16) was called a usual fuzzy tolerance in [8]. However, the essence of the method considered here does not depend on any particular kind of fuzzy tolerance, and the method is described for any fuzzy tolerance $T$.

Let $\alpha$ be an $\alpha$-level value of the fuzzy tolerance $T$, $\alpha \in (0,1]$. Columns or rows of the fuzzy tolerance matrix are fuzzy sets $\{A^1,\ldots,A^n\}$ on $X$. Let $A^l$, $l \in \{1,\ldots,n\}$, be a fuzzy set on $X$ with $\mu_{A^l}(x_i) \in [0,1]$, $\forall x_i \in X$, being its membership function. The $\alpha$-level fuzzy set $A^l_{(\alpha)} = \{(x_i, \mu_{A^l}(x_i)) \mid \mu_{A^l}(x_i) \ge \alpha, x_i \in X\}$ is a fuzzy $\alpha$-cluster. So, $A^l_{(\alpha)} \subseteq A^l$, $\alpha \in (0,1]$, $A^l \in \{A^1,\ldots,A^n\}$, and $\mu_{A^l}(x_i)$ is the membership degree of the element $x_i \in X$ for some fuzzy $\alpha$-cluster $A^l_{(\alpha)}$, $\alpha \in (0,1]$, $l \in \{1,\ldots,n\}$. This membership degree will be denoted $\mu_{li}$ for brevity in further considerations. The value of $\alpha$ is a tolerance threshold of fuzzy $\alpha$-cluster elements.

The membership degree of an element $x_i \in X$ for some fuzzy $\alpha$-cluster $A^l_{(\alpha)}$, $\alpha \in (0,1]$, $l \in \{1,\ldots,n\}$, can be defined as

$$\mu_{li} = \begin{cases} \mu_{A^l}(x_i), & x_i \in A^l_{\alpha}, \\ 0, & \text{otherwise}, \end{cases} \tag{17}$$

where the $\alpha$-level of a fuzzy set $A^l$, $A^l_{\alpha} = \{x_i \in X \mid \mu_{A^l}(x_i) \ge \alpha\}$, $\alpha \in (0,1]$, is the support of the fuzzy $\alpha$-cluster $A^l_{(\alpha)}$. The value of the membership function of each element of the fuzzy $\alpha$-cluster is the degree of similarity of the object to some typical object of the fuzzy $\alpha$-cluster. Moreover, the membership degree defines a possibility distribution function for some fuzzy $\alpha$-cluster $A^l_{(\alpha)}$, $\alpha \in (0,1]$, and this possibility distribution function is denoted $\pi_l(x_i)$.

Let $\{A^1_{(\alpha)},\ldots,A^n_{(\alpha)}\}$ be the family of fuzzy $\alpha$-clusters for some $\alpha$. The point $\tau_e^l \in A^l_{\alpha}$, for which

$$\tau_e^l = \arg\max_{x_i} \mu_{li}, \quad \forall x_i \in A^l_{\alpha}, \tag{18}$$

is called a typical point of the fuzzy $\alpha$-cluster $A^l_{(\alpha)}$, $\alpha \in (0,1]$, $l \in [1,n]$. Obviously, a fuzzy $\alpha$-cluster can have several typical points. That is why the symbol $e$ is the index of a typical point.

Let $R_z^{\alpha}(X) = \{A^l_{(\alpha)} \mid l = \overline{1,c},\ 2 \le c \le n\}$ be a set of fuzzy $\alpha$-clusters for some value of the tolerance threshold $\alpha$ which are generated by a fuzzy tolerance $T$ from the initial set of elements $X = \{x_1,\ldots,x_n\}$. If the condition

$$\sum_{l=1}^{c} \mu_{li} > 0, \quad \forall x_i \in X, \tag{19}$$

is met for all $A^l_{(\alpha)}$, $l = \overline{1,c}$, $c \le n$, then this set is an allotment of elements of the set $X = \{x_1,\ldots,x_n\}$ among fuzzy $\alpha$-clusters $\{A^l_{(\alpha)},\ l = \overline{1,c},\ 2 \le c \le n\}$ for some value of the tolerance threshold $\alpha$. It should be noted that several allotments $R_z^{\alpha}(X)$ can exist for some tolerance threshold $\alpha$. The number of allotments $R_z^{\alpha}(X)$ depends on the initial data structure. That is why the symbol $z$ is the index of an allotment.

The allotment $R_I^{\alpha}(X) = \{A^l_{(\alpha)} \mid l = \overline{1,n},\ \alpha \in (0,1]\}$ of the set of objects among $n$ fuzzy $\alpha$-clusters for some threshold $\alpha$ is an initial allotment of the set $X = \{x_1,\ldots,x_n\}$. In other words, if the initial data are represented by a matrix of some fuzzy tolerance $T$, then rows or columns of the matrix are fuzzy sets $A^l \subseteq X$, $l = 1,\ldots,n$, and the $\alpha$-level fuzzy sets $A^l_{(\alpha)}$, $l = 1,\ldots,n$, $\alpha \in (0,1]$, are fuzzy $\alpha$-clusters. These fuzzy $\alpha$-clusters constitute an initial allotment for some tolerance threshold and they can be considered as clustering components.

If some allotment $R_z^{\alpha}(X) = \{A^l_{(\alpha)} \mid l = 1,\ldots,c,\ c \le n\}$ is appropriate for the problem considered, then this allotment is called an adequate allotment. In particular, if the conditions

$$\sum_{l=1}^{c} \mathrm{card}(A^l_{\alpha}) \ge \mathrm{card}(X), \quad \forall A^l_{(\alpha)} \in R_z^{\alpha}(X), \quad \alpha \in (0,1], \quad \mathrm{card}(R_z^{\alpha}(X)) = c, \tag{20}$$

and

$$\mathrm{card}(A^l_{\alpha} \cap A^m_{\alpha}) \le w, \quad \forall A^l_{(\alpha)}, A^m_{(\alpha)}, \quad l \ne m, \quad \alpha \in (0,1], \tag{21}$$

are met for all fuzzy $\alpha$-clusters $A^l_{(\alpha)}$, $l = 1,\ldots,c$, of some allotment $R_z^{\alpha}(X) = \{A^l_{(\alpha)} \mid l = 1,\ldots,c,\ c \le n\}$, then this allotment is the allotment among partially separate fuzzy $\alpha$-clusters, and $w \in \{0,\ldots,n\}$ is the maximum number of elements in the intersection area of different fuzzy $\alpha$-clusters. If $w = 0$ in the conditions (20) and (21), then this allotment is the allotment among fully separate fuzzy $\alpha$-clusters.

An adequate allotment $R_z^{\alpha}(X)$ for some value of the tolerance threshold $\alpha \in (0,1]$ is a family of fuzzy $\alpha$-clusters which are elements of the initial allotment $R_I^{\alpha}(X)$ for the value of $\alpha$, and the family of fuzzy $\alpha$-clusters satisfies the conditions (20) and (21). The problem consists in the selection of a unique adequate allotment $R^{*}(X)$ from the set $B$ of adequate allotments, $B = \{R_z^{\alpha}(X)\}$, which is the class of possible solutions of the specific classification problem; $B = \{R_z^{\alpha}(X)\}$ depends on the parameters of the classification problem. In particular, the number $c$ of fuzzy $\alpha$-clusters is a parameter of the D-AFC(c)-algorithm. The selection of the unique adequate allotment among a fixed number $c$ of fuzzy $\alpha$-clusters from the set $B = \{R_z^{\alpha}(X)\}$ of adequate allotments is to be made on the basis of an evaluation of allotments. The criterion

$$F(R_z^{\alpha}(X), \alpha) = \sum_{l=1}^{c} \frac{1}{n_l} \sum_{i=1}^{n_l} \mu_{li} - \alpha \cdot c, \tag{22}$$

where $c$ is the number of fuzzy $\alpha$-clusters in the allotment $R_z^{\alpha}(X)$ and $n_l = \mathrm{card}(A^l_{\alpha})$, $A^l_{(\alpha)} \in R_z^{\alpha}(X)$, is the number of elements in the support of the fuzzy $\alpha$-cluster $A^l_{(\alpha)}$, can be used for the evaluation of allotments. The maximum value of the criterion (22) corresponds to the best allotment of objects among $c$ fuzzy $\alpha$-clusters. So, the classification problem can be formally characterized as the determination of a solution $R^{*}(X)$ satisfying

$$R^{*}(X) = \arg\max_{R_z^{\alpha}(X) \in B} F(R_z^{\alpha}(X), \alpha), \tag{23}$$

where $B = \{R_z^{\alpha}(X)\}$ is the set of adequate allotments corresponding to the formulation of the specific classification problem considered.
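The constructions above (fuzzy $\alpha$-clusters cut from the rows of a tolerance matrix, typical points, the allotment condition (19) and the criterion (22)) can be sketched as follows. This is only an illustration of the definitions, not the D-AFC(c)-algorithm itself; the helper names and the toy tolerance matrix are assumptions made for the example.

```python
import numpy as np

def alpha_cluster(T, l, alpha):
    """Memberships mu_li of the alpha-cluster built from row l of T, cf. (17)."""
    row = T[l]
    return np.where(row >= alpha, row, 0.0)

def typical_points(mu):
    """Indices of support elements with maximal membership, cf. (18)."""
    return np.flatnonzero(np.isclose(mu, mu.max()) & (mu > 0))

def is_allotment(clusters):
    """Condition (19): every object has positive membership in some cluster."""
    return bool(np.all(np.sum(clusters, axis=0) > 0))

def criterion_F(clusters, alpha):
    """Criterion (22): sum of mean support memberships minus alpha * c."""
    total = 0.0
    for mu in clusters:
        support = mu > 0              # alpha > 0, so this is the alpha-level set
        total += mu[support].sum() / support.sum()
    return total - alpha * len(clusters)

# Hypothetical symmetric, reflexive tolerance matrix for n = 4 objects.
T = np.array([[1.0, 0.8, 0.2, 0.1],
              [0.8, 1.0, 0.3, 0.2],
              [0.2, 0.3, 1.0, 0.7],
              [0.1, 0.2, 0.7, 1.0]])
alpha = 0.7
clusters = [alpha_cluster(T, 0, alpha), alpha_cluster(T, 2, alpha)]
F = criterion_F(clusters, alpha)
```

In a search over candidate allotments, the value of `criterion_F` would be compared across candidates and the largest one kept, in the spirit of (23).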

Thus, the problem of cluster analysis can be defined as the problem of discovering an allotment $R^{*}(X)$ resulting from the classification process, and the detection of a fixed number $c$ of fuzzy $\alpha$-clusters can be considered as the goal of classification. A description of the corresponding D-AFC(c)-algorithm is presented in [8, 9, 11, 15].

The most “plausible” number $c$ of fuzzy $\alpha$-clusters in the allotment $R^{*}(X)$ sought can be considered as an index for the cluster validity problem for the D-AFC(c)-algorithm. Different validity measures for the D-AFC(c)-algorithm were proposed in [15]. In particular, the measure of separation and compactness of the allotment, $V_{MSC}(R^{*}(X); c)$, can be defined in the following way:

[formula not recoverable from the source] (24)

where $\Theta$ is a set of elements $x_j$, $j \in \{1,\ldots,n\}$, in all intersection areas of different fuzzy $\alpha$-clusters, and the density of a fuzzy $\alpha$-cluster, $D(A^l_{(\alpha)})$, is defined in [15] as follows:

$$D(A^l_{(\alpha)}) = \frac{1}{n_l} \sum_{x_i \in A^l_{\alpha}} \mu_{li}, \tag{25}$$

where $n_l = \mathrm{card}(A^l_{\alpha})$, $A^l_{(\alpha)} \in R^{*}(X)$, and the membership degree $\mu_{li}$ is defined by the formula (17). The measure of separation and compactness of an allotment, $V_{MSC}(R^{*}(X); c)$, increases as $c$ gets closer to $n$. That is why the optimum value of $c$ is obtained by minimizing $V_{MSC}(R^{*}(X); c)$ over $c$ from $c_{\min}$ to $c_{\max}$, where $2 \le c_{\min}$ and $c_{\max} < n$. So, the choice of the measure (24) can be interpreted as the search for an optimal number $c$ of fuzzy $\alpha$-clusters in the allotment $R^{*}(X)$ sought.

2.2. Remarks on the Preprocessing of Three-way Data

The D-AFC(c)-algorithm can be applied directly to data given as a fuzzy tolerance matrix $T = [\mu_T(x_i, x_j)]$, $i, j = 1,\ldots,n$. This means that it can be used after choosing a suitable metric to measure the similarity. The three-way data can be normalized as follows:

$$x_i^{t_1(t_2)} = \frac{\hat{x}_i^{t_1(t_2)} - \min_{i,t_2} \hat{x}_i^{t_1(t_2)}}{\max_{i,t_2} \hat{x}_i^{t_1(t_2)} - \min_{i,t_2} \hat{x}_i^{t_1(t_2)}}. \tag{26}$$

So, each object $x_i$, $i = 1,\ldots,n$, from the initial set $X = \{x_1,\ldots,x_n\}$ can be considered as a type-two fuzzy set, and $x_i^{t_1(t_2)} = \mu_{x_i}(x^{t_1(t_2)})$, $i = 1,\ldots,n$, $t_1 = 1,\ldots,m_1$, $t_2 = 1,\ldots,m_2$, and $x^{t_1(t_2)} = \mu_{x^{t_1}}(x^{t_2}) \in [0,1]$, $t_1 = 1,\ldots,m_1$, $t_2 = 1,\ldots,m_2$, are its membership functions. In the case of three-way data, each object $x_i$, $i = 1,\ldots,n$, can be represented as a matrix $X_{(i)m_1 \times m_2} = [x_i^{t_1(t_2)}]$, $t_1 = 1,\ldots,m_1$, $t_2 = 1,\ldots,m_2$. Dissimilarity coefficients between the objects can be constructed on the basis of generalizations of distances between fuzzy sets [11], and these generalizations take into account dissimilarities between the attributes of objects and the situations. In particular, a generalization of the squared normalized Euclidean distance for type-two fuzzy sets can be described by

[formula not recoverable from the source] (27)

for all $i, j = 1,\ldots,n$. In the case of $m_2 = 1$, the formula (27) can be rewritten as the usual squared normalized Euclidean distance [3]:

$$\varepsilon(x_i, x_j) = \frac{1}{m_1} \sum_{t_1=1}^{m_1} \left( \mu_{x_i}(x^{t_1}) - \mu_{x_j}(x^{t_1}) \right)^2, \quad i, j = 1,\ldots,n. \tag{28}$$

The fuzzy tolerance matrix $T = [\mu_T(x_i, x_j)]$, $i, j = 1,\ldots,n$, can be obtained by the application of the complement operation

$$\mu_T(x_i, x_j) = 1 - \mu_I(x_i, x_j), \quad \forall i, j = 1,\ldots,n, \tag{29}$$

to the fuzzy intolerance matrix $I = [\mu_I(x_i, x_j)]$.

However, the value $m_2$ can be different for different attributes $\hat{x}^{t_1}$, $t_1 \in \{1,\ldots,m_1\}$, or the number $m_2$ of grades for a fixed attribute $\hat{x}^{t_1}$, $t_1 \in \{1,\ldots,m_1\}$, can be different for different objects $x_i$, $i \in \{1,\ldots,n\}$. In that case an object $x_i$, $i = 1,\ldots,n$, cannot be presented as a matrix $X_{(i)m_1 \times m_2} = [x_i^{t_1(t_2)}]$, $t_1 = 1,\ldots,m_1$, $t_2 = 1,\ldots,m_2$, until a value $m_2$ common to all attributes $\hat{x}^{t_1}$, $t_1 \in \{1,\ldots,m_1\}$, has been established. In these cases a general value $m_2$ can be defined as follows:

$$m_2 = \max_{t_1} m_2^{(t_1)}, \quad t_1 = 1,\ldots,m_1, \tag{30}$$

where the number of grades of each attribute $\hat{x}^{t_1}$, $t_1 \in \{1,\ldots,m_1\}$, is denoted by $m_2^{(t_1)}$. However, the values $x_i^{t_1(t_2)}$, $i \in \{1,\ldots,n\}$, may be unknown for some objects $x_i \in X$, $i \in \{1,\ldots,n\}$. In such a case, the unknown values $x_i^{t_1(t_2)}$, $i \in \{1,\ldots,n\}$, can be defined heuristically as follows:

$$x_i^{t_1(t_2)} = \max\, [\text{operand illegible in the source}], \quad i \in \{1,\ldots,n\}, \quad t_2 = 1,\ldots,m_2^{(t_1)}. \tag{31}$$

Obviously, the preprocessing method for three-way data can be generalized in a straightforward way to the case of $p$-way data, for $p > 3$.
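The preprocessing steps of this subsection can be sketched for the simplest case. The sketch below implements the normalization (26) for an array of shape $(n, m_1, m_2)$, and the distance (28) with the complement (29) for $m_2 = 1$. It assumes that the intolerance degree $\mu_I(x_i, x_j)$ is taken to be the distance $\varepsilon(x_i, x_j)$, which the text does not state explicitly; the function names and the toy data are also illustrative assumptions.

```python
import numpy as np

def normalize_three_way(X_hat):
    """Eq. (26): min-max scaling per attribute t1, over objects i and situations t2.

    X_hat has shape (n, m1, m2). Assumes max > min for every attribute.
    """
    lo = X_hat.min(axis=(0, 2), keepdims=True)
    hi = X_hat.max(axis=(0, 2), keepdims=True)
    return (X_hat - lo) / (hi - lo)

def tolerance_matrix(X):
    """Eqs. (28)-(29) for m2 = 1. X has shape (n, m1) with entries in [0, 1]."""
    diff = X[:, None, :] - X[None, :, :]
    eps = np.mean(diff ** 2, axis=2)   # squared normalized Euclidean distance (28)
    return 1.0 - eps                   # complement (29), assuming mu_I = eps

# Hypothetical data: n = 3 objects, m1 = 2 attributes, m2 = 1 situation.
X_hat = np.array([[[1.0], [10.0]],
                  [[3.0], [20.0]],
                  [[5.0], [30.0]]])
X = normalize_three_way(X_hat)[:, :, 0]
T = tolerance_matrix(X)
```

The resulting matrix `T` is symmetric with a unit diagonal, so it satisfies (15) and (16) and could be passed to a relational clustering procedure.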

2.3. A Method of Fuzzy Rules Extraction from the Three-way Data

52

Let us consider a method of extracting fuzzy classification rules based on a heuristic method of possibilistic clustering [13]. In the following, we will consider that the Mamdani type fuzzy inference system is a multiple inputs, multiple outputs system (MIMO). The antecedent of a fuzzy rule in the fuzzy inference system defines a decision region in the m1 -dimensional feature space. Let us consider a fuzzy rule t (1) where Bl 1 , t1 = 1,K, m1 , l ∈{1,K, c} , is a fuzzy Articles

Fig. 1. A trapezoidal membership function for an antecedent fuzzy set t

set associate with the attribute variable xˆ 1 . Let Bl 1 be characterized by its trapezoidal membership funct tion γ t1 ( xˆ 1 ) which is presented in Fig. 1. Bl

So, the fuzzy set Blt1 can be defined by four pat t rameters, Blt1 = (a(1l ) , m(1l ) , m(t1l ) , a(tl1) ) . A triangular fuzzy t1 t1 t1 t1 set Bl = (a (l ) , m(l ) , a( l ) ) can be considered as a particular case of the trapezoidal fuzzy set where t m (1l ) = m(tl1) . The idea of deriving fuzzy rules from fuzzy α -clusters was outlined by Viattchenin [13] and this method can be extended to the case of the three-way data as follows. We apply the D-AFC(c)-algorithm to the given three-way data and then obtain for each fuzzy α -cluster A(lα ) , l ∈{1,K , c } a kernel K ( A(lα ) ) and a support Aαl . The value of the tolerance threshold α ∈(0,1] , which corresponds to an allotment R ∗ ( X ) = { A(1α ) ,K , A(cα ) } , is an additional result of classification. The situation of the three-way data can be described by the expression xˆ it1 = ( xˆ it1 (min) , xˆ it1 (max) ) , t1 = 1,K , m1 , i = 1,K , n . In particular, the interval (min) (min) [ xˆ (t1l )min , xˆ (t1l )max ] of values of each attribute t1 t1 (min) ˆx = ( xˆ , xˆ t1 (max) ) , t1 ∈{1,K , m1 } for the support Aαl should be calculated. We calculate the interval (min) (min) [ xˆ (t1l )min , xˆ (t1l )max ] of values of each attribute xˆ t1 , (min) t1 ∈{1,K , m1 } , for the support Aαl . The value xˆ (t1l )min can be obtained ass (min) xˆ (t1l )min = min xˆ t1 (min),

l xi ∈Aα

∀t1 ∈{1,K , m1 }, ∀l ∈{1,K , c },

(32)

(max) and the value xˆ (t1l )max , t1 ∈{1,K , m1 } , as (max) xˆ (t1l )max = max xˆ t1 (max) ,

l xi ∈Aα

∀t1 ∈{1,K , m} , ∀l ∈{1,K , c } .

(33)

t

The parameter a(1l ) can be obtained from γ

t B 1 l

(min) ( xˆ (t1l )min ) = (1 − α ) , γ

t B 1 l

t

(a(1l ) ) = 0 ,

(34)


Journal of Automation, Mobile Robotics & Intelligent Systems

VOLUME 8,

and the parameter a(tl1) from γ

t1 l

B

(max) ( xˆ (t1l )max ) = (1 − α ) , γ

t1 l

B

t1 (l )

(a ) = 0 .

(35)

We calculate the value $\hat{x}_{(l)}^{t_1(\min)}$ over all typical points $\tau_e^l \in K(A_{(\alpha)}^l)$ of the fuzzy $\alpha$-cluster $A_{(\alpha)}^l$, $l \in \{1,\dots,c\}$, as follows:

$$\hat{x}_{(l)}^{t_1(\min)} = \min_{\tau_e^l \in K(A_{(\alpha)}^l)} \hat{x}^{t_1(\min)}, \quad \forall e \in \{1,\dots,|l|\}, \qquad (36)$$

and the value $\hat{x}_{(l)}^{t_1(\max)}$ can be obtained from

$$\hat{x}_{(l)}^{t_1(\max)} = \max_{\tau_e^l \in K(A_{(\alpha)}^l)} \hat{x}^{t_1(\max)}, \quad \forall e \in \{1,\dots,|l|\}. \qquad (37)$$

Thus, the parameter $\underline{m}_{(l)}^{t_1}$ can be calculated from

$$\gamma_{B_l^{t_1}}\big(\hat{x}_{(l)}^{t_1(\min)}\big) = \gamma_{B_l^{t_1}}\big(\underline{m}_{(l)}^{t_1}\big) = 1, \qquad (38)$$

and the parameter $\overline{m}_{(l)}^{t_1}$ can be obtained as

$$\gamma_{B_l^{t_1}}\big(\hat{x}_{(l)}^{t_1(\max)}\big) = \gamma_{B_l^{t_1}}\big(\overline{m}_{(l)}^{t_1}\big) = 1. \qquad (39)$$

So, the conditions $\hat{x}_{(l)}^{t_1(\min)} = \underline{m}_{(l)}^{t_1}$ and $\hat{x}_{(l)}^{t_1(\max)} = \overline{m}_{(l)}^{t_1}$ are met for all input variables $\hat{x}^{t_1}$, $t_1 = 1,\dots,m_1$.

Let us consider a technique of learning the consequents of the rules. The variables $y_l$, $l = 1,\dots,c$, are the consequents of the fuzzy rules (1), represented by the fuzzy sets $C_l^l$, $l = 1,\dots,c$, with their membership functions $\gamma_{C_l^l}(y_l)$. These fuzzy sets can be defined on the interval of membership degrees $[0,1]$ and can be presented as follows: $C_l^l = (\alpha, \underline{\mu}_l, \overline{\mu}_l, 1)$, where $\alpha$ is the tolerance threshold, $\underline{\mu}_l = \min_{x_i \in A_\alpha^l} \mu_{li}$ and $\overline{\mu}_l = \max_{x_i \in A_\alpha^l} \mu_{li}$. This case is presented in Fig. 2. On the other hand, if $A_{(\alpha)}^l$ and $A_{(\alpha)}^m$, $l \neq m$, are two particularly separated fuzzy $\alpha$-clusters, then the condition $w \neq 0$ is met in equation (21). So, a fuzzy set $C_m^l = (0, 1 - \overline{\mu}_m, 1 - \underline{\mu}_m, 1 - \alpha)$ is the consequent for the variable $y_m$ of the $l$-th fuzzy rule for the case of a low membership degree. The corresponding case is illustrated by Fig. 3.

Suppose that the membership functions $\gamma_{C_l^l}(y_l)$ of the fuzzy sets $C_l^l$, $l = 1,\dots,c$, are trapezoidal. The trapezoidal membership functions $\gamma_{C_l^l}(y_l)$ for the fuzzy sets $C_l^l$, $l = 1,\dots,c$, can be constructed on the basis of the clustering results. The empty set $A_\alpha^l = \emptyset$, $l \in \{1,\dots,c\}$, can correspond to some output variable $y_l$, $l \in \{1,\dots,c\}$. In that case the empty fuzzy set $C_l^l$ will correspond to the output variable $y_l$, and $\gamma_{C_l^l}(y_l) = 0$ will be the membership function of the corresponding fuzzy set $C_l^l$.

Fig. 3. A membership function for a consequent fuzzy set in the case of a low degree of membership

A scheme of rapid prototyping of the fuzzy inference system from the three-way data can be described briefly as follows: a stationary clustering structure [16] should be constructed in the first step, and fuzzy rules must be derived in the second step using the proposed technique.
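The conditions (34)-(39) and the two consequent forms can be illustrated with a short sketch. It assumes that the sides of the trapezoid are linear, so that the conditions (34) and (35) can be solved in closed form; the function names and the numeric values are hypothetical and serve only as an illustration, not as the authors' implementation.

```python
def antecedent_trapezoid(support_min, support_max, kernel_min, kernel_max, alpha):
    """Return (a_lo, m_lo, m_hi, a_hi) for one attribute of one fuzzy cluster.

    support_min/support_max: interval of the attribute over the support, eqs. (32)-(33).
    kernel_min/kernel_max: interval over the typical points, eqs. (36)-(37).
    Assumption: linear sides, so gamma(x) = (x - a_lo) / (m_lo - a_lo) on the left.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    m_lo, m_hi = kernel_min, kernel_max  # eqs. (38)-(39)
    # Solve gamma(support_min) = 1 - alpha with gamma(a_lo) = 0, eq. (34).
    a_lo = (support_min - (1.0 - alpha) * m_lo) / alpha
    # Solve gamma(support_max) = 1 - alpha with gamma(a_hi) = 0, eq. (35).
    a_hi = (support_max - (1.0 - alpha) * m_hi) / alpha
    return a_lo, m_lo, m_hi, a_hi


def trapezoid(x, params):
    """Membership degree of x in the trapezoidal fuzzy set params."""
    a_lo, m_lo, m_hi, a_hi = params
    if m_lo <= x <= m_hi:
        return 1.0
    if x <= a_lo or x >= a_hi:
        return 0.0
    if x < m_lo:
        return (x - a_lo) / (m_lo - a_lo)
    return (a_hi - x) / (a_hi - m_hi)


def consequent_high(memberships, alpha):
    """C_l^l = (alpha, min mu, max mu, 1) over the support of cluster l."""
    return alpha, min(memberships), max(memberships), 1.0


def consequent_low(memberships, alpha):
    """C_m^l = (0, 1 - max mu, 1 - min mu, 1 - alpha) for a different cluster m."""
    return 0.0, 1.0 - max(memberships), 1.0 - min(memberships), 1.0 - alpha


# Hypothetical values: support [2, 9], kernel [4, 6], alpha = 0.5.
params = antecedent_trapezoid(2.0, 9.0, 4.0, 6.0, alpha=0.5)
```

With these values the support boundaries 2 and 9 receive the membership degree $1 - \alpha = 0.5$, as required by (34) and (35).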

3. An Illustrative Example

The Sato and Sato [7] three-way data are described in the first subsection. Illustrative examples of data preprocessing are also considered in the first subsection. The second subsection includes results of numerical experiments for three distance functions.

3.1. The Sato and Sato Three-way Data

Fig. 2. A membership function for a consequent fuzzy set in the case of a high degree of membership

The Sato and Sato [7] artificial three-way data are a follow-up of a survey on human physical constitution, involving the height, weight, chest girth and sitting height of 38 boys measured at three instants, that is, at the ages of 13, 14 and 15 years. These data, which originally appeared in Sato and Sato [7], can be rewritten as shown in Table 1.


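Table 1. Physical constitution of 38 boys at 13, 14 and 15 years old (height, chest girth and sitting height in cm; weight in kg)

| Boy | Height 13 | Height 14 | Height 15 | Weight 13 | Weight 14 | Weight 15 | Chest girth 13 | Chest girth 14 | Chest girth 15 | Sitting height 13 | Sitting height 14 | Sitting height 15 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 147 | 157 | 162 | 40 | 47 | 54 | 70 | 76 | 81 | 80 | 85 | 87 |
| 2 | 161 | 166 | 167 | 49 | 50 | 52 | 75 | 75 | 79 | 85 | 87 | 88 |
| 3 | 153 | 159 | 161 | 45 | 48 | 51 | 72 | 75 | 75 | 86 | 90 | 92 |
| 4 | 155 | 163 | 168 | 51 | 58 | 66 | 77 | 82 | 87 | 85 | 87 | 92 |
| 5 | 160 | 165 | 167 | 51 | 56 | 61 | 75 | 77 | 82 | 86 | 88 | 89 |
| 6 | 153 | 159 | 167 | 38 | 43 | 44 | 67 | 70 | 71 | 81 | 84 | 87 |
| 7 | 166 | 169 | 172 | 67 | 72 | 79 | 86 | 89 | 92 | 89 | 90 | 95 |
| 8 | 168 | 174 | 175 | 55 | 60 | 65 | 76 | 79 | 81 | 91 | 93 | 95 |
| 9 | 142 | 149 | 157 | 35 | 39 | 46 | 69 | 68 | 75 | 75 | 78 | 82 |
| 10 | 151 | 160 | 165 | 44 | 51 | 57 | 72 | 78 | 80 | 79 | 85 | 89 |
| 11 | 164 | 167 | 169 | 55 | 58 | 65 | 77 | 79 | 80 | 88 | 89 | 93 |
| 12 | 153 | 163 | 168 | 42 | 46 | 53 | 70 | 73 | 78 | 83 | 88 | 91 |
| 13 | 148 | 158 | 164 | 41 | 47 | 51 | 72 | 77 | 81 | 78 | 82 | 85 |
| 14 | 164 | 169 | 171 | 75 | 84 | 88 | 92 | 97 | 102 | 90 | 93 | 95 |
| 15 | 145 | 151 | 162 | 34 | 39 | 45 | 65 | 68 | 72 | 76 | 80 | 84 |
| 16 | 151 | 159 | 162 | 51 | 57 | 64 | 80 | 83 | 87 | 81 | 85 | 87 |
| 17 | 145 | 153 | 162 | 50 | 55 | 59 | 82 | 84 | 82 | 79 | 81 | 86 |
| 18 | 154 | 163 | 169 | 47 | 53 | 56 | 71 | 75 | 80 | 82 | 86 | 89 |
| 19 | 156 | 166 | 171 | 48 | 50 | 56 | 73 | 72 | 75 | 81 | 86 | 89 |
| 20 | 144 | 149 | 157 | 30 | 33 | 37 | 60 | 62 | 66 | 73 | 75 | 79 |
| 21 | 154 | 164 | 169 | 41 | 49 | 56 | 69 | 76 | 77 | 82 | 88 | 91 |
| 22 | 155 | 165 | 169 | 43 | 52 | 57 | 71 | 75 | 79 | 82 | 87 | 90 |
| 23 | 155 | 162 | 166 | 48 | 58 | 60 | 76 | 85 | 84 | 82 | 86 | 89 |
| 24 | 155 | 162 | 172 | 49 | 55 | 57 | 73 | 76 | 76 | 80 | 84 | 87 |
| 25 | 156 | 163 | 164 | 48 | 53 | 54 | 76 | 79 | 82 | 81 | 86 | 87 |
| 26 | 156 | 164 | 172 | 50 | 53 | 56 | 74 | 76 | 79 | 81 | 84 | 87 |
| 27 | 162 | 168 | 170 | 45 | 48 | 52 | 71 | 71 | 75 | 84 | 88 | 89 |
| 28 | 147 | 154 | 163 | 37 | 43 | 50 | 71 | 75 | 80 | 79 | 82 | 86 |
| 29 | 149 | 157 | 166 | 40 | 47 | 53 | 71 | 79 | 78 | 80 | 83 | 87 |
| 30 | 148 | 155 | 162 | 37 | 41 | 47 | 69 | 70 | 74 | 78 | 81 | 85 |
| 31 | 156 | 163 | 166 | 52 | 57 | 62 | 75 | 79 | 81 | 83 | 87 | 89 |
| 32 | 141 | 151 | 159 | 35 | 42 | 48 | 68 | 74 | 79 | 73 | 77 | 82 |
| 33 | 140 | 147 | 157 | 30 | 34 | 43 | 67 | 70 | 73 | 76 | 77 | 83 |
| 34 | 146 | 153 | 161 | 49 | 52 | 53 | 76 | 78 | 76 | 80 | 79 | 84 |
| 35 | 162 | 168 | 161 | 53 | 58 | 53 | 74 | 78 | 76 | 86 | 79 | 84 |
| 36 | 146 | 158 | 165 | 36 | 44 | 51 | 68 | 75 | 73 | 77 | 85 | 89 |
| 37 | 141 | 151 | 158 | 41 | 46 | 51 | 71 | 75 | 76 | 76 | 80 | 83 |
| 38 | 158 | 167 | 171 | 65 | 71 | 79 | 93 | 93 | 90 | 85 | 90 | 91 |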



Denote the height by $\hat{x}^1$, the weight by $\hat{x}^2$, the chest girth by $\hat{x}^3$ and the sitting height by $\hat{x}^4$. Each attribute $\hat{x}^{t_1}$, $t_1 = 1,\dots,4$, is observed at three instants (ages), $t_2 = 1,\dots,3$. The value of the $t_1$-th attribute at the $t_2$-th moment for the $i$-th object is denoted by $\hat{x}_i^{t_1(t_2)}$, $i = 1,\dots,38$, $t_1 = 1,\dots,4$, $t_2 = 1,\dots,3$. The data were preprocessed according to formulae (26), (27) and (29).
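As an illustration of this notation, the sketch below stores three rows of Table 1 as object × attribute × instant and collapses the time dimension into an interval per attribute. It assumes that the pair $(\hat{x}_i^{t_1(\min)}, \hat{x}_i^{t_1(\max)})$ is the minimum and maximum over the three instants; the normalization steps of formulae (26), (27) and (29) are not reproduced here, and the names `boys` and `to_intervals` are hypothetical.

```python
# Attribute order follows the text: height, weight, chest girth, sitting height.
ATTRIBUTES = ("height_cm", "weight_kg", "chest_girth_cm", "sitting_height_cm")

# Rows 1, 9 and 14 of Table 1; each attribute holds the values at 13, 14 and 15 years.
boys = {
    1: ((147, 157, 162), (40, 47, 54), (70, 76, 81), (80, 85, 87)),
    9: ((142, 149, 157), (35, 39, 46), (69, 68, 75), (75, 78, 82)),
    14: ((164, 169, 171), (75, 84, 88), (92, 97, 102), (90, 93, 95)),
}


def to_intervals(record):
    """Collapse the three instants of every attribute into a (min, max) pair.

    Assumption: the interval is taken over t2 = 1, 2, 3 on the raw values.
    """
    return tuple((min(series), max(series)) for series in record)


intervals = {boy: to_intervals(record) for boy, record in boys.items()}
```

For boy 9 the chest girth interval is (68, 75): the minimum occurs at the second instant, not the first, which is why the minimum is taken explicitly rather than read from the first column.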

3.2. Experimental Results

Let us consider the results of applying the proposed technique to the Sato and Sato artificial three-way data. The data were processed by the D-AFC(c)-algorithm with the number of fuzzy clusters $c = 2, 3, \dots$, using the measure of separation and compactness of an allotment (24). The performance of the validity measure is shown in Fig. 4. The optimal number of fuzzy clusters is equal to 3, and this number corresponds to the first minimum of the measure of separation and compactness of the allotment. The corresponding allotment $R^*(X)$ among three fully separate fuzzy clusters was obtained for the tolerance threshold $\alpha = 0.93120$. The membership functions of the three classes of the allotment are presented in Fig. 5; the values which are equal to zero are not shown in this figure. The membership values of the first class are represented by +, the membership values of the second class by ■, and the membership values of the third class by ×.

Fig. 4. Plot of the measure of separation and compactness for the Sato and Sato three-way data set

Fig. 5. Membership functions of three fuzzy clusters obtained from the D-AFC(c)-algorithm

So, by executing the D-AFC(c)-algorithm for three classes, we obtain that the first class is formed by 3 elements, the second class by 7 elements, and the third class by 28 elements. The value of the membership function of the fuzzy cluster which corresponds to the first class is maximal for the fourteenth object and is equal to 0.98298. So, the fourteenth object is a typical point of the fuzzy cluster which corresponds to the first class. The membership value of the twentieth object is equal to 0.97888, and this value is maximal for the fuzzy cluster which corresponds to the second class. Thus, the twentieth object is a typical point of the fuzzy cluster which corresponds to the second class. The membership function of the third fuzzy cluster is maximal for the fifth object and is equal to 0.98392. That is why the fifth object is a typical point of the fuzzy cluster which corresponds to the third class.

We can see that the boys in the first cluster have a good physical constitution through all three years of age. Conversely, the boys in the second cluster have a poor constitution. The boys who belong to the third cluster have a standard constitution. So, the results obtained from the D-AFC(c)-algorithm are similar to the results obtained by Sato and Sato [7] using their multicriteria optimization method.

The rule base induced by the proposed technique from the clustering result obtained by using the D-AFC(c)-algorithm can be seen in Figs. 6 to 8. In particular, the performance of the fuzzy inference system for the thirty-second boy at the first time measurement is presented in Fig. 6. The total area is zero when the defuzzification procedure is applied to the output variables Class 1 and Class 3. That is why the average values of the range of the output variables Class 1 and Class 3 are used as the output values, and these values are equal to 0.5. These values can be interpreted as uncertain membership degrees. The performance of the fuzzy inference system for the thirty-second boy at the second time measurement is presented in Fig. 7. It should be noted that at that time the boy belonged to the second and the third classes. The performance of the fuzzy inference system for the thirty-second boy at the third time measurement is presented in Fig. 8. So, we can see that the thirty-second boy shows a tendency of growth during the period from 14 years old to 15 years old.

Fig. 6. The performance of the generated fuzzy inference system for the thirty-second boy at the first time measurement (13 years old)

Fig. 7. The graph of performance of the generated fuzzy inference system for the thirty-second boy at the second time measurement (14 years old)

Fig. 8. The graph of performance of the generated fuzzy inference system for the thirty-second boy at the third time measurement (15 years old)

Table 2. Results of performance of the generated fuzzy inference system for the data set (objects assigned to each class at each time measurement)

| Class | 13 years old | 14 years old | 15 years old |
|---|---|---|---|
| 1 | 7, 14, 38 | 7, 14, 38 | 4, 7, 14, 38 |
| 2 | 1, 2, 6, 9, 10, 12, 13, 15, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 32, 33, 34, 36 | 1, 6, 9, 15, 20, 28, 30, 32, 33, 36, 37 | 9, 15, 20, 30, 33, 37 |
| 3 | 1, 2, 3, 4, 5, 6, 8, 10, 11, 12, 13, 16, 17, 18, 19, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 34, 35, 36, 37 | 1, 2, 3, 4, 5, 6, 8, 9, 10, 11, 12, 13, 15, 16, 17, 18, 19, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 34, 35, 36, 37 | 1, 2, 3, 4, 5, 6, 8, 9, 10, 11, 12, 13, 15, 16, 17, 18, 19, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37 |
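The handling of a zero total area for the output variables Class 1 and Class 3 can be sketched as follows. The sketch assumes a discrete centroid defuzzifier over sampled points of the output range $[0,1]$; the source does not name the defuzzification method, so the centroid form, the function name and the sampling grid are assumptions made for illustration only.

```python
def centroid_defuzzify(ys, mus, lo=0.0, hi=1.0):
    """Discrete centroid of an aggregated output membership function.

    ys: sample points of the output variable; mus: membership degree at each point.
    If the total area is zero (no rule fired), the midpoint of the output range
    is returned, which corresponds to the value 0.5 on the range [0, 1].
    """
    area = sum(mus)
    if area == 0:
        return (lo + hi) / 2.0
    return sum(y * m for y, m in zip(ys, mus)) / area


ys = [i / 10 for i in range(11)]          # 0.0, 0.1, ..., 1.0
no_rule_fired = [0.0] * len(ys)           # zero area: output is undetermined
peak_near_03 = [0, 0, 0.5, 1.0, 0.5, 0, 0, 0, 0, 0, 0]

uncertain = centroid_defuzzify(ys, no_rule_fired)
crisp = centroid_defuzzify(ys, peak_near_03)
```

Here `uncertain` takes the fallback value 0.5, while `crisp` lies at the centre of the symmetric peak.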



The results of the numerical experiment for all 38 boys at all three measurement times are summarized in Table 2. The Sato and Sato three-way data were classified using the constructed fuzzy inference system. Evidently, the results obtained are correlated with the results obtained from the D-AFC(c)-algorithm, which suggests that the fuzzy inference system is accurate. In addition, we can observe the trend of development of each boy. The result obtained from the fuzzy inference system can easily be interpreted. Thus, the obtained model is suitable for interpretation, since the consequents of the rules are the same as or close to the current class labels, such that each rule can be taken to describe all classes.

4. Conclusions

Many techniques to design fuzzy inference systems from data are available; basically, they all take advantage, explicitly or implicitly, of the property of fuzzy inference systems of being universal approximators. This paper presents an extension of an automatic method to design a fuzzy inference system for classification via heuristic possibilistic clustering. This method can be considered as an approach to rapid prototyping of fuzzy inference systems for the case of three-way data. The proposed method is simple in comparison with other well-known approaches. The results obtained for the well-known Sato and Sato three-way data set show the effectiveness of the proposed method. Some approaches, such as those based on genetic algorithms or neuro-fuzzy techniques, can be used for tuning the fuzzy rules. In addition, a scheme of on-line training of the fuzzy inference system can be developed. These perspectives for further research are of great interest from both the theoretical and the practical points of view.

Acknowledgements

The research has been partially supported, first, by the National Centre of Science under Grant No. UMO2012/05/B/ST6/03068 (J. Kacprzyk, J. W. Owsiński), and second, by the Mianowski Fund and the Ministry of Foreign Affairs of the Republic of Poland (D. Viattchenin). The authors are grateful to Mr. Aliaksandr Damaratski for developing software to implement and test the tools developed.

AUTHORS

Janusz Kacprzyk, Jan W. Owsiński – Systems Research Institute, Polish Academy of Sciences, 6 Newelska St., 01-447 Warsaw, Poland. E-mails: {kacprzyk, owsinski}@ibspan.waw.pl

Dmitri A. Viattchenin* – United Institute of Informatics Problems, National Academy of Sciences of Belarus, 6 Surganov St., 220012 Minsk, Belarus. E-mail: viattchenin@mail.ru

*Corresponding author


References

[1] Bezdek J.C., Pattern Recognition with Fuzzy Objective Function Algorithms, Plenum Press, New York, 1981. DOI: http://dx.doi.org/10.1007/978-1-4757-0450-1
[2] Höppner F., Klawonn F., Kruse R., Runkler T., Fuzzy Cluster Analysis: Methods for Classification, Data Analysis and Image Recognition, Wiley, Chichester, 1999.
[3] Kacprzyk J., Multistage Fuzzy Control, Wiley, Chichester, 1997.
[4] Krishnapuram R., Keller J.M., “A possibilistic approach to clustering”, IEEE Trans. on Fuzzy Syst., no. 1, 1993, 98–110. DOI: http://dx.doi.org/10.1109/91.227387
[5] Mamdani E.H., Assilian S., “An experiment in linguistic synthesis with a fuzzy logic controller”, International Journal of Man-Machine Studies, vol. 7, no. 1, 1975, 1–13. DOI: http://dx.doi.org/10.1016/S0020-7373(75)80002-2
[6] Mandel I.D., Clustering Analysis, Finansy i Statistica, Moscow, 1988. (in Russian)
[7] Sato M., Sato Y., “On a multicriteria fuzzy clustering method for 3-way data”, International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, vol. 2, no. 2, 1994, 127–142. DOI: http://dx.doi.org/10.1142/S0218488594000122
[8] Viattchenin D.A., “A new heuristic algorithm of fuzzy clustering”, Control & Cybernetics, vol. 33, 2004, 323–340.
[9] Viattchenin D.A., “A direct algorithm of possibilistic clustering with partial supervision”, Journal of Automation, Mobile Robotics and Intelligent Systems, vol. 1, no. 3, 2007, 29–38.
[10] Viattchenin D.A., “Direct algorithms of fuzzy clustering based on the transitive closure operation and their application to outliers detection”, Artificial Intelligence, vol. 3, 2007, 205–216. (in Russian)
[11] Viattchenin D.A., “An outline for a heuristic approach to possibilistic clustering of the three-way data”, Journal of Uncertain Systems, vol. 3, 2009, 64–80.
[12] Viattchenin D.A., “An algorithm for detecting the principal allotment among fuzzy clusters and its application as a technique of reduction of analyzed features space dimensionality”, Journal of Information and Organizational Sciences, vol. 33, no. 1, 2009, 205–217.
[13] Viattchenin D.A., “Automatic generation of fuzzy inference systems using heuristic possibilistic clustering”, Journal of Automation, Mobile Robotics and Intelligent Systems, vol. 4, no. 3, 2010, 36–44.
[14] Viattchenin D.A., “Derivation of fuzzy rules from interval-valued data”, International Journal of Computer Applications, vol. 7, no. 3, 2010, 13–20. DOI: http://dx.doi.org/10.5120/1146-1500
[15] Viattchenin D.A., “Validity measures for heuristic possibilistic clustering”, Information Technology and Control, vol. 39, 2010, 321–332.
[16] Viattchenin D.A., “Constructing stable clustering structure for uncertain data set”, Acta Electrotechnica et Informatica, vol. 11, no. 3, 2011, 42–50. DOI: http://dx.doi.org/10.2478/v10198-011-0028-5


A Simple and Efficient Implementation of EKF-based SLAM Relying on Laser Scanner in Complex Indoor Environment

Submitted: 31st January 2014; accepted: 10th March 2014

Thomas Genevois, Teresa Zielińska

DOI: 10.14313/JAMRIS_2-2014/20

Abstract: Localization in an unknown environment is one of the major issues faced by autonomous vehicles. The solution to this problem is delivered by Simultaneous Localization and Mapping techniques, commonly known as SLAM. SLAM is the category of algorithms allowing a robot to map its surroundings and to keep an estimate of its position. Nowadays several SLAM methods are widely used. However, many issues arise when SLAM is applied in a complex and unstructured environment. This article details an implementation of SLAM using an improved Extended Kalman Filter (EKF). The aim is to provide a simple but reliable SLAM technique. The work has been carried out on a Seekur Jr robot; the mapping has been realized with a laser scanner. The applied EKF model with its modifications is presented. The techniques used to observe the environment and to identify the landmarks are outlined. The robustness and consistency of the introduced modifications were justified by experiments.

Keywords: mobile robot, SLAM, environment recognition

1. Introduction

1.1. Historical Context

It is often considered that the first publication about SLAM was made by Durrant-Whyte and Leonard in 1991 [3]. Among the different possible SLAM methods, the so-called stochastic mapping methods became very popular despite their heavy computational cost. They were introduced by Smith, Self and Cheeseman in 1987 [15]. These methods rely on a probabilistic approach. They rest on a strong theoretical basis and integrate many profitable concepts, which results in good-quality SLAM. The popular ones are the techniques using EKF-based SLAM or particle-filter-based SLAM. The EKF method is the most widely used because its algorithm is simple and has a relatively low computational cost. For the same reason, EKF-based SLAM was considered in our study.

1.2. State of the Art of the EKF-based SLAM

There are many publications devoted to SLAM; the presented solutions depend on the sensors utilized and on the environment recognition and robot localization techniques applied. A few examples follow. In paper [2] a vision-based SLAM method is discussed; it is based on an estimated global map in which a robot finds the path to a user-defined goal. Publication [1] presents an efficient method for building a 3D model of the environment on the basis of 2D LIDAR information for the purpose of SLAM. In many SLAM applications EKF is utilized; however, it is not free of drawbacks. One of its serious disadvantages is that its computational complexity grows without bound with the number of landmarks [9].

Recent works on EKF-based SLAM attempt to improve three aspects of this technique [6, 7, 10]. The first point is to reduce the computational cost, which is crucial for real-time operation. The complexity of the basic EKF-based SLAM is $O(n^2)$, where $n$ is the size of the map. Some techniques, like the Divide and Conquer SLAM [11], reduce it to $O(n)$. The second point is to optimize the data association for the recognition of the features of the environment. This point is critical because a single mismatch in association can cause a complete failure. The different methods available generally try to match several features at a time (batch validation) or to find patterns (especially visual signatures for visual SLAM). The third point is the consistency of robot localization, that is, the ability to perform large trajectory loops in unknown areas while keeping all positions consistently estimated. For instance, the Hierarchical SLAM [4] is very consistent.

Within this context, the purpose of our work is to deliver an EKF-based SLAM solution for a complex indoor environment. Some modifications, coming from the nature of the considered landmarks and the environment, were applied to the basic EKF-based SLAM algorithm. The aim was to obtain a satisfying technique in terms of computational cost, data association and consistency. We consider environment recognition using the laser scanner, and robot odometry based on data delivered by wheel encoders and a gyroscope. The robot positioning and mapping are performed concurrently in real time.

1.3. Problem Statement

SLAM is the problem faced by a robot in an unknown environment without absolute knowledge of its position. In such a situation, the robot has an estimate of its position provided by odometry, but the error of this measurement increases constantly during the movement. SLAM is a method combining the odometry's estimate and observations of the environment to keep the position error bounded.

Our purpose is to implement an accurate and robust SLAM method based on EKF [7, 12] on a Seekur Jr robot. This work deals at the same time with the theoretical and the practical aspects of the problem. First, it is necessary to master the localization and the recognition of the landmarks. Then models of the odometry and of the measurement have to be built. The overall method should naturally discard the disturbances from the environment and the noise of the measurements. The environment considered here is a laboratory classroom; it is a realistic and complex environment, with many obstacles such as chairs, tables, equipment or people.

Fig. 1. The Seekur Jr from Adept MobileRobots [8]

The Seekur Jr (Fig. 1) is one of the newest robots from the company Adept MobileRobots. It is dedicated to research and surveillance applications. Its overall kinematics can be approximated by the simple unicycle model. The Seekur Jr has an odometry system relying on encoders on the wheel axes and a gyroscope. The gyroscope's measurement is merged with the encoders' data to obtain an optimized estimate of the position at every instant. This system provides position and heading measurements. This is the source of the odometry readings for the SLAM. A laser scanner is used to observe the environment. This sensor has the advantage of being accurate and fast. All sensory readings, from the laser scanner and from the odometry, are obtained every 100 ms (the default period on this robot).

1.4. Main Steps

The EKF-based SLAM relies on a mathematical model embedding the kinematics of the robot and the capabilities of the odometry. This model is used to predict the future position from the current odometry measurements. It is often called the prediction model. Here it consists of the kinematic model, approximated by the unicycle model, and of an estimate of the odometry errors.

The environment is observed through some landmarks, other objects being ignored (feature-based SLAM). The choice of the sensors and of the type of landmarks used to observe and describe the environment is decisive for the overall SLAM process [13]. The chosen landmarks must be observable, recognizable, easy to observe and stationary. It appears that the best landmarks for the environment considered are the walls of the room and, more generally, every large vertical plane surface. They have all the qualities already mentioned. The walls appear as straight lines to the laser scanner.

The step in SLAM which corresponds to the identification of a landmark from the sensory readings is called landmark extraction. The sensory reading from the laser scanner is a set of points in polar coordinates. First, it is necessary to obtain their Cartesian coordinates. Then the segments eventually formed by these points are identified as landmarks.

The landmarks will be treated as mathematical lines, supposedly infinite. The representation adopted for the map considers the initial position of the robot as the origin. The lines are then represented by the projection of the origin on themselves. In this way, each landmark is represented by one single point in the map (Fig. 2). For a meaningful map, the end points of the landmarks must be approximated, but they are not used for localization purposes.

Fig. 2. Representation of landmarks

The next step in SLAM is the data association. It is the most critical step in the process. Considering an observation of a landmark from the landmark extraction, the data association has to identify whether a known landmark has been re-observed or whether a new landmark has been found. The data association tries to match the current observation with the known landmarks, and if an observation does not match any landmark it is considered as a new landmark. Taking this into account, the EKF either considers the re-observed landmark and updates the position and the map accordingly, or adds the new landmark to the map. The overall process is represented by the scheme shown in Fig. 3.

Fig. 3. Main steps in SLAM

2. Applied Solution

2.1. Identification of the Landmarks

The extraction of the landmarks and the data association are not complex, but they must be realized with great care because the result strongly affects the SLAM quality. In this study, it has been chosen to use few landmarks but very reliable ones. Therefore a harsh filtering process is used at all stages, from the laser scanner reading to the acceptance of a landmark in the EKF state.

Initially, the laser scanner reading provides a set of observed points in polar coordinates. In order to be processed, these points are converted to the Cartesian frame of the mobile robot. Then the line recognition algorithm identifies the line segments in the obtained set of points. The RANSAC algorithm [5], simple and efficient, is a reasonable choice. This step belongs to the landmark extraction mentioned in section 1.4. The first filtering is applied here: only the lines satisfying a set of criteria are considered further. In this work, it has been decided to keep lines longer than 40 cm, with at least 6 points, with a maximal distance of a point to the line of 3 cm and a maximum average distance of the points to the line of 2 cm.

These criteria have been chosen considering that the most reliable elements in an indoor environment are the walls and the furniture. The minimum length of 40 cm ensures that only these objects will be used to create landmarks; moving objects like human legs or chair legs are discarded. Only lines of at least 6 aligned points are used because the legs of chairs and tables can create illusions of lines with 3 or 4 aligned points. The last 2 constraints mean that the points must be very well aligned, so that the line is neat. This configuration is convenient for the measurement justification because it reduces the overall set of possible landmarks to a few very reliable ones. The length condition on the line segments and the requirement of a minimum of 6 points make the localization of the landmarks precise. The main drawback of such a choice is that some areas might not have observable landmarks because of a blocked field of view; for example, too many chairs or tables can hide the wall. Such situations occurred in our work, and the solution introduced is described in section 2.3.
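The filtering criteria and the point representation of a line can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the line is fitted by total least squares instead of RANSAC, the robot is assumed to stand at the map origin so that the robot frame and the map frame coincide, and all function and constant names are hypothetical. The thresholds are the ones quoted in the text.

```python
import math

MIN_LENGTH_M = 0.40       # lines must be longer than 40 cm
MIN_POINTS = 6            # at least 6 points
MAX_POINT_DIST_M = 0.03   # maximal distance of a point to the line
MAX_MEAN_DIST_M = 0.02    # maximal average distance of the points to the line


def polar_to_cartesian(scan):
    """Convert (range, bearing) pairs into (x, y) points in the robot frame."""
    return [(r * math.cos(a), r * math.sin(a)) for r, a in scan]


def fit_line(points):
    """Total least squares fit in normal form x*cos(phi) + y*sin(phi) = rho, rho >= 0."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    syy = sum((y - my) ** 2 for _, y in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    direction = 0.5 * math.atan2(2.0 * sxy, sxx - syy)  # angle of the line itself
    phi = direction + math.pi / 2.0                     # angle of its normal
    rho = mx * math.cos(phi) + my * math.sin(phi)
    if rho < 0.0:
        rho, phi = -rho, phi + math.pi
    return rho, phi


def is_reliable_line(points):
    """Apply the four acceptance criteria to a candidate set of aligned points."""
    if len(points) < MIN_POINTS:
        return False
    rho, phi = fit_line(points)
    dists = [abs(x * math.cos(phi) + y * math.sin(phi) - rho) for x, y in points]
    along = [-x * math.sin(phi) + y * math.cos(phi) for x, y in points]
    length = max(along) - min(along)
    return (length >= MIN_LENGTH_M
            and max(dists) <= MAX_POINT_DIST_M
            and sum(dists) / len(dists) <= MAX_MEAN_DIST_M)


def landmark_point(rho, phi):
    """Projection of the origin on the line: the single point representing it."""
    return rho * math.cos(phi), rho * math.sin(phi)


wall = [(0.1 * i, 1.0) for i in range(6)]   # six points along the line y = 1
```

For the hypothetical `wall` above, `landmark_point(*fit_line(wall))` is approximately (0, 1), the foot of the perpendicular from the origin, and `is_reliable_line(wall)` holds because the segment is 0.5 m long with six perfectly aligned points.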

In data association, all newly observed line candidates are analyzed one by one. The algorithm tries to match a line candidate with every known landmark. If a line does not match any landmark, it is considered as a potentially new landmark. The matching decision is issued by the so-called validation gate. The validation gate considers the distance between the newly observed landmark and an already known landmark. If this distance is smaller than a given threshold, the match is valid; otherwise it is not (Fig. 4) and a new line (new landmark) is created.

Fig. 4. Validation gate

The innovation $v$ is the difference between the observation realized and the expected position of the landmark (from the EKF); it can be computed for every line-landmark pair. In our case, the innovation is a 2-element vector whose components are the difference in the distance from the robot to the lines and the difference in angle between the lines. $S$, given by the EKF (explained in section 2.2), is the innovation covariance matrix; it is computed for every landmark. As known from the literature, the validation gate takes into account the value of $v^T S^{-1} v$. The boundary value equal to 9.0 defines the range that contains the measurement with a probability of 98.9% [17]. A line is considered as matching if it fulfills the inequality

$$v^T S^{-1} v < 9, \qquad (1)$$

where $S$ is the innovation covariance matrix and $v$ is the innovation. This validation gate has the advantage of adapting the criterion to the uncertainties of the measurement.

This validation gate considers the theoretical infinite lines. If an observed line matches a landmark, it is necessary to check that the segments are also compatible and not only aligned. The segments are compatible if the observed segment partly coincides with the segment of the landmark. If the observed segment satisfies the validation gate without having a part coinciding with the landmark, it is considered as a new landmark (which is probably aligned with the first one). This situation happens often indoors. In order to make the distinction, one must keep an estimate of the end points of the landmarks and update it every time the data association is performed.

Fig. 5. Common landmarks in a 16 m × 7 m room (map made by experiment); points: laser scanner's observations, lines: SLAM landmarks

Every time a line is detected as a potential new landmark, it is safe to keep it in an intermediate state which is used for the data association but is not yet included in the Kalman filter. Such a landmark is included in the EKF only if it is observed often enough in a short time period. In this work, relying on tests, we concluded that it must be observed at least 5 times (in the same place) over 15 iterations; otherwise it is deleted. The first practical aspect of this is that the robot must have a probability of at least 5/15 = 33% of observing the landmark when it is near: this checks that the landmark is not affected by noise and is not hidden by obstacles, which means that the observation is repeatable. The second aspect is that the displacement of the landmark during 5 time steps (5 × 100 ms = 0.5 s) must be below the matching criterion (8 cm and 1.25°). Therefore, with respect to a fixed frame, the linear speed of the landmark must be below 16 cm/s and its angular speed below 2.5°/s, so the landmark can be considered as fixed. These 2 points make sure that the SLAM discards irrelevant landmarks.
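The validation gate and the confirmation rule for candidate landmarks can be sketched as follows. The sketch assumes a 2-element innovation and a 2×2 innovation covariance, for which the quadratic form follows a chi-square distribution with 2 degrees of freedom, so that the probability associated with a gate value $g$ is $1 - e^{-g/2}$. The function names, the symbol `S` and the numeric values in the example are hypothetical.

```python
import math

GATE = 9.0             # boundary value of the validation gate
MIN_OBSERVATIONS = 5   # a candidate must be seen at least 5 times ...
WINDOW = 15            # ... over the last 15 iterations


def mahalanobis_sq(v, S):
    """Return v^T S^-1 v for a 2-element innovation v and a 2x2 covariance S."""
    (a, b), (c, d) = S
    det = a * d - b * c
    if det <= 0.0:
        raise ValueError("innovation covariance must be positive definite")
    inv = ((d / det, -b / det), (-c / det, a / det))
    v0, v1 = v
    return (v0 * (inv[0][0] * v0 + inv[0][1] * v1)
            + v1 * (inv[1][0] * v0 + inv[1][1] * v1))


def passes_gate(v, S, gate=GATE):
    """True if the observed line is accepted as a re-observation of the landmark."""
    return mahalanobis_sq(v, S) < gate


def gate_probability(gate=GATE):
    """Chi-square CDF with 2 degrees of freedom evaluated at the gate value."""
    return 1.0 - math.exp(-gate / 2.0)


def is_confirmed(history):
    """history: booleans, one per iteration, True when the candidate was observed."""
    return sum(history[-WINDOW:]) >= MIN_OBSERVATIONS


# Hypothetical covariance: 4 cm and 0.01 rad standard deviations, uncorrelated.
S = ((0.04 ** 2, 0.0), (0.0, 0.01 ** 2))
```

With this covariance an innovation of 8 cm in distance and zero in angle gives a quadratic form of about 4 and passes the gate, while 13 cm gives about 10.6 and is rejected; `gate_probability()` evaluates to roughly 0.989, in line with the 98.9% quoted above.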

VOLUME 8,

N∘ 2

2014

Fig. 6. Environment observation process

Thanks to what is explained above, only the most reliable elements (walls and furniture) become landmarks. In particular, the legs of tables and chairs and standing persons are discarded because they do not contain straight lines. Moving objects (opening doors, animals, other robots) are discarded as well. The method can therefore be executed in an office or a home environment without any special care or requirement. Fig. 5 illustrates which elements of a room are commonly extracted as landmarks. Fig. 6 displays the overall observation process.

2.2. Models for EKF

The EKF is the element of the method which fuses the odometry data with the observations resulting from the process described in section 2.1. It considers the landmarks as infinite lines, each line represented by a point (Fig. 2). The state of the system consists of the position of the robot $(x, y, \theta)$ and the positions of the landmarks $(x_i, y_i)$. At every iteration, the EKF computes the state and its covariance matrix. The covariance matrix holds the uncertainty of every state component and also the correlations between components. Namely, the ith diagonal term of the matrix is the square of the estimated standard deviation of the ith element of the state.

Fig. 7. State variables

Let $X$ be the state and $P$ its covariance matrix. Let $n$ be the number of landmarks included in the state. Whenever an odometry measurement is released, the EKF has to compute $X_R$ and $P_R$, the parts of $X$ and $P$ related to the robot's position (first 3 components): this is the prediction step. The prediction model is represented by the function $f$, which delivers the expected state at the next iteration according to the kinematic model. The explicit argument of $f$ is the current state; the current linear and angular speeds are also arguments of this function. $A$ is the Jacobian matrix of $f$. In our notation the derivative is denoted by $d$.

\[ X_{R,k+1} \approx f(X_{R,k}), \qquad A = \frac{d(f(X_{R,k}))}{d(X_{R,k})} \tag{2} \]

Let $Q$ be the covariance matrix of the odometry's errors. The prediction step is globally defined by the set of equations (3).

\[ X_{R,k+1} = X_{R,k} + \Delta X, \qquad P_{R,k+1} = A P_{R,k} A^T + Q \tag{3} \]

Then the correlations with the landmarks also have to be updated. Let $P_{R|i}$ be the 3×2 block of the correlation matrix between the robot and the ith landmark.

\[ P_{R|i,k+1} = A P_{R|i,k}, \qquad P_{i|R,k+1} = P_{R|i,k+1}^T \tag{4} \]

To obtain the best SLAM performance, these updates have to match the actual behavior of the robot and its odometry. The unicycle model leads to the definition of matrix $A$ (5).

\[ A = \frac{d(f(X_{R,k}))}{d(X_{R,k})} = \begin{bmatrix} 1 & 0 & -\Delta y \\ 0 & 1 & \Delta x \\ 0 & 0 & 1 \end{bmatrix} \tag{5} \]
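The prediction step of equations (3) to (5) can be sketched in a few lines of NumPy. This is only an illustration under assumptions that are not stated in the paper: the state is stored as a flat vector `[x, y, theta, x_1, y_1, ..., x_n, y_n]`, the odometry increment `(dx, dy, dtheta)` is already expressed in the reference frame, and the function name `predict` and its signature are hypothetical. The matrix `Q` is passed in ready-made.

```python
import numpy as np

def predict(X, P, dx, dy, dtheta, Q):
    """Prediction step for a state [x, y, theta, x_1, y_1, ..., x_n, y_n].

    Hypothetical helper: dx, dy, dtheta is the odometry increment and
    Q the 3x3 odometry covariance for this increment.
    """
    X = X.copy()
    P = P.copy()
    # Jacobian A of the unicycle prediction model, equation (5).
    A = np.array([[1.0, 0.0, -dy],
                  [0.0, 1.0, dx],
                  [0.0, 0.0, 1.0]])
    # Equation (3): robot part of the state and of the covariance.
    X[:3] += np.array([dx, dy, dtheta])
    P[:3, :3] = A @ P[:3, :3] @ A.T + Q
    # Equation (4): all robot/landmark blocks at once (3 x 2n),
    # then the transposed blocks to keep P symmetric.
    P[:3, 3:] = A @ P[:3, 3:]
    P[3:, :3] = P[:3, 3:].T
    return X, P
```

Only the first three rows and columns of `P` are touched, so the cost of a prediction grows linearly with the number of landmarks.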

The matrix $Q$ in equations (3) represents the odometry errors and the unmodeled phenomena. Often only 2 kinds of inaccuracies are considered. The first is the position $(x, y)$ error, which increases proportionally to the linear speed. Let $q_d$ be its standard deviation per unit of displacement. The second is the heading ($\theta$) error, which increases proportionally to the angular speed. Let $q_\theta$ be its standard deviation per unit of rotation. The experimental observations with the Seekur Jr led to the addition of a third error in our implementation: the heading error increasing proportionally to the linear speed. Let $q_{\theta|d}$ be its standard deviation per unit of displacement. Introducing this additional error allows a more realistic estimation of the robot performance. This uncommon kind of error seems to be more important here because the Seekur Jr is a skid-steering robot.

When the odometry delivers the displacement $(\Delta x, \Delta y)$ and the rotation $\Delta\theta$, equation (6) provides the definition of $Q$ with the values used in this work. These values result from the robot features, including the frequency of the odometry readings. The parameters $q$ were obtained experimentally. The global standard deviations (square roots of the first 3 components of the diagonal of $Q$) were obtained after several realizations of the same experiment (for example going straight forward for 20 m or rotating 10 times on the spot). Then the $q$ parameters $(q_d, q_\theta)$ were calculated from the global standard deviations. Matrix $Q$ (6) provides a good estimation of the uncertainties over short displacements but it is less accurate for larger displacements.

\[ q_d = 0.018, \qquad q_\theta = 0.05, \qquad q_{\theta|d} = 0.0045^\circ/\mathrm{mm}, \qquad \Delta t = \sqrt{\Delta x^2 + \Delta y^2} \]

\[ Q = \begin{bmatrix} (q_d \Delta x)^2 & q_d^2 \Delta x \Delta y & q_d q_\theta \Delta x \Delta\theta + q_{\theta|d}\, q_d \Delta x \Delta t \\ \cdots & (q_d \Delta y)^2 & q_d q_\theta \Delta y \Delta\theta + q_{\theta|d}\, q_d \Delta y \Delta t \\ \text{symmetric} & \cdots & (q_\theta \Delta\theta)^2 + (q_{\theta|d} \Delta t)^2 \end{bmatrix} \tag{6} \]

The EKF must also consider how to initialize and how to update the landmarks. This relies on the way the robot observes the landmarks and on the accuracy of the measurement. The landmarks are considered as unlimited lines. Let $\rho$ and $\alpha$ be the range and bearing from the robot to the projection of the robot on the landmark. In this section the observation of a line will always be expressed in such terms of range and bearing. The coordinates $(x_i, y_i)$ refer to the representative point of the landmark (Fig. 2). This point is convenient for dealing with the line, but it has no physical meaning and cannot be used directly. The points $O$, $P$ and $P_i$ stand for the origin, the robot and the landmark. The reference frame is still defined by the initial position of the robot. The couple $(\rho, \alpha)$ is expressed by (7).

\[ \begin{bmatrix} \rho \\ \alpha \end{bmatrix} = \begin{bmatrix} \dfrac{\left| x_i^2 + y_i^2 - x_i x - y_i y \right|}{\sqrt{x_i^2 + y_i^2}} \\[2ex] \operatorname{atan2}(y_i, x_i) - \theta \end{bmatrix} \tag{7} \]

Let us introduce $s$, the side indicator: $s$ is equal to 1 when $O$ and $P$ are on the same side of the line and −1 otherwise:

\[ s = \operatorname{sign}(x_i^2 + y_i^2 - x_i x - y_i y) \tag{8} \]

Let $\rho_i$ denote the distance $OP_i$. $H$, the Jacobian matrix of the measurement, is given by (9). When only the ith landmark is observed, only the columns 1, 2, 3, $2+2i$ and $3+2i$ are nonzero. $H$ is the key matrix in the measurement model: it holds the relation between the state variables and the measurements.

\[ H = \frac{d(\rho, \alpha)}{d(X)}, \qquad H^T = \begin{bmatrix} -s\,x_i/\rho_i & 0 \\ -s\,y_i/\rho_i & 0 \\ 0 & -1 \\ 0 & 0 \\ \vdots & \vdots \\ 0 & 0 \\ s\,\dfrac{x_i \rho_i^2 + y_i (x_i y - y_i x)}{\rho_i^3} & -\dfrac{y_i}{\rho_i^2} \\[2ex] s\,\dfrac{y_i \rho_i^2 + x_i (y_i x - x_i y)}{\rho_i^3} & \dfrac{x_i}{\rho_i^2} \\ 0 & 0 \\ \vdots & \vdots \end{bmatrix} \tag{9} \]

In (9) the two nonzero landmark rows of $H^T$ are lines $2+2i$ and $3+2i$.

The coordinates of $P_i$ are expressed by formula (10):

\[ C = \cos(\theta + \alpha), \qquad S = \sin(\theta + \alpha) \]

\[ x_i = x C^2 + y C S + s \rho C, \qquad y_i = x C S + y S^2 + s \rho S \tag{10} \]

From the derivation of (10), $J_R$ and $J_z$ are obtained:

\[ J_R = \frac{d(x_i, y_i)}{d(X_R)} = \begin{bmatrix} C^2 & CS & -y - 2xCS + 2yC^2 - s\rho S \\ CS & S^2 & -x + 2xC^2 + 2yCS + s\rho C \end{bmatrix} \]

\[ J_z = \frac{d(x_i, y_i)}{d(\rho, \alpha)} = \begin{bmatrix} sC & -y - 2xCS + 2yC^2 - s\rho S \\ sS & -x + 2xC^2 + 2yCS + s\rho C \end{bmatrix} \tag{11} \]

These matrices are used for the initialization step, when a new landmark is found. At the initialization step the state $X$ is expanded by adding the position where the new landmark has been observed. Then $P$ is expanded by adding 2 components, which are described by (12). This implementation uses the complete $P$ matrix (and not only its blocks). This has the advantage that an observation of one landmark also modifies the correlated landmarks. Due to that, this implementation does not need any explicit feedback loop. However, it results in a bigger computation load, especially with plenty of landmarks. Therefore it might require a sub-mapping strategy [14] to decrease the calculation load.

\[ P_{n|n} = J_R P_R J_R^T + J_z R J_z^T, \qquad P_{R|n} = P_{n|R}^T = P_R J_R^T, \qquad \forall i \le n-1,\; P_{n|i} = P_{i|n}^T = J_R P_{R|i} \tag{12} \]
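The observation model and the landmark initialization can be illustrated with a short NumPy sketch. It rests on an assumption drawn from equations (8) and (10): the representative point of a line is the projection of the origin $O$ on that line. The function names, the argument order and the choice of passing the side indicator `s` explicitly are hypothetical and are not taken from the authors' program.

```python
import numpy as np

def observe(pose, landmark):
    """Expected range, bearing and side indicator of a line landmark."""
    x, y, theta = pose
    xi, yi = landmark
    d = xi * xi + yi * yi - xi * x - yi * y
    s = 1.0 if d >= 0.0 else -1.0          # side indicator, equation (8)
    rho = abs(d) / np.hypot(xi, yi)        # distance from the robot to the line
    alpha = np.arctan2(yi, xi) - theta     # bearing of the line normal
    return rho, alpha, s

def landmark_point(pose, rho, alpha, s):
    """Representative point of the observed line, equation (10)."""
    x, y, theta = pose
    C, S = np.cos(theta + alpha), np.sin(theta + alpha)
    return np.array([x * C * C + y * C * S + s * rho * C,
                     x * C * S + y * S * S + s * rho * S])

def landmark_jacobians(pose, rho, alpha, s):
    """J_R (2x3) and J_z (2x2), obtained by differentiating (10)."""
    x, y, theta = pose
    C, S = np.cos(theta + alpha), np.sin(theta + alpha)
    # The derivatives with respect to theta and to alpha are identical.
    d_angle = np.array([-y - 2 * x * C * S + 2 * y * C * C - s * rho * S,
                        -x + 2 * x * C * C + 2 * y * C * S + s * rho * C])
    J_R = np.array([[C * C, C * S, d_angle[0]],
                    [C * S, S * S, d_angle[1]]])
    J_z = np.array([[s * C, d_angle[0]],
                    [s * S, d_angle[1]]])
    return J_R, J_z

def add_landmark(X, P, rho, alpha, s, R):
    """Append a new landmark to X and expand P as in (12)."""
    pose = X[:3]
    J_R, J_z = landmark_jacobians(pose, rho, alpha, s)
    m = X.size
    X_new = np.concatenate([X, landmark_point(pose, rho, alpha, s)])
    P_new = np.zeros((m + 2, m + 2))
    P_new[:m, :m] = P
    # Correlations of the new landmark with the robot and older landmarks.
    cross = J_R @ P[:3, :]
    P_new[m:, :m] = cross
    P_new[:m, m:] = cross.T
    P_new[m:, m:] = J_R @ P[:3, :3] @ J_R.T + J_z @ R @ J_z.T
    return X_new, P_new
```

Feeding the output of `observe` into `landmark_point` returns the original representative point, which is a convenient self-check of the sign conventions.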

The update step of the SLAM is performed when a landmark is re-observed. It uses the innovation $v$ (the difference between the observation and the expected landmark position) and the matrix $R$, which represents the measurement error in terms of range and bearing. $R$ is given by (13). It is important to consider the uncertainty due to the motion of the robot and to the inaccuracy of the laser scanner synchronization. So $(q_\rho, q_\alpha)$, the standard deviations of the measurement in range and bearing, must be estimated experimentally and, if possible, while the robot is in motion. Those quantities correspond to the accuracy of the line extraction and the repeatability of the laser scanner positioning.

\[ q_\rho = 80\ \mathrm{mm}, \qquad q_\alpha = 1.25^\circ, \qquad R = \begin{bmatrix} q_\rho^2 & 0 \\ 0 & q_\alpha^2 \end{bmatrix} \tag{13} \]

The full update step is described by (14). It updates the state and its covariance using $K$, the Kalman gain. Note that $S$ is the innovation covariance matrix already used in the validation gate (1). $S$ sums up the uncertainties of the robot's position and the landmark's position expressed in terms of range and bearing, and adds the uncertainty of the measurement.

\[ \begin{aligned} S &= H P H^T + R \\ K &= P H^T S^{-1} \\ X &= X + K v \\ P &= (I - K H) P \end{aligned} \tag{14} \]

2.3. Additional Improvements

The implementation with all the elements mentioned so far successfully mapped areas with many landmarks, like the area on the right side of the map in Fig. 5. However, in areas with fewer landmarks, such as the left side of the map shown in Fig. 5, the filter often diverged. One of the hypotheses of EKF based SLAM is that the landmarks are evenly distributed. This is not always true, especially when the robot operates in an environment that has not been specially arranged, like the one shown in Fig. 5. Two novel modifications extending the EKF method have been introduced in our work to manage such situations.

The unstable situation in EKF based SLAM occurs due to large variations of the uncertainty of the robot's position. The validation gate (inequality (1)) used in the data association is flexible: it accepts a larger innovation when the uncertainties are high. Therefore the data association naturally adapts itself to the variations of uncertainty, but it relies on the EKF estimations. When the robot's position is uncertain, the EKF is very sensitive and the estimated position might be significantly affected by the landmarks. The left side of the map in Fig. 5 contains only 2 landmarks. When the robot focuses on this side, the EKF relies strongly on those landmarks. Especially if they are observed for the first time, the positioning errors can turn them into a wrong reference for the whole procedure.

We propose a solution to this issue. The idea is to detect such a "dangerous" situation and to reduce the impact of the observations of such landmarks. A dangerous landmark should be initialized and correlated with the others during the first few observations, and then it has to be progressively ignored to prevent any damage to the EKF state. The dangerous landmarks are prevented from affecting the EKF by the addition of a gain $\gamma$ in the computation of the Kalman gain (15). $\gamma$ is different for every landmark. This has been inspired by [6], which discusses the problem of missed observations; that problem is theoretically equivalent to ours (some landmarks are observed too often and the others are absent).

\[ K = \gamma P H^T S^{-1} \tag{15} \]
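A minimal sketch of the update step (14) with the scaled gain (15) is given below. The function name and signature are hypothetical. The sketch also assumes that the covariance update $(I - KH)P$ is kept unchanged when $\gamma < 1$, which the text does not state explicitly, and it leaves the wrapping of the bearing component of $v$ to the caller.

```python
import numpy as np

def update(X, P, H, v, R, gamma=1.0):
    """EKF update with a per-landmark gain gamma in [0, 1].

    gamma = 1 gives the standard update (14); gamma = 0 ignores the
    observation completely.
    """
    S = H @ P @ H.T + R                        # innovation covariance
    K = gamma * (P @ H.T @ np.linalg.inv(S))   # scaled Kalman gain, (15)
    X_new = X + K @ v
    P_new = (np.eye(P.shape[0]) - K @ H) @ P
    return X_new, P_new, S
```

With `gamma=0.0` the state and covariance are returned unchanged, which is how a fully ignored landmark behaves in this sketch.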

It has been observed that the problem occurs when the uncertainty of an observed landmark is much bigger than the uncertainty of the robot position. If the robot moves around such landmarks, their uncertainties progressively decrease, but the estimated positions of the robot and of the other landmarks are negatively affected. In fact, the landmark uncertainties should not decrease, because the robot's position is still uncertain: the observations do not bring reliable information. To detect such a situation, a ratio $\varphi$ is introduced and computed at every observation. $C_{min}$ is a covariance matrix which is the sum of the uncertainties of the robot's position, the uncertainties of the measurement, and a minimal level of acceptable uncertainty of the landmark's position. This minimal level of acceptable uncertainty is defined as a fraction of $R$ (term $(\beta - 1)R$ in (16)); it expresses the uncertainty of a well-known landmark. Thanks to $H$, the Jacobian matrix of the measurement, these uncertainties are expressed with respect to the measurement. $C_{min}$ represents the minimum uncertainty in an observation, obtained when the landmark's position is very well known. $C_i$ is a covariance matrix which represents the uncertainties of the landmark's position, expressed with respect to the measurement. Finally, $\varphi$ is defined as the ratio of the trace of $C_i R^{-1}$ over the trace of $C_{min} R^{-1}$ (16). The multiplication by $R^{-1}$ allows the addition of the uncertainties in range and bearing, both expressed by the covariance matrices. $\varphi$ increases when the landmark's position is uncertain and when the robot's position is certain. In (16), $P|_R$ and $P|_i$ denote the parts of $P$ related to the robot and to the observed landmark.

\[ \beta = 2, \qquad C_{min} = H\,P|_R\,H^T + R + (\beta - 1)R = H\,P|_R\,H^T + \beta R, \qquad C_i = H\,P|_i\,H^T, \qquad \varphi = \frac{\operatorname{Tr}(C_i R^{-1})}{\operatorname{Tr}(C_{min} R^{-1})} \tag{16} \]
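The ratio $\varphi$ of (16) can be computed as in the following sketch. It assumes that the robot and landmark contributions are taken from the corresponding diagonal blocks of $P$ only, with the cross-correlations left out; this is one reading of the definition, not a statement of the authors' implementation. The function name, the 1-based landmark index `i` and the flat state layout are likewise assumptions.

```python
import numpy as np

def danger_ratio(P, H, R, i, beta=2.0):
    """Ratio phi of (16) for the i-th landmark (i starts at 1).

    The state layout [x, y, theta, x_1, y_1, ...] is assumed, so the
    i-th landmark occupies the 0-based indices 1+2i and 2+2i.
    """
    rob = slice(0, 3)
    lm = slice(1 + 2 * i, 3 + 2 * i)
    # Robot uncertainty seen through the measurement, plus beta * R.
    C_min = H[:, rob] @ P[rob, rob] @ H[:, rob].T + beta * R
    # Landmark uncertainty seen through the measurement.
    C_i = H[:, lm] @ P[lm, lm] @ H[:, lm].T
    R_inv = np.linalg.inv(R)
    return np.trace(C_i @ R_inv) / np.trace(C_min @ R_inv)
```

When both the robot and the landmark contributions equal $R$, the function returns 1/3.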

It has been considered that the minimum level of uncertainty reachable for a landmark is $R$ itself. Therefore, according to its definition, $\beta$ is equal to 2. Then it has been decided that the uncertainties of the landmark and of the robot must become as low as $R$. So $C_i \approx R$ and $C_{min} \approx (1 + \beta)R$, then $\varphi \approx 1/3$. This reasoning gave the threshold value $\varphi_{max} = 1/3$. If $\varphi > \varphi_{max}$, the landmark is considered dangerous. During experiments it was found that the criterion is rather conservative: it tends to consider many recent landmarks as dangerous. Therefore $\varphi_{max}$ must be slightly increased in order to select only the most difficult situations. With this modification our innovation gives good results.

Initially $\gamma$ is equal to 1 for every landmark. The ratio $\varphi$ is computed after every observation of a landmark, once the update step is completed. For every dangerous landmark, $\gamma$ is decreased by a certain amount; in our study $\gamma$ was decreased by 0.35. When $\gamma$ reaches 0 it is locked for 5 s, and every new dangerous observation of this landmark resets the time counter. When a landmark is not considered dangerous, its $\gamma$ is increased in each step by a small amount, for example 0.05. In this way the dangerous landmarks are progressively ignored and remain ignored until they disappear from sight. After some time, they can be used again. The addition of this element to the EKF based SLAM has the following results:
- New landmarks are not trusted immediately.
- The correlation between landmarks is more important during the exploration.
- The SLAM can stay stable longer without observations of previously known landmarks (better consistency).
- The complexity is not increased.
- The time needed to fully explore an area is slightly increased.

Due to the assumption about regularly spaced landmarks, the EKF based SLAM ignores another situation. The validation gate (Fig. 4) uses a criterion based on the distance between the landmarks to distinguish them. This distance is compared with the uncertainties of the measurement, of the landmarks' positions and of the robot's position. When the uncertainties of the measurement are larger than the distance between 2 landmarks, the landmarks can be confused. This can cause severe damage to the SLAM process. Several data association techniques, like the joint compatibility test [16], reduce the risk of confusion. However, in complex environments, the risk of a data association error cannot be fully eliminated.
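Returning to the gain $\gamma$ of the first modification, its schedule (decrease by 0.35 for a dangerous observation, lock at 0 for 5 s, increase by 0.05 otherwise) can be sketched as a small state holder. The class name, the method name and the use of a caller-supplied time stamp are hypothetical. The default threshold of 1/3 is the value derived in the text, which the authors report raising slightly without giving the final value.

```python
from dataclasses import dataclass

@dataclass
class LandmarkGain:
    """Per-landmark gain gamma with the lock described in the text."""
    gamma: float = 1.0
    locked_until: float = 0.0  # time in seconds before which gamma stays at 0

    def step(self, phi, now, phi_max=1.0 / 3.0, down=0.35, up=0.05, lock=5.0):
        if phi > phi_max:
            # Dangerous observation: lower the gain, never below 0.
            self.gamma = max(0.0, self.gamma - down)
            if self.gamma == 0.0:
                # Reaching 0, or a new dangerous observation while at 0,
                # restarts the lock period.
                self.locked_until = now + lock
        elif now >= self.locked_until:
            # Safe observation outside the lock: slowly restore trust.
            self.gamma = min(1.0, self.gamma + up)
        return self.gamma
```

The returned value would be passed as the gain of (15) at the next update of the same landmark.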
Instead of avoiding the confusion, the approach proposed in this paper tries to minimize the damaging effect of an association mistake. The method applied in this work eliminates the possible causes of confusion. This is done every 10 iterations (every 1 s). The validation gate is applied to all pairs of landmarks which are likely to be observed (currently near the robot). If 2 such landmarks match each other, there is a danger of confusion. The SLAM could not distinguish these landmarks, therefore it is unsafe to keep both of them. The newest landmark is deleted. The uncertainty of the remaining landmark is increased by the distance between the 2 former landmarks. The addition (17) is performed on $P_i$, the part of the matrix $P$ related to this landmark. $(\Delta_x, \Delta_y)$ is the distance between the representative points of the landmarks along x and y.

\[ P_{i,k+1} = P_{i,k} + \begin{bmatrix} \Delta_x^2 & \Delta_x \Delta_y \\ \Delta_x \Delta_y & \Delta_y^2 \end{bmatrix} \tag{17} \]

This additional element acts like a local adjustment of the map resolution. In special situations with few landmarks, it keeps the covariance bound at a level of uncertainty which is actually reachable. At the beginning of the mapping, several landmarks can be ignored and only the main elements are placed on the map. Then, when an area becomes better known, the uncertainty of the robot's position decreases, which allows the details of the environment to be mapped. Such an approach has a good stabilizing effect but it should not be used too often (it has been observed experimentally that it can cause a slight positioning drift). The results are:
- The possibly confusing situations are secured.
- The behavior with a high level of uncertainty is safe (when combined with the first additional module suggested).
- Data association mistakes that create a new landmark instead of re-observing a known landmark are solved.
- Exploration is realized progressively: first a general map is built, then the details are added.
- Using this module too often keeps all uncertainties at a high level, which can eventually cause a slight drift of the map.
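The landmark reduction of (17) can be illustrated as follows. The sketch assumes the flat state layout `[x, y, theta, x_1, y_1, ...]` and 1-based landmark indices; the function name is hypothetical, and deciding which pairs match (the validation gate) and which landmark is the newest is left to the caller.

```python
import numpy as np

def merge_landmarks(X, P, keep, drop):
    """Delete landmark `drop` and inflate the covariance of `keep`.

    keep and drop are 1-based landmark indices. Returns the reduced
    state and covariance; the inputs are not modified.
    """
    k = slice(1 + 2 * keep, 3 + 2 * keep)
    d = slice(1 + 2 * drop, 3 + 2 * drop)
    # Distance between the two representative points along x and y.
    delta = X[d] - X[k]
    P = P.copy()
    # Equation (17): add [[dx^2, dx*dy], [dx*dy, dy^2]] to the kept block.
    P[k, k] += np.outer(delta, delta)
    # Remove the rows and columns of the deleted landmark.
    mask = np.ones(X.size, dtype=bool)
    mask[d] = False
    return X[mask], P[np.ix_(mask, mask)]
```

After the call, landmarks stored behind the deleted one move up by one index, so any per-landmark bookkeeping has to be shifted accordingly.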

3. Results

3.1. Performance

To demonstrate their efficiency, the described techniques have been implemented on a Seekur Jr robot. A test has been run in a classroom. The room was not tidy: there were many chairs and tables, and some persons were walking around. The robot was manually driven in loops at medium speed with the SLAM active. Each experiment lasted 6 minutes.

Fig. 8 shows a photo of the room and its map. This map is a superposition of the reference map and the detected landmarks. The reference map shows the room plan with the furniture elements. The landmarks found by the SLAM are shown as obtained at the end of the experiment. The point (0,0) is the initial position of the robot. In ideal conditions, the reference map and the detected landmarks coincide. The superposition of the reference and detected maps shows that the landmarks are all relevant and that each of them represents a significant element of the room. At least a section of each wall is represented by a landmark. Several elements or sections of walls were ignored, either because they were too short or because they were hidden behind obstacles. The irrelevant elements have been filtered out and all of the reliable elements were used for the SLAM. The landmark extraction, data association and filtering were efficient. The reference and the landmarks coincide well. The average mapping accuracy was 10 cm.

Fig. 8. Experiment on Seekur Jr: a – room where the experiment took place. b – position of the landmarks after exploration lasting 6 min. The actual position of the walls and furniture is shown as the reference.

Fig. 9. Covariance bounds during the experiment

Fig. 9 shows the $2\sigma$ covariance bound of the estimated robot's position, heading and the landmarks' position (averaged) during the experiment. It indicates how certain the EKF is of these values. These graphs show that the covariance bounds do not diverge and that the average bound of the landmarks is slowly decreasing. It means that the SLAM is stable and the EKF is more and more confident in the landmarks. Locally, the covariance bounds of the position and the heading have some peak values. This happens when the robot explores areas that are not well known. In this case, the SLAM is less certain about the robot's position. An effect of the addition of the gain $\gamma$ (described in 2.3) is that the local peaks are higher and afterwards the bound decreases more quickly to its previous level. The consistency is increased but the exploration is slower. The graphs in Fig. 9 show that the final values of the $2\sigma$ bounds are 12 cm for the position estimate, 2.2° for the heading estimate and 21 cm for the landmarks' position estimates. Measurements showed that the robot localization error is about ±10 cm in position and ±1° in heading. The estimated position accuracy is not far from what has been measured. This indicates that the kinematic model of the robot (matrices (5) and (6)) is good and properly embeds the important factors.

During our experiments, the maximal computation time was 20 ms to perform landmark extraction, data association, prediction and update. This was with 25 landmarks in the state of the EKF and 3 landmarks observed at once. Therefore it can be concluded that the computation time of the program is rather short and the algorithm is efficient.

3.2. SLAM Testing Using the Simulation

Besides the experiments on the Seekur Jr, a simulation has been used to test the SLAM method with the modifications we introduced. The simulation was found to be very useful because it allowed us to test our algorithm in an ideal setting, with all elements fully controlled. This gave the opportunity to test our algorithm more deeply. The simulator MobileSim provided by Adept MobileRobots has been used for the simulations.

Fig. 10. Simulated room and trajectory of the robot

Fig. 10 displays the virtual room considered in the simulation. It also shows the trajectory followed by the virtual robot. Point 1 and point 2 on the trajectory mark 2 situations that will be investigated later. The simulation lasted 2 minutes. The simulation used a disturbance model matching the features observed in real conditions with the Seekur Jr concerning the laser measurements and the odometry results. The simulated experiment considered the same situation as the actual experiment; the same SLAM program was used with the same parameters. In the simulated room there was 1 obstacle (in the middle) blocking the sight of the laser scanner.

Fig. 11. Error and covariance bound during the simulation

The graphs in Fig. 11 display the actual localization error and the $2\sigma$ covariance bounds of the estimated robot position. The actual error is obtained from the simulator. Even if this result is less meaningful than the experiment, it is worth noticing that the error remains below the $2\sigma$ bound (except for one very short time period for the $y$ coordinate). So the EKF provides a good estimation. The robot made the loop around the central obstacle twice. When the robot was behind this obstacle it lost sight of its first landmarks, which caused a progressive increase of the position's covariance bound. When the robot passed the obstacle for the second time, the covariance bound increase was lower because the SLAM already knew these landmarks (the gain $\gamma$ described in section 2.3 was equal to 1).

Fig. 12. Landmarks when the robot is at point 1

Fig. 12 shows the landmarks identified by the SLAM algorithm when the virtual robot reaches point 1 (see Fig. 10). For reference, the actual position of the walls is also shown. It can be noticed that the largest part of the identified landmarks coincides well with the real ones. Only the wall in the upper part of the drawing does not fully coincide with the landmarks. This is because one section of this wall is located 50 cm ahead of the rest of the wall. Because of that, the wall should be represented by three landmarks, while it is represented by only one landmark on the map in Fig. 12. The selection of only one landmark was controlled by our algorithm. Analyzing together the graphs in Fig. 11 and the robot trajectory (Fig. 10), we learnt that before reaching point 1 the inaccuracy of the robot's position estimate increases. Therefore, according to the technique explained in section 2.3, the SLAM was prevented from creating three landmarks for the upper wall. Creating three landmarks brings the risk of confusion between them. Instead, the upper wall is represented by one landmark only, as a kind of compromise. This situation happened only once in this simulated environment but it is a common situation which happened much more often in real experiments. Despite the small inaccuracies possibly caused by the landmark reduction, the overall performance of the algorithm can be considered good, with a decreased possibility of landmark confusion.

Fig. 13. Landmarks when the robot is at point 2

Fig. 13 displays the map of the landmarks when the robot reaches point 2. There, the upper wall is represented by three distinct landmarks. The landmarks coincide well with the reference but the end points of the segments are misplaced. The distinction of the three landmarks is possible at this point because, since passing point 1, the robot has been in a known environment. So the estimated positioning error is smaller and, therefore, the mapping can use a finer resolution (technique described in section 2.3). The first landmark is kept and the other two landmarks are added to match the newly observed sections of the wall. However, as explained in section 2.1, it is not considered that the segment's size can be reduced. So the end points are not accurately placed. This is not a problem for robot localization but creates a limit for mapping purposes.


4. Conclusions

This paper presented a Kalman Filter based SLAM method. The method has been adapted to perform better in complex environments and it was tested on a skid-steering robot. The method succeeded in performing the SLAM in a complex environment, without being obstructed by common obstacles like chairs, tables and people. The method was implemented on the Seekur Jr robot. The tested mapping and localization performance was found to be accurate enough. The algorithm is fast to execute. With the improvements, the algorithm is also consistent enough to achieve a satisfactory SLAM while the robot is moving along short loops, like those tested in the experiment and in the simulation. The method is consistent, fast and accurate while it is also rather simple. However, for general exploration the method is slightly slow and not all landmarks can be directly used for mapping purposes (due to the landmark reduction, the end points of the segments are not always accurate, as explained above).

The purpose of this work was to investigate the SLAM only in single laboratory rooms, not outdoors. In order to extend this to larger areas, our method would need to be combined with more complex techniques like the Hierarchical SLAM [4]. This will improve the consistency of the landmark positioning while distributing the computational cost. The elaborated control program was sufficient for the presented experiments; however, for practical applications a failure recovery technique would be necessary to make sure that the SLAM remains stable, with consistent information, for long periods of time and over long distances.

AUTHORS

Thomas Genevois* – Faculty of Power and Aerospace Engineering, Warsaw University of Technology, 00-665, Warsaw, Poland, e-mail: thomasgenevois@yahoo.fr.

Teresa Zielińska – Faculty of Power and Aerospace Engineering, Warsaw University of Technology, 00-665, Warsaw, Poland.

* Corresponding author

REFERENCES

[1] Berger C., "Toward rich geometric map for SLAM: Online Detection of Planes in 2D LIDAR", JAMRIS, vol. 7, 2013, pp. 36–41.
[2] Berrabah S. A. and Colon E., "Vision-based Mobile Robot Navigation", JAMRIS, vol. 2, 2008, pp. 2–7.
[3] Durrant-Whyte H. and Leonard J., "Mobile robot localization by tracking geometric beacons". In: Intelligent Robots and Systems '91. 'Intelligence for Mechanical Systems, Proceedings IROS '91. IEEE/RSJ International Workshop, November 1991.
[4] Estrada C., Tardos J.D., and Neira J., "Hierarchical SLAM: Real-Time Accurate Mapping of Large Environments", IEEE Transactions on Robotics, vol. 21, no. 4, August 2005, DOI: 10.1109/TRO.2005.844673.
[5] Fischler M. and Bolles R., "Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography", Communications of the ACM, vol. 24, no. 6, 1981, pp. 381–395, DOI: 10.1145/358669.358692.
[6] Hamzah A. and Namerikawa T., "Covariance Bounds Analysis during Intermittent Measurement for EKF-based SLAM", International Journal of Integrated Engineering, December 2012, pp. 19–25.
[7] Huang S. and Dissanayake G., "Convergence Analysis for Extended Kalman Filter based SLAM", IEEE Transactions on Robotics, 2007.
[8] Adept MobileRobots Inc., "Seekur Jr datasheet", December 2011. www.mobilerobots.com.
[9] Jesus F. and Ventura R., "Simultaneous Localization and Mapping for Tracked Wheel Robots Combining Monocular and Stereo Vision", JAMRIS, vol. 7, no. 1, 2013, pp. 21–27.
[10] Martinez-Cantin R. and Castellanos J.A., "Bounding Uncertainty in EKF-SLAM: The Robocentric Local Approach", Proceedings 2006 IEEE International Conference on Robotics and Automation (ICRA 2006), 2006, DOI: 10.1109/ROBOT.2006.1641749.
[11] Paz L.M., Tardós J.D., and Neira J., "Divide and Conquer: EKF SLAM in O(n)", IEEE Transactions on Robotics, October 2008, DOI: 10.1109/TRO.2008.2004639.
[12] Russell S. and Norvig P., Artificial Intelligence: A Modern Approach, Chapter 15.4 Kalman filters. Upper Saddle River: Prentice Hall, 3rd edition, 2009.
[13] Siegwart R., Nourbakhsh I., and Scaramuzza D., Introduction to Autonomous Mobile Robots, The MIT Press, 2004.
[14] Sing Lee C. and Salvi J., "A Review of Submapping SLAM techniques". Technology University of Girona, 2010.
[15] Smith R., Self M., and Cheeseman P., "A Stochastic Map For Uncertain Spatial Relationships". In: Proceedings of the 4th international symposium on Robotics Research, The MIT Press, Cambridge, 1987, pp. 467–474.
[16] Tardos J.D. and Neira J., "Data Association in Stochastic Mapping Using the Joint Compatibility Test", IEEE Transactions on Robotics, vol. 17, no. 6, December 2001, DOI: 10.1109/70.976019.
[17] Zunino G., Simultaneous Localization and Mapping for Navigation in Realistic Environments. PhD thesis, Kungl Tekniska Högskolan, 2002.

