Journal of Automation, Mobile Robotics & Intelligent Systems (JAMRIS)
www.jamris.org
VOLUME 12, N° 3, 2018
pISSN 1897-8649 (PRINT) / eISSN 2080-2145 (ONLINE)
Publisher: Industrial Research Institute for Automation and Measurements PIAP
Indexed in SCOPUS


Editor-in-Chief:
Janusz Kacprzyk (Polish Academy of Sciences, PIAP, Poland)

Advisory Board:
Dimitar Filev (Research & Advanced Engineering, Ford Motor Company, USA)
Kaoru Hirota (Japan Society for the Promotion of Science, Beijing Office)
Witold Pedrycz (ECERF, University of Alberta, Canada)

Co-Editors:
Roman Szewczyk (PIAP, Warsaw University of Technology)
Oscar Castillo (Tijuana Institute of Technology, Mexico)
Marek Zaremba (University of Quebec, Canada)

Executive Editor:
Anna Ładan, aladan@piap.pl

Associate Editor:
Piotr Skrzypczynski (Poznan University of Technology, Poland)

Statistical Editor:
Anna Ładan

Typesetting:
PanDawer, www.pandawer.pl

Webmaster:
Piotr Ryszawa, PIAP

Editorial Office:
Industrial Research Institute for Automation and Measurements PIAP, Al. Jerozolimskie 202, 02-486 Warsaw, POLAND, Tel. +48-22-8740109, office@jamris.org

Copyright and reprint permissions: Executive Editor. The reference version of the journal is the e-version. Printed in 300 copies.

Editorial Board: Chairman - Janusz Kacprzyk (Polish Academy of Sciences, PIAP, Poland) Plamen Angelov (Lancaster University, UK) Adam Borkowski (Polish Academy of Sciences, Poland) Wolfgang Borutzky (Fachhochschule Bonn-Rhein-Sieg, Germany) Bice Cavallo (University of Naples Federico II, Napoli, Italy) Chin Chen Chang (Feng Chia University, Taiwan) Jorge Manuel Miranda Dias (University of Coimbra, Portugal) Andries Engelbrecht (University of Pretoria, Republic of South Africa) Pablo Estévez (University of Chile) Bogdan Gabrys (Bournemouth University, UK) Fernando Gomide (University of Campinas, São Paulo, Brazil) Aboul Ella Hassanien (Cairo University, Egypt) Joachim Hertzberg (Osnabrück University, Germany) Evangelos V. Hristoforou (National Technical University of Athens, Greece) Ryszard Jachowicz (Warsaw University of Technology, Poland) Tadeusz Kaczorek (Bialystok University of Technology, Poland) Nikola Kasabov (Auckland University of Technology, New Zealand) Marian P. Kazmierkowski (Warsaw University of Technology, Poland) Laszlo T. Kóczy (Szechenyi Istvan University, Gyor and Budapest University of Technology and Economics, Hungary) Józef Korbicz (University of Zielona Góra, Poland) Krzysztof Kozłowski (Poznan University of Technology, Poland) Eckart Kramer (Fachhochschule Eberswalde, Germany) Rudolf Kruse (Otto-von-Guericke-Universität, Magdeburg, Germany) Ching-Teng Lin (National Chiao-Tung University, Taiwan) Piotr Kulczycki (AGH University of Science and Technology, Cracow, Poland) Andrew Kusiak (University of Iowa, USA)

Mark Last (Ben-Gurion University, Israel) Anthony Maciejewski (Colorado State University, USA) Krzysztof Malinowski (Warsaw University of Technology, Poland) Andrzej Masłowski (Warsaw University of Technology, Poland) Patricia Melin (Tijuana Institute of Technology, Mexico) Fazel Naghdy (University of Wollongong, Australia) Zbigniew Nahorski (Polish Academy of Sciences, Poland) Nadia Nedjah (State University of Rio de Janeiro, Brazil) Dmitry A. Novikov (Institute of Control Sciences, Russian Academy of Sciences, Moscow, Russia) Duc Truong Pham (Birmingham University, UK) Lech Polkowski (University of Warmia and Mazury, Olsztyn, Poland) Alain Pruski (University of Metz, France) Rita Ribeiro (UNINOVA, Instituto de Desenvolvimento de Novas Tecnologias, Caparica, Portugal) Imre Rudas (Óbuda University, Hungary) Leszek Rutkowski (Czestochowa University of Technology, Poland) Alessandro Saffiotti (Örebro University, Sweden) Klaus Schilling (Julius-Maximilians-University Wuerzburg, Germany) Vassil Sgurev (Bulgarian Academy of Sciences, Department of Intelligent Systems, Bulgaria) Helena Szczerbicka (Leibniz Universität, Hannover, Germany) Ryszard Tadeusiewicz (AGH University of Science and Technology in Cracow, Poland) Stanisław Tarasiewicz (University of Laval, Canada) Piotr Tatjewski (Warsaw University of Technology, Poland) Rene Wamkeue (University of Quebec, Canada) Janusz Zalewski (Florida Gulf Coast University, USA) Teresa Zielinska (Warsaw University of Technology, Poland)

Publisher: Industrial Research Institute for Automation and Measurements PIAP

If in doubt about the proper edition of contributions, please contact the Executive Editor. Articles are reviewed, excluding advertisements and descriptions of products. All rights reserved ©



JOURNAL of AUTOMATION, MOBILE ROBOTICS & INTELLIGENT SYSTEMS, VOLUME 12, N° 3, 2018
DOI: 10.14313/JAMRIS_3-2018

CONTENTS

3   Fuzzy Logic Simulation as a Teaching-Learning Media for Artificial Intelligence Class
    Tresna Dewi, Pola Risma, Yurni Oktarina
    DOI: 10.14313/JAMRIS_3-2018/13

10  Chemical Reaction Algorithm for Type-2 Fuzzy Control Optimization in Mobile Robots
    David de la O, Oscar Castillo, José Soria
    DOI: 10.14313/JAMRIS_3-2018/14

20  Deep Reinforcement Learning Overview of the State of the Art
    Youssef Fenjiro, Houda Benbrahim
    DOI: 10.14313/JAMRIS_3-2018/15

40  Mobile Ferrograph System for Ultrahigh Permeability Alloys
    Tomasz Charubin, Michał Nowicki, Andriy Marusenkov, Roman Szewczyk, Anton Nosenko, Vasyl Kyrylchuk
    DOI: 10.14313/JAMRIS_3-2018/16

43  Comparative Study of PI and Fuzzy Logic Based Speed Controllers of an EV with Four In-Wheel Induction Motors Drive
    Abdelkader Ghezouani, Brahim Gasbaoui, Nouria Nair, Othmane Abdelkhalek, Jemal Ghouili
    DOI: 10.14313/JAMRIS_3-2018/17

55  Sliding Mode Control for Longitudinal Aircraft Dynamics
    Olivera Iskrenovic-Momcilovic
    DOI: 10.14313/JAMRIS_3-2018/18

61  Comparative Study of ROS on Embedded System for a Mobile Robot
    Min Su Kim, Raimarius Delgado, Byoung Wook Choi
    DOI: 10.14313/JAMRIS_3-2018/19

68  A Wavelet Based Watermarking Approach in Concatenated Square Block Image for High Security
    B. Sridhar
    DOI: 10.14313/JAMRIS_3-2018/20


Fuzzy Logic Simulation as a Teaching-Learning Media for Artificial Intelligence Class

Submitted: 2nd March 2018; accepted: 6th November 2018

Tresna Dewi, Pola Risma, Yurni Oktarina

DOI: 10.14313/JAMRIS_3-2018/13

Abstract: Polytechnic, as a vocational education institution, has to keep updating its curriculum with innovation and the latest technology applications in industry and domestic use. The teaching-learning process in the classroom also has to be based on technology application. The teaching-learning process will be more effective and interesting by involving the students to be more pro-active and creative. To engage the students more, the teacher needs a teaching-learning media, and one of these media is simulation software. The visualization of the simulation attracts students' interest and enhances students' creativity. This paper proposes the application of an open source and low-cost software simulation as a teaching-learning media to create an interactive and exciting robotics class. This study will show the design and application of a fuzzy logic controller in a mobile firefighter robot, and simulate the design in SCILAB, an open source software, and in MobotSim, a low-cost software. This alternative of open source software can be as good as the high-end ones. This paper shows that the application of a fuzzy logic controller can be fun and varied for students to enjoy the class. The contribution of this research is to show and encourage teachers and students to learn robotics and artificial intelligence in an interactive classroom using free software, and also to encourage them to search for more alternative open source software.

Keywords: fuzzy logic controller, mobile robot, open-source software, teaching-learning media

1. Introduction

The polytechnic curriculum has to reflect the latest innovation in the modern technology industry and social activity. Vocational education, unlike conventional education, is required to develop a curriculum based on applied science, indicated by more laboratory and workshop time than classroom time. Teaching has to focus on forming the knowledge, abilities, and competencies enabling the graduates to be successfully integrated into modern socio-technical systems. Therefore, the teaching-learning process in a classroom has to be more based on technology application [1], [2], [3], [4]. The teaching-learning process will be more effective and exciting by involving the students to be more pro-active and creative. The interactive class should be a two-way discussion engaging the students more, facilitated by a teaching-learning media such as simulation software. It can visualize the content of a textbook or lecture notes better than the teacher's explanation alone. This visualization of the simulation attracts students' interest, and students can apply what they have learned from a textbook [5], [6], [7], [8]. There is some reliable software for simulation purposes, well designed and user-friendly, with a certain price [9], [10]. However, not all universities or polytechnics in a developing or third world country can provide high-price simulation software such as MATLAB for teachers and students. Teachers have to be more pro-active in searching for low-price or even open source software whose capability is close to the high-end software, as presented in [11]. Robotics-related subjects have to be included in the Electrical Engineering Department to provide students with basic and the latest applications in industry. Robotics learning is also known as educational robotics [12], [13], [14], [15]. The design and implementation are possible with simulation for subjects taught in the classroom. Assaf et al. 2012 [16] and López-Rodríguez et al. 2016 [17] presented robot kits for educational robotics; however, with simulation the students can focus more on learning and designing without having to deal with the complexity of the real system [18]. One of the robotics-related subjects is artificial intelligence (AI), and the basics of this subject are best learned by simulation. The difficulty and complexity of the teaching materials can grow each week, and students' involvement is crucial to ensure the success of the teaching-learning process. One of the most studied subjects in artificial intelligence is the Fuzzy Logic Controller (FLC) [19]. Due to the broad application of FLC, it has to be included in the AI lesson plan [20], [21]. FLC has been widely applied in planning and control of robots since it can approximate any nonlinear function within any level of accuracy. FLC is very useful for modeling a complex system that is not easy to model with exact mathematical equations, such as a mobile robot that suffers from non-holonomic constraints [22]–[31]. This paper proposes the application of low-cost and open source software simulation as a teaching-learning media to create an interactive and interesting educational robotics class. This paper shows that low-cost and open source software can be a substitute for high-end software such as MATLAB.



The effectiveness of the proposed method is demonstrated by designing an FLC for a mobile robot and simulating it with two software packages [32], [33]. The rules and robot scenario are kept simple to give room for students to develop and be creative with the controller design. The contribution of this research is to show and encourage teachers and students to learn robotics and artificial intelligence in an interactive classroom using free/low-cost software, and also to encourage them to search for more alternative open source software.

2. Research Method

2.1. Mobile Robot Modeling

Kinematic modeling is to design the robot motion in the robot coordinate frame relative to the world coordinate frame [30], [31]. Fig. 1 shows the most widely applied two-wheel differential driven mobile robot with the pose (position and orientation) given as

$$q = \begin{bmatrix} x \\ y \\ \varphi \end{bmatrix} \qquad (1)$$

where x and y are the robot's position in the x and y-axis, φ is the orientation of the robot, and X_W and Y_W are the world coordinate frame. The pose given in (1) gives the translational and rotational velocities as follows

$$\dot{q} = \begin{bmatrix} \dot{x} \\ \dot{y} \\ \dot{\varphi} \end{bmatrix} \qquad (2)$$

where ẋ and ẏ are the translational velocities in the x and y-axis respectively, ω = φ̇ is the rotational velocity, L is the half width of the robot, r is the wheels' radius, and θ̇_R and θ̇_L are the right and left tire velocities. In order to get the value of tire orientation and velocity, the inverse kinematics of the robot is derived as follows

$$\begin{bmatrix} \dot{\theta}_R \\ \dot{\theta}_L \end{bmatrix} = f\left(\dot{x}, \dot{y}, \dot{\varphi}\right) \qquad (3)$$

Fig. 1. Two-wheel differential driven mobile robot in its coordinate frame relative to the world coordinate frame

$$\dot{\theta}_R = \frac{1}{2\pi r}\, v_R \quad \text{and} \quad \dot{\theta}_L = \frac{1}{2\pi r}\, v_L \qquad (4)$$

where v_R and v_L are the translational velocities of the robot's tires. The relation between the robot's translational v and rotational ω velocities and both tire velocities is

$$v = r\,\frac{\dot{\theta}_R + \dot{\theta}_L}{2} \quad \text{and} \quad \omega = \frac{r\left(\dot{\theta}_R - \dot{\theta}_L\right)}{2L} \qquad (5)$$

The pose in (1) and the velocities in (4) give the non-holonomic constraint of this type of mobile robot as

$$v = \dot{x}\cos\theta + \dot{y}\sin\theta \qquad (6)$$

The non-holonomic constraint means that the robot can only move in curvature motion and not in lateral sideward motion; therefore, in lateral motion the velocity of the robot is

$$0 = \dot{y}\cos\theta - \dot{x}\sin\theta \qquad (7)$$

Finally, the modeling of the robot shown in Fig. 1 is given by

$$\begin{bmatrix} \dot{x} \\ \dot{y} \\ \dot{\varphi} \end{bmatrix} = \begin{bmatrix} \cos\theta & 0 \\ \sin\theta & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} v \\ \omega \end{bmatrix} \qquad (8)$$
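For readers who want to try these relations outside SCILAB or MobotSim, the short Python sketch below is an illustrative rendering of equations (4), (5) and (8); the wheel radius and half-width values are assumptions chosen only for the example, not parameters taken from the paper.

```python
import math

# Assumed geometry for the example (not taken from the paper)
r = 0.03   # wheel radius [m]
L = 0.08   # half of the robot width [m]

def wheel_rates(v, omega):
    """Eq. (4)-(5): tire rotation rates (rev/s) for commanded (v, omega)."""
    v_r = v + omega * L            # right tire translational velocity
    v_l = v - omega * L            # left tire translational velocity
    return v_r / (2 * math.pi * r), v_l / (2 * math.pi * r)

def step_pose(x, y, phi, v, omega, dt):
    """Eq. (8): integrate the pose (x, y, phi) over one small time step dt."""
    return (x + v * math.cos(phi) * dt,
            y + v * math.sin(phi) * dt,
            phi + omega * dt)

# Drive forward at 0.2 m/s while rotating at 0.5 rad/s for one second
x, y, phi = 0.0, 0.0, 0.0
for _ in range(100):
    x, y, phi = step_pose(x, y, phi, v=0.2, omega=0.5, dt=0.01)
print(round(x, 3), round(y, 3), round(phi, 3), wheel_rates(0.2, 0.5))
```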

The derivation of v and ω is necessary to show the relation between the modeling and the rule base of the FLC design in mobile robots, and in this study v and ω are defined as the control inputs. Fig. 1 also shows the proximity sensor arrangement indicated by FS (the proximity sensor attached to the front side of the robot), LS (the proximity sensor attached to the left side of the robot), and RS (the proximity sensor attached to the right side of the robot).

2.2. Fuzzy Logic Controller

Considerable research on navigating a mobile robot in an uncertain environment has been conducted. One of the developed controllers is soft computing, such as the Fuzzy Logic Controller [21]–[29]. The improvement in computational methods enables designing a controller based on the designed robot's behavior without going through a specific complex model of the robot and the world. The mobile robot is designed to use perception derived from natural language the way a human does. The fuzzy logic system represents a linguistic modeling that permits the robot designer to intuitively define abstract behavior. This approach works well in sensor-based navigation control. The fuzzy logic controller design is given in Fig. 2. Fuzzy inputs from sensors are fed to the fuzzy controller represented by the membership functions. Once the inputs are fuzzified, the rules are applied to determine a response to the inputs. The main objective of this paper is to show the application of FLC simulation as a teaching-learning media; therefore, application variations are required.



Rules are set based on the inputs and fed to the inference engine. Rules are set based on the behavior design of the robot and inputs from the applied sensors. The FLC gives room for students to modify the rules based on the kind of robot they want to realize. The simulation can provide an interesting learning environment. The result of the rule evaluation is translated into a crisp value in the defuzzification stage. The FLC in this paper consists of fuzzification, fuzzy rule base, fuzzy inference engine, and defuzzification. The fuzzification process measures the values of the input variables (in this case data from sensors), creates a scale mapping to transfer the ranges of the input variables into the corresponding universes of discourse, and converts those input data into suitable linguistic values given as fuzzy sets. The rule base comprises the database and the linguistic control rules, where the database provides the necessary definitions used to define linguistic control rules and fuzzy data manipulation in an FLC, characterizes the control goal, and sets the linguistic control rules. The defuzzification performs a scale mapping that converts the range of values of the output variables into the corresponding universe and yields a non-fuzzy control action from an inferred fuzzy control action. The inference engine creates a fuzzy output by finding the firing level of each rule, the output of each rule, and aggregating the individual rule outputs to obtain the overall system output. Fig. 2 shows that the FLC ensures that the robot tracks the desired position given by the reference inputs. The steps to get the desired position are:
1) Obtain the position of the robot and the obstacles/target given by the sensors.
2) Fuzzify the results of these measurements.
3) Set the rules based on step 2.
4) Set a priority value for each position.
5) Repeat the algorithm for all the positions until reaching the target.
In this paper, the FLC rules and the relationship between inputs and outputs are shown in SCILAB (Fig. 3a) [32] and the complete simulation with the environment is shown in MobotSim (Fig. 3b) [33].
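As a classroom-sized illustration of this pipeline (fuzzification, rule firing, defuzzification), the Python toy below implements one Mamdani-style control cycle for the front and right proximity sensors. It is not the authors' SCILAB or MobotSim code; the membership-function breakpoints, the three rules and the crisp output actions are assumptions made only for the example, and a weighted-average (height) defuzzification is used as a common simplification of the centroid method.

```python
def tri(x, a, b, c):
    """Triangular membership function with feet a, c and peak b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def fuzzify(d):
    """Assumed Close/Medium/Far sets for a distance reading in metres."""
    return {"C": tri(d, -0.1, 0.0, 0.3),
            "M": tri(d, 0.1, 0.4, 0.7),
            "F": tri(d, 0.5, 1.0, 1.5)}

# Assumed crisp rotational-velocity actions [rad/s]
TURN_LEFT, STRAIGHT, TURN_RIGHT = 1.0, 0.0, -1.0

def flc_step(fs, rs):
    """One Mamdani cycle with three illustrative rules (min for AND)."""
    mu_fs, mu_rs = fuzzify(fs), fuzzify(rs)
    rules = [
        (mu_fs["C"], TURN_LEFT),                    # obstacle ahead -> avoid it
        (min(mu_fs["F"], mu_rs["M"]), STRAIGHT),    # wall at a safe distance -> follow it
        (min(mu_fs["F"], mu_rs["F"]), TURN_RIGHT),  # wall lost -> turn back towards it
    ]
    num = sum(w * action for w, action in rules)
    den = sum(w for w, _ in rules)
    return num / den if den > 0 else 0.0            # weighted-average defuzzification

print(flc_step(fs=0.15, rs=0.45))   # close obstacle ahead -> 1.0 (turn left)
```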

3. Result and Discussion

The scenario taken for this paper is a firefighter robot equipped with four proximity and temperature sensors for navigating to the target, a burning candle. The proximity sensor arrangement is shown in Fig. 1.

Fig. 2. Fuzzy logic controller application for a mobile robot equipped with sensors


Fig. 3. Fuzzy logic editor interface in SCILAB [32] and MobotSim [33]: (a) SCILAB simulation interface; (b) MobotSim simulation interface

The firefighter robot is popular among students due to the popularity of firefighter robot contests. A fan is attached to the robot and designed to be "on" when the robot detects a burning candle within a certain distance. Table 1 shows the rules when the temperature sensor detection is cold, and Table 2 shows the rules when the temperature sensor detects the burning candle, where FS is the proximity sensor attached to the front side of the robot, LS is the proximity sensor attached to the left side of the robot, RS is the proximity sensor attached to the right side of the robot, TS is the temperature sensor, RM is the robot motion, F is the fan, C is close, M is medium, F is far, St is "stop", S is "straight motion", TR is "turn right", SL is "slightly turn left", GTW is "go to the wall", and SS is "straight slowly". Fig. 4 shows the input membership functions for the firefighter robot and Fig. 5 shows the robot's outputs. Fig. 6 shows the relationship between the front sensor and the right sensor with robot navigation. The right sensor functions as wall detection, and the robot uses input from the right sensor to follow the wall. The robot follows the wall within a safe designed distance. If the front sensor detects an obstacle, the robot will turn left and consider the obstacle like a wall. The rules are applied in MobotSim BASIC programming to show the 2D robot motion in its designed environment, shown in Fig. 7. The robot moves from the start point, follows the wall and scans each room looking for the burning candle (target).




If the robot finds the burning candle, it will stop at a safe distance, and the fan will be turned on to put out the fire. As soon as the temperature sensor no longer detects the fire (detection is cold), the robot returns to the starting point. Fig. 7a to 7d are screenshots from the simulation in MobotSim. The green line is the robot's trajectory as the robot moves from the starting point to the target and returns to the starting point. "Point" in Fig. 7 is the checkpoint to show that the robot has scanned the room, and the green dot is the unlit candle. The environment and robot motion are designed identically to a firefighter robot contest environment.

Fig. 4. Inputs membership function

Fig. 5. Robot outputs

Fig. 6. The relationship among front and right sensors with robot navigation


Tab. 1. Rules when the temperature sensor detection is cold

NO  FS  LS  RS  TS    RM   F
1   C   C   C   Cold  St   Off
2   C   C   M   Cold  S    Off
3   C   M   F   Cold  TR   Off
4   M   M   C   Cold  SL   Off
5   M   F   M   Cold  S    Off
6   M   F   F   Cold  GTW  Off
7   F   C   C   Cold  SS   Off
8   F   C   M   Cold  S    Off
9   F   M   F   Cold  GTW  Off
10  C   M   C   Cold  SL   Off
11  C   F   M   Cold  S    Off
12  C   F   F   Cold  GTW  Off
13  M   C   C   Cold  SS   Off
14  M   C   M   Cold  TR   Off
15  M   M   F   Cold  GTW  Off
16  F   M   C   Cold  SS   Off
17  F   F   M   Cold  S    Off
18  F   F   F   Cold  GTW  Off

Tab. 2. Rules when the temperature sensor detection is hot

NO  FS  LS  RS  TS   RM  F
1   C   C   C   Hot  St  On
2   C   C   M   Hot  St  On
3   C   M   F   Hot  St  On
4   M   M   C   Hot  St  On
5   M   F   M   Hot  St  On
6   M   F   F   Hot  St  On
7   F   C   C   Hot  St  On
8   F   C   M   Hot  St  On
9   F   M   F   Hot  St  On
10  C   M   C   Hot  St  On
11  C   F   M   Hot  St  On
12  C   F   F   Hot  St  On
13  M   C   C   Hot  St  On
14  M   C   M   Hot  St  On
15  M   M   F   Hot  St  On
16  F   M   C   Hot  St  On
17  F   F   M   Hot  St  On
18  F   F   F   Hot  St  On
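Because Tables 1 and 2 enumerate every sensor combination, students can first exercise the rule base as a plain crisp lookup before adding fuzzification. The sketch below is only an illustration: it copies a few rows of the tables verbatim and treats any "Hot" reading as "stop and switch the fan on", as in Table 2.

```python
# (FS, LS, RS, TS) -> (robot motion, fan); rows 1, 2, 3 and 9 copied from Table 1
RULES = {
    ("C", "C", "C", "Cold"): ("St",  "Off"),   # boxed in on all sides -> stop
    ("C", "C", "M", "Cold"): ("S",   "Off"),
    ("C", "M", "F", "Cold"): ("TR",  "Off"),   # free space on the right -> turn right
    ("F", "M", "F", "Cold"): ("GTW", "Off"),   # far from everything -> go to the wall
}

def consequent(fs, ls, rs, ts):
    """Crisp lookup of the rule tables; every 'Hot' row stops the robot and turns the fan on."""
    if ts == "Hot":
        return ("St", "On")
    # default used here only for rows not copied into this toy example
    return RULES.get((fs, ls, rs, ts), ("S", "Off"))

print(consequent("C", "M", "F", "Cold"))  # -> ('TR', 'Off')
print(consequent("M", "F", "C", "Hot"))   # -> ('St', 'On')
```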



Fig. 7. Robot simulation in MobotSim: (a) Robot starts; (b) Robot reaches the target and stops for a moment; (c) Robot is on the way to the starting point; (d) Robot returns to the starting point

Fig. 8. Leader-follower robots: (a) Robots start; (b) Robots return to the starting point

The simulation shows that the robot moves smoothly as designed, from one room to another until finally reaching the target and returning to the starting point. This proposed FLC design can be extended to leader-follower formation robots, as shown in Fig. 8. The leader robot motion is the same as the single robot motion in Fig. 7.

The follower robot's front sensor is designed to detect the leader robot and keep a safe distance from the leader. Fig. 8 shows the versatility of the proposed method, which can be extended to any design, and therefore will enhance students' creativity and at the same time increase their involvement in the teaching-learning process. The leader-follower robots' environment in Fig. 8 is slightly different to show that the students can create any environment by setting the target(s), obstacles, and programming in BASIC. The results show that the proposed method allows the students to learn about fuzzy logic in an interesting way by drawing their attention and making them participate actively in the teaching-learning process in the classroom. This method has been applied to the robotics class in the Electrical Engineering Department of Politeknik Negeri Sriwijaya.

4. Conclusion

The interactive class involving students more in the teaching-learning process is essential in improving the academic atmosphere. More interested students mean better class outcomes.




The interactive course can be achieved by creating project-based learning and encouraging the students to develop projects. Simulation is the best option for learning robotics in a classroom, since the students can focus more on designing the controllers without having to care about the complexity of the real system. However, not all polytechnics and universities in some countries can afford the high-end software. The alternative of open source and low-cost software can be as good as the high-end ones. This paper has presented the feasibility of using SCILAB, an open source software, and MobotSim, a low-cost software, for designing and creating projects applying a fuzzy logic controller. The firefighter robot contest scenario is designed and discussed. The proposed method for a single robot is extended to the leader-follower robots without really changing the FLC design. The results show that the application of a fuzzy logic controller can be fun and varied for students to enjoy the class. This method has been applied to the robotics class in the Electrical Engineering Department of Politeknik Negeri Sriwijaya.

AUTHORS

Tresna Dewi* – Politeknik Negeri Sriwijaya, Jalan Srijaya Negara Palembang, Indonesia, e-mail: tresna_dewi@polsri.ac.id, www: www.polsri.ac.idg/~tresna_dewi.

Pola Risma – Politeknik Negeri Sriwijaya, Jalan Srijaya Negara Palembang, Indonesia, e-mail: polarisma@polsri.ac.id, www: www.polsri.ac.idg/~polarisma.

Yurni Oktarina – Politeknik Negeri Sriwijaya, Jalan Srijaya Negara Palembang, Indonesia, e-mail: yurni_oktarina@polsri.ac.id, www: www.polsri.ac.idg/~yurni_oktarina.

*Corresponding author

REFERENCES

[1] W. Jianyu, L. Xi, and X. Chen, “Vocational Ability Oriented Modularized Curriculum for Advanced Vocational School”, Proceeding of 2012 International Conference on Future Computer Supported Education, IERI Procedia, vol. 2, Seoul, 2012, 897-900. DOI: 10.1016/j.ieri.2012.06.188. [2] X. Kuangdi, “Engineering Education and Technology in a Fast-Developing China”, Technology in Society, vol. 30 issue. 3-4, 2008, 265-274. https://doi.org/10.1016/j.techsoc.2008.04.024. [3] N. Liu, and L. Liang, “The Research and Implementation of the Vocational Curriculum Design and Construction of the Repository”, Proceeding of 2012 International Conference on Future Computer Supported Education, IERI Procedia, vol. 2, Seoul, 2012, 133-136. https://doi.org/10.1016/j.ieri.2012.06.063. [4] E. Ospennikova, M. Ershov, and I. Iljin, “Educational Robotics as an Innovative Educational Technology. Worldwide trends



in the development of education and academic research”, Procedia-Social and Behavioral Sciences, vol. 214, 2015, 18-26. https://doi.org/10.1016/j.sbspro.2015.11.588. [5] R. Y. Kezerashvili, “Teaching RC and RL Circuits Using Computer-Supported Experiment.” Proceeding of 2012 International Conference on Future Computer Supported Education, IERI Procedia, vol. 2, Seoul, 2012, 609-615. DOI: 10.1016/j.ieri.2012.06.142. [6] J. Wang, L. Hung, H. Hsieh, J.Tsai, and I. Lin, “Computer Technology Integration and Multimedia Application for Teacher Professional Development: The Use of Instructional Technology in the Classroom Settings”, Proceeding of 2012 International Conference on Future Computer Supported Education, IERI Procedia, vol. 2, Seoul, 2012, 616-622. https://doi.org/10.1016/j.ieri.2012.06.143. [7] M. Liao, and J. Li, G, “Goal-Oriented Method and Practice in Experimental Teaching”, Proceeding of 2012 International Conference on Future Computer Supported Education, IERI Procedia, vol. 2, Seoul, 2012, 480-484. https://doi.org/10.1016/j.ieri.2012.06.120. [8] J. Arlegui, M. Moro, and A. Pina, “Simulation of Robotic Sensors in BYOB”, Proceeding of 3rd International Conference on Robotics in Education,Prague, 2012, 25-32. ISBN 978-80-7378-219-1. [9] M. Casini, and A. Garulli, “MARS: a Matlab simulator for mobile robot experiments.”, IFAC-PaperOnline, Proceeding of 11th IFAC Symposium on Advances in Control Education ACE 2016, vol. 49, no. 6, Bratislava, 2016, 069-074. https://doi.org/10.1016/j.ifacol.2016.07.155. [10] A. Pandey, and D. R. Parhi, “MATLAB Simulation for Mobile Robot Navigation with Hurdles in Cluttered Environment Using Minimum Rule Based Fuzzy Logic Controller”, Proceeding of 2nd International Conference on Innovations in Automation and Mechatronics Engineering ICIAME 2014, Procedia Technology, vol. 14, Vallabh Vidyanagar, 2014, 28-34. https://doi.org/10.1016/j.protcy.2014.08.005. [11] T. Dewi, P. Risma, Y. Oktarina, and M. Nawawi, “Neural Network Simulation for Obstacle Avoidance and Wall Follower Robot as a Helping Tool for Teaching-Learning Process in Classroom”, Proceeding of 2017 1st International Conference on Engineering & Applied Technology (ICEAT), Mataram, 2017, 705-717. http://conference.fgdt-ptm.or.id/index.php/ iceat/index.



[12] T. T. T. Baros, and W. F. Lages, “Development of a Firefighting Robot for Educational Competition”, Proceeding of 3rd International Conference on Robotics in Education, Prague, 2012, 47-54. ISBN 978-80-7378-219-1. [13] C. Rodríguez, J. L. Gusmán, M. Berenguel, and S. Dormido, “Teaching real-time programming using mobile robots”, IFAC-PapersOnLine, vol. 49, issue 6, 2016, 1015. https://doi.org/10.1016/j.ifacol.2016.07.145 [14] P. Petrovic, “Having Fun with Learning Robots”, Proceeding of 3rd International Conference on Robotics in Education, Prague, 2012, 105-112. ISBN 978-80-7378-219-1. [15] A. Eguchi, “Educational Robotics for Promoting 21st Century Skills”, Journal of Automation, Mobile Robotics & Intelligent Systems, vol. 8, no. 1, 2014, 5-11. DOI: 10.14313/JAMRIS_1-2014/1. [16] D. Assaf, J. C. Larsen, and M. Reichardt, “Extending Mechanical Construction Kits to Incorporate Passive and Compliant Elements for Educational Robotics”, 3rd International Conference on Robotics in Education, Prague, 2012, 33-40. ISBN 978-80-7378-219-1. [17] F. M. López-Rodríguez and F. Cuesta, “Andruino-A1: Low-Cost Educational Mobile Robot Based on Android and Arduino”, J Intell Robot Syst., vol. 81, no. 1, 2016, 63-76. DOI: 10.1007/s10846-015-0227-x. [18] A. Liu, J. Newsom, C. Schunn, and R. Shoop, “Students Learn Programming Faster Through Robotic Simulation”, techdirection, 2013, pp. 16-19. [19] L. A. Zadeh, “Fuzzy Sets”, Information and Control, vol. 8, 1965, 338-353. [20] P. Shakouri, O. Duran, A. Ordys, and G. Collier, “Teaching Fuzzy Logic Control Based on a Robotic Implementation”, IFAC Proceedings Volumes, vol. 46, issue 17, 2013, pp. 192-197. https://doi.org/10.3182/20130828-3-UK-2039.00047. [21] I. Rodríguez-Fdez, M. Mucientes, and A. Bugarín, “Learning Fuzzy Controller in Mobile Robotics with Embedded Preprocessing”, Applied Soft Computing, vol. 26, 2015, 123-142. https://doi.org/10.1016/j.asoc.2014.09.021. [22] O. Obe, and I. Dumitrache, “Fuzzy Control of Autonomous Mobile Robot”, U.P.B. Sci. Bull, vol. 72, no. 3, Series C, 2010, 173-186. [23] A. S. Al Yahmedi, and M. A. Fatmi, “Fuzzy Logic Based Navigation of Mobile Robots”, Recent Advances in Mobile Robot, Intechopen, A. Tapalov (Ed), ISBN: 978-953-307-909-7, DOI: 10.5772/25621.


[24] A. Pandey, “Path Planning Navigation of Mobile Robot with Obstacles Avoidance using Fuzzy Logic Controller”, 2014 IEEE 8th International Conference on Intelligent Systems and Control (ISCO), Coimbatore, 2014, 36-41. DOI: 10.1109/ISCO.2014.7103914. [25] D. N. M. Abadi, and M. H. Khooban, “Design of Optimal Mamdani-type Fuzzy Controller for Nonholonomic Wheeled Mobile Robots”, Journal of King Saud University Engineering Sciences, vol. 27, no. 1, 2015, 92-100. https://doi.org/10.1016/j.jksues.2013.05.003. [26] S. T. Mitrovic, and Z. Djurovic, “Fuzzy-Based Controller for Differential Drive Mobile Robot Obstacle Avoidance”, 7th IFAC Symposium on Intelligent Autonomous Vehicles, vol. 7, Lecce, 2010, 67-72. https://doi.org/10.3182/20100906-3-IT-2019.00014. [27] M. S. Masmoudi, N. Krichen, M. Masmoudi, and N. Derbel, “Fuzzy Logic Controller Design for Omnidirectional Mobile Robot Navigation”, Applied Soft Computing, vol. 49, 2016, 901-919. https://doi.org/10.1016/j.asoc.2016.08.057. [28] S. Nurmaini, B. Tutuko, K. Dewi, V. Yuliza, and T. Dewi, “Improving Posture Accuracy of Non-Holonomic Mobile Robot System with Variable Universe of Discourse”, Telkomnika, vol. 15, no. 3, 2017, 1265-1279. DOI: http://dx.doi.org/10.12928/telkomnika.v15i3.6078. [29] R. H. Abiyev, I. Gunse., N. Akkaya, E. Aytac, A. Cagman, and S. Abizada, “Robot Soccer Control”, Proceeding of 12th International Conference on Application of Fuzzy Systems and Soft Computing ICAFS 2016, Procedia Computer Science, vol. 102, 2016, 477-484. https://doi.org/10.1016/j.procs.2016.09.430. [30] S. G. Tzafestas, “Introduction to Mobile Robot Control”, First Edition, Elsevier, 2014, 31-98. https://doi.org/10.1016/B978-0-12-417049-0.00002-X. ISBN 9780124171039. [31] C. R. C. Torrico, A. B. Leal, and A. T. Y. Watanabe, “Modeling and Supervisory Control of Mobile Robots: A Case of a Sumo Robot”, IFAC-PapersOnLine, vol. 49, no. 32, 2016, 240-245. https://doi.org/10.1016/j.ifacol.2016.12.221. [32] https://www.scilab.org/, accessed on October 25th 2017. [33] http://www.mobotsoft.com/, accessed on October 25th 2017.




Chemical Reaction Algorithm for Type-2 Fuzzy Control Optimization in Mobile Robots Submitted: 25th August 2018; accepted: 20th September 2018

David de la O, Oscar Castillo, José Soria

DOI: 10.14313/JAMRIS_3-2018/14

Abstract: In this work the optimization process of the tracking and reactive controllers for a mobile robot is presented. The Chemical Reaction Algorithm (CRA) is used to find the optimal parameter values of the membership functions and rules for the reactive and tracking controllers. In this case, we are using five membership functions in each variable of the fuzzy controllers. The main goal of the reactive controller is to provide the robot with the ability to avoid obstacles in its environment. The tests are performed on a benchmark maze problem, in which the goal is not necessarily to leave the maze, but rather that the robot avoids obstacles, in this case the walls, while penalizing unwanted trajectories, such as cycles. The tracking controller's goal is for the robot to keep to a certain path, so that the robot can learn to react to unknown environments. The optimization algorithm that was used is based on an abstraction of chemical reactions. To perform the simulation we use the "SimRobot" toolbox; the results of the tests are presented in a detailed fashion, and at the end we present a comparison of results among the CRA, PSO and GA methods.

Keywords: Chemical Reaction Algorithm, control, fuzzy logic, robotics

1. Introduction


Lotfi Zadeh (1965) proposed fuzzy logic and rule-based procedures as a means to model and capture human knowledge and deal with uncertainty in the real world. These methods have been applied to ill-defined industrial processes, since these methods are usually based on experienced people who usually obtain good results, regardless of whether they receive imprecise information [10–14, 23]. The methods have also been applied to control of a mobile robot using the Fuzzy Bee Colony Optimization Algorithm [2, 8], Particle Swarm Optimization [3, 16–18, 21–24, 26, 42–46], Genetic Algorithms [11, 15, 29, 47], Differential Evolution [30] and Ant Colony Optimization [25, 30, 37, 41]. The origin of this imprecision can be related to a variation of time concerning the application of a control signal and the warning of its effect [2], and to nonlinearities in the dynamics of the system or sensor degradation [21].

The processes in which the fuzzy rule-based approximation has been applied include the automated process of the operation of a public transport system [38], water tank control [1], [20] and sewage treatment plants [47]. Although we use the word fuzzy, fuzzy systems have to be precisely defined; a fuzzy controller operates as a non-linear controller that is defined with precision. Essentially, what we want to emphasize is that although the phenomenon described by this theory may be fuzzy, the theory itself is accurate. The CRA optimization algorithm was originally developed by Astudillo et al. [6]; it is based on a metaheuristic with a population that does not change in size, in addition to applying a generalization of chemical reactions as exploration and exploitation mechanisms. The algorithm uses chemical reactions that change at least one of the substances (element or compound), changing their composition and sets of properties. We take as a basis the tests performed by Melendez et al. [32–36] and de la O [13], in which a fuzzy system is designed for the navigation of an autonomous mobile robot; it uses two controllers, a reactive controller and a tracking controller, and then the parameters and rules of the controllers are optimized using a GA [32–36]. This work is organized as follows: Section 2 describes the Chemical Optimization Paradigm used in the present paper, Section 3 describes the tool used for simulation, Section 4 defines the fundamental methodology of this work, Section 5 shows the results of the simulations and comparisons, and Section 6 presents the conclusion.

2. The CRA Paradigm

The algorithm of chemical reactions was developed by Astudillo et al. in 2011 [6]. This algorithm is a new paradigm inspired by the natural behavior of chemical reactions; it makes the population come together to find an optimal result in the search space, supported by several intensifier/diversifier mechanisms. One might think that chemical theory and its descriptions are difficult and have no relation to optimization theory, but only the general scheme is considered as the basis of the chemical reaction optimization algorithm. Astudillo et al. [4–7] defined the elementary terminology for characterizing and classifying artificial chemicals.



Because the laws of reaction and the representation of the elements/compounds are of a statistical and qualitative character, the algorithm is a representation of the general procedure of chemical reactions. The initial description of the elements/compounds depends on the problem. These elements/compounds can be symbolized as binary, integer, or floating-point numbers, etc. The relationship between the elements/compounds is indirect: the interaction does not take into account the rules of interaction and molecular structure and, as a consequence, does not include values of temperature, pH, pressure, etc. The Chemical Reaction Algorithm is a metaheuristic that explores all possible solutions that exist for a defined search space. This optimization algorithm uses an element (or compound) to represent a possible solution for a problem, and the objective function measures the performance of the element. The algorithm ends when the objective is achieved or the number of scheduled iterations has been reached. The CRA does not use external values (conservation of masses, thermodynamic characteristics, etc.), and this represents an advantage when compared to other optimization algorithms, as it is a very direct method which takes into account the main features of chemical reactions (synthesis, decomposition, substitution and double substitution) to obtain the optimal solution in a search space.

2.1. Elements or Compounds

The algorithm makes an analogy to natural chemical reactions; therefore, it represents a possible solution to the problem using an element, which is initialized with values that depend on the problem to solve, and these values can be binary numbers, integers, floating-point values, etc. These elements will interact with each other indirectly; that is, the interaction is independent of the actual molecular structure. This approach does not take into account other molecular properties, such as potential and kinetic energies, among others.

2.2. Chemical Reactions

A chemical reaction is a chemical process in which two substances, the so-called reactants, by the action of an energy factor, become other substances designated as compounds. Taking this process into account, chemical reactions can be used as intensifying (substitution, double substitution reactions) and diversification (synthesis, decomposition reactions) mechanisms. The four chemical reactions considered in this approach are synthesis, decomposition, single and double substitution. With these operators, new solutions within a defined search space can be explored; the algorithm allows defining the percentage of the elements to be evaluated in each chemical reaction, from 0% to 100%.


2.2.1. Combination Reactions
In this type of reaction, two substances that can be elements or compounds are combined to form the product. Reactions of this type are classified as combination or synthesis reactions, and are generally represented as follows:

B + X → BX    (1)

2.2.2. Decomposition Reactions
In a decomposition reaction, a single substance decomposes or breaks up, producing two or more distinct substances. The starting material must be a compound and the products can be elements or compounds. The general form of this equation is the following:

BX → B + X    (2)

2.2.3. Substitution Reactions
In a simple substitution reaction, an element reacts with a compound and takes the place of one of the elements of the compound, producing a different element and also a different compound. The general formula for this reaction is:

X + AB → AX + B    (3)

2.2.4. Double-substitution Reactions
In a double substitution reaction, two compounds exchange pairs with each other to produce distinct compounds. The general form of these equations is [4]:

AB + CD → AC + BD    (4)

The flowchart for this optimization method can be found in Figure 1, and the following list of steps is presented:
– We start by generating an initial set of elements/compounds.
– We evaluate the original set of elements to obtain their fitness values.
– Based on the above evaluation, we select some of the elements/compounds to "induce" a reaction.
– Taking into consideration the result of the reaction, evaluations of the new elements/compounds are obtained, and the selected elements are those of greater fitness.
– Repeat the steps until the algorithm meets the terminating criterion (the desired result or the maximum number of iterations is reached) [6].
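Read as pseudocode, the steps above map directly onto a short optimization loop. The Python sketch below is a minimal, self-contained rendering of that loop over real-valued vectors; it is not the authors' implementation, the concrete numeric form of the four reaction operators is an assumption made for illustration (the paper does not prescribe one), and a simple sphere function stands in for the controller-simulation fitness.

```python
import random

def objective(x):
    """Toy stand-in for the simulated controller fitness (lower is better here)."""
    return sum(v * v for v in x)

def synthesis(a, b):                 # B + X -> BX: combine two elements
    return [(u + v) / 2 for u, v in zip(a, b)]

def decomposition(a):                # BX -> B + X: split into two perturbed elements
    return ([v + random.gauss(0, 0.5) for v in a],
            [v - random.gauss(0, 0.5) for v in a])

def substitution(a, b):              # X + AB -> AX + B: exchange one component
    a, b = a[:], b[:]
    i = random.randrange(len(a))
    a[i], b[i] = b[i], a[i]
    return a, b

def double_substitution(a, b):       # AB + CD -> AC + BD: exchange the second halves
    m = len(a) // 2
    return a[:m] + b[m:], b[:m] + a[m:]

def cra(dim=4, pop_size=20, iterations=100, rate=0.2):
    pop = [[random.uniform(-5, 5) for _ in range(dim)] for _ in range(pop_size)]
    for _ in range(iterations):
        k = max(4, int(rate * pop_size))          # e.g. 20% of the elements react
        r1, r2, r3, r4 = random.sample(pop, k)[:4]
        products = [synthesis(r1, r2)]
        products += decomposition(r1)
        products += substitution(r1, r2)
        products += double_substitution(r3, r4)
        # Elitist reinsertion: keep the best pop_size elements of parents + products
        pop = sorted(pop + products, key=objective)[:pop_size]
    return pop[0]

print(objective(cra()))
```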

Fig. 1. Flowchart of the CRA





This algorithm consists of a metaheuristic based on a static population, and applies an abstraction of chemical reactions as intensifying and diversification mechanisms. It also uses an elitist reinsertion strategy which allows for the perpetuity of the best elements and, therefore, the average fitness of the whole set of elements increases with each iteration. The reactions of synthesis and decomposition are used for exploration of the search space of solutions: these procedures prove to be effective and promptly lead toward a desired optimal value. The single and double substitution reactions allow the algorithm to search for optimal values around a previously found solution. We start the algorithm by randomly generating a set of elements/compounds under a uniform distribution over the space of possible solutions, and this is represented as follows:

$$X = \{x_1, x_2, \ldots, x_n\} \qquad (5)$$

where x_n is used to represent an element/compound.

The total number and the representation of the original elements depend on the complexity of the problem that is solved. In order to find the best possible controllers we use a metaheuristic strategy, which has proven to produce good results, and this is achieved by applying the CRA (see Fig. 2). In this case the algorithm will search the solution space of the problem to be solved, combining the values of the best controllers and generating new controllers. The goal is to optimize the parameters of the membership functions and fuzzy rules.



Fig. 2. General flowchart of the chemical reaction algorithm optimizing the fuzzy controllers

3. Description of the Tool for Simulation

The SIMROBOT toolbox software [30] enables performing robot simulations and is used for testing the fuzzy controllers. The mobile robot has two controlled and sensed wheels, in addition to an uncontrolled and un-sensed wheel. Figure 3 illustrates this type of robot. This robot has two degrees of freedom: y-translation and x-translation or z-rotation.

Fig. 3. Kinematic coordinate system [27]

The kinematic equations of the mobile robot are as follows. Equation (6) shows the sensed forward velocity solution:

(6)

Equation (7) shows the actuated inverse velocity solution:

(7)

where (in the metric system): the translational velocities of the robot's body are given in [m/s], the robot's z-rotational velocity in [rad/s], the wheels' rotational velocities in [rad/s], R is the actuated wheel radius [m], and la, lb are the distances between the wheels and the robot's axes [m].

4. Simulations and Tests

Two different control tests were performed to experiment with the performance of the algorithm in control problems, and each test is described as follows. We used the Chemical Reaction Algorithm (CRA) to optimize the parameters and rules of the reactive and tracking fuzzy controllers. One test was to optimize only the parameters of the fuzzy controller, leaving the controller's fuzzy rules fixed. A second test was to optimize both the parameters and the fuzzy controller rules, and for this test we executed two CRAs simultaneously, one that optimizes the parameters and another one that optimizes the rules, alternating in each iteration. The CRA performs the task of initializing the parameters of each fuzzy controller, selecting the elements and the chemical reactions that will be applied, evaluating the results and, through the simulation, performing an elitist reinsertion and putting them back into the population.
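One way to picture the second test — two CRA instances taking turns, one over the membership-function parameters and one over the rule consequents — is the schematic sketch below. Everything in it is a hypothetical placeholder for illustration: the population sizes, the encodings and the `simulate_controller` function do not correspond to the actual toolbox interface or element sizes used by the authors.

```python
import random

def simulate_controller(params, rules):
    """Hypothetical stand-in for a SimRobot run returning a fitness value (higher is better)."""
    return -sum(p * p for p in params) + 0.01 * sum(rules)

def cra_step(population, fitness):
    """One heavily simplified 'reaction': recombine two elements, keep the best population."""
    a, b = random.sample(population, 2)
    child = [x if random.random() < 0.5 else y for x, y in zip(a, b)]
    ranked = sorted(population + [child], key=fitness, reverse=True)
    return ranked[:len(population)]

pop_params = [[random.uniform(0, 1) for _ in range(10)] for _ in range(8)]   # real-coded MFs
pop_rules = [[random.randint(0, 4) for _ in range(10)] for _ in range(8)]    # integer-coded rules
best_params, best_rules = pop_params[0], pop_rules[0]

for it in range(50):
    if it % 2 == 0:   # even iterations: optimize MF parameters with the rules held fixed
        pop_params = cra_step(pop_params, lambda p: simulate_controller(p, best_rules))
        best_params = pop_params[0]
    else:             # odd iterations: optimize the rules with the parameters held fixed
        pop_rules = cra_step(pop_rules, lambda r: simulate_controller(best_params, r))
        best_rules = pop_rules[0]

print(simulate_controller(best_params, best_rules))
```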



4.1. Reactive Controller
The reactive control has the purpose of achieving the same capability that a person has when driving, that is, to react to unanticipated circumstances, road traffic congestion, traffic signs, etc., but at a more elementary level. We use a maze to test possible solutions, in which the objective is not to guide the robot through the maze to the exit, but rather obstacle avoidance. We use the maze to optimize the reactive controller due to the characteristics of the simulation scenario, i.e., it is a confined space in which the robot cannot move easily and each wall is considered an obstacle that the robot must avoid while moving. We use a Mamdani type FIS, which consists of 3 inputs, which are the distances obtained by the robot sensors, as mentioned in Section 2, and 2 outputs that control the speed of the servo motors in the robot, and all this information is encoded in each element.


4.1.1. Reactive Controller with Type-1 Fuzzy Logic
We encode each membership function of the fuzzy reactive controller, represented by an element, into 25 positions of a vector of real values, which represent the values of each parameter of the triangular membership functions; the controller has five membership functions in each of its variables (see Fig. 4).

Fig. 4. Structure of the element to fuzzy parameters

We encode the values of the rules of the fuzzy reactive controller, represented by an element, into 250 positions of a vector of integer values, which represent the values of the rule set of the fuzzy controller (see Fig. 5).

Fig. 5. Structure of the element to fuzzy rules

The controller is a Mamdani fuzzy system and it has 3 inputs (the sensors of the robot) and two outputs which control the speed of each servomotor of the robot. This is illustrated in Figure 6.

Fig. 6. Fuzzy reactive control inputs

4.1.2. Reactive Controller with Type-2 Fuzzy Logic
We encode each membership function of the fuzzy reactive controller, represented by an element, into 50 positions of a vector of real values, which represent the values of each parameter of the triangular membership functions; the controller has five membership functions in each of its variables (see Fig. 7).

Fig. 7. Element encoding to fuzzy parameters

We encode the values of the rules of the fuzzy reactive controller, represented by an element, into 250 positions of a vector of integer values, which represent the values of the rule set of the fuzzy controller (see Fig. 8).

Fig. 8. Structure of the element to fuzzy rules

The controller is a Mamdani fuzzy system and it has 3 inputs (the sensors of the robot) and two outputs which control the speed of each servomotor of the robot, respectively. This is illustrated in Figure 9.

Fig. 9. Fuzzy reactive control inputs




4.2. Tracking Controller
The goal of the tracking controller is to keep the robot on the right path, with respect to a given reference. The robot will be able to move about the reference and stay on the road, being able to move from point A to B without obstacles present in the path.

4.2.1. Tracking Controller with Type-1 Fuzzy Logic We encode each membership function of the fuzzy tracking controller, represented by an element, into 20 positions of a vector of real values, which represents the values of each parameter of the triangular membership function, which has five membership functions in each of its variables (see Fig. 10).


Fig. 10. Structure of the element to fuzzy parameters

We encode the values of the rules of the fuzzy reactive controller, represented by an element, into 50 positions of a vector of integer values, which represent the values of the rule set of the fuzzy controller (see Fig. 11).

Fig. 11. Structure of the element to fuzzy rules

The controller will take into account the errors (Δep, Δθ) in their minimum values, Figure 12; the minimum values to which we refer are the relative error of the orientation of the left front and the relative error of the position. We used a Mamdani fuzzy system whose 2 inputs are (Δep, Δθ) and whose two outputs control the speed of each servomotor of the robot, and this is illustrated in Figure 12.

Fig. 12. Fuzzy controller inputs ℮p, ℮θ

To calculate the performance of the controller we use the equation of the mean square error between the reference and the path of the robot.

4.2.2. Tracking Controller with Type-2 Fuzzy Logic
We encode each membership function of the fuzzy tracking controller, represented by an element, into 40 positions of a vector of real values, which represent the values of each parameter of the triangular membership functions; the controller has five membership functions in each of its variables (see Fig. 13).

Fig. 13. Structure of the element to fuzzy parameters

We encode the values of the rules of the fuzzy reactive controller, represented by an element, into 50 positions of a vector of integer values, which represent the values of the rule set of the fuzzy controller (see Fig. 14).

Fig. 14. Structure of the element to fuzzy rules

The controller will take into account the errors (Δep, Δθ) in their minimum values, Figure 7; the minimum values to which we refer are the relative error of the orientation of the left front and the relative error of the position. We used a Mamdani fuzzy system whose 2 inputs are (Δep, Δθ) and whose two outputs control the speed of each servomotor of the robot, and this is illustrated in Figure 15.

Figure 15. Fuzzy controller inputs ℮p, ℮θ




To calculate the performance of the controller we use the equation of the mean square error between the reference and the path of the robot.

Figure 16. Fitness Function for the Reactive Controller

4.3. Objective Function for Both Controllers

The CRA starts by creating elements to be evaluated by the SimRobot toolbox, which will assign a crisp value that represents the performance of the controller, taking into account the criteria that we want to achieve. To achieve this, we must provide the CRA with a good evaluation criterion which is capable of penalizing undesirable behaviors and rewarding with higher fitness values those elements that yield the performance we desire in the controller. If we do not provide a correct evaluation method, we can guide the population of elements to suboptimal solutions or even to no solution at all [5], [17]–[21], [31]. The algorithm has fixed parameters for the chemical reactions; for our tests, and based on what was proposed by Astudillo et al., we use for each reaction a value of 0.2, which corresponds to taking 20% of the elements to react in each of the 4 reactions.

4.3.1. Reactive Controller Objective Function
In order to measure the performance of the controller, we will use the following criteria:
– Distance traveled,
– Time used to travel the distance,
– Battery life.
In order to measure these criteria we will use a Fitness FIS, which will provide the desired fitness value, adding very basic fuzzy rules that give greater fitness to the controller that provides longer trajectories in shorter times and a longer battery life. This seems to be a good strategy that will guide the algorithm to evolve and provide optimal control, but we have noticed that this strategy is not able to do just that on its own: it is also necessary to have a robot trajectory supervisor to make sure that there is a forward movement path free of loops.

For this purpose, it uses a neural network (NN) that is capable of detecting trajectories with cycles that do not have the desired forward displacement behavior, assigning them a low activation value, and higher activation values to those that are cycle free. The NN consists of two inputs, one output, and two hidden layers, see Figure 16. To perform the evaluation of the reactive controller we use the method of integrating both the FIS and the NN, where the final fitness value for each element is calculated with Equation (8). Taking into account the NN response, the activation threshold is set to 0.35; this means that any activation less than 0.35 will be penalized in the fitness given by the FIS. Equation (8) expresses how to calculate the fitness value of each individual

(8)

where: fi – fitness value of the i-th individual, fv – crisp value out of the fitness FIS, nnv – looping trajectory activation value.

4.3.2. Tracking Controller Objective Function
To measure the performance of the tracking controller we use the root-mean-square error (RMSE) between the given reference and the path achieved by the robot. We perform the test three times for each element and take the average of the three tests. The initial position of the robot with respect to the reference is random, but it is ensured that in one test the robot's vertical position is above the reference and in another test it is below it (Fig. 17) [32–36].
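The RMSE fitness itself takes only a few lines; the sketch below is illustrative (the straight-line reference, the exponentially converging `simulated_run` and the start offsets are made-up placeholders rather than SimRobot output) and averages the error over three runs as described.

```python
import math
import random

def rmse(reference, path):
    """Root-mean-square error between the reference and the trajectory achieved by the robot."""
    return math.sqrt(sum((r - p) ** 2 for r, p in zip(reference, path)) / len(reference))

def simulated_run(reference, y0):
    """Placeholder for one SimRobot episode: start offset y0 from the reference, then converge."""
    return [r + y0 * math.exp(-0.1 * i) for i, r in enumerate(reference)]

reference = [0.0] * 100                               # straight-line reference, for illustration
offsets = [0.4, -0.4, random.uniform(-0.4, 0.4)]      # one start above, one below, one random
fitness = sum(rmse(reference, simulated_run(reference, y0)) for y0 in offsets) / 3
print(round(fitness, 4))
```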

Fig. 17. Fitness Function for the Tracking Controller




5. Simulation Results
We present the results of the tests performed for each of the controllers: reactive and tracking. In order to perform these tests we have used, as mentioned before, the SimRobot software and the Matlab language. To determine the suitability of each controller we use the simulation software. In the tests of the reactive controller, the robot must be able to react in a closed environment, avoiding hitting the obstacles (walls). In the tracking controller test the robot must be able to stay above the given reference. The results will be presented in two subsections:
– Reactive Controller,
– Tracking Controller.

5.1. Reactive Controller

In this section, we show the results of the tests with the reactive controller, and to determine the suitability of each controller we use the simulation tool. In the tests of the reactive controller the robot must be able to react in a closed environment, avoiding hitting the obstacles (walls).

5.1.1. Reactive Controller with Type-1 Fuzzy Logic We can find the configuration of the CRA and the results of the simulation tests in Table 1, where we can find the fitness value obtained in each of the experiments. We can also find statistical values which are the mean, variance, best and the worst obtained values.


Table 1. Summary of type 1 reactive controls results

5.1.2. Reactive Controller with Type-2 Fuzzy Logic
We can find the configuration of the CRA and the results of the simulation tests in Table 2, where we can find the fitness value obtained in each of the experiments. We can also find statistical values, which are the mean, variance, best and worst values obtained.

Table 2. Summary of type 2 reactive controls results

Elements: 20, Iterations: 1000

Experiment   Fitness
1            0.3299
2            0.3226
3            0.3143
4            0.3126
5            0.33
6            0.3607
7            0.3304
8            0.3299
9            0.3179
10           0.33
Average      0.32783
Best         0.3607
Poor         0.3126
Std Dev      0.01352

5.2. Tracking Controller
In this section, we show the results of the tests of the tracking controller, and to determine the suitability of each controller we use the simulation tool. The goal of the tracking controller is to keep the robot on the right path, with respect to a given reference. We can find the configuration of the CRA and the results of the simulation tests in Table 3, where we can also find the fitness value obtained in each of the experiments. We can also notice statistical values, which are the mean, variance, best and worst obtained values.

Table 3. Summary of Tracking Results

We can state that the results are good, because the average of the results is 0.2880, the best result is 0.2260, and when comparing with other algorithms (i.e. PSO) the result of the CRA is better.

5.3. Comparison of Results

We compare against Melendez et al. [4], and Tables 4 and 5 summarize the results presented in [4], where we have added the results obtained using the CRA.

5.3.1. Reactive Controller

In this section we compare the CRA, the Genetic Algorithm (GA) and Particle Swarm Optimization (PSO). The parameters used in the CRA, GA and PSO can be found in Table 4; we note that the CRA uses fewer iterations than the GA and PSO, and also a smaller population size.



Table 4. Parameters of the CRA, GA and PSO

The results of the CRA, GA and PSO can be found in Table 5, which gives the best, the worst, the standard deviation and the average of each method.

Table 5. Comparison of Results of CRA and GA

To compare the algorithms we performed an ANOVA test on the three samples with a significance level α of 0.05, using the following parameters: H0: all means are equal; H1: at least one mean is different; α = 0.05. To perform the analysis, the variances are assumed to be equal. The result of the ANOVA test is to reject the null hypothesis; because of this, the Tukey test is also performed. Tukey's comparisons indicated with 95% confidence that PSO was statistically better than GA and CRA, which were assumed to be statistically the same.

5.3.2. Tracking Controller

The parameters used for the CRA, GA and PSO can be found in Table 6; we note that the CRA uses fewer iterations than the GA and PSO, and also a smaller population size.

Table 6. Comparison of Results of CRA and PSO

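This two-stage comparison (one-way ANOVA followed by Tukey's HSD test) can be reproduced with a short Python sketch using SciPy and statsmodels; the fitness samples below are placeholders for illustration only, not the values reported in the tables.

    import numpy as np
    from scipy import stats
    from statsmodels.stats.multicomp import pairwise_tukeyhsd

    # Placeholder fitness samples for the three algorithms (one value per experiment)
    cra = np.array([0.33, 0.32, 0.31, 0.36, 0.33])
    ga = np.array([0.31, 0.30, 0.32, 0.31, 0.30])
    pso = np.array([0.36, 0.35, 0.37, 0.36, 0.38])

    # One-way ANOVA: H0 = all means are equal, alpha = 0.05
    f_stat, p_value = stats.f_oneway(cra, ga, pso)
    print("ANOVA F =", f_stat, "p =", p_value)

    if p_value < 0.05:
        # H0 rejected, so run Tukey's HSD to see which pairs of algorithms differ
        values = np.concatenate([cra, ga, pso])
        groups = ["CRA"] * len(cra) + ["GA"] * len(ga) + ["PSO"] * len(pso)
        print(pairwise_tukeyhsd(values, groups, alpha=0.05))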

6. Conclusion

In the present work, we used the CRA to optimize the parameters and rules of the fuzzy controllers, both for reactive and tracking behaviors. The reactive controller aims at giving the robot the ability to avoid obstacles in its environment. The tests were performed in a maze, in which the goal is not to leave the maze, but to have the robot avoid obstacles, in this case the walls, while penalizing unwanted trajectories such as cycles. The tracking controller's goal is for the robot to be able to stay on a certain path; this test was performed 3 times for each element, so that the robot can react to unknown environments.

After performing the tests and analyzing and comparing the results, we notice that the algorithm is not statistically better than the GA, and its performance is similar to that of the PSO, because it is a newly created algorithm. We propose the following task to improve the performance of the algorithm: to use a fuzzy controller that adapts the parameters of the chemical reactions, since these values are fixed during the execution of the test; with this, the algorithm will be able to explore at the beginning and focus on exploitation at the end. It is important to mention that the advantage of the algorithm is that it requires less time, a smaller population (fewer elements) and fewer iterations.

AUTHORS

David de la O, Oscar Castillo*, José Soria – Tijuana Institute of Technology, 22379, Tijuana, Mexico. Emails: ddsh@live.com, ocastillo@tectijuana.mx, jsoria57@gmail.com

*Corresponding author.

REFERENCES

1. Amador-Angulo L., Castillo O., "Comparison of the optimal design of fuzzy controllers for the water tank using ant colony optimization". In: Recent Advances on Hybrid Approaches for Designing Intelligent Systems, Springer International Publishing, 2014, 255–273. DOI: 10.1007/978-3-319-05170-3_18.
2. Amador-Angulo L., Castillo O., "A Fuzzy Bee Colony Optimization Algorithm Using an Interval Type-2 Fuzzy Logic System for Trajectory Control of a Mobile Robot". In: Mexican International Conference on Artificial Intelligence, Springer International Publishing, October 2015. DOI: 10.1007/978-3-319-27060-9_38.
3. Amin S., Adriansyah A., "Particle Swarm Fuzzy Controller for Behavior-based Mobile Robot". In: ICARCV'06, 9th International Conference on Control, Automation, Robotics and Vision, 5–8 Dec. 2006, 1–6. DOI: 10.1109/ICARCV.2006.345293.
4. Astudillo L., Castillo O., Aguilar L., Martínez R., "Hybrid Control for an Autonomous Wheeled Mobile Robot Under Perturbed Torques". In: IFSA (1) 2007, chapter 59, 594–603. DOI: 10.1007/978-3-540-72950-1_59.


5. Astudillo L., Castillo O., Aguilar L., "Intelligent Control of an Autonomous Mobile Robot using Type-2 Fuzzy Logic". In: IC-AI 2006, 565–570.
6. Astudillo L., Melin P., Castillo O., "A new optimization method based on a paradigm inspired by nature". In: Soft Computing for Recognition Based on Biometrics, Springer Berlin-Heidelberg, 2010, 277–283.
7. Astudillo L., Melin P., Castillo O., "Nature inspired chemical optimization to design a type-2 fuzzy controller for a mobile robot". In: IFSA World Congress and NAFIPS Annual Meeting (IFSA/NAFIPS), 2013, 1423–1428. DOI: 10.1109/IFSA-NAFIPS.2013.6608610.
8. Caraveo C., Valdez F., Castillo O., "Optimization of fuzzy controller design using a new bee colony algorithm with fuzzy dynamic parameter adaptation", Applied Soft Computing, vol. 43, 2016, 131–142. DOI: 10.1016/j.asoc.2016.02.033.
9. Cardenas S., Garibaldi J., Aguilar L., Castillo O., "Intelligent Planning and Control of Robots Using Genetic Algorithms and Fuzzy Logic". In: IC-AI, 2005, 412–418.
10. Castillo O., Martinez R., Melin P., Valdez F., Soria J., "Comparative study of bio-inspired algorithms applied to the optimization of type-1 and type-2 fuzzy controllers for an autonomous mobile robot", Information Sciences, vol. 192, 2012, 19–38. DOI: 10.1016/j.ins.2010.02.022.
11. Cervantes L., Castillo O., "Design of a Fuzzy System for the Longitudinal Control of an F-14 Airplane". In: Soft Computing for Intelligent Control and Mobile Robotics, 2011, 213–224. DOI: 10.1007/978-3-642-15534-5_13.
12. de la O D., Castillo O., Soria J., "Optimization of Reactive Control for Mobile Robots Based on the CRA Using Type-2 Fuzzy Logic". In: Nature-Inspired Design of Hybrid Intelligent Systems, chapter 33, 2017, 505–518. DOI: 10.1007/978-3-319-47054-2_33.
13. de la O D., Castillo O., Melendez A., Astudillo L., "Optimization of a reactive controller for mobile robots based on CRA". In: Fuzzy Information Processing Society (NAFIPS) held jointly with 2015 5th World Conference on Soft Computing (WConSC), 2015 Annual Conference of the North American IEEE, 1–6.
14. de la O D., Castillo O., Astudillo L., Soria J., "Fuzzy chemical reaction algorithm with dynamic adaptation of parameters". In: Fuzzy Logic in Intelligent System Design, Patricia Melin, Oscar Castillo, Janusz Kacprzyk, Marek Reformat, William Melek (eds.), Springer International Publishing, 2018, 122–130. DOI: 10.1007/978-2-319-67137-6_13.
15. De Santis E., Rizzi A., Sadeghiany A., Mascioli F.M.F., "Genetic optimization of a fuzzy control system for energy flow management in micro-grids". In: IFSA World Congress and NAFIPS Annual Meeting (IFSA/NAFIPS), 24–28 June 2013, 418–423. DOI: 10.1109/IFSA-NAFIPS.2013.6608437.
16. Dongyun Wang, Guan Wang, Rong Hu, "Parameters optimization of fuzzy controller based on PSO". In: ISKE 2008, 3rd International Conference on Intelligent System and Knowledge Engineering, 17–19 Nov. 2008, vol. 1, 599–603.


17. Esmin A. A. A., Aoki A. R., Lambert-Torres G., "Particle swarm optimization for fuzzy membership functions optimization". In: 2002 IEEE International Conference on Systems, Man and Cybernetics, 6–9 Oct. 2002, vol. 3, p. 6. DOI: 10.1109/ICSMC.2002.1176020.
18. Fierro R., Castillo O., "Design of Fuzzy Control Systems with Different PSO Variants". In: Recent Advances on Hybrid Intelligent Systems, chapter 6, 2013, 81–88. DOI: 10.1007/978-3-642-33021-6_6.
19. Fierro R., Castillo O., Valdez F., "Optimization of fuzzy control systems with different variants of particle swarm optimization". In: 2013 IEEE Workshop on Hybrid Intelligent Models and Applications (HIMA), 51–56. DOI: 10.1109/HIMA.2013.6615022.
20. Fierro R., Castillo O., Valdez F., Cervantes L., "Design of optimal membership functions for fuzzy controllers of the water tank and inverted pendulum with PSO variants". In: IFSA World Congress and NAFIPS Annual Meeting (IFSA/NAFIPS), 2013 Joint, IEEE, 1068–1073.
21. Gu Fang, Ngai Ming Kwok, Quang Ha, "Automatic fuzzy membership function tuning using the particle swarm optimization". In: 2008 IEEE Pacific-Asia Workshop on Computational Intelligence and Industrial Application, 2008, vol. 2, 324–328. DOI: 10.1109/PACIIA.2008.105.
22. Li H. X., Gatland H. B., "A New Methodology for Designing a Fuzzy Logic Controller", IEEE Trans. on Systems, Man, and Cybernetics, vol. 25, no. 3, March 1995, 505–512.
23. Kamejima T., Phimmasone V., Kondo Y., Miyatake M., "The optimization of control parameters of PSO based MPPT for photovoltaics". In: 2011 IEEE 9th International Conference on Power Electronics and Drive Systems (PEDS), 5–8 Dec. 2011, 881–883. DOI: 10.1109/PEDS.2011.6147358.
24. Lizarraga E., Castillo O., Soria J., Valdez F., "A Fuzzy Control Design for an Autonomous Mobile Robot Using Ant Colony Optimization". In: Recent Advances on Hybrid Approaches for Designing Intelligent Systems, chapter 20, 2014, 289–304. DOI: 10.1007/978-3-319-05170-3_20.
25. Martínez R., Castillo O., Soria J., "Particle Swarm Optimization Applied to the Design of Type-1 and Type-2 Fuzzy Controllers for an Autonomous Mobile Robot". In: Bio-inspired Hybrid Intelligent Systems for Image Analysis and Pattern Recognition, 2009, 247–262.
26. Martínez R., Castillo O., Aguilar L., "Optimization of interval type-2 fuzzy logic controllers for a perturbed autonomous wheeled mobile robot using genetic algorithms", Information Sciences, vol. 179, no. 13, 2009, 2158–2174. DOI: 10.1016/j.ins.2008.12.028.
27. Martinez-Soto R., Castillo O., Aguilar L., Baruch I., "Bio-inspired optimization of fuzzy logic controllers for autonomous mobile robots". In: 2012 Annual Meeting of the North American Fuzzy Information Processing Society (NAFIPS), 2012, 1–6. DOI: 10.1109/NAFIPS.2012.6291053.
28. Martínez-Soto R., Castillo O., Aguilar L., Melin P., "Fuzzy Logic Controllers Optimization Using Genetic Algorithms and Particle Swarm Optimization". In: Advances in Soft Computing, chapter 41, 2010, 475–486. DOI: 10.1007/978-3-642-16773-7_41.
29. Martinez-Marroquin R., Castillo O., Soria J., "Parameter tuning of membership functions of a fuzzy logic controller for an autonomous wheeled mobile robot using ant colony optimization". In: FUZZ-IEEE 2009, IEEE International Conference on Fuzzy Systems, 2007–2012.
30. Department of Control, Measurement and Instrumentation, Faculty of Electrical Engineering and Computer Science, Brno University of Technology, Czech Republic. Autonomous Mobile Robotics Toolbox for Matlab 5. Online: http://www.uamt.feec.vutbr.cz/robotics/simulations/amrt/simrobot en.html, 2001.
31. Melendez A., Castillo O., "Optimization of type-2 fuzzy reactive controllers for an autonomous mobile robot". In: 2012 Fourth World Congress on Nature and Biologically Inspired Computing (NaBIC), 2012, 207–211.
32. Melendez A., Castillo O., "Evolutionary optimization of the fuzzy integrator in a navigation system for a mobile robot". In: Castillo O., Melin P., Kacprzyk J. (eds.), Recent Advances on Hybrid Intelligent Systems, vol. 451 of Studies in Computational Intelligence, 2013, 21–31. DOI: 10.007/978-3-319-47054-2-43.
33. Melendez A., Castillo O., Soria J., "Reactive control of a mobile robot in a distributed environment using fuzzy logic". In: Fuzzy Information Processing Society, NAFIPS 2008, Annual Meeting of the North American, 2008, 1–5. DOI: 10.1109/NAFIPS.2008.4531341.
34. Melendez A., Castillo O., Garza A., Soria J., "Reactive and tracking control of a mobile robot in a distributed environment using fuzzy logic". In: IEEE International Conference on Fuzzy Systems, 2010, 1–5. DOI: 10.1109/FUZZY.2010.5583955.
35. Melendez A., Castillo O., "Hierarchical genetic optimization of the fuzzy integrator for navigation of a mobile robot". In: Soft Computing Applications in Optimization, Control, and Recognition, chapter 4, 2013, 77–96. DOI: 10.1007/978-3-642-35323-9_4.
36. Porta García M. A., Montiel O., Castillo O., Sepúlveda R., "Optimal Path Planning for Autonomous Mobile Robot Navigation Using Ant Colony Optimization and a Fuzzy Cost Function Evaluation". In: Analysis and Design of Intelligent Systems using Soft Computing Techniques, 2007, 790–798.
37. Milla F., Sáez D., Cortés C.E., Cipriano A., "Bus-Stop Control Strategies Based on Fuzzy Rules for the Operation of a Public Transport System", IEEE Transactions on Intelligent Transportation Systems, vol. 13, no. 3, 1394–1403. DOI: 10.1109/TITS.2012.2188394.
38. Montiel O., Camacho J., Sepúlveda R., Castillo O., "Fuzzy System to Control the Movement of a Wheeled Mobile Robot". In: Soft Computing for Intelligent Control and Mobile Robotics, 2011, 445–463.
39. Ochoa P., Castillo O., Soria J., "Differential evolution with dynamic adaptation of parameters for the optimization of fuzzy controllers". In: Recent Advances on Hybrid Approaches for Designing Intelligent Systems, chapter 19, 2014, 275–288. DOI: 10.1007/978-3-319-05170-3_19.
40. Porta M., Montiel O., Castillo O., Sepúlveda R., Melin P., "Path planning for autonomous mobile robot navigation with ant colony optimization and fuzzy cost function evaluation", Applied Soft Computing, vol. 9, no. 3, 2009, 1102–1110.
41. Vaneshani S., Jazayeri-Rad H., "Optimized fuzzy control by particle swarm optimization technique for control of CSTR", International Journal of Electrical and Computer Engineering, vol. 11, no. 5, 2011, 464–470. DOI: scholar.waset.org/1307-6892/391.
42. Aguas-Marmolejo S.J., Castillo O., "Optimization of Membership Functions for Type-1 and Type-2 Fuzzy Controllers of an Autonomous Mobile Robot Using PSO", Recent Advances on Hybrid Intelligent Systems, vol. 451, 2013, 97–104. DOI: 10.1007/978-3-642-33021-6_8.
43. Wong S., Hamouda A., "Optimization of fuzzy rules design using genetic algorithm", Advances in Engineering Software, vol. 31, issue 4, April 2000, 251–262, ISSN 0965-9978. DOI: 10.1016/S0965-9978(99)00054-X.
44. Yen J., Langari R., Fuzzy Logic: Intelligence, Control, and Information, Prentice Hall, 1999.
45. Ying Bai, Hanqi Zhuang, Zvi S. Roth, "Fuzzy Logic Control to Suppress Noises and Coupling Effects in a Laser Tracking System", IEEE Trans. on Control Systems Technology, vol. 13, no. 1, January 2005, 113–121.
46. Zafer B., Oğuzhan K., "A Fuzzy Logic Controller tuned with PSO for 2 DOF robot trajectory control", Expert Systems with Applications, vol. 38, issue 1, January 2011, 1017–1031, ISSN 0957-4174. DOI: 10.1016/j.eswa.2010.07.131.




Deep Reinforcement Learning Overview of the State of the Art

Submitted: 5th October 2018; accepted: 16th November 2018

Youssef Fenjiro, Houda Benbrahim

DOI: 10.14313/JAMRIS_3-2018/15

Abstract: Artificial intelligence has made big steps forward with reinforcement learning (RL) in the last century, and with the advent of deep learning (DL) in the 90s, especially the breakthrough of convolutional networks in the computer vision field. The adoption of DL neural networks in RL, in the first decade of the 21st century, led to an end-to-end framework allowing a great advance in human-level agents and autonomous systems, called deep reinforcement learning (DRL). In this paper, we will go through the development timeline of RL and DL technologies, describing the main improvements made in both fields. Then, we will dive into DRL and have an overview of the state of the art of this new and promising field, by browsing a set of algorithms (value optimization, policy optimization and actor-critic), then giving an outline of current challenges and real-world applications, along with the hardware and frameworks used. In the end, we will discuss some potential research directions in the field of deep RL, for which we have great expectations that will lead to a real human level of intelligence.

Keywords: reinforcement learning, deep learning, convolutional network, recurrent network, deep reinforcement learning

1. Introduction


Reinforcement learning [1], [2] is an AI sub-domain allowing an agent to fulfill a given goal while maximizing a numerical reward signal. It was developed within three main threads. The first is the concept of learning by trial and error, discovered during research undertaken in the psychology and neuroscience of animal learning. The second concept is the problem of optimal control, developed in the 1950s using a discrete stochastic version of the environment known as Markovian decision processes (MDP), adopting the concepts of a dynamical system's state and of an optimal return function (reward), and defining the "Bellman equation" to optimize the agent's behavior over time (dynamic programming). The last concept concerns the temporal-difference methods, which became the mainstream and were boosted by the actor-critic architecture. This topic is detailed in Section 2. Deep Learning (DL) [3] is a machine learning sub-domain, based on the concept of artificial neural networks that imitate the human brain when processing

data and creating patterns for use in decision-making. DL enables automatic feature engineering and end-to-end learning through gradient descent and backpropagation. There are many types of DL networks, whose usage depends on the application and the nature of the problem being treated. For time sequences, like speech recognition and natural language processing, we use recurrent neural networks. For extracting visual features, as in image classification and object detection, we use convolutional neural networks. For data pattern recognition, like classification and segmentation, we use feed-forward networks, and for some complex tasks, like video processing, object tracking and image captioning, we use a combination of those networks. This topic is detailed in Section 3.

The link between RL and DL technologies was made while AI researchers were seeking to implement a single agent that can think and act autonomously in the real world and get rid of any hand-engineered features. In fact, in 2015, DeepMind succeeded in combining RL, which is a decision-making framework, and DL [4], which is a representation learning framework allowing visual feature extraction, to create the first end-to-end artificial agent that achieves human-level performance in several and diverse domains. This new technology, named deep reinforcement learning, is now used not only to play ATARI games, but also to design the next generation of intelligent self-driving cars, like Google with Waymo, Uber, and Tesla.

In summary, this paper gives an outline of RL and DL technologies (in Sections 2 and 3, respectively), which are the basis of deep RL. Section 4 focuses on dissecting the different approaches and improvements that had a significant impact on building human-level autonomous agents, by giving (a) an overview of state-of-the-art deep RL algorithms and achievements in recent years, (b) an outline of the current challenges of DRL and its applications in industry, and (c) an introduction to the latest toolkits and framework libraries that can be used to develop deep RL approaches. Finally, we open a discussion related to deep RL and then raise different directions for future studies in the conclusion.

2. Preliminary: Reinforcement Learning

In this section, we begin with an introduction to the fundamental concepts of reinforcement learning [1], like the Markov decision process, the Bellman equation, and the exploration vs exploitation dilemma. Then we review the main algorithms and methods that represent the key breakthroughs of contemporary RL and that allowed autonomous human-level agents to reach the current state of the art in DRL.

2.1. Reinforcement Learning and Markov Decision Process

Reinforcement Learning is an AI domain inspired by behaviorist psychology; it is based on a mechanism that learns through trial and error by interacting with a stochastic environment. It is built on the concept of the Markov Decision Process (MDP) (see Fig. 1), a sequential decision-making framework defined by a 5-tuple: a set of states and actions (S, A), a reward model R, a state transition probability matrix P (from all states s to all their successors s′) and a discount factor γ ∈ [0, 1], which gives more importance to immediate rewards compared to future rewards. An environment is said to be an MDP when the state S contains all the information the agent needs to act optimally.

The state transitions of an MDP are memoryless, so we say that it satisfies the Markov property (1). RL agents behave under this assumption, so the effects of an action taken in a state depend only on that state and not on the prior history:

P(S_{t+1} = s′ | S_t = s_t, A_t = a_t, S_{t−1} = s_{t−1}, A_{t−1} = a_{t−1}, ...) = P(S_{t+1} = s′ | S_t = s_t, A_t = a_t)   (1)

Fig. 1. Markov decision process

If the MDP is episodic, the state is reset after each episode of length T. The reward R_t defines what the good and bad events are for the agent, and the cumulative reward G_t (2) is the discounted sum of rewards accumulated throughout an episode of T steps:

G_t = R_{t+1} + γ R_{t+2} + γ² R_{t+3} + ... = Σ_{k=0}^{T} γ^k R_{t+k+1} = R_{t+1} + γ G_{t+1}   (2)

π is the policy function that maps each possible state s of the agent to its selected action a, π: S → p(A = a | S). The agent tries to learn an optimal policy π* in order to take the best actions that maximize the cumulative reward G_t (the reinforcement feedback from the environment).

2.2. Reinforcement Learning and Bellman Equations

To find the optimal policy π* that achieves the maximum cumulative reward, RL algorithms involve estimating the following value functions:
• State-value function V(s) (3): estimates how good (in terms of future rewards) it is to be in a state s. V(s) under policy π is denoted V_π(s):

V_π(s) = E_π(G_t | S_t = s) = E_π( Σ_{k=0} γ^k R_{t+k+1} | S_t = s ),   for all s ∈ S   (3)

• Action-value function Q(s, a) (4): estimates how good (in terms of future rewards) it is to perform an action a in a state s. Q(s, a) under policy π is denoted Q_π(s, a):

Q_π(s, a) = E_π(G_t | S_t = s, A_t = a) = E_π( Σ_{k=0} γ^k R_{t+k+1} | S_t = s, A_t = a ),   for all s ∈ S and a ∈ A   (4)

From the two relations above, we infer the Bellman equations, which break RL problems into sub-problems by expressing an iterative relationship between the value of a state s_t and the values of its successor states s_{t+1}, with r_t the expected reward from s_t to s_{t+1} by following a_t; we have the equations (5):

V_π(s_t) = Σ_{a_t} π(a_t | s_t) Σ_{s_{t+1}, r_t} P(s_{t+1}, r_t | s_t, a_t) [r_t + γ · V_π(s_{t+1})]

Q_π(s_t, a_t) = Σ_{s_{t+1}, r_t} P(s_{t+1}, r_t | s_t, a_t) [r_t + γ · Σ_{a_{t+1}} π(a_{t+1} | s_{t+1}) Q_π(s_{t+1}, a_{t+1})]   (5)

We then infer the Bellman optimality equations (6) (7) for V and Q under the optimal policy, as below:

V*(s_t) = max_π [V_π(s_t)] = max_{a_t} Σ_{s_{t+1}, r_t} P(s_{t+1}, r_t | s_t, a_t) [r_t + γ · V*(s_{t+1})]   (6)

Q*(s_t, a_t) = max_π [Q_π(s_t, a_t)] = Σ_{s_{t+1}, r_t} P(s_{t+1}, r_t | s_t, a_t) [r_t + γ · max_{a_{t+1}} Q*(s_{t+1}, a_{t+1})]   (7)

The optimal value function = immediate reward r + discounted value of the successor state γV(S_{t+1}). The optimal policy can be found using two different modes:
• On-policy: the agent learns the policy, and actions are performed by the current policy instead of the greedy policy.
• Off-policy: the agent learns the optimal values (V, Q), and actions are performed by the greedy policy (the max operator in the Bellman equation → optimal policy).
In RL, there are two main types of algorithms:
• Model-based algorithms, which use a model to predict the unobserved portion of the environment, like Dynamic Programming; they suffer from Bellman's curse of dimensionality (they use full-width backups), since knowing all the elements of the MDP is a tough task, especially when we have an infinite, or nearly infinite, number of states.
• Model-free algorithms, which skip learning a model and directly learn what action to do and when, by estimating the value function of a certain policy without a concrete model. The most known methods are Monte-Carlo, Temporal-Difference learning and its variants Q-learning and SARSA. In this paper, we will focus on model-free RL.
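To make the Bellman optimality backup of Eq. (6) concrete, the following sketch runs value iteration (a dynamic programming, model-based method) on a tiny made-up MDP; the states, actions, rewards and transition probabilities are invented for illustration only.

    import numpy as np

    # Toy MDP: P[s][a] is a list of (probability, next_state, reward) tuples.
    P = {
        0: {0: [(1.0, 0, 0.0)], 1: [(0.8, 1, 1.0), (0.2, 0, 0.0)]},
        1: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 2.0)]},
    }
    gamma = 0.9

    V = np.zeros(len(P))
    for _ in range(100):  # sweep repeatedly until the values stop changing noticeably
        for s in P:
            # Bellman optimality backup: V(s) = max_a sum_{s',r} P(s',r|s,a)[r + gamma*V(s')]
            V[s] = max(sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a]) for a in P[s])

    # Greedy (optimal) policy extraction from the converged values
    policy = {s: max(P[s], key=lambda a: sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a]))
              for s in P}
    print(V, policy)

Model-free methods such as those discussed next estimate the same quantities from sampled experience instead of from the transition model P.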

2.3. The Exploration/Exploitation Dilemma

In real life, the best long-term strategy may involve short-term sacrifices to gather enough information so as to make the best overall decisions. For RL problems, to avoid being stuck in a local maximum, we have to balance between the two concurrent approaches, Exploration and Exploitation (see Fig. 2):
• Exploit (deterministic search): in this case, the search is deterministic and the RL agent chooses actions that it has already attempted in the past (from the history of trials) and which maximized the cumulative reward and proved to be the most efficient.
• Explore (non-deterministic search): to gather more information and discover such actions, the agent has to try actions (weighted by a probability of correctness) that it has not selected before, allowing the exploration of the other possibilities, in order to make better action selections in the future.

Fig. 2. Exploitation vs Exploration

The most known approaches to exploration use the following action-selection strategies [5]:
• Greedy Approach: the agent exploits its current knowledge to choose, at any time, the action which it expects to provide the greatest reward.
• ε-greedy Approach: forces the non-greedy actions to be tried (exploration), with no preference for nearly greedy or particularly uncertain ones (it chooses equally among all actions). ε is the probability of exploration, typically 5% or 10% (see Fig. 3).


Fig. 3. ε-greedy approach

• Softmax Approach: all the actions are ranked and weighted according to their value estimates, but the selection probability of greedy actions is the highest. A random action is selected by taking into account the weight of each action. In practice, we use an additional temperature parameter (τ) applied to the Softmax, to lower the low probabilities and raise the high probabilities.
• Bayesian Approach: we add a probability distribution to the neural network weights by repeatedly sampling from a network with dropout [6]; thus, the distribution variance provides an estimate of the uncertainty of each action.

2.4. Monte-Carlo Learning

The Monte Carlo (MC) method [1] relies on repeated random sampling to obtain numerical results. By the law of large numbers, the expected value of a random variable can be approximated by taking the sample mean of independent samples of the variable. MC methods are used in RL to solve episodic problems by averaging sample returns and learning directly from complete episodes of experience, without bootstrapping. MC methods are insensitive to the initial value since they learn only from complete sequences; the return is known only at the end of the episode and not before. MC is used for prediction, by learning the state-value function V_π(s) following a given policy π, and for control, by estimating the policy using Generalized Policy Iteration (GPI) and the action-value function Q. The MC control algorithm starts with an arbitrary policy π and iterates between two steps until converging toward the optimal policy π*:
• Policy evaluation: use the current policy π to estimate Q_π or V_π.
• Policy improvement: make a better policy π by deterministically choosing actions with maximal action-value: π(s) = argmax_a Q(s, a).

2.5. Temporal-Difference Learning

Temporal-Difference (TD) learning [1], [6] is a model-free method that acts by deriving its information from experience without having complete knowledge of the environment. TD combines Monte Carlo methods, by learning directly from raw experience, with dynamic programming methods, by updating value function estimates through bootstrapping from its current estimate. TD updates values using recent trends so as to capture the effect of a certain state. It learns online, after every step, from incomplete episodes of experience, by sampling the environment according to a given policy and approximating its current estimate based on previously learned estimates. The general rule (8) can be summarized as follows:

V_New ← V_Old + StepSize · [Target − V_Old]

V(S_t) ← V(S_t) + α [R_{t+1} + γ V(S_{t+1}) − V(S_t)]   (8)

where R_{t+1} + γ V(S_{t+1}) is the TD target and the whole bracketed term is the TD error.

In TD learning, instead of computing the update every step, we can postpone it after N steps. The N-step return, in this case, is calculated as follows:

G_t^{(n)} = R_{t+1} + γ R_{t+2} + ... + γ^{n−1} R_{t+n} + γ^n V(S_{t+n})   (9)

N-Step TD formula becomes:

V(S_t) ← V(S_t) + α [G_t^{(n)} − V(S_t)]   (10)



In TD learning, we also integrate a neurologic phenomenon called the eligibility trace (ET). This RL mechanism uses a short-term memory that stores the state-action history of the steps, called traces. Those traces mark a state as eligible for learning, reinforcing the events that contributed to getting to the reward. A trace decays gradually over time if the given state is not visited enough. So, ET extends what the agent learned at t+1 also to previous states, by tracking where it has been (prior states) and backing up the reward over a longer period, so as to reinforce the most visited states and accelerate learning. Eligibility traces [1] implement a memory trace that is usually an exponential function with a decay parameter. The three most known eligibility trace implementations are:
• Accumulating traces: accumulate each time the state is visited, then fade away gradually when the state is not visited.

e_t(s) = γλ e_{t−1}(s) if s ≠ s_t;   γλ e_{t−1}(s) + 1 if s = s_t   (11)

• Replacing traces: each time a state is visited, the trace is reset to 1, regardless of the present or prior trace.

e_t(s) = γλ e_{t−1}(s) if s ≠ s_t;   1 if s = s_t   (12)

• Dutch traces: intermediate between accumulating and replacing traces, depending on the step-size parameter α.

e_t(s) = γλ e_{t−1}(s) if s ≠ s_t;   (1 − α) γλ e_{t−1}(s) + 1 if s = s_t   (13)

e_t(s) is the eligibility trace function for state s at time t. On each step, it decays by γλ for all non-visited states.
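As a small illustrative sketch (not a specific published implementation), the backward view of TD(λ) with the accumulating or replacing traces above could look like this; env_step is a hypothetical callable returning (next_state, reward, done), and all names are illustrative.

    import numpy as np

    def td_lambda_episode(env_step, n_states, V, alpha=0.1, gamma=0.99, lam=0.9,
                          trace="accumulating"):
        # One episode of TD(lambda) over a discrete state space, updating V in place.
        e = np.zeros(n_states)                    # eligibility trace, one entry per state
        s = 0                                     # assume the episode starts in state 0
        done = False
        while not done:
            s_next, r, done = env_step(s)         # hypothetical environment call
            delta = r + gamma * V[s_next] - V[s]  # TD error
            e *= gamma * lam                      # decay all traces, Eqs. (11)-(12)
            if trace == "accumulating":
                e[s] += 1.0                       # accumulate on the visited state
            else:
                e[s] = 1.0                        # replacing trace
            V += alpha * delta * e                # credit recent states for the TD error
            s = s_next
        return V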

2.6. Q-learning

Q-learning [7] is an off-policy algorithm for TD-learning control (MDP environment) used in reinforcement learning. The learned action-value function Q directly approximates the optimal action-value function Q*, regardless of the policy being followed. One-step Q-learning is defined by (14):

Q(S_t, A_t) ← Q(S_t, A_t) + α {R_{t+1} + γ max_a [Q(S_{t+1}, a)] − Q(S_t, A_t)}   (14)
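A minimal tabular sketch of this update, with ε-greedy action selection (Section 2.3), could look like the following; env_step is a hypothetical callable mapping (state, action) to (next_state, reward, done), and all names are illustrative rather than a reference implementation.

    import numpy as np

    def epsilon_greedy(Q, state, epsilon, rng):
        # Explore with probability epsilon, otherwise exploit the greedy action
        if rng.random() < epsilon:
            return int(rng.integers(Q.shape[1]))
        return int(np.argmax(Q[state]))

    def q_learning(env_step, n_states, n_actions, episodes=500,
                   alpha=0.1, gamma=0.99, epsilon=0.1, seed=0):
        # Tabular one-step Q-learning, Eq. (14)
        rng = np.random.default_rng(seed)
        Q = np.zeros((n_states, n_actions))
        for _ in range(episodes):
            s, done = 0, False                     # assume every episode starts in state 0
            while not done:
                a = epsilon_greedy(Q, s, epsilon, rng)
                s_next, r, done = env_step(s, a)   # hypothetical environment call
                # Off-policy target uses the greedy value max_a' Q(s', a')
                Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
                s = s_next
        return Q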

Q-learning combined with an eligibility trace becomes Q(λ):

Q(S_t, A_t) ← Q(S_t, A_t) + α e_t(s, a) {R_{t+1} + γ max_a [Q(S_{t+1}, a)] − Q(S_t, A_t)}   (15)

There are three known implementations of the eligibility trace [1] for Q(λ): Watkins's Q(λ) (16) [8]; Peng's Q(λ), which does not distinguish between exploratory and greedy actions; and Naïve Q(λ), which is similar to Watkins's method except that the traces are not set to zero on exploratory actions.

e_t(s, a) =
  γλ e_{t−1}(s, a),       if s ≠ S_t, a ≠ a_t, for all s ∈ S
  γλ e_{t−1}(s, a) + 1,   if (s, a) = (S_t, a_t) and Q_{t−1}(s_t, a_t) = max_a [Q_{t−1}(s_t, a)]
  0,                      if Q_{t−1}(s_t, a_t) ≠ max_a [Q_{t−1}(s_t, a)]
(16)

2.7. SARSA

SARSA (State–Action–Reward–State–Action) [9] is an on-policy algorithm for TD-learning control (MDP environment) used in reinforcement learning; it learns an action-value function of [state, action] pairs that depends on the quintuple (s_t, a_t, r_t, s_{t+1}, a_{t+1}). The difference with Q-learning is that with SARSA the maximum reward for the next state is not necessarily used for updating the Q-values; instead, a new action (and reward) is selected using the same policy that determined the original action:

Q(S_t, A_t) ← Q(S_t, A_t) + α [R_{t+1} + γ Q(S_{t+1}, A_{t+1}) − Q(S_t, A_t)]   (17)

SARSA combined with an eligibility trace becomes SARSA(λ):

Q(S_t, A_t) ← Q(S_t, A_t) + α e_t(s, a) [R_{t+1} + γ Q(S_{t+1}, A_{t+1}) − Q(S_t, A_t)]   (18)

In SARSA, the policy π is updated at each visit by choosing the action with the highest state-action value, argmax_a Q(s_t, a), making the policy greedy.

2.8. Actor-Critic

Actor-Critic (AC) algorithms were inspired by neuroscience and animal learning [10]. They are hybrid control methods that combine the policy gradient method and the value function method. The Actor-Critic algorithm (see Fig. 5) introduces a Critic that judges the actions of the Actor. The Actor is the source of high variance, and the Critic provides low-variance feedback on the quality of the performance, which balances the two. Adding the Critic component reduces variance and increases the likelihood of convergence of the policy gradient methods.

AC methods are considered part of TD methods, since the Critic is an implementation of the TD(0) algorithm and is updated following the same rule:
• Actor: policy function, produces the action for a given input "current state" s of the environment.
• Critic: value function, criticizes the actions made by the Actor; its input information is obtained through sensors (state estimation), and it receives the rewards.

Fig. 5. Actor-Critic Algorithm steps
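To make the Actor/Critic roles concrete, here is a generic one-step tabular sketch (an assumption-laden illustration, not the specific architecture of Fig. 5): the actor keeps action preferences H[s, a] turned into a softmax policy, and the critic keeps state values V[s] updated by TD(0); the caller samples actions from softmax(H[s]).

    import numpy as np

    def softmax(x):
        z = x - np.max(x)
        p = np.exp(z)
        return p / p.sum()

    def actor_critic_step(H, V, s, a, r, s_next, done,
                          alpha_actor=0.1, alpha_critic=0.1, gamma=0.99):
        # One-step tabular actor-critic update; H and V are float numpy arrays.
        target = r if done else r + gamma * V[s_next]
        delta = target - V[s]                 # TD(0) error computed by the critic
        V[s] += alpha_critic * delta          # critic update
        pi = softmax(H[s])                    # current policy in state s
        grad = -pi
        grad[a] += 1.0                        # gradient of log pi(a|s) w.r.t. H[s, :]
        H[s] += alpha_actor * delta * grad    # actor update, scaled by the critic's feedback
        return delta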



2.9. RL limitations and Function approximator

In RL, the value functions V(s) or Q(s,a) are represented by a state transition table (lookup table), where every state s has an entry V(s), or every state-action pair (s,a) has an entry Q(s,a). When the Markov decision process is large, there are too many states and actions to store in memory, and it is too slow to learn the value of each state individually. For instance, in computer Go, we use 10^6 parameters to learn about 10^170 positions [11], [12]. Instead of having a lookup table with explicit values for the whole (state, action) space, the idea is to use a function approximator, like a neural network, that replaces this lookup table. Therefore, we estimate the value functions with function approximation, V̂(s, w) ≈ V_π(s) or Q̂(s, a, w) ≈ Q_π(s, a), where w indicates the neural network weights. The gain is that this allows generalizing from seen states to unseen states and reusing the reinforcement learning framework (MC, TD learning, …) to update the weights w [13]. With the breakthrough made by deep learning in computer vision, we won't only be using a feed-forward neural network to approximate the value functions used in RL, but also a convolutional neural network, which allows us to get rid of hand-engineered visual features and directly capture the environment's visual state. Optionally, we can also use a recurrent neural network to keep in memory the relevant events during the agent's life cycle, which can help to get an optimal experience.

3. Preliminary: Deep Learning

Deep learning is a branch of machine learning based on deep (> 2 hidden layers) and wide (many input/hidden neurons) neural networks, that model high-level abstractions in data, based on an architecture composed of multiple non-linear layers of neurons. Each neuron of the hidden layers performs a linear combination of its inputs and applies a non-linear

Fig. 6. Neural network learns to separate classes with complex curves, thanks to the hierarchy of layers


function (Relu, Softmax, Sigmoid, tanh, …) to the result, which allows neurons from the next layer to separate classes with a curve (hypercurve/hyperplane) and no more with a simple line (see Fig. 6), thus, hidden layers learn hierarchical features. The deeper the layers, the more complex the learned features are [14].

3.1. Backpropagation and Gradient Descent

Unlike machine learning, where features are crafted by hand, with deep learning features are automatically learned to be optimal for the task. To achieve the process of learning, DL uses a cost/loss function, like the mean square error (MSE) or cross-entropy (CE) (19):

• MSE Loss: L_MSE = ½ Σ [target(y) − prediction(ŷ)]²
• CE Loss: L_CE = − Σ [target(y) · log(prediction(ŷ))]   (19)

We use these losses to measure how well the neural network performs in mapping training examples to the correct output (in the classification case), and then tweak its parameters (weights and biases) using backpropagation based on gradient descent (GD) optimization methods [15] that find the minimum error:
• Batch gradient descent: calculate the gradient for the entire training dataset to update the parameters.
• Stochastic gradient descent (SGD): calculate the gradients for each training sample x_i of the dataset.
• Mini-batch gradient descent: a tradeoff between the two methods; mini-batch sizes typically range in [50, 256] (can vary).
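The following toy sketch illustrates mini-batch gradient descent on an MSE loss for a single linear neuron; the data and hyper-parameters are made up for illustration only.

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy regression data: learn y = 2x + 1 with a single linear neuron
    X = rng.uniform(-1, 1, size=(256, 1))
    y = 2.0 * X[:, 0] + 1.0

    w, b, lr, batch_size = 0.0, 0.0, 0.1, 32

    for epoch in range(200):
        idx = rng.permutation(len(X))
        for start in range(0, len(X), batch_size):      # mini-batch gradient descent
            batch = idx[start:start + batch_size]
            xb, yb = X[batch, 0], y[batch]
            pred = w * xb + b
            err = pred - yb                              # gradient of the 1/2 MSE loss w.r.t. pred
            w -= lr * np.mean(err * xb)                  # gradient step on the weight
            b -= lr * np.mean(err)                       # gradient step on the bias

    print(w, b)   # should approach 2.0 and 1.0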

3.2. Learning Rate

Learning rate is a hyper-parameter used by GD methods to control the adjustment rate of the network’s weights with respect to the loss gradient. The learning speed is slow when the rate is low, but can diverge when the rate is too high, the most popular learning rates are: • Momentum [16]: accelerates SGD convergence in the relevant direction while reducing oscillations, by adding a parameter γ (usually 0,9) of the updated vector of the previous step to the current update vector. Vt = γ.Vt-1 + η.∇θ L(θ, xi) and θ = θ − Vt (20)

where θ is the vector that represents the network's parameters and L is the loss function.
• Adagrad [17]: adapts the learning rate to the parameters, by making larger updates for infrequent parameters (small historical gradients) and smaller updates for frequent ones (bigger historical gradients).

θ_{t+1,i} = θ_{t,i} − η / √(G_{t,ii} + ε) · g_{t,i},   where g_{t,i} = ∇_θ L(θ_i) and G_{t,ii} = Σ_τ g_{τ,i}²   (21)



• RMSprop [18] (Root Mean Squared): an adaptive learning rate method, an extension of Adagrad proposed by Geoffrey Hinton (γ = 0.9, η = 0.01):

θ_{t+1} = θ_t − η / √(E[g²]_t + ε) · g_t,   E[g²]_t = γ E[g²]_{t−1} + (1 − γ) g_t²   (22)

• Adadelta [19]: an improvement of Adagrad that prevents the learning rate from converging to zero with time. It restricts the accumulated past gradients to a recent time window of fixed size:

θ_{t+1} = θ_t − √(E[Δθ²]_{t−1} + ε) / √(E[g²]_t + ε) · g_t,
E[g²]_t = γ E[g²]_{t−1} + (1 − γ) g_t²,   E[Δθ²]_t = γ E[Δθ²]_{t−1} + (1 − γ) Δθ_t²   (23)

• Adam [20] (Adaptive Moment Estimation): computes adaptive learning rates for each parameter. It keeps an exponentially decaying average of past gradients m_t = β₁ m_{t−1} + (1 − β₁) g_t, similar to momentum, and, like Adadelta, it also stores an exponentially decaying average of past squared gradients v_t:

m_t = β₁ m_{t−1} + (1 − β₁) g_t,   v_t = β₂ v_{t−1} + (1 − β₂) g_t²,
m̂_t = m_t / (1 − β₁ᵗ),   v̂_t = v_t / (1 − β₂ᵗ),   θ_{t+1} = θ_t − η · m̂_t / (√v̂_t + ε)   (24)
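As a small numerical sketch of the update just described (illustrative only, with the usual default hyper-parameters), a single Adam step on a parameter vector could be written as:

    import numpy as np

    def adam_update(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
        # One Adam step on parameter vector theta given its gradient grad
        m = beta1 * m + (1 - beta1) * grad          # first-moment (mean) estimate
        v = beta2 * v + (1 - beta2) * grad ** 2     # second-moment estimate
        m_hat = m / (1 - beta1 ** t)                # bias correction
        v_hat = v / (1 - beta2 ** t)
        theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
        return theta, m, v

    # Usage: keep m and v as running state and increment t on each call
    theta = np.zeros(3)
    m = np.zeros(3)
    v = np.zeros(3)
    theta, m, v = adam_update(theta, np.array([0.1, -0.2, 0.3]), m, v, t=1)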

3.3. Hyper-Parameters Optimization

For DL, Hyper-parameters include the number of layers, the size of each layer, nonlinearity functions, weights initialization, decay term, learning rate, loss function, and input batch size. Optimization is done by measuring performance on independent data set and choose the optimal ones that maximize this measure, Most known Optimization algorithms are Grid search, Random search [21], Bayesian optimization [22], Gradient-based optimization [23], Genetic algorithms.

3.4. Neural Networks and Overfitting

Neural networks fight overfitting by applying commonly used approaches like validation data, data augmentation or early stopping (during the training phase); in addition, they use two main methods that became widespread:
• Dropout: applied at every forward pass during the training process, by randomly dropping nodes and their connections from hidden or input layers (with the same probability), which prevents the network from becoming sensitive to the weights of nodes and makes it more robust.
• Regularization using batch normalization, which normalizes the inputs of each layer before applying the activation function, in order to have a mean output activation of zero and a standard deviation of one.
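The sketch below shows one common way of implementing the first of these two methods, inverted dropout (a variant that rescales the surviving activations at training time so nothing changes at test time); it is an illustration under that assumption, not the only possible implementation.

    import numpy as np

    def dropout_forward(activations, drop_prob, rng, training=True):
        # Randomly zero units during training and rescale the survivors (inverted dropout)
        if not training or drop_prob == 0.0:
            return activations
        keep_prob = 1.0 - drop_prob
        mask = (rng.random(activations.shape) < keep_prob) / keep_prob
        return activations * mask

    rng = np.random.default_rng(0)
    h = rng.normal(size=(4, 8))                       # activations of a hidden layer
    h_train = dropout_forward(h, drop_prob=0.5, rng=rng)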

3.5. Neural Networks Main Types

In this section, we will give a brief outline of three types of DL used in DRL: feed-forward neural network, convolutional neural network, and recurrent neural network, to introduce thereafter, in the next section, the deep reinforcement learning concepts.

3.5.1. Feed-forward Neural Network A feedforward neural network [24] (Multilayer Perceptron) is a non-linear artificial neural network where the information moves in only one direction, it solves classification problems and is composed of three main parts: the input layer, N hidden layers, and an output layer. Each layer can contain a given number of neurons. Neurons of hidden & output layers use non-linear activation function, to distinguish data that is not linearly separable, for this purpose, we use mainly Relu or sigmoid functions. The learning is carried out through minimization of the loss function, using cross-entropy or mean square error functions. An appropriate decaying learning rate is used to avoid local minima issues and backpropagation of the error to change connection weights using gradient descent algorithm (most of time Stochastic gradient descent or Mini-batch gradient descent) in a way to get the best fit values of those weights that will lead to an optimal error.

3.5.2. Convolutional Neural Network

Convolutional neural network (CNN, or ConvNet) [25], [26] is a class of deep feed-forward artificial neural networks that has successfully been applied to analyzing visual imagery. CNN is used in supervised learning for classification and object recognition/detection purposes, in unsupervised learning for image compression and image segmentation, and finally as a visual feature extractor in deep reinforcement learning. CNN is composed of four basic components:
• Convolutional layers: the layer is no longer fully connected like in feed-forward nets; instead, it learns 2D square-shaped matrices of neurons called filters (or kernels, e.g. 9 neurons for a kernel of 3×3 pixels) that scan the whole image searching for a pattern (localized feature), by applying effects such as blurring, sharpening, outlining, embossing, etc., to extract visual features. Each neuron of a kernel in the hidden layer is connected to a small region (e.g. 3×3 pixels) of the input image (e.g. 200×200 pixels) given by the previous layer, called the local receptive field. Each kernel leverages these ideas:
– 2D Convolution: convolution is an image processing operation that is a weighted multiplication between the image matrix


representation i with the kernel’s matrix (filter k of size Nk*Nk) to extract visual features from the image, by generating an output image iconv called feature map:

i_conv(x, y) = Σ_{x_k=0}^{N_k−1} Σ_{y_k=0}^{N_k−1} i(x − x_k, y − y_k) · k(x_k, y_k)   (25)

Two additional parameters for 2D convolution: o Zero Padding: Put zeros on the image border to allow the convolutional output size to be the same as the input size image. o Strides: How many pixels the kernel window will slide at each step while scanning the image.
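A minimal sketch of such a scan, with zero padding and stride, is given below; it uses the cross-correlation convention common in CNN libraries (no kernel flipping), and the edge-detection kernel is just an illustrative example.

    import numpy as np

    def conv2d(image, kernel, stride=1, padding=0):
        # 2D convolution of a single-channel image with one kernel
        if padding:
            image = np.pad(image, padding)            # zero padding on all borders
        kh, kw = kernel.shape
        out_h = (image.shape[0] - kh) // stride + 1
        out_w = (image.shape[1] - kw) // stride + 1
        out = np.zeros((out_h, out_w))
        for i in range(out_h):
            for j in range(out_w):
                patch = image[i * stride:i * stride + kh, j * stride:j * stride + kw]
                out[i, j] = np.sum(patch * kernel)    # weighted multiplication -> feature map
        return out

    edge_kernel = np.array([[-1, -1, -1], [-1, 8, -1], [-1, -1, -1]], dtype=float)
    feature_map = conv2d(np.ones((8, 8)), edge_kernel, stride=1, padding=1)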

N° 3

2018

amount of translation invariance. For instance, each unit of pooling layer summarizes a region of N×N neurons in the previous layer (ex. 3×3). There are several implementations of pooling like Average pooling which calculates the average value of the N×N matrix and Max-pooling (26) which is the most common and takes the max value of the N×N matrix and Mixed Pooling.

i_maxpool(x, y) = max_{(x, y) ∈ N×N region} i(x, y)   (26)

No learning takes place on the pooling layers. With back-propagation, only convolutional layers are concerned, and we do not use pooling when we want to "learn" some object-specific positions, as in reinforcement learning.
• Fully connected layer: a normal feed-forward network layer that makes the connection between each input dimension (pixel location) and each output class, mixes the signals received from the feature learning layers and decides on the classification based on the whole image.
• Normalization layer: applies batch normalization [27] on the input and hidden layers to rescale the input data, x̂ = (x − E(x)) / √Var(x), which helps to avoid the vanishing/exploding gradient problem and to have a deeper network.

Fig. 7. Feature maps are the result of the convolution of two matrices (image and kernel)


– Parameter sharing: neurons of the same kernel share the same weights – Local connectivity: each input neuron in kernels only receives input from a small local group of the pixels in the input image called local receptive field by Cutting connections to the other pixels. input neurons represent overlapping receptive fields that form a complete map of visual space. – Feature extraction: each kernel can detect just a single kind of localized feature. So, if we want to look for 10 different patterns we must have 10 kernels in the convolutional layer, each one looking for a particular pattern on the image. – Hierarchical features learning: inspired by the organization of the animal visual cortex, multi-convolutional layers network allow to learn hierarchical visual features, the deeper is the layer, the more complex is the detected feature. • Pooling layers: are used immediately after convolutional layers to shrink the output image using non-linear down-sampling and add to some Articles

3.5.3. CNN Improvements and CapsNet The first functional Conv net was Lenet [28] implemented by Yann Lecun in 2006. Then came the AlexNet [29] in 2012 a deeper and much wider version (5 Convolutional + 3 Maxpooling + 3 Fully­‐connected) of the LeNet, which integrate RELU (Rectified Linear Unit) activation function and Reduce the over-fitting, by using a Dropout layer after every FC layer. In 2014 VGGNET [30] adapts more layers (16 Convolutional + 5 Maxpooling + 3 Fully­‐connected) and lower dimension for convolution filters are 3×3 (instead of Instead of the 9×9 or 11×11 filters for AlexNet). In the same year, Google came out with GoogLeNet (22 Convolutional layers) [31], which reuses 1x1 convolutions, introduced by NiN [32] to perform dimensionality reduction, and bring in the new concept of inception module (see Fig. 8), which allow CNN network to use many kernels dimensions (5×5, 3×3, 1×1)

Fig. 8. Inception module


Journal of Automation, Mobile Robotics & Intelligent Systems

VOLUME 12,

N° 3

2018

and pooling methods in the same layer and choosing itself the best filter through backpropagation process. In 2015, ResNets [33] launched by Microsoft allows deeper neural networks (152 convolutional layers) by adding Identity connections to the traditional neural network, every two convolutional layers. The layers can start as the identity function and gradually transform to be more complex and more efficient. Even if CNN has made a great breakthrough in the computer vision domain, some drawbacks remain: • Orientation and relative spatial relationships between hierarchical features are not important, for example, the two representations below are both faces. • Pooling layers don’t have learnable parameters that learn how to pool in a dynamic way, that predict which low-level features (ex. nose, eyes, mouth) would be routed to the higher level features (ex. face). • Training needs a large amount of data to reach an acceptable accuracy. Capsule network [35] came as a solution to solve these problems with an architecture composed from an Encoder (1 Conv + 2 Capsule layers) and decoder (3 FC), and the use of two principles : • Activity vector & Equivariance: neuron are replaced by capsules (a group of neurons) and activity vector for object detection with additional equivariant features (orientation, lighting, …). Changes in object position lead to changes in the orientation, without any change in vector length and probability. • Dynamic routing: It replaces max pooling by adding an intermediate level a weight matrix Wij, that learns how to pool using dynamic routing of the capsule of layer N to the appropriate parent in the layer N+1, and encodes the relationship between the input features ui to get their predicted position relative to each other. The higher level capsules combine objects parts and encode their relative positions, so an object can be accurately detected not only from the presence of the parts but also their right locations.

ous inputs, like for video tracking, Image captioning, Speech-to-text, Translation, Stock forecasting, etc. RNN neuron uses its internal memory to maintain information about the previous inputs and update the hidden states accordingly, which allows them to make predictions for every element of a sequence. RNN maintain a state vector st = g (xt ,xt-1, xt-2, …, x2; x1) that contains data features with the history of all previous input sequences. RNN can be converted into a feedforward network by unfolding it over the time to many layers that share the same weights W:

Fig. 9. The two figures are the same for CNN

Fig. 11. RNN Cell vs LSTM Cell

3.5.4. Recurrent Neural Network RNN is a deep network that extracts temporal features while processing sequences of inputs like text, audio or video. It’s used when we need history/context to be able to provide the output based on previ-

3.5.5. Transfer Learning for Deep Learning Transfer learning [38], [39] (TL) is the ability of a system to apply knowledge and skills learned in previous tasks to novel tasks in new domains. In DL, we reuse pre-trained models as a starting initialization for

s_t = f(U x_t + W s_{t−1})  and  o_t = g(V s_t)   (27)

xt: input at time t st: hidden state at time t (memory of the network) f: is an activation function (e.g, tanh() and ReLUs) U, V, W: network parameters (same across time) g: activation function for the output layer (softmax)

Fig. 10. RNN Cell unfolded

RNNs can learn to use the past information when the context is small, as that gap grows, RNNs become unable to learn to connect the information, due to Vanishing and Exploding gradient problem. LSTM (LONG SHORT-TERM MEMORY) [36], [37] is a variant of RNN, that came with a solution, by replacing simple RNN node by a complex cell composed of 4 layers, which allow to remove or add information to the cell state, judiciously regulated by three gates that conditionally decides what information to keep, what information to update, and what information to throw away: • Input Gate: selectively update cell state values by adding information about the new input. • Forget gate: forget irrelevant parts of previous states, depending on the relevance of the stored information. • Output Gate: select information from the current cell state and show it out.




new models to speed up training phase, which become a tuning phase where the new model is refined on the input-output pair data available for the new task. The TL process tends to work if the features are general and both original and target tasks are not too far. For CNN, in case training data of the new model is similar to pre-trained model data, so all CNN Layers are fixed and only the FC layers are trained, otherwise, only lower CNN Layers that contain basic features are fixed and higher CNN layers and FC layers are trained on the new model dataset.

3.6. Deep Learning Challenges

Even if Deep Learning has made great steps, it always requires large dataset, hence long training period and big clusters of CPUs and GPUs (graphics processing units). Moreover, the learned features are often difficult to understand. In addition, we have to pay attention to overfitting, notably, when the number of parameters greatly exceeds the number of independent observations. Finally, DL is sensitive to what we call adversarial attacks, when small variations in the input data, leads to radically different outcomes, causing a lack of robustness and making them unstable.

4. Deep Reinforcement Learning: Literature Review


As seen in Section 2, traditional reinforcement learning uses a lookup table to store states and actions, which is too slow, since it learns the value of each state individually, and memory consuming, especially when we deal with large or infinite problems; this is due to what Richard Bellman called the curse of dimensionality. The solution is to estimate the value function using differentiable function approximators trained with a reinforcement learning algorithm. By leveraging deep learning algorithms, especially convolutional neural networks, it became possible for an RL agent not only to act, but to be totally autonomous and learn to see and act; a new technology was born, called Deep Reinforcement Learning (DRL) (see the DL, RL and DRL timeline in Fig. 12). We have three main types of DRL algorithms:
• Value optimization: the algorithm optimizes the value function V or Q, or the advantage function A.
• Policy optimization: the algorithm directly optimizes the policy function π(θ), where θ represents the neural network.
• Actor-critic: incorporates the advantages of each of the above, by learning a value function with an implicit policy:
– a policy gradient component, the "Actor", which calculates policy gradients;
– a value function component, the "Critic", that observes the performance of the actor and decides when the policy needs to be updated and which action should be preferred.
In the following section, we will give an outline of the value optimization and actor-critic algorithms (see Fig. 13) and try to understand their mechanisms and functioning.

Fig. 12. DL, RL and DRL Timeline

Fig. 13. Deep Reinforcement Learning algorithms

4.1. Value Optimization Algorithms

4.1.1. Deep Q-Learning (DQN in Detail)

Deep Q-Learning [4] is the first application of Q-learning to deep learning, performed by Google DeepMind in 2015; it succeeded in playing Atari 2600 games at an expert human level. DQN is a concentrate of technologies that uses many tips and tricks:
• Tricky architecture of the network: in the standard Q-learning algorithm, the input is composed of the state s and the action a, which would require a separate forward pass to compute the Q-value Q(a_i) of each action a_i. Instead, we use the state s as the only input, with as many outputs as there are possible actions a_i. Therefore, the network generates a Q-value for each available action immediately, with a single forward pass.
• The neural network as a function approximator: three convolutional layers to detect visual features and learn a hierarchical representation of the state space, plus two fully connected layers to estimate Q-values from images; pooling layers are not used in DQN because we want the CNN to be sensitive to the location of objects in the image.
• 3D convolution: process a 2D convolution of the four frames of the input, then average them all.
• Frame skipping [40] (see Fig. 14): as an initial input we have a video stream of 30 screenshots/s of 210×160×3 pixels with 128 colors, which we crop, shrink and turn into greyscale to obtain 84×84 images. Processing all 30 images/s of the video stream is not really relevant and needs more computation and time, so the trick is to take only 2 consecutive frames out of each N frames and skip the others, and for these 2 frames we apply the component-wise maximum function to get 1 frame: Fr_CW(i, j) = max(Fr1(i, j), Fr2(i, j)).

Fig. 14. Image processing of the input video, before feeding the DQN

• Phi length parameter: to help the network detect motion and catch speed information, we stack a number of frames, the "phi length", of a history to produce the input of the DQN network; most of the time phi length = 4 or 5.
• Target network (see Fig. 15): at every training step, the DQN's values shift, because backpropagation changes the network's weights; but constantly shifting the set of target values used to adjust the network destabilizes it, and training falls into feedback loops between the target and estimated Q-values. The idea is to use a separate network to estimate the target Q-values that are used to compute the loss for every action:

Q(St, At) ← Q(St, At) + α [Rt+1 + γ max_a Q(St+1, a) − Q(St, At)]

LMSE = ½ [Rt+1 + γ max_a Q(St+1, a) − Q(St, At)]²   (28)

This target network has the same architecture as the function approximator but with fixed weights; every T steps (e.g. 1000), the weights of the Q-network are copied to the target network, which provides more stability to the DQN (a small replay/target-network sketch is given after Fig. 15). An improvement of this update uses "soft" target updates [41]: rather than directly copying the weights, the target network weights slowly track the learned network, θ′ ← τθ + (1 − τ)θ′, with τ << 1.
• Action repetition: defines the granularity at which agents can control gameplay, by repeatedly executing a chosen action A for a fixed number of time steps k (instead of choosing an action at every frame); the last action is repeated on the skipped frames. Computing the action only once every k time steps lets the agent operate at higher speed, thus achieving real-time performance. Two modes can be used: the static frame skip rate, where the action output from the network is repeated for a fixed number of frames regardless of the current state, and the dynamic frame skip, an improvement [42] of the first mode which makes the frame skip rate a dynamic, learnable parameter: the number of times an action is repeated is chosen based on the current state.


• Clipping rewards to [−1, 1]: due to the high variance of scores from game to game in ATARI, all positive rewards are fixed to 1 and all negative rewards to −1, leaving 0 rewards unchanged. This technique limits the scale of the error derivatives and makes it easier to use the same learning rate across multiple games, but the major drawback is that the agent does not differentiate between rewards of different magnitudes.
• Experience replay: DQN suffers from two main problems. The first is that in online learning the data are not i.i.d.: sampled frames arrive in order, so they are highly correlated, which leads the network to overfit and fail to generalize properly. The second concerns catastrophic interference [43], where a neural network abruptly forgets what was previously learned when learning new things. To address these issues, instead of learning online by updating the network from the last transition, we store the agent's experience (st, at, rt, st+1) in a replay memory D, and then train the network on random mini-batches of transitions (s, a, r, s′) sampled from the replay memory D. Experience replay breaks the similarity of subsequent training samples that might drive the network into a local minimum, and solves the challenges of data correlation and non-stationary data distributions.
• No-ops vs. human starts: two modes are possible to initialize and populate the experience replay memory. First, the no-ops mode, where actions are chosen randomly at the beginning, until the replay memory is full enough to sample from it; second, the human-start mode, where actions are provided by a human user (an expert) at the beginning, until the replay memory is full enough to sample from it. This last mode gives the network a more efficient initialization that helps to accelerate learning.
• Action selection: for the exploration vs. exploitation dilemma, DQN uses the ε-greedy approach, which forces non-greedy actions to be tried (exploration) with no preference for nearly greedy or particularly uncertain ones (it chooses equally among all actions). ε is the probability of exploration (typically 5 or 10%). Most of the time ε decays through time, for example ε = εmin + (εmax − εmin)·e^(−λt), where λ controls the speed of decay. A minimal sketch of this preprocessing and exploration machinery is given below.
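As an illustration of the preprocessing and exploration tricks above (component-wise frame maximum, greyscale conversion, frame stacking and ε decay), here is a minimal NumPy sketch; the function and parameter names (preprocess, phi_length, eps_min, …) are illustrative and not taken from the DQN paper.

    import numpy as np

    def preprocess(frame_a, frame_b, out_size=84):
        """Component-wise max of two consecutive RGB frames, greyscale, crude resize."""
        fused = np.maximum(frame_a, frame_b)                     # FrCW(i,j) = max(Fr1, Fr2)
        grey = fused @ np.array([0.299, 0.587, 0.114])           # luminance greyscale
        rows = np.linspace(0, grey.shape[0] - 1, out_size).astype(int)
        cols = np.linspace(0, grey.shape[1] - 1, out_size).astype(int)
        return grey[np.ix_(rows, cols)]                          # naive 84x84 down-sampling

    def stack_phi(history, phi_length=4):
        """Stack the last `phi_length` processed frames to expose motion/speed."""
        return np.stack(history[-phi_length:], axis=0)           # shape: (phi_length, 84, 84)

    def epsilon(t, eps_min=0.05, eps_max=1.0, lam=1e-4):
        """Decaying exploration rate: eps_min + (eps_max - eps_min) * exp(-lam * t)."""
        return eps_min + (eps_max - eps_min) * np.exp(-lam * t)

    def clip_reward(r):
        """Clip rewards to [-1, 1] as in DQN."""
        return float(np.clip(r, -1.0, 1.0))

    # tiny usage example with random frames
    frames = [np.random.randint(0, 128, (210, 160, 3)) for _ in range(8)]
    processed = [preprocess(frames[i], frames[i + 1]) for i in range(0, 8, 2)]
    state = stack_phi(processed)                                 # (4, 84, 84) network input
    print(state.shape, epsilon(10000), clip_reward(3.7))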

Fig. 15. DQN global architecture
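To make the interplay between the replay memory, the online Q-network and the target network concrete, the following is a minimal sketch using a linear Q-function on toy data; it is an illustrative reconstruction, not DeepMind's implementation, and the names (ReplayBuffer, W_target, …) are ours.

    import numpy as np
    from collections import deque
    import random

    class ReplayBuffer:
        """Fixed-size memory D of transitions (s, a, r, s', done)."""
        def __init__(self, capacity=10000):
            self.memory = deque(maxlen=capacity)
        def push(self, s, a, r, s2, done):
            self.memory.append((s, a, r, s2, done))
        def sample(self, batch_size):
            batch = random.sample(list(self.memory), batch_size)
            s, a, r, s2, d = map(np.array, zip(*batch))
            return s, a, r, s2, d

    n_state, n_action, gamma, alpha = 8, 4, 0.99, 0.01
    W = np.random.randn(n_state, n_action) * 0.01       # online Q-network (linear)
    W_target = W.copy()                                  # frozen target network

    def q_values(weights, s):
        return s @ weights                               # one forward pass -> Q for every action

    buffer = ReplayBuffer()
    for _ in range(500):                                 # fill D with random toy transitions
        s, s2 = np.random.randn(n_state), np.random.randn(n_state)
        buffer.push(s, np.random.randint(n_action), np.random.randn(), s2, False)

    for step in range(1000):
        s, a, r, s2, done = buffer.sample(32)
        # TD target uses the *target* network: y = r + gamma * max_a Q_target(s', a)
        y = r + gamma * (1 - done) * q_values(W_target, s2).max(axis=1)
        q_sa = q_values(W, s)[np.arange(32), a]
        td_error = y - q_sa                              # drives the MSE loss of eq. (28)
        for i in range(32):
            W[:, a[i]] += alpha * td_error[i] * s[i]     # SGD step on the online network only
        if step % 100 == 0:
            W_target = W.copy()                          # hard update of the target every T steps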




4.1.2. Gorila
Gorila [44] (General Reinforcement Learning Architecture) is a massively distributed and parallelized version of the DQN algorithm, achieved by introducing parallelization along three axes:
• Actors: Gorila supports Nact actors operating in parallel on Nact instantiations of the same environment. Each actor's experience can be stored in a local or global memory.
• Learners: Gorila supports Nlearn concurrent learners that sample experience from the local or global store. Learners apply RL (DQN) to a replica of the Q-network to generate gradients gi that update the master Q-network.
• Parameter server: Nparam master nodes maintain a distributed Q-network Q(θ) split across the Nparam servers; they receive the learners' gradients, apply the appropriate updates to their subset of θ, and periodically send an updated copy of the Q-network to each learner.

4.1.3. Deep Recurrent Q-Network (DRQN)
DRQN [45]: a DQN agent can only see its closest area. By augmenting DQN with recurrent neural networks, replacing DQN's last fully connected layer with a recurrent LSTM layer of the same dimension, a DRQN agent remembers the bigger picture and where things are; in fact, the LSTM provides a selective memory of past game states, improving the agent's experience and efficiency. With this LSTM layer, the agent receives only one frame at a time from the environment, and thanks to the hidden state of the LSTM it can change its output depending on the temporal pattern of the observations it receives.

4.1.4. Double DQN
Double DQN [46]: being very noisy, DQN tends to overestimate action values as training progresses. Due to the max term in the Bellman equation, the highest positive error is selected, and this value is subsequently propagated further to other states. To overcome this issue, Double DQN uses two function approximators, network QA and network QB, one for selecting the best action and the other for calculating the value of this action; the two networks are symmetrically updated by switching their roles after each training step of the algorithm (see Fig. 16). By decoupling the maximizing action from its value, we can eliminate the maximization bias. A small sketch of this target computation is given after Fig. 16.

Fig. 16. Double DQN uses two networks QA and QB that switch their roles after each training step
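As a small illustration of the decoupling idea described above (selection by one network, evaluation by the other), here is a hedged NumPy sketch of the Double DQN target computation; the linear networks and all names are illustrative only.

    import numpy as np

    n_state, n_action, gamma = 8, 4, 0.99
    W_A = np.random.randn(n_state, n_action) * 0.01   # network Q_A (selects the action)
    W_B = np.random.randn(n_state, n_action) * 0.01   # network Q_B (evaluates the action)

    def double_dqn_target(r, s_next, done):
        """y = r + gamma * Q_B(s', argmax_a Q_A(s', a)): selection and evaluation decoupled."""
        a_star = np.argmax(s_next @ W_A, axis=1)                  # greedy action under Q_A
        q_eval = (s_next @ W_B)[np.arange(len(r)), a_star]        # its value under Q_B
        return r + gamma * (1.0 - done) * q_eval

    def dqn_target(r, s_next, done):
        """Vanilla DQN target for comparison: max over one network -> overestimation bias."""
        return r + gamma * (1.0 - done) * (s_next @ W_B).max(axis=1)

    r = np.random.randn(32); s_next = np.random.randn(32, n_state); done = np.zeros(32)
    print(double_dqn_target(r, s_next, done)[:3], dqn_target(r, s_next, done)[:3])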

30

4.1.5. Prioritized Experience Replay (PER)


PER [47]: neuroscience has shown that the brain "replays" past experience during awake resting or sleep, and replays more frequently the sequences that are linked to reward and to unexpected transitions with the largest TD-error, which offer the highest opportunity for learning progress. PER increases the replay probability of the transitions with the highest |TD-error| by changing the sampling distribution, storing experience in a priority queue ranked using the TD-error criterion. A minimal sampling sketch is given below.
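The following is a minimal sketch of proportional prioritized sampling (probabilities ∝ |TD-error|^α, with importance-sampling weights); alpha, beta and the array names are illustrative choices, not values mandated by the PER paper.

    import numpy as np

    def per_sample(td_errors, batch_size=8, alpha=0.6, beta=0.4, eps=1e-5):
        """Sample transition indices with probability proportional to |TD-error|^alpha."""
        priorities = (np.abs(td_errors) + eps) ** alpha
        probs = priorities / priorities.sum()
        idx = np.random.choice(len(td_errors), size=batch_size, p=probs)
        # importance-sampling weights correct the bias introduced by non-uniform sampling
        weights = (len(td_errors) * probs[idx]) ** (-beta)
        return idx, weights / weights.max()

    td_errors = np.random.randn(1000)            # one TD-error per stored transition
    idx, w = per_sample(td_errors)
    print(idx, w)                                 # large-error transitions are replayed more often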

4.1.6. Dueling DQN
Dueling DQN [48]: the goal is to produce separate estimations of the state value V(s), which shows how good it is to be in a given state, and of the advantage A(s,a), which shows how much better it is to take a certain action a in a state s than was expected on average (see Fig. 17). To achieve this, we use a single Q-network with two streams, V and A (see Fig. 18). This decomposition allows a more robust estimate of the state value by decoupling it from the necessity of being attached to specific actions. Dueling DQN also reuses the Double DQN and PER principles. A small sketch of the two-stream aggregation is given after Fig. 17.

Fig. 17. The relation between the Action-value Q(s,a), the state value V(s) and the Advantage A(s,a)
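The two streams are recombined into Q-values by adding the advantage after subtracting its mean, which keeps V and A identifiable; the small sketch below illustrates this aggregation (the mean-subtraction form follows the Dueling DQN paper, the array names are ours).

    import numpy as np

    def dueling_aggregate(v, a):
        """Q(s,a) = V(s) + (A(s,a) - mean_a A(s,a)), combining the two streams."""
        return v[:, None] + (a - a.mean(axis=1, keepdims=True))

    v = np.random.randn(32)          # state-value stream, one value per state in the batch
    a = np.random.randn(32, 4)       # advantage stream, one value per action
    q = dueling_aggregate(v, a)
    print(q.shape)                   # (32, 4): a Q-value for every action, as in DQN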

Fig. 18. Dueling DQN architecture

4.1.7. Noisy Nets for Exploration
Noisy Nets for Exploration [49]: to tackle the exploration/exploitation dilemma in RL, two approaches are most commonly used: either we introduce a decaying randomness in the choice of the action (e.g. ε-greedy), or we punish the model for being too certain about its actions (e.g. softmax with a temperature parameter τ). These two methods have their drawbacks, since they need to be adjusted to the environment and do not take into account the current situation the agent is experiencing. Noisy Nets bring a third approach, by introducing a Gaussian noise function (σ, μ) that perturbs the last (fully connected) layers of the network, in two ways:
• Independent Gaussian noise: every weight of the noisy layer is independent and has its own μ and σ, learned by the model.
• Factorised Gaussian noise: we multiply two noise vectors, which respectively have the length of the input and the output of the noisy layer; the result is used as a random matrix, which is added to the weights.
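The factorised variant can be illustrated in a few lines of NumPy: two noise vectors are combined into a noise matrix that perturbs the layer weights through learnable scales σ. This is a simplified sketch (the sign(x)·sqrt(|x|) scaling follows the Noisy Nets paper; the variable names and constants are ours).

    import numpy as np

    def factorised_noise(n_in, n_out):
        """Build an (n_in x n_out) noise matrix from two independent noise vectors."""
        f = lambda x: np.sign(x) * np.sqrt(np.abs(x))
        eps_in, eps_out = f(np.random.randn(n_in)), f(np.random.randn(n_out))
        return np.outer(eps_in, eps_out)

    n_in, n_out = 64, 4
    mu_w = np.random.randn(n_in, n_out) * 0.01                   # learnable mean weights
    sigma_w = np.full((n_in, n_out), 0.017)                      # learnable noise scales
    noisy_w = mu_w + sigma_w * factorised_noise(n_in, n_out)     # perturbed layer weights

    x = np.random.randn(n_in)
    print(x @ noisy_w)    # a noisy forward pass: exploration comes from the weights themselves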



4.1.8. Rainbow
Rainbow [50] is built by combining the following improvements in deep reinforcement learning: double Q-learning, prioritized replay, dueling networks, multi-step learning, distributional RL and Noisy Nets. A ranking of the degree of influence of each component was established by removing elements one by one from Rainbow. Experiments show that prioritized replay and multi-step learning were the two most crucial components: removing either of them caused a large drop in performance. Then comes distributional Q-learning, ranked immediately below, which however has no influence on the early learning stage. In third place we have Noisy Nets, the dueling network and double Q-learning, which have a less significant impact on the whole model.

Prioritized replay > Multistep >> Distributional > Noisy Nets >> dueling net > double DQN

4.2. Policy Optimization Algorithms (Actor-Critic)
Policy optimization refers to RL techniques that optimize a parameterized policy π(θ), represented by a neural network, with respect to the expected return, using first- or second-order optimization methods. For neural networks, gradient descent methods applied to the loss function, based on a first-order approximation and used to update the weights through backpropagation, have reached their limits in terms of performance. Optimization methods using Newton's method with a second-order Taylor polynomial as a better approximation of the loss function, and adopting various approximations of the Hessian H, are explored as an alternative for further improvements (28):

L(θk + δ) = L(θk) + ∇L(θk)^T δ + ½ δ^T B(θk) δ   (28)

where B is an approximation of the Hessian. However, the calculation of the Hessian approximation (generalized Gauss-Newton matrix, Fisher information matrix, Hessian-free methods, …) remains complex and time-consuming, especially for high-dimensional environments. The use of second-order optimizers, such as the natural gradient descent algorithm, significantly reduces the number of iterations: with high-quality curvature matrices it takes ~10² iterations, instead of 10⁴ iterations with SGD (stochastic gradient descent). In the following section, we will give an overview of seven algorithms, of which five are first order (A3C, UNREAL, DDPG, PPO and ACER) and two are second order (TRPO and ACKTR).

4.2.1. Asynchronous Advantage Actor-Critic (A3C)
Asynchronous Advantage Actor-Critic (A3C) [51] is a DRL algorithm that relies on the following principles:


• Asynchronous: by reusing Gorila's parallelization and running multiple agents in parallel (see Fig. 19), each with its own copy of the environment, their experiences are diverse, independent and uncorrelated; as a result, we no longer need an experience replay memory.

Fig. 19. A3C network architecture

• Generalized Advantage Estimation (GAE) [52]: reusing the Dueling DQN principle, and since we do not determine the Q-values directly in A3C, we use the discounted return R as an estimate of Q(s,a), which allows us to generate an estimate of the advantage: R = r + γV(s′) ≈ Q(s,a), so A(s,a) = Q(s,a) − V(s) = r + γV(s′) − V(s). GAE then reduces the variance by taking an exponentially λ-weighted average:

Ât^(n) = Σ_{k=0}^{n−1} γ^k r_{t+k} + γ^n V(s_{t+n}) − V(s_t)  and
Ât^λ = Ât^(1) + λ Ât^(2) + λ² Ât^(3) + … = δt + γλ Â_{t+1}^λ   (29)
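The recursive form of eq. (29) is convenient to compute backwards over a trajectory; the following is a minimal NumPy sketch of such a GAE computation (gamma, lam and the toy data are illustrative).

    import numpy as np

    def gae(rewards, values, gamma=0.99, lam=0.95):
        """A_t = delta_t + gamma*lam*A_{t+1}, with delta_t = r_t + gamma*V(s_{t+1}) - V(s_t)."""
        T = len(rewards)
        advantages = np.zeros(T)
        next_adv = 0.0
        for t in reversed(range(T)):
            next_value = values[t + 1] if t + 1 < T else 0.0   # V at episode end taken as 0
            delta = rewards[t] + gamma * next_value - values[t]
            next_adv = delta + gamma * lam * next_adv
            advantages[t] = next_adv
        return advantages

    rewards = np.random.randn(10)       # one trajectory of 10 steps
    values = np.random.randn(10)        # critic's V(s_t) estimates for the same steps
    print(gae(rewards, values))         # lam=0 gives TD errors, lam=1 gives Monte-Carlo returns minus V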

• Exploration: the entropy H of the policy π is used as a means of improving exploration, by encouraging the model to be conservative about its certainty of the correct action: Hentropy(π) = −Σ P(x) log(P(x)). The entropy reflects the spread of the action probabilities: it is high when the probabilities are similar, and low when a single action has a large probability.
• Actor-critic: each agent shares two networks: the Critic network evaluates the current state using the value function V(s), while the Actor network evaluates the possible actions in the current state to make decisions using π(s). The global loss includes two parts: the value loss, related to the predictions of the critic, and the policy loss (which includes the entropy H), related to the predictions of the actor. The two losses are then combined into the global loss, with Lvalue weighted by 0.5 to make policy learning faster than value learning: Lvalue = Σ(R − V(s))² and Lpolicy = −log(π(a|s))·A(s) − β·H(π):

Lglobal = ½ Lvalue + Lpolicy = 0.5·Σ(R − V(s))² − log(π(a|s))·A(s) − β·H(π(a|s))   (30)

These two losses are backpropagated into the neural network, and then reduced with an optimizer through stochastic gradient descent. A small sketch of this loss computation is given below.
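Below is a hedged NumPy sketch of eq. (30) for a batch of transitions, with the entropy bonus weighted by β; the 0.5 value-loss weight follows the text above, while the variable names and toy data are ours.

    import numpy as np

    def a3c_loss(returns, values, log_probs, probs, advantages, beta=0.01):
        """Global A3C loss: 0.5 * value loss + policy loss, with an entropy bonus weighted by beta."""
        value_loss = np.sum((returns - values) ** 2)                    # critic: Sum (R - V(s))^2
        entropy = -np.sum(probs * np.log(probs + 1e-8), axis=1)         # H(pi(.|s)) per state
        policy_loss = np.sum(-log_probs * advantages - beta * entropy)  # actor term
        return 0.5 * value_loss + policy_loss

    batch = 16
    probs = np.random.dirichlet(np.ones(4), size=batch)                 # pi(.|s) over 4 actions
    actions = np.array([np.random.choice(4, p=p) for p in probs])
    log_probs = np.log(probs[np.arange(batch), actions])
    returns = np.random.randn(batch); values = np.random.randn(batch)
    advantages = returns - values
    print(a3c_loss(returns, values, log_probs, probs, advantages))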




4.2.2. UNsupervised REinforcement and Auxiliary Learning (UNREAL)
UNREAL [53]: the idea is to augment the on-policy A3C with off-policy auxiliary tasks, in order to learn a better representation without directly influencing the main policy control. These tasks share the same network parameters [CNN, FC, LSTM], but with different outputs. The network is composed of four modules, of which A3C is the main one:
• A3C module: the main on-policy module, which feeds the experience replay memory from which the auxiliary tasks get their inputs.
• Pixel control module: it learns how the agent's actions affect what it will see (rather than mere prediction), how to change different parts of the screen, and how to control the environment. It is based on the idea that changes in the perceptual stream often correspond to important events in an environment. Auxiliary policies Qaux, produced using a deconvolutional neural network [54] (first used for image segmentation), are trained to maximize the change in pixel intensity of different regions of the input. The auxiliary control loss is LPC = Σc L(Q^c).

• Reward prediction module: learns to predict future reward based on past reward histories. The auxiliary reward prediction loss LRP is optimized from rebalanced replay data.
• Value function replay: predicts the n-step return from the current state to promote faster value iteration. The replayed value loss LVR is optimized from replayed data.
The global loss function is:

LUNREAL = LA3C + λVR·LVR + λPC·LPC + λRP·LRP   (30)

where λVR, λPC and λRP are weighting terms on the individual loss components.


Fig. 20. UNREAL network architecture

4.2.3. Deep Deterministic Policy Gradients (DDPG)
DDPG (Deep Deterministic Policy Gradients) [41] is an actor-critic, off-policy gradient RL algorithm for continuous action spaces. It uses two neural networks, one for the critic and one for the actor, which compute action predictions for the current state, minimize their two losses LActor and LCritic separately, and estimate a deterministic target policy. DDPG reuses the DQN tricks:
• An experience replay buffer, to solve the issue of correlated data.
• Target networks: copies (Q′, µ′) of the actor and critic networks (Q, µ) with soft updates to enable training stability: θ^Q′ ← τ·θ^Q + (1 − τ)·θ^Q′ and θ^µ′ ← τ·θ^µ + (1 − τ)·θ^µ′, with τ << 1.
• Exploration, by adding noise to the actor's actions: µexploration(st) = µθ(st) + Nt.
Even if DDPG has shown good performance, its step size needs to be tweaked manually so as to fall into the right range (too small → slow learning; too large → the signal is overwhelmed by the noise, giving bad performance). A small sketch of the soft update and exploration noise is given below.
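As an illustration of the soft target update and the exploration noise described above, here is a hedged NumPy sketch; tau and the noise scale are illustrative values.

    import numpy as np

    def soft_update(target_params, online_params, tau=0.001):
        """theta_target <- tau * theta_online + (1 - tau) * theta_target, per parameter array."""
        return [tau * w + (1.0 - tau) * w_t for w, w_t in zip(online_params, target_params)]

    def noisy_action(mu_action, noise_scale=0.1):
        """DDPG exploration: perturb the deterministic actor output with noise N_t."""
        return mu_action + noise_scale * np.random.randn(*mu_action.shape)

    critic = [np.random.randn(8, 4), np.random.randn(4)]        # toy online critic parameters
    critic_target = [w.copy() for w in critic]
    critic_target = soft_update(critic_target, critic)           # target slowly tracks the online net
    print(noisy_action(np.array([0.2, -0.5])))                   # exploratory continuous action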

Fig. 21. DDPG network architecture

4.2.4. ACER (Actor-Critic with Experience Replay)
ACER [55] is a model-free, off-policy, asynchronous multi-agent, continuous-control algorithm with an actor-critic architecture. It is the off-policy counterpart of A3C, with the addition of an experience replay memory. ACER uses Retrace(λ) [56], an off-policy, multi-step, value-based RL algorithm that reweights samples with a truncated importance sampling coefficient Cs = λ·min(1, π(as|xs)/µ(as|xs)) to estimate Q^π, and thus ensures low-variance, safe and efficient updates:

Q_{k+1}(x, a) = Q_k(x, a) + α_k Σ_{t≥0} γ^t [ Π_{1≤s≤t} λ·min(1, π_k(as|xs)/µ_k(as|xs)) ] · ( r_t + γ E_π(Q_k(x_{t+1}, ·)) − Q_k(x_t, a_t) )   (31)
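The truncated importance weights of eq. (31) can be illustrated with a few lines of NumPy: per-step ratios are clipped at 1 and multiplied cumulatively along the trajectory. This is a sketch of the coefficient only, with illustrative names, not a full ACER implementation.

    import numpy as np

    def retrace_coefficients(pi_probs, mu_probs, lam=1.0):
        """c_s = lam * min(1, pi(a_s|x_s)/mu(a_s|x_s)); returns the cumulative products over s."""
        c = lam * np.minimum(1.0, pi_probs / mu_probs)
        return np.cumprod(c)

    pi_probs = np.random.uniform(0.1, 1.0, size=10)   # target-policy probabilities of taken actions
    mu_probs = np.random.uniform(0.1, 1.0, size=10)   # behaviour-policy probabilities (replay buffer)
    print(retrace_coefficients(pi_probs, mu_probs))   # truncation keeps the correction's variance bounded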

4.2.5. TRPO (Trust Region Policy Optimization)
TRPO [57] is a model-free, on-policy control algorithm with an actor-critic architecture that works for both discrete and continuous action spaces. TRPO does not support the inclusion of noise (e.g. dropout) or parameter sharing (between the policy and the value function, or with auxiliary tasks). TRPO uses the natural gradient algorithm [58] to automatically choose the right step to apply when updating the policy network, which was done manually in DDPG. TRPO uses the objective function η(πθ) = η(πθold) + E_{s0,a0,…}[ Σ_{t=0}^{∞} γ^t A_{πold}(st, at) ], which is the expected return of policy π expressed in terms of the advantage Aπ over the old policy πold. Following the MM (minorize-maximize) algorithm principle, TRPO builds and tries to maximize a surrogate function L(π), which is a local first-order approximation of η(πθ) with an importance sampling term πθ(s|a)/πθold(s|a) to reduce the variance.



The objective function is optimized when the surrogate function is optimized:

L(πθ) = E_{πθold}[ (πθ(s|a)/πθold(s|a))·A^{πθold}(s,a) ],  with ∇θ L(πθ)|θold = ∇θ η(πθ)|θold   (32)

Maximize L(πθ) under:

DKL(πθold, πθ) = Σ_i πθold(i)·log( πθold(i)/πθ(i) ) ≤ δ   (33)

So that the approximation remains valid, and to avoid a dramatic decrease in performance due to large changes from the previous policy, TRPO limits the size of the update step of the policy network's parameters by applying a KL-divergence constraint, which measures the average distance between the output distributions of the old policy network πθold and the new policy network πθ. The KL constraint keeps the step size within a "trust region" defined by δ, and allows the network parameters to be modified unequally: each parameter changes according to how much it affects the network's output distribution with respect to the KL constraint. Thus, the KL divergence between the two networks will be as high as the difference between their output probabilities. The constrained problem can therefore be written as:

maximize L(πθ)  subject to  DKL(πθold, πθ) = Σ_i πθold(i)·log( πθold(i)/πθ(i) ) ≤ δ   (34)

By using the second-order Taylor series approximation of the KL divergence, DKL(πθ, πθ+Δθ) ≈ ½ Δθ^T F Δθ, with F the Fisher information matrix (FIM) playing the role of the Hessian, and the first-order Taylor series of L(πθ), L(θ) ≈ L(θold) + (θ − θold)·∇L(θold), we obtain a constrained problem, which is then turned into an unconstrained one using the Lagrange multipliers method:

maximize  L(θold) + (θ − θold)·∇L(θold)  subject to  ½ (θ − θold)^T F (θ − θold) ≤ δ,
i.e. maximize  L(θold) + (θ − θold)·∇L(θold) − (λ/2)·(θ − θold)^T F (θ − θold)   (35)

Setting the gradient to zero, ∇L(θold) − λ·F·(θ − θold) = 0, gives the natural gradient update:

θnew = θold + (1/λ)·F(θold)^(−1)·∇θ Lθold(πθ)   (36)

In practice, the product F(θold)^(−1)·∇L(θold) is computed approximately with the conjugate gradient (CG) algorithm, which avoids forming the full FIM, followed by a line search. TRPO solved the step-size problem, but suffers from its extremely complicated computation and implementation, especially because of the FIM and CG.

4.2.6. PPO & PPO2 (Proximal Policy Optimization)
PPO & PPO2 [59] get rid of the computations created in TRPO by the KL-divergence constraint during the optimization process. PPO proposes a new surrogate objective function LCLIP(θ), obtained by clipping the probability ratio rt(θ), which removes the incentive for moving rt outside of the interval [1 − ε, 1 + ε]; it modifies TRPO's objective function so that large policy updates are sanctioned:

LCLIP(θ) = Et[ min( rt(θ)·Ât , clip(rt(θ), 1 − ε, 1 + ε)·Ât ) ],  with rt(θ) = πθ(at|st)/πθold(at|st)   (37)

PPO alternates between sampling data from the policy and performing several epochs of optimization on the sampled data while optimizing the policy. PPO2 is the GPU-enabled implementation of PPO, which runs roughly 3× faster than the original version of PPO. A small sketch of the clipped objective is given below.
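The clipped surrogate of eq. (37) can be written directly in NumPy; the sketch below computes LCLIP for a batch of log-probabilities and advantages (epsilon = 0.2 and the toy data are illustrative choices).

    import numpy as np

    def ppo_clip_objective(logp_new, logp_old, advantages, epsilon=0.2):
        """L_CLIP = E_t[ min(r_t * A_t, clip(r_t, 1-eps, 1+eps) * A_t) ], r_t = pi_new / pi_old."""
        ratio = np.exp(logp_new - logp_old)                       # r_t(theta)
        clipped = np.clip(ratio, 1.0 - epsilon, 1.0 + epsilon)
        return np.mean(np.minimum(ratio * advantages, clipped * advantages))

    logp_old = np.log(np.random.uniform(0.1, 0.9, size=64))      # log pi_theta_old(a_t|s_t)
    logp_new = logp_old + 0.1 * np.random.randn(64)              # log pi_theta(a_t|s_t) after an update
    advantages = np.random.randn(64)                             # GAE estimates A_hat_t
    print(ppo_clip_objective(logp_new, logp_old, advantages))    # objective to be maximized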

4.2.7. ACKTR (Kronecker-Factored Approximation)
ACKTR [60] uses the natural gradient with K-FAC applied to the whole network (convolutional layers and fully connected layers) [61], [62], a sophisticated approximation of the Fisher information matrix used in TRPO, to optimize both the actor and the critic. It is combined with the A2C architecture, where the two networks, actor and critic, share lower-layer representations but have distinct output layers to avoid instability during training.
• Actor: uses the natural gradient with a KL-divergence constraint to update the network within a trusted region, adopting the same approach as TRPO with the Fisher matrix, conjugate gradient and line search, here handled through K-FAC.
• Critic: uses a least-squares loss with a Gauss-Newton second-order approximation; the Gauss-Newton matrix G = E[J^T J], where J is the Jacobian of the loss, is a positive semi-definite approximation of the Hessian and is equivalent to the Fisher matrix, which allows applying K-FAC to the critic as well.
A correction is used for the inaccuracies of the local quadratic approximation of the objective, by adding a Tikhonov damping term (λ + η)I to the curvature matrix (FIM) before multiplying −∇L by its inverse, which corresponds to imposing a spherical trust region on the update.
Let a hidden layer k be given by si = Wi·a_{i−1} and ai = fact(si); the Fisher matrix for this layer, under the approximation that activations and derivatives are independent, is:




F_{(i,j)(i′,j′)} = E[ (∂L/∂w_{ij})·(∂L/∂w_{i′j′}) ] = E[ a_j (∂L/∂s_i) · a_{j′} (∂L/∂s_{i′}) ] ≈ E(a_j a_{j′})·E[ (∂L/∂s_i)(∂L/∂s_{i′}) ] = E(aa^T) ⊗ E([∇_s L][∇_s L]^T)   (38)

With Ω_{k−1} = Cov(a) and Γ_k = Cov(∂L/∂s), the Fisher block in Kronecker vectorized form is F_k = Ω_{k−1} ⊗ Γ_k. In practice, two different Tikhonov damping terms are added to the Kronecker factors Ω_{k−1} and Γ_k:

Fk′ = Ωk −1 + α i I ]⊗[ Γ k + β i I  = Ω ′k+1 ⊗ Γ ′k (39)


Under the approximation that the layers are independent, the Fisher matrix of the whole network is block-diagonal:

F_Net = diag( Ω′_0 ⊗ Γ′_1 , … , Ω′_{L−1} ⊗ Γ′_L ),
F_Net^(−1) ∇L = [ vec(Γ′_1^(−1) [∇_{W_1} L] Ω′_0^(−1)) ; … ; vec(Γ′_L^(−1) [∇_{W_L} L] Ω′_{L−1}^(−1)) ]   (40)

4.3. DRL Challenges

4.3.1. Credit Assignment & Feedback Sparsity
Reinforcement learning gives good results in many use cases and applications, but it often fails in areas where the feedback is sparse. Conceiving a reward function is a delicate task and, generally, a sparse discrete reward function is easier to define (e.g. get +1 if you win the game, else 0). However, sparse rewards also slow down learning, since the agent needs to take many actions before getting any reward; this is known as the credit assignment problem. To speed up reinforcement learning algorithms and avoid spending a lot of time in areas that likely won't help the agent achieve the assigned goal, it is usually mandatory to craft a continuous reward function, by shaping it smartly depending on the environment and the goal to reach. Instead of having a sparse step function, we then have a smooth continuous gradient function, which gives the agent information about its closeness to the goal. Reward shaping is done by replacing the original reward function R of an MDP M = {S, A, P, γ, R} by R′ in the transformed MDP M′ = {S, A, P, γ, R′}, where R′ = R + F with a function F(s,a,s′): S×A×S → |R. To determine the right shape of the reward function F(s,a,s′), there are two relevant methods:
• Craft the reward function manually [63], [64], as in robotics, where F usually becomes a function of distance and time.
• Use inverse reinforcement learning [65], by deriving a reward function (and the goals to achieve) from observed expert behavior (e.g. using imitation learning to find the right policy), as in supervised learning (see Fig. 22).

Fig. 22. Inverse reinforcement learning process

Shaping the reward must take into account the fact that positive rewards encourage the agent to keep going to accumulate reward and to avoid terminal states unless they yield a very high reward, while negative rewards push the agent to reach a terminal state as soon as possible to avoid accumulating penalties. If the staged reward function becomes large and complex, this is a good sign that you should consider using concept networks instead. A minimal shaping sketch is given below.
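As an illustration of manual, distance-based shaping (R′ = R + F), here is a hedged Python sketch for a toy navigation task; the potential-based form F = γ·Φ(s′) − Φ(s) is one common choice that preserves the optimal policy, and all names and constants are illustrative.

    import numpy as np

    GOAL = np.array([5.0, 5.0])
    gamma = 0.99

    def potential(state):
        """Phi(s): negative distance to the goal, so getting closer raises the potential."""
        return -np.linalg.norm(state - GOAL)

    def shaped_reward(sparse_r, state, next_state):
        """R'(s,a,s') = R(s,a,s') + F(s,a,s'), with potential-based F = gamma*Phi(s') - Phi(s)."""
        F = gamma * potential(next_state) - potential(state)
        return sparse_r + F

    state, next_state = np.array([0.0, 0.0]), np.array([1.0, 1.0])
    sparse_r = 0.0                                        # the sparse reward gives nothing until the goal
    print(shaped_reward(sparse_r, state, next_state))     # positive: this step moved toward the goal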

4.3.2. Slow Learning
DRL has performed well on ATARI games and other real-world tasks, but the pace of learning remains very slow; for instance, humans after 15 minutes of play tend to outperform DDQN after 115 hours. Many attempts have been made, and are still being made, to bridge this gap, such as one-shot imitation learning [66], [67], whose goal is to learn in supervised mode from very few demonstrations of a given task and to be able to generalize to new situations of the same task, by learning to embed a higher-level representation of the goal without using the absolute task, and to use transfer learning to communicate the higher-level task without retraining the model from scratch. Another attempt has been made with Model-Agnostic Meta-Learning [68], where an agent called the meta-learner trains the model (the learner) on a training set composed of a large number of different tasks, so as to learn the common feature representations of all the tasks; then, for a new task, the model, with the prior experience provided by a good initialization of its parameters (weight transfer), is fine-tuned using the small amount of new training data brought by that task, with a small number of gradient steps, while avoiding the overfitting that may happen when using a small dataset.

4.3.3. Complex Task
An important issue in RL is the ability to learn to solve complex tasks. The main approach is the principle of "divide and conquer": using the meta-learning principle, the goal is decomposed into a long chain of sub-goals, and the agent learns to accomplish those sub-goals and to recompose them to form the overall solution. Many solutions have been proposed in that sense, such as:



• Hierarchical Deep Reinforcement Learning [69], or h-DQN, where a meta-controller learns the optimal goal policy and provides it to a controller that learns the optimal action policy or sub-policy. The meta-controller works at a slower pace than the controller, receives external feedback from the environment and provides incremental feedback to the controller.
• Concept networks [70], where concepts are distinct aspects of a task that can be trained separately and then combined using a selector concept (or meta-controller) to compose the complete solution.

4.3.4. Generalization and Meta-Learning
Current AI systems excel at mastering a single skill but with a low level of versatility; the challenge is to generalize over unseen instructions and over longer sequences of instructions. Over the last two years, a lot of research has been done on the meta-learning topic, whose goal is to build models that generalize better. In optimization we have the example of DeepArchitect [71], which allows the architecture and hyperparameters to be chosen automatically over complex spaces; in meta-learning we have deep meta-reinforcement learning (RL²), developed independently by DeepMind [72] and OpenAI [73], whose key ingredient is a Recurrent Neural Network (RNN). The RNN-based agent is trained in supervised mode to learn a meta-policy that allows it to exploit the structure of the problem dynamically and to learn to solve new problems without retraining the model, by only adjusting its hidden state instead of using backpropagation.

4.3.5. Variance & Bias Trade-off
In traditional supervised learning we have:
• a biased model generalizes well, but does not fit the data perfectly (under-fitting);
• a high-variance model fits the training data perfectly, but does not generalize well to new data (overfitting).
In RL, bias and variance measure how closely the reinforcement signal sticks to the true reward structure of the environment:
• Bias: refers to good stability but inaccuracy of the value estimate.
• Variance: refers to good accuracy but instability (noise) of the value estimate.
Assigning credit to an RL agent acting in an environment can be done with different approaches, each with different amounts of variance or bias, for example:
• High-variance Monte-Carlo estimate: the policies we are learning are stochastic because of a certain level of noise. This stochasticity leads to variance in the rewards received in any given trajectory.
• High-bias temporal difference estimate: by relying on a value estimate instead of a Monte-Carlo rollout, the stochasticity in the reward signal is reduced, since the value estimate is relatively stable over time. However, we then face another issue, since the signal becomes biased, due to the fact that our estimate is never completely accurate.


In addition, for DQN, Q-estimates are computed using the target network, which is an old copy of the network, providing older Q-estimates with a very specific kind of bias. There are a number of approaches that attempt to mitigate the negative effect of too much bias or too much variance in the reward signal:
• Advantage learning (reduced variance): actor-critic methods are used to provide a lower-variance reward signal to update the actor. A^π(st,at) = Q^π(st,at) − V^π(st) indicates how much better the agent actually performed than was expected on average, with Q(s,a) the Monte-Carlo sampled reward signal, and V(s) the parameterized value estimate. The high variance of the actor is balanced by the low-variance feedback on the quality of the performance supplied by the critic.
• Generalized advantage estimate: allows balancing between pure TD learning (a bootstrapping method that adds bias) and pure Monte-Carlo sampling (which adds variance) by using a parameter λ. To produce better performance by trading off the bias of V(s) against the variance of the trajectory, we choose λ ∈ [0.9, 0.999]:

Ât^λ = δt + γλ Â_{t+1}^λ = Σ_{k=0}^{∞} (γλ)^k δ_{t+k}
GAE with λ = 0: Ât = δt ⇒ TD learning
GAE with λ = 1: Ât = Σ_{k=0}^{∞} γ^k δ_{t+k} ⇒ Monte-Carlo sampling   (41)

• Value-function bootstrapping & time horizon: bootstrapping allows estimation of the value-function distribution using sampling methods. To make a compromise between Monte-Carlo methods, which use all the episode steps for estimation, and single-step TD methods, which bootstrap, we act on the trajectory length to propagate the reward signal in a more efficient way. The time horizon corresponds to the number of steps of experience we collect before adding it to the experience buffer; it must be large enough to capture all the relevant behaviors within a sequence of the agent's actions. When the time-horizon threshold is reached before the end of an episode, a value estimate is used to predict the expected total reward from the agent's current state. Thus, a long time horizon leads to a less biased but higher-variance estimate, and a short time horizon leads to a more biased but lower-variance estimate. In cases where there are frequent rewards within an episode, or episodes are extremely long, a smaller time horizon is more suitable.

4.3.6. Partial Observability Markov Decision Process (POMDP)
In the full MDP case, the agent has access to all the information about the environment it might need in order to take an optimal action, but real-world problems do not meet this standard.




Environments that present themselves in a limited way to the RL agent are referred to as Partially Observable Markov Decision Processes (POMDPs) [74]. In a POMDP, the agent receives information that is spatially and temporally limited, so it only partially describes the current state St, which is therefore replaced by the observation Ot. The agent then attempts to predict what it has not sensed, using the other available information. The main trick used to deal with POMDPs is to augment the DRL network with RNN/LSTM layers [45], positioned between the convolutional layers and the fully connected layers, to keep in memory a history of the visual features that compensates for the lack of information. The Markov property is broken, since the agent is no longer memoryless.

4.4. DRL & RL Applications in the Industry

In industry, DRL applications are diverse [75]; depending on the purpose, they can be split into three categories: the first usage is control, as in robotics, factory automation and smart grids; the second usage is optimization, as in supply chains, demand forecasting and warehouse operations optimization (picking); and the third is monitoring and maintenance, as in quality control, fault detection and isolation, and predictive maintenance. The DRL lifecycle is composed of two phases. In the training phase, we use a rough simulation that runs fast and, when it reaches the desired accuracy threshold, we switch to a higher-fidelity simulation and retrain the model until it reaches the targeted accuracy. In the deployment phase, the trained model is used on ground truth and tuned on physical equipment in the real world.

Fig. 23. The development cycle of DRL in the industry

Table 1 below lists some simulators used for RL/DRL in industry.

Table 1. Most known RL/DRL simulators
- Self-Drive-Fly: TORCS/Speed Dreams, DeepDrive, Udacity Simulator, Unreal Engine simulator, Unity, XVEHICLE, FlightGear, AirSim
- Mechanic & Electric: Matlab Simulink, Sinumerik, Wolfram SystemModeler, OpenModelica
- Robots & Drones: Gazebo, MuJoCo, RobotStudio, RobotExpert, Ardupilot, NVIDIA Robotics simulator
- Logistics: Anylogistix, Simutrans, OpenTTD, RinSim, MovSim
- Medical & Chemistry: CHEMCAD, ParmEd, PharmaCalc, SOFA, SimTK, ArtiSynth, SimCyp
- Security & Networking: VIRL, NeSSI2, NS3, CupCarbon, INET, Conflict Simulation Laboratory

4.6. Deep RL Hardware
Neural network tasks such as preprocessing the input data, training the model, storing the trained model and deploying it require intense hardware resources; above all, the training task is by far the most time- and effort-consuming, with its multiple forward and backward passes that are essentially matrix multiplications. The number of these operations can explode with a large network; for instance, VGG16 [30], a CNN with 16 hidden layers, has ~140 million parameters (weights and biases). To reduce the training time, we can parallelize these computations. We often have the reflex to think of the CPU; however, the latter has few cores (e.g. 24 cores for the INTEL E7-8890 v4) with a huge and complex instruction set that handles every kind of operation (calculation, memory fetching, IO, interrupts, …). The GPU, by contrast, contains far more cores (e.g. 5120 for the Nvidia Titan V and 4096 for the AMD Radeon Vega 64), and each of these cores has a simpler instruction set and is specialized and optimized for computation. In addition, Nvidia and AMD simplify the use of GPUs [76] for deep learning frameworks by releasing and supporting the high-level languages CUDA and OpenCL, which are supported by and included in these frameworks, helping researchers write more efficient programs for their algorithms. Since GPUs [77] are optimized for video games and not for deep learning, they have some downsides, such as a higher power draw. There are two alternatives to the GPU. The first is the FPGA (field-programmable gate array), a highly configurable processor that allows the chip's function to be tweaked at the lowest level; it can be tailored specifically for deep network applications, so it consumes much less power than a GPU, but it needs highly specialized engineers to be configured. The second is the ASIC (application-specific integrated circuit), a custom-designed chip optimized for deep learning, for instance those made by Google, named TPU (tensor processing unit), and the Nervana Engine built by Intel. To summarize, in terms of performance and power efficiency we have: ASICs >> FPGA > GPU >> CPU.

4.7. Deep RL Frameworks

The general framework libraries that can be used to develop deep RL algorithms are Gym and Universe from OpenAI, DeepMind Lab from Google, and Project Malmo from Microsoft.



Regarding the deep learning frameworks, the most known ones are listed in the following table:

Table 2. Most known deep learning frameworks (Dev tool | Supporters | Pros | Cons)
- TensorFlow | Google, Uber | Community, resources, documentation, CNN++, TensorBoard for visualization, good for huge networks | Slowness
- Keras | Google, Kaggle | Community, documentation, compatibility with TensorFlow, CNTK and Theano as a high-level API | —
- Caffe2 | Facebook, Twitter | Fast implementation and execution | RNN & GAN
- Torch/PyTorch | Facebook, Twitter, Nvidia | Community, documentation, fast implementation & execution, CNN++ | —
- CNTK | Microsoft | RNN++ and NLP | Community support
- Paddle | Baidu | NLP | Community support
- DeepLearning4j | Java community | Use of Java, massively distributed | —
- MXNet | Amazon, Microsoft | CNN++, massively distributed | —
- Neon | Intel (Nervana) | Fast execution | Community support
- PowerAI | IBM | Compatibility with IBM Watson | Community support

5. CONCLUSION
Since the birth of Artificial Intelligence in the 1950s, researchers in AI, machine learning, cognitive science and neuroscience have wanted to build systems that learn, think and act like humans. Deep reinforcement learning has made great steps towards the creation of artificial general intelligence (AGI) systems that can interact with and learn from the environment, leveraging three main points:
• Great ideas and concepts: many of them were discussed in this paper, such as prioritized replay memory, multi-step learning, reward shaping, imitation learning, meta-learning/generalization, and the natural gradient with K-FAC.
• High-level libraries/APIs (Keras, TensorFlow, PyTorch) and simulation environments (OpenAI Gym, Universe and MuJoCo) that provide excellent testbeds for RL agents and simplify development and research.
• Powerful hardware (GPU & TPU) and high-level frameworks (CUDA & OpenCL) that make it possible to achieve a significant gain in time and effort.
However, DRL algorithms still suffer from the same drawbacks inherited from deep learning, so we still face long training times, a slow learning pace, catastrophic forgetting (of old tasks when training on new tasks), and the opacity of black-box algorithms (since the chain of reasons behind the choice of an action is not humanly comprehensible). In addition to this, we have RL drawbacks such as the credit assignment problem, reward sparsity, the variance and bias trade-off, complex task management, the complexity of meta-learning mechanisms and the partial observability of the environment. All these weak points are opportunities for improvement, and great challenges to overcome, which open the field of research widely to new ideas and breakthroughs that will one day lead to realizing the dream of seeing a perfectly autonomous, human-like intelligence in the real world.

AUTHORS

Youssef Fenjiro* – National School of Computer Scien­ce and Systems Analysis (ENSIAS), Mohammed V University, Rabat, Morocco. Email: fenjiro@gmail.com. Houda Benbrahim – National School of Computer Science and Systems Analysis (ENSIAS), Mohammed V University, Rabat, Morocco Email: benbrahimh@hotmail.com. *Corresponding author

REFERENCES

[1] “Sutton & Barto Book: Reinforcement Learning: An Introduction.” Available at: http://incompleteideas.net/book/the-book-2nd.html  [2] Stuart J. Russell, Peter Norvig, Artificial Intelligence: A Modern Approach, 3rd edition. ISBN13: 978-0136042594  [3] Y. LeCun, Y. Bengio, G. Hinton, “Deep learning”, Nature, vol. 521, no. 7553, May 2015, pp. 436– 444. DOI: 10.1038/nature14539.  [4] V. Mnih et al., “Human-level control through deep reinforcement learning”, Nature, vol. 518, no. 7540, pp. 529–533, Feb. 2015. DOI: 10.1038/nature14236.  [5] A. D. Tijsma, M. M. Drugan, M. A. Wiering, “Comparing exploration strategies for Q-learning in random stochastic mazes”. In: 2016 IEEE Symposium Series on Computational Intelligence (SSCI), Athens, Greece, 2016, pp. 1–8. DOI: 10.1109/SSCI.2016.7849366.  [6] G. E. Hinton, N. Srivastava, A. Krizhevsky, I. Sutskever, and R. R. Salakhutdinov, “Improving neural networks by preventing co-adaptation of feature detectors,” Jul. 2012. ArXiv:1207.0580 [Cs]. Articles





[7] R. Sutton, “Learning to Predict by the Method of Temporal Differences,” Mach. Learn., vol. 3, pp. 9–44, Aug. 1988. DOI: 10.1007/BF00115009  [8] K. M. Gupta, “Performance Comparison of Sarsa(λ) and Watkin’s Q(λ) Algorithms,” p. 8. Available at: https://pdfs.semanticscholar. org/ccdc/3327f4da824825bb990ffb693ceaf7dc89f6.pdf.  [9] G. A. Rummery, M. Niranjan, “On-Line Q-Learning Using Connectionist Systems,” 1994, CiteSeer. [10] Yuji Takahashi, Geoffrey Schoenbaum, Yael Niv, “Silencing the Critics: understanding the effects of cocaine sensitization on dorsolateral and ventral striatum in the context of an Actor/ Critic model”, Front. Neurosci., 15 July 2008, pp. 86–99. DOI: 10.3389/neuro.01.014.2008l [11] D. Silver et al., “Mastering the game of Go with deep neural networks and tree search”, Nature, vol. 529, no. 7587, pp. 484–489, Jan. 2016. DOI: 10.1038/nature16961 [12] S. Hölldobler, S. Möhle, A. Tigunova, “Lessons Learned from AlphaGo,” p. 10. S. H o ̈ lldobler, A. Malikov, C. Wernhard (eds.): YSIP2 – Proceedings of the Second Young Scientist’s International Workshop on Trends in Information Processing, Dombai, Russian Federation, May 16–20, 2017, published at http://ceur-ws.org. [13] David Silver, Deepmind, “UCL Course on RL” [14] Luis Serrano, A friendly introduction to Deep Learning and Neural Networks. https://www. youtube.com/watch?v=BR9h47Jtqyw [15] S. Ruder, “An overview of gradient descent optimization algorithms,” arXiv:1609.04747 [cs], Sep. 2016. [16] N. Qian, “On the momentum term in gradient descent learning algorithms,” Neural Netw., vol. 12, no. 1, pp. 145–151, Jan. 1999. DOI: 10.1016/S0893-6080(98)00116-6. [17] J. Duchi, E. Hazan, Y. Singer, “Adaptive Subgradient Methods for Online Learning and Stochastic Optimization”, JMLR, vol. 12(Jul), 2011, pp. 2121−2159. [18] “Rmsprop: Divide the gradient by a running average of its recent magnitude – Optimization: How to make the learning go faster,” Coursera. [19] M. D. Zeiler, “ADADELTA: An Adaptive Learning Rate Method,” ArXiv1212.5701 Cs, Dec. 2012. [20] D. P. Kingma, J. Ba, “Adam: A Method for Stochastic Optimization,” ArXiv1412.6980 Cs, Dec. 2014. [21] J. Bergstra and Y. Bengio, “Random Search for Hyper-parameter Optimization”, J. Mach. Learn. Res., vol. 13, pp. 281–305, Feb. 2012. ISSN: 1532-4435 [22] J. Snoek, H. Larochelle, R. P. Adams, “Practical Bayesian Optimization of Machine Learning Algorithms”, p. 9. https://arxiv.org/ pdf/1206.2944.pdf [23] Yoshua Bengio, “Gradient-Based Optimization of Hyperparameters.” DOI: 10.1162/089976600300015187. [24] M. Sazli, “A brief review of feed-forward neural networks”, Commun. Fac. Sci. Univ. Ank., vol. 50, pp. 11–17, Jan. 2006. DOI: 10.1501/0003168. Articles


[25] Salman Khan, Hossein Rahmani, Syed Afaq Ali Shah, A Guide to Convolutional Neural Networks for Computer Vision. DOI: 10.2200/S00822ED1V01Y201712COV015 [26] Hamed Habibi Aghdam, Elnaz Jahani Heravi, Guide to Convolutional Neural Networks A Practical Application to Traffic-Sign Detection and Classification, Springer 2017. [27] S. Ioffe, C. Szegedy, “Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift,” ArXiv1502.03167 Cs, Feb. 2015. [28] Y. Lecun, L. Bottou, Y. Bengio, P. Haffner, “Gradient-Based Learning Applied to Document Recognition”. In: Proceedings of the IEEE, 1998, pp. 2278–2324. DOI: 10.1109/5.726791. [29] A. Krizhevsky, I. Sutskever, G. E. Hinton, “ImageNet Classification with Deep Convolutional Neural Networks”. In: Advances in Neural Information Processing Systems, 25, 2012, pp. 1097–1105. DOI: 10.1145/3065386. [30] K. Simonyan, A. Zisserman, “Very Deep Convolutional Networks for Large-Scale Image Recognition,” ArXiv1409.1556 Cs, Sep. 2014. [31] C. Szegedy et al., “Going Deeper with Convolutions,” ArXiv1409.4842 Cs, Sep. 2014. DOI: 10.1109/CVPR.2015.7298594. [32] M. Lin, Q. Chen, S. Yan, “Network In Network,” ArXiv1312.4400 Cs, Dec. 2013. [33] K. He, X. Zhang, S. Ren, J. Sun, “Deep Residual Learning for Image Recognition,” ArXiv151203385 Cs, Dec. 2015. DOI: 10.1109/CVPR.2016.90. [34] G. Huang, Z. Liu, L. van der Maaten, K. Q. Weinberger, “Densely Connected Convolutional Networks,” ArXiv1608.06993 Cs, Aug. 2016. [35] S. Sabour, N. Frosst, and G. E. Hinton, “Dynamic Routing Between Capsules,” ArXiv1710.09829 Cs, Oct. 2017. [36] S. Hochreiter and J. Schmidhuber, “Long Short-term Memory,” Neural Comput., vol. 9, pp. 1735–80, Dec. 1997. DOI: 10.1162/neco.1997.9.8.1735 [37] “Understanding LSTM Networks”. Colah’s blog. 27/08/2015. https://colah.github.io/posts/2015-08-Understanding-LSTMs/. [38] J. Yosinski, J. Clune, Y. Bengio, H. Lipson, “How transferable are features in deep neural networks?,” ArXiv1411.1792 Cs, Nov. 2014. [39] N. Becherer, J. Pecarina, S. Nykl, K. Hopkinson, “Improving optimization of convolutional neural networks through parameter fine-tuning”, Neural Comput. Appl., pp. 1–11, Nov. 2017. DOI: 10.1007/s00521-017-3285-0. [40] “Frame Skipping and Pre-Processing for Deep Q-Nets on Atari 2600 Games”, Daniel Takeshi blog, 25/11/2016 https://danieltakeshi. github.io/2016/11/25/frame-skipping-andpreprocessing-for-deep-q-networks-on-atari2600-games/. [41] T. P. Lillicrap et al., “Continuous control with deep reinforcement learning,” ArXiv1509.02971 Cs Stat, Sep. 2015. [42] A. S. Lakshminarayanan, S. Sharma, B. Ravindran, “Dynamic Frame skip Deep Q Network,” ArXiv1605.05365 Cs, May 2016.



[43] S. Lewandowsky, S.-C. Li, Catastrophic interference in neural networks: Causes, solutions, and data, Dec. 1995. DOI: 10.1016/B978-012208930-5/50011-8 [44] A. Nair et al., “Massively Parallel Methods for Deep Reinforcement Learning”, ArXiv1507.04296 Cs, Jul. 2015. [45] M. Hausknecht, P. Stone, “Deep Recurrent Q-Learning for Partially Observable MDPs,” ArXiv1507.06527 Cs, Jul. 2015. [46] H. van Hasselt, A. Guez, D. Silver, “Deep Reinforcement Learning with Double Q-learning,” ArXiv1509.06461 Cs, Sep. 2015. [47] T. Schaul, J. Quan, I. Antonoglou, D. Silver, “Prioritized Experience Replay,” ArXiv1511.05952 Cs, Nov. 2015. [48] Z. Wang, T. Schaul, M. Hessel, H. van Hasselt, M. Lanctot, N. de Freitas, “Dueling Network Architectures for Deep Reinforcement Learning”, ArXiv1511.06581 Cs, Nov. 2015. [49] M. Fortunato et al., “Noisy Networks for Exploration,” ArXiv1706.10295 Cs Stat, Jun. 2017. [50] M. Hessel et al., “Rainbow: Combining Improvements in Deep Reinforcement Learning,” ArXiv1710.02298 Cs, Oct. 2017. [51] V. Mnih et al., “Asynchronous Methods for Deep Reinforcement Learning,” ArXiv1602.01783 Cs, Feb. 2016. [52] J. Schulman, P. Moritz, S. Levine, M. Jordan, P. Abbeel, “High-Dimensional Continuous Control Using Generalized Advantage Estimation,” ArXiv1506.02438 Cs, Jun. 2015. [53] M. Jaderberg et al., “Reinforcement Learning with Unsupervised Auxiliary Tasks,” ArXiv1611.05397 Cs, Nov. 2016. [54] H. Noh, S. Hong, B. Han, “Learning Deconvolution Network for Semantic Segmentation”, ArXiv1505.04366 Cs, May 2015. DOI: 10.1109/ICCV.2015.178. [55] Z. Wang et al., “Sample Efficient Actor-Critic with Experience Replay”, ArXiv1611.01224 Cs, Nov. 2016. [56] R. Munos, T. Stepleton, A. Harutyunyan, M. G. Bellemare, “Safe and Efficient Off-Policy Reinforcement Learning”, ArXiv1606.02647 Cs Stat, Jun. 2016. [57] J. Schulman, S. Levine, P. Moritz, M. I. Jordan, P. Abbeel, “Trust Region Policy Optimization,” ArXiv1502.05477 Cs, Feb. 2015. [58] S. M. Kakade, “A Natural Policy Gradient,” p. 8. https://papers.nips.cc/paper/2073-a-natural-policy-gradient.pdf [59] J. Schulman, F. Wolski, P. Dhariwal, A. Radford, O. Klimov, “Proximal Policy Optimization Algorithms,” ArXiv1707.06347 Cs, Jul. 2017. [60] Y. Wu, E. Mansimov, S. Liao, R. Grosse, Ba, “Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation,” ArXiv1708.05144 Cs, Aug. 2017. [61] J. Martens, R. Grosse, “Optimizing Neural Networks with Kronecker-factored Approximate Curvature,” ArXiv1503.05671 Cs Stat, Mar. 2015. [62] R. Grosse, J. Martens, “A Kronecker-factored approximate Fisher matrix for convolution layers,” ArXiv1602.01407 Cs Stat, Feb. 2016.


[63] Bonsai “Writing Great Reward Functions” Youtube https://www.youtube.com/watch?v=0R3PnJEisqk [64] X. Guo, “Deep Learning and Reward Design for Reinforcement Learning,” p. 117. [65] A. Y. Ng, S. Russell, “Algorithms for Inverse Reinforcement Learning”. In: ICML 2000 Proc. Seventeenth Int. Conf. Mach. Learn., May 2000. ISBN:1-55860-707-2 [66] Y. Duan et al., “One-Shot Imitation Learning,” ArXiv1703.07326 Cs, Mar. 2017. [67] “CS 294 Deep Reinforcement Learning, Fall 2017”, Course. [68] C. Finn, P. Abbeel, S. Levine, “Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks,” ArXiv1703.03400 Cs, Mar. 2017. [69] D. Kulkarni, R. Narasimhan, “Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation.” arXiv:1604.06057 Cs. [70] A. Gudimella et al., “Deep Reinforcement Learning for Dexterous Manipulation with Concept Networks,” ArXiv1709.06977 Cs, Sep. 2017. [71] R. Negrinho and G. Gordon, “DeepArchitect: Automatically Designing and Training Deep Architectures,” ArXiv1704.08792 Cs Stat, Apr. 2017. [72] J. X. Wang et al., “Learning to reinforcement learn”, ArXiv1611.05763 Cs Stat, Nov. 2016. [73] Y. Duan, J. Schulman, X. Chen, P. L. Bartlett, I. Sutskever, P. Abbeel, “RL$^2$: Fast Reinforcement Learning via Slow Reinforcement Learning,” ArXiv1611.02779 Cs Stat, Nov. 2016. [74] M. T. J. Spaan, “Partially Observable Markov Decision Processes,” Reinf. Learn., p. 27. DOI: 10.1007/978-3-642-27645-3_12 [75] Bonsai, M. Hammond, Deep Reinforcement Learning in the Enterprise: Bridging the Gap from Games to Industry”, Youtube. https://www. youtube.com/watch?v=GOsUHlr4DKE [76] Emine Cengil, Ahmet Çinar, “A GPU-based convolutional neural network approach for image classification”. DOI: 10.1109/IDAP.2017.8090194 [77] “Why are GPUs necessary for training Deep Learning models?”, Analytics Vidhya, 18-May2017.




Mobile Ferrograph System for Ultrahigh Permeability Alloys

Submitted: 10th July 2018; accepted: 14th September 2018

Tomasz Charubin, Michał Nowicki, Andriy Marusenkov, Roman Szewczyk, Anton Nosenko, Vasyl Kyrylchuk

DOI: 10.14313/JAMRIS_3-2018/16

Abstract: This paper presents a mobile ferrograph system for the measurement of magnetic parameters of ultrahigh permeability alloys. The structure of the developed system is described, and exemplary measurement results for the Co67Fe3Cr3B12Si15 alloy used in space applications are presented.

Keywords: ferrograph, hysteresis, amorphous alloys, mobile measurement system

1. Introduction
Measurement of the magnetic parameters of materials is an important issue in various applications, e.g. determining the type of a transformer core or verifying the structural integrity of 3-D printed inductive elements. By measuring the full magnetic hysteresis loop, one can obtain results on power loss, permeability, or other parameters of a sample, which are essential when designing switching-mode power supply systems. Hysteresis measurement is usually performed with fluxmeters [1], ferroscopes, ferrographs [2] or hysteresisgraph systems [3]. Additionally, methods of nondestructive testing (NDT) have recently been developed which allow stress evaluation in constructional steels on the basis of hysteresis curve measurements [4]. Contemporary hysteresisgraph systems are designed for stationary, laboratory operation, and are therefore unsuitable for NDT in steel constructions. The presented solution is a fully mobile, wheeled ferrograph system. Due to its relatively light chassis and modular construction it has already been used in various investigations [5]. The second problem addressed in this paper is the ultrahigh permeability alloys. Modern magnetic materials include ferrites, amorphous alloys, nanocrystalline alloys, Heusler alloys, etc. [6], [7]. Some of these new materials exhibit very high relative magnetic permeability, in some instances higher than 500 000. Because this value is many orders of magnitude higher than typically encountered in e.g. electrical steels, a special approach to the measurement of the magnetic hysteresis curve is needed. Thus, we propose modifications which allow such measurements on the ferrograph system originally developed for NDT testing of constructional steel.

2. Developed System
The system is used for NDT of materials by means of the change in behavior of the magnetization characteristic. It has a modular mechanical and software structure, which allows for easy replacement of each part and ensures high flexibility for various applications. The software has been written in the LabVIEW programming environment, in which modularity is easy to achieve.

2.1. Mobile Ferrograph System

Fig. 1. Schematic diagram of the developed Blacktower Ferrograph System

Figure 1 presents a block diagram of the Blacktower Ferrograph System. The core of the system is a PC equipped with an NI PCI-6221 DAQ card, which is fed signals from a Lakeshore 480 fluxmeter and a KEPCO BOP36-6M U/I converter.



The software controls the U/I converter, which passes current to the magnetizing coils through the current connector (Fig. 1). The voltage from the measurement coils is fed to the fluxmeter by a low-noise connector. The system is also equipped with a continuity tester, to ensure that there is separation between the input and output connectors, because, from experience, the two are often shorted through an unintended insulation breakdown. The system was designed and constructed as a mobile unit, allowing for additional measurement flexibility.

2.2. Measurement Stand for Ultrahigh Permeability Alloys
The ferrograph system was expanded with new functionality to accommodate measurements of ultrahigh permeability alloys, such as those used in space applications. The permeability of such materials is so high that even the relatively weak Earth's magnetic field (~40 μT) can significantly influence the measurement results (on the order of ±10%). Thus it was necessary to shield the sample from external field influence. In our approach this is done with three-axial Helmholtz coils (Fig. 2), with appropriate current power supplies and a magnetometer (the latter omitted from the schematic for clarity), which cancel out the external field influence to a level lower than 0.1 μT.


Fig. 2. Schematic diagram of the ultrahigh permeability alloys measurement test stand

Another issue is connected with the magnetization of the sample – with such a high permeability, very low magnetizing fields have to be used. For standard magnetizing coils wound on the sample, the U/I converter would have to have a very low maximum current output, which in our system would significantly affect its performance as an additional source of noise and errors. This problem can be solved by using a straight magnetizing rod instead of magnetizing coils. Moreover, for all necessary calculations, the resulting magnetizing field H can be calculated in the same way as for a magnetizing coil of 1 turn. The rod also provides better field uniformity than magnetizing coils with a low number of turns.
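A minimal sketch of the single-turn equivalence mentioned above, treating the rod as a one-turn winding around a ring (toroidal) sample; the sample dimensions used in the example are assumptions, not the dimensions of a real sample from this work.

```python
import numpy as np

def magnetizing_field(current, d_inner, d_outer):
    """H [A/m] inside a ring sample magnetized by a straight rod through
    its centre, computed as for a one-turn coil: H = N*I / l_m with N = 1
    and l_m the mean magnetic path length."""
    l_m = np.pi * (d_inner + d_outer) / 2.0   # mean circumference [m]
    return 1.0 * current / l_m

# Example: 0.1 A through a ring with 20 mm / 30 mm diameters -> ~1.27 A/m
print(magnetizing_field(0.1, 0.020, 0.030))
```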

3. Exemplary Results

The developed system was utilized in measurements of materials used in the construction of fluxgate sensors for space applications. The amorphous alloy Co67Fe3Cr3B12Si15 is a novel magnetic material with almost zero magnetostriction and remarkable soft magnetic properties. Magnetic cores made of Co67Fe3Cr3B12Si15 ribbons have a number of advantages, especially high initial and maximum magnetic permeability. Moreover, the alloy is characterized by low core loss. The magnetic properties of this alloy can be improved by inducing magnetic anisotropy through annealing in a magnetic field. After such processing, the hysteresis loop changes its shape and becomes square or flat, depending on the direction and magnitude of the induced magnetic anisotropy. Therefore, the possibility to measure its shape (Fig. 3) is very important, as it directly influences the fluxgate characteristics.

Fig. 3. Exemplary results of the measured quasi-static hysteresis loop of Co67Fe3Cr3B12Si15 alloy. Note the extraordinarily low coercive force of the material

4. Conclusions

The presented ferrograph system has many novel features, such as a modular software and hardware architecture and mobile construction, allowing e.g. for NDT testing of steel elements. Moreover, it was modified for measurements of ultrahigh permeability alloys used in space applications. The shielding Helmholtz coils compensate the influence of external fields, while the magnetizing rod provides better field uniformity. An additional advantage is the separation of the sample from the current-carrying rod, thus ensuring that no self-heating will influence the measurement results. Calibration and adjustment of the system were carried out with certified standards, allowing for 0.1% accuracy in the measurement of flux density B, and 0.01% accuracy in the measurement of magnetizing field H.

Acknowledgement

This work was fully supported by the statutory funds of Institute of Metrology and Biomedical Engineering, Warsaw University of Technology

AUTHORS

Tomasz Charubin – Faculty of Mechatronics, Warsaw University of Technology, Warsaw, Poland.


Michał Nowicki* – Faculty of Mechatronics, Warsaw University of Technology, Warsaw, Poland.
Andriy Marusenkov – Lviv Centre of Institute for Space Research of National Academy of Sciences and State Space Agency of Ukraine.
Roman Szewczyk – Faculty of Mechatronics, Warsaw University of Technology, Warsaw, Poland.
Anton Nosenko – G.V. Kurdyumov Institute for Metal Physics, National Academy of Sciences of Ukraine.
Vasyl Kyrylchuk – G.V. Kurdyumov Institute for Metal Physics, National Academy of Sciences of Ukraine.
*Corresponding author

REFERENCES

[1] T. Kulik et al., "A high-performance hysteresis loop tracer", J. Appl. Phys., vol. 73, 1993, 6855–6857. DOI: 10.1063/1.352461.
[2] S. Tumanski, "Handbook of Magnetic Measurements", CRC Press: Boca Raton, 2011. DOI: 10.1201/b10979.
[3] T. Charubin et al., "Analysis of Automated Ferromagnetic Measurement System", Advances in Intelligent Systems and Computing, vol. 543, 2017, 593–600. DOI: 10.1007/978-3-319-48923-0_63.
[4] D. Jackiewicz et al., "Influence of Stresses on Magnetic B-H Characteristics of X30Cr13 Corrosion Resisting Martensitic Steel", Advances in Intelligent Systems and Computing, vol. 267, 2014, 607–614. DOI: 10.1007/978-3-319-05353-0_57.
[5] D. Jackiewicz et al., "Investigation of the Magnetoelastic Villari Effect in Steel Truss", Advances in Intelligent Systems and Computing, vol. 519, 2017, 63–70. DOI: 10.1007/978-3-319-46490-9_9.
[6] A. Krings, A. Boglietti, A. Cavagnino, S. Sprague, "Soft Magnetic Material Status and Trends in Electric Machines", IEEE Transactions on Industrial Electronics, vol. 64, no. 3, 2017, 2405–2414. DOI: 10.1109/TIE.2016.2613844.
[7] V. Zhukova et al., "Magnetic properties and MCE in Heusler-type glass-coated microwires", J. Supercond. Nov. Magn., vol. 26, 2013, 1415–1419. DOI: 10.1007/s10948-012-1978-2.


Comparative Study of PI and Fuzzy Logic Based Speed Controllers of an EV with Four In-Wheel Induction Motors Drive Submitted: 26th October 2018; accepted: 15th November 2018

Abdelkader Ghezouani, Brahim Gasbaoui, Nouria Nair, Othmane Abdelkhalek, Jemal Ghouili

DOI: 10.14313/JAMRIS_3-2018/17

Abstract: This paper presents the modeling, control and simulation of an electric vehicle driven by four in-wheel 15 kW induction motors (4WDEV), controlled by a direct torque control (DTC) strategy. Two techniques are presented and compared for controlling the electric vehicle speed: the first is based on a classical PI controller, while the second is based on a fuzzy logic controller (FLC). The aim is to evaluate the impact of the proposed FLC on the efficiency of the 4WDEV, taking into account the vehicle dynamics performance, autonomy and battery power consumption, in situations where the classical controller cannot ensure the stability of the electric vehicle for several road topologies, and to show the efficiency of the proposed control technique on the 4WDEV traction system. The vehicle has been tested under different road constraints: straight road, sloping road and road curved to the right and to the left, using the Matlab/Simulink environment. The analysis and comparison of the simulation results of the FLC and PI controllers clearly show that the FLC ensures better performance and gives a good response without overshoot, zero steady-state error and high robustness to load disturbances, compared to the PI controller, which presents an overshoot of 7.398% and a considerably longer rise time (0.2157 s with the PI controller versus 0.1153 s with the FLC). Moreover, the vehicle range has been increased by about 10.82 m over the driving cycle, and the energy consumption of the battery has been reduced by about 1.17% with the FLC.

Keywords: electric vehicle, induction motor, PI controller, fuzzy logic controller FLC, direct torque control DTC, four in-wheel induction motors

1. Introduction

In electric traction systems, the overall performance of an electric vehicle depends mainly on the type of drive motor used (as an indispensable part of the traction drive) for a four-wheel drive electric vehicle (EV4WD). Among the different types of motors existing in the literature, the induction motor seems to be the candidate that best fits the main characteristics of the propulsion [1], [2], due to its good performance: low purchase cost, simple and robust construction, no need for maintenance, tolerance of overloads of up to 5 or 7 times the nominal torque [3], [4], and the good dynamic performance of

torque control. However, these advantages have long been inhibited by the complexity of the control. This complexity is mainly due to the following reasons:
– The analytical model of the induction machine is non-linear.
– Parametric uncertainties are present and their variation over time has to be taken into account.
For this reason, various control techniques have been developed to give the induction motor precision, flexibility of control and good quality of electromagnetic conversion. Direct torque control (DTC) is one of the most popular control techniques for induction motors [5], [6]. This technique was proposed by I. Takahashi and T. Noguchi [7] and by Depenbrock [8] in the late 1980s. The main advantages of this method are the very fast torque response to load torque changes, low dependency on machine parameters and a simple control scheme [5]: control without pulse-width modulation, control of the flux without current controllers, and possible operation without a speed sensor, since the method does not need accurate information on the rotor position angle [9]. In this article, a new method of speed control based on a fuzzy logic controller is proposed for an electric vehicle with four in-wheel induction motors. Compared to a classical PI controller, the proposed approach has the advantages of simplicity, flexibility and high accuracy. Modelling and simulation are carried out using the Matlab/Simulink tool to investigate the performance of the proposed system. The paper is organized as follows: the main components of the proposed traction chain and the DTC strategy of the induction motor are described in section 2; section 3 presents the mathematical model of the 4WD electric vehicle, the electronic differential and the design of the PI and FLC speed controllers; section 4 shows the simulation results and section 5 concludes the paper.

2. 4WD Electric Vehicle Description

The traction chain of the four-wheel drive electric vehicle 4WDEV is shown in Fig. 1. The power structure of this traction chain is composed of four in-wheel induction motors which are supplied by four three-phase inverters. The lithium-ion battery is the main energy source of the vehicle. It is coupled to the DC bus through a bidirectional DC-DC (Buck-Boost


Fig. 1. Electric vehicle with four in-wheels drives 4WD EV schematic diagram

Fig. 2. Direct torque control DTC block diagram


converter), controlled by a PI-type voltage regulator. The DC bus feeds the four induction motors through voltage inverters (DC-AC converters). The shaft of each drive motor is coupled to a vehicle wheel via a power transmission: each motor is equipped with a fixed-ratio gear and attached to the wheel, constituting a driving wheel; this configuration is integrated in the wheels with a fixed gear. The control method used for each in-wheel induction motor drive is direct torque control (Fig. 2). The in-wheel motors are managed by an electronic differential. This system uses the throttle position and the steering wheel angle as inputs.

2.1. Traction Induction Motor Model

In this paper three-phase induction motors (IMs) are used. The induction motor model, with the stator currents and the stator flux as state variables, in the stationary (α, β) reference frame can be expressed by [10]:

disα/dt = −η·isα − ωr·isβ + K·φsα + (1/(σLs))·ωr·φsβ + α·Vsα
disβ/dt = −η·isβ + ωr·isα + K·φsβ − (1/(σLs))·ωr·φsα + α·Vsβ
dφsα/dt = Vsα − Rs·isα
dφsβ/dt = Vsβ − Rs·isβ
dωr/dt = (3p/(2J))·(φsα·isβ − φsβ·isα) − (B/J)·ωr − (p/J)·TL   (1)

with

η = Rs/(σLs) + Rr/(σLr),   α = 1/(σLs),   K = Rr/(σLsLr),   σ = 1 − M²/(LsLr)   (2)

where Vsα, Vsβ, φsα, φsβ, isα, isβ are respectively the stator voltage, stator flux and stator current vector components in the (α, β) stator coordinate system, ωr is the rotor electrical angular speed, Ls, Lr, M are the stator, rotor and magnetizing inductances respectively, Rs, Rr are respectively the stator and rotor resistances, σ is the leakage coefficient, Te and TL are the electromagnetic torque and the load torque, J and B are the rotor inertia and the viscous friction coefficient, and p is the number of pole pairs.
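The state equations (1)–(2) can be integrated directly; the following minimal sketch (forward-Euler, with illustrative parameter values that are not necessarily those of Table 5) shows how the model maps stator voltages to currents, flux and speed:

```python
import numpy as np

# Illustrative parameter values only
Rs, Rr, Ls, Lr, M = 0.22, 0.22, 0.065, 0.065, 0.064
J, B, p = 0.1, 0.01, 2
sigma = 1.0 - M**2 / (Ls * Lr)
eta   = Rs / (sigma * Ls) + Rr / (sigma * Lr)
K     = Rr / (sigma * Ls * Lr)
alpha = 1.0 / (sigma * Ls)

def im_step(x, v_sa, v_sb, T_L, dt):
    """One Euler step of model (1); x = [i_sa, i_sb, phi_sa, phi_sb, w_r]."""
    i_sa, i_sb, phi_sa, phi_sb, w_r = x
    di_sa   = -eta*i_sa - w_r*i_sb + K*phi_sa + alpha*w_r*phi_sb + alpha*v_sa
    di_sb   = -eta*i_sb + w_r*i_sa + K*phi_sb - alpha*w_r*phi_sa + alpha*v_sb
    dphi_sa = v_sa - Rs*i_sa
    dphi_sb = v_sb - Rs*i_sb
    dw_r    = (3*p/(2*J))*(phi_sa*i_sb - phi_sb*i_sa) - (B/J)*w_r - (p/J)*T_L
    return x + dt*np.array([di_sa, di_sb, dphi_sa, dphi_sb, dw_r])
```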

2.2. Conventional Direct Torque Control Strategy for One in-Wheel IM

The conventional direct torque control strategy was developed in 1986 by Takahashi [7]. It is based on the direct determination of the control sequence applied to the switches of a voltage inverter. Fig. 2 shows the block diagram of the DTC technique. The measured speed of the motor is compared with the reference speed ωr*, and the error obtained is processed

N° 3

2018

by a PI-type controller. The controller produces the reference torque value Te*. The reference flux value φs* is determined from the parameters of the induction motor. The torque Te, the stator flux φs and the flux angle θs of the induction motor are estimated using the measurements of two stator phase currents and the DC-link voltage (Udc). The estimated stator flux and the estimated torque are compared with their reference values φs* and Te*, respectively. The obtained errors are applied to a two-level hysteresis controller for the flux and a three-level hysteresis controller for the torque. The outputs of the stator flux and torque hysteresis controllers and the stator flux sector (where Cφs is the flux error after the hysteresis block, CTe is the torque error after the hysteresis block and Ni (i = 1, …, 6) denotes the sector) are the inputs of a switching table. This table generates the convenient combinations of the (ON or OFF) states of the inverter power switches. There are eight possible switching combinations: six active vectors V1 to V6 and two zero vectors, V0 (000) and V7 (111). When the stator flux is in sector N, the vector VN+1 is selected if both the torque and the flux have to increase, and VN+2 is selected if the torque has to increase but the flux has to decrease. The estimated flux components, the flux magnitude and its phase angle are calculated from expressions (3), (4) and (5), respectively [11]:

φsα = ∫0t (Vsα − Rs·isα) dt,   φsβ = ∫0t (Vsβ − Rs·isβ) dt   (3)

φs = √(φsα² + φsβ²)   (4)

θs = arctan(φsβ/φsα)   (5)

where φsα, φsβ are the α and β axis stator flux components, φs is the stator flux magnitude and θs is the phase angle. The torque is controlled by the three-level hysteresis controller; its estimated value is calculated from expression (6):

Te = (3/2)·p·(φsα·isβ − φsβ·isα)   (6)
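A compact sketch of the estimation and vector-selection logic described above; the switching table used here is the commonly cited Takahashi-style table, given for illustration and not copied from the paper, and all names are ours.

```python
import numpy as np

# (flux_increase_flag, torque_command) -> inverter vector index, for sector N.
# torque_command in {+1, 0, -1}; sector N in 1..6.  Illustrative only.
TABLE = {
    (1, +1): lambda N: (N % 6) + 1,        # V_{N+1}: raise flux and torque
    (1,  0): lambda N: 0 if N % 2 else 7,  # a zero vector
    (1, -1): lambda N: (N + 4) % 6 + 1,    # V_{N-1}
    (0, +1): lambda N: (N + 1) % 6 + 1,    # V_{N+2}: raise torque, lower flux
    (0,  0): lambda N: 7 if N % 2 else 0,
    (0, -1): lambda N: (N + 3) % 6 + 1,    # V_{N-2}
}

def dtc_step(phi, v_s, i_s, Rs, dt, p=2):
    """Update the flux estimate (3) and torque estimate (6), return the sector."""
    phi = phi + dt * (v_s - Rs * i_s)                     # (3), alpha-beta vectors
    mag = np.hypot(phi[0], phi[1])                        # (4)
    theta = np.arctan2(phi[1], phi[0])                    # (5)
    Te = 1.5 * p * (phi[0] * i_s[1] - phi[1] * i_s[0])    # (6)
    sector = int(np.floor((theta + np.pi/6) / (np.pi/3))) % 6 + 1
    return phi, mag, theta, Te, sector
```

A full DTC loop would then compare mag and Te against their hysteresis bands and pick TABLE[(flux_flag, torque_flag)](sector) at every sampling period.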

3. 4WD Electric Vehicle Dynamic Modeling

3.1. Vehicle Dynamics

Based on the principles of vehicle mechanics and aerodynamics, the external forces acting on the vehicle in the longitudinal direction (Fig. 3) [12], [13], [14] are: the rolling resistance force FRR due to the friction of the vehicle tires on the road; the aerodynamic drag force Faero caused by the friction of the body moving through the air; the climbing force FC, which depends on the road slope; and the acceleration force Facc. The total resistive force FR of the 4WDEV is the sum of these forces, as in (7) [14]

FR = FRR + Faero + FC + Facc

(7)


Fig. 3. Forces exerted on the 4WD electric vehicle

where the forces are given by:
1) The rolling resistance force (FRR) is defined by:
FRR = m·g·Cr·cos(α)   (8)
2) The aerodynamic drag force (Faero) is given by:
Faero = (1/2)·ρair·Af·Cd·Vveh²   (9)
3) The hill climbing force (FC), i.e. the force needed to move the vehicle up a slope, is:
FC = ±m·g·sin(α)   (10)
4) The force related to acceleration (Facc) is:
Facc = m·dVveh/dt = m·γ   (11)
Finally, the resistive force FR is given by
FR = m·g·Cr·cos(α) + (1/2)·ρair·Af·Cd·Vveh² + m·g·sin(α)   (12)
The final expression of the total resistive torque TR is
TR = Rw·FR = m·g·Cr·Rw·cos(α) + (1/2)·ρair·Af·Cd·Rw·Vveh² + m·g·Rw·sin(α)   (13)

where m (kg) is the total mass of the vehicle, g (m/s²) is the acceleration of gravity, Cr is the tire rolling resistance coefficient, α (rad) is the road slope angle, ρair (kg/m³) is the mass density of air, Af (m²) is the frontal area of the vehicle, Cd is the aerodynamic drag coefficient, Vveh (m/s) is the vehicle speed and Rw (m) is the wheel radius.
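Equations (12)–(13) translate directly into code; a minimal sketch using the symbols defined above (the default numbers are typical values of the order used in this kind of study, not an authoritative copy of Table 4):

```python
import numpy as np

def resistive_torque(v_veh, alpha, m=1300.0, Rw=0.32, Af=2.6,
                     Cd=0.3, Cr=0.01, rho_air=1.2, g=9.81):
    """Total resistive torque T_R of eq. (13) for a speed v_veh [m/s]
    and road slope angle alpha [rad]."""
    F_rr   = m * g * Cr * np.cos(alpha)            # rolling resistance (8)
    F_aero = 0.5 * rho_air * Af * Cd * v_veh**2    # aerodynamic drag (9)
    F_c    = m * g * np.sin(alpha)                 # climbing force (10)
    return Rw * (F_rr + F_aero + F_c)              # (12)-(13)

# 50 km/h on a 10% slope:
print(resistive_torque(50/3.6, np.arctan(0.1)))
```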

3.2. Dynamics of the Driving Wheel

The dynamics of each in-wheel motor drive system may be expressed as

Jij·dωr_ij/dt = Te−ij − Bij·ωr_ij − TR−ij,   {ij} = {lf, rf, lr, rr}   (14)

where Jij and Bij are the moment of inertia and the viscous friction coefficient of each motor, respectively, and the subscripts lf, rf, lr and rr mean left-front, right-front, left-rear and right-rear, respectively, and where

TR−ij = TR/4 = FR·Rw/4   (15)

Then the speed of each in-wheel motor is given by

dωr_lf/dt = (1/Jlf)·(Te−lf − Blf·ωr_lf − TR_lf)
dωr_rf/dt = (1/Jrf)·(Te−rf − Brf·ωr_rf − TR_rf)
dωr_lr/dt = (1/Jlr)·(Te−lr − Blr·ωr_lr − TR_lr)
dωr_rr/dt = (1/Jrr)·(Te−rr − Brr·ωr_rr − TR_rr)   (16)

Each motor is equipped with a fixed ratio speed reducer and attached to the wheel constituting a driving wheel [15]. The gear is modelled by the gear ratio, the transmission efficiency and its inertia, i.e.

ωwheel−ij = ωr−ij/kgear,   Twheel−ij = Te−ij·kgear·ηt   (17)

where ηt is the efficiency of the gearbox and kgear is the gear ratio. The total moment of inertia associated with the vehicle (Jij), referred to the motor shaft, is given by

Jij = Jwheel + JV + Jm−ij,   JV = (1/2)·m·(Rw/kgear)²·(1 − λ)   (18)

where Jwheel is the wheel shaft inertia moment, Jm−ij is the motor inertia, JV is the inertia moment of the vehicle referred to the motor shaft, and λ represents the slipping of the wheel (λ is usually low and can be neglected if the adhesion coefficient of the road is high).

3.3. Modeling of Electronic Differential System (EDS)

The EDS for an electric vehicle with four independent in-wheel motors is a complex control system, since it needs to control different wheel speeds simultaneously. Fig. 4(a) presents the proposed electronic differential structure, where the left and right front wheels and the left and right rear wheels are controlled by using four in-wheel motors. Induction motors are preferred due to their high efficiency, high torque density, silent operation and low maintenance, which favour the electric vehicle applica-



Fig. 4. (a) Proposed Electronic Differential, (b) Kinematic Model of the 4WDEV driven during a curve

tion. Two inputs, the steering angle and the throttle position, collectively decide the speeds of the right and left wheels (front and rear) to prevent the vehicle from slipping. For a right turn, the differential has to maintain a higher speed at the left front and left rear wheels than at the right front and right rear wheels to prevent the tires from losing traction while turning. The Ackermann-Jeantaud steering model [14], shown in Fig. 4(b), can be used; it gives the kinematic model of the proposed system in a left turning manoeuvre. The relevant parameters are listed in Table 1.

Table 1. Definition of parameters in the kinematic model

Element | Name
δ | Steering angle (°)
δ1 | Turning angle of left front wheel (°)
δ2 | Turning angle of right front wheel (°)
ωveh | Reference speed of vehicle
Lω | Length of vehicle (m)
dω | Width of vehicle (m)
R | Steering radius of center of rear axle
R1 | Steering radius of inside rear wheel
R2 | Steering radius of outside rear wheel
r | Steering radius of center of front axle
r1 | Steering radius of inside front wheel
r2 | Steering radius of outside front wheel
vlf, vrf | Linear speeds of left front and right front in-wheels
vlr, vrr | Linear speeds of left rear and right rear in-wheels

From this model, the following characteristics can be calculated:

R = Lω/tan(δ),   R1 = R − dω/2,   R2 = R + dω/2   (19)

The steering radii of the two front in-wheel motor drives can also be calculated from the geometrical relationship

r = √(Lω² + R²),   r1 = √(Lω² + (R − dω/2)²),   r2 = √(Lω² + (R + dω/2)²)   (20)

By applying the instantaneous center theorem, the angular velocities of the two front and the two rear in-wheel motor drives are given by


ωlf = ωveh·√(1 + (cot(δ) − dω/(2Lω))²)/√(1 + cot²(δ))
ωrf = ωveh·√(1 + (cot(δ) + dω/(2Lω))²)/√(1 + cot²(δ))
ωlr = ωveh·(1 − dω·tan(δ)/(2Lω))
ωrr = ωveh·(1 + dω·tan(δ)/(2Lω))   (21)

According to equations (20) and (21), the speed references of the four in-wheel induction motors are expressed as

ω*r_lf = kgear·ω*lf = kgear·ω*veh·√(1 + (cot(δ) − dω/(2Lω))²)/√(1 + cot²(δ))
ω*r_rf = kgear·ω*rf = kgear·ω*veh·√(1 + (cot(δ) + dω/(2Lω))²)/√(1 + cot²(δ))
ω*r_lr = kgear·ω*lr = kgear·ω*veh·(1 − dω·tan(δ)/(2Lω))
ω*r_rr = kgear·ω*rr = kgear·ω*veh·(1 + dω·tan(δ)/(2Lω))   (22)

where kgear is the gearbox ratio, ω*veh is the angular reference speed of the vehicle and ω*lf, ω*rf, ω*lr, ω*rr are the speed references of the left front, right front, left rear and right rear wheels, respectively.
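The electronic differential of equations (19)–(22) reduces to a few lines of code; a minimal sketch (the function name, the straight-line special case and the argument order are ours):

```python
import numpy as np

def wheel_speed_refs(w_veh_ref, delta, Lw, dw, k_gear):
    """Speed references (22) of the four in-wheel motors [lf, rf, lr, rr]
    for a vehicle angular speed reference and a steering angle delta [rad]."""
    if abs(delta) < 1e-6:                    # straight line: all wheels equal
        return np.full(4, k_gear * w_veh_ref)
    c = 1.0 / np.tan(delta)                  # cot(delta)
    den = np.sqrt(1.0 + c**2)
    w_lf = w_veh_ref * np.sqrt(1.0 + (c - dw/(2*Lw))**2) / den
    w_rf = w_veh_ref * np.sqrt(1.0 + (c + dw/(2*Lw))**2) / den
    w_lr = w_veh_ref * (1.0 - dw*np.tan(delta)/(2*Lw))
    w_rr = w_veh_ref * (1.0 + dw*np.tan(delta)/(2*Lw))
    return k_gear * np.array([w_lf, w_rf, w_lr, w_rr])
```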

3.4. Synthesis of the Different Speed Controllers of One in-Wheel IM


The synthesis of a control law consists in calculating the commands to be applied to the actuators so that the vehicle can perform a specified movement. A number of different types of controllers for the speed control of an induction motor in this electric vehicle application have been investigated; PI (proportional-integral) and Fuzzy Logic controllers were chosen for the simulation. The structure of the speed control is shown as the external loop in Fig. 5. The steering angle δ and the reference angular speed of the vehicle ω*veh are fed to the Electronic Differential (ED). The ED algorithm produces the speed references of the front and rear in-wheel motors (ω*r_lf, ω*r_rf, ω*r_lr, ω*r_rr). The reference and actual speeds of each in-wheel motor are the inputs of the speed controller blocks. The speed error is used in the PI and Fuzzy Logic speed controllers. The closed-loop speed controller generates the reference motor torque T*e_ij.


3.4.1. PI Speed Controller Design

Assuming that the effect of external disturbances is zero and that all initial conditions are zero, the Laplace transfer function of (14) can be written as:

G(s) ≈ ωr_ij(s)/Te−ij(s) = 1/(Jij·s + Bij)   (23)

Therefore, the closed-loop transfer function with the PI controller is

F(s) = (kp·s + kI)/(Jij·s² + (kp + Bij)·s + kI)   (24)

The denominator in equation (24) can be rewritten as

s² + ((kp + Bij)/Jij)·s + kI/Jij   (25)

so the natural frequency ωn and the damping ratio ξ are given by

ωn = √(kI/Jij),   ξ = (kp + Bij)/(2·√(kI·Jij))   (26)

Therefore, kp and kI can be determined as:

kp = 2·ξ·ωn·Jij − Bij,   kI = ωn²·Jij   (27)

To optimize the dynamic performance and the system stability, we opt for a closed-loop damping coefficient ξ equal to 0.7. The PI control law for the four in-wheel induction motors is:

T*e_ij = kp·(ω*r_ij − ωr_ij) + kI·∫0t (ω*r_ij − ωr_ij) dt   (28)
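The tuning procedure of (26)–(28) can be written compactly; a minimal sketch in which the discrete integration of the error is our own addition (the paper does not specify it):

```python
def pi_gains(J_ij, B_ij, w_n, xi=0.7):
    """PI gains from eq. (27) for a desired natural frequency w_n [rad/s]
    and damping ratio xi."""
    k_p = 2.0 * xi * w_n * J_ij - B_ij
    k_i = w_n**2 * J_ij
    return k_p, k_i

class PISpeedController:
    """Discrete implementation of the control law (28)."""
    def __init__(self, k_p, k_i, dt):
        self.k_p, self.k_i, self.dt, self.integral = k_p, k_i, dt, 0.0

    def step(self, w_ref, w_meas):
        e = w_ref - w_meas
        self.integral += e * self.dt
        return self.k_p * e + self.k_i * self.integral   # reference torque T_e*
```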

3.4.2. Fuzzy Logic Speed Controller Design (FLC)

In this section, the PI speed controller is replaced by the fuzzy logic controller (FLC). The schematic model of the proposed FLC is shown in Fig. 6; it can be seen that the direct torque control (DTC) method is employed in the given block diagram. The proposed control system (as shown in Fig. 2) has two inputs: the first is the desired speed of the motor (ω*r_ij) and the second is the feedback signal, which represents the actual motor speed (ωr_ij). The FLC is applied to this system to control the speed of the induction motor. As for the PI speed controller, the speed error signal (e) is fed into the FLC. The input variables of the FLC are the speed error (e) and the rate of change of the speed error (∆e), defined in equation (29), and the output variable is the reference torque value (T*e_ij) for the DTC; the corresponding membership functions are shown in Fig. 7 and Fig. 8.



Fig. 5. General configuration of the four in-wheel motor drives speed control strategy

e = ω*r_ij − ωr_ij,   ∆e = e(k) − e(k − 1)   (29)

Where indices (k) and (k-1) indicate the present state and the previous state of the system, respectively.

In this control scheme, the Mamdani type, triangular membership function MFs (i.e. 7MFs) for the input and output variables, the max-min reasoning method, and the centroid method for the defuzzification are used [16]. The triangular-shaped membership functions for input (e and ∆e) and output (T*e_ij) variables are shown in Fig. 7 and Fig. 8, respectively. The proposed fuzzy sets (linguistic definition) and MFs for inputs and output variables are defined as follows: GN (Grand Negative), MN (Medium Negative), PN (Small Negative), ZE (Zero Error), PP (Small Positive), MP (Medium Positive), GP (Big Positive). Seven membership functions (MFs) are chosen for the inputs (e and ∆e) and seven for the output (T*e_ij) variable. All the MFs are normalized to be between [–1, 1].
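To make the mechanism concrete, here is a minimal Mamdani-style sketch with seven triangular sets on [-1, 1] and a clamped-sum rule base; it reproduces the structure described above but is not necessarily the exact 49-rule base of Table 2, and all names are ours.

```python
import numpy as np

LEVELS = np.linspace(-1.0, 1.0, 7)          # centres of GN..GP on [-1, 1]

def tri(x, c, w=1.0/3.0):
    """Triangular membership centred at c with half-width w."""
    return max(0.0, 1.0 - abs(x - c) / w)

def flc_torque(e, de):
    """Fuzzy output (normalized T_e*) for speed error e and its change de,
    both scaled to [-1, 1].  Rule: output level = clamp(level(e) + level(de))."""
    num, den = 0.0, 0.0
    for i, ce in enumerate(LEVELS):
        for j, cde in enumerate(LEVELS):
            w = min(tri(e, ce), tri(de, cde))            # Mamdani AND (min)
            out_idx = int(np.clip((i - 3) + (j - 3), -3, 3)) + 3
            num += w * LEVELS[out_idx]                   # centre-of-sets defuzzification
            den += w
    return num / den if den else 0.0
```

This clamped-sum rule base agrees with the two example rules quoted later in the section: (e = MN, ∆e = GN) and (e = GN, ∆e = GN) both give GN.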


Fig. 6. Block diagram of the proposed fuzzy logic speed control system with DTC


Fig. 7. Membership functions for the (a) input error (e) and (b) change in error (∆e)




Fig. 8. Membership functions for the output change of control (T*e_ij)

The total number of linguistic rules used in the proposed fuzzy logic speed controller is forty-nine (49). The resulting fuzzy inference rules for the output variable T*e_ij are shown in Table 2.

Table 2. Fuzzy tuning rules

The rule base maps the two inputs e (rows) and ∆e (columns), each described by the seven linguistic values GN, MN, PN, ZE, PP, MP and GP, onto the corresponding linguistic value of the output T*e_ij, giving 7 × 7 = 49 rules.

The rules of the fuzzy logic system can be explained using examples:
– If (speed error is MN) and (change in speed error is GN), then (reference torque variation is GN).
– If (speed error is GN) and (change in speed error is GN), then (reference torque variation is GN).
The fuzzy rules and the surface viewer of the proposed controller are shown in Fig. 9.


Fig. 9. Three dimensional plot of the control surface

3.5. Vehicle Reference Speed Profile


Before calculating the reference torque of each motor wheel, it is necessary to define a speed profile that faithfully represents the movements that the vehicle will have to perform. The specified road trajectory is shown in Fig. 10. This trajectory is defined by three successive phases: in the first one the vehicle moves on a road curved to the right at a speed of 50 km/h, the second phase begins the acceleration up to 80 km/h on a road curved to the left, and finally the vehicle climbs a sloping road of 10% at 30 km/h. The speed and road constraints are described in Table 3.

Table 3. Specified driving route topology

Phase | Event information | Vehicle speed
01 | 0 s < t < 4 s, curved road at right side | 50 km/h
02 | 4 s < t < 7.5 s, acceleration and curved road | 80 km/h
03 | 7.5 s < t < 10 s, climbing a slope of 10% | 30 km/h

Fig. 10. Specified driving road topology

4. Simulation Results

To check and compare the effectiveness of the different speed controllers (PI and FLC) proposed in this study, numerical simulations were performed in the Matlab/Simulink environment on the traction system of an electric vehicle propelled by four 15 kW induction motors integrated in the front and rear wheels (4WDEV); see the model in Fig. 2. The aim of the simulations is to evaluate the efficiency of the different speed controllers (classical PI and the proposed FLC) with respect to the dynamics of the electric vehicle, and to compare the two. The system has been simulated with the reference vehicle speed given by the topology illustrated in Fig. 10. Table 4 summarizes the mechanical and aerodynamic characteristics of the 4WD electric vehicle, and the induction motor parameters are given in Table 5. In order to confirm the effectiveness of the DTC control strategy on the 4WDEV traction system, the system has been subjected to a change of the reference speed according to the topology shown in Fig. 10. During this operation two turns are imposed by the driver through the steering angle, one to the right (phase 01) at time t = 1.7 s and the other to the left at t = 5.5 s.



Table 4. Proposed 4WD electric vehicle parameters

Parameter | Value
Wheel radius, Rw (m) | 0.32
Vehicle mass, m (kg) | 1300
Vehicle frontal area, Af (m²) | 2.60
Aerodynamic drag coefficient, Cd | 0.3
Tire rolling resistance coefficient, Cr | 0.01
Air density, ρair (kg/m³) | 1.2
Length of vehicle, Lω (m) | 2.5
Width of vehicle, dω (m) | 1.5
Gear coefficient, kgear |

Table 5. Induction motor parameters

Parameter | Value
Rated power (kW) | 15
Stator resistance, Rs (Ohm) | 0.2147
Rotor resistance, Rr (Ohm) | 0.2205
Stator inductance, Ls (H) | 0.0651
Rotor inductance, Lr (H) | 0.0651
Mutual inductance, M (H) | 0.06419
Motor-load inertia, J (kg·m²) | 0.102
Viscous friction coefficient, B (N·m·s) | 0.009541
Number of pole pairs, p |

Fig. 11 shows the curve of the steering angle of the front wheels given by the driver. The positive value corresponds to a right turn (δ = 20°), and the negative value corresponds to a left turn (δ = –8°).

Fig. 11. Steering angle variation

The linear speeds of the front and rear wheels with the PI and Fuzzy Logic controllers are shown in Fig. 12 (a) and (b), respectively. We assume that the turns are made at constant speed; the driver gives a steering angle δ* which becomes the steering angle of the front wheels. The electronic differential immediately acts on the four in-wheel IMs, decreasing the speed of the two wheels located inside the turn and increasing the speed of the two wheels located outside the turn. During the first turn (phase 01), the two left front and rear wheels, located outside the right curve, rotate at higher speeds than the two right-side wheels (front and rear). On the other hand, it can be seen that the two right front and right rear wheels rotate at higher speeds than the two left-side wheels during the second turn (phase 02), as shown in Fig. 12.

Fig. 12. Four wheels speed variation in different phases: (a) PI, and (b) FLC

The comparative study of the speed response (Fig. 13 (a)) shows that the two controllers, namely the classical PI and the FLC, give almost the same speed profile, but the Fuzzy Logic controller achieves a better rise time (convergence of the actual speed to the reference speed with minimum rise time) and zero overshoot with zero static error, compared to the PI controller, which presents an overshoot of 7.398% and a considerably longer rise time. The effects of disturbances appear clearly with the classical PI controller (where the vehicle is on the sloped road, phase 3). Table 6 shows the static and dynamic characteristics of both controllers.

Table 6. Performances of the PI and FLC in the speed response

Controller | Rise Time (s) | Settling Time (s) | Overshoot (%) | Peak Time (s)
PI | 0.2157 | 0.3235 | 7.3980 | 0.3340
FLC | 0.1153 | 0.1918 | 0.0023 | 0.5260


Fig. 13. Vehicle linear speed (a) and speed error (b) variation in different phases using PI and FLC controllers

Table 7. Values of the vehicle resistive torque and aerodynamic torque in different phases (aerodynamic torque Taero and total resistive torque TR, with PI and with FLC, for phases 01–03)

Aerodynamic torque is reduced with the Fuzzy Logic control relative to the PI: 73.42 N·m with FLC and 72.11 N·m with PI in phase 2 (see Figure 14 (a)). This can be explained by the larger frontal area term in the case of the PI compared to the FLC. It can also be seen that the overall resistive torque is improved with the FLC compared to the PI (see Figure 14 (b)). Table 7 summarizes this improvement.

Fig. 14. Aerodynamic torque (a) and vehicle resistive torque (b) variation in different phases using PI and FLC controllers

Fig. 15. State of charge SOC (a) and battery power (b) variation in different phases using PI and FLC controllers

Figure 15 (a) and (b) show the variation of the state of charge and of the battery power, respectively, over this driving cycle. Figure 15 (a) shows how the SOC of the lithium-ion battery (initialized to 75% at the start of the simulation) varies with the driving cycle for both control methods. The state of charge of the battery decreases rapidly during acceleration and on a slope. Energy consumption is lower with the Fuzzy Logic speed controller than with the PI. The SOC variation is between 75% and 73.83% (a difference of 1.17%). Table 8 shows the variation



of SOC during the driving cycle. Figure 15 (b) shows the variation of the battery power in the different phases of travel, where it can be seen that the battery provides approximately 5.24 kW (PI controller) and 3.92 kW (Fuzzy Logic controller) to achieve the desired speed in the first phase. During phase 2 the power is almost the same: 9.87 kW with PI and 9.35 kW with FLC. The power delivered (Figure 15 (b)) on the 10% slope (phase 3) is equal to 4.60 kW with PI and 2.11 kW with FLC. As a comparison, the Fuzzy Logic speed controller strategy reduces the energy consumption compared to the PI. Figure 16 and Table 9 show that the distance crossed by the vehicle is improved by the Fuzzy Logic speed controller compared to the classical PI (541.84 m with PI and 552.66 m with FLC). The autonomy is increased by 10.82 m with the Fuzzy Logic speed controller.

Table 8. Evaluation of Li-Ion battery SOC in different phases

Phase | Begin Phase [s] | End Phase [s] | SOC [%] with PI | SOC [%] with FLC
01 | 0 | 4 | 74.57 | 74.67
02 | 4 | 7.5 | 73.82 | 73.93
03 | 7.5 | 10 | 73.64 | 73.83

Fig. 16. The vehicle travelled distance in the different phases using PI and FLC controllers

Table 9. Variation of battery power and distance travelled in different trajectory phases

Phase | Battery Power Consumed [kW] PI | FLC | Vehicle Driven Distance [m] PI | FLC
1 | 5.24 | 3.92 | 188.21 | 197.52
2 | 9.87 | 9.35 | 274.31 | 280.02
3 | 3.50 | 2.11 | 79.32 | 75.12

5. Conclusions

The research presented in this paper has demonstrated the possibility of improving the stability of a four-wheel vehicle that uses four independent driving in-wheel motors, by using the Fuzzy Logic controller. The four-wheel independent control structure applied to the electric traction system with the intelligent speed control ensures driving on a slope with high safety conditions. The results obtained by Matlab simulation prove that this structure permits the realization of a Fuzzy Logic speed control loop which gives good dynamic performance of the electric vehicle. The proposed control allows the driving in-wheel speeds to be controlled independently with high accuracy on flat or curved roads in each case. The road slope does not affect the performance and stability of the driving motor wheels, in contrast with the classical PI controller.

Acknowledgement

This work was supported by the Laboratory of Smart Grids & Renewable Energies (S.G.R.E.), Faculty of Technology, Department of Electrical Engineering, Bechar University, Algeria.

AUTHORS

Abdelkader Ghezouani*, Brahim Gasbaoui, Nouria Nair and Othmane Abdelkhalek – Faculty of Sciences and Technology, Department of Electrical Engineering, Bechar University, B.P 417 Bechar (08000), Algeria.
Jemal Ghouili – Department of Electrical Engineering, Moncton University, Canada.
*Corresponding author

REFERENCES

[1] Kim J., Jung J., Nam K., "Dual-inverter control strategy for high-speed operation of EV induction motors", IEEE Trans. Ind. Electronics, vol. 51, no. 1, 2004, 312–320. DOI: 10.1109/TIE.2004.825232.
[2] Gregory A. H., Kamal Y. T., "Modeling and simulation of a hybrid-electric vehicle drive train". In: Proceedings of the American Control Conference, 1997, 636–640.
[3] Baba A., "Optimisation du flux dans la machine à induction par une commande vectorielle: minimisation des pertes", PhD thesis in Electrical Engineering, Université Pierre & Marie Curie, Paris, 1997 (in French).
[4] Triqui N., "Motorisation Asynchrone pour Véhicule Electrique", Institut Polytechnique de Lorraine, Nancy, 1997 (in French).
[5] Casadei D., Profumo F., Tani A., "FOC and DTC: Two viable schemes for induction motor torque control", IEEE Trans. Power Electronics, vol. 17, 2002, 779–787.
[6] Buja G. S., Kaźmierkowski M. P., "Direct Torque Control of PWM Inverter-Fed AC Motors – A Survey", IEEE Trans. Ind. Electronics, vol. 51, 2004. DOI: 10.1109/TIE.2004.831717.
[7] Takahashi I., Noguchi T., "A New Quick-Response and High-Efficiency Control Strategy of an Induction Motor", IEEE Transactions on Industry Applications, vol. 5, 1986, 820–827.
[8] Depenbrock M., "Direct self-control of inverter-fed machine", IEEE Trans. Power Electronics, vol. 3, 1988, 420–429.
[9] Heath H., Seth S. R., "Speed-Sensorless Vector Torque Control of Induction Machines Using a Two-Time-Scale Approach", IEEE Transactions on Industry Applications, vol. 34, no. 1, 1998.
[10] Kaźmierkowski M. P., "Control Strategies for PWM Rectifier/Inverter-Fed Induction Motors". In: Proceedings of the 2000 IEEE International Symposium on Industrial Electronics (ISIE), vol. 1, 2000, TU15–TU23.
[11] Ghezouani A., Gasbaoui B., Ghouili J., Benayed A. A., "An Efficiency No Adaptive Backstepping Speed Controller Based Direct Torque Control", Journal of Automation, Mobile Robotics & Intelligent Systems, vol. 11, no. 1, 2017, 56–63. DOI: 10.14313/JAMRIS_1-2017/8.
[12] Husain I., et al., "Design, modeling and simulation of an electric vehicle system", SAE Technical Paper Series, 1999, 01–1149.
[13] Ehsani M., et al., "Propulsion system design of electric and hybrid vehicle", IEEE Trans. Ind. Electronics, vol. 45, 1997, 19–27.
[14] Gillespie T., "Fundamentals of Vehicle Dynamics", Society of Automotive Engineers, ISBN 1-56091-199-9.
[15] Hori Y., "Future Vehicle Driven by Electricity and Control – Research on Four-Wheel-Motored UOT Electric March II", IEEE Transactions on Industrial Electronics, vol. 51, 2004, 954–962.
[16] Aissaoui H., Abid M., Tahour A., Zeblah A., "A Fuzzy Logic Controller for Synchronous Machine", Journal of Electrical Engineering, vol. 58, no. 5, 2007, 285–290.


Sliding Mode Control for Longitudinal Aircraft Dynamics Submitted: 20th July 2018; accepted: 18th September 2018

Olivera Iskrenovic-Momcilovic

DOI: 10.14313/JAMRIS_3-2018/18

Abstract: The control of the longitudinal aircraft dynamics is challenging because the mathematical model of the aircraft is highly nonlinear. This paper considers a sliding mode control design based on linearization of the aircraft, with the pitch angle and the elevator deflection as the trim variables. The design further exploits the decomposition of the aircraft dynamics into its short-period and phugoid approximations. The discrete-time variable structure system synthesis is performed on the basis of the short-period approximation of the elevator transfer function. The control system contains a sliding mode controller, an observer based on the nominal aircraft model without the finite zero, and two additional control channels, for the aircraft and for the aircraft model. The realised system is stable and robust to parameter variations and external disturbances.

Keywords: aircraft dynamics, control, elevator, longitudinal, sliding mode

1. Introduction

Aircraft dynamics characterizes the motion of an aircraft in the atmosphere. The response of the aircraft to aerodynamic, propulsive and gravitational forces, and to control inputs from the pilot, determines the attitude of the aircraft and its resulting flight path [1]. In the literature, special attention has been dedicated to aircraft dynamic stability. The concept of aircraft dynamic stability describes how the aircraft behaves over a period of time after it has been taken out of its balanced position. The longitudinal aircraft motion is the aircraft response to such disturbances [2]. To date, flight control widely uses linear control techniques. One of the reasons is the existence of numerous tools for assessing the robustness of linear feedback controllers [3]. Another reason is that flight control techniques are developed primarily for commercial aircraft that are designed and optimized for flying along very specific trajectories [4]. In recent years, PID controllers have been used to improve the dynamic characteristics of aircraft flight [5–7]. PID controllers are widely used, partly because they are effective and partly because of their simple structure and robust performance in a wide range of operating conditions [8]. Often, a fuzzy logic controller is used alone or to optimize the design of the PID controller [9–12].

Sliding mode control (SMC) is a nonlinear approach which is inherently robust against matched uncertainty [13]. The application of SMC to flight control has been pursued by several other authors [13–19]. The most commonly designed non-linear controller is based on the linearization of the aircraft; the design exploits the short-period approximation of the linearized flight dynamics [14]. More recently, controllers have been created based on a combination of a traditional PD controller and a sliding mode controller [18, 19]. All these control systems are obtained in the continuous time domain, while there have been few attempts at their realization in the discrete-time domain [20, 21].

2. Longitudinal Aircraft Dynamics

The aircraft is a dynamic system influenced by control and external disturbance. The control is realized by correcting the position and path of the aircraft, which makes the motion of the aircraft in the desired direction possible. The transfer function of the aircraft can be obtained by using the equations of aircraft motion, which can be derived by applying Newton's laws of motion and take the following form [22]:

ΣFx = m·(du/dt + Q·w − R·v)
ΣFy = m·(dv/dt + R·u − P·w)
ΣFz = m·(dw/dt + P·v − Q·u)
ΣMx = Ix·dP/dt − Jxz·dR/dt + Q·R·(Iz − Iy) − Jxz·P·Q
ΣMy = Iy·dQ/dt + P·R·(Ix − Iz) + Jxz·(P² − R²)
ΣMz = Iz·dR/dt − Jxz·dP/dt + P·Q·(Iy − Ix) + Jxz·Q·R   (1)

where: m – mass of aircraft, ΣFx, ΣFy, ΣFz – external forces in the x, y and z directions, ΣMx – rolling moment, ΣMy – pitching moment, ΣMz – yawing moment, u, v, w – components of the linear velocity vT in the x, y and z directions, P, Q, R – components of the angular velocity ω in the x, y and z directions, Ix, Iy, Iz – moments of inertia about the x, y and z axes, Jxz – product of inertia in the xz plane.


The aircraft motion (1) can be divided into two parts:
– aircraft longitudinal motion,
– aircraft lateral motion.
The aircraft should be in a straight and balanced flight, which can be disturbed by a deflection of the elevator. This deflection changes My, causes rotation about the y axis and changes Fx and Fz, but does not change Mx, Mz and Fy. Therefore the relations P = R = V = 0 apply, so the ΣFy, ΣMx and ΣMz equations can be eliminated. This leaves the equations of aircraft longitudinal motion:

ΣFx = m·(du/dt + Q·w)
ΣFz = m·(dw/dt − Q·u)
ΣMy = Iy·dQ/dt   (2)

Let the axes xE, yE and zE be the earth reference axes, the axes x0, y0 and z0 the equilibrium aircraft axes, and the axes x, y and z the disturbed aircraft axes. Let us define:
– γ, the flight path angle, i.e. the angle, measured in the vertical plane, between the horizontal and the velocity vector of the aircraft,
– α, the angle of attack, i.e. the angle between the velocity vector and the wing chord,
– Θ, the angle between the axes xE and x in the vertical plane,
– θ, the pitch angle, i.e. the angle between the equilibrium velocity vector U0 and its change u.
The axis x can be aligned with the longitudinal axis of the aircraft. Making these substitutions, the equations (2) take the perturbed form (3).

Let us assume that there are negligible perturbations about the equilibrium state and negligible angles between the equilibrium and the disturbed axes. These assumptions allow for linearization of the aircraft longitudinal motion. Thus the equations (3) can be written in the linearized form (4).

If the aerodynamic constants C** of the aircraft are defined and the Laplace transform is applied, the longitudinal equations of motion of the aircraft (4) can be obtained in the transformed form (5).

The characteristic equation of the system (5) is:

(s² + 2·εs·ωzs·s + ωzs²)·(s² + 2·εp·ωzp·s + ωzp²) = 0

where: ωzp, ωzs – natural frequencies, εp, εs – damping factors. There are two types of oscillations:
– the short-period oscillations (Fig. 1),
– the phugoid (long-period) oscillations (Fig. 2).
The periods and the damping of these oscillations vary from aircraft to aircraft because they depend on the flight conditions.

Fig. 1. Short-period oscillations

Fig. 2. Long-period oscillations

The short-period oscillations cause a change in αn and θ with a negligible change of un, while the phugoid oscillations cause a change in θ and un with a negligible change of αn. Phugoid oscillations represent an exchange of potential and kinetic energy. The aircraft follows a sinusoidal flight path in the vertical plane: when it flies from the highest point of the path downwards it gains speed up to the lowest point, and when it flies up towards the highest point of the path it loses speed. This is repeated until steady flight is re-established. The phugoid oscillation period is very long, so the flight of the aircraft can still be carried out successfully.

3. Aircraft Elevator Control

Longitudinal aircraft dynamics is controlled by the elevators. They are flight control surfaces, usually at the rear of the aircraft, which control the aircraft's pitch, and therefore the angle of attack and the lift of the wing. The aircraft is considered to be in straight, level, non-accelerated flight and then to be disturbed by a deflection of the elevator. For the longitudinal aircraft motion (Fig. 3) the controlled magnitudes are: α – angle of attack, θ – pitch angle, u – variation of flight velocity



along the longitudinal axis x, and control input is: δe – elevator deflection.

Fig. 3. Longitudinal aircraft motion

Based on the values of the aircraft aerodynamic constants [22], the transfer function of the aircraft with respect to the elevator deflection is obtained:

θ(s)/δe(s) = −1.31·(…)/(…)   (6)

The transfer function (6) can be approximated with the short-period function

θ(s)/δe(s) ≈ −1.39·(…)/(…)   (7)

The short-period approximation (7) is particularly good in the vicinity of the natural frequency of the short-period oscillations. All this allows us to use this function for the realization of the aircraft elevator control.

Fig. 4. Aircraft elevator control

In Fig. 4 a block diagram of the system for aircraft elevator control is given, which will be analyzed below. The discrete-time variable structure system synthesis is performed on the basis of the elevator transfer function short-period approximation (7), which has a stable finite zero. This control system contains a sliding mode controller, an observer based on the nominal aircraft model without the finite zero, and two additional control channels, for the aircraft and for the aircraft model. In the aircraft control channel an integral (I) action is introduced, and in the observer control channel a proportional-integral (PI) action. The parameters of the action in the observer control channel are chosen so that the total nominal transfer functions from the

control signal to the output of the aircraft and to the output of the observer, without the observation-error feedback, are identical. In addition, in order to achieve better robustness to external disturbance, a linear (PI)² action is also introduced in the observation-error channel. More specifically, the (PI)² action is introduced to increase the ability of the observer to track slowly changing disturbances f(t). Apart from acting on the observer, an additional action on the input of the plant from the observation-error channel is also carried out. Without this action the system rejects external disturbances very slowly, or oscillations may occur. In fact, the introduction of the (PI)² action in the observer control channel and of the I action in the plant control channel increases the equivalent plant order to n + 1. The sliding mode is organized in the error subspace, and in the VSC design process the introduced PI action is not essential [15]. In the considered system, due to the presence of the integral action in front of the plant, the discontinuity of the control signal is irrelevant: the switching control is integrated and becomes continuous at the input of the plant, and all the problems associated with the switching control are therefore not relevant. It is assumed that the aircraft parameters are non-stationary, but that their rate of change is much slower than the dynamics of the processes taking place in the control system. In order to validate the proposed combination of a variable structure control law with flexible working regimes of a linear control law, a PI-type discrete-time VSC is designed and simulated on a PC to control the aircraft as a third-order plant with a stable finite zero:

W(s) = −1.39·(…)/(…),   0.605 ≤ (…) ≤ 1.525,   f(t) = 0.1·h(…)   (8)

At the input of the plant an integral (I) action is introduced, which extends the transfer function of the plant accordingly.

The extended plant with the aircraft can be seen as a plant without the stable finite zero, to which a PI action with constant parameters is added. In addition to the expanded aircraft, a reduced transfer function of the aircraft, with nominal values of the parameters and without the finite zero, is introduced. This reduced aircraft is described by the transfer function:

WM(s) = 1.39/(s·(s² + 0.805·s + 1.325))   (9)

The continuous model of the reduced aircraft (9) in the canonical controllable form is


dx(t)/dt = A·x(t) + b·u(t),   y(t) = x1(t)

A = [0 1 0; 0 0 1; 0 −1.325 −0.805],   b = [0; 0; 1.39]   (10)

This reduced model of the plant can be realized by computer (discretely). According to the sampling theorem, the sampling time T = 0.4 ms was chosen. By applying the δ-transformation for the selected sampling time, the discrete-time model of the system (10) becomes [23]

x(k + 1) = Ad·x(k) + bd·u(k)

Ad = [1 0.002 0; 0 0.9996 0.002; 0 −1.3249 −0.7999],   bd = [0; 0; 1.3899]   (11)

Let the sliding hyperplane be defined by [24]

g(k) = cδ·x(k)   (12)

where:

the elements ci > 0, i = 1, 2, …, of the vector cδ are determined by the coefficients of the desired characteristic polynomial det(zI − (…)) = zⁿ + α1·zⁿ⁻¹ + … + αn. If α1 = 1 and α2 = 1 are chosen, the elements of the vector cδ in (12) become

cδ = [−0.20008  −0.40008  −0.09996  1]

For the discrete-time model of the reduced aircraft (11) the control can be synthesized in the form

u(k) = (…) − min(|g(k)|/(…), β + α·|g(k)|)·sgn(g(k))   (13)

where α, β are real numbers such that 0 ≤ β·T < 1 and α < 0; in what follows β < 50 is adopted. With the adopted parameters the control and the disturbance estimate take the form

u(k) = −0.208·(…) − 0.20048·(…) + min(25000, 50 + 20·|g(k)|, |(…)|)·(…)
(…) = −20008·(…) + 0.40008·(…) + 0.09996·(…)

4. An Illustrative Example

To ensure the robustness of the system to parameter changes of the plant and to external disturbance, the feedback signal formed from the observation error between the outputs of the plant and of the model is introduced, as shown in Fig. 5. The (PI)² and D actions were chosen in the following form: (…) = 200·(…), (…) = 5·(…). For the selected sampling time T = 0.4 ms, the transfer functions (8) and (9) become

W(z) = 1.6·10⁻⁷·(z − 1)/(z³ − 2.9992·z² + 2.9984·z − 0.9992)

WM(z) = 6.4·10⁻¹⁰·z/(z³ − 2.9992·z² + 2.9984·z − 0.9992)

For the system to be stable it is necessary and sufficient that the characteristic equation of the system

F(z) = 1 + (…)·W(z) − (…)·WM(z) = 0   (14)

has all its roots inside the unit circle |z| = 1 in the z-plane. Checking the roots of the characteristic equation (14) can be done by applying one of the stability criteria, for example the Jury stability test. The characteristic equation (14) can be written in the form

F(z) = a0·zⁿ + a1·zⁿ⁻¹ + … + an−1·z + an   (15)

from which Jury's coefficients are obtained in the scheme form [25], with the conditions

F(1) > 0,   (−1)ⁿ·F(−1) > 0   (16)
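As a generic illustration of the two checks discussed in this section (a reaching law on a sliding hyperplane and a root-location test for a discrete characteristic polynomial), here is a minimal sketch; the gains, the saturation limit and the reaching-law form are assumptions of ours and do not reproduce the paper's exact control law.

```python
import numpy as np

def smc_control(x, c, alpha=0.2, beta=0.2, u_max=50.0):
    """One step of a discrete sliding-mode law on the hyperplane g = c.x:
    u = -alpha*g - beta*|g|*sgn(g), saturated to |u| <= u_max (assumed gains)."""
    g = float(np.dot(c, x))
    u = -alpha * g - beta * abs(g) * np.sign(g)
    return float(np.clip(u, -u_max, u_max)), g

def stable_in_unit_circle(coeffs):
    """Check that all roots of the characteristic polynomial F(z)
    (highest power first) lie strictly inside |z| = 1, computed here
    by direct root finding instead of building the Jury table."""
    return bool(np.all(np.abs(np.roots(coeffs)) < 1.0))

# The open-loop denominator quoted above has a root at z = 1 (the added
# integrator), so this prints False; the closed-loop F(z) must pass the test.
print(stable_in_unit_circle([1.0, -2.9992, 2.9984, -0.9992]))
```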

���� ��� the For�(�) the=selected time (�) T = 0.4 ms, −0.208 �sampling − 0.20048� |�(�)| � (�) � (�) �min = 200 =5 �� ���(�(�)) ��� (�) − � , � + �|�(�)|� � � (�) (�) �(�) = −0.208 � − 0.20048� transfer function (8) and− 0.20048� (9) are: (�) �(�) = −0.208 ��� 50 � |�(�)|, + min ( 25000 + 20 |�(�)| �� (�)

|�(�)|, + ((25000 50 + 1) 20 |�(�)|, |�(�)| +min min 25000 50 20|�(�)| 1.6 10�� �(�) = −20008 �(�) + 0.40008 ��(� +−+ 0.09996 �� (�) ���� (�) = � � +− (�) (�) �(�) = −0.208 � 0.20048� �(�) = −20008 �(�) + 0.40008 � + 0.09996 − 2.9992 � 2.9984� − 0.9992 � � � (�) �(�) = −20008 �(�) + 0.40008 ��� + 0.09996����(�) �� ��� (� − 1) 1.6 10 |�(�)| 6.4 1050 + 20 ��� min ( 25000 |�(�)|, � ���+ � (�)(�) = = � � −���� �� �� (�) = 5 � ��� � 2.9992 � + 2.9984� 0.9992 ��� (�) =� �200 − 2.9992 � + 2.9984� −�− 0.9992 � ���� ��� ���� ��� (�) �(�) =�−20008 0.40008 ��(�) + 0.09996 (�) =�(�) 200 + � ���� = 55 � �� � �� (�) = ��� ��� (�) = 200 �� 6.4 10 � � The�system be stable, it is necessary and ��� (�) =would � � − 2.9992 � � +�2.9984� − 0.9992 sufficient the equation the sys(�)������ (�) − ���� (�)����of (�)(14) �(�) = that 1+� ��� ���characteristic ���� (�) = 200 �� (�) = 5 tem: � �� � 1.6 10 (� − 1) � (�)� (�)(14) (�) = �����(�) �� (�)� (�) �� − � = 1 + � 10 ��� ��� ���− (� − 1) 1.6 10 (� (14) � ��1.6 + 2.9984� −1) 0.9992 � � �−� 2.9992 �(�) = � (�) = + � � ��� +� � + �� � + �� (15) � ���� ��� (�)� = � �� − ��� � 2.9992 � + 2.9984� − 0.9992 ��� � + 2.9984� − 0.9992 � − 2.9992 6.4unit 10 has �all its roots inside the circle |z| = 1 in the z ���� (�) = � ��� ��� � 6.4 10 � ��� �� – plane. Checking the root of the characteristic equa6.4 � �−� �2.9992 � 1.6 �� = �(�) ���� �+ 2.9984� + 10 � �1) � + �� (15) (�+−− 10 �0.9992 (�) = � �� �� �+��� (�)be = ���� ��� � (�) ��� = � � tion (14) can done by applying some of the criteria − 2.9992 �+ � , � =���0, � .2.9984� �−1 − �� = �� ���� − −�2.9992 2.9992 +2.9984� −0.9992 0.9992 + 2.9984� − 0.9992 � � of stability, for example by applying��� Jury stability test. � 10 6.4 � ������ ����� (�) −(14) (�)����written (�)(14)in the � �(�)characteristic = �1 + ����� � (�)� The ���can = � ��equation ������=(�) ��� = � , � = 0, ���.�.�be −−21 � , � = 0,� �(�)� � � � � (�)� (�)(14) − + ��� � (�) − 2.9992 � + 2.9984� − 0.9992 (�)� (�) (�)� (�)(14) − � �(�) = = 11��� +�� ����� ��� ��� form:�(�) ��� � ��� ��� ������ ���� ������ � ,+ 0,0, =������+ ������ ��� ��, � �==+ 2 � �= ��� �(�) �= �����. �.+�−�−��(15) (15) ��� ��� ��� � (�)� (�)(14) �� (�)� (�) ��� ��� − � �(�) = 1 + � �(�) = � � + � � + � + � � + � (15) ��� � ��� ��� �(�) = � � + � � + � + � � + � (15) � ��� � � � ��� � � Now it�gets Jury’s form �� ��coefficients �� in ��the �� �scheme � �� ����� � = � �, � = � �, � = � � � = � � , � = 0, � . � − � � � � � � � � � � � [25]: � � � ���� � � � � � ���� � �� ���� �� , � =� 0, ��. � − 1 � � �� = � ���� �� ��� + � ���� �(�) = �,���� +� �,� (15) ��� ���1�+ ��� �� = ����� = �, � = � ��.+ =�− � � � = =� ����� � ��� �� �, � =�0, 0,� � .�� − �1� �� � � ������ ����(1) � � � > 0, �(−1) �(−1) , � = 0, � . � −>20 �� = �� ��� ���� �� ������� ����� ���� = = �� �� ���� �= = 0, 0,� �..��− −22 (16) ���� � ,���,,�= ���� ����=����� � 0, � . � − 1 �������� �(1) >�0, (−1) �(−1) > 0 �, � �� = �� � = 0, � . � − � ��� ���� �� ������� ����� ������ ���� = = ���� = 0, 0,� �..��− −�� � ��,,�� = ���� =��������� ��. � −��2 ��� ������ ���, � = 0, � ��� � �� = �� � �, ��� = �� �� �, ��� = �� �� � � � ��� � � �� �� � ��� � ��� � ��� ���� = = ��� �,�,���������� = ����� � = ����� ��� �,�,���� = = ����� ��� �� � � � � � �� = �� � � , � = 0, � . � −� � � ���

�� �>� 0, (−1)����(−1) �� > 0 �� �� �(1) The �� necessary = �� � and �, �� sufficient = �� ��� conditions �, �� = �� that � equa> 0, (−1) �(−1) � �(1) � � ��(−1) > � 0, (−1) less >0 0� �and tion (15) has all�(1) roots>modulo than one that

the system is stable are:

�(1) > 0, (−1)� �(−1) > 0 (17) |�� | < ��� �, |�� | > ����� � |�� | < ��� �, |�� | > ����� � |��|| < |���� |, |�| �> |> > �� � |� |� �, |� ��|� ��� |��� | > |����� |, �|�� | > |� �| The characteristic equation errors of this system |�� | > |���� |, |�� | > |�� | (15) is � �� − 2.9992� �� � 2.998408� �� � − 2.9992� � 2.998408� −0.999216256� 2.998408� �� 0 � � − 2.9992� ���0.00000128 −0.999216256� � 0.00000128 �0 (18) −0.999216256� � 0.00000128 � 0 Based on the relation (16) is obtained Jury’s 0.007192 > conditions 0 7.996824(17) > 0 are scheme coefficients and 0.007192 > 0 7.996824 > 0 |0.000000128| <1>0 0.007192 > 0 7.996824 |0.000000128| < 1 | 0.001567| > |0.001559||−0.999999| |0.000000128| < 1 > |0.999216| (19) | 0.001567| > |0.001559||−0.999999| > |0.999216| | 0.001567| > |0.001559||−0.999999| > |0.999216| Since all the conditions (19) are met and the characteristic equation (18) has all roots modulo less than one, was the stability of the circuit of observation error. Simulation results are presented in the form of a diagram step response of the aircraft (Fig. 6 and Fig. 7), control (Fig. 8) and switching functions (Fig. 9). Computer simulation shows that the system is robust when changing the values ​​of the parameters of the aircraft in the given boundaries (Fig. 7). It also has good properties of eliminating e x ternal disturbances (Fig. 6).
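The root condition certified by the Jury test can also be cross-checked numerically. The following minimal sketch (not part of the paper) evaluates the roots of the observation-error characteristic polynomial, using the coefficients as recovered from (18), and verifies that they all lie inside the unit circle.

```python
import numpy as np

# Coefficients of the observation-error characteristic polynomial (18),
# ordered from the highest power of z to the constant term.
coeffs = [1.0, -2.9992, 2.998408, -0.999216256, 0.00000128]

roots = np.roots(coeffs)
moduli = np.abs(roots)

for r, m in zip(roots, moduli):
    print(f"root = {r:.6f}, |root| = {m:.6f}")

# The discrete-time error loop is stable iff every root lies strictly inside
# the unit circle |z| = 1, which is what the Jury conditions (19) assert.
print("stable:", bool(np.all(moduli < 1.0)))
```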



5. Conclusions

Fig. 5. Step response of nominal aircraft with load disturbance

This paper presented a new sliding mode control design for the longitudinal aircraft dynamics. The design exploits the short-period approximation of the linearized aircraft dynamics. The control has a very simple structure: a sliding mode controller, an observer based on the nominal aircraft model without a finite zero, and two additional control channels, one for the aircraft and one for the aircraft model. The robustness of the method to modeling uncertainty and disturbances was demonstrated through extensive simulation, and the simulation results showed that the method outperforms, without any scheduling requirement, the transient and steady-state performance of a conventional gain-scheduled model-following controller. The realized system is stable and robust to parameter variations and external disturbances.

AUTHOR

Olivera Iskrenovic-Momcilovic – University of Novi Sad, Faculty of Education, Sombor, Serbia, E-mail: oljkaisk@yahoo.com

Fig. 6. Step response of aircraft for different values of parameters

Fig. 7. Control of the nominal aircraft with load disturbance

Fig. 8. Switching function of the nominal aircraft with load disturbance

REFERENCES

[1] D.A. Caughe, "Introduction to aircraft stability and control course notes for M&AE 507", Sibley School of Mechanical & Aerospace Engineering, Cornell University, Ithaca, New York, USA, 2011.
[2] P. Kvasnic, "Visualization of aircraft longitudinal-axis motion", Computing and Informatics, vol. 33, 2014, 1168–1190.
[3] C. Roos, C. Doll, J.M. Biannic, "Flight control laws: recent advances in the evaluation of their robustness properties", Journal of Aerospace Lab, vol. 4, 2012, 1–9.
[4] D. Pucci, "Analysis and Control of Aircraft Longitudinal Dynamics with Large Flight Envelopes", Journal of Latex class file, vol. 14, no. 8, 2015, 1–16.
[5] S.N. Deepa, G. Sudha, "Longitudinal control of aircraft dynamics based on optimization of PID parameters", Thermophysics and Aeromechanics, vol. 23, no. 2, 2016, 185–194. DOI: 10.1134/S0869864316020049.
[6] G. Sudha, S.N. Deepa, "Optimization for PID control parameters on pitch control of aircraft dynamics based on tuning methods", Applied Mathematics and Information Sciences, vol. 10, no. 1, 2016, 343–350. DOI: 10.18576/amis/100136.
[7] N. Ives, R. Pacheco, D. Castro, R. Resende, P. Américo, A. Magalhães, "Stability control of an autonomous quadcopter through PID control law", International Journal of Engineering Research and Application, vol. 5, no. 5, 2017, 7–10.
[8] M. Salem, M. Ali, S. Ashtiani, "Robust PID controller design for a modern type aircraft including handling quality evaluation", American Journal of Aerospace Engineering, vol. 1, no. 1, 2014, 1–7. DOI: 10.11648/j.ajae.20140101.11.




[9] R. Zaeri, A. Ghanbarzadeh, B. Attaran, Z. Zaeri, "Fuzzy logic controller based pitch control of aircraft tuned with Bees algorithm". In: Proceedings of the 2nd International Conference on Control Instrumentation and Automation (ICCIA), Shiraz University, Iran, 2011. DOI: 10.1109/ICCIAutom.2011.6356745.
[10] V.G. Nair, M.V. Dileep, K.R. Prahalad, "Design of fuzzy logic controller for lateral dynamics control of aircraft by considering the cross-coupling effect of yaw and roll on each other", International Journal of Computer Applications, vol. 47, no. 13, 2012, 43–48. DOI: 10.5120/7252-0368.
[11] K.D.S. Raj, G. Tattikota, "Design of fuzzy logic controller for auto landing applications", International Journal of Scientific and Research Publications, vol. 3, no. 5, 2013, 1–9.
[12] S. Udayakumar, R. Kalpana, "A supporting fuzzy logic controller based on UAV navigation", International Journal of Multidisciplinary Research and Modern Education, vol. 1, no. 2, 2015, 208–210.
[13] J. Liu, X. Wang, Advanced sliding mode control for mechanical systems design – Analysis and MATLAB simulation, Springer, Berlin, 2011. DOI: 10.1007/978-3-642-20907-9.
[14] S. Seshagiri, E. Promtun, "Sliding Mode Control of F-16 Longitudinal Dynamics". In: Proceedings of the American Control Conference, Washington, USA, 2008.
[15] H. Alwi, C. Edwards, "Fault tolerant longitudinal aircraft control using non-linear integral sliding mode", IET Control Theory & Applications, vol. 8, no. 17, 2014, 1803–1814. DOI: 10.1049/iet-cta.2013.1029.
[16] L. Melkou, A. Rezoug, M. Hamerlain, "PID-terminal sliding mode control of aircraft UAV". In: Proceedings of the UKSim-AMSS 8th European Modelling Symposium, Pisa, Italy, 2014. DOI: 10.1109/EMS.2014.97.
[17] N.B. Ammar, S. Bouallegue, J. Haggege, "Modeling and sliding mode control of a quadrotor unmanned aerial vehicle". In: Proceedings of the 3rd International Conference on Automation, Control, Engineering and Computer Science (ACECS'16), Hammamet, Tunisia, 2016.
[18] M.T. Hamayun, C. Edwards, H. Alwi, A. Bajodah, "A fault tolerant direct control allocation scheme with integral sliding mode", International Journal of Applied Mathematical and Computer Science, vol. 25, no. 1, 2015, 93–102. DOI: 10.1515/amcs-2015-0007.
[19] C. Aguilar-Ibañez, "Stabilization of the PVTOL aircraft based on a sliding mode and a saturation function", International Journal of Robust and Nonlinear Control, vol. 27, no. 5, 2017, 843–859.
[20] S. Lona, A. Kumar, "Discrete sliding mode control for the lateral dynamics of a UAV with minimum control surfaces", International Journal of Industrial Electronics and Electrical Engineering, vol. 4, no. 1, 2016, 46–50.
[21] S. Govindaswamy, T. Floquet, S.K. Spurgeon, "Discrete time output feedback sliding mode tracking control for systems with uncertainties", International Journal of Robust and Nonlinear Control, vol. 4, no. 15, 2014, 2098–2211.


[22] H.J. Blakelock, Automatic control of aircraft and missiles, John Wiley & Sons, New York, USA, 1985.
[23] G. Golo, C. Milosavljevic, "Robust discrete-time chattering-free sliding mode control", Systems and Control Letters, vol. 4, 2000, 19–28.
[24] C. Milosavljevic, "Variable structure systems of quasi-relay type with proportional-integral action", Facta Universitatis: Mechanics, Automatic Control and Robotics, vol. 2, no. 7, 1997, 301–314.
[25] R.M. Stojic, Digitalni sistemi upravljanja (Digital control systems), Nauka, Belgrade, 1990 (in Serbian).



Comparative Study of ROS on Embedded System for a Mobile Robot Submitted: 16th August 2018, accepted: 25th October 2018

Min Su Kim, Raimarius Delgado, Byoung Wook Choi

DOI: 10.14313/JAMRIS_3-2018/19 Abstract: This paper presents a comparative study of Robot Operating System (ROS) packages for mobile robot navigation on an embedded system. ROS provides various libraries and tools in developing complex robot software. We discuss the process of porting ROS to an open embedded platform, which serves as the main controller for a mobile robot. In the case of driving the robot, ROS provides local path planners such as the base, elastic band, and timed elastic band. The local planners are compared and analyzed in terms of accuracy in tracking the global path conducted on a robot model using Gazebo, 3D simulation tool provided by ROS. We also discussed the difference in performance of deploying ROS packages on a personal computer and on the embedded environment. Experiments were performed by controlling two different mobile robots with results showing that tracking error is highly dependent on the goal tolerance. This study will serve as a promising metric in improving the performance of mobile robots using ROS navigation packages. Keywords: Robot Operation System, mobile robot, embedded system, navigation, SLAM, path planner

1. Introduction Mobile robots are widely used in various fields, especially in scientific, industrial, and governmental sectors. However, the combination of devices and software are getting more complex and difficult to develop. Robot Operating System (ROS), one of the most popular robotic framework, is an open source meta OS that provides control algorithms and supports different hardware devices for mobile robot operation [1]. ROS focuses on software modularization and easy redistribution. As a result, the development of robots has become easier and innovative software are conveniently shared within the community [2], [3]. Development time and expenses are very essential factors to consider in robot distribution. In the case of commercial robots, cheaper price has been proven to improve market stability and helped in increasing sales [2]. On the contrary, expensive robots have trouble in selling and are very hard to access. In this paper, we utilize a low cost and high efficient open embedded platform for the main controller of mobile robots using ROS [4]. This minimizes the development costs

and enhances portability, as embedded systems are cheaper and smaller in comparison to the widely used personal computers. However, software development in an embedded environment is difficult because all the software must be compatible with each other and with the embedded platform itself, and the availability of technical documents and support is very limited. This paper provides the detailed procedure of successfully porting ROS to a Raspberry Pi 3 (RPi3), one of the leading open embedded platforms used in robotic applications [5]. Controlling a mobile robot requires a considerable amount of computation. ROS provides a navigation package that contains global and local path planning algorithms [4]. The local path planners included in ROS are the base [6], elastic band (EBand) [7], [8], and timed elastic band (TEB) [9] planners. Simultaneous localization and mapping (SLAM) is also made easier with ROS [10]–[12]. To utilize the navigation package, a distance sensor such as a laser rangefinder (LRF) is attached to the mobile robot to detect the obstacles within the environment and to measure the distance between the detected obstacle and the robot, so that an avoidance scheme can be performed. In this paper, the performance of the local planners is analyzed and compared in terms of tracking a given global path on a robot model simulated in Gazebo, the 3D simulation tool included in ROS. We designed the robot model based on a commercial mobile robot using SolidWorks. As Gazebo requires 3D graphics performance that could not be supported by the RPi3, simulations were performed on a desktop computer. Actual driving experiments were conducted on two commercial mobile robots. The RPi3 serves as the main controller, responsible for acquiring sensor data, measuring the position of the mobile robot within the environment, calculating the distance between the robot and the obstacles, and driving the mobile robots.

2. ROS Basic Concepts

ROS is often called a "meta" OS. Although ROS is not a traditional operating system, it provides a variety of services [1]. ROS processes are called nodes; they are independent of each other and are managed by a master node. Message passing between nodes is classified into topics and services. An unlimited number of nodes can subscribe or publish on the same topic. Topics are


usually used for continuous data streams such as sensor data and robot status. On the other hand, a service only provides communication between a host and a client service node. It is recommended to use services for remote procedure calls that terminate quickly. A robot software stack based on ROS is divided into hardware-independent and device-specific parts, as shown in Fig. 1. The hardware-independent part is composed of the ROS core, other ROS native software, and algorithms shared by different developers to the ecosystem. The device-specific part contains the local information of the robot, which includes the connected sensors, kinematics, and other information necessary to operate the robot [5]. The only task of the user is to create the device-specific node according to the specifications of the robot at hand. The hardware-independent part does not require any modification of the code itself; the user is only advised to change the configurations according to the required functionality.
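As an illustration of the topic mechanism described above, the following minimal sketch shows a publisher and a subscriber written with rospy; the topic name '/robot_status' and the message contents are hypothetical and only serve to show the pattern.

```python
#!/usr/bin/env python
# Minimal sketch of ROS topic communication (hypothetical topic name).
import rospy
from std_msgs.msg import String

def talker():
    pub = rospy.Publisher('/robot_status', String, queue_size=10)
    rospy.init_node('status_talker')
    rate = rospy.Rate(10)  # publish at 10 Hz
    while not rospy.is_shutdown():
        pub.publish(String(data='battery ok'))
        rate.sleep()

def listener():
    rospy.init_node('status_listener')
    rospy.Subscriber('/robot_status', String,
                     lambda msg: rospy.loginfo('received: %s', msg.data))
    rospy.spin()

if __name__ == '__main__':
    talker()  # or listener(), depending on the role of the node
```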


Fig. 2. Software architecture of the navigation stack

Fig. 1. An example of using hardware-independent software and device-specific drivers

As an example, a device-specific robot driver can be used with a variety of hardware-independent ROS packages such as teleop_twist_keyboard and move_base. The teleop_twist_keyboard package provides remote operation using the keyboard, and move_base is a package for navigation. Both packages publish messages which contain the linear and angular velocities of the mobile robot. The device-specific robot driver receives these messages and converts them to joint-space velocities for the actual operation of the mobile robot. The joint-space velocities for robot operation vary depending on the robot's kinematics, so the related software would normally have to be changed accordingly; in ROS, however, adding a small device-specific piece of software makes it possible to use the various hardware-independent packages without modification, as shown in the sketch below. Some robot manufacturers provide the device-specific node for easier handling by their users [5]. Otherwise, you have to create your own.
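The core of such a device-specific driver is the conversion from the commanded body velocities to the two wheel speeds of a differential-drive robot. The sketch below is a minimal, hypothetical example of that conversion (the node, topic and constant names are illustrative, not taken from the paper); it subscribes to '/cmd_vel' and computes the left and right wheel angular velocities from the robot's wheel radius and wheel separation.

```python
#!/usr/bin/env python
# Hypothetical device-specific driver sketch: /cmd_vel -> wheel speeds.
import rospy
from geometry_msgs.msg import Twist

WHEEL_RADIUS = 0.12       # m, e.g. a 240 mm diameter wheel
WHEEL_SEPARATION = 0.38   # m, distance between the wheel centers

def on_cmd_vel(msg):
    v = msg.linear.x        # commanded forward velocity [m/s]
    w = msg.angular.z       # commanded yaw rate [rad/s]
    # Differential-drive inverse kinematics.
    v_left = v - w * WHEEL_SEPARATION / 2.0
    v_right = v + w * WHEEL_SEPARATION / 2.0
    # Convert linear wheel speeds to wheel angular velocities [rad/s].
    w_left = v_left / WHEEL_RADIUS
    w_right = v_right / WHEEL_RADIUS
    send_to_motor_controller(w_left, w_right)

def send_to_motor_controller(w_left, w_right):
    # Placeholder for the robot-specific motor interface (serial, CAN, ...).
    rospy.logdebug('wheel speeds: %.3f %.3f rad/s', w_left, w_right)

if __name__ == '__main__':
    rospy.init_node('mobile_node')
    rospy.Subscriber('/cmd_vel', Twist, on_cmd_vel)
    rospy.spin()
```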

3. ROS Navigation Stack

The navigation stack is a hardware-independent piece of software included in ROS to simplify navigation control of mobile robots. A device node should be created by the user, which supplies the odometry information, facilitates the sensor stream, and processes the velocity commands sent to the physical components of the robot [4], [13]. The navigation stack should be configured in accordance with the physical characteristics and dynamics of the mobile robot to perform at a high level. The software architecture of the navigation stack is shown in Fig. 2 [13]. The main component of the navigation stack is the move_base package, which consists of software for path planning, map building, and recovery behaviors in case the robot gets stuck. The urg_node is a node created to acquire sensor data from an LRF and publish it as a topic called '/scan'. The mobile_node is the device node for a specific mobile robot, as mentioned earlier.

The navigation stack operates in several steps for the robot to reach a specific position within the environment. The move_base node acquires data from the '/scan' topic of the urg_node and creates global and local costmaps from which the positions of the mobile robot and of the obstacles are calculated. The global planner generates the shortest path available for the mobile robot to reach the target position, and the local planner is responsible for tracking the global path. Velocity commands generated by the local planner are published as the '/cmd_vel' topic and are subscribed to by the mobile_node. Feedback control is realized with the mobile_node calculating the position of the mobile robot from encoder data and publishing it as the '/tf' and '/odom' topics received by the move_base.
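The feedback half of this loop (encoder-based pose to '/odom' and '/tf') can be sketched as follows; this is a simplified illustration under assumed frame and node names, not the paper's implementation, and it omits covariance handling for brevity.

```python
#!/usr/bin/env python
# Simplified odometry feedback sketch for a differential-drive robot.
import rospy
import tf
from nav_msgs.msg import Odometry

def publish_odometry(x, y, yaw, v, w, odom_pub, tf_broadcaster):
    now = rospy.Time.now()
    quat = tf.transformations.quaternion_from_euler(0.0, 0.0, yaw)

    # '/tf': transform from the odom frame to the robot base frame.
    tf_broadcaster.sendTransform((x, y, 0.0), quat, now, 'base_link', 'odom')

    # '/odom': pose and velocity estimated from the wheel encoders.
    odom = Odometry()
    odom.header.stamp = now
    odom.header.frame_id = 'odom'
    odom.child_frame_id = 'base_link'
    odom.pose.pose.position.x = x
    odom.pose.pose.position.y = y
    odom.pose.pose.orientation.x = quat[0]
    odom.pose.pose.orientation.y = quat[1]
    odom.pose.pose.orientation.z = quat[2]
    odom.pose.pose.orientation.w = quat[3]
    odom.twist.twist.linear.x = v
    odom.twist.twist.angular.z = w
    odom_pub.publish(odom)

if __name__ == '__main__':
    rospy.init_node('mobile_node_odom')
    pub = rospy.Publisher('/odom', Odometry, queue_size=10)
    br = tf.TransformBroadcaster()
    # Example: robot at the origin, moving straight ahead at 0.1 m/s.
    publish_odometry(0.0, 0.0, 0.0, 0.1, 0.0, pub, br)
```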

3.1. Global Planner

The navigation stack contains both global and local path planners. The global planner calculates the shortest available path from the current position of the mobile robot to the specified target position. However, the actual path that the robot follows is that of the local planner. Thus, this paper focuses on analyzing the tracking accuracy of the mobile robot with the different local planners available in ROS. The global planner is configured with the default parameters as explained in [4], [6], [14].

3.2. Local Planner

The local planner tracks the global path and performs feedback control considering the actual position and movement of the mobile robot within the environment. There are several available local planners, including the base local planner [4], [6], [15], elastic band (EBand) [7], [8], and timed elastic band (TEB) [9].



3.2.1. Base Local Planner

Fig. 3 shows the basic concept of the base local planner. Possible trajectories are generated by discretely sampling the velocity space. Forward simulation is performed for each sampled trajectory for a short period to predict its outcome. The simulation results are scored according to metrics that incorporate characteristics such as proximity to obstacles, proximity to the target position, proximity to the global path, and speed. Trajectories that fail to meet any of the metrics are discarded. The trajectory with the highest score is selected and is published as the velocity command for the mobile robot [6]. These steps are repeated until the mobile robot reaches the target position, as sketched in the example below.
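A compact way to see this sample–simulate–score loop is the following standalone sketch. It is not the ROS implementation; the cost weights, the simple kinematic forward simulation and the scoring terms are assumptions chosen only to illustrate the idea.

```python
import math

def forward_simulate(x, y, yaw, v, w, dt=0.1, horizon=1.5):
    """Predict the trajectory produced by a constant (v, w) command."""
    traj, t = [], 0.0
    while t < horizon:
        x += v * math.cos(yaw) * dt
        y += v * math.sin(yaw) * dt
        yaw += w * dt
        traj.append((x, y))
        t += dt
    return traj

def score(traj, goal, obstacles, v):
    gx, gy = goal
    end_x, end_y = traj[-1]
    goal_dist = math.hypot(gx - end_x, gy - end_y)
    obstacle_dist = min(math.hypot(ox - px, oy - py)
                        for px, py in traj for ox, oy in obstacles)
    if obstacle_dist < 0.2:          # discard colliding trajectories
        return float('-inf')
    # Assumed weights: prefer progress to the goal, clearance and speed.
    return -2.0 * goal_dist + 0.5 * obstacle_dist + 0.3 * v

def best_command(pose, goal, obstacles):
    x, y, yaw = pose
    best, best_cmd = float('-inf'), (0.0, 0.0)
    for v in [0.1, 0.3, 0.5]:                    # sampled linear velocities
        for w in [-0.6, -0.3, 0.0, 0.3, 0.6]:    # sampled angular velocities
            s = score(forward_simulate(x, y, yaw, v, w), goal, obstacles, v)
            if s > best:
                best, best_cmd = s, (v, w)
    return best_cmd

print(best_command((0.0, 0.0, 0.0), (3.0, 0.0), [(1.5, 0.0)]))
```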

Fig. 3. Base local planner

3.2.2. EBand Local Planner

The basic concept of the EBand local planner is illustrated in Fig. 4. EBand treats the path as an elastic band: external forces stretch it away from obstacles while, imitating the elastic behavior of a rubber band, an internal contraction force shrinks it and shortens the search path. If additional obstacles are encountered or detected, the path is modified according to the same principle, taking the obstacles into account [7], [8].

Fig. 4. EBand local planner
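The elastic-band idea can be illustrated with a small update rule on the path points: an internal force contracts the band toward a straight line while an external force pushes points away from nearby obstacles. The sketch below is a simplified, assumed formulation (gains and influence distance are illustrative), not the planner's actual algorithm.

```python
import math

def eband_step(path, obstacles, k_internal=0.5, k_external=1.0, influence=1.0):
    """One relaxation step of a simplified elastic band."""
    new_path = [path[0]]                      # endpoints stay fixed
    for i in range(1, len(path) - 1):
        (px, py), (cx, cy), (nx, ny) = path[i - 1], path[i], path[i + 1]
        # Internal contraction force pulls the point toward its neighbors.
        fx = k_internal * ((px + nx) / 2.0 - cx)
        fy = k_internal * ((py + ny) / 2.0 - cy)
        # External repulsive force pushes the point away from close obstacles.
        for ox, oy in obstacles:
            d = math.hypot(cx - ox, cy - oy)
            if 1e-6 < d < influence:
                fx += k_external * (influence - d) * (cx - ox) / d
                fy += k_external * (influence - d) * (cy - oy) / d
        new_path.append((cx + fx, cy + fy))
    new_path.append(path[-1])
    return new_path

path = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
for _ in range(20):
    path = eband_step(path, obstacles=[(1.5, 0.1)])
print(path)
```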

3.2.3. TEB Local Planner

The TEB local planner (Fig. 5) supplements the EBand approach with temporal information. As a whole it follows the idea of EBand, but instead of generating and applying forces it deforms the trajectory by optimization at every control cycle, minimizing a cost function defined over the timed configurations of the trajectory [9].

Fig. 5. TEB local planner

In this study, the global and local planners are applied to the mobile robots, the resulting motion is compared and analyzed for each type of planner, and the navigation path is examined for different parameter settings.

4. Simulation

Prior to the actual experiment using the robot, a simulation experiment was conducted to select an appropriate local planner. In the simulation, we observed the navigation of the robot using three different local planners in two situations: with an obstacle and without obstacles. The local planners in evaluation were base local planner, EBand, and TEB local planner. Among the parameters that can customize the behavior of each planner, only those related to the physical limits of the mobile robot were configured and the rest were set to default value.

4.1. Robot Model

In this study, various types of simulations were performed in a 3D environment using Gazebo, the built-in simulation tool available in ROS. All objects in the Gazebo environment are required to be defined in the Unified Robot Description Format (URDF), including the mobile robot, sensors, and obstacles. The mobile robot model is designed using the computer-aided design tool SolidWorks, as it offers a URDF converter for easier integration with ROS and Gazebo. The 3D model of the mobile robot and the designed simulation environment in Gazebo are shown in Fig. 6. The necessary nodes to utilize the ROS navigation package are created. We used an LRF model based on the Hokuyo URG-04LX-UG01 LRF to detect obstacles and analyze the movements of the robot. As mentioned in Section 2, ROS provides a node for the LRF called urg_node, which acquires sensor data from the hardware and publishes it as the topic called '/scan'. The data flow of the urg_node is shown in Fig. 7 [10].



Fig. 6. Modeling in SolidWorks (left) and simulation environment in Gazebo (right)

Fig. 7. LRF node flow

The mobile_node that specifies the kinematics of the mobile robot, facilitates the sensor stream, and processes velocity commands is created for a two-wheel differential-drive mobile robot. The size of the robot model is configured as 0.58 m in width and 0.44 m in length. The diameter of the wheels is 240 mm, and the distance between the centers of the wheels is 380 mm.

a) without obstacles

4.2. Results of Simulation


4.2.1. Base Local Planner

Fig. 8 a) shows the results of simulation using the base local planner in an environment without obstacles. It shows the changes in robot position, linear velocity and angular velocity after setting the goal position to x = 3 m, y = 0 m. After the navigation started, the linear velocity was accelerated to 0.5 m/s and the robot moved forward. The linear speed then decelerated to 0.1 m/s, which is the minimum operating speed of the robot. After approximately 11 seconds from the start, the angular velocity changed, and the position of the robot in the y axis changed. Because of the changes in the linear velocity and the angular velocity, the mobile robot reached x = 2.98 m, y = 0.9 m near the target position of x = 3 m, y = 0 m, which ended the navigation procedure. A reduction in linear velocity after reaching 0.5 m/s and changes in angular velocity after 10 seconds are observed. The deceleration of the linear velocity reduces the speed at which the robot reaches the target position; if the robot were actuated constantly at a high speed when it reaches the target, an oscillation might occur, violating the acceleration limit. The changes in angular velocity also prevent the oscillations. Since a small oscillation may occur even when the robot moves at the minimum speed of 0.1 m/s, the angular velocity was changed to reduce the x variation of the robot in the Cartesian space.

b) with an obstacle

Fig. 8. Base local planner

On the other hand, Fig. 8 b) shows the result of simulation with an obstacle. It shows the changes in robot position, linear velocity and angular velocity for the same target position. After the navigation started, the robot moved forward to a point 0.5 m away from the obstacle. A deviation occurred to avoid the obstacle, and the robot reached the point x = 3.07 m, y = 0.06 m close to the target position without a collision. We can also observe that a slight deceleration occurs when the robot approaches the obstacle, to reduce the possibility of collision.



4.2.2. EBand Local Planner Fig. 9 a) shows the results of simulation using the EBand local planner in the environment without obstacles. It shows changes in robot position, linear velocity and angular velocity after setting the goal position to x = 3 m, y = 0 m.


Fig. 9 b) shows the result of simulation using EBand in the environment with an obstacle. It shows the changes in robot position, linear velocity and angular velocity after setting the goal position to x = 3 m, y = 0 m. The robot moved directly towards the obstacle, resulting in a collision.

4.2.3. TEB Local Planner

Fig. 10 a) shows the results of simulation using TEB in the environment without obstacles. It shows changes in robot position, linear and angular velocity.

a) without obstacles

a) without obstacles

b) with an obstacle

Fig. 9. EBand local planner

After the navigation started, the robot started moving forward while its heading angle was slightly shaking. After 6 seconds from the start, the linear velocity of the robot was decelerated, and the decelerated velocity was maintained until the robot reached the target position. After reaching x = 3.03, y = 0.00, the robot stopped for a while, and then an oscillation occurred while it adjusted its heading angle.

b) with an obstacle

Fig. 10. TEB local planner

The goal position is set to x = 3 m, y = 0 m. After the navigation started, the linear velocity of the robot accelerated to 0.5 m/s and the robot moved forward. It then stopped at x = 2.89, y = 0. Fig. 10 b) shows the results of simulation using the TEB local planner in the environment with an obstacle. It shows the changes in robot position, linear velocity and angular velocity after setting the goal position to x = 3 m, y = 0 m.




In the environment with an obstacle, the TEB planner was not able to compute the velocity of the robot on time due to the high computation load. As a result, the robot failed to follow the path created by the global planner and could not reach the target. The results of the simulation experiments show that the EBand and TEB local planners both produced exemplary results in tracking a target position in an environment without obstacles. However, the base local planner is the only local planner that was able to reach the goal in the presence of an obstacle.

5. Experimental Specifications

The system structure of the experimental testbed, including the hardware devices and mobile robots, is shown in Fig. 11. We have selected an ARM-based embedded platform, the RPi3, for its portability and low cost in comparison to desktop computers. The development environment is shown in Fig. 12. The RPi3 is installed with Ubuntu 16.04 with Linux kernel version 4.1.21-v7+. ROS Kinetic Kame was selected as it is the latest stable version available in the ROS repository. In this study, we only tackle the basic features of ROS without considering the strict scheduling deadlines required in robust control of robotic applications. ROS was installed in a straightforward manner following the user guide in [16]. In more advanced control applications that require a real-time environment on an embedded platform, the compatibility of ROS with the other software components is a huge issue, which is complex owing to the limited availability of systematic documentation and technical support.

The base local planner from Section 4 is applied to the mobile robots, in consideration of the various planners tested through simulation. The performance difference between the two mobile robots is shown according to the presence of obstacles, and the 'xy_tolerance' goal tolerance is varied in the parameter settings (see the snippet below). The performance of the global planner is not significantly different; therefore, only the local planner is applied in the experiment and compared and analyzed. The specifications of each robot are shown in Table 1.

Table 1. Robot Specifications

Robot Name | Wheel diameter | Distance between wheels | Width | Length
Tetra DS IV | 240 mm | 380 mm | 0.58 m | 0.44 m
Stella B2 | 150 mm | 289 mm | 0.41 m | 0.32 m
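The goal tolerance mentioned above is exposed by the base local planner as the xy_goal_tolerance parameter. A small example of adjusting it at runtime from a rospy script is shown below; the parameter path assumes move_base is running with its default TrajectoryPlannerROS namespace, which may differ on a particular setup.

```python
#!/usr/bin/env python
# Adjust the goal tolerance of the base local planner (assumed parameter path).
import rospy

if __name__ == '__main__':
    rospy.init_node('set_goal_tolerance')
    # Smaller values make the robot stop closer to the requested goal pose.
    rospy.set_param('/move_base/TrajectoryPlannerROS/xy_goal_tolerance', 0.10)
    rospy.set_param('/move_base/TrajectoryPlannerROS/yaw_goal_tolerance', 0.10)
    rospy.loginfo('xy_goal_tolerance set to %.2f m',
                  rospy.get_param('/move_base/TrajectoryPlannerROS/xy_goal_tolerance'))
```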

In the experiment, we used the navigation stack to move the mobile robot from one point to another. The distance between the starting point and the destination is 3 m, and the obstacle is installed 1.5 m from the starting point. Fig. 13 shows the two mobile robots and the experimental environment; the left side of the figure is the Tetra DS IV and the right side is the Stella B2.

Fig. 13. The experimental environment


6. Experiment & Results

Fig. 11. System Structure to Control Two Mobile Robots

Fig. 12. Software architecture of the main controller


Fig. 14. Path of mobile robots



Fig. 14 shows the motion of the mobile robots. After setting the target position, the mobile robot moved forward, and the value of the xy_tolerance parameter [15] was varied. The smaller the tolerance value is, the closer the robot stops to the target. Due to the nature of the local planner, the direction of the detour around the obstacle is not constant, because the command is recalculated at every moment at the designated location. It can also be seen that the Stella B2 deviates from the path with a cleaner line than the Tetra DS IV, and that the closer the shape of the mobile robot is to a circle, the better it follows the planned path.

7. Conclusion and Future Work

Robotics is an important and exciting area of research that requires the integration of various devices and complex software. Even for the embedded system, the various libraries and tools provided by ROS make it easier to access and integrate hardware and software. Moreover, various devices can be connected to ROS through ROS drivers and reduce the complexity of the development of the function by using various software components of ROS. Local planners for SLAM and navigation functions of ROS packages were evaluated both on PC simulation and embedded system for various mobile robots. This study showed the usefulness of robot development through ROS and control of the mobile robot can be implemented more easily and quickly. Finally, we conducted comparative study for the navigation of mobile robots according to planners when using ROS package. Results showed that developer should be more careful about using ROS packages and much effort is needed to get the desired results. Detailed research will be carried out later to obtain better results when using ROS packages.

ACKNOWLEDGEMENT

This study was financially supported by Seoul National University of Science and Technology.

AUTHOR

MinSu Kim – Dept. of Electrical and Information Engineering, Seoul National University of Science and Technology, Nowon-gu, 01811, Seoul, Republic of Korea. E-mail:min50190@seoultech.ac.kr Raimarius Delgado – Dept. of Electrical and Information Engineering, Seoul National University of Science and Technology, Nowon-gu, 01811, Seoul, Republic of Korea. E-mail:raim223@seoultech.ac.kr ByoungWook Choi* – Dept. of Electrical and Information Engineering, Seoul National University of Science and Technology, Nowon-gu, 01811, Seoul, Republic of Korea. E-mail:bwchoi@seoultech.ac.kr *Corresponding author


REFERENCES  [1] “ROS.org | About ROS.” http://www.ros.org/ about-ros/.  [2] M. Quigley et al., “ROS: an open-source Robot Operating System,” ICRA workshop on open source software. Vol. 3. No. 3.2. 2009.  [3] Y. S. Pyo, ROS Robot Programming. RubyPaper Press, 2015.  [4] K. Zheng, “ROS Navigation Tuning Guide,” arXiv preprint arXiv:1706.09068, 2017.  [5] S.-Y. Jeong et al., “A Study on ROS Vulnerabilities and Countermeasure,” Proceedings of the Companion of the 2017 ACM/IEEE International Conference on Human-Robot Interaction. ACM, 2017, pp. 147–148. DOI:10.1145/3029798.3038437.  [6] P. Marin-Plaza, A. Hussein, D. Martin, A. de la Escalera, “Global and Local Path Planning Study in a ROS-Based Research Platform for Autonomous Vehicles,” Journal of Advanced Transportation, vol. 2018, 2018, pp. 1–10. DOI:10.1155/2018/6392697.  [7] S. K. Gehrig, F. J. Stein, “Elastic bands to enhance vehicle following,”Intelligent Transportation Systems, 2001. Proceedings. 2001 IEEE. IEEE, 2001, pp. 597–602. DOI:10.1109/ITSC.2001.948727.  [8] S. Quinlan, O. Khatib, “Elastic bands: connecting path planning and control,” Robotics and Automation, 1993. Proceedings, 1993 IEEE International Conference on. IEEE, 1993, pp. 802–807. DOI:10.1109/ROBOT.1993.291936.  [9] M. Keller, F. Hoffmann, C. Hass, T. Bertram, and A. Seewald, “Planning of Optimal Collision Avoidance Trajectories with Timed Elastic Bands,” IFAC Proceedings Volumes, vol. 47, no. 3, 2014, pp. 9822–9827. DOI:10.3182/20140824-6-ZA-1003.01143. [10] M. G. Ocando, N. Certad, S. Alvarado, A. Terrones, “Autonomous 2D SLAM and 3D mapping of an environment using a single 2D LIDAR and ROS,”Robotics Symposium (LARS) and 2017 Brazilian Symposium on Robotics (SBR), 2017 Latin American. IEEE, 2017, pp. 1–6. DOI:10.1109/SBR-LARS-R.2017.8215333. [11] R. Reid, A. Cann, C. Meiklejohn, L. Poli, A. Boeing, and T. Braunl, “Cooperative multi-robot navigation, exploration, mapping and object detection with ROS,” Intelligent Vehicles Symposium (IV), 2013 IEEE. IEEE, 2013, pp. 1083–1088. DOI:10.1109/IVS.2013.6629610. [12] J. M. Santos, D. Portugal, R. P. Rocha, “An evaluation of 2D SLAM techniques available in Robot Operating System,”Safety, Security, and Rescue Robotics (SSRR), 2013 IEEE International Symposium on. IEEE, 2013, pp. 1–6. DOI:10.1109/SSRR.2013.6719348. [13] “navigation/Tutorials/RobotSetup – ROS Wiki.” http://wiki.ros.org/navigation/Tutorials/RobotSetup. [14] M. Lundgren, “Path Tracking for a Miniature Robot,”Masters, Department of Computer Science, University of Umea (2003). p. 9. [15] “base_local_planner – ROS Wiki.” http://wiki. ros.org/base_local_planner?distro=melodic. [16] “kinetic/Installation –ROS Wiki.” http://wiki.ros. org/kinetic/Installation. Articles




A Wavelet Based Watermarking Approach in Concatenated Square Block Image for High Security

B. Sridhar

Submitted: 26th August 2018; accepted: 15th November 2018

DOI: 10.14313/JAMRIS_3-2018/20

Abstract: Watermarking of digital contents has gained much attention in the research community. In this approach, copyright information is concealed in the concatenated square region of an image under the wavelet domain. Initially, the original image undergoes an alternative pixel sharing approach and one of the shares undergoes a circular column shift; further, those shares are concatenated. Next, a square region is obtained by capturing half of the rows in the last part of the first share and the first part of the second share, which forms a square image. To enrich the robustness of the technique, watermarking is carried out only in the folded square region under the wavelet transform. Further, the reverse process is carried out to generate the original and watermarked images. To show ownership, the watermarked image undergoes the same operation to acquire the copyright information. Experimental results indicate that the proposed approach is robust against image processing attacks.

Keywords: authentication, concatenation, copyright, watermarking, wavelet

1. Introduction

In the fast growth of multimedia communications, the digital transmission of data is enormous. During distribution, an unknown party can easily acquire the data and claim ownership. Hence, the safeguarding of intellectual property is an important consideration for today's world [1–2]. Digital watermarking is a suitable approach for protecting multimedia information [3–6]. Spatial and transform domain are the two embedding approaches which are utilized in watermarking techniques [7]. In the spatial domain, ownership information is easily installed into a host image by altering the pixel values directly using bit substitution. In transform-based approaches, the copyright mark is embedded only in the transform coefficients. Hence the transform-based approach is robust and stable [8].

The Discrete Wavelet Transform has excellent properties owing to its spatial localization, frequency distribution and multi-level resolution characteristics, and it is computationally more significant than other transform methods [9]. By using the sub-component filters, the DWT of a two-dimensional image is obtained by sub-sampling the low, middle, and high frequency sub-components. The low-level resolution provides the image content, whereas the high-frequency part contains the edge components. Any watermarking technique has to be evaluated based on the following features [10]: capacity, invisibility and robustness.

Fig. 1. Features of a watermarking system: capacity, invisibility, robustness

A copyright concealing scheme is said to be robust if it is able to preserve the secret message under various attacks like filtering, compression or cropping. A watermarking technique has a good invisibility property if a human is unable to notice the changes in the cover medium after concealing the watermark. The above three requirements form a trade-off triangle, as shown in Fig. 1: if two out of the three properties are achieved, the third one has to be traded off.

The remaining sections are arranged as follows: related work is reviewed in Section 2, the proposed approach and the experimental results are discussed in Sections 3 and 4, and finally the conclusion is placed in Section 5.

2. Related Works

The extensive literature gathered in relation to the performance improvement of image watermarking techniques is critically inspected and exhibited in this section; a summary of the review is furnished at the end of the review.

Delaigle et al. [11] addressed a watermarking technique based on the Human Perception System. Initially, m binary sequences were created and combined on a random carrier signal. This copyright is treated as the ownership identity, and it is concealed in accordance with the contrast between the original image and the modulated image. The concealed copyright information is merged with the cover data to generate the watermarked image. This technique is robust to noise attacks, JPEG coding and rescanning. Chen Yongqiang et al. in 2009 [12] demonstrated a transform-domain based watermarking approach on color images to fulfill the features of




watermarking like security, intangibility and robustness. In this approach, a 2D chaotic stream encryption technique is adopted to scramble a gray watermark. In order to enhance the imperceptibility properties of the watermarked image, a genetic algorithm is adopted to conceal the watermark data in the original color image. Yingkun Hou and Chunxia Zhao in 2010 [13] addressed semi-subsampled wavelet transform (SSWT) based watermarking techniques, comprising two sections: a non-subsampled tight frame transform and the subsampled wavelet transform (WT). By concealing the ownership information in the approximation level of the SSWT, the imperceptibility and robustness of the watermarking technique can be worthfully enhanced in comparison with previous watermarking schemes. Sridhar and Arun (2012) [14] addressed wavelet-based multiple image watermarking techniques; in this approach the original gray image is sectioned into odd and even rows of images, and the zero rows in the respective shares are then removed. The watermark is implanted into the deinterlaced images under the wavelet domain. After watermarking, the two watermarked images are merged into a single image by reintroducing some zero rows in the two watermarked images. The results achieved a better PSNR value and the method is robust against many geometrical attacks. In order to enhance the security of the system, the technique proposed here implants the watermark information into the concatenated square region of an image. Hence the degree of authentication is high.

3. Proposed Approach

Our proposed scheme adds the watermark into the concatenated middle square region of two shares. Initially, the original gray image [A]_{m×n} is subjected to alternative pixel sharing into [A′]_{m×n} and [A″]_{m×n}. A circular column shift is then employed as in equation 1:

A^c(i, j) = A″(i, j + 1),  if j = 1 to n − 1
A^c(i, j) = A″(i, 1),      if j = n        (1)

At the end, horizontal concatenation is applied to these shares, C = [A′_{m×n} | A^c_{m×n}]_{m×2n}, as shown in equation 2:

C = [ a11  0   a13  0   a14  0   a12  0
      0   a22  0   a24  0   a21  0   a23
      a31  0   a33  0   a34  0   a32  0
      0   a42  0   a44  0   a41  0   a43 ]        (2)
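A small NumPy sketch of this sharing and concatenation step (equations 1–2) is given below; it is an illustrative reading of the scheme, assuming a checkerboard-style alternative pixel split, and is not the author's code.

```python
import numpy as np

def alternative_pixel_shares(A):
    """Split A into two complementary shares on a checkerboard pattern."""
    rows, cols = np.indices(A.shape)
    mask = (rows + cols) % 2 == 0
    A1 = np.where(mask, A, 0)      # share A'
    A2 = np.where(~mask, A, 0)     # share A''
    return A1, A2

def circular_column_shift(A2):
    """Equation (1): move every column one position to the left, wrapping around."""
    return np.roll(A2, shift=-1, axis=1)

def concatenate_shares(A1, A2c):
    """Equation (2): horizontal concatenation C = [A' | shifted A'']."""
    return np.hstack((A1, A2c))

A = np.arange(1, 17).reshape(4, 4)          # toy 4x4 image
A1, A2 = alternative_pixel_shares(A)
C = concatenate_shares(A1, circular_column_shift(A2))
print(C)
```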

Further, the square region is obtained by capturing half of the rows in the last part of the first share and half of the rows in the first part of the second share, which forms a square image. This marking region is obtained using equation 3. The folding process is carried out before concealing the copyright information, as in equations 4 and 5.

D = C(1 : m, (n/2 − m/2 + 1) : (n/2 + m/2))        (3)

f(i, j) = D(i, j) + D(m + 1 − i, j)        (4)

f = [ a13  a44  a14  a41
      a33  a24  a34  a21 ]        (5)

The watermark w is embedded in the second-level approximation subband of the folded region with scaling factor α, and the inverse transform reconstructs the watermarked folded region fW:

X = f2^ll + α × w        (6)

fW = IDWT{ X, f2^lh, f2^hl, f2^hh, f1^lh, f1^hl, f1^hh }        (7)

Image Sharing
 1. for i from 1 to (Rows/2) do
 2.   for j from 1 to Columns do
 3.     if (i mod 2) != 0 && (j mod 2) == 0 (or)
 4.     if (i mod 2) == 0 && (j mod 2) != 0 then
 5.       Move the row into (Rows + 1 − i)
 6.     else
 7.     end if
 8.   end if
 9. end for
10. end for

[H]_{m×2n} = [ C(1 : m, 1 : (n/2 − m/2)) | fW | C(1 : m, (n/2 + m/2 + 1) : n) ]        (8)

[Fws] = H(1 : m, 1 : n/2)        (9)

[S^c_ws] = H(1 : m, n/2 : n)        (10)

S_ws(i, j) = S^c_ws(i, n),      if j = 1
S_ws(i, j) = S^c_ws(i, j − 1),  if j = 2 to n        (11)

W(i, j) = Fws(i, j) + S_ws(i, j)        (12)
Watermarking process is carried out only in the folded region in equation 6. Where α is the scaling factor, which reduces the weight of the watermark. Inverse transform is enabled by using equation 7. Unfold the folding image by using above algorithm. From equation 8 to 12 explains the steps to obtain the watermarked image. Figure 2 shows the procedure of proposed approach. In the extraction side again do the same operation of watermarked image and original image and make the difference to release the ownership. Articles

69


the folded region in equation 6. Where α is scaling of watermarked image.Figure 2 shows thethe procedure proposed approach. the extraction side again do factor, which reduces the In weight of the watermark. Journal of Automation, Mobile & Intelligent Systems same operation ofRobotics watermarked image 7. and Inverse the transform is enabled by using equation original image and make the difference to release Unfold the folding image by using above algorithm.the ownership. From equation 8 to 12 explains the steps to obtain the watermarkedAlternative image.Figure 2 shows of Alternative Original the procedure Pixels Share1 In the extraction Image proposed approach. side Pixels againShare2 do the same operation of watermarked image and original image and make the difference to release the Column Shift ownership. Alternative Pixels Share1

Original Concatenated Image Image

Crop the square region

watermark information like rice image of size 256×256 is utilized.Figure 3 displays the original cover image, 12, N° 3 2018 concatenated image, squareVOLUME image, folded image, watermark image, and watermarked cover image.

(a)

Alternative Pixels Share2

(a) (b)

Column Shift

Folding an Concatenated Image image DWT

Crop the square region ∑

(b)

(c)

(d)

α xWatermark Image

Folding an IDWT image Unfold the square

DWT region

(c)

(e)

Remove Concatenation &α xWatermark ∑ Column shift of one share Image

(f)

Combined Shares IDWT

Unfold Watermarked the square Image region

Fig. 2. Proposed image watermarking scheme Fig.2.Proposed Image watermarking scheme Remove Concatenation & Column shift of one share Experimental Results

4. 4. Experimental Results

In this experiment MATLAB software is utilized. A standard image like MATLAB the cameraman of size In this experiment softwareimage is utilized.A Combined Shares standard like the cameraman size 512×512 image is consider as a cover imageimage and tooffetch the watermark information like rice image of size 256×256 is Watermarked utilized. Figure Image3 displays the original cover image, concatenated image, square image, folded image, watermark image, and watermarked cover image. Fig.2.Proposed Image watermarking scheme

4.1. Invisibility Test

The excellence of this 4. Experimental Results

approach is to genearate the invisibility of the watermark. To justify the imperIn thisceptibility, experiment software is utilized.A MeanMATLAB Square Error (MSE) and Peak Signal standard toimage like the cameraman image of size Noise Ratio (PSNR) are the two vital parameters. The MSE is the cumulative squared error between the original image O(i,j) and the watermarked image W(i, j). The average MSE of this proposed method is 0.0849. PSNR is employed to calculate the quality of the watermarked image. PSNR of our proposed approach is 46.5498 dB. Equation 13 and 14 shows the formula of MSE and PSNR. 70

Articles

(d)

(g) Fig. 3 (a) Original Image; (b) Concatenated shares; (c) Square image; (d) Folded square image;(e) Watermark image, (f) Watermarked image &(g) Extracted (e) (f) Watermark 4.1 Invisibility Test The excellence of this approach is to genearate the invisibility of the watermark. To justify the imperceptibility, Mean Square Error(MSE) and Peak

(g)

Fig. 3. (a) Original image; (b) Concatenated

Fig. 3 (a)(c)Original Image;(d) (b)Folded Concatenated shares; (c) shares; Square image; square image; (e) Watermark image, (f) Watermarked image,Watermark Square image; (d) Folded square image;(e) (g) Extracted image, (f)Watermark Watermarked image &(g) Extracted Watermark

( )

4.1 Invisibility Test

x −1 y −1

1 = MSE (O(i , j ) − W (i , j ))2 (13) ∑∑ The excellence of xy this approach is to genearate the =i 0=j 0

invisibility of the watermark. To justify the imperceptibility, Mean Square Error(MSE) and Peak

 2552  PSNR = 10log10    MSE 

(14)

The correlation co-efficient is another measure used to estimate the robustness of the watermarking algorithm against the possible attacks.


Journal of Automation, Mobile Robotics & Intelligent Systems

r=

∑∑(O x

y

xy

VOLUME 12,

 2  2  ∑∑(Oxy − O )  ∑∑(E xy − E )   x y  x y 

(15)

4.2. Attacks in Image Watermarking

Most common attacks of image watermarking are noise attacks, geometrical attacks and filtering attacks. In our method, different noises like Salt & Pepper, Speckle, Gaussian and Poisson with default noise density are introduced in watermarked image with default noise density and measure the robustness of the system. Table 1. PSNR values of the proposed approach with different bands LL

LH

HL

PSNR(dB)

PSNR(dB)

PSNR(dB)

0.01

46.5498

46.0187

46.0185

0.03

46.4752

45.8608

0.05

46.3951

45.5642

0.02

46.5198

0.04

46.4467

0.06

46.3608

0.07

46.3025

0.08

46.2641

0.09 0.1

46.1971

50

46.1369

45.9651

45.9651

45.7398

45.7492

45.3861 45.1488 44.9334 44.6603

44.4027

2018

Table 2. Performance values of the proposed approach

− O )(E xy − E )

Correlation coefficient value may be one or zero based on the watermarked and original images is identical or not. Equation 15 shows the formula to generate the correlation coefficient between two images. Similarity value between the watermarked and original image is 0.9594.

Scaling Factor

N° 3

Watermarked Image

Watermark Image

PSNR (dB)

PSNR (dB)

No Attacks

46.5498

33.0321

Speckle

43.9021

27.0934

Attacks

Salt & Pepper

38.4125

Gaussian

26.2347

35.0126

Poisson

Median Filtering

Low Pass Filtering

25.9093

31.2262

26.4957

35.0126

26.6457

40.3973

23.2759

Also some additional evaluation median filtering and low pass filtering also attack to the watermarked image and measure the performance of the algorithm. Table 1 and 2 shows the PSNR values and performance of an algorithm with different attacks. Figure 4 displays the comparison sketch between the wavelet and Singular Value Decomposition (SVD). It is finding that wavelet based approach gained the benefits of high PSNR than SVD. Also proposed approach registered more PSNR value 46.5498 dB than Pinki Tanwar, et al. 37.676 dB.

5. Conclusion

In this paper, the optimal and robust region marking for high security is presented. Here, concealed the information only in the folded square region of concatenated shares under wavelet transform. Hence the robustness of this system deserves high. PSNR and Correlation coefficient of this method are 46.5498 dB and 0.9594. Also,this proposed approach is simple, efficient and with less complexity. In future the enhancement of this algorithm will be extended to the video.

45.8627 45.5748 45.3983 45.1680

AUTHOR

44.9585

B. Sridhar – Department of Electronics and Communication Engineering, MLR Institute of Technology, Hyderabad, INDIA-500043, email: sridharbece@gmail.com.

44.6929

44.4363

DWT SVD

48 46

PSNR(dB)

44 42 40 38 36 34 32 30 28 0.01

0.02

0.03

0.04

0.05

0.06

Scaling factor

0.07

0.08

0.09

0.1

Fig. 4. Comparison sketch between DWT and SVD Articles

71



REFERENCES


[1] Qiao Li, and I.J.Cox, “Using perceptual models to improve fidelity and provide resistance to valumetric scaling for quantization index modulation watermarking”, IEEE Transaction on Information Forensics and Security, vol. 2, no. 2, 2007, 127–139.  [2] Alessandro Piva, Tiziano Bianchi, Alessia De Rosa, “Secure Client-Side ST-DM Watermark Embedding”, IEEE Transactions on Information Forensics and Security, vol. 5, no. 1, 2010, 13–26.  [3] A. Piva, F. Bartolini, M. Barni, “Managing copyright in open networks”, IEEE Transactions on Internet Computing, vol. 6, no. 3, 2002, 18–26.  [4] B.Sridhar, C. Arun, “An Enhanced Approach in Video Watermarking with Multiple Watermarks Using Wavelet”, Journal of Communications Technology and Electronics, Springer, vol. 61, no. 2, 2016, 165–175.  [5] M. Vidyasagar, M. Potdar, Song Han, Elizabeth Chang, “A Survey of Digital Image Watermarking Techniques”, 3rd International Conference on Industrial Informatics (INDIN), 2005, 709–713.  [6] Yiwei Wang, John F. Doherty, Robert E. Van Dyck, “A Wavelet-Based Watermarking Algorithm for Ownership Verification of Digital Image”, IEEE Transactions on Image Processing, vol. 11, no. 2, 2002, 77–88.  [7] Frank Y. Shih, Scott Y.T. Wu, “Combinational image watermarking in the spatial and frequency domains”, Pattern Recognition, vol. 36, 2003, 969–975.  [8] M.A. Suhail, M.S. Obaidat, “Digital watermarking-based DCT and JPEG model”, IEEE Transaction on Instrumentation and Measurement, vol. 52, no. 5, 2003, 1640–1647.  [9] Mehdi Khalili, “A Novel Secure, Imperceptible and Robust CDMA Digital Image Watermarking in Jpeg-Ycbcr Channel Using DWT2”, International Journal of Enterprise Computing and Business Systems, vol. 1, no. 2, 2011. [10] C.H. Huang, J.L. Wu, “Attacking Visible Watermarking Schemes”, IEEE Transactions on Multimedia, vol. 6, no. 1, 2004, 16–30. [11] J. Delaigle, C. de Vleeschouwer, B. Macq, “Psycho visual Approach to Digital Picture Watermarking”, Journal of Electronic Imaging, vol. 7, no. 3, 1998, 628–640. [12] Chen Yongqiang, Zhang Yanqing, Peng Lisen, “A Novel Optimal Color Image Watermarking Scheme”, 3rd International Conference on Genetic and Evolutionary Computing, 2009, 121–124. [13] Yingkun Hou, Mingxia Liu, Zhengli Zhu, Deyun Yang, “Semisubsampled Wavelet Transform Based Image Watermarking with Strong Robustness to Rotation Attacks”, Journal of Multimedia, vol. 5, no. 4, 2010, 385–392. [14] B. Sridhar, C. Arun, “On Secure Multiple Image Watermarking Techniques using DWT”, 3rd International Conference on Computing Communication & Networking Technologies (ICCCNT), 2012, 1–4. Articles


[15] Pinki Tanwar and Manisha Khurana, “Improved PSNR and NC in Digital Image Watermarking Using RDWT and SVD”, International Journal of Advanced Research in Computer Science and Software Engineering Research, vol. 6, no. 5, 2016, 955–959.

